linux-kernel.vger.kernel.org archive mirror
* fadvise syscall?
@ 2002-03-17  8:39 Jeff Garzik
  2002-03-17  8:56 ` Andrew Morton
                   ` (2 more replies)
  0 siblings, 3 replies; 41+ messages in thread
From: Jeff Garzik @ 2002-03-17  8:39 UTC (permalink / raw)
  To: linux-kernel; +Cc: linux-fsdevel

Has anyone ever done an madvise(2)-type syscall for file descriptors?
(or does the capability exist and I'm missing it?)


I was thinking, in playing around with stuff like cp(1) I've found that 
standard read(2) and write(2) of a 4-8K buffer is the fastest solution 
overall, in addition to providing the useful side effect of better error 
reporting, such as ENOSPC report.  Better error reporting than the 
alternative I see anyway, mmap(2).
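
For illustration, the read(2)/write(2) loop described above might look
roughly like this. It is a minimal sketch, not what cp(1) actually does:
copy_fd() and the 8K buffer size are made-up choices for the example. The
point is that a failing write(2) hands back errno (ENOSPC and friends)
right where the failure happens.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <unistd.h>

/* Copy everything from in_fd to out_fd using a small fixed buffer.
 * Returns 0 on success, or -1 with errno set (e.g. ENOSPC) on failure.
 * copy_fd is a hypothetical helper name, not an existing API. */
int copy_fd(int in_fd, int out_fd)
{
    char buf[8192];              /* within the 4-8K range discussed above */
    ssize_t nread;

    for (;;) {
        nread = read(in_fd, buf, sizeof buf);
        if (nread == 0)
            return 0;            /* end of file */
        if (nread < 0) {
            if (errno == EINTR)
                continue;        /* interrupted, retry the read */
            return -1;
        }

        /* write(2) may accept fewer bytes than requested; loop until done. */
        ssize_t off = 0;
        while (off < nread) {
            ssize_t nwritten = write(out_fd, buf + off, (size_t)(nread - off));
            if (nwritten < 0) {
                if (errno == EINTR)
                    continue;    /* interrupted, retry the write */
                return -1;       /* errno (ENOSPC, EIO, ...) is available here */
            }
            off += nwritten;
        }
    }
}
```

A caller would check the return value and report strerror(errno) on -1.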

So... we have madvise, why not fadvise?  I would love the capability for 
applications to provide hints to the OS like madvise, but for file 
descriptors...

    Jeff





* Re: fadvise syscall?
  2002-03-17  8:39 fadvise syscall? Jeff Garzik
@ 2002-03-17  8:56 ` Andrew Morton
  2002-03-17  9:10   ` Jeff Garzik
                     ` (2 more replies)
  2002-03-17 15:13 ` Ken Hirsch
  2002-03-17 17:14 ` Anton Altaparmakov
  2 siblings, 3 replies; 41+ messages in thread
From: Andrew Morton @ 2002-03-17  8:56 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: linux-kernel, linux-fsdevel

Jeff Garzik wrote:
> 
> Has anyone ever done an madvise(2)-type syscall for file descriptors?
> (or does the capability exist and I'm missing it?)

Well, question is: is madvise() any use? :)

> I was thinking, in playing around with stuff like cp(1) I've found that
> standard read(2) and write(2) of a 4-8K buffer is the fastest solution
> overall, in addition to providing the useful side effect of better error
> reporting, such as ENOSPC report.  Better error reporting than the
> alternative I see anyway, mmap(2).

4k to 8k is best on x86 at least.  And if you're actually going to *use*
each byte in the file, the zero-copy characteristics of mmap aren't
worth much at all.
 
> So... we have madvise, why not fadvise?  I would love the capability for
> applications to provide hints to the OS like madvise, but for file
> descriptors...

The one hint which I can think of which would be beneficial would
be an equivalent to MADV_SEQUENTIAL.  Something which says "this
is a big streaming read/write - don't go and evict other stuff because
of it".  O_STREAMING perhaps.  Or working dropbehind heuristics,
although I suspect that explicit controls will always do better.

For MADV_RANDOM, readahead window scaling should get that right.

What else were you thinking of?

-


* Re: fadvise syscall?
  2002-03-17  8:56 ` Andrew Morton
@ 2002-03-17  9:10   ` Jeff Garzik
  2002-03-17 23:59     ` Anton Altaparmakov
  2002-03-17 13:41   ` Anton Altaparmakov
  2002-03-17 20:18   ` Richard Gooch
  2 siblings, 1 reply; 41+ messages in thread
From: Jeff Garzik @ 2002-03-17  9:10 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, linux-fsdevel

Andrew Morton wrote:

>Jeff Garzik wrote:
>
>>Has anyone ever done an madvise(2)-type syscall for file descriptors?
>>(or does the capability exist and I'm missing it?)
>>
>
>Well, question is: is madvise() any use? :)
>
:)

>>I was thinking, in playing around with stuff like cp(1) I've found that
>>standard read(2) and write(2) of a 4-8K buffer is the fastest solution
>>overall, in addition to providing the useful side effect of better error
>>reporting, such as ENOSPC report.  Better error reporting than the
>>alternative I see anyway, mmap(2).
>>
>
>4k to 8k is best on x86 at least.  And if you're actually going to *use*
>each byte in the file, the zero-copy characteristics of mmap aren't
>worth much at all.
>

That's exactly what I found through experimentation.

>>So... we have madvise, why not fadvise?  I would love the capability for
>>applications to provide hints to the OS like madvise, but for file
>>descriptors...
>>
>
>The one hint which I can think of which would be beneficial would
>be an equivalent to MADV_SEQUENTIAL.  Something which says "this
>is a big streaming read/write - don't go and evict other stuff because
>of it".  O_STREAMING perhaps.  Or working dropbehind heuristics,
>although I suspect that explicit controls will always do better.
>
>For MADV_RANDOM, readahead window scaling should get that right.
>
>What else were you thinking of?
>

Hints for,
* sequential read
* sequential write
* sequential write, where the application considers the data it's 
writing to be unlikely to be read again any time soon (hopefully 
implying to the page cache that these pages have low value as cacheable 
objects)
* some sort of streaming hints, implying that the application cares a 
lot about maintaining some minimum i/o rate.  note I said hint, not 
requirement.  -not- guaranteed-rate-IO.

I might even go so far as to advocate identifying common usage patterns, 
and creating hint constants for them, even if we don't support them in 
the kernel immediately (if ever).  Makes the interface much more 
future-proof, at the expense of a few integers in a 32-bit numberspace, 
and a few more bytes in the C compiler's symbol table.

    Jeff





* Re: fadvise syscall?
  2002-03-17  8:56 ` Andrew Morton
  2002-03-17  9:10   ` Jeff Garzik
@ 2002-03-17 13:41   ` Anton Altaparmakov
  2002-03-17 14:31     ` Simon Richter
                       ` (4 more replies)
  2002-03-17 20:18   ` Richard Gooch
  2 siblings, 5 replies; 41+ messages in thread
From: Anton Altaparmakov @ 2002-03-17 13:41 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: Andrew Morton, linux-kernel, linux-fsdevel

At 09:10 17/03/02, Jeff Garzik wrote:
>Andrew Morton wrote:
>>Jeff Garzik wrote:
>>>So... we have madvise, why not fadvise?  I would love the capability for
>>>applications to provide hints to the OS like madvise, but for file
>>>descriptors...
>>
>>The one hint which I can think of which would be beneficial would
>>be an equivalent to MADV_SEQUENTIAL.  Something which says "this
>>is a big streaming read/write - don't go and evict other stuff because
>>of it".  O_STREAMING perhaps.  Or working dropbehind heuristics,
>>although I suspect that explicit controls will always do better.
>>
>>For MADV_RANDOM, readahead window scaling should get that right.
>>
>>What else were you thinking of?
>
>Hints for,
>* sequential read
>* sequential write
>* sequential write, where the application considers the data it's writing 
>to be unlikely to be read again any time soon (hopefully implying to the 
>page cache that these pages have low value as cacheable objects)
>* some sort of streaming hints, implying that the application cares a lot 
>about maintaining some minimum i/o rate.  note I said hint, not 
>requirement.  -not- guaranteed-rate-IO.
>
>I might even go so far as to advocate identifying common usage patterns, 
>and creating hint constants for them, even if we don't support them in the 
>kernel immediately (if ever).  Makes the interface much more future-proof, 
>at the expense of a few integers in a 32-bit numberspace, and a few more 
>bytes in the C compiler's symbol table.

We don't need fadvise IMHO. That is what open(2) is for. The streaming 
request you are asking for is just a normal open(2). It will do read ahead 
which is perfect for streaming (of data size << RAM size in its current form).

When you want large data streaming, i.e. you start getting worried about 
memory pressure, then you want open(2) + O_DIRECT. No caching done. Perfect 
for large data streams and we have that already. I agree that you may want 
some form of asynchronous read ahead with passed pages being dropped from 
the cache but that could be just an open(2) + O_SEQUENTIAL (doesn't exist yet).

All of what you are asking for exists in Windows and all the semantics are 
implemented through a very powerful open(2) equivalent. I don't see why we 
shouldn't do the same. It makes more sense to me than inventing yet another 
system call...

The Windows NT/2k/XP CreateFile() call is documented at the URL below. Search
for FILE_FLAG_* and there is a nice big table with all the possible access
method hints one can give when opening or creating a file. Many of those
make perfect sense to have in the Linux kernel too, and in fact with
O_DIRECT we already have some of the functionality Windows offers (there it
would be FILE_FLAG_NO_BUFFERING).

http://msdn.microsoft.com/library/default.asp?url=/library/en-us/fileio/filesio_7wmd.asp

Best regards,

Anton


-- 
   "I've not lost my mind. It's backed up on tape somewhere." - Unknown
-- 
Anton Altaparmakov <aia21 at cam.ac.uk> (replace at with @)
Linux NTFS Maintainer / WWW: http://linux-ntfs.sf.net/
ICQ: 8561279 / WWW: http://www-stu.christs.cam.ac.uk/~aia21/



* Re: fadvise syscall?
  2002-03-17 13:41   ` Anton Altaparmakov
@ 2002-03-17 14:31     ` Simon Richter
  2002-03-17 14:56       ` Jan Hudec
  2002-03-17 15:00     ` Anton Altaparmakov
                       ` (3 subsequent siblings)
  4 siblings, 1 reply; 41+ messages in thread
From: Simon Richter @ 2002-03-17 14:31 UTC (permalink / raw)
  To: Anton Altaparmakov
  Cc: Jeff Garzik, Andrew Morton, linux-kernel, linux-fsdevel

On Sun, 17 Mar 2002, Anton Altaparmakov wrote:

> All of what you are asking for exists in Windows and all the semantics are
> implemented through a very powerful open(2) equivalent. I don't see why we
> shouldn't do the same. It makes more sense to me than inventing yet another
> system call...

It is easier for application writers to code:

[...]
#ifdef HAVE_FADVISE
	(void)fadvise(fd, FADV_STREAMING);
#endif
[...]

Than to have a forest of #ifdefs to determine which O_* flags are
supported. After all, we still want our programs to run under Solaris. :-)

   Simon

-- 
GPG public key available from http://phobos.fs.tum.de/pgp/Simon.Richter.asc
 Fingerprint: 040E B5F7 84F1 4FBC CEAD  ADC6 18A0 CC8D 5706 A4B4
Hi! I'm a .signature virus! Copy me into your ~/.signature to help me spread!



* Re: fadvise syscall?
  2002-03-17 14:31     ` Simon Richter
@ 2002-03-17 14:56       ` Jan Hudec
  0 siblings, 0 replies; 41+ messages in thread
From: Jan Hudec @ 2002-03-17 14:56 UTC (permalink / raw)
  To: linux-kernel, linux-fsdevel

> It is easier for application writers to code:
> 
> [...]
> #ifdef HAVE_FADVISE
> 	(void)fadvise(fd, FADV_STREAMING);
> #endif
> [...]
> 
> Than to have a forest of #ifdefs to determine which O_* flags are
> supported. After all, we still want our programs to run under Solaris. :-)

#ifndef O_STREAMING
#define O_STREAMING 0
#endif
(and then just use the flag in open)

is still better - it can be done in a header somewhere, once for all opens.

--------------------------------------------------------------------------------
                  				- Jan Hudec `Bulb' <bulb@ucw.cz>


* Re: fadvise syscall?
  2002-03-17 13:41   ` Anton Altaparmakov
  2002-03-17 14:31     ` Simon Richter
@ 2002-03-17 15:00     ` Anton Altaparmakov
  2002-03-17 19:20     ` Joel Becker
                       ` (2 subsequent siblings)
  4 siblings, 0 replies; 41+ messages in thread
From: Anton Altaparmakov @ 2002-03-17 15:00 UTC (permalink / raw)
  To: Simon Richter; +Cc: Jeff Garzik, Andrew Morton, linux-kernel, linux-fsdevel

At 14:31 17/03/02, Simon Richter wrote:
>On Sun, 17 Mar 2002, Anton Altaparmakov wrote:
>
> > All of what you are asking for exists in Windows and all the semantics are
> > implemented through a very powerful open(2) equivalent. I don't see why we
> > shouldn't do the same. It makes more sense to me than inventing yet another
> > system call...
>
>It is easier for application writers to code:
>
>[...]
>#ifdef HAVE_FADVISE
>         (void)fadvise(fd, FADV_STREAMING);
>#endif
>[...]
>
>Than to have a forest of #ifdefs to determine which O_* flags are
>supported. After all, we still want our programs to run under Solaris. :-)

Ugh. Both of your suggestions look ugly. Using the O_* flags, you just need 
to have a compatibility header file which contains:

#ifndef HAVE_O_SEQUENTIAL
#       define O_SEQUENTIAL     0
#endif

Then in the code you just use O_SEQUENTIAL and if the system doesn't know 
about it it is optimised away at compile time.

Note how nicely this fits in with autoconf/automake where the ./configure 
script can test for O_SEQUENTIAL and if it is not there automatically 
define it to 0. That then means your code is completely free from these 
ugly #ifdefs.
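
As a rough sketch of how that looks in practice (O_SEQUENTIAL is the
hypothetical flag from this thread and does not exist on Linux, and
open_for_streaming() is a made-up helper name): the compat definition
lives in one place, and the call site carries no #ifdef at all. The
example tests the flag macro directly rather than a configure-generated
HAVE_O_SEQUENTIAL, which would work the same way.

```c
#include <fcntl.h>

/* O_SEQUENTIAL is hypothetical here. Where the system headers do not
 * define it, it becomes 0 and ORs away to nothing at compile time. */
#ifndef O_SEQUENTIAL
#define O_SEQUENTIAL 0
#endif

/* Open a file for reading, passing the hint only where the platform has it.
 * Returns a file descriptor, or -1 with errno set, exactly like open(2). */
int open_for_streaming(const char *path)
{
    return open(path, O_RDONLY | O_SEQUENTIAL);
}
```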

Thanks for making your point as that is ANOTHER argument for using open(2) 
instead of fadvise() [1]. (-;

Cheers,

Anton

[1] Yeah, I know, one could also define fadvise() to nothing in the compat 
header file...





* Re: fadvise syscall?
  2002-03-17  8:39 fadvise syscall? Jeff Garzik
  2002-03-17  8:56 ` Andrew Morton
@ 2002-03-17 15:13 ` Ken Hirsch
  2002-03-17 17:14 ` Anton Altaparmakov
  2 siblings, 0 replies; 41+ messages in thread
From: Ken Hirsch @ 2002-03-17 15:13 UTC (permalink / raw)
  To: linux-kernel; +Cc: linux-fsdevel

There is a posix_fadvise() syscall in the POSIX Advanced Realtime
specification
http://www.opengroup.org/onlinepubs/007904975/functions/posix_fadvise.html


I don't know if this has been mentioned on linux-kernel before, but in
January, the Open Group, in cooperation with IEEE, added the POSIX
functionality to their specification and made it available online for free.
It's at
http://www.opengroup.org/onlinepubs/007904975/toc.htm

There are some useful tables at
http://www.unix-systems.org/version3/online.html and they ask that you
register there so that they know how many people are using the
specification.

They don't have a downloadable version of this specification, but they do
for the previous versions:
http://www.opengroup.org/onlinepubs/007908799/download/

Ken Hirsch





* Re: fadvise syscall?
  2002-03-17  8:39 fadvise syscall? Jeff Garzik
  2002-03-17  8:56 ` Andrew Morton
  2002-03-17 15:13 ` Ken Hirsch
@ 2002-03-17 17:14 ` Anton Altaparmakov
  2002-03-17 18:31   ` Mark Mielke
                     ` (2 more replies)
  2 siblings, 3 replies; 41+ messages in thread
From: Anton Altaparmakov @ 2002-03-17 17:14 UTC (permalink / raw)
  To: Ken Hirsch; +Cc: linux-kernel, linux-fsdevel

At 15:13 17/03/02, Ken Hirsch wrote:
>There is a posix_fadvise() syscall in the POSIX Advanced Realtime
>specification
>http://www.opengroup.org/onlinepubs/007904975/functions/posix_fadvise.html

Posix or not I still don't see why one would want that. You know what you 
are going to be using a file for at open time and you are not going to be 
changing your mind later. If you can show me a single _real_world_ example 
where one would genuinely want to change from one access pattern to another 
without closing/reopening a particular file I would agree that fadvise is a 
good idea but otherwise I think open(2) is the superior approach.

In addition, open(2) allows you to do cool things like O_TEMP which could 
create a file that would never get written to disk at all and on close 
would just disappear again (just an idea, I can see good uses for such 
things, although in a way we already have similar semantics when one 
creates such files on a tmpfs mount).

Best regards,
Anton

>I don't know if this has been mentioned on linux-kernel before, but in
>January, the Open Group, in cooperation with IEEE, added the POSIX
>functionality to their specification and made it available online for free.
>It's at
>http://www.opengroup.org/onlinepubs/007904975/toc.htm
>
>There are some useful tables at
>http://www.unix-systems.org/version3/online.html and they ask that you
>register there so that they know how many people are using the
>specification.
>
>They don't have a downloadable version of this specification, but they do
>for the previous versions:
>http://www.opengroup.org/onlinepubs/007908799/download/
>
>Ken Hirsch
>
>
>




* Re: fadvise syscall?
  2002-03-17 17:14 ` Anton Altaparmakov
@ 2002-03-17 18:31   ` Mark Mielke
  2002-03-17 18:35   ` Ken Hirsch
  2002-03-17 19:06   ` Anton Altaparmakov
  2 siblings, 0 replies; 41+ messages in thread
From: Mark Mielke @ 2002-03-17 18:31 UTC (permalink / raw)
  To: Anton Altaparmakov; +Cc: Ken Hirsch, linux-kernel, linux-fsdevel

On Sun, Mar 17, 2002 at 05:14:20PM +0000, Anton Altaparmakov wrote:
> At 15:13 17/03/02, Ken Hirsch wrote:
> >There is a posix_fadvise() syscall in the POSIX Advanced Realtime
> >specification
> >http://www.opengroup.org/onlinepubs/007904975/functions/posix_fadvise.html
> Posix or not I still don't see why one would want that. You know what you 
> are going to be using a file for at open time and you are not going to be 
> changing your mind later. If you can show me a single _real_world_ example 
> where one would genuinely want to change from one access pattern to another 
> without closing/reopening a particular file I would agree that fadvise is a 
> good idea but otherwise I think open(2) is the superior approach.

Also, at least in theory, open() can begin loading pages the moment it
completes (if the system is sufficiently idle). Calling fadvise() "at
some later point" would allow a window during which the kernel could
already be loading the wrong pages, before it is *then* told "oh btw, I
really want *these* pages." As an example (assuming open() doesn't do this
already) I would be pleasantly surprised if open(O_RDONLY | O_SEQUENTIAL)
began loading at least the first page in the file the moment open() was
successful. Then, when we get control back to actually do a read() (we
may have been interrupted during open()) the page is already there.

mark

-- 
mark@mielke.cc/markm@ncf.ca/markm@nortelnetworks.com __________________________
.  .  _  ._  . .   .__    .  . ._. .__ .   . . .__  | Neighbourhood Coder
|\/| |_| |_| |/    |_     |\/|  |  |_  |   |/  |_   | 
|  | | | | \ | \   |__ .  |  | .|. |__ |__ | \ |__  | Ottawa, Ontario, Canada

  One ring to rule them all, one ring to find them, one ring to bring them all
                       and in the darkness bind them...

                           http://mark.mielke.cc/



* Re: fadvise syscall?
  2002-03-17 17:14 ` Anton Altaparmakov
  2002-03-17 18:31   ` Mark Mielke
@ 2002-03-17 18:35   ` Ken Hirsch
  2002-03-17 19:06   ` Anton Altaparmakov
  2 siblings, 0 replies; 41+ messages in thread
From: Ken Hirsch @ 2002-03-17 18:35 UTC (permalink / raw)
  To: Anton Altaparmakov; +Cc: linux-kernel, linux-fsdevel

Anton Altaparmakov writes
> Posix or not I still don't see why one would want that. You know what you
> are going to be using a file for at open time and you are not going to be
> changing your mind later. If you can show me a single _real_world_ example
> where one would genuinely want to change from one access pattern to another
> without closing/reopening a particular file I would agree that fadvise is a
> good idea but otherwise I think open(2) is the superior approach.
>

Sure, a database manager can change the access pattern on every query.  If
there's an index and not too many records are expected to match, it will use
a random pattern, otherwise it will use sequential access.
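
A rough sketch of that idea, using posix_fadvise() on a descriptor that
stays open across queries. advise_for_query() and its use_index
parameter are invented for the example and stand in for whatever the
query planner decides; no real DBMS is being quoted here.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdbool.h>

/* Re-advise an already open table file before each query.
 * use_index is a hypothetical stand-in for the planner's decision.
 * Returns 0 on success or an error number (errno is not set). */
int advise_for_query(int fd, bool use_index)
{
    /* Index lookups touch scattered pages; full scans read front to back. */
    int advice = use_index ? POSIX_FADV_RANDOM : POSIX_FADV_SEQUENTIAL;

    /* offset 0 with len 0 covers the whole file. */
    return posix_fadvise(fd, 0, 0, advice);
}
```

The same descriptor can be re-advised as often as the pattern changes,
which is the part a flag passed at open(2) time cannot express.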







* Re: fadvise syscall?
  2002-03-17 17:14 ` Anton Altaparmakov
  2002-03-17 18:31   ` Mark Mielke
  2002-03-17 18:35   ` Ken Hirsch
@ 2002-03-17 19:06   ` Anton Altaparmakov
  2002-03-17 20:19     ` Ken Hirsch
  2002-03-18  0:12     ` Anton Altaparmakov
  2 siblings, 2 replies; 41+ messages in thread
From: Anton Altaparmakov @ 2002-03-17 19:06 UTC (permalink / raw)
  To: Ken Hirsch; +Cc: linux-kernel, linux-fsdevel

At 18:35 17/03/02, Ken Hirsch wrote:
>Anton Altaparmakov writes
> > Posix or not I still don't see why one would want that. You know what you
> > are going to be using a file for at open time and you are not going to be
> > changing your mind later. If you can show me a single _real_world_ example
> > where one would genuinely want to change from one access pattern to another
> > without closing/reopening a particular file I would agree that fadvise is a
> > good idea but otherwise I think open(2) is the superior approach.
> >
>
>Sure, a database manager can change the access pattern on every query.  If
>there's an index and not too many records are expected to match, it will use
>a random pattern, otherwise it will use sequential access.

Last time I heard serious databases use their own memory 
management/caching in combination with O_DIRECT, i.e. they bypass the 
kernel's buffering system completely. Hence I would deem them irrelevant to 
the problem at hand...

If a database were not to use O_DIRECT I would think it would be using mmap 
so it would have madvise already... but I am not a database expert so take 
this with a pinch of salt...

Best regards,

Anton





* Re: fadvise syscall?
  2002-03-17 13:41   ` Anton Altaparmakov
  2002-03-17 14:31     ` Simon Richter
  2002-03-17 15:00     ` Anton Altaparmakov
@ 2002-03-17 19:20     ` Joel Becker
  2002-03-18  7:28     ` Jeff Garzik
  2002-03-18  8:05     ` Joel Becker
  4 siblings, 0 replies; 41+ messages in thread
From: Joel Becker @ 2002-03-17 19:20 UTC (permalink / raw)
  To: Anton Altaparmakov
  Cc: Jeff Garzik, Andrew Morton, linux-kernel, linux-fsdevel

On Sun, Mar 17, 2002 at 01:41:37PM +0000, Anton Altaparmakov wrote:
> When you want large data streaming, i.e. you start getting worried about 
> memory pressure, then you want open(2) + O_DIRECT. No caching done. Perfect 
> for large data streams and we have that already. I agree that you may want 
> some form of asynchronous read ahead with passed pages being dropped from 
> the cache but that could be just an open(2) + O_SEQUENTIAL (doesn't exist yet).

	O_DIRECT isn't the right thing for large streaming.  You want
readahead and dropbehind.  O_DIRECT takes substantial penalties for its
lack of copy/caching.  This works fine in certain circumstances
(applications that keep their own caching), but for something like a
video or mp3, you'll win with working dropbehind easily.
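
Until the kernel's own heuristics do this, an application could
approximate readahead plus dropbehind by hand with posix_fadvise(). The
following is only a sketch under that assumption:
stream_with_dropbehind() and the 64K chunk size are made up, and
whether the kernel actually evicts the advised pages is up to the
implementation.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Read fd from offset 0 to EOF, hinting after each chunk that the
 * consumed range will not be needed again.
 * Returns the number of bytes read, or -1 on a read error.
 * stream_with_dropbehind is a hypothetical helper name. */
long long stream_with_dropbehind(int fd)
{
    char buf[65536];
    off_t done = 0;
    ssize_t n;

    /* Ask for readahead; a failed hint is not fatal, so ignore the result. */
    (void)posix_fadvise(fd, 0, 0, POSIX_FADV_SEQUENTIAL);

    while ((n = pread(fd, buf, sizeof buf, done)) != 0) {
        if (n < 0) {
            if (errno == EINTR)
                continue;        /* interrupted, retry the same offset */
            return -1;
        }

        /* ... consume buf[0..n) here (decode audio, video, etc.) ... */

        /* Dropbehind: the range just consumed may leave the page cache. */
        (void)posix_fadvise(fd, done, n, POSIX_FADV_DONTNEED);
        done += n;
    }
    return (long long)done;
}
```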

Joel

-- 

Life's Little Instruction Book #444

	"Never underestimate the power of a kind word or deed."

			http://www.jlbec.org/
			jlbec@evilplan.org


* Re: fadvise syscall?
  2002-03-17  8:56 ` Andrew Morton
  2002-03-17  9:10   ` Jeff Garzik
  2002-03-17 13:41   ` Anton Altaparmakov
@ 2002-03-17 20:18   ` Richard Gooch
  2 siblings, 0 replies; 41+ messages in thread
From: Richard Gooch @ 2002-03-17 20:18 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: Andrew Morton, linux-kernel, linux-fsdevel

Jeff Garzik writes:
> Andrew Morton wrote:
> 
> >Jeff Garzik wrote:
> >>So... we have madvise, why not fadvise?  I would love the capability for
> >>applications to provide hints to the OS like madvise, but for file
> >>descriptors...
> >>
> >
> >The one hint which I can think of which would be beneficial would
> >be an equivalent to MADV_SEQUENTIAL.  Something which says "this
> >is a big streaming read/write - don't go and evict other stuff because
> >of it".  O_STREAMING perhaps.  Or working dropbehind heuristics,
> >although I suspect that explicit controls will always do better.
> >
> >For MADV_RANDOM, readahead window scaling should get that right.
> >
> >What else were you thinking of?
> >
> 
> Hints for,
> * sequential read
> * sequential write
> * sequential write, where the application considers the data it's 
> writing to be unlikely to be read again any time soon (hopefully 
> implying to the page cache that these pages have low value as cacheable 
> objects)
> * some sort of streaming hints, implying that the application cares a 
> lot about maintaining some minimum i/o rate.  note I said hint, not 
> requirement.  -not- guaranteed-rate-IO.
> 
> I might even go so far as to advocate identifying common usage
> patterns, and creating hint constants for them, even if we don't
> support them in the kernel immediately (if ever).  Makes the
> interface much more future-proof, at the expense of a few integers
> in a 32-bit numberspace, and a few more bytes in the C compiler's
> symbol table.

Here's one that I'd like (came up recently with these 21600x21600x3
images from NASA:-): MADV_REVERSE_SEQUENTIAL. When converting images
from stupid formats which have the origin in the top-left, to formats
which have the origin in the bottom-left (the way god intended), you
can avoid a massive malloc(3) if you read the input file backwards
(basically through llseek(2) steps).
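
A sketch of the backwards read, for the simple case of fixed-size rows
with no file header. It uses pread(2) at descending offsets instead of
explicit llseek(2) steps, and flip_rows(), read_full() and write_full()
are names invented for the example. Only one row is held in memory at a
time.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Read exactly len bytes at offset off; -1 on error or premature EOF.
 * (EINTR is treated as a failure here to keep the sketch short.) */
static int read_full(int fd, char *buf, size_t len, off_t off)
{
    while (len > 0) {
        ssize_t n = pread(fd, buf, len, off);
        if (n <= 0)
            return -1;
        buf += n;
        off += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Write exactly len bytes at the current file position; -1 on error. */
static int write_full(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0)
            return -1;
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Copy nrows rows of row_bytes each from in_fd to out_fd, last row first.
 * Returns 0 on success, -1 on error. */
int flip_rows(int in_fd, int out_fd, size_t row_bytes, size_t nrows)
{
    char *row = malloc(row_bytes);
    if (row == NULL)
        return -1;

    /* Walk the input from the last row down to row 0. */
    for (size_t i = nrows; i-- > 0; ) {
        if (read_full(in_fd, row, row_bytes, (off_t)i * (off_t)row_bytes) < 0 ||
            write_full(out_fd, row, row_bytes) < 0) {
            free(row);
            return -1;
        }
    }
    free(row);
    return 0;
}
```

With default forward readahead, each of those reads is likely to pull
in pages that were already consumed, which is the case a
reverse-sequential hint would address.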

				Regards,

					Richard....
Permanent: rgooch@atnf.csiro.au
Current:   rgooch@ras.ucalgary.ca


* Re: fadvise syscall?
  2002-03-17 19:06   ` Anton Altaparmakov
@ 2002-03-17 20:19     ` Ken Hirsch
  2002-03-18  0:12     ` Anton Altaparmakov
  1 sibling, 0 replies; 41+ messages in thread
From: Ken Hirsch @ 2002-03-17 20:19 UTC (permalink / raw)
  To: Anton Altaparmakov; +Cc: linux-kernel, linux-fsdevel

Anton Altaparmakov writes:
> Last time I heard serious databases use their own memory
> management/caching in combination with O_DIRECT, i.e. they bypass the
> kernel's buffering system completely. Hence I would deem them irrelevant to
> the problem at hand...
>
> If a database were not to use O_DIRECT I would think it would be using mmap
> so it would have madvise already... but I am not a database expert so take
> this with a pinch of salt...
>

I don't think that either MySQL or PostgreSQL use O_DIRECT; I just grepped
the source and didn't find it.  They can't use mmap() because it uses up too
much process address space.

It's true that commercial databases mostly do their own scheduling and
caching, and if they are the only thing running on your system and you tune
them right, that works.  But it's not necessarily a good thing.  If there
are other processes on your system, there would be a benefit if the DBMS
could inform the operating system of its intentions.

A posix_fadvise() call would be a start, but you could potentially go beyond
that.   For some interesting ideas, see
Seltzer, M., Small, C., Smith, K., "The Case for Extensible Operating
Systems", Harvard University Center for Research in Computing Technology
TR-16-95 (July 1995).
http://citeseer.nj.nec.com/article/seltzer95case.html

Ken Hirsch




* Re: fadvise syscall?
  2002-03-17  9:10   ` Jeff Garzik
@ 2002-03-17 23:59     ` Anton Altaparmakov
  0 siblings, 0 replies; 41+ messages in thread
From: Anton Altaparmakov @ 2002-03-17 23:59 UTC (permalink / raw)
  To: Joel Becker; +Cc: Jeff Garzik, Andrew Morton, linux-kernel, linux-fsdevel

At 19:20 17/03/02, Joel Becker wrote:
>On Sun, Mar 17, 2002 at 01:41:37PM +0000, Anton Altaparmakov wrote:
> > When you want large data streaming, i.e. you start getting worried about
> > memory pressure, then you want open(2) + O_DIRECT. No caching done. Perfect
> > for large data streams and we have that already. I agree that you may want
> > some form of asynchronous read ahead with passed pages being dropped from
> > the cache but that could be just an open(2) + O_SEQUENTIAL (doesn't exist yet).
>
>         O_DIRECT isn't the right thing for large streaming.  You want
>readahead and dropbehind.  O_DIRECT takes substantial penalties for its
>lack of copy/caching.  This works fine in certain circumstances
>(applications that keep their own caching), but for something like a
>video or mp3, you'll win with working dropbehind easily.

Oh absolutely. For mp3s, dvds, etc. Note I wrote O_SEQUENTIAL... Perhaps I 
didn't emphasize it enough. In multimedia applications you very well know 
in advance what you want so you can specify it at open(2) time.

Best regards,

Anton





* Re: fadvise syscall?
  2002-03-17 19:06   ` Anton Altaparmakov
  2002-03-17 20:19     ` Ken Hirsch
@ 2002-03-18  0:12     ` Anton Altaparmakov
       [not found]       ` <a73ujs$5mc$1@cesium.transmeta.com>
  1 sibling, 1 reply; 41+ messages in thread
From: Anton Altaparmakov @ 2002-03-18  0:12 UTC (permalink / raw)
  To: Ken Hirsch; +Cc: linux-kernel, linux-fsdevel

At 20:19 17/03/02, Ken Hirsch wrote:
>Anton Altaparmakov writes:
> > Last time I heard serious databases use their own memory
> > management/caching in combination with O_DIRECT, i.e. they bypass the
> > kernel's buffering system completely. Hence I would deem them irrelevant to
> > the problem at hand...
> >
> > If a database were not to use O_DIRECT I would think it would be using mmap
> > so it would have madvise already... but I am not a database expert so take
> > this with a pinch of salt...
>
>I don't think that either MySQL or PostgreSQL use O_DIRECT; I just grepped
>the source and didn't find it.  They can't use mmap() because it uses up too
>much process address space.

<flame bait>So you consider these two to be serious databases?</flame bait> 
(-; [1]

>It's true that commercial databases mostly do their own scheduling and
>caching, and if they are the only thing running on your system and you tune
>them right, that works.  But it's not necessarily a good thing.  If there
>are other processes on your system, there would be a benefit if the DBMS
>could inform the operating system of its intentions.
>
>A posix_fadvise() call would be a start, but you could potentially go beyond
>that.

Ok, so basically we want both fadvise() and open(2) semantics, with the 
open(2) being a superset of the fadvise() capabilities (some things no 
longer make sense to be specified once the file is open). They can of 
course both be calling the same common helpers inside the kernel...

fadvise() would probably only be used by databases while open(2) would be 
used by the rest of the world. (-;

Best regards,

Anton

[1] Sorry about the flame bait, couldn't resist...  I know they are both 
very respectable databases and they are free software which is great.




^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: fadvise syscall?
  2002-03-17 13:41   ` Anton Altaparmakov
                       ` (2 preceding siblings ...)
  2002-03-17 19:20     ` Joel Becker
@ 2002-03-18  7:28     ` Jeff Garzik
  2002-03-18  7:55       ` Andrew Morton
  2002-03-22 16:05       ` Pavel Machek
  2002-03-18  8:05     ` Joel Becker
  4 siblings, 2 replies; 41+ messages in thread
From: Jeff Garzik @ 2002-03-18  7:28 UTC (permalink / raw)
  To: Anton Altaparmakov; +Cc: Andrew Morton, linux-kernel, linux-fsdevel

Anton Altaparmakov wrote:

> We don't need fadvise IMHO. That is what open(2) is for. The streaming 
> request you are asking for is just a normal open(2). It will do read 
> ahead which is perfect for streaming (of data size << RAM size in its 
> current form).
>
> When you want large data streaming, i.e. you start getting worried 
> about memory pressure, then you want open(2) + O_DIRECT. No caching 
> done. Perfect for large data streams and we have that already. I agree 
> that you may want some form of asynchronous read ahead with passed 
> pages being dropped from the cache but that could be just a open(2) + 
> O_SEQUENTIAL (doesn't exist yet).
>
> All of what you are asking for exists in Windows and all the semantics 
> are implemented through a very powerful open(2) equivalent. I don't 
> see why we shouldn't do the same. It makes more sense to me than 
> inventing yet another system call...



I disagree, and here are the main reasons:

* fadvise(2) usefulness extends past open(2).  It may be useful to call 
it at various points during runtime.

* I think putting hints in open(2) is the wrong direction to go.  Hints 
have a potential to be very flexible.  open(2) O_xxx bits are not to be 
squandered lightly, while I see a lot more value in being a little more 
loose and free with the bit assignment for an "fadvise mask" (just a 
list of hint bits).  IMO it should be easier to introduce and retire 
hints, far easier than O_xxx flags.

    Jeff





^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: fadvise syscall?
  2002-03-18  7:28     ` Jeff Garzik
@ 2002-03-18  7:55       ` Andrew Morton
  2002-03-18  8:07         ` Jeff Garzik
  2002-03-18 16:41         ` Richard Gooch
  2002-03-22 16:05       ` Pavel Machek
  1 sibling, 2 replies; 41+ messages in thread
From: Andrew Morton @ 2002-03-18  7:55 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: Anton Altaparmakov, linux-kernel, linux-fsdevel

Jeff Garzik wrote:
> 
> * fadvise(2) usefulness extends past open(2).  It may be useful to call
> it at various points during runtime.
> 
> * I think putting hints in open(2) is the wrong direction to go.  Hints
> have a potential to be very flexible.  open(2) O_xxx bits are not to be
> squandered lightly, while I see a lot more value in being a little more
> loose and free with the bit assignment for an "fadvise mask" (just a
> list of hint bits).  IMO it should be easier to introduce and retire
> hints, far easier than O_xxx flags.
> 

Yup.

posix_fadvise() looks to be a fine interface:

int posix_fadvise(int fd, off_t offset, size_t len, int advice);

 DESCRIPTION

     The posix_fadvise() function shall advise the implementation on
     the expected behavior of the application with respect to the data in
     the file associated with the open file descriptor, fd, starting at offset
     and continuing for len bytes. The specified range need not currently
     exist in the file. If len is zero, all data following offset is specified.
     The implementation may use this information to optimize handling
     of the specified data. The posix_fadvise() function shall have no
     effect on the semantics of other operations on the specified data,
     although it may affect the performance of other operations.

     The advice to be applied to the data is specified by the advice
     parameter and may be one of the following values:

     POSIX_FADV_NORMAL 

          Specifies that the application has no advice to give on its
          behavior with respect to the specified data. It is the default
          characteristic if no advice is given for an open file. 
     POSIX_FADV_SEQUENTIAL 

          Specifies that the application expects to access the specified
          data sequentially from lower offsets to higher offsets. 
     POSIX_FADV_RANDOM 

          Specifies that the application expects to access the specified
          data in a random order. 
     POSIX_FADV_WILLNEED 

          Specifies that the application expects to access the specified
          data in the near future. 
     POSIX_FADV_DONTNEED 

          Specifies that the application expects that it will not access
          the specified data in the near future. 
     POSIX_FADV_NOREUSE 

          Specifies that the application expects to access the specified
          data once and then not reuse it thereafter. 

We can usefully implement all of these.  FADV_WILLNEED obsoletes
sys_readahead().
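
As a rough illustration of the interface described above, here is a minimal
userspace sketch of a streaming read that hints its access pattern first.
The function name stream_file() is made up for this example, and it assumes
a libc that already exposes posix_fadvise(). Note that the published
standard declares len as off_t rather than the size_t shown in the text
quoted above. Hints are advisory, so the sketch deliberately ignores their
return values.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: read a whole file front to back and return the
 * number of bytes read, or -1 on error.
 */
long long stream_file(const char *path)
{
    char buf[8192];
    long long total = 0;
    ssize_t n;
    int fd;

    assert(path != NULL);

    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    /* len == 0 means "everything from offset to the end of the file". */
    (void)posix_fadvise(fd, 0, 0, POSIX_FADV_SEQUENTIAL);

    while ((n = read(fd, buf, sizeof buf)) > 0)
        total += n;

    /* We are done with the data; the cached pages may be dropped. */
    (void)posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);

    close(fd);
    return n < 0 ? -1 : total;
}
```

A caller would simply do "long long bytes = stream_file(path);" and treat a
negative result as an I/O error. Whether the hints change anything is up to
the implementation.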

We'll need to cheat a bit on the offset/len thing for NORMAL and
SEQUENTIAL - just apply it to the whole file - we don't want to have to
attach an arbitrary number of silly range objects to each file for this.
(We already cheat a bit this way with msync).

Note that it applies to a file descriptor.  If posix_fadvise(FADV_DONTNEED) is
called against a file descriptor, and someone else has an fd open
against the same file, that other user gets their foot shot off.  That's
OK.

Given this, I don't see a persuasive need to implement a non-standard
interface.  It takes an off_t, so posix_fadvise64() is also needed.

The presence of this interface doesn't imply that we don't need
good dropbehind heuristics for streaming reads and writes.  We
do need those.

I wouldn't suggest that anyone rush out and implement this stuff for 2.5.
There's some decrudding needed in filemap.c first, and many of these
hints need to interact with the 2.6 VM.  Whatever that will be.

A 2.4 implementation could be done any time.  If anyone decides to
do this, please let me know...

-

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: fadvise syscall?
  2002-03-17 13:41   ` Anton Altaparmakov
                       ` (3 preceding siblings ...)
  2002-03-18  7:28     ` Jeff Garzik
@ 2002-03-18  8:05     ` Joel Becker
  2002-03-18  8:10       ` Jeff Garzik
  2002-03-18  8:14       ` Andrew Morton
  4 siblings, 2 replies; 41+ messages in thread
From: Joel Becker @ 2002-03-18  8:05 UTC (permalink / raw)
  To: Anton Altaparmakov
  Cc: Jeff Garzik, Andrew Morton, linux-kernel, linux-fsdevel

On Sun, Mar 17, 2002 at 01:41:37PM +0000, Anton Altaparmakov wrote:
> We don't need fadvise IMHO. That is what open(2) is for. The streaming 
> request you are asking for is just a normal open(2). It will do read ahead 
> which is perfect for streaming (of data size << RAM size in its current form).

	A quick real world example of where fadvise can work well.
Imagine a database application that doesn't use O_DIRECT (for whatever
reason, could even be that they don't trust the linux implementation yet
:-).  So, this database gets a query.  That query requires a full table
scan, so it calls fadvise(fd, F_SEQUENTIAL).  Then another query does
row-specific access, and caching helps.  So it wants to turn off
F_SEQUENTIAL.
	Other applications can use this sort of stuff.  DBM could, for
instance.  So might GIMP.  Etc.  Dynamic hints have real world
applications.


Joel


-- 

print STDOUT q
Just another Perl hacker,
unless $spring
	-Larry Wall

			http://www.jlbec.org/
			jlbec@evilplan.org

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: fadvise syscall?
  2002-03-18  7:55       ` Andrew Morton
@ 2002-03-18  8:07         ` Jeff Garzik
  2002-03-18  8:17           ` Andrew Morton
  2002-03-18 16:41         ` Richard Gooch
  1 sibling, 1 reply; 41+ messages in thread
From: Jeff Garzik @ 2002-03-18  8:07 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Anton Altaparmakov, linux-kernel, linux-fsdevel

Andrew Morton wrote:

>posix_fadvise() looks to be a fine interface:
>

>We'll need to cheat a bit on the offset/len thing for NORMAL and
>SEQUENTIAL - just apply it to the whole file - we don't want to have to
>attach an arbitrary number of silly range objects to each file for this.
>(We already cheat a bit this way with msync).
>
yep

>Given this, I don't see a persuasive need to implement a non-standard
>interface.  It takes an off_t, so posix_fadvise64() is also needed.
>
agreed WRT non-standard.

Are we required to have both foo and foo64 variants?  If I had my 
druthers, I would just do the foo64 version.

>
>A 2.4 implementation could be done any time.  If anyone decides to
>do this, please let me know...
>


count me down as interested after my current project...  If someone else 
does it, more power to them...

    Jeff





^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: fadvise syscall?
  2002-03-18  8:05     ` Joel Becker
@ 2002-03-18  8:10       ` Jeff Garzik
  2002-03-18  8:20         ` Joel Becker
  2002-03-18  8:14       ` Andrew Morton
  1 sibling, 1 reply; 41+ messages in thread
From: Jeff Garzik @ 2002-03-18  8:10 UTC (permalink / raw)
  To: Joel Becker
  Cc: Anton Altaparmakov, Andrew Morton, linux-kernel, linux-fsdevel

Joel Becker wrote:

>Other applications can use this sort of stuff.  DBM could, for
>instance.  So might GIMP.  Etc.  Dynamic hints have real world
>applications.
>

to be fair, fcntl(2) could be used in conjunction with open(2), to do 
dynamic hints.

I prefer to separate the hints from other O_xxx flags, though, so 
posix_fadvise seems to be applicable...

    Jeff





^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: fadvise syscall?
  2002-03-18  8:05     ` Joel Becker
  2002-03-18  8:10       ` Jeff Garzik
@ 2002-03-18  8:14       ` Andrew Morton
  2002-03-18 14:39         ` Martin K. Petersen
  1 sibling, 1 reply; 41+ messages in thread
From: Andrew Morton @ 2002-03-18  8:14 UTC (permalink / raw)
  To: Joel Becker; +Cc: Anton Altaparmakov, Jeff Garzik, linux-kernel, linux-fsdevel

Joel Becker wrote:
> 
> On Sun, Mar 17, 2002 at 01:41:37PM +0000, Anton Altaparmakov wrote:
> > We don't need fadvise IMHO. That is what open(2) is for. The streaming
> > request you are asking for is just a normal open(2). It will do read ahead
> > which is perfect for streaming (of data size << RAM size in its current form).
> 
>         A quick real world example of where fadvise can work well.
> Imagine a database application that doesn't use O_DIRECT (for whatever
> reason, could even be that they don't trust the linux implementation yet
> :-).

O_DIRECT is broken against RAID0 (at least) in 2.5 at present.  The
RAID driver gets sent BIOs which straddle two or more chunks and RAID
spits out lots of unpleasant warnings.  Neil has been informed...

>  So, this database gets a query.  That query requires a full table
> scan, so it calls fadvise(fd, F_SEQUENTIAL).  Then another query does
> row-specific access, and caching helps.  So it wants to turn off
> F_SEQUENTIAL.

It'd probably be smarter for the application to hold two fds against
the same file for this sort of access pattern.
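
The two-fd idea can be sketched roughly as follows. This is only an
illustration: struct table_fds, open_table() and close_table() are
hypothetical names, and it assumes the hint sticks to each separately
opened descriptor (which is why the file is opened twice rather than
dup()ed).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical pair of descriptors for one table file. */
struct table_fds {
    int scan_fd;    /* full table scans, hinted sequential */
    int lookup_fd;  /* row-specific access, hinted random  */
};

/* Open the same file twice and give each descriptor its own hint. */
int open_table(const char *path, struct table_fds *t)
{
    assert(path != NULL && t != NULL);

    t->scan_fd = open(path, O_RDONLY);
    if (t->scan_fd < 0)
        return -1;

    t->lookup_fd = open(path, O_RDONLY);
    if (t->lookup_fd < 0) {
        close(t->scan_fd);
        return -1;
    }

    /* Hints are advisory, so failures are ignored here. */
    (void)posix_fadvise(t->scan_fd, 0, 0, POSIX_FADV_SEQUENTIAL);
    (void)posix_fadvise(t->lookup_fd, 0, 0, POSIX_FADV_RANDOM);
    return 0;
}

void close_table(struct table_fds *t)
{
    close(t->scan_fd);
    close(t->lookup_fd);
}
```

A table scan would then read() from scan_fd while row lookups pread() from
lookup_fd, so neither query has to switch the hint back and forth.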


-

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: fadvise syscall?
  2002-03-18  8:07         ` Jeff Garzik
@ 2002-03-18  8:17           ` Andrew Morton
  0 siblings, 0 replies; 41+ messages in thread
From: Andrew Morton @ 2002-03-18  8:17 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: Anton Altaparmakov, linux-kernel, linux-fsdevel

Jeff Garzik wrote:
> 
> ...
> >Given this, I don't see a persuasive need to implement a non-standard
> >interface.  It takes an off_t, so posix_fadvise64() is also needed.
> >
> agreed WRT non-standard.
> 
> Are we required to have both foo and foo64 variants?  If I had my
> druthers, I would just do the foo64 version.

That would be good.  I can't see a reason why

	#define posix_fadvise posix_fadvise64

would not suffice.  That doesn't mean there isn't one :)
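
For what it's worth, here is a small sketch of how the 64-bit question looks
from an application's side, assuming a libc that honours _FILE_OFFSET_BITS
(glibc does). With that macro set, off_t is 64 bits wide even on 32-bit
machines and the plain posix_fadvise() name can take offsets past 4GB. The
helper name prefetch_from() is invented for the example.

```c
#define _FILE_OFFSET_BITS 64
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>

/* Fails to compile if the large-file interface was not selected. */
static_assert(sizeof(off_t) == 8, "off_t must be 64 bits wide");

/*
 * Hypothetical helper: ask for everything from 'offset' onwards to be
 * read ahead.  Returns 0 or an error number, as posix_fadvise() does.
 */
int prefetch_from(int fd, off_t offset)
{
    return posix_fadvise(fd, offset, 0, POSIX_FADV_WILLNEED);
}
```

Note that posix_fadvise() reports failure by returning the error number
directly rather than by setting errno, so a bad descriptor yields EBADF as
the return value.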

-

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: fadvise syscall?
  2002-03-18  8:10       ` Jeff Garzik
@ 2002-03-18  8:20         ` Joel Becker
  0 siblings, 0 replies; 41+ messages in thread
From: Joel Becker @ 2002-03-18  8:20 UTC (permalink / raw)
  To: Jeff Garzik
  Cc: Joel Becker, Anton Altaparmakov, Andrew Morton, linux-kernel,
	linux-fsdevel

On Mon, Mar 18, 2002 at 03:10:03AM -0500, Jeff Garzik wrote:
> to be fair, fcntl(2) could be used in conjunction with open(2), to do 
> dynamic hints.

	I wasn't speaking to the exact interface, just to the real world
usefulness of hints after open(2).  But yes, surely :-)

Joel


-- 

"Baby, even the losers
 Get lucky sometimes.
 Even the losers
 Keep a little bit of pride."

			http://www.jlbec.org/
			jlbec@evilplan.org

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: fadvise syscall?
       [not found]       ` <a73ujs$5mc$1@cesium.transmeta.com>
@ 2002-03-18  8:58         ` Jan Hudec
  2002-03-18 10:08           ` Jeff Garzik
  0 siblings, 1 reply; 41+ messages in thread
From: Jan Hudec @ 2002-03-18  8:58 UTC (permalink / raw)
  To: linux-fsdevel, linux-kernel

> Followup to:  <5.1.0.14.2.20020318000057.051d30e0@pop.cus.cam.ac.uk>
> By author:    Anton Altaparmakov <aia21@cam.ac.uk>
> In newsgroup: linux.dev.fs.devel
> > 
> > Ok, so basically we want both fadvise() and open(2) semantics, with the 
> > open(2) being a superset of the fadvise() capabilities (some things no 
> > longer make sense to be specified once the file is open). They can of 
> > course both be calling the same common helpers inside the kernel...
> > 
> 
> If they're open() flags, they should probably be controlled with
> fcntl() rather than with a new system call.

Then posix_fadvise interface can be implemented in libc using fcntl.

--------------------------------------------------------------------------------
                  				- Jan Hudec `Bulb' <bulb@ucw.cz>

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: fadvise syscall?
  2002-03-18  8:58         ` Jan Hudec
@ 2002-03-18 10:08           ` Jeff Garzik
  2002-03-18 17:29             ` Mark Mielke
  0 siblings, 1 reply; 41+ messages in thread
From: Jeff Garzik @ 2002-03-18 10:08 UTC (permalink / raw)
  To: Jan Hudec; +Cc: linux-fsdevel, linux-kernel

Jan Hudec wrote:

>>Followup to:  <5.1.0.14.2.20020318000057.051d30e0@pop.cus.cam.ac.uk>
>>By author:    Anton Altaparmakov <aia21@cam.ac.uk>
>>In newsgroup: linux.dev.fs.devel
>>
>>>Ok, so basically we want both fadvise() and open(2) semantics, with the 
>>>open(2) being a superset of the fadvise() capabilities (some things no 
>>>longer make sense to be specified once the file is open). They can of 
>>>course both be calling the same common helpers inside the kernel...
>>>
>>If they're open() flags, they should probably be controlled with
>>fcntl() rather than with a new system call.
>>
>
>Then posix_fadvise interface can be implemented in libc using fcntl.
>
Indeed it can be...  but it is less flexible that way, unless you want to 
add another level of indirection.

It is far better for future-proofing the interface IMO if fadvise is 
implemented directly.  Hints are less important than open O_xxx flags 
or F_xxx flags, because an implementation can safely ignore 100% of the 
fadvise hints, if it so chooses.  One cannot say the same thing for 
open/fcntl flags.

So, different class of fd flags deserves a different syscall, IMO...

    Jeff







^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: fadvise syscall?
  2002-03-18  8:14       ` Andrew Morton
@ 2002-03-18 14:39         ` Martin K. Petersen
  2002-03-18 19:15           ` Andrew Morton
  0 siblings, 1 reply; 41+ messages in thread
From: Martin K. Petersen @ 2002-03-18 14:39 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Joel Becker, Anton Altaparmakov, Jeff Garzik, linux-kernel,
	linux-fsdevel

>>>>> "Andrew" == Andrew Morton <akpm@zip.com.au> writes:

Andrew> O_DIRECT is broken against RAID0 (at least) in 2.5 at present.
Andrew> The RAID driver gets sent BIOs which straddle two or more
Andrew> chunks and RAID spits out lots of unpleasant warnings.  Neil
Andrew> has been informed...

Yep.  I've been porting my original kiobuf based request splitter to
biobufs.  It's almost there, I've just been extremely busy with
something else for a while.

It's not only when you straddle chunks.  The current code does not
handle requests straddling RAID zones either.

-- 
Martin K. Petersen, Principal Linux Consultant, Linuxcare, Inc.
mkp@linuxcare.com, http://www.linuxcare.com/
SGI XFS for Linux Developer, http://oss.sgi.com/projects/xfs/


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: fadvise syscall?
  2002-03-18  7:55       ` Andrew Morton
  2002-03-18  8:07         ` Jeff Garzik
@ 2002-03-18 16:41         ` Richard Gooch
  2002-03-18 19:00           ` Andrew Morton
  1 sibling, 1 reply; 41+ messages in thread
From: Richard Gooch @ 2002-03-18 16:41 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Jeff Garzik, Anton Altaparmakov, linux-kernel, linux-fsdevel

Andrew Morton writes:
> Note that it applies to a file descriptor.  If
> posix_fadvise(FADV_DONTNEED) is called against a file descriptor,
> and someone else has an fd open against the same file, that other
> user gets their foot shot off.  That's OK.

Let me verify that I understand what you're saying. Process A and B
independently open the file. The file is already in the cache (because
other processes regularly read this file). Process A is slowly reading
stuff. Process B does FADV_DONTNEED on the whole file. The pages are
dropped.

You're saying this is OK? How about this DoS attack:
	int fd = open ("/lib/libc.so", O_RDONLY, 0);
	while (1) {
		posix_fadvise (fd, 0, 0, POSIX_FADV_DONTNEED);
		sleep (1);
	}

Let me see that disc head move! Wheeee!

				Regards,

					Richard....
Permanent: rgooch@atnf.csiro.au
Current:   rgooch@ras.ucalgary.ca

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: fadvise syscall?
  2002-03-18 10:08           ` Jeff Garzik
@ 2002-03-18 17:29             ` Mark Mielke
  0 siblings, 0 replies; 41+ messages in thread
From: Mark Mielke @ 2002-03-18 17:29 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: Jan Hudec, linux-fsdevel, linux-kernel

On Mon, Mar 18, 2002 at 05:08:02AM -0500, Jeff Garzik wrote:
> Jan Hudec wrote:
> >Then posix_fadvise interface can be implemented in libc using fcntl.
> It is far better for future-proofing the interface IMO if fadvise is 
> implemented directly.  Hints are less important than open O_xxx flags 
> or F_xxx flags, because an implementation can safely ignore 100% of the 
> fadvise hints, if it so chooses.  One cannot say the same thing for 
> open/fcntl flags.

There is nothing to say that fadvise(...) shouldn't call fcntl(F_ADVISE, ...).

If it fits in with open(), then it might just fit in with F_GETFL /
F_SETFL as well.

I prefer generalization, especially for non-critical functions that should
not be called 1,000,000 times a second, such as fadvise().

mark

-- 
mark@mielke.cc/markm@ncf.ca/markm@nortelnetworks.com __________________________
.  .  _  ._  . .   .__    .  . ._. .__ .   . . .__  | Neighbourhood Coder
|\/| |_| |_| |/    |_     |\/|  |  |_  |   |/  |_   | 
|  | | | | \ | \   |__ .  |  | .|. |__ |__ | \ |__  | Ottawa, Ontario, Canada

  One ring to rule them all, one ring to find them, one ring to bring them all
                       and in the darkness bind them...

                           http://mark.mielke.cc/


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: fadvise syscall?
  2002-03-18 16:41         ` Richard Gooch
@ 2002-03-18 19:00           ` Andrew Morton
  2002-03-18 19:15             ` Richard Gooch
  0 siblings, 1 reply; 41+ messages in thread
From: Andrew Morton @ 2002-03-18 19:00 UTC (permalink / raw)
  To: Richard Gooch
  Cc: Jeff Garzik, Anton Altaparmakov, linux-kernel, linux-fsdevel

Richard Gooch wrote:
> 
> Andrew Morton writes:
> > Note that it applies to a file descriptor.  If
> > posix_fadvise(FADV_DONTNEED) is called against a file descriptor,
> > and someone else has an fd open against the same file, that other
> > user gets their foot shot off.  That's OK.
> 
> Let me verify that I understand what you're saying. Process A and B
> independently open the file. The file is already in the cache (because
> other processes regularly read this file). Process A is slowly reading
> stuff. Process B does FADV_DONTNEED on the whole file. The pages are
> dropped.
> 
> You're saying this is OK? How about this DoS attack:
>         int fd = open ("/lib/libc.so", O_RDONLY, 0);
>         while (1) {
>                 posix_fadvise (fd, 0, 0, POSIX_FADV_DONTNEED);
>                 sleep (1);
>         }
> 
> Let me see that disc head move! Wheeee!
> 

POSIX_FADV_DONTNEED could only unmap pages from the caller's
VMA's, so the problem would only affect other processes which
share the same mm - CLONE_MM threads.

If some other process has a reference on the pages then they
wouldn't get unmapped as a result of this.  It's the same
as madvise(MADV_DONTNEED).

-

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: fadvise syscall?
  2002-03-18 19:00           ` Andrew Morton
@ 2002-03-18 19:15             ` Richard Gooch
  0 siblings, 0 replies; 41+ messages in thread
From: Richard Gooch @ 2002-03-18 19:15 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Jeff Garzik, Anton Altaparmakov, linux-kernel, linux-fsdevel

Andrew Morton writes:
> Richard Gooch wrote:
> > 
> > Andrew Morton writes:
> > > Note that it applies to a file descriptor.  If
> > > posix_fadvise(FADV_DONTNEED) is called against a file descriptor,
> > > and someone else has an fd open against the same file, that other
> > > user gets their foot shot off.  That's OK.
> > 
> > Let me verify that I understand what you're saying. Process A and B
> > independently open the file. The file is already in the cache (because
> > other processes regularly read this file). Process A is slowly reading
> > stuff. Process B does FADV_DONTNEED on the whole file. The pages are
> > dropped.
> > 
> > You're saying this is OK? How about this DoS attack:
> >         int fd = open ("/lib/libc.so", O_RDONLY, 0);
> >         while (1) {
> >                 posix_fadvise (fd, 0, 0, POSIX_FADV_DONTNEED);
> >                 sleep (1);
> >         }
> > 
> > Let me see that disc head move! Wheeee!
> > 
> 
> POSIX_FADV_DONTNEED could only unmap pages from the caller's
> VMA's, so the problem would only affect other processes which
> share the same mm - CLONE_MM threads.
> 
> If some other process has a reference on the pages then they
> wouldn't get unmapped as a result of this.  It's the same
> as madvise(MADV_DONTNEED).

OK, I misparsed what you had said. Good.

				Regards,

					Richard....
Permanent: rgooch@atnf.csiro.au
Current:   rgooch@ras.ucalgary.ca

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: fadvise syscall?
  2002-03-18 14:39         ` Martin K. Petersen
@ 2002-03-18 19:15           ` Andrew Morton
  2002-03-18 19:42             ` Martin K. Petersen
  0 siblings, 1 reply; 41+ messages in thread
From: Andrew Morton @ 2002-03-18 19:15 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: Joel Becker, Anton Altaparmakov, Jeff Garzik, linux-kernel,
	linux-fsdevel

"Martin K. Petersen" wrote:
> 
> >>>>> "Andrew" == Andrew Morton <akpm@zip.com.au> writes:
> 
> Andrew> O_DIRECT is broken against RAID0 (at least) in 2.5 at present.
> Andrew> The RAID driver gets sent BIOs which straddle two or more
> Andrew> chunks and RAID spits out lots of unpleasant warnings.  Neil
> Andrew> has been informed...
> 
> Yep.  I've been porting my original kiobuf based request splitter to
> biobufs.  It's almost there, I've just been extremely busy with
> something else for a while.
> 
> It's not only when you straddle chunks.  The current code does not
> handle requests straddling RAID zones either.

google fails me - where does your kiobuf-based splitter live?

I'm curious to know how this will all work.  Will it take a
large BIO and split it into a number of smaller, newly allocated
BIOs?  That would be kinda sad, IMO - the current bio-per-bh
allocations in the normal I/O path are really expensive, and
it seems wrong to take a large BIO, split it into lots of
teeny ones and then reassemble all the way down at the driver
level.

If that's really the only way in which we can solve this problem,
would it not be better to pass information up to the higher layer,
telling it when the BIO which is currently under assembly cannot
be grown further?  Say, blk_can_i_add_more_stuff_to_this_bio()?

Anyway.  I'm interested.  O_DIRECT is a bit of a weird curiosity,
but I'm working on making these big-BIO code paths *the* way in which
data gets to and from disk.  It needs to be efficient ;)

-

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: fadvise syscall?
  2002-03-18 19:15           ` Andrew Morton
@ 2002-03-18 19:42             ` Martin K. Petersen
  2002-03-19 20:08               ` Eric W. Biederman
  0 siblings, 1 reply; 41+ messages in thread
From: Martin K. Petersen @ 2002-03-18 19:42 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Joel Becker, Anton Altaparmakov, Jeff Garzik, linux-kernel,
	linux-fsdevel

>>>>> "Andrew" == Andrew Morton <akpm@zip.com.au> writes:

Andrew> google fails me - where does your kiobuf-based splitter live?

It's in the kiobuf XFS patches.


Andrew> I'm curious to know how this will all work.  Will it take a
Andrew> large BIO and split it into a number of smaller, newly
Andrew> allocated BIOs?  

For kiobufs I walked the request, cloned a new one every time I crossed a
stripe/device boundary and sent it off.  I had my own completion
function with an atomic counter that would call the parent kiobuf's
end_io function when all clones had completed.

So I didn't chop the request into page sized chunks or something like
that.
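
The completion scheme described above (clones sharing an atomic counter,
with the last one to finish calling the parent's end_io) can be modelled in
userspace roughly like this. It is a sketch using C11 atomics, not the
kiobuf code itself; struct parent_req and the function names are invented
for illustration.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for the parent request. */
struct parent_req {
    atomic_int remaining;                 /* clones still in flight */
    int done;                             /* set by the example end_io */
    void (*end_io)(struct parent_req *);  /* parent completion hook */
};

void parent_init(struct parent_req *p, int nr_clones,
                 void (*end_io)(struct parent_req *))
{
    assert(nr_clones > 0);
    atomic_init(&p->remaining, nr_clones);
    p->done = 0;
    p->end_io = end_io;
}

/* Called once per clone when that clone's I/O has finished. */
void clone_end_io(struct parent_req *p)
{
    /* fetch_sub returns the old value: 1 means this was the last clone. */
    if (atomic_fetch_sub(&p->remaining, 1) == 1)
        p->end_io(p);
}

/* Example parent completion: just record that it ran. */
void mark_done(struct parent_req *p)
{
    p->done = 1;
}
```

With three clones, the parent's end_io fires only on the third call to
clone_end_io(), whichever clone happens to finish last.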


Andrew> If that's really the only way in which we can solve this
Andrew> problem, would it not be better to pass information up to the
Andrew> higher layer, telling it when the BIO which is currently under
Andrew> assembly cannot be grown further?  Say,
Andrew> blk_can_i_add_more_stuff_to_this_bio()?

We tried different approaches.  One of them was to be able to signal
to upper layers that your I/O was too big and please submit smaller
chunks.  Running with that, however, the I/O size converged toward
small requests because you'd often start an I/O - say 4K - from a
stripe boundary.  And that would kill it right off.

So unless the filesystem knows about stripe/device boundaries it's
really hard to get the size signalling right.  And then what happens
when you stack LVM and MD?

In the end, cloning the kiobuf from the above and adjusting
offset/length in the children turned out to be the best approach.
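
The offset/length adjustment can be sketched as plain arithmetic. The
example below assumes a single fixed chunk size and ignores RAID zones and
stacked devices entirely; struct child_req and split_at_chunks() are
hypothetical names, not anything from the XFS or MD code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical child request: a sub-range of the parent. */
struct child_req {
    uint64_t offset;
    uint64_t len;
};

/*
 * Split [offset, offset + len) so that no piece crosses a multiple of
 * 'chunk'.  Writes at most max_out pieces and returns how many were
 * written; it stops early if the output array fills up.
 */
size_t split_at_chunks(uint64_t offset, uint64_t len, uint64_t chunk,
                       struct child_req *out, size_t max_out)
{
    size_t n = 0;

    assert(chunk > 0);

    while (len > 0 && n < max_out) {
        uint64_t room = chunk - (offset % chunk);  /* bytes to boundary */
        uint64_t piece = len < room ? len : room;

        out[n].offset = offset;
        out[n].len = piece;
        n++;

        offset += piece;
        len -= piece;
    }
    return n;
}
```

For example, a 72K request starting at 60K on 64K chunks comes out as three
children: 4K at 60K, 64K at 64K, and 4K at 128K. A request that stays inside
one chunk is passed through as a single piece.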

And I suspect that's why Jens kept the clone facility around for bio
bufs :)


Andrew> Anyway.  I'm interested.  O_DIRECT is a bit of a weird
Andrew> curiosity, but I'm working on making these big-BIO code paths
Andrew> *the* way in which data gets to and from disk.  It needs to be
Andrew> efficient ;)

*nod*


I'll try and poke at this again tonight.  Will shoot you the patch
once I get the zoning evil sorted out.

-- 
Martin K. Petersen, Principal Linux Consultant, Linuxcare, Inc.
mkp@linuxcare.com, http://www.linuxcare.com/
SGI XFS for Linux Developer, http://oss.sgi.com/projects/xfs/

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: fadvise syscall?
  2002-03-18 19:42             ` Martin K. Petersen
@ 2002-03-19 20:08               ` Eric W. Biederman
  2002-03-19 23:38                 ` Martin K. Petersen
  0 siblings, 1 reply; 41+ messages in thread
From: Eric W. Biederman @ 2002-03-19 20:08 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: Andrew Morton, Joel Becker, Anton Altaparmakov, Jeff Garzik,
	linux-kernel, linux-fsdevel

"Martin K. Petersen" <mkp@mkp.net> writes:

> >>>>> "Andrew" == Andrew Morton <akpm@zip.com.au> writes:
> 
> Andrew> If that's really the only way in which we can solve this
> Andrew> problem, would it not be better to pass information up to the
> Andrew> higher layer, telling it when the BIO which is currently under
> Andrew> assembly cannot be grown further?  Say,
> Andrew> blk_can_i_add_more_stuff_to_this_bio()?

Please let's extend BIOs and not break them up.
 
> We tried different approaches.  One of them was to be able to signal
> to upper layers that your I/O was too big and please submit smaller
> chunks.  Running with that, however, the I/O size converged toward
> small requests because you'd often start an I/O - say 4K - from a
> stripe boundary.  And that would kill it right off.
> 
> So unless the filesystem knows about stripe/device boundaries it's
> really hard to get the size signalling right.  And then what happens
> when you stack LVM and MD?
> 
> In the end, cloning the kiobuf from the above and adjusting
> offset/length in the children turned out to be the best approach.

Unless I am mistaken this interacts very badly with the writing data
out to disk to free up memory, because you must allocate memory to
split the bio.  Which is the last place you want to allocate memory
if you can avoid it.

It's been a while but I believe there was a similar thread about
splitting requests to disk and the idea was shot down for similar
reasons. 

Eric



^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: fadvise syscall?
  2002-03-19 20:08               ` Eric W. Biederman
@ 2002-03-19 23:38                 ` Martin K. Petersen
  0 siblings, 0 replies; 41+ messages in thread
From: Martin K. Petersen @ 2002-03-19 23:38 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Andrew Morton, Joel Becker, Anton Altaparmakov, Jeff Garzik,
	linux-kernel, linux-fsdevel

>>>>> "Eric" == Eric W Biederman <ebiederm@xmission.com> writes:

>> In the end, cloning the kiobuf from the above and adjusting
>> offset/length in the children turned out to be the best approach.

Eric> Unless I am mistaken this interacts very badly with the writing
Eric> data out to disk to free up memory, because you must allocate
Eric> memory to split the bio.  Which is the last place you want to
Eric> allocate memory if you can avoid it.

Well.  We have several places in the I/O path already where we need to
allocate memory in order to fulfill an I/O.  

Think RAID1 where you need to turn one request from the filesystem
into several - one for each mirror.  Or RAID5 where a write may cause
several reads/writes so you can mask and write the checksum out.

Also, with journaling filesystems you may very well be in a situation
where pushing a file to disk involves writing transactions to the log
before you can actually free up buffers.

In this case the clones come from the bio slab cache and are thus no
different from any other I/Os.  Furthermore, the clones share the bulk
of their data with the parent, so the overhead isn't that big.

-- 
Martin K. Petersen, Principal Linux Consultant, Linuxcare, Inc.
mkp@linuxcare.com, http://www.linuxcare.com/
SGI XFS for Linux Developer, http://oss.sgi.com/projects/xfs/


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: fadvise syscall?
  2002-03-18  7:28     ` Jeff Garzik
  2002-03-18  7:55       ` Andrew Morton
@ 2002-03-22 16:05       ` Pavel Machek
  2002-03-24  6:38         ` Stevie O
  1 sibling, 1 reply; 41+ messages in thread
From: Pavel Machek @ 2002-03-22 16:05 UTC (permalink / raw)
  To: Jeff Garzik
  Cc: Anton Altaparmakov, Andrew Morton, linux-kernel, linux-fsdevel

Hi!

> > We don't need fadvise IMHO. That is what open(2) is for. The streaming 
> > request you are asking for is just a normal open(2). It will do read 
> > ahead which is perfect for streaming (of data size << RAM size in its 
> > current form).
> >
> > When you want large data streaming, i.e. you start getting worried 
> > about memory pressure, then you want open(2) + O_DIRECT. No caching 
> > done. Perfect for large data streams and we have that already. I agree 
> > that you may want some form of asynchronous read ahead with passed 
> > pages being dropped from the cache but that could be just a open(2) + 
> > O_SEQUENTIAL (doesn't exist yet).
> >
> > All of what you are asking for exists in Windows and all the semantics 
> > are implemented through a very powerful open(2) equivalent. I don't 
> > see why we shouldn't do the same. It makes more sense to me than 
> > inventing yet another system call...
> 
> 
> 
> I disagree, and here's the main reasons:
> 
> * fadvise(2) usefulness extends past open(2).  It may be useful to call 
> it at various points during runtime.

open(/proc/self/fd/0, O_NEW_FLAGS)?

> * I think putting hints in open(2) is the wrong direction to go.  Hints 
> have a potential to be very flexible.  open(2) O_xxx bits are not to be 
> squandered lightly, while I see a lot more value in being a little more 
> loose and free with the bit assignment for an "fadvise mask" (just a 
> list of hint bits).  IMO it should be easier to introduce and retire 
> hints, far easier than O_xxx flags.

I don't like the idea of a new syscall when open works just fine. First
prove O_X hints are useful, then extend them.
								Pavel
-- 
Philips Velo 1: 1"x4"x8", 300gram, 60, 12MB, 40bogomips, linux, mutt,
details at http://atrey.karlin.mff.cuni.cz/~pavel/velo/index.html.



* Re: fadvise syscall?
  2002-03-22 16:05       ` Pavel Machek
@ 2002-03-24  6:38         ` Stevie O
  2002-03-24 11:24           ` Pavel Machek
  0 siblings, 1 reply; 41+ messages in thread
From: Stevie O @ 2002-03-24  6:38 UTC (permalink / raw)
  To: Pavel Machek, Jeff Garzik
  Cc: Anton Altaparmakov, Andrew Morton, linux-kernel, linux-fsdevel

At 04:05 PM 3/22/2002 +0000, Pavel Machek wrote:
>> 
>> 
>> I disagree, and here's the main reasons:
>> 
>> * fadvise(2) usefulness extends past open(2).  It may be useful to call 
>> it at various points during runtime.
>
>open(/proc/self/fd/0, O_NEW_FLAGS)?

So to use fadvise(), the system must have /proc mounted?



Not everybody mounts /proc -- it provides a lot of potential information
to anybody who can access it ("hmm... they have a QZ48257 ethernet chipset
[cat /proc/pci] -- let's see, sending this specific sequence of bytes in a
TCP packet will lock up the receiver...").


--
Stevie-O

Real programmers use COPY CON PROGRAM.EXE



* Re: fadvise syscall?
  2002-03-24  6:38         ` Stevie O
@ 2002-03-24 11:24           ` Pavel Machek
  2002-03-24 12:52             ` Anton Altaparmakov
  0 siblings, 1 reply; 41+ messages in thread
From: Pavel Machek @ 2002-03-24 11:24 UTC (permalink / raw)
  To: Stevie O
  Cc: Pavel Machek, Jeff Garzik, Anton Altaparmakov, Andrew Morton,
	linux-kernel, linux-fsdevel

Hi!

> >> I disagree, and here's the main reasons:
> >> 
> >> * fadvise(2) usefulness extends past open(2).  It may be useful to call 
> >> it at various points during runtime.
> >
> >open(/proc/self/fd/0, O_NEW_FLAGS)?
> 
> So to use fadvise(), the system must have /proc mounted?

I think it is way more feasible than adding a new syscall.
							Pavel
-- 
Casualties in World Trade Center: ~3k dead inside the building,
cryptography in U.S.A. and free speech in Czech Republic.


* Re: fadvise syscall?
  2002-03-24 11:24           ` Pavel Machek
@ 2002-03-24 12:52             ` Anton Altaparmakov
  2002-03-25 11:12               ` Pavel Machek
  0 siblings, 1 reply; 41+ messages in thread
From: Anton Altaparmakov @ 2002-03-24 12:52 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Stevie O, Pavel Machek, Jeff Garzik, Andrew Morton, linux-kernel,
	linux-fsdevel

At 11:24 24/03/02, Pavel Machek wrote:
>Hi!
>
> > >> I disagree, and here's the main reasons:
> > >>
> > >> * fadvise(2) usefulness extends past open(2).  It may be useful to call
> > >> it at various points during runtime.
> > >
> > >open(/proc/self/fd/0, O_NEW_FLAGS)?
> >
> > So to use fadvise(), the system must have /proc mounted?
>
>I think it is way more feasible than adding new syscall.

Sorry, but it is silly. (-; What's wrong with open("filename", O_FLAGS)
followed by fcntl() if you want to modify the flags after opening? That is
a lot cleaner than going via proc in such a way...

posix_fadvise() can then be implemented in userspace and that can go via 
fcntl(). That way we have the best of both worlds.

Best regards,

Anton


-- 
   "I've not lost my mind. It's backed up on tape somewhere." - Unknown
-- 
Anton Altaparmakov <aia21 at cam.ac.uk> (replace at with @)
Linux NTFS Maintainer / WWW: http://linux-ntfs.sf.net/
ICQ: 8561279 / WWW: http://www-stu.christs.cam.ac.uk/~aia21/
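
As a point of reference for the posix_fadvise() interface named in the
message above, here is a minimal sketch of how an application might issue
hints at different points during runtime, which is the use case argued for
earlier in the thread. The helper name stream_file() and the 8K buffer are
assumptions made for illustration; the sketch says nothing about whether
the C library implements the call via fcntl() or a dedicated syscall.

```c
#define _POSIX_C_SOURCE 200112L   /* exposes posix_fadvise() in <fcntl.h> */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the thread): read a file front to back,
 * giving the kernel a hint before and after the read loop.
 * Returns the number of bytes read, or -1 on open/read failure.
 */
long stream_file(const char *path)
{
    char buf[8192];
    long total = 0;
    ssize_t n;
    int err;

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    /* Hint before reading: access will be sequential.  offset 0 with
     * len 0 covers the whole file.  posix_fadvise() returns an error
     * number directly instead of setting errno. */
    err = posix_fadvise(fd, 0, 0, POSIX_FADV_SEQUENTIAL);
    if (err != 0)
        fprintf(stderr, "fadvise(SEQUENTIAL): %s\n", strerror(err));

    while ((n = read(fd, buf, sizeof buf)) > 0)
        total += n;

    /* Hint after reading: the cached pages are not expected to be
     * reused.  Hints are advisory, so a failure here is not fatal. */
    err = posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
    if (err != 0)
        fprintf(stderr, "fadvise(DONTNEED): %s\n", strerror(err));

    close(fd);
    return n < 0 ? -1 : total;
}
```

A caller would simply invoke stream_file("some/file") and check for -1.
Both hints are applied to a descriptor that is already open, which is
something flags passed only at open(2) time cannot express.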



* Re: fadvise syscall?
  2002-03-24 12:52             ` Anton Altaparmakov
@ 2002-03-25 11:12               ` Pavel Machek
  0 siblings, 0 replies; 41+ messages in thread
From: Pavel Machek @ 2002-03-25 11:12 UTC (permalink / raw)
  To: Anton Altaparmakov
  Cc: Stevie O, Jeff Garzik, Andrew Morton, linux-kernel, linux-fsdevel

Hi!

> >> >> I disagree, and here's the main reasons:
> >> >>
> >> >> * fadvise(2) usefulness extends past open(2).  It may be useful to call
> >> >> it at various points during runtime.
> >> >
> >> >open(/proc/self/fd/0, O_NEW_FLAGS)?
> >>
> >> So to use fadvise(), the system must have /proc mounted?
> >
> >I think it is way more feasible than adding new syscall.
> 
> Sorry but it is silly. (-; What's wrong with open("filename", O_FLAGS); 
> followed by fcntl(); if you want to modify them after opening. That is a 
> lot cleaner than going via proc in such a way...
> 
> posix_fadvise() can then be implemented in userspace and that can go via 
> fcntl(). That way we have the best of both worlds.

Agreed, this is better than my proposal.
								Pavel
-- 
Casualties in World Trade Center: ~3k dead inside the building,
cryptography in U.S.A. and free speech in Czech Republic.
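
The open() followed by fcntl() approach that this message agrees to can be
sketched roughly as below. This is only an illustration under stated
assumptions: add_status_flags() is a hypothetical helper, and O_APPEND
stands in for a hint flag, since the O_SEQUENTIAL flag discussed in the
thread does not exist. Note that F_SETFL only changes the file status flags
the kernel allows to be modified after open; other bits are ignored.

```c
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the thread): switch on additional file
 * status flags for a descriptor that is already open.
 * Returns 0 on success, -1 on failure.
 */
int add_status_flags(int fd, int extra)
{
    /* Read the current status flags so existing ones are preserved. */
    int flags = fcntl(fd, F_GETFL);
    if (flags < 0)
        return -1;

    /* Write back the old flags plus the requested ones. */
    if (fcntl(fd, F_SETFL, flags | extra) < 0)
        return -1;

    return 0;
}
```

For example, add_status_flags(fd, O_APPEND) changes the behaviour of an
open descriptor without reopening the file and without needing /proc.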


End of thread.

Thread overview: 41+ messages
2002-03-17  8:39 fadvise syscall? Jeff Garzik
2002-03-17  8:56 ` Andrew Morton
2002-03-17  9:10   ` Jeff Garzik
2002-03-17 23:59     ` Anton Altaparmakov
2002-03-17 13:41   ` Anton Altaparmakov
2002-03-17 14:31     ` Simon Richter
2002-03-17 14:56       ` Jan Hudec
2002-03-17 15:00     ` Anton Altaparmakov
2002-03-17 19:20     ` Joel Becker
2002-03-18  7:28     ` Jeff Garzik
2002-03-18  7:55       ` Andrew Morton
2002-03-18  8:07         ` Jeff Garzik
2002-03-18  8:17           ` Andrew Morton
2002-03-18 16:41         ` Richard Gooch
2002-03-18 19:00           ` Andrew Morton
2002-03-18 19:15             ` Richard Gooch
2002-03-22 16:05       ` Pavel Machek
2002-03-24  6:38         ` Stevie O
2002-03-24 11:24           ` Pavel Machek
2002-03-24 12:52             ` Anton Altaparmakov
2002-03-25 11:12               ` Pavel Machek
2002-03-18  8:05     ` Joel Becker
2002-03-18  8:10       ` Jeff Garzik
2002-03-18  8:20         ` Joel Becker
2002-03-18  8:14       ` Andrew Morton
2002-03-18 14:39         ` Martin K. Petersen
2002-03-18 19:15           ` Andrew Morton
2002-03-18 19:42             ` Martin K. Petersen
2002-03-19 20:08               ` Eric W. Biederman
2002-03-19 23:38                 ` Martin K. Petersen
2002-03-17 20:18   ` Richard Gooch
2002-03-17 15:13 ` Ken Hirsch
2002-03-17 17:14 ` Anton Altaparmakov
2002-03-17 18:31   ` Mark Mielke
2002-03-17 18:35   ` Ken Hirsch
2002-03-17 19:06   ` Anton Altaparmakov
2002-03-17 20:19     ` Ken Hirsch
2002-03-18  0:12     ` Anton Altaparmakov
     [not found]       ` <a73ujs$5mc$1@cesium.transmeta.com>
2002-03-18  8:58         ` Jan Hudec
2002-03-18 10:08           ` Jeff Garzik
2002-03-18 17:29             ` Mark Mielke
