* open(.. O_DIRECT ..) difference in between Linux and FreeBSD ..
From: Matti Aarnio @ 2003-06-12 11:14 UTC (permalink / raw)
To: linux-kernel
I have spent a long time debugging a problem where I/O is done
with the O_DIRECT flag passed to open(2).
Unlike Linux, FreeBSD (where this flag originates, apparently) does
_not_ require that read()/write() use page-aligned memory
areas, or that transfers be multiples of the page size.
This needs at least some wording in the open(2) man page, and possibly
kernel code changes to support similar behaviour.
/Matti Aarnio
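A common way to meet Linux's alignment requirement is to allocate the transfer buffer with posix_memalign() instead of malloc(). The following is a minimal sketch, assuming that page alignment is sufficient for the target filesystem or device; alloc_dio_buffer is a hypothetical helper name.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: allocate a zeroed buffer for O_DIRECT I/O.
 * Assumption: page alignment (sysconf(_SC_PAGESIZE)) satisfies the
 * kernel's alignment rule.  The length is rounded up to a whole number
 * of pages so the transfer size is aligned as well.
 * On success stores the rounded length in *out_len and returns the
 * buffer (release it with free()); returns NULL on failure. */
void *alloc_dio_buffer(size_t len, size_t *out_len)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return NULL;

    size_t align = (size_t)page;
    size_t rounded = (len + align - 1) / align * align;
    if (rounded == 0)           /* len == 0: still hand out one page */
        rounded = align;

    void *buf = NULL;
    if (posix_memalign(&buf, align, rounded) != 0)
        return NULL;

    memset(buf, 0, rounded);    /* padding past the caller's data is zero */
    if (out_len)
        *out_len = rounded;
    return buf;
}
```

The rounded length, not the caller's original length, is what would be passed to read()/write() on the O_DIRECT descriptor.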
* Re: open(.. O_DIRECT ..) difference in between Linux and FreeBSD ..
From: Christoph Hellwig @ 2003-06-12 11:24 UTC (permalink / raw)
To: Matti Aarnio; +Cc: linux-kernel
On Thu, Jun 12, 2003 at 02:14:37PM +0300, Matti Aarnio wrote:
> I have been debugging long and hard a thing where IO is done
> with O_DIRECT flag applied to open(2).
>
> Unlike Linux, FreeBSD (where this flag originates, apparently)
O_DIRECT comes from IRIX.
* Re: open(.. O_DIRECT ..) difference in between Linux and FreeBSD ..
From: Andries Brouwer @ 2003-06-12 13:17 UTC (permalink / raw)
To: Matti Aarnio; +Cc: linux-kernel
On Thu, Jun 12, 2003 at 02:14:37PM +0300, Matti Aarnio wrote:
> I have been debugging long and hard a thing where IO is done
> with O_DIRECT flag applied to open(2).
>
> Unlike Linux, FreeBSD (where this flag originates, apparently) does
> _not_ require that read()/write() happens from page aligned memory
> areas, and/or be of page-size multiples in size.
>
> This needs at least wording in open(2) man-page
Ha Matti, I was going to suggest that you send a patch to the man page
maintainer, but maybe the wording you ask for is there already and
you just have some outdated version of the manpages?
Andries
O_DIRECT
Try to minimize cache effects of the I/O to and
from this file. In general this will degrade
performance, but it is useful in special situations,
such as when applications do their own caching.
File I/O is done directly to/from user space
buffers. The I/O is synchronous, i.e., at the
completion of the read(2) or write(2) system call,
data is guaranteed to have been transferred.
Transfer sizes, and the alignment of user buffer
and file offset must all be multiples of the
logical block size of the file system.
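The rule in the quoted man page amounts to three divisibility checks. A minimal user-space sketch, assuming the caller already knows the logical block size to check against (dio_request_ok is a hypothetical name, not a kernel or libc interface):

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>

/* Hypothetical pre-check mirroring the 2.4 open(2) wording: the transfer
 * size, the user buffer address and the file offset must all be
 * multiples of the logical block size blksz. */
bool dio_request_ok(const void *buf, size_t len, off_t off, size_t blksz)
{
    if (blksz == 0 || off < 0)
        return false;
    return len % blksz == 0 &&                 /* transfer size   */
           (uintptr_t)buf % blksz == 0 &&      /* buffer address  */
           (uintmax_t)off % blksz == 0;        /* file offset     */
}
```

A request that fails any one of the three checks is the kind the kernel typically rejects with EINVAL.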
* Re: open(.. O_DIRECT ..) difference in between Linux and FreeBSD ..
From: Dave Jones @ 2003-06-12 14:58 UTC (permalink / raw)
To: Andries Brouwer; +Cc: Matti Aarnio, linux-kernel
On Thu, Jun 12, 2003 at 03:17:04PM +0200, Andries Brouwer wrote:
> Transfer sizes, and the alignment of user buffer
> and file offset must all be multiples of the
> logical block size of the file system.
Just to confirm something that I wrote in the post-halloween-2.5 doc,
that doesn't tally with this..
- The size and alignment of O_DIRECT file IO requests now match those
of the device, not the filesystem. Typically this means that
you can perform O_DIRECT IO with 512-byte granularity rather than 4k.
Is this a case of the man pages not following 2.5 yet, or is this
incorrect?
Dave
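To make the difference in the 2.5 note concrete: with device granularity (typically 512 bytes) a small request needs much less padding than under a 4k filesystem-block rule. The sketch below queries a block device's logical sector size with the Linux BLKSSZGET ioctl from <linux/fs.h> and rounds a request size up to a given granularity; logical_block_size and round_up_to are hypothetical helper names, and the ioctl only succeeds on a descriptor for a block device.

```c
#include <stddef.h>
#include <sys/ioctl.h>
#include <linux/fs.h>   /* BLKSSZGET */

/* Round len up to the next multiple of align (align must be > 0). */
size_t round_up_to(size_t len, size_t align)
{
    return (len + align - 1) / align * align;
}

/* Hypothetical helper: return the logical sector size of the block
 * device open on fd, or -1 if fd is not a block device or the ioctl
 * fails (errno is left as set by ioctl). */
int logical_block_size(int fd)
{
    int ssz = 0;
    if (ioctl(fd, BLKSSZGET, &ssz) != 0)
        return -1;
    return ssz;
}
```

For example, a 2048-byte request needs no padding at 512-byte granularity, while under a 4096-byte rule it must be padded to 4096 bytes.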
* Re: open(.. O_DIRECT ..) difference in between Linux and FreeBSD ..
From: Matti Aarnio @ 2003-06-12 15:09 UTC (permalink / raw)
To: Dave Jones, Andries Brouwer, Matti Aarnio, linux-kernel
On Thu, Jun 12, 2003 at 03:58:14PM +0100, Dave Jones wrote:
> > Transfer sizes, and the alignment of user buffer
> > and file offset must all be multiples of the
> > logical block size of the file system.
>
> Just to confirm something that I wrote in the post-halloween-2.5 doc,
> that doesn't tally with this..
>
> - The size and alignment of O_DIRECT file IO requests now matches that
> of the device, not the filesystem. Typically this means that
> you can perform O_DIRECT IO with 512-byte granularity rather than 4k.
>
> Is this a case of the man pages not following 2.5 yet, or is this
> incorrect ?
Three things come to mind:
- 2.4 defines the rules in a most confusing manner
- 2.5 continues that
- We need the more complete O_DIRECT API of IRIX:
from open(2):
O_DIRECT
If set, all reads and writes on the resulting file descriptor will
be performed directly to or from the user program buffer, provided
appropriate size and alignment restrictions are met. Refer to the
F_SETFL and F_DIOINFO commands in the fcntl(2) manual entry for
information about how to determine the alignment constraints.
O_DIRECT is a Silicon Graphics extension and is only supported on
local EFS and XFS file systems, and remote BDS file systems.
from fcntl(2):
F_SETFL Set file status flags to the third argument, ....
Flags not understood for a particular descriptor are silently
ignored except for FDIRECT. FDIRECT will return EINVAL if used
on other than an EFS, XFS or BDS file system file.
F_DIOINFO Get information required to perform direct I/O on the specified
fildes. Direct I/O is performed directly to and from a user's
data buffer. Since the kernel's buffer cache is no longer
between the two, the user's data buffer must conform to the
same type of constraints as required for accessing a raw disk
partition. The third argument, arg, points to a data type
struct dioattr which is defined in the <fcntl.h> header file
and contains the following members: d_mem is the memory
alignment requirement of the user's data buffer. d_miniosz
specifies block size, minimum I/O request size, and I/O
alignment. The size of all I/O requests must be a multiple of
this amount and the value of the seek pointer at the time of
the I/O request must also be an integer multiple of this
amount. d_maxiosz is the maximum I/O request size which can be
performed on the fildes. If an I/O request does not meet these
constraints, the read(2) or write(2) will return with EINVAL.
All I/O requests are kept consistent with any data brought into
the cache with an access through a non-direct I/O file
descriptor. See also F_SETFL above and open(2).
> Dave
/Matti Aarnio
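The IRIX interface quoted above separates the buffer's memory alignment (d_mem) from the request size and offset granularity (d_miniosz) and adds an upper bound on request size (d_maxiosz). Linux lacks this fcntl(2) command, so the following is only a sketch of how a program might apply such limits once it has obtained them; struct dio_limits and dio_check are hypothetical names modeled on the fields of IRIX's struct dioattr, not a Linux API.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>

/* Hypothetical structure modeled on the fields of IRIX's struct dioattr. */
struct dio_limits {
    unsigned d_mem;     /* required alignment of the user buffer      */
    unsigned d_miniosz; /* granularity of request size and file offset */
    unsigned d_maxiosz; /* largest request allowed                     */
};

/* Return 0 if the request satisfies the limits, or EINVAL otherwise,
 * mirroring the IRIX rule that read(2)/write(2) fail with EINVAL when
 * the constraints are not met. */
int dio_check(const struct dio_limits *lim, const void *buf,
              size_t len, off_t off)
{
    if (lim->d_mem == 0 || lim->d_miniosz == 0 || off < 0)
        return EINVAL;
    if ((uintptr_t)buf % lim->d_mem != 0)
        return EINVAL;
    if (len % lim->d_miniosz != 0 || (uintmax_t)off % lim->d_miniosz != 0)
        return EINVAL;
    if (len > lim->d_maxiosz)
        return EINVAL;
    return 0;
}
```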
* Re: open(.. O_DIRECT ..) difference in between Linux and FreeBSD ..
From: Rob van Nieuwkerk @ 2003-06-12 23:12 UTC (permalink / raw)
To: Andries Brouwer; +Cc: Alan Cox, Arjan van de Ven, robn, linux-kernel
Andries Brouwer wrote:
> O_DIRECT
> Try to minimize cache effects of the I/O to and
> from this file. In general this will degrade
> performance, but it is useful in special situations,
> such as when applications do their own caching.
> File I/O is done directly to/from user space
> buffers. The I/O is synchronous, i.e., at the
> completion of the read(2) or write(2) system call,
> data is guaranteed to have been transferred.
> Transfer sizes, and the alignment of user buffer
> and file offset must all be multiples of the
> logical block size of the file system.
FYI:
It appears that somewhere between RH kernels 2.4.18-27.7.x and 2.4.20-18.9
something changed so that my application now needs O_SYNC in addition to
O_DIRECT to make sure that writes are synchronous. If I leave out
O_SYNC with 2.4.20-18.9, the data physically reaches the media only 35
seconds after the write() call.
I haven't tested with a vanilla 2.4.* kernel yet, but will try.
(All recent 2.4.2? kernels I tried hang for more than 30 s during boot while
probing the CompactFlash, which makes them rather useless for me: my
application needs a 5 s boot time.)
greetings,
Rob van Nieuwkerk
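The workaround described above, requesting O_SYNC together with O_DIRECT so that write() does not return before the data reaches the device even when the kernel treats O_DIRECT as a mere hint, might look like the sketch below. The fallback to O_SYNC alone when open() rejects O_DIRECT with EINVAL (as filesystems without O_DIRECT support do) is an assumption added for illustration, and open_sync_log is a hypothetical name.

```c
#define _GNU_SOURCE     /* for O_DIRECT */
#include <errno.h>
#include <fcntl.h>

/* Hypothetical helper: open a log file for writing so that each write()
 * is synchronous.  O_SYNC is requested together with O_DIRECT because a
 * kernel may silently treat O_DIRECT as a hint and buffer the data anyway.
 * If the filesystem refuses O_DIRECT (open() fails with EINVAL), retry
 * with O_SYNC alone.  Returns a file descriptor, or -1 with errno set. */
int open_sync_log(const char *path)
{
    int flags = O_WRONLY | O_CREAT | O_SYNC;
    int fd = open(path, flags | O_DIRECT, 0644);
    if (fd < 0 && errno == EINVAL)
        fd = open(path, flags, 0644);
    return fd;
}
```

When the O_DIRECT open succeeds, every later write() on the descriptor still has to respect the alignment rules discussed earlier in the thread.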
* Re: open(.. O_DIRECT ..) difference in between Linux and FreeBSD ..
From: Nuno Silva @ 2003-06-12 23:14 UTC (permalink / raw)
To: Matti Aarnio; +Cc: Dave Jones, Andries Brouwer, linux-kernel
Hi!
OARS, does anybody know of a patch that implements O_DIRECT in 2.4 the
same way it is implemented in 2.5?
2.5's O_DIRECT is much less restrictive than 2.4's. OTOH 2.5 is still
not recommended for production use. Any way of having the best of both
worlds? :)
Thanks,
Nuno Silva
Matti Aarnio wrote:
> On Thu, Jun 12, 2003 at 03:58:14PM +0100, Dave Jones wrote:
>
>> > Transfer sizes, and the alignment of user buffer
>> > and file offset must all be multiples of the
>> > logical block size of the file system.
>>
>>Just to confirm something that I wrote in the post-halloween-2.5 doc,
>>that doesn't tally with this..
>>
>>- The size and alignment of O_DIRECT file IO requests now matches that
>> of the device, not the filesystem. Typically this means that
>> you can perform O_DIRECT IO with 512-byte granularity rather than 4k.
>>
>>Is this a case of the man pages not following 2.5 yet, or is this
>>incorrect ?
>
>
> I think of three things:
> - 2.4 defines rules in most confusing manner
> - 2.5 continues that
> - We need more complete IRIX's O_DIRECT API:
>
> from open(2):
> O_DIRECT
> If set, all reads and writes on the resulting file descriptor will
> be performed directly to or from the user program buffer, provided
> appropriate size and alignment restrictions are met. Refer to the
> F_SETFL and F_DIOINFO commands in the fcntl(2) manual entry for
> information about how to determine the alignment constraints.
> O_DIRECT is a Silicon Graphics extension and is only supported on
> local EFS and XFS file systems, and remote BDS file systems.
>
>
> from fcntl(2):
> F_SETFL Set file status flags to the third argument, ....
>
> Flags not understood for a particular descriptor are silently
> ignored except for FDIRECT. FDIRECT will return EINVAL if used
> on other than an EFS, XFS or BDS file system file.
>
> F_DIOINFO Get information required to perform direct I/O on the specified
> fildes. Direct I/O is performed directly to and from a user's
> data buffer. Since the kernel's buffer cache is no longer
> between the two, the user's data buffer must conform to the
> same type of constraints as required for accessing a raw disk
> partition. The third argument, arg, points to a data type
> struct dioattr which is defined in the <fcntl.h> header file
> and contains the following members: d_mem is the memory
> alignment requirement of the user's data buffer. d_miniosz
> specifies block size, minimum I/O request size, and I/O
> alignment. The size of all I/O requests must be a multiple of
> this amount and the value of the seek pointer at the time of
> the I/O request must also be an integer multiple of this
> amount. d_maxiosz is the maximum I/O request size which can be
> performed on the fildes. If an I/O request does not meet these
> constraints, the read(2) or write(2) will return with EINVAL.
> All I/O requests are kept consistent with any data brought into
> the cache with an access through a non-direct I/O file
> descriptor. See also F_SETFL above and open(2).
>
>
>> Dave
>
>
> /Matti Aarnio
* Re: open(.. O_DIRECT ..) difference in between Linux and FreeBSD ..
From: Arjan van de Ven @ 2003-06-13 7:47 UTC (permalink / raw)
To: Rob van Nieuwkerk
Cc: Andries Brouwer, Alan Cox, Arjan van de Ven, linux-kernel
On Fri, Jun 13, 2003 at 01:12:57AM +0200, Rob van Nieuwkerk wrote:
> FYI:
> It appears that somewhere between RH kernels 2.4.18-27.7.x and 2.4.20-18.9
> something has changed so that my application needs a O_SYNC too besides
> the O_DIRECT to make sure that writes will be synchronous. If I leave
> the O_SYNC out with 2.4.20-18.9 the write will happen physically 35
> seconds after the write() was done.
O_DIRECT is nothing but a hint, and the 2.4.20-18.9 kernel decides not to
honor it.
* Re: open(.. O_DIRECT ..) difference in between Linux and FreeBSD ..
From: Rob van Nieuwkerk @ 2003-06-13 8:27 UTC (permalink / raw)
To: Arjan van de Ven
Cc: Rob van Nieuwkerk, Andries Brouwer, Alan Cox, linux-kernel
Arjan van de Ven wrote:
> On Fri, Jun 13, 2003 at 01:12:57AM +0200, Rob van Nieuwkerk wrote:
> > FYI:
> > It appears that somewhere between RH kernels 2.4.18-27.7.x and 2.4.20-18.9
> > something has changed so that my application needs a O_SYNC too besides
> > the O_DIRECT to make sure that writes will be synchronous. If I leave
> > the O_SYNC out with 2.4.20-18.9 the write will happen physically 35
> > seconds after the write() was done.
>
> O_DIRECT is nothing but a hint and the 2.4.20-18.9 kernel decides to not
> honor it
Hi Arjan,
Do you mean that the 2.4.20-18.9 kernel always ignores the O_DIRECT flag?
greetings,
Rob van Nieuwkerk
* Re: open(.. O_DIRECT ..) difference in between Linux and FreeBSD ..
From: Arjan van de Ven @ 2003-06-13 8:28 UTC (permalink / raw)
To: Rob van Nieuwkerk
Cc: Arjan van de Ven, Andries Brouwer, Alan Cox, linux-kernel
On Fri, Jun 13, 2003 at 10:27:52AM +0200, Rob van Nieuwkerk wrote:
>
> Arjan van de Ven wrote:
> > On Fri, Jun 13, 2003 at 01:12:57AM +0200, Rob van Nieuwkerk wrote:
> > > FYI:
> > > It appears that somewhere between RH kernels 2.4.18-27.7.x and 2.4.20-18.9
> > > something has changed so that my application needs a O_SYNC too besides
> > > the O_DIRECT to make sure that writes will be synchronous. If I leave
> > > the O_SYNC out with 2.4.20-18.9 the write will happen physically 35
> > > seconds after the write() was done.
> >
> > O_DIRECT is nothing but a hint and the 2.4.20-18.9 kernel decides to not
> > honor it
>
> Hi Arjan,
>
> Do you mean that the 2.4.20-18.9 kernel always ignores the O_DIRECT flag ?
yes.
* Re: open(.. O_DIRECT ..) difference in between Linux and FreeBSD ..
From: Rob van Nieuwkerk @ 2003-06-13 9:02 UTC (permalink / raw)
To: Arjan van de Ven
Cc: Rob van Nieuwkerk, Andries Brouwer, Alan Cox, linux-kernel
Arjan van de Ven wrote:
> On Fri, Jun 13, 2003 at 10:27:52AM +0200, Rob van Nieuwkerk wrote:
> >
> > Arjan van de Ven wrote:
> > > On Fri, Jun 13, 2003 at 01:12:57AM +0200, Rob van Nieuwkerk wrote:
> > > > FYI:
> > > > It appears that somewhere between RH kernels 2.4.18-27.7.x and 2.4.20-18.9
> > > > something has changed so that my application needs a O_SYNC too besides
> > > > the O_DIRECT to make sure that writes will be synchronous. If I leave
> > > > the O_SYNC out with 2.4.20-18.9 the write will happen physically 35
> > > > seconds after the write() was done.
> > >
> > > O_DIRECT is nothing but a hint and the 2.4.20-18.9 kernel decides to not
> > > honor it
> >
> > Hi Arjan,
> >
> > Do you mean that the 2.4.20-18.9 kernel always ignores the O_DIRECT flag ?
>
> yes.
Hi Arjan,
OK, that would explain why an old problem (*), which I had solved (or
worked around) by using O_DIRECT, reappears in my application with
2.4.20-18.9.
Just to make sure I understand correctly, is it like this?
"Kernel 2.4.20-18.9 completely ignores the O_DIRECT flag. Not only is
the 'synchronous writes' part lost, you will also get read-ahead despite
using O_DIRECT. 2.4.20-18.9 with O_DIRECT behaves like 2.4.18-27.7.x
without O_DIRECT (with respect to the synchronicity of write() and the
number of physical media reads and writes)."
Just curious: what is the reason for ignoring O_DIRECT in 2.4.20-18.9?
Interactivity behaviour?
Greetings,
Rob van Nieuwkerk
(*) I have an application that runs from CompactFlash and uses a Philips
webcam (pwc driver). It turned out that too much CompactFlash access
(in PIO mode) causes the camera (or its driver?) to stall and never wake
up again :-( I only log 2048-byte records to a raw partition. With
O_DIRECT and proper data alignment I could reduce the CF access to
exactly four 512-byte sector writes per record. This was enough to never
trigger the problem.
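The footnote's pattern (fixed 2048-byte records, each landing as exactly four 512-byte sector writes) can be sketched as follows, assuming a record size of 2048 bytes and a descriptor opened on a device with 512-byte sectors; log_record is a hypothetical name. The record is copied into a 512-byte-aligned, zero-padded bounce buffer and written with pwrite() at an offset that is a multiple of the record size, so buffer address, length and offset all stay sector-aligned.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define RECORD_SIZE 2048   /* four 512-byte sectors */

/* Hypothetical helper: write record number idx (at most RECORD_SIZE
 * bytes of data) at offset idx * RECORD_SIZE, zero-padded to a full
 * record.  The bounce buffer is 512-byte aligned so the call also suits
 * an O_DIRECT descriptor on a 512-byte-sector device.
 * Returns 0 on success, -1 on error. */
int log_record(int fd, off_t idx, const void *data, size_t len)
{
    if (len > RECORD_SIZE || idx < 0)
        return -1;

    void *buf = NULL;
    if (posix_memalign(&buf, 512, RECORD_SIZE) != 0)
        return -1;

    memset(buf, 0, RECORD_SIZE);
    memcpy(buf, data, len);

    ssize_t n = pwrite(fd, buf, RECORD_SIZE, idx * RECORD_SIZE);
    free(buf);
    return n == RECORD_SIZE ? 0 : -1;
}
```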
* Re: open(.. O_DIRECT ..) difference in between Linux and FreeBSD ..
From: Andries Brouwer @ 2003-06-13 21:05 UTC (permalink / raw)
To: Matti Aarnio; +Cc: Dave Jones, Andries Brouwer, linux-kernel
On Thu, Jun 12, 2003 at 06:09:09PM +0300, Matti Aarnio wrote:
> On Thu, Jun 12, 2003 at 03:58:14PM +0100, Dave Jones wrote:
[all clipped - later]
I was reminded of the following quote:
"The thing that has always disturbed me about O_DIRECT is that the whole
interface is just stupid, and was probably designed by a deranged monkey
on some serious mind-controlling substances."
I'll add that to the BUGS section.
* Re: open(.. O_DIRECT ..) difference in between Linux and FreeBSD ..
From: Andi Kleen @ 2003-06-12 11:26 UTC (permalink / raw)
To: Matti Aarnio; +Cc: linux-kernel
Matti Aarnio <matti.aarnio@zmailer.org> writes:
> Unlike Linux, FreeBSD (where this flag originates, apparently) does
It doesn't. It originates from Irix. AFAIK Irix has similar restrictions.
-Andi