linux-kernel.vger.kernel.org archive mirror
* open(.. O_DIRECT ..) difference in between Linux and FreeBSD ..
@ 2003-06-12 11:14 Matti Aarnio
  2003-06-12 11:24 ` Christoph Hellwig
  2003-06-12 13:17 ` Andries Brouwer
  0 siblings, 2 replies; 13+ messages in thread
From: Matti Aarnio @ 2003-06-12 11:14 UTC (permalink / raw)
  To: linux-kernel

I have been debugging long and hard a thing where IO is done
with O_DIRECT flag applied to open(2).

Unlike Linux, FreeBSD (where this flag originates, apparently) does
_not_ require that read()/write() happens from page aligned memory
areas, and/or be of page-size multiples in size.

This needs at least wording in the open(2) man-page, and possibly code
changes in the kernel to support similar behaviour.

/Matti Aarnio


* Re: open(.. O_DIRECT ..) difference in between Linux and FreeBSD ..
  2003-06-12 11:14 open(.. O_DIRECT ..) difference in between Linux and FreeBSD Matti Aarnio
@ 2003-06-12 11:24 ` Christoph Hellwig
  2003-06-12 13:17 ` Andries Brouwer
  1 sibling, 0 replies; 13+ messages in thread
From: Christoph Hellwig @ 2003-06-12 11:24 UTC (permalink / raw)
  To: Matti Aarnio; +Cc: linux-kernel

On Thu, Jun 12, 2003 at 02:14:37PM +0300, Matti Aarnio wrote:
> I have been debugging long and hard a thing where IO is done
> with O_DIRECT flag applied to open(2).
> 
> Unlike Linux, FreeBSD (where this flag originates, apparently)

O_DIRECT comes from IRIX.



* Re: open(.. O_DIRECT ..) difference in between Linux and FreeBSD ..
  2003-06-12 11:14 open(.. O_DIRECT ..) difference in between Linux and FreeBSD Matti Aarnio
  2003-06-12 11:24 ` Christoph Hellwig
@ 2003-06-12 13:17 ` Andries Brouwer
  2003-06-12 14:58   ` Dave Jones
  2003-06-12 23:12   ` Rob van Nieuwkerk
  1 sibling, 2 replies; 13+ messages in thread
From: Andries Brouwer @ 2003-06-12 13:17 UTC (permalink / raw)
  To: Matti Aarnio; +Cc: linux-kernel

On Thu, Jun 12, 2003 at 02:14:37PM +0300, Matti Aarnio wrote:

> I have been debugging long and hard a thing where IO is done
> with O_DIRECT flag applied to open(2).
> 
> Unlike Linux, FreeBSD (where this flag originates, apparently) does
> _not_ require that read()/write() happens from page aligned memory
> areas, and/or be of page-size multiples in size.
> 
> This needs at least wording in  open(2) man-page

Ha Matti, I was going to suggest that you send a patch to the man page
maintainer, but maybe the wording you ask for is there already and
you just have an outdated version of the manpages?

Andries

       O_DIRECT
              Try to minimize cache effects of the I/O to and from
              this file.  In general this will degrade performance,
              but it is useful in special situations, such as when
              applications do their own caching.  File I/O is done
              directly to/from user space buffers.  The I/O is
              synchronous, i.e., at the completion of the read(2) or
              write(2) system call, data is guaranteed to have been
              transferred.  Transfer sizes, and the alignment of user
              buffer and file offset must all be multiples of the
              logical block size of the file system.
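
As a rough illustration of the block-size rule quoted above, the sketch
below allocates a buffer whose address and size are both multiples of a
given block size.  The helper names dio_round_up() and dio_alloc() are
made up for this example and are not part of any kernel or libc API; the
block size itself is something the caller has to find out separately.

```c
#define _POSIX_C_SOURCE 200112L  /* for posix_memalign() */
#include <stdint.h>
#include <stdlib.h>

/* Round len up to the next multiple of blk (blk must be non-zero). */
size_t dio_round_up(size_t len, size_t blk)
{
    return (len + blk - 1) / blk * blk;
}

/*
 * Hypothetical helper: allocate a buffer whose address and size are
 * both multiples of blk.  blk must be a power of two and a multiple
 * of sizeof(void *), as posix_memalign() requires.  Returns NULL on
 * failure; on success the usable size is stored in *size_out.
 */
void *dio_alloc(size_t want, size_t blk, size_t *size_out)
{
    void *buf = NULL;
    size_t size = dio_round_up(want, blk);

    if (posix_memalign(&buf, blk, size) != 0)
        return NULL;
    *size_out = size;
    return buf;
}
```

A caller would pass the returned buffer and size to read(2) or write(2)
at a file offset that is also a multiple of blk, and release the buffer
with free().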



* Re: open(.. O_DIRECT ..) difference in between Linux and FreeBSD ..
  2003-06-12 13:17 ` Andries Brouwer
@ 2003-06-12 14:58   ` Dave Jones
  2003-06-12 15:09     ` Matti Aarnio
  2003-06-12 23:12   ` Rob van Nieuwkerk
  1 sibling, 1 reply; 13+ messages in thread
From: Dave Jones @ 2003-06-12 14:58 UTC (permalink / raw)
  To: Andries Brouwer; +Cc: Matti Aarnio, linux-kernel

On Thu, Jun 12, 2003 at 03:17:04PM +0200, Andries Brouwer wrote:
 >               Transfer sizes, and the alignment of user buffer
 >               and file offset must all be multiples of the
 >               logical block size of the file system.

Just to confirm something that I wrote in the post-halloween-2.5 doc,
that doesn't tally with this..

- The size and alignment of O_DIRECT file IO requests now matches that
  of the device, not the filesystem.  Typically this means that
  you can perform O_DIRECT IO with 512-byte granularity rather than 4k.

Is this a case of the man pages not following 2.5 yet, or is this
incorrect ?
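
If the granularity is indeed that of the device, a program could ask the
device for it rather than assume 512 or 4k.  The sketch below uses the
BLKSSZGET ioctl from <linux/fs.h>, which reports the logical sector size
of a block device; the wrapper name device_sector_size() is made up for
this example, and whether the reported value is exactly what O_DIRECT
enforces on a given kernel is an assumption, not something this thread
confirms.

```c
#include <sys/ioctl.h>
#include <linux/fs.h>   /* BLKSSZGET */

/*
 * Hypothetical wrapper: return the logical sector size of the block
 * device open on fd, or -1 if fd is not a block device or the ioctl
 * fails for any other reason.
 */
int device_sector_size(int fd)
{
    int size = 0;

    if (ioctl(fd, BLKSSZGET, &size) != 0)
        return -1;
    return size;
}
```

The descriptor has to refer to the block device itself (for example an
opened partition), not to a regular file on it.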

		Dave



* Re: open(.. O_DIRECT ..) difference in between Linux and FreeBSD ..
  2003-06-12 14:58   ` Dave Jones
@ 2003-06-12 15:09     ` Matti Aarnio
  2003-06-12 23:14       ` Nuno Silva
  2003-06-13 21:05       ` Andries Brouwer
  0 siblings, 2 replies; 13+ messages in thread
From: Matti Aarnio @ 2003-06-12 15:09 UTC (permalink / raw)
  To: Dave Jones, Andries Brouwer, Matti Aarnio, linux-kernel

On Thu, Jun 12, 2003 at 03:58:14PM +0100, Dave Jones wrote:
>  >               Transfer sizes, and the alignment  of  user  buffer
>  >               and  file offset must all be multiples of the logi-
>  >               cal block size of the file system.
> 
> Just to confirm something that I wrote in the post-halloween-2.5 doc,
> that doesn't tally with this..
> 
> - The size and alignment of O_DIRECT file IO requests now matches that
>   of the device, not the filesystem.  Typically this means that
>   you can perform O_DIRECT IO with 512-byte granularity rather than 4k.
> 
> Is this a case of the man pages not following 2.5 yet, or is this
> incorrect ?

I can think of three things:
   - 2.4 defines the rules in a most confusing manner
   - 2.5 continues that
   - We need IRIX's more complete O_DIRECT API:

from open(2):
     O_DIRECT
         If set, all reads and writes on the resulting file descriptor will
         be performed directly to or from the user program buffer, provided
         appropriate size and alignment restrictions are met.  Refer to the
         F_SETFL and F_DIOINFO commands in the fcntl(2) manual entry for
         information about how to determine the alignment constraints.
         O_DIRECT is a Silicon Graphics extension and is only supported on
         local EFS and XFS file systems, and remote BDS file systems.


from fcntl(2):
     F_SETFL   Set file status flags to the third argument, ....

            Flags not understood for a particular descriptor are silently
            ignored except for FDIRECT. FDIRECT will return EINVAL if used
            on other than an EFS, XFS or BDS file system file.

     F_DIOINFO Get information required to perform direct I/O on the specified
            fildes.  Direct I/O is performed directly to and from a user's
            data buffer. Since the kernel's buffer cache is no longer
            between the two, the user's data buffer must conform to the
            same type of constraints as required for accessing a raw disk
            partition.  The third argument, arg, points to a data type
            struct dioattr which is defined in the <fcntl.h> header file
            and contains the following members: d_mem is the memory
            alignment requirement of the user's data buffer. d_miniosz
            specifies block size, minimum I/O request size, and I/O
            alignment.  The size of all I/O requests must be a multiple of
            this amount and the value of the seek pointer at the time of
            the I/O request must also be an integer multiple of this
            amount.  d_maxiosz is the maximum I/O request size which can be
            performed on the fildes.  If an I/O request does not meet these
            constraints, the read(2) or write(2) will return with EINVAL.
            All I/O requests are kept consistent with any data brought into
            the cache with an access through a non-direct I/O file
            descriptor.  See also F_SETFL above and open(2).
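
To make the F_DIOINFO constraints concrete, here is a small sketch that
checks a request against the three values the man page describes.  The
struct dio_limits below is a hypothetical stand-in that merely reuses the
member names from the text; it is not the IRIX struct dioattr definition,
and dio_request_ok() is an illustrative name, not an existing call.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the three members the man page names. */
struct dio_limits {
    size_t d_mem;      /* required alignment of the user buffer */
    size_t d_miniosz;  /* block size: granularity of length and offset */
    size_t d_maxiosz;  /* largest single I/O request */
};

/*
 * True if a request would satisfy all of the quoted constraints.
 * A zero-length request is rejected because d_miniosz is also the
 * minimum request size.
 */
bool dio_request_ok(const struct dio_limits *lim,
                    const void *buf, size_t len, uint64_t offset)
{
    if ((uintptr_t)buf % lim->d_mem != 0)
        return false;
    if (len == 0 || len % lim->d_miniosz != 0 || len > lim->d_maxiosz)
        return false;
    return offset % lim->d_miniosz == 0;
}
```

An application could run such a check before each read(2) or write(2)
instead of discovering the rules through EINVAL.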

> 		Dave

/Matti Aarnio


* Re: open(.. O_DIRECT ..) difference in between Linux and FreeBSD ..
  2003-06-12 13:17 ` Andries Brouwer
  2003-06-12 14:58   ` Dave Jones
@ 2003-06-12 23:12   ` Rob van Nieuwkerk
  2003-06-13  7:47     ` Arjan van de Ven
  1 sibling, 1 reply; 13+ messages in thread
From: Rob van Nieuwkerk @ 2003-06-12 23:12 UTC (permalink / raw)
  To: Andries Brouwer; +Cc: Alan Cox, Arjan van de Ven, robn, linux-kernel


Andries Brouwer wrote:
>        O_DIRECT
>               Try to minimize cache effects of the I/O to and from
>               this file.  In general this will degrade performance,
>               but it is useful in special situations, such as when
>               applications do their own caching.  File I/O is done
>               directly to/from user space buffers.  The I/O is
>               synchronous, i.e., at the completion of the read(2) or
>               write(2) system call, data is guaranteed to have been
>               transferred.  Transfer sizes, and the alignment of user
>               buffer and file offset must all be multiples of the
>               logical block size of the file system.

FYI:
It appears that somewhere between RH kernels 2.4.18-27.7.x and 2.4.20-18.9
something has changed so that my application needs a O_SYNC too besides
the O_DIRECT to make sure that writes will be synchronous.  If I leave
the O_SYNC out with 2.4.20-18.9 the write will happen physically 35
seconds after the write() was done.
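
For illustration, the combination described above amounts to something
like the sketch below.  The function name open_direct_sync() is made up
for this example, and the fallback to O_SYNC alone when open(2) rejects
O_DIRECT with EINVAL is an assumption about what an application might
want, not what my application actually does.

```c
#define _GNU_SOURCE     /* exposes O_DIRECT in <fcntl.h> on glibc */
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: open path for writing with both O_DIRECT and
 * O_SYNC.  If the file system rejects O_DIRECT with EINVAL, retry
 * with O_SYNC alone so that writes are still synchronous.
 * Returns a file descriptor, or -1 with errno set.
 */
int open_direct_sync(const char *path)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_DIRECT | O_SYNC, 0600);

    if (fd < 0 && errno == EINVAL)
        fd = open(path, O_WRONLY | O_CREAT | O_SYNC, 0600);
    return fd;
}
```

The caller remains responsible for buffer, length and offset alignment
when O_DIRECT was accepted, and for close(2) on the descriptor.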

I haven't tested with a vanilla 2.4.* kernel yet but will try.
(All modern 2.4.2? kernels I tried hang for > 30s during boot while
probing the CompactFlash, which makes them more or less useless to me:
my application needs a 5s boot-time ..)

	greetings,
	Rob van Nieuwkerk


* Re: open(.. O_DIRECT ..) difference in between Linux and FreeBSD ..
  2003-06-12 15:09     ` Matti Aarnio
@ 2003-06-12 23:14       ` Nuno Silva
  2003-06-13 21:05       ` Andries Brouwer
  1 sibling, 0 replies; 13+ messages in thread
From: Nuno Silva @ 2003-06-12 23:14 UTC (permalink / raw)
  To: Matti Aarnio; +Cc: Dave Jones, Andries Brouwer, linux-kernel

Hi!

OARS, does anybody know of a patch that implements O_DIRECT in 2.4 the
same way that it's implemented in 2.5?

2.5's O_DIRECT is much less restrictive than 2.4's. OTOH 2.5 is still 
not recommended for production use. Any way of having the best of both 
worlds? :)

Thanks,
Nuno Silva

Matti Aarnio wrote:
> On Thu, Jun 12, 2003 at 03:58:14PM +0100, Dave Jones wrote:
> 
>> >               Transfer sizes, and the alignment  of  user  buffer
>> >               and  file offset must all be multiples of the logi-
>> >               cal block size of the file system.
>>
>>Just to confirm something that I wrote in the post-halloween-2.5 doc,
>>that doesn't tally with this..
>>
>>- The size and alignment of O_DIRECT file IO requests now matches that
>>  of the device, not the filesystem.  Typically this means that
>>  you can perform O_DIRECT IO with 512-byte granularity rather than 4k.
>>
>>Is this a case of the man pages not following 2.5 yet, or is this
>>incorrect ?
> 
> 
> I think of three things:
>    - 2.4 defines rules in most confusing manner
>    - 2.5 continues that
>    - We need more complete IRIX's O_DIRECT API:
> 
> from open(2):
>      O_DIRECT
>          If set, all reads and writes on the resulting file descriptor will
>          be performed directly to or from the user program buffer, provided
>          appropriate size and alignment restrictions are met.  Refer to the
>          F_SETFL and F_DIOINFO commands in the fcntl(2) manual entry for
>          information about how to determine the alignment constraints.
>          O_DIRECT is a Silicon Graphics extension and is only supported on
>          local EFS and XFS file systems, and remote BDS file systems.
> 
> 
> from fcntl(2):
>      F_SETFL   Set file status flags to the third argument, ....
> 
>             Flags not understood for a particular descriptor are silently
>             ignored except for FDIRECT. FDIRECT will return EINVAL if used
>             on other than an EFS, XFS or BDS file system file.
> 
>      F_DIOINFO Get information required to perform direct I/O on the specified
>             fildes.  Direct I/O is performed directly to and from a user's
>             data buffer. Since the kernel's buffer cache is no longer
>             between the two, the user's data buffer must conform to the
>             same type of constraints as required for accessing a raw disk
>             partition.  The third argument, arg, points to a data type
>             struct dioattr which is defined in the <fcntl.h> header file
>             and contains the following members: d_mem is the memory
>             alignment requirement of the user's data buffer. d_miniosz
>             specifies block size, minimum I/O request size, and I/O
>             alignment.  The size of all I/O requests must be a multiple of
>             this amount and the value of the seek pointer at the time of
>             the I/O request must also be an integer multiple of this
>             amount.  d_maxiosz is the maximum I/O request size which can be
>             performed on the fildes.  If an I/O request does not meet these
>             constraints, the read(2) or write(2) will return with EINVAL.
>             All I/O requests are kept consistent with any data brought into
>             the cache with an access through a non-direct I/O file
>             descriptor.  See also F_SETFL above and open(2).
> 
> 
>>		Dave
> 
> 
> /Matti Aarnio



* Re: open(.. O_DIRECT ..) difference in between Linux and FreeBSD ..
  2003-06-12 23:12   ` Rob van Nieuwkerk
@ 2003-06-13  7:47     ` Arjan van de Ven
  2003-06-13  8:27       ` Rob van Nieuwkerk
  0 siblings, 1 reply; 13+ messages in thread
From: Arjan van de Ven @ 2003-06-13  7:47 UTC (permalink / raw)
  To: Rob van Nieuwkerk
  Cc: Andries Brouwer, Alan Cox, Arjan van de Ven, linux-kernel

On Fri, Jun 13, 2003 at 01:12:57AM +0200, Rob van Nieuwkerk wrote:
> FYI:
> It appears that somewhere between RH kernels 2.4.18-27.7.x and 2.4.20-18.9
> something has changed so that my application needs a O_SYNC too besides
> the O_DIRECT to make sure that writes will be synchronous.  If I leave
> the O_SYNC out with 2.4.20-18.9 the write will happen physically 35
> seconds after the write() was done.

O_DIRECT is nothing but a hint and the 2.4.20-18.9 kernel decides to not
honor it


* Re: open(.. O_DIRECT ..) difference in between Linux and FreeBSD ..
  2003-06-13  7:47     ` Arjan van de Ven
@ 2003-06-13  8:27       ` Rob van Nieuwkerk
  2003-06-13  8:28         ` Arjan van de Ven
  0 siblings, 1 reply; 13+ messages in thread
From: Rob van Nieuwkerk @ 2003-06-13  8:27 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Rob van Nieuwkerk, Andries Brouwer, Alan Cox, linux-kernel


Arjan van de Ven wrote:
> On Fri, Jun 13, 2003 at 01:12:57AM +0200, Rob van Nieuwkerk wrote:
> > FYI:
> > It appears that somewhere between RH kernels 2.4.18-27.7.x and 2.4.20-18.9
> > something has changed so that my application needs a O_SYNC too besides
> > the O_DIRECT to make sure that writes will be synchronous.  If I leave
> > the O_SYNC out with 2.4.20-18.9 the write will happen physically 35
> > seconds after the write() was done.
> 
> O_DIRECT is nothing but a hint and the 2.4.20-18.9 kernel decides to not
> honor it

Hi Arjan,

Do you mean that the 2.4.20-18.9 kernel always ignores the O_DIRECT flag ?

	greetings,
	Rob van Nieuwkerk


* Re: open(.. O_DIRECT ..) difference in between Linux and FreeBSD ..
  2003-06-13  8:27       ` Rob van Nieuwkerk
@ 2003-06-13  8:28         ` Arjan van de Ven
  2003-06-13  9:02           ` Rob van Nieuwkerk
  0 siblings, 1 reply; 13+ messages in thread
From: Arjan van de Ven @ 2003-06-13  8:28 UTC (permalink / raw)
  To: Rob van Nieuwkerk
  Cc: Arjan van de Ven, Andries Brouwer, Alan Cox, linux-kernel

On Fri, Jun 13, 2003 at 10:27:52AM +0200, Rob van Nieuwkerk wrote:
> 
> Arjan van de Ven wrote:
> > On Fri, Jun 13, 2003 at 01:12:57AM +0200, Rob van Nieuwkerk wrote:
> > > FYI:
> > > It appears that somewhere between RH kernels 2.4.18-27.7.x and 2.4.20-18.9
> > > something has changed so that my application needs a O_SYNC too besides
> > > the O_DIRECT to make sure that writes will be synchronous.  If I leave
> > > the O_SYNC out with 2.4.20-18.9 the write will happen physically 35
> > > seconds after the write() was done.
> > 
> > O_DIRECT is nothing but a hint and the 2.4.20-18.9 kernel decides to not
> > honor it
> 
> Hi Arjan,
> 
> Do you mean that the 2.4.20-18.9 kernel always ignores the O_DIRECT flag ?

yes.


* Re: open(.. O_DIRECT ..) difference in between Linux and FreeBSD ..
  2003-06-13  8:28         ` Arjan van de Ven
@ 2003-06-13  9:02           ` Rob van Nieuwkerk
  0 siblings, 0 replies; 13+ messages in thread
From: Rob van Nieuwkerk @ 2003-06-13  9:02 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Rob van Nieuwkerk, Andries Brouwer, Alan Cox, linux-kernel


Arjan van de Ven wrote:
> On Fri, Jun 13, 2003 at 10:27:52AM +0200, Rob van Nieuwkerk wrote:
> > 
> > Arjan van de Ven wrote:
> > > On Fri, Jun 13, 2003 at 01:12:57AM +0200, Rob van Nieuwkerk wrote:
> > > > FYI:
> > > > It appears that somewhere between RH kernels 2.4.18-27.7.x and 2.4.20-18.9
> > > > something has changed so that my application needs a O_SYNC too besides
> > > > the O_DIRECT to make sure that writes will be synchronous.  If I leave
> > > > the O_SYNC out with 2.4.20-18.9 the write will happen physically 35
> > > > seconds after the write() was done.
> > > 
> > > O_DIRECT is nothing but a hint and the 2.4.20-18.9 kernel decides to not
> > > honor it
> > 
> > Hi Arjan,
> > 
> > Do you mean that the 2.4.20-18.9 kernel always ignores the O_DIRECT flag ?
> 
> yes.

Hi Arjan,

OK, that would explain why, when using 2.4.20-18.9, I see an old
problem (*) re-appear in my application that had been solved/worked
around by using O_DIRECT.

Just to make sure I understand correctly, is it like this?
   "Kernel 2.4.20-18.9 completely ignores the O_DIRECT flag.  Not only the
    'synchronous writes' part: you will also get read-ahead despite
    using O_DIRECT.  The behaviour of 2.4.20-18.9 with O_DIRECT is similar
    to that of 2.4.18-27.7.x without O_DIRECT (concerning synchronicity of
    write() and the number of physical media reads & writes)."

Just curious: what is the reason for ignoring O_DIRECT in 2.4.20-18.9 ?
Interactivity behaviour ?

	Greetings,
	Rob van Nieuwkerk


(*) I have an application that runs from CompactFlash and uses a Philips
    webcam (pwc driver).  It turned out that too much CompactFlash access
    (in PIO mode) causes the camera (driver?) to stall and never wake up
    again :-(  I only log 2048-byte records to a raw partition.  With
    O_DIRECT and proper data alignment I could reduce the CF access to
    exactly four 512-byte sector writes.  This was enough to never trigger
    the problem.
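
The sector arithmetic in the footnote can be sketched as follows.  This
only counts how many sectors a byte range overlaps; it says nothing
about what the kernel actually issues to the device, and the function
name sectors_touched() is made up for this example.

```c
#include <stdint.h>

/*
 * Number of sectors that a write of len bytes starting at byte
 * offset off overlaps, for a given sector size.  len and sector
 * must be non-zero.
 */
uint64_t sectors_touched(uint64_t off, uint64_t len, uint64_t sector)
{
    uint64_t first = off / sector;             /* first sector index */
    uint64_t last = (off + len - 1) / sector;  /* last sector index */

    return last - first + 1;
}
```

With 512-byte sectors, sectors_touched(0, 2048, 512) is 4, while the
same record written at a misaligned offset, sectors_touched(100, 2048,
512), overlaps 5 sectors.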


* Re: open(.. O_DIRECT ..) difference in between Linux and FreeBSD ..
  2003-06-12 15:09     ` Matti Aarnio
  2003-06-12 23:14       ` Nuno Silva
@ 2003-06-13 21:05       ` Andries Brouwer
  1 sibling, 0 replies; 13+ messages in thread
From: Andries Brouwer @ 2003-06-13 21:05 UTC (permalink / raw)
  To: Matti Aarnio; +Cc: Dave Jones, Andries Brouwer, linux-kernel

On Thu, Jun 12, 2003 at 06:09:09PM +0300, Matti Aarnio wrote:
> On Thu, Jun 12, 2003 at 03:58:14PM +0100, Dave Jones wrote:

[all clipped - later]

I was reminded of the following quote:

  "The thing that has always disturbed me about O_DIRECT is that the whole
  interface is just stupid, and was probably designed by a deranged monkey
  on some serious mind-controlling substances."

I'll add that to the BUGS section.



* Re: open(.. O_DIRECT ..) difference in between Linux and FreeBSD ..
       [not found] <20030612111437.GE28900@mea-ext.zmailer.org.suse.lists.linux.kernel>
@ 2003-06-12 11:26 ` Andi Kleen
  0 siblings, 0 replies; 13+ messages in thread
From: Andi Kleen @ 2003-06-12 11:26 UTC (permalink / raw)
  To: Matti Aarnio; +Cc: linux-kernel

Matti Aarnio <matti.aarnio@zmailer.org> writes:

> Unlike Linux, FreeBSD (where this flag originates, apparently) does

It doesn't. It originates from Irix. AFAIK Irix has similar restrictions.

-Andi


Thread overview: 13+ messages
2003-06-12 11:14 open(.. O_DIRECT ..) difference in between Linux and FreeBSD Matti Aarnio
2003-06-12 11:24 ` Christoph Hellwig
2003-06-12 13:17 ` Andries Brouwer
2003-06-12 14:58   ` Dave Jones
2003-06-12 15:09     ` Matti Aarnio
2003-06-12 23:14       ` Nuno Silva
2003-06-13 21:05       ` Andries Brouwer
2003-06-12 23:12   ` Rob van Nieuwkerk
2003-06-13  7:47     ` Arjan van de Ven
2003-06-13  8:27       ` Rob van Nieuwkerk
2003-06-13  8:28         ` Arjan van de Ven
2003-06-13  9:02           ` Rob van Nieuwkerk
     [not found] <20030612111437.GE28900@mea-ext.zmailer.org.suse.lists.linux.kernel>
2003-06-12 11:26 ` Andi Kleen
