Allow O_SYNC to be set by fcntl(F_SETFL)

Message ID 4D6824A4.6030009@nec-labs.com
State New, archived

Commit Message

Steve Rago Feb. 25, 2011, 9:52 p.m. UTC
This has probably been a problem since day 1 (I ran into this running the 2.4 kernel years ago; finally got around to
fixing it).  The problem is that fcntl(fd, F_SETFL, flags|O_SYNC) appears to work, but silently ignores the O_SYNC flag.
Opening the file with O_SYNC works okay, but setting it later on via fcntl doesn't work.
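
One way to observe the behaviour described above from userspace is to set the flag and then read the flags back. This is only a minimal sketch: try_set_osync() is a hypothetical helper name, not something from the patch, and what it returns depends on the kernel it runs on.

```c
#include <fcntl.h>

/*
 * Try to turn on O_SYNC for an already-open descriptor, then read the
 * flags back to see whether the kernel kept the bit.
 *
 * Returns 1 if O_SYNC is set afterwards, 0 if the request was accepted
 * but the bit was dropped silently, and -1 if an fcntl() call failed.
 */
int try_set_osync(int fd)
{
    int flags = fcntl(fd, F_GETFL);
    if (flags == -1)
        return -1;

    /* This call reports success even on kernels that ignore O_SYNC here. */
    if (fcntl(fd, F_SETFL, flags | O_SYNC) == -1)
        return -1;

    flags = fcntl(fd, F_GETFL);
    if (flags == -1)
        return -1;

    return (flags & O_SYNC) == O_SYNC ? 1 : 0;
}
```

A return value of 0 is the "appears to work, but silently ignores" case: F_SETFL succeeded, yet F_GETFL does not show O_SYNC.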


Signed-off-by: Steve Rago <sar@nec-labs.com>
---
  fs/fcntl.c |    2 +-
  1 files changed, 1 insertions(+), 1 deletions(-)

--
1.7.2.1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Comments

Andrew Morton April 7, 2011, 9:37 p.m. UTC | #1
(did I ever reply to this?  I meant to ;))

On Fri, 25 Feb 2011 16:52:36 -0500
Steve Rago <sar@nec-labs.com> wrote:

> This has probably been a problem since day 1 (I ran into this running the 2.4 kernel years ago; finally got around to 
> fixing it).  The problem is that fcntl(fd, F_SETFL, flags|O_SYNC) appears to work, but silently ignores the O_SYNC flag. 
>   Opening the file with O_SYNC works okay, but setting it later on via fcntl doesn't work.
> 
> 
> Signed-off-by: Steve Rago <sar@nec-labs.com>
> ---
>   fs/fcntl.c |    2 +-
>   1 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/fs/fcntl.c b/fs/fcntl.c
> index cb10261..afd233a 100644
> --- a/fs/fcntl.c
> +++ b/fs/fcntl.c
> @@ -143,7 +143,7 @@ SYSCALL_DEFINE1(dup, unsigned int, fildes)
>          return ret;
>   }
> 
> -#define SETFL_MASK (O_APPEND | O_NONBLOCK | O_NDELAY | O_DIRECT | O_NOATIME)
> +#define SETFL_MASK (O_APPEND | O_NONBLOCK | O_NDELAY | O_DIRECT | O_NOATIME | O_SYNC)

Does any standard say that we should do this? 
http://pubs.opengroup.org/onlinepubs/007908799/xsh/fcntl.html does, I
guess.

I worry a bit that this change will surprise people.  For example, this
person:
http://koders.com/c/fidA34D8D5EE9AA5D0AB0F3C604678E2E935E5B0246.aspx?s=dupa
is going to wonder why his app suddenly got a lot slower!

Sadly, the kernel silently ignores invalid set bits in `arg', so we
have no reliable way of signaling to the user that our behaviour here
changed.
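
The silent dropping of bits can be modelled in userspace roughly as follows. This is a sketch under assumptions, not the kernel's code: apply_setfl(), OLD_MASK and NEW_MASK are hypothetical names, and the masks are trimmed to flags that are available without extra feature macros (the real SETFL_MASK in the patch also lists O_NDELAY, O_DIRECT and O_NOATIME).

```c
#include <fcntl.h>

/* Trimmed stand-ins for SETFL_MASK before and after the patch. */
#define OLD_MASK (O_APPEND | O_NONBLOCK)
#define NEW_MASK (O_APPEND | O_NONBLOCK | O_SYNC)

/*
 * Model of the masking step: bits of arg covered by the mask replace
 * the old value, every other bit keeps its old value.  Bits of arg
 * outside the mask are discarded without any error being reported.
 */
unsigned int apply_setfl(unsigned int old_flags, unsigned int arg,
                         unsigned int mask)
{
    return (arg & mask) | (old_flags & ~mask);
}
```

With OLD_MASK, apply_setfl(O_RDWR, O_RDWR | O_SYNC, OLD_MASK) leaves the flags unchanged; with NEW_MASK the same call turns O_SYNC on, which is the behaviour change a caller cannot detect from the return value alone.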

I wonder if we should sync the file when someone sets O_SYNC this way. 
If we don't then there is a period during which we have an fd which has
O_SYNC set, but it has pending unwritten data.  An O_SYNC fd should
never be in such a state!

Ho hum.  yes, I guess we should apply the patch.  But it would have
been better to not have screwed this up in the first place!

Christoph Hellwig April 8, 2011, 3:14 p.m. UTC | #2
I actually prototyped this patch independently a while ago, and in
addition to the data writeout when removing O_SYNC there are the
following caveats:

 - O_SYNC is not actually one flag, but two: O_DSYNC and __O_SYNC.
   setfl() needs to make sure __O_SYNC cannot be in f_flags without
   O_DSYNC also being present.
 - we need to audit all filesystems to make sure they don't do stupid
   things when the O_SYNC flags appear or disappear during a write,
   that is, make sure the flag is checked in just one place.  The
   generic write code is fine in that respect, but I didn't go through
   all filesystems to verify it yet.
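
The first caveat can be illustrated with a small userspace sketch. It assumes the Linux flag layout in which O_SYNC includes the O_DSYNC bit; SYNC_ONLY_BITS and normalize_sync_flags() are hypothetical names, and forcing O_DSYNC on is only one possible policy (clearing the stray bit would be another).

```c
/* Feature macro so that <fcntl.h> exposes O_DSYNC. */
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>

/* The part of O_SYNC that is not O_DSYNC (the __O_SYNC bit on Linux). */
#define SYNC_ONLY_BITS (O_SYNC & ~O_DSYNC)

/*
 * Make sure the "full sync" bit never appears without O_DSYNC:
 * if it is set, force O_DSYNC on as well.
 */
unsigned int normalize_sync_flags(unsigned int flags)
{
    if (flags & SYNC_ONLY_BITS)
        flags |= O_DSYNC;
    return flags;
}
```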

Steve Rago April 8, 2011, 5:39 p.m. UTC | #3
On 04/07/2011 05:37 PM, Andrew Morton wrote:
> (did I ever reply to this?  I meant to ;))
>
> On Fri, 25 Feb 2011 16:52:36 -0500
> Steve Rago<sar@nec-labs.com>  wrote:
>
>> This has probably been a problem since day 1 (I ran into this running the 2.4 kernel years ago; finally got around to
>> fixing it).  The problem is that fcntl(fd, F_SETFL, flags|O_SYNC) appears to work, but silently ignores the O_SYNC flag.
>>    Opening the file with O_SYNC works okay, but setting it later on via fcntl doesn't work.
>>
>>
>> Signed-off-by: Steve Rago<sar@nec-labs.com>
>> ---
>>    fs/fcntl.c |    2 +-
>>    1 files changed, 1 insertions(+), 1 deletions(-)
>>
>> diff --git a/fs/fcntl.c b/fs/fcntl.c
>> index cb10261..afd233a 100644
>> --- a/fs/fcntl.c
>> +++ b/fs/fcntl.c
>> @@ -143,7 +143,7 @@ SYSCALL_DEFINE1(dup, unsigned int, fildes)
>>           return ret;
>>    }
>>
>> -#define SETFL_MASK (O_APPEND | O_NONBLOCK | O_NDELAY | O_DIRECT | O_NOATIME)
>> +#define SETFL_MASK (O_APPEND | O_NONBLOCK | O_NDELAY | O_DIRECT | O_NOATIME | O_SYNC)
>
> Does any standard say that we should do this?
> http://pubs.opengroup.org/onlinepubs/007908799/xsh/fcntl.html does, I
> guess.

It's required by the Single UNIX Specification (POSIX.1).  All other major platforms allow it to be set via fcntl.  See 
bugzilla.kernel.org bug ID #5994.

>
> I worry a bit that this change will surprise people.  For example, this
> person:
> http://koders.com/c/fidA34D8D5EE9AA5D0AB0F3C604678E2E935E5B0246.aspx?s=dupa
> is going to wonder why his app suddenly got a lot slower!
>
> Sadly, the kernel silently ignores invalid set bits in `arg', so we
> have no reliable way of signaling to the user that our behaviour here
> changed.
>
> I wonder if we should sync the file when someone sets O_SYNC this way.
> If we don't then there is a period during which we have an fd which has
> O_SYNC set, but it has pending unwritten data.  An O_SYNC fd should
> never be in such a state!

Why not?  If I write something in non-synchronous mode, then change the file descriptor to synchronous mode, I should 
not make any assumptions about what was written prior to this point.  If I care that much, I'll call fsync.  All that 
matters is that the operating system honors the contract as specified by the system call API.
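
For a caller that does care about the earlier writes, the pattern would look roughly like the sketch below. flush_then_request_sync() is a hypothetical helper name; whether the O_SYNC request is honored depends on the kernel, so a careful caller would still check F_GETFL afterwards.

```c
#include <fcntl.h>
#include <unistd.h>

/*
 * Flush whatever was written in non-synchronous mode, then ask for
 * synchronous writes from now on.
 *
 * Returns 0 on success and -1 on error, with errno set by the failing
 * fsync() or fcntl() call.
 */
int flush_then_request_sync(int fd)
{
    int flags;

    /* Push out data written before the mode change. */
    if (fsync(fd) == -1)
        return -1;

    flags = fcntl(fd, F_GETFL);
    if (flags == -1)
        return -1;

    return fcntl(fd, F_SETFL, flags | O_SYNC) == -1 ? -1 : 0;
}
```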

>
> Ho hum.  yes, I guess we should apply the patch.  But it would have
> been better to not have screwed this up in the first place!
>
>

Agreed.  Thanks for not letting this fall through the cracks.

Steve
Andrew Morton April 8, 2011, 5:56 p.m. UTC | #4
On Fri, 08 Apr 2011 13:39:16 -0400
Steve Rago <sar@nec-labs.com> wrote:

> > I wonder if we should sync the file when someone sets O_SYNC this way.
> > If we don't then there is a period during which we have an fd which has
> > O_SYNC set, but it has pending unwritten data.  An O_SYNC fd should
> > never be in such a state!
> 
> Why not?

Because it's inconsistent.  An O_SYNC fd never has outstanding writeout. 
Except for in this one new and special time window between a setfl()
and the next write().

It's not a big deal, but it's somewhat ugly and merits thinking about.

>  If I write something in non-synchronous mode, then change the file descriptor to synchronous mode, I should 
> not make any assumptions about what was written prior to this point.  If I care that much, I'll call fsync.

Well.  You can call fsync() after every write() too.

>  All that 
> matters is that the operating system honors the contract as specified by the system call API.

There's a lot more to it than that.  Things like
quality-of-implementation and principle-of-least-surprise.  We used to
have a particular relationship between an O_SYNC fd and the state of
the inode which it represents.  With this patch, that relationship no
longer holds.

As I say: not a big deal IMO, but it should be aired and thought about.
Christoph Hellwig April 8, 2011, 9:08 p.m. UTC | #5
On Fri, Apr 08, 2011 at 10:56:02AM -0700, Andrew Morton wrote:
> Because it's inconsistent.  An O_SYNC fd never has outstanding writeout. 
> Except for in this one new and special time window between a setfl()
> and the next write().

It might actually have outstanding writes for as long as it eventually
takes the writeback code to push them out.  O_SYNC only does a range
writeout for the area that was written.


Patch

diff --git a/fs/fcntl.c b/fs/fcntl.c
index cb10261..afd233a 100644
--- a/fs/fcntl.c
+++ b/fs/fcntl.c
@@ -143,7 +143,7 @@  SYSCALL_DEFINE1(dup, unsigned int, fildes)
         return ret;
  }

-#define SETFL_MASK (O_APPEND | O_NONBLOCK | O_NDELAY | O_DIRECT | O_NOATIME)
+#define SETFL_MASK (O_APPEND | O_NONBLOCK | O_NDELAY | O_DIRECT | O_NOATIME | O_SYNC)

  static int setfl(int fd, struct file * filp, unsigned long arg)
  {