* Directory fsync
@ 2011-09-23 15:12 Zhu Han
  2011-09-23 16:33 ` Christoph Hellwig
  0 siblings, 1 reply; 7+ messages in thread
From: Zhu Han @ 2011-09-23 15:12 UTC (permalink / raw)
  To: xfs


I noticed the following passage in the fsync manual page:

       Calling fsync() does not necessarily ensure that the entry in the
       directory containing the file has also reached disk.  For that an
       explicit fsync() on a file descriptor for the directory is also
       needed.

Is a directory sync essential after the steps below if I want to be sure
the file can be retrieved after a system crash?

1) create file A
2) write file A
3) fsync(file A)

--------------------------------> fsync(parent directory) [Is it essential
to make sure the inode is linked into the parent directory?]

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs


* Re: Directory fsync
  2011-09-23 15:12 Directory fsync Zhu Han
@ 2011-09-23 16:33 ` Christoph Hellwig
  2011-09-23 23:09   ` Michael Monnerie
  0 siblings, 1 reply; 7+ messages in thread
From: Christoph Hellwig @ 2011-09-23 16:33 UTC (permalink / raw)
  To: Zhu Han; +Cc: xfs

On Fri, Sep 23, 2011 at 11:12:02PM +0800, Zhu Han wrote:
> I noticed the following passage in the fsync manual page:
>
>        Calling fsync() does not necessarily ensure that the entry in the
>        directory containing the file has also reached disk.  For that an
>        explicit fsync() on a file descriptor for the directory is also
>        needed.
>
> Is a directory sync essential after the steps below if I want to be sure
> the file can be retrieved after a system crash?
>
> 1) create file A
> 2) write file A
> 3) fsync(file A)
>
> --------------------------------> fsync(parent directory) [Is it essential
> to make sure the inode is linked into the parent directory?]

As far as standards are concerned it is.  As far as the current XFS
implementation is concerned you don't need it as the file fsync will
also force out all transactions that belong to the create.


* Re: Directory fsync
  2011-09-23 16:33 ` Christoph Hellwig
@ 2011-09-23 23:09   ` Michael Monnerie
  2011-09-24  1:20     ` Zhu Han
  2011-09-26  0:28     ` Dave Chinner
  0 siblings, 2 replies; 7+ messages in thread
From: Michael Monnerie @ 2011-09-23 23:09 UTC (permalink / raw)
  To: xfs; +Cc: Christoph Hellwig, Zhu Han


On Friday, 23 September 2011 Christoph Hellwig wrote:
> As far as standards are concerned it is.  As far as the current XFS
> implementation is concerned you don't need it as the file fsync will
> also force out all transactions that belong to the create.

Aren't you giving O_PONIES to the users? ;-)

I understand your description, but we should always tell people to use a
directory fsync to be sure. Their applications might run on other
filesystems, or run for 10 years, and maybe XFS's implementation changes
in between. And maybe in historical kernels even XFS's implementation
wasn't what it is now?

@schumi: If your application should be able to run in a safe way on
other filesystems, or other kernel releases, or other unixes, it's best
to fsync the directory inode too. It's better to always use it; then
nothing will break.
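
The portable sequence discussed in this thread (create, write, fsync the file, then fsync the parent directory) could be sketched roughly as follows. This is only an illustration in Python, assuming a Linux/POSIX system where a directory can be opened read-only and handed to fsync; the helper name `durable_create` is made up for this example and is not taken from any message here.

```python
import os
import tempfile


def durable_create(path: str, data: bytes) -> None:
    """Create `path`, write `data`, then fsync the file and its parent directory.

    Hypothetical helper for illustration; assumes Linux/POSIX semantics.
    """
    # Steps 1-3 from the original question: create, write, fsync(file).
    # O_EXCL makes the call fail if the file already exists.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o644)
    try:
        view = memoryview(data)
        while view:
            # os.write may write fewer bytes than requested, so loop.
            written = os.write(fd, view)
            view = view[written:]
        os.fsync(fd)
    finally:
        os.close(fd)

    # The extra step under discussion: fsync the parent directory so that
    # the new directory entry is also forced to stable storage.
    parent = os.path.dirname(os.path.abspath(path))
    dir_fd = os.open(parent, os.O_RDONLY | os.O_DIRECTORY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "A")
        durable_create(target, b"hello\n")
        with open(target, "rb") as f:
            print(f.read())
```

On XFS, per the replies in this thread, the second fsync should be redundant; the sketch keeps it so that the same code does not depend on that behaviour on other filesystems.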

-- 
Kind regards,
Michael Monnerie, Ing. BSc

it-management Internet Services: Protéger
http://proteger.at [pronounced: Prot-e-schee]
Tel: +43 660 / 415 6531

// House for sale: http://zmi.at/langegg/


* Re: Directory fsync
  2011-09-23 23:09   ` Michael Monnerie
@ 2011-09-24  1:20     ` Zhu Han
  2011-10-01 23:20       ` Peter Grandi
  2011-09-26  0:28     ` Dave Chinner
  1 sibling, 1 reply; 7+ messages in thread
From: Zhu Han @ 2011-09-24  1:20 UTC (permalink / raw)
  To: Michael Monnerie; +Cc: Christoph Hellwig, xfs


On Sat, Sep 24, 2011 at 7:09 AM, Michael Monnerie <
michael.monnerie@is.it-management.at> wrote:

> On Friday, 23 September 2011 Christoph Hellwig wrote:
> > As far as standards are concerned it is.  As far as the current XFS
> > implementation is concerned you don't need it as the file fsync will
> > also force out all transactions that belong to the create.
>
> Aren't you giving O_PONIES to the users? ;-)
>
> I understand your description, but we should always tell people to use a
> directory fsync to be sure. Their applications might run on other
> filesystems, or run for 10 years, and maybe XFS's implementation changes
> in between. And maybe in historical kernels even XFS's implementation
> wasn't what it is now?
>

Thank you all.

I see the importance of following the standard. But I am glad to know the
current implementation of XFS enforces stricter fsync semantics, just as
every application developer wishes.

What worries me is that not many applications sync the directory after new
files are created, not even PostgreSQL[1] and many NoSQL databases.  If the
current implementation enforces stricter semantics, it makes our minds much
more peaceful.

Also, not many runtimes support syncing a directory, e.g. the Java ecosystem
does not have such support... So it is very hard to follow this standard.

For God's sake, the right semantics of fsync should be "The user wants to
be sure the file is retrievable after a system crash or power failure if
fsync returned successfully".

[1]
http://postgresql.1045698.n5.nabble.com/fsync-reliability-td4330289.html

>
> @schumi: If your application should be able to run in a safe way on
> other filesystems, or other kernel releases, or other unixes, it's best
> to fsync the directory inode too. It's better to always use it; then
> nothing will break.


* Re: Directory fsync
  2011-09-23 23:09   ` Michael Monnerie
  2011-09-24  1:20     ` Zhu Han
@ 2011-09-26  0:28     ` Dave Chinner
  2011-09-26  0:51       ` Christoph Hellwig
  1 sibling, 1 reply; 7+ messages in thread
From: Dave Chinner @ 2011-09-26  0:28 UTC (permalink / raw)
  To: Michael Monnerie; +Cc: Christoph Hellwig, Zhu Han, xfs

On Sat, Sep 24, 2011 at 01:09:44AM +0200, Michael Monnerie wrote:
> On Friday, 23 September 2011 Christoph Hellwig wrote:
> > As far as standards are concerned it is.  As far as the current XFS
> > implementation is concerned you don't need it as the file fsync will
> > also force out all transactions that belong to the create.
> 
> Aren't you giving O_PONIES to the users? ;-)
> 
> I understand your description, but we should always tell people to use a 
> directory fsync to be sure. Their applications might run on other 
> filesystems, or run for 10 years, and maybe XFS's implementation changes 
> in between. And maybe in historical kernels even XFS's implementation
> wasn't what it is now?

XFS's journalling has always behaved this way - *all* transactions
prior to the fsync() triggered log force are guaranteed to be on
disk once the fsync completes. There are no plans to change this
behaviour, either, because we rely on this architectural
characteristic to provide strong ordering of metadata operations in
many places.

All it means is that the directory fsync() is a no-op that only
costs CPU time.

> @schumi: If your application should be able to run in a safe way on 
> other filesystems, or other kernel releases, or other unixes, it's best 
> to fsync the directory inode too. It's better to always use it; then
> nothing will break.

*nod*

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: Directory fsync
  2011-09-26  0:28     ` Dave Chinner
@ 2011-09-26  0:51       ` Christoph Hellwig
  0 siblings, 0 replies; 7+ messages in thread
From: Christoph Hellwig @ 2011-09-26  0:51 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Michael Monnerie, Christoph Hellwig, Zhu Han, xfs

On Mon, Sep 26, 2011 at 10:28:11AM +1000, Dave Chinner wrote:
> All it means is that the directory fsync() is a no-op that only
> costs CPU time.

Currently it also causes a superfluous cache flush, but I have a patch
in my QA queue to fix that and reduce the (already tiny) CPU overhead a
bit more.


* Re: Directory fsync
  2011-09-24  1:20     ` Zhu Han
@ 2011-10-01 23:20       ` Peter Grandi
  0 siblings, 0 replies; 7+ messages in thread
From: Peter Grandi @ 2011-10-01 23:20 UTC (permalink / raw)
  To: Linux fs XFS

>>> As far as standards are concerned it is.  As far as the
>>> current XFS implementation is concerned you don't need it as
>>> the file fsync will also force out all transactions that
>>> belong to the create.

>> Aren't you giving O_PONIES to the users? ;-) I understand
>> your description, but we should always tell people to use a
>> directory fsync to be sure.

Sometimes users wish for unicorns, not just ponies, and sometimes
they really want winged unicorns, not just unicorns...

> I see the importance of following the standard. But I am glad
> to know the current implementation of XFS enforces stricter
> fsync semantics, just as every application developer wishes.

Stricter semantics mean potentially more expensive IO and a more
complicated kernel implementation with more chances for subtle
bugs.

Unless you are arguing that application developers demand
O_PONIES and don't care that much about application performance
or portability or kernel bug opportunities.

It is a long time since I reminded anyone that the UNIX
filesystem semantics were designed when the whole kernel was
(well) under 64KiB, and that was an interesting constraint.

> What worries me is that not many applications sync the
> directory after new files are created, not even PostgreSQL[1]
> and many NoSQL databases.  If the current implementation
> enforces stricter semantics, it makes our minds much more
> peaceful.

Probably the developer should be a lot less peaceful, because
the safer-than-required semantics could and perhaps should
disappear tomorrow, and then the application would be subtly buggy.

It is not a theoretical issue; there were a lot of problems
and a huge O_PONIES discussion when the 'ext4' developers went
for an implementation closer to the safety level mandated by the
standard.

Never mind exceptionally silly application developers who tend
to forget that application files might reside on NFS or other
network file systems that are extremely popular, cannot be
ignored, and have semantics less safe than POSIX.

Relying on implementations that implement safer behavior than
POSIX seems to me a very bad, lazy (and common) idea.

> [ ... ] the right semantics of fsync should be "The user wants
> to be sure the file is retrievable after a system crash or
> power failure if fsync returned successfully".

Those would be really bad semantics, because UNIX/POSIX/Linux
filesystem semantics don't allow this silly definition to have a
useful meaning.

The definition seems to be based on ignorance of the really
important and big fact that UNIX/POSIX/Linux files have no
names, and that only directory entries have names, and that a
file can be linked to by zero or many directory entries, and
that for the kernel it can be very expensive to keep track of
all the directory entries (if any) that (hard) link to the file.

A process only needs to 'fsync' a directory if it modified the
directory (for example on entry, not necessarily file, creation
or modification) and it would be really stupid and against all
UNIX/POSIX/Linux logic to impose on the kernel the overhead of
finding and 'fsync'ing all the directories that have entries (if
any!) linking to a file being 'fsync'ed itself.

It is up to the user and/or the applications managing files and
named hard links to them to 'fsync' the file when appropriate,
and if needed (and not necessarily at the same time) any
directories containing the hard links to the file, because which
directory entries should link to a file and where they are can
only be part of the application/user data management logic.
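
To make the hard-link argument concrete, here is a small Python sketch, assuming a Linux/POSIX system and a filesystem that supports hard links. It writes one file, links it into other directories, fsyncs the file data once, and then fsyncs each directory that the application itself modified. The helper names `fsync_dir` and `write_and_link` are invented for this illustration.

```python
import os
import tempfile
from typing import Iterable


def fsync_dir(dirname: str) -> None:
    """fsync a directory by opening it read-only (Linux/POSIX assumption)."""
    fd = os.open(dirname, os.O_RDONLY | os.O_DIRECTORY)
    try:
        os.fsync(fd)
    finally:
        os.close(fd)


def write_and_link(data: bytes, primary: str, extra_links: Iterable[str]) -> None:
    """Write `data` under the name `primary`, add hard links, sync what changed."""
    links = list(extra_links)

    # One file (inode): its contents are written and fsynced exactly once.
    with open(primary, "xb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())

    # Each hard link is a new entry in some directory, not a new file.
    for link in links:
        os.link(primary, link)

    # Only the application knows which directories it touched, so it is the
    # application that fsyncs each of them (once per distinct directory).
    touched = {os.path.dirname(os.path.abspath(p)) for p in [primary, *links]}
    for d in sorted(touched):
        fsync_dir(d)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        dir_a = os.path.join(tmp, "a")
        dir_b = os.path.join(tmp, "b")
        os.mkdir(dir_a)
        os.mkdir(dir_b)
        write_and_link(b"payload", os.path.join(dir_a, "f"), [os.path.join(dir_b, "g")])
        # Both names refer to the same inode, which now has two links.
        print(os.stat(os.path.join(dir_a, "f")).st_nlink)
```

The kernel is never asked to discover which directories link to the file; the set of directories comes entirely from the application's own bookkeeping, which is the division of labour described above.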

