* mkdir and fsync
@ 2014-09-10 20:55 Samer Al-Kiswany
  2014-09-11  1:29 ` Christoph Hellwig
  2014-09-11 13:29 ` Chris Mason
  0 siblings, 2 replies; 4+ messages in thread
From: Samer Al-Kiswany @ 2014-09-10 20:55 UTC (permalink / raw)
  To: linux-btrfs

Hi,

Thank you for your help.

I am seeing strange behavior when fsync()ing a directory.

Here is what I do:

for (i = 0; i < 100000; i++) {
      mkdir(p/child_i)
      fsync(p)
}

Btrfs seems to achieve around 100k fsyncs/second, which makes me believe it
is not touching the disk during these fsyncs.
After looking at the code, it indeed seems that fsync adds the inode to the
current transaction but does not sync the transaction to disk.
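
For anyone who wants to reproduce this, a minimal C sketch of the loop
above might look as follows. The function name mkdir_fsync_rate, the
child_%d naming and the 0755 mode are illustrative assumptions, not taken
from the actual test program. It times the loop with clock_gettime() and
returns the rate in fsyncs per second.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: create `count` subdirectories under `parent`,
 * calling fsync() on the parent directory after every mkdir().
 * Returns the measured rate in fsyncs per second, or -1.0 on error. */
double mkdir_fsync_rate(const char *parent, int count)
{
    char path[4096];
    struct timespec start, end;

    assert(parent != NULL && count > 0);

    /* A directory is opened read-only; the descriptor is only used for fsync(). */
    int dirfd = open(parent, O_RDONLY | O_DIRECTORY);
    if (dirfd < 0)
        return -1.0;

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (int i = 0; i < count; i++) {
        int n = snprintf(path, sizeof(path), "%s/child_%d", parent, i);
        if (n < 0 || (size_t)n >= sizeof(path) ||
            mkdir(path, 0755) != 0 || fsync(dirfd) != 0) {
            close(dirfd);
            return -1.0;
        }
    }
    clock_gettime(CLOCK_MONOTONIC, &end);
    close(dirfd);

    double secs = (double)(end.tv_sec - start.tv_sec) +
                  (double)(end.tv_nsec - start.tv_nsec) / 1e9;
    return secs > 0.0 ? count / secs : -1.0;
}
```

Calling it with an empty directory on the filesystem under test and a
count of 100000 corresponds to the loop above. A rate far above what the
device can sustain for synchronous writes would suggest the fsyncs are
not reaching stable storage.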

Is this the intended behavior for metadata fsync or is this a bug?
Is this POSIX compliant?

Thank you for your help
-samer




* Re: mkdir and fsync
From: Christoph Hellwig @ 2014-09-11  1:29 UTC (permalink / raw)
  To: Samer Al-Kiswany; +Cc: linux-btrfs

On Wed, Sep 10, 2014 at 01:55:35PM -0700, Samer Al-Kiswany wrote:
> Btrfs seems to achieve around 100k fsyncs/second, which makes me believe it
> is not touching the disk during these fsyncs.
> After looking at the code, it indeed seems that fsync adds the inode to the
> current transaction but does not sync the transaction to disk.
> 
> Is this the intended behavior for metadata fsync or is this a bug?
> Is this POSIX compliant?

POSIX is basically meaningless for fsync:

    The fsync() function shall request that all data for the open file
    descriptor named by fildes is to be transferred to the storage
    device associated with the file described by fildes. The nature of
    the transfer is implementation-defined. The fsync() function shall
    not return until the system has completed that action or until an
    error is detected.

    [SIO] [Option Start] If _POSIX_SYNCHRONIZED_IO is defined, the
    fsync() function shall force all currently queued I/O operations
    associated with the file indicated by file descriptor fildes to
    the synchronized I/O completion state. All I/O operations shall
    be completed as defined for synchronized I/O file integrity
    completion. [Option End]

but Linux semantics do expect a filesystem to write out all metadata for
a directory file descriptor on an fsync.


* Re: mkdir and fsync
From: Chris Mason @ 2014-09-11 13:29 UTC (permalink / raw)
  To: Samer Al-Kiswany, linux-btrfs

On 09/10/2014 04:55 PM, Samer Al-Kiswany wrote:
> Hi,
> 
> Thank you for your help.
> 
> I am seeing strange behavior when fsync()ing a directory.
> 
> Here is what I do:
> 
> for (i = 0; i < 100000; i++) {
>       mkdir(p/child_i)
>       fsync(p)
> }
> 
> Btrfs seems to achieve around 100k fsyncs/second, which makes me believe it
> is not touching the disk during these fsyncs.
> After looking at the code, it indeed seems that fsync adds the inode to the
> current transaction but does not sync the transaction to disk.
> 
> Is this the intended behavior for metadata fsync or is this a bug?
> Is this POSIX compliant?

Which kernel and hardware?  We had some dir fsync handling bugs in the
past which may have been related.

I just did a test here, and we're definitely doing the IO.  Christoph is
right about the requirements for fsync being sloppy.  For btrfs, we do
put directory changes into the log during an fsync, but we may end up
logging only what you fsync.

So this will get child_i:

mkdir(p/child_i)
fsync(p)

This will not:

mkdir(p/child_i)
fsync(some_other_directory_that_isn't_p)

(This is different from ext3/4.)
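
To make the distinction concrete, here is a small C sketch of a helper
that creates a directory and then fsyncs that directory's own parent,
which is the case that gets child_i logged. The name mkdir_durable is
hypothetical, and the sketch assumes a Linux system where a directory can
be opened with O_RDONLY | O_DIRECTORY and passed to fsync().

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <libgen.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: mkdir(path) followed by fsync() on the directory
 * that contains it. Returns 0 on success, -1 on error with errno set.
 * If the fsync step fails, the new directory is left in place. */
int mkdir_durable(const char *path, mode_t mode)
{
    char copy[PATH_MAX];

    assert(path != NULL);
    if (strlen(path) >= sizeof(copy)) {
        errno = ENAMETOOLONG;
        return -1;
    }
    strcpy(copy, path);

    if (mkdir(path, mode) != 0)
        return -1;

    /* dirname() may modify its argument, so it operates on the copy.
     * The result is the parent that now holds the new entry. */
    int dirfd = open(dirname(copy), O_RDONLY | O_DIRECTORY);
    if (dirfd < 0)
        return -1;

    int rc = fsync(dirfd);
    int saved = errno;      /* keep fsync's errno across close() */
    close(dirfd);
    errno = saved;
    return rc;
}
```

The point of resolving the parent from the path itself is to avoid the
second case above, where the fsync lands on a directory other than the
one that received the new entry.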

-chris





* RE: mkdir and fsync
From: Samer Al-Kiswany @ 2014-09-14 20:56 UTC (permalink / raw)
  To: 'Chris Mason'; +Cc: linux-btrfs

Thank you Chris,

We are running kernel 3.2 in a VirtualBox virtual machine. We will test
with the latest kernel version and let you know if we see the same issue.

Thank you
-samer
http://www.ece.ubc.ca/~samera/ 
http://www.cs.wisc.edu/~samera/ 






