* [Cluster-devel] [PATCH] gfs2: Fsync parent directories
@ 2018-02-19 23:22 Andreas Gruenbacher
  2018-02-20 15:32 ` Bob Peterson
  2018-02-20 19:46 ` Christoph Hellwig
  0 siblings, 2 replies; 7+ messages in thread
From: Andreas Gruenbacher @ 2018-02-19 23:22 UTC
  To: cluster-devel.redhat.com

When fsyncing a new file, also fsync the directory the file is in,
recursively.  This is how Linux filesystems should behave nowadays,
even if not mandated by POSIX.

Based on ext4 commits 14ece1028, d59729f4e, and 9f713878f.

Fixes xfstests generic/322, generic/376.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
---
 fs/gfs2/dir.c    |  3 ++-
 fs/gfs2/dir.h    |  2 +-
 fs/gfs2/file.c   | 37 +++++++++++++++++++++++++++++++++++++
 fs/gfs2/incore.h |  1 +
 4 files changed, 41 insertions(+), 2 deletions(-)

diff --git a/fs/gfs2/dir.c b/fs/gfs2/dir.c
index 7c21aea..5dbcf9d 100644
--- a/fs/gfs2/dir.c
+++ b/fs/gfs2/dir.c
@@ -1797,7 +1797,7 @@ static u16 gfs2_inode_ra_len(const struct gfs2_inode *ip)
  */
 
 int gfs2_dir_add(struct inode *inode, const struct qstr *name,
-		 const struct gfs2_inode *nip, struct gfs2_diradd *da)
+		 struct gfs2_inode *nip, struct gfs2_diradd *da)
 {
 	struct gfs2_inode *ip = GFS2_I(inode);
 	struct buffer_head *bh = da->bh;
@@ -1832,6 +1832,7 @@ int gfs2_dir_add(struct inode *inode, const struct qstr *name,
 			ip->i_inode.i_mtime = ip->i_inode.i_ctime = tv;
 			if (S_ISDIR(nip->i_inode.i_mode))
 				inc_nlink(&ip->i_inode);
+			set_bit(GIF_NEWENTRY, &nip->i_flags);
 			mark_inode_dirty(inode);
 			error = 0;
 			break;
diff --git a/fs/gfs2/dir.h b/fs/gfs2/dir.h
index e1b309c..8c07423 100644
--- a/fs/gfs2/dir.h
+++ b/fs/gfs2/dir.h
@@ -32,7 +32,7 @@ extern struct inode *gfs2_dir_search(struct inode *dir,
 extern int gfs2_dir_check(struct inode *dir, const struct qstr *filename,
 			  const struct gfs2_inode *ip);
 extern int gfs2_dir_add(struct inode *inode, const struct qstr *filename,
-			const struct gfs2_inode *ip, struct gfs2_diradd *da);
+			struct gfs2_inode *ip, struct gfs2_diradd *da);
 static inline void gfs2_dir_no_add(struct gfs2_diradd *da)
 {
 	if (da->bh)
diff --git a/fs/gfs2/file.c b/fs/gfs2/file.c
index 739a47c..9669991 100644
--- a/fs/gfs2/file.c
+++ b/fs/gfs2/file.c
@@ -631,6 +631,39 @@ static int gfs2_release(struct inode *inode, struct file *file)
 	return 0;
 }
 
+static int gfs2_sync_parent(struct inode *inode)
+{
+	struct gfs2_inode *ip = GFS2_I(inode);
+	int ret = 0;
+
+	if (!test_bit(GIF_NEWENTRY, &ip->i_flags))
+		return 0;
+	inode = igrab(inode);
+	while (test_bit(GIF_NEWENTRY, &ip->i_flags)) {
+		struct dentry *dentry;
+		struct inode *next;
+
+		clear_bit(GIF_NEWENTRY, &ip->i_flags);
+		dentry = d_find_any_alias(inode);
+		if (!dentry)
+			break;
+		next = igrab(d_inode(dentry->d_parent));
+		dput(dentry);
+		if (!next)
+			break;
+		iput(inode);
+		inode = next;
+		ip = GFS2_I(inode);
+
+		ret = sync_inode_metadata(inode, 1);
+		if (ret)
+			break;
+		gfs2_ail_flush(ip->i_gl, 1);
+	}
+	iput(inode);
+	return ret;
+}
+
 /**
  * gfs2_fsync - sync the dirty data for a file (across the cluster)
  * @file: the file that points to the dentry
@@ -683,6 +716,10 @@ static int gfs2_fsync(struct file *file, loff_t start, loff_t end,
 		gfs2_ail_flush(ip->i_gl, 1);
 	}
 
+	ret = gfs2_sync_parent(inode);
+	if (ret)
+		return ret;
+
 	if (mapping->nrpages)
 		ret = file_fdatawait_range(file, start, end);
 
diff --git a/fs/gfs2/incore.h b/fs/gfs2/incore.h
index e0557b8..e81c5eb 100644
--- a/fs/gfs2/incore.h
+++ b/fs/gfs2/incore.h
@@ -386,6 +386,7 @@ enum {
 	GIF_ORDERED		= 4,
 	GIF_FREE_VFS_INODE      = 5,
 	GIF_GLOP_PENDING	= 6,
+	GIF_NEWENTRY		= 7,
 };
 
 struct gfs2_inode {
-- 
1.8.3.1

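The loop in gfs2_sync_parent() can be modelled outside the kernel. The following is a hypothetical user-space sketch, not gfs2 code: struct toy_inode stands in for the inode, its new_entry field for the GIF_NEWENTRY bit, its parent pointer for the d_find_any_alias()/d_parent lookup, and the syncs counter for the sync_inode_metadata() and gfs2_ail_flush() calls. Reference counting with igrab()/iput() is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an inode (not a gfs2 structure). */
struct toy_inode {
	struct toy_inode *parent;  /* plays the role of dentry->d_parent */
	bool new_entry;            /* plays the role of GIF_NEWENTRY */
	int syncs;                 /* number of metadata syncs performed */
};

/*
 * Mirrors the loop in gfs2_sync_parent(): while the current inode was
 * newly linked into its directory, clear its flag, step to the parent
 * directory and sync it.  Returns the number of directories synced.
 */
static int toy_sync_parent(struct toy_inode *ip)
{
	int synced = 0;

	assert(ip != NULL);
	while (ip->new_entry) {
		ip->new_entry = false;
		if (ip->parent == NULL)  /* no alias or parent found: stop */
			break;
		ip = ip->parent;
		ip->syncs++;             /* sync_inode_metadata() + gfs2_ail_flush() */
		synced++;
	}
	return synced;
}
```

Because each flag is cleared as the walk passes it, a second fsync of the same file returns at the first test without touching the ancestors, as in the patch.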



* [Cluster-devel] [PATCH] gfs2: Fsync parent directories
  2018-02-19 23:22 [Cluster-devel] [PATCH] gfs2: Fsync parent directories Andreas Gruenbacher
@ 2018-02-20 15:32 ` Bob Peterson
  2018-02-20 19:46 ` Christoph Hellwig
  1 sibling, 0 replies; 7+ messages in thread
From: Bob Peterson @ 2018-02-20 15:32 UTC
  To: cluster-devel.redhat.com

Hi Andreas,

----- Original Message -----
| When fsyncing a new file, also fsync the directory the file is in,
| recursively.  This is how Linux filesystems should behave nowadays,
| even if not mandated by POSIX.
| 
| Based on ext4 commits 14ece1028, d59729f4e, and 9f713878f.
| 
| Fixes xfstests generic/322, generic/376.
| 
| Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
| ---

It seems like the patch should be calling gfs2_inode_lookup on the
parent directory or something, rather than a simple igrab, and
possibly even holding (nw) the parent directory's i_gl glock.
Otherwise, the call to gfs2_ail_flush may reference an i_gl that
might not exist. I'm concerned about other nodes in the cluster
referencing and/or changing the parent directory inode while this
is happening. I'm not sure if it's possible. Maybe Nate has a test
to check cluster coherency for directories as well as files?

Regards,

Bob Peterson
Red Hat File Systems




* [Cluster-devel] [PATCH] gfs2: Fsync parent directories
  2018-02-19 23:22 [Cluster-devel] [PATCH] gfs2: Fsync parent directories Andreas Gruenbacher
  2018-02-20 15:32 ` Bob Peterson
@ 2018-02-20 19:46 ` Christoph Hellwig
  2018-02-20 20:53   ` Andreas Gruenbacher
  1 sibling, 1 reply; 7+ messages in thread
From: Christoph Hellwig @ 2018-02-20 19:46 UTC
  To: cluster-devel.redhat.com

On Tue, Feb 20, 2018 at 12:22:01AM +0100, Andreas Gruenbacher wrote:
> When fsyncing a new file, also fsync the directory the file is in,
> recursively.  This is how Linux filesystems should behave nowadays,
> even if not mandated by POSIX.

I think that is bullshit.  Maybe it is what Google wants for ext4
non-journal mode, which no one else uses anyway, but it certainly
is anything but normal Linux semantics.




* [Cluster-devel] [PATCH] gfs2: Fsync parent directories
  2018-02-20 19:46 ` Christoph Hellwig
@ 2018-02-20 20:53   ` Andreas Gruenbacher
  2018-02-20 21:51     ` Dave Chinner
  0 siblings, 1 reply; 7+ messages in thread
From: Andreas Gruenbacher @ 2018-02-20 20:53 UTC
  To: cluster-devel.redhat.com

On 20 February 2018 at 20:46, Christoph Hellwig <hch@infradead.org> wrote:
> On Tue, Feb 20, 2018 at 12:22:01AM +0100, Andreas Gruenbacher wrote:
>> When fsyncing a new file, also fsync the directory the file is in,
>> recursively.  This is how Linux filesystems should behave nowadays,
>> even if not mandated by POSIX.
>
> I think that is bullshit.  Maybe it is what Google wants for ext4
> non-journal mode, which no one else uses anyway, but it certainly
> is anything but normal Linux semantics.

Here's some code from xfstests generic/322:

  _mount_flakey
  $XFS_IO_PROG -f -c "pwrite 0 1M" -c "fsync" $SCRATCH_MNT/foo \
    > $seqres.full 2>&1 || _fail "xfs_io failed"
  mv $SCRATCH_MNT/foo $SCRATCH_MNT/bar
  $XFS_IO_PROG -c "fsync" $SCRATCH_MNT/bar
  md5sum $SCRATCH_MNT/bar | _filter_scratch

  _flakey_drop_and_remount

  md5sum $SCRATCH_MNT/bar | _filter_scratch
  _unmount_flakey

Note that there is no fsync for the parent directory ($SCRATCH_MNT),
yet the test obviously expects the directory to be synced as well.
Not all filesystems implement this the way this patch does, but the
major ones all show this behavior. So where's the bullshit?

Thanks,
Andreas
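For comparison, an application that does not want to depend on a filesystem's metadata ordering can fsync the containing directory itself; the Linux fsync(2) manual page states that an explicit fsync() on a file descriptor for the directory is needed to ensure that the directory entry has reached disk. A minimal sketch of a sequence similar to the one in generic/322 (write, fsync, rename) with that extra step follows; fsync_dir() and write_rename_durably() are hypothetical helper names, and error handling is kept minimal.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* fsync() a directory so that entries created or renamed in it are durable. */
static int fsync_dir(const char *dirpath)
{
	int dfd = open(dirpath, O_RDONLY | O_DIRECTORY);
	int ret;

	if (dfd < 0)
		return -1;
	ret = fsync(dfd);
	if (close(dfd) < 0 && ret == 0)
		ret = -1;
	return ret;
}

/*
 * Write buf to dir/from, fsync it and rename it to dir/to, as the test
 * does; then also fsync dir so that the new name itself is durable.
 */
static int write_rename_durably(const char *dir, const char *from,
				const char *to, const char *buf, size_t len)
{
	char src[4096], dst[4096];
	int fd;

	assert(dir && from && to && buf);
	if (snprintf(src, sizeof src, "%s/%s", dir, from) >= (int)sizeof src ||
	    snprintf(dst, sizeof dst, "%s/%s", dir, to) >= (int)sizeof dst)
		return -1;

	fd = open(src, O_WRONLY | O_CREAT | O_TRUNC, 0644);
	if (fd < 0)
		return -1;
	while (len > 0) {               /* handle short writes */
		ssize_t n = write(fd, buf, len);
		if (n < 0) {
			close(fd);
			return -1;
		}
		buf += n;
		len -= (size_t)n;
	}
	if (fsync(fd) < 0) {            /* file data and inode */
		close(fd);
		return -1;
	}
	if (close(fd) < 0 || rename(src, dst) < 0)
		return -1;
	return fsync_dir(dir);          /* directory entry for the new name */
}
```

On a filesystem with strictly ordered metadata journalling the final fsync_dir() call should be redundant, which is the behavior generic/322 checks for; elsewhere it is what makes the new name survive a crash.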




* [Cluster-devel] [PATCH] gfs2: Fsync parent directories
  2018-02-20 20:53   ` Andreas Gruenbacher
@ 2018-02-20 21:51     ` Dave Chinner
  2018-02-21 16:11       ` Christoph Hellwig
  0 siblings, 1 reply; 7+ messages in thread
From: Dave Chinner @ 2018-02-20 21:51 UTC
  To: cluster-devel.redhat.com

On Tue, Feb 20, 2018 at 09:53:59PM +0100, Andreas Gruenbacher wrote:
> On 20 February 2018 at 20:46, Christoph Hellwig <hch@infradead.org> wrote:
> > On Tue, Feb 20, 2018 at 12:22:01AM +0100, Andreas Gruenbacher wrote:
> >> When fsyncing a new file, also fsync the directory the file is in,
> >> recursively.  This is how Linux filesystems should behave nowadays,
> >> even if not mandated by POSIX.
> >
> > I think that is bullshit.  Maybe it is what Google wants for ext4
> > non-journal mode, which no one else uses anyway, but it certainly
> > is anything but normal Linux semantics.
> 
> Here's some code from xfstests generic/322:
> 
>   _mount_flakey
>   $XFS_IO_PROG -f -c "pwrite 0 1M" -c "fsync" $SCRATCH_MNT/foo \
>     > $seqres.full 2>&1 || _fail "xfs_io failed"
>   mv $SCRATCH_MNT/foo $SCRATCH_MNT/bar
>   $XFS_IO_PROG -c "fsync" $SCRATCH_MNT/bar
>   md5sum $SCRATCH_MNT/bar | _filter_scratch
> 
>   _flakey_drop_and_remount
> 
>   md5sum $SCRATCH_MNT/bar | _filter_scratch
>   _unmount_flakey
> 
> Note that there is no fsync for the parent directory ($SCRATCH_MNT),
> yet the test obviously expects the directory to be synced as well.
> Not all filesystems implement this the way this patch does, but the
> major ones all show this behavior. So where's the bullshit?

This test is for filesystems that have strictly ordered metadata
journalling. All the filesystems that fstests supports
via _require_metadata_journalling() have strictly ordered metadata
journalling/crash recovery semantics. (i.e. xfs, ext4, btrfs, and
f2fs (IIRC)).

IOWs, if the filesystem is designed with strictly ordered metadata,
then fsync()ing a new file also implies that all references to the
new file are also on stable storage because they happened before the
fsync on the file was issued. i.e. the directory is fsync'd
implicitly because it was modified by the same operation that
created the file. Hence if the file creation is made stable, so must
be the directory modification done during file creation.

This has nothing to do with POSIX or what the "linux standard" is -
this is testing whether the implementation of strictly ordered
metadata journalling is correct or not.  If gfs2 does not have
strictly ordered metadata journalling, then it probably shouldn't
run these tests....

Cheers,

Dave.
-- 
Dave Chinner
dchinner at redhat.com
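Dave's argument can be illustrated with a toy model of a single, strictly ordered metadata log. Everything here is hypothetical (struct toy_log, toy_log_append(), toy_log_flush_to(), toy_log_survives()) and is neither xfs nor gfs2 code: each metadata change gets a log sequence number (LSN), and a flush always writes a prefix of the log. An fsync() of the file flushes up to the file's last LSN, so the directory-entry record logged earlier by the same create becomes durable too.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_LOG_MAX 16

/*
 * Hypothetical single in-memory metadata log: records[0..len) have been
 * logged, records[0..flushed) have reached stable storage.
 */
struct toy_log {
	const char *records[TOY_LOG_MAX];
	int len;
	int flushed;
};

/* Append a metadata record and return its LSN (its index in the log). */
static int toy_log_append(struct toy_log *journal, const char *what)
{
	assert(journal->len < TOY_LOG_MAX);
	journal->records[journal->len] = what;
	return journal->len++;
}

/*
 * Make the record at lsn durable.  Because the log is written strictly
 * in order, this persists every record logged before lsn as well.
 */
static void toy_log_flush_to(struct toy_log *journal, int lsn)
{
	assert(lsn >= 0 && lsn < journal->len);
	if (lsn + 1 > journal->flushed)
		journal->flushed = lsn + 1;  /* prefix [0, lsn] is now durable */
}

/* After a crash, only the flushed prefix of the log survives. */
static bool toy_log_survives(const struct toy_log *journal, int lsn)
{
	return lsn < journal->flushed;
}
```

The prefix property of toy_log_flush_to() is the point: the directory entry becomes durable not because anything walked up to the parent, but because it was logged before the record that fsync() needed.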




* [Cluster-devel] [PATCH] gfs2: Fsync parent directories
  2018-02-20 21:51     ` Dave Chinner
@ 2018-02-21 16:11       ` Christoph Hellwig
  2018-02-26 17:17         ` Andreas Gruenbacher
  0 siblings, 1 reply; 7+ messages in thread
From: Christoph Hellwig @ 2018-02-21 16:11 UTC
  To: cluster-devel.redhat.com

On Wed, Feb 21, 2018 at 08:51:15AM +1100, Dave Chinner wrote:
> IOWs, if the filesystem is designed with strictly ordered metadata,
> then fsync()ing a new file also implies that all references to the
> new file are also on stable storage because they happened before the
> fsync on the file was issued. i.e. the directory is fsync'd
> implicitly because it was modified by the same operation that
> created the file. Hence if the file creation is made stable, so must
> be the directory modification done during file creation.
> 
> This has nothing to do with POSIX or what the "linux standard" is -
> this is testing whether the implementation of strictly ordered
> metadata journalling is correct or not.  If gfs2 does not have
> strictly ordered metadata journalling, then it probably shouldn't
> run these tests....

Exactly.  Also this is not just for new entries but also for things
like rename.  So trying to come up with some ad hoc hacks here seems
wrong.

That being said, as far as I know gfs2 does transactional metadata
updates and has one single global log.  Why doesn't it get these
things right by default?




* [Cluster-devel] [PATCH] gfs2: Fsync parent directories
  2018-02-21 16:11       ` Christoph Hellwig
@ 2018-02-26 17:17         ` Andreas Gruenbacher
  0 siblings, 0 replies; 7+ messages in thread
From: Andreas Gruenbacher @ 2018-02-26 17:17 UTC
  To: cluster-devel.redhat.com

On 21 February 2018 at 17:11, Christoph Hellwig <hch@infradead.org> wrote:
> On Wed, Feb 21, 2018 at 08:51:15AM +1100, Dave Chinner wrote:
>> IOWs, if the filesystem is designed with strictly ordered metadata,
>> then fsync()ing a new file also implies that all references to the
>> new file are also on stable storage because they happened before the
>> fsync on the file was issued. i.e. the directory is fsync'd
>> implicitly because it was modified by the same operation that
>> created the file. Hence if the file creation is made stable, so must
>> be the directory modification done during file creation.
>>
>> This has nothing to do with POSIX or what the "linux standard" is -
>> this is testing whether the implementation of strictly ordered
>> metadata journalling is correct or not.  If gfs2 does not have
>> strictly ordered metadata journalling, then it probably shouldn't
>> run these tests....
>
> Exactly.  Also this is not just for new entries but also for things
> like rename.  So trying to come up with some ad hoc hacks here seems
> wrong.
>
> That being said, as far as I know gfs2 does transactional metadata
> updates and has one single global log.  Why doesn't it get these
> things right by default?

GFS2 does do metadata journaling. I had assumed that gfs2's ordering
model was different, but it turns out that all that was missing was a
log flush in iop->fsync for the case where the inode is clean but the
log has not been flushed for it yet.

Thanks,
Andreas
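The conclusion above, that fsync must flush the log even when the inode itself is clean, can be expressed in the same kind of toy model. This is a hypothetical sketch and does not reproduce the actual gfs2 change: struct toy_inode records whether the inode has changes not yet written to the log and the LSN of its last logged change, and toy_fsync_buggy() and toy_fsync_fixed() contrast skipping the flush for a clean inode with flushing whenever the inode's last change is not yet durable.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical log state: LSNs below flushed are on stable storage. */
struct toy_log {
	int next_lsn;
	int flushed;
};

/*
 * Hypothetical inode: dirty means changes not yet written to the log;
 * last_lsn is the LSN of its most recent logged change (-1 if none).
 */
struct toy_inode {
	bool dirty;
	int last_lsn;
};

/* Writeback: put the inode's pending changes into the (unflushed) log. */
static void toy_write_inode(struct toy_log *journal, struct toy_inode *ip)
{
	if (ip->dirty) {
		ip->last_lsn = journal->next_lsn++;
		ip->dirty = false;
	}
}

static void toy_log_flush(struct toy_log *journal)
{
	journal->flushed = journal->next_lsn;
}

/* Flushes only for a dirty inode: misses changes logged but not flushed. */
static void toy_fsync_buggy(struct toy_log *journal, struct toy_inode *ip)
{
	if (ip->dirty) {
		toy_write_inode(journal, ip);
		toy_log_flush(journal);
	}
}

/* Also flushes when the inode is clean but its last change is not durable. */
static void toy_fsync_fixed(struct toy_log *journal, struct toy_inode *ip)
{
	toy_write_inode(journal, ip);
	if (ip->last_lsn >= journal->flushed)
		toy_log_flush(journal);
}

static bool toy_inode_durable(const struct toy_log *journal,
			      const struct toy_inode *ip)
{
	return ip->last_lsn < journal->flushed;
}
```

In toy_fsync_buggy() the inode has already been written to the log by writeback, so it looks clean and nothing is flushed; toy_fsync_fixed() compares the inode's last LSN against the flushed position instead of looking at the dirty flag.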




end of thread

Thread overview: 7+ messages
2018-02-19 23:22 [Cluster-devel] [PATCH] gfs2: Fsync parent directories Andreas Gruenbacher
2018-02-20 15:32 ` Bob Peterson
2018-02-20 19:46 ` Christoph Hellwig
2018-02-20 20:53   ` Andreas Gruenbacher
2018-02-20 21:51     ` Dave Chinner
2018-02-21 16:11       ` Christoph Hellwig
2018-02-26 17:17         ` Andreas Gruenbacher
