linux-unionfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC PATCH] ovl: fsync parent directory in copy-up
@ 2022-02-26 15:20 Chengguang Xu
  2022-02-26 16:38 ` Amir Goldstein
  0 siblings, 1 reply; 7+ messages in thread
From: Chengguang Xu @ 2022-02-26 15:20 UTC (permalink / raw)
  To: miklos; +Cc: linux-unionfs, Chengguang Xu

Calling fsync for parent directory in copy-up to
ensure the change get synced.

Signed-off-by: Chengguang Xu <cgxu519@mykernel.net>
---
 fs/overlayfs/copy_up.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/fs/overlayfs/copy_up.c b/fs/overlayfs/copy_up.c
index e040970408d4..52ca915f04a3 100644
--- a/fs/overlayfs/copy_up.c
+++ b/fs/overlayfs/copy_up.c
@@ -944,6 +944,7 @@ static int ovl_copy_up_one(struct dentry *parent, struct dentry *dentry,
 {
 	int err;
 	DEFINE_DELAYED_CALL(done);
+	struct file *parent_file = NULL;
 	struct path parentpath;
 	struct ovl_copy_up_ctx ctx = {
 		.parent = parent,
@@ -972,6 +973,12 @@ static int ovl_copy_up_one(struct dentry *parent, struct dentry *dentry,
 				  AT_STATX_SYNC_AS_STAT);
 		if (err)
 			return err;
+
+		parent_file = ovl_path_open(&parentpath, O_WRONLY);
+		if (IS_ERR(parent_file)) {
+			err = PTR_ERR(parent_file);
+			return err;
+		}
 	}
 
 	/* maybe truncate regular file. this has no effect on dirs */
@@ -998,6 +1005,14 @@ static int ovl_copy_up_one(struct dentry *parent, struct dentry *dentry,
 			err = ovl_copy_up_meta_inode_data(&ctx);
 		ovl_copy_up_end(dentry);
 	}
+
+	if (!err) {
+		if (parent_file) {
+			vfs_fsync(parent_file, 0);
+			fput(parent_file);
+		}
+	}
+
 	do_delayed_call(&done);
 
 	return err;
-- 
2.27.0



^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [RFC PATCH] ovl: fsync parent directory in copy-up
  2022-02-26 15:20 [RFC PATCH] ovl: fsync parent directory in copy-up Chengguang Xu
@ 2022-02-26 16:38 ` Amir Goldstein
  2022-02-27  4:28   ` Chengguang Xu
  2022-03-01 14:49   ` Miklos Szeredi
  0 siblings, 2 replies; 7+ messages in thread
From: Amir Goldstein @ 2022-02-26 16:38 UTC (permalink / raw)
  To: Chengguang Xu; +Cc: Miklos Szeredi, overlayfs

On Sat, Feb 26, 2022 at 5:21 PM Chengguang Xu <cgxu519@mykernel.net> wrote:
>
> Calling fsync for parent directory in copy-up to
> ensure the change get synced.

It is not clear to me that this change is really needed
What if the reported problem?

Besides this can impact performance in some workloads.

The difference between parent copy up and file copy up is that
failing to fsync to copied up data and linking/moving the upper file
into place may result in corrupted data after power failure if temp
file data is not synced.

Failing the fsync the parent dir OTOH may result in revert to
lower file data after power failure.

The thing is, although POSIX gives you no such guarantee, with
ext4/xfs fsync of the upper file itself will guarantee that parents
will be present after power failure (see [1]).

This is not true for btrfs, but there are fewer users using overlayfs
over btrfs (at least in the container world).

So while your patch is certainly "correct", for most users its effects
will be only negative - performance penalty without fixing anything.
So I think this change should be opt-in with Kconfig/module/mount option.

Unfortunately, there is currently no way to know whether the underlying
filesystem needs the parent dir fsync or not.
I was trying to promote a VFS API for a weaker version of fsync for
the parent dir [2] but that effort did not converge.

Thanks,
Amir.

[1] https://lore.kernel.org/linux-fsdevel/1552418820-18102-1-git-send-email-jaya@cs.utexas.edu/
[2] https://lore.kernel.org/linux-fsdevel/20190527172655.9287-1-amir73il@gmail.com/

>
> Signed-off-by: Chengguang Xu <cgxu519@mykernel.net>
> ---
>  fs/overlayfs/copy_up.c | 15 +++++++++++++++
>  1 file changed, 15 insertions(+)
>
> diff --git a/fs/overlayfs/copy_up.c b/fs/overlayfs/copy_up.c
> index e040970408d4..52ca915f04a3 100644
> --- a/fs/overlayfs/copy_up.c
> +++ b/fs/overlayfs/copy_up.c
> @@ -944,6 +944,7 @@ static int ovl_copy_up_one(struct dentry *parent, struct dentry *dentry,
>  {
>         int err;
>         DEFINE_DELAYED_CALL(done);
> +       struct file *parent_file = NULL;
>         struct path parentpath;
>         struct ovl_copy_up_ctx ctx = {
>                 .parent = parent,
> @@ -972,6 +973,12 @@ static int ovl_copy_up_one(struct dentry *parent, struct dentry *dentry,
>                                   AT_STATX_SYNC_AS_STAT);
>                 if (err)
>                         return err;
> +
> +               parent_file = ovl_path_open(&parentpath, O_WRONLY);
> +               if (IS_ERR(parent_file)) {
> +                       err = PTR_ERR(parent_file);
> +                       return err;
> +               }
>         }
>
>         /* maybe truncate regular file. this has no effect on dirs */
> @@ -998,6 +1005,14 @@ static int ovl_copy_up_one(struct dentry *parent, struct dentry *dentry,
>                         err = ovl_copy_up_meta_inode_data(&ctx);
>                 ovl_copy_up_end(dentry);
>         }
> +
> +       if (!err) {
> +               if (parent_file) {
> +                       vfs_fsync(parent_file, 0);
> +                       fput(parent_file);
> +               }
> +       }
> +
>         do_delayed_call(&done);
>
>         return err;
> --
> 2.27.0
>
>

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [RFC PATCH] ovl: fsync parent directory in copy-up
  2022-02-26 16:38 ` Amir Goldstein
@ 2022-02-27  4:28   ` Chengguang Xu
  2022-02-27  5:16     ` Amir Goldstein
  2022-03-01 14:49   ` Miklos Szeredi
  1 sibling, 1 reply; 7+ messages in thread
From: Chengguang Xu @ 2022-02-27  4:28 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Miklos Szeredi, overlayfs

在 2022/2/27 0:38, Amir Goldstein 写道:
> On Sat, Feb 26, 2022 at 5:21 PM Chengguang Xu <cgxu519@mykernel.net> wrote:
>> Calling fsync for parent directory in copy-up to
>> ensure the change get synced.
> It is not clear to me that this change is really needed
> What if the reported problem?

I found this issue by eyeball scan when I was looking for
the places which need to mark overlay inode dirty in change.

However, I think there are still some real world cases will be impacted 
by this kind of issue,
for example, using docker build to make new docker image and power 
failure makes new
image inconsistant.


>
> Besides this can impact performance in some workloads.
>
> The difference between parent copy up and file copy up is that
> failing to fsync to copied up data and linking/moving the upper file
> into place may result in corrupted data after power failure if temp
> file data is not synced.
>
> Failing the fsync the parent dir OTOH may result in revert to
> lower file data after power failure.
>
> The thing is, although POSIX gives you no such guarantee, with
> ext4/xfs fsync of the upper file itself will guarantee that parents
> will be present after power failure (see [1]).

In the new test case (079) which I posted, I've tried xfs as underlying 
fs and found the parent of
copy-up file didn't present after power failure. Am I missing something?


>
> This is not true for btrfs, but there are fewer users using overlayfs
> over btrfs (at least in the container world).
>
> So while your patch is certainly "correct", for most users its effects
> will be only negative - performance penalty without fixing anything.
> So I think this change should be opt-in with Kconfig/module/mount option.

I prefer to add this as standard option, but if Miklos and you guys all 
think it should be opt-in then
mount option is also acceptable for me.


Thanks,
Chengguang

>
> Unfortunately, there is currently no way to know whether the underlying
> filesystem needs the parent dir fsync or not.
> I was trying to promote a VFS API for a weaker version of fsync for
> the parent dir [2] but that effort did not converge.
>
> Thanks,
> Amir.
>
> [1] https://lore.kernel.org/linux-fsdevel/1552418820-18102-1-git-send-email-jaya@cs.utexas.edu/
> [2] https://lore.kernel.org/linux-fsdevel/20190527172655.9287-1-amir73il@gmail.com/
>
>> Signed-off-by: Chengguang Xu <cgxu519@mykernel.net>
>> ---
>>   fs/overlayfs/copy_up.c | 15 +++++++++++++++
>>   1 file changed, 15 insertions(+)
>>
>> diff --git a/fs/overlayfs/copy_up.c b/fs/overlayfs/copy_up.c
>> index e040970408d4..52ca915f04a3 100644
>> --- a/fs/overlayfs/copy_up.c
>> +++ b/fs/overlayfs/copy_up.c
>> @@ -944,6 +944,7 @@ static int ovl_copy_up_one(struct dentry *parent, struct dentry *dentry,
>>   {
>>          int err;
>>          DEFINE_DELAYED_CALL(done);
>> +       struct file *parent_file = NULL;
>>          struct path parentpath;
>>          struct ovl_copy_up_ctx ctx = {
>>                  .parent = parent,
>> @@ -972,6 +973,12 @@ static int ovl_copy_up_one(struct dentry *parent, struct dentry *dentry,
>>                                    AT_STATX_SYNC_AS_STAT);
>>                  if (err)
>>                          return err;
>> +
>> +               parent_file = ovl_path_open(&parentpath, O_WRONLY);
>> +               if (IS_ERR(parent_file)) {
>> +                       err = PTR_ERR(parent_file);
>> +                       return err;
>> +               }
>>          }
>>
>>          /* maybe truncate regular file. this has no effect on dirs */
>> @@ -998,6 +1005,14 @@ static int ovl_copy_up_one(struct dentry *parent, struct dentry *dentry,
>>                          err = ovl_copy_up_meta_inode_data(&ctx);
>>                  ovl_copy_up_end(dentry);
>>          }
>> +
>> +       if (!err) {
>> +               if (parent_file) {
>> +                       vfs_fsync(parent_file, 0);
>> +                       fput(parent_file);
>> +               }
>> +       }
>> +
>>          do_delayed_call(&done);
>>
>>          return err;
>> --
>> 2.27.0
>>
>>



^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [RFC PATCH] ovl: fsync parent directory in copy-up
  2022-02-27  4:28   ` Chengguang Xu
@ 2022-02-27  5:16     ` Amir Goldstein
  2022-02-27 13:24       ` Chengguang Xu
  0 siblings, 1 reply; 7+ messages in thread
From: Amir Goldstein @ 2022-02-27  5:16 UTC (permalink / raw)
  To: Chengguang Xu; +Cc: Miklos Szeredi, overlayfs

On Sun, Feb 27, 2022 at 6:28 AM Chengguang Xu <cgxu519@mykernel.net> wrote:
>
> 在 2022/2/27 0:38, Amir Goldstein 写道:
> > On Sat, Feb 26, 2022 at 5:21 PM Chengguang Xu <cgxu519@mykernel.net> wrote:
> >> Calling fsync for parent directory in copy-up to
> >> ensure the change get synced.
> > It is not clear to me that this change is really needed
> > What if the reported problem?
>
> I found this issue by eyeball scan when I was looking for
> the places which need to mark overlay inode dirty in change.
>
> However, I think there are still some real world cases will be impacted
> by this kind of issue,
> for example, using docker build to make new docker image and power
> failure makes new
> image inconsistant.

A very good example where the fsync of parent will be counter productive.
The efficient way of building a docker image would be:
1. Write/copy up all the files
2. Writeback all the upper inodes
3. syncfs() upper fs

2 and 3 should happen on overlayfs umount as long as we properly
marked all the copied up overlayfs inodes dirty.

So the question is not if parent dir needs fsync, but if it needs to be
dirtied.

>
>
> >
> > Besides this can impact performance in some workloads.
> >
> > The difference between parent copy up and file copy up is that
> > failing to fsync to copied up data and linking/moving the upper file
> > into place may result in corrupted data after power failure if temp
> > file data is not synced.
> >
> > Failing the fsync the parent dir OTOH may result in revert to
> > lower file data after power failure.
> >
> > The thing is, although POSIX gives you no such guarantee, with
> > ext4/xfs fsync of the upper file itself will guarantee that parents
> > will be present after power failure (see [1]).
>
> In the new test case (079) which I posted, I've tried xfs as underlying
> fs and found the parent of
> copy-up file didn't present after power failure. Am I missing something?
>

I think you are.
The test does not reproduce an inconsistency.
The test reproduced changes that did not persist to storage after
power failure and that is the expected behavior when a user does
not fsync after making changes - unless the 'sync' or 'dirsync' mount
options have been used.

I think your test will become correct if you use -o sync for the overlayfs
mount and your patch will become correct if you do:

+        if (inode_needs_sync(d_inode(dentry)) {
+               parent_file = ovl_path_open(&parentpath, O_DIRECTORY|O_RDONLY);

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [RFC PATCH] ovl: fsync parent directory in copy-up
  2022-02-27  5:16     ` Amir Goldstein
@ 2022-02-27 13:24       ` Chengguang Xu
  2022-02-27 13:38         ` Amir Goldstein
  0 siblings, 1 reply; 7+ messages in thread
From: Chengguang Xu @ 2022-02-27 13:24 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Miklos Szeredi, overlayfs

在 2022/2/27 13:16, Amir Goldstein 写道:
> On Sun, Feb 27, 2022 at 6:28 AM Chengguang Xu <cgxu519@mykernel.net> wrote:
>> 在 2022/2/27 0:38, Amir Goldstein 写道:
>>> On Sat, Feb 26, 2022 at 5:21 PM Chengguang Xu <cgxu519@mykernel.net> wrote:
>>>> Calling fsync for parent directory in copy-up to
>>>> ensure the change get synced.
>>> It is not clear to me that this change is really needed
>>> What if the reported problem?
>> I found this issue by eyeball scan when I was looking for
>> the places which need to mark overlay inode dirty in change.
>>
>> However, I think there are still some real world cases will be impacted
>> by this kind of issue,
>> for example, using docker build to make new docker image and power
>> failure makes new
>> image inconsistant.
> A very good example where the fsync of parent will be counter productive.
> The efficient way of building a docker image would be:
> 1. Write/copy up all the files
> 2. Writeback all the upper inodes
> 3. syncfs() upper fs
>
> 2 and 3 should happen on overlayfs umount as long as we properly
> marked all the copied up overlayfs inodes dirty.
>
> So the question is not if parent dir needs fsync, but if it needs to be
> dirtied.
>
>>
>>> Besides this can impact performance in some workloads.
>>>
>>> The difference between parent copy up and file copy up is that
>>> failing to fsync to copied up data and linking/moving the upper file
>>> into place may result in corrupted data after power failure if temp
>>> file data is not synced.
>>>
>>> Failing the fsync the parent dir OTOH may result in revert to
>>> lower file data after power failure.
>>>
>>> The thing is, although POSIX gives you no such guarantee, with
>>> ext4/xfs fsync of the upper file itself will guarantee that parents
>>> will be present after power failure (see [1]).
>> In the new test case (079) which I posted, I've tried xfs as underlying
>> fs and found the parent of
>> copy-up file didn't present after power failure. Am I missing something?
>>
> I think you are.
> The test does not reproduce an inconsistency.
> The test reproduced changes that did not persist to storage after
> power failure and that is the expected behavior when a user does
> not fsync after making changes - unless the 'sync' or 'dirsync' mount
> options have been used.
>
> I think your test will become correct if you use -o sync for the overlayfs
> mount and your patch will become correct if you do:
>
> +        if (inode_needs_sync(d_inode(dentry)) {
> +               parent_file = ovl_path_open(&parentpath, O_DIRECTORY|O_RDONLY);

Why should parent_file open with O_RDONLY flag? Meanwhile, I think the 
fix is not sufifcient for fully supporting 'dirsync' or 'sync' in overlayfs.
Anyway, I think the description of expected behavior above makes sense 
so maybe the topic should turn to implement 'sync' or 'dirsync' mount
options or reject specifying those options in overlayfs.


Thanks,
Chengguang




^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [RFC PATCH] ovl: fsync parent directory in copy-up
  2022-02-27 13:24       ` Chengguang Xu
@ 2022-02-27 13:38         ` Amir Goldstein
  0 siblings, 0 replies; 7+ messages in thread
From: Amir Goldstein @ 2022-02-27 13:38 UTC (permalink / raw)
  To: Chengguang Xu; +Cc: Miklos Szeredi, overlayfs

On Sun, Feb 27, 2022 at 3:25 PM Chengguang Xu <cgxu519@mykernel.net> wrote:
>
> 在 2022/2/27 13:16, Amir Goldstein 写道:
> > On Sun, Feb 27, 2022 at 6:28 AM Chengguang Xu <cgxu519@mykernel.net> wrote:
> >> 在 2022/2/27 0:38, Amir Goldstein 写道:
> >>> On Sat, Feb 26, 2022 at 5:21 PM Chengguang Xu <cgxu519@mykernel.net> wrote:
> >>>> Calling fsync for parent directory in copy-up to
> >>>> ensure the change get synced.
> >>> It is not clear to me that this change is really needed
> >>> What if the reported problem?
> >> I found this issue by eyeball scan when I was looking for
> >> the places which need to mark overlay inode dirty in change.
> >>
> >> However, I think there are still some real world cases will be impacted
> >> by this kind of issue,
> >> for example, using docker build to make new docker image and power
> >> failure makes new
> >> image inconsistant.
> > A very good example where the fsync of parent will be counter productive.
> > The efficient way of building a docker image would be:
> > 1. Write/copy up all the files
> > 2. Writeback all the upper inodes
> > 3. syncfs() upper fs
> >
> > 2 and 3 should happen on overlayfs umount as long as we properly
> > marked all the copied up overlayfs inodes dirty.
> >
> > So the question is not if parent dir needs fsync, but if it needs to be
> > dirtied.
> >
> >>
> >>> Besides this can impact performance in some workloads.
> >>>
> >>> The difference between parent copy up and file copy up is that
> >>> failing to fsync to copied up data and linking/moving the upper file
> >>> into place may result in corrupted data after power failure if temp
> >>> file data is not synced.
> >>>
> >>> Failing the fsync the parent dir OTOH may result in revert to
> >>> lower file data after power failure.
> >>>
> >>> The thing is, although POSIX gives you no such guarantee, with
> >>> ext4/xfs fsync of the upper file itself will guarantee that parents
> >>> will be present after power failure (see [1]).
> >> In the new test case (079) which I posted, I've tried xfs as underlying
> >> fs and found the parent of
> >> copy-up file didn't present after power failure. Am I missing something?
> >>
> > I think you are.
> > The test does not reproduce an inconsistency.
> > The test reproduced changes that did not persist to storage after
> > power failure and that is the expected behavior when a user does
> > not fsync after making changes - unless the 'sync' or 'dirsync' mount
> > options have been used.
> >
> > I think your test will become correct if you use -o sync for the overlayfs
> > mount and your patch will become correct if you do:
> >
> > +        if (inode_needs_sync(d_inode(dentry)) {
> > +               parent_file = ovl_path_open(&parentpath, O_DIRECTORY|O_RDONLY);
>
> Why should parent_file open with O_RDONLY flag? Meanwhile, I think the

It's a directory. What is the meaning of opening it for write?
fsync doesn't need a writable fd.

> fix is not sufifcient for fully supporting 'dirsync' or 'sync' in overlayfs.

Right.

> Anyway, I think the description of expected behavior above makes sense
> so maybe the topic should turn to implement 'sync' or 'dirsync' mount
> options or reject specifying those options in overlayfs.

If anything, I would reject them, unless you know of users that are
going to use this functionality, but maybe you better complete the
syncfs work before going into new adventures ;-)

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [RFC PATCH] ovl: fsync parent directory in copy-up
  2022-02-26 16:38 ` Amir Goldstein
  2022-02-27  4:28   ` Chengguang Xu
@ 2022-03-01 14:49   ` Miklos Szeredi
  1 sibling, 0 replies; 7+ messages in thread
From: Miklos Szeredi @ 2022-03-01 14:49 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Chengguang Xu, overlayfs

On Sat, 26 Feb 2022 at 17:38, Amir Goldstein <amir73il@gmail.com> wrote:
>
> On Sat, Feb 26, 2022 at 5:21 PM Chengguang Xu <cgxu519@mykernel.net> wrote:
> >
> > Calling fsync for parent directory in copy-up to
> > ensure the change get synced.
>
> It is not clear to me that this change is really needed
> What if the reported problem?
>
> Besides this can impact performance in some workloads.
>
> The difference between parent copy up and file copy up is that
> failing to fsync to copied up data and linking/moving the upper file
> into place may result in corrupted data after power failure if temp
> file data is not synced.
>
> Failing the fsync the parent dir OTOH may result in revert to
> lower file data after power failure.
>
> The thing is, although POSIX gives you no such guarantee, with
> ext4/xfs fsync of the upper file itself will guarantee that parents
> will be present after power failure (see [1]).
>
> This is not true for btrfs, but there are fewer users using overlayfs
> over btrfs (at least in the container world).
>
> So while your patch is certainly "correct", for most users its effects
> will be only negative - performance penalty without fixing anything.
> So I think this change should be opt-in with Kconfig/module/mount option.

Probably should be made conditional on ovl_should_sync(ofs).

But before adding more options/complexity the performance impact
itself should be evaluated.   My guess is that it will be minimal,
since directory copy ups will generally be orders of magnitude less
frequent than file copy ups.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2022-03-01 14:50 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-02-26 15:20 [RFC PATCH] ovl: fsync parent directory in copy-up Chengguang Xu
2022-02-26 16:38 ` Amir Goldstein
2022-02-27  4:28   ` Chengguang Xu
2022-02-27  5:16     ` Amir Goldstein
2022-02-27 13:24       ` Chengguang Xu
2022-02-27 13:38         ` Amir Goldstein
2022-03-01 14:49   ` Miklos Szeredi

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).