linux-unionfs.vger.kernel.org archive mirror
* [PATCH v4] ovl: improving copy-up efficiency for big sparse file
@ 2019-11-01 12:35 Chengguang Xu
  2019-12-12 15:42 ` Amir Goldstein
  0 siblings, 1 reply; 6+ messages in thread
From: Chengguang Xu @ 2019-11-01 12:35 UTC (permalink / raw)
  To: miklos, amir73il; +Cc: linux-unionfs, Chengguang Xu

The current copy-up is not efficient for big sparse files:
it is not only slow but also wastes disk space when the
lower file contains large holes. This patch recognizes
file holes and skips them during copy-up.

The detailed hole-detection logic is as follows:
when the next data position is larger than the current
position, we skip that hole; otherwise we copy data
in chunks of OVL_COPY_UP_CHUNK_SIZE. This may not
recognize every kind of hole and sometimes skips only
part of a hole, but it is sufficient for most use cases.

Additionally, this optimization relies on the lseek(2)
SEEK_DATA implementation, so filesystems which do not
support this feature will behave as before during
copy-up.

Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Chengguang Xu <cgxu519@mykernel.net>
---
v1->v2:
- Set file size when the hole is at the end of the file.
- Add a code comment for hole copy-up improvement.
- Check SEEK_DATA support before doing hole skip.
- Fall back to the original copy-up when seeking data fails (error case).

v2->v3:
- Detect big continuous holes in an effective way.
- Modify changelog and code comment.
- Set file size in the end of ovl_copy_up_inode().

v3->v4:
- Shorten the variable name old_next_data_pos to data_pos.
- Check hole only when data_pos < old_pos.

 fs/overlayfs/copy_up.c | 41 +++++++++++++++++++++++++++++++++++++++--
 1 file changed, 39 insertions(+), 2 deletions(-)

diff --git a/fs/overlayfs/copy_up.c b/fs/overlayfs/copy_up.c
index b801c6353100..55f1e81507ca 100644
--- a/fs/overlayfs/copy_up.c
+++ b/fs/overlayfs/copy_up.c
@@ -123,6 +123,9 @@ static int ovl_copy_up_data(struct path *old, struct path *new, loff_t len)
 	loff_t old_pos = 0;
 	loff_t new_pos = 0;
 	loff_t cloned;
+	loff_t data_pos = -1;
+	loff_t hole_len;
+	bool skip_hole = false;
 	int error = 0;
 
 	if (len == 0)
@@ -144,7 +147,11 @@ static int ovl_copy_up_data(struct path *old, struct path *new, loff_t len)
 		goto out;
 	/* Couldn't clone, so now we try to copy the data */
 
-	/* FIXME: copy up sparse files efficiently */
+	/* Check if lower fs supports seek operation */
+	if (old_file->f_mode & FMODE_LSEEK &&
+	    old_file->f_op->llseek)
+		skip_hole = true;
+
 	while (len) {
 		size_t this_len = OVL_COPY_UP_CHUNK_SIZE;
 		long bytes;
@@ -157,6 +164,36 @@ static int ovl_copy_up_data(struct path *old, struct path *new, loff_t len)
 			break;
 		}
 
+		/*
+		 * Filling holes with zeroes costs unnecessary disk
+		 * space and slows down copy-up, so we optimize for
+		 * holes during copy-up.  This relies on the SEEK_DATA
+		 * implementation in the lower fs; if the lower fs
+		 * does not support it, copy-up behaves as before.
+		 *
+		 * The detailed hole-detection logic is as follows:
+		 * when the next data position is larger than the
+		 * current position, we skip that hole; otherwise we
+		 * copy data in chunks of OVL_COPY_UP_CHUNK_SIZE.
+		 * This may not recognize every kind of hole and
+		 * sometimes skips only part of a hole, but it is
+		 * sufficient for most use cases.
+		 */
+
+		if (skip_hole && data_pos < old_pos) {
+			data_pos = vfs_llseek(old_file, old_pos, SEEK_DATA);
+			if (data_pos > old_pos) {
+				hole_len = data_pos - old_pos;
+				len -= hole_len;
+				old_pos = new_pos = data_pos;
+				continue;
+			} else if (data_pos == -ENXIO) {
+				break;
+			} else if (data_pos < 0) {
+				skip_hole = false;
+			}
+		}
+
 		bytes = do_splice_direct(old_file, &old_pos,
 					 new_file, &new_pos,
 					 this_len, SPLICE_F_MOVE);
@@ -483,7 +520,7 @@ static int ovl_copy_up_inode(struct ovl_copy_up_ctx *c, struct dentry *temp)
 	}
 
 	inode_lock(temp->d_inode);
-	if (c->metacopy)
+	if (S_ISREG(c->stat.mode))
 		err = ovl_set_size(temp, &c->stat);
 	if (!err)
 		err = ovl_set_attr(temp, &c->stat);
-- 
2.20.1
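
For reference, the hole detection that the patch relies on can be
exercised from userspace. Below is a minimal sketch (illustrative
only, not part of the patch) that prints the data extents of a sparse
file using lseek(2) SEEK_DATA/SEEK_HOLE; filesystems without native
support typically fall back to reporting the whole file as one data
extent:

/*
 * Walk a file's data extents with SEEK_DATA/SEEK_HOLE, skipping
 * holes the same way ovl_copy_up_data() does above.
 */
#define _GNU_SOURCE
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>

int main(int argc, char **argv)
{
	off_t end, data = 0, hole;
	int fd;

	if (argc < 2)
		return 1;

	fd = open(argv[1], O_RDONLY);
	if (fd < 0) {
		perror("open");
		return 1;
	}

	end = lseek(fd, 0, SEEK_END);
	while (data < end) {
		/* Jump over the hole (if any) to the next data extent */
		data = lseek(fd, data, SEEK_DATA);
		if (data < 0)
			break;	/* ENXIO: nothing but a trailing hole left */
		hole = lseek(fd, data, SEEK_HOLE);
		printf("data: %lld..%lld\n", (long long)data,
		       (long long)hole);
		data = hole;
	}

	close(fd);
	return 0;
}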


* Re: [PATCH v4] ovl: improving copy-up efficiency for big sparse file
  2019-11-01 12:35 [PATCH v4] ovl: improving copy-up efficiency for big sparse file Chengguang Xu
@ 2019-12-12 15:42 ` Amir Goldstein
  2019-12-16 11:58   ` Miklos Szeredi
  0 siblings, 1 reply; 6+ messages in thread
From: Amir Goldstein @ 2019-12-12 15:42 UTC (permalink / raw)
  To: Chengguang Xu; +Cc: Miklos Szeredi, overlayfs

On Fri, Nov 1, 2019 at 2:36 PM Chengguang Xu <cgxu519@mykernel.net> wrote:
>
> [...]
>
> +               if (skip_hole && data_pos < old_pos) {
> +                       data_pos = vfs_llseek(old_file, old_pos, SEEK_DATA);


Miklos,

Now that this change is on overlayfs-next, I realize that it triggers a new
lockdep warning on one of my nested overlay tests.

It's the same old story that was fixed in commit:
6d0a8a90a5bb ovl: take lower dir inode mutex outside upper sb_writers lock

The lower overlay inode mutex is taken inside ovl_llseek() while the upper
fs sb_writers is held by ovl_maybe_copy_up() of the nested overlay.

Since the lower overlay uses the same real fs as the nested overlay's
upper layer, this could really deadlock if the lower overlay inode is
being modified (holding the inode mutex while trying to take the real
fs sb_writers).

Not a very common case, but still a possible deadlock.

The only way I found to avoid this deadlock is probably a bit too hacky
for your taste:

        /* Skip copy hole optimization for nested overlay */
        if (old->mnt->mnt_sb->s_stack_depth)
                skip_hole = false;

The other way is to use ovl_inode_lock() in ovl_llseek().
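
To make that concrete, here is a minimal sketch of the ovl_inode_lock()
variant (illustrative only, not necessarily what will be merged; the
SEEK_SET/SEEK_CUR fast paths and error handling are elided):

static loff_t ovl_llseek(struct file *file, loff_t offset, int whence)
{
	struct inode *inode = file_inode(file);
	const struct cred *old_cred;
	struct fd real;
	loff_t ret;

	ret = ovl_real_fdget(file, &real);
	if (ret)
		return ret;

	/*
	 * Serialize on OVL_I(inode)->lock instead of i_rwsem, so that
	 * seeking a lower overlay file does not nest i_rwsem inside
	 * the upper fs sb_writers held by ovl_maybe_copy_up().
	 */
	ovl_inode_lock(inode);
	real.file->f_pos = file->f_pos;

	old_cred = ovl_override_creds(inode->i_sb);
	ret = vfs_llseek(real.file, offset, whence);
	revert_creds(old_cred);

	file->f_pos = real.file->f_pos;
	ovl_inode_unlock(inode);

	fdput(real);
	return ret;
}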

Have any preference? Something else?

Should we maybe use ovl_inode_lock() also in ovl_write_iter() and
ovl_ioctl_set_flags()? In all those cases, we are not protecting the overlay
inode members, but the real inode members from concurrent modification
through the overlay.

Thanks,
Amir.


* Re: [PATCH v4] ovl: improving copy-up efficiency for big sparse file
  2019-12-12 15:42 ` Amir Goldstein
@ 2019-12-16 11:58   ` Miklos Szeredi
  2019-12-16 14:03     ` Amir Goldstein
  2019-12-21 11:36     ` Amir Goldstein
  0 siblings, 2 replies; 6+ messages in thread
From: Miklos Szeredi @ 2019-12-16 11:58 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Chengguang Xu, overlayfs

On Thu, Dec 12, 2019 at 4:43 PM Amir Goldstein <amir73il@gmail.com> wrote:

> [...]
>
> Should we maybe use ovl_inode_lock() also in ovl_write_iter() and
> ovl_ioctl_set_flags()? In all those cases, we are not protecting the overlay
> inode members, but the real inode members from concurrent modification
> through the overlay.

Possibly.   I think this whole thing needs a good analysis of i_rwsem
use in overlayfs.

Thanks,
Miklos


* Re: [PATCH v4] ovl: improving copy-up efficiency for big sparse file
  2019-12-16 11:58   ` Miklos Szeredi
@ 2019-12-16 14:03     ` Amir Goldstein
  2019-12-17  6:27       ` Amir Goldstein
  2019-12-21 11:36     ` Amir Goldstein
  1 sibling, 1 reply; 6+ messages in thread
From: Amir Goldstein @ 2019-12-16 14:03 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: Chengguang Xu, overlayfs

On Mon, Dec 16, 2019 at 1:58 PM Miklos Szeredi <miklos@szeredi.hu> wrote:
>
> On Thu, Dec 12, 2019 at 4:43 PM Amir Goldstein <amir73il@gmail.com> wrote:
>
> > [...]
>
> Possibly.   I think this whole thing needs a good analysis of i_rwsem
> use in overlayfs.
>

From what I can find, besides those 3 instances in file.c, there are
two places I know of that access vfs_ functions on the overlay inode:

1. ovl_lookup_real_one(connected, ...), which takes the inode_lock itself
2. ovl_cache_update_ino(path, ...), which is called with inode_lock held

In those places the locking is intentional and looks correct.

And 3 more places that take the inode_lock of an overlay dir inode:
1. ovl_dir_llseek() - synchronizes modifications to od->cache
    and to the real f_pos
2. ovl_dir_fsync() - synchronizes modifications to od->upperfile.
3. ovl_dir_release() - synchronizes modifications to od->cache.

Those 3 places were written before ovl_inode_lock existed.

The real f_pos could be protected by ovl_inode_lock, the same as in
ovl_llseek().

od->upperfile could be protected by ovl_inode_lock, the same as copy-up.

od->cache is created/accessed from ovl_iterate() with inode_lock
protection from the vfs - I don't know if we want to change that to also
take ovl_inode_lock, so I'm not sure we have a good reason to change the
locking in ovl dir operations.
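
For example, od->upperfile publication could look roughly like the
sketch below (illustrative only; ovl_dir_open_upper() is a
hypothetical stand-in for the real open path in readdir.c):

static struct file *ovl_dir_get_upperfile(struct inode *inode,
					  struct ovl_dir_file *od)
{
	struct file *upperfile = READ_ONCE(od->upperfile);

	if (upperfile)
		return upperfile;

	upperfile = ovl_dir_open_upper(inode);	/* hypothetical helper */
	if (IS_ERR(upperfile))
		return upperfile;

	ovl_inode_lock(inode);
	if (!od->upperfile) {
		smp_store_release(&od->upperfile, upperfile);
	} else {
		/* Raced with another opener; reuse the published file */
		fput(upperfile);
		upperfile = od->upperfile;
	}
	ovl_inode_unlock(inode);

	return upperfile;
}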

Thanks,
Amir.


* Re: [PATCH v4] ovl: improving copy-up efficiency for big sparse file
  2019-12-16 14:03     ` Amir Goldstein
@ 2019-12-17  6:27       ` Amir Goldstein
  0 siblings, 0 replies; 6+ messages in thread
From: Amir Goldstein @ 2019-12-17  6:27 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: Chengguang Xu, overlayfs

On Mon, Dec 16, 2019 at 4:03 PM Amir Goldstein <amir73il@gmail.com> wrote:
>
> [...]
>

On second thought, we can convert to protecting modifications of
od->cache and OVL_I(inode)->cache with ovl_inode_lock and then switch
to ->iterate_shared().

ovl_iterate() and ovl_seek_cursor() would walk an elevated reference to
the ovl_dir_cache instead of using od->cache directly. The impure dir
cache would have to be refcounted as well.

Am I missing something?
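
A rough sketch of the refcounting side, reusing the existing
ovl_dir_cache refcount and the ovl_dir_cache()/ovl_set_dir_cache()
helpers (the names of the two functions below are hypothetical):

static struct ovl_dir_cache *ovl_dir_cache_get(struct inode *inode)
{
	struct ovl_dir_cache *cache;

	ovl_inode_lock(inode);
	cache = ovl_dir_cache(inode);
	if (cache)
		cache->refcount++;
	ovl_inode_unlock(inode);

	return cache;
}

static void ovl_dir_cache_put(struct inode *inode,
			      struct ovl_dir_cache *cache)
{
	bool free = false;

	ovl_inode_lock(inode);
	if (!--cache->refcount) {
		/* Last reference: unpublish, free outside the lock */
		if (ovl_dir_cache(inode) == cache)
			ovl_set_dir_cache(inode, NULL);
		free = true;
	}
	ovl_inode_unlock(inode);

	if (free) {
		ovl_cache_free(&cache->entries);
		kfree(cache);
	}
}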

I actually have one use case which may involve many concurrent readdirs
on a large directory, so this could be interesting for me to benchmark.

Thanks,
Amir.


* Re: [PATCH v4] ovl: improving copy-up efficiency for big sparse file
  2019-12-16 11:58   ` Miklos Szeredi
  2019-12-16 14:03     ` Amir Goldstein
@ 2019-12-21 11:36     ` Amir Goldstein
  1 sibling, 0 replies; 6+ messages in thread
From: Amir Goldstein @ 2019-12-21 11:36 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: Chengguang Xu, overlayfs

On Mon, Dec 16, 2019 at 1:58 PM Miklos Szeredi <miklos@szeredi.hu> wrote:
>
> On Thu, Dec 12, 2019 at 4:43 PM Amir Goldstein <amir73il@gmail.com> wrote:
>
> > [...]
> >
> > Should we maybe use ovl_inode_lock() also in ovl_write_iter() and
> > ovl_ioctl_set_flags()? In all those cases, we are not protecting the overlay
> > inode members, but the real inode members from concurrent modification
> > through the overlay.
>

Using ovl_inode_lock() in ovl_write_iter() and ovl_ioctl_set_flags() is not
as simple as in ovl_llseek(). And it is less important, because those calls
cannot be made on a lower overlay.

So I'll send patches to convert ovl_llseek(), ovl_dir_llseek() and
ovl_dir_fsync() to use ovl_inode_lock(), which seems simple and passes
the tests.

Thanks,
Amir.

