linux-kernel.vger.kernel.org archive mirror
* [PATCH] ceph: fix copy_file_range error path in short copies
@ 2020-02-05 10:28 Luis Henriques
  2020-02-05 11:16 ` Ilya Dryomov
  0 siblings, 1 reply; 8+ messages in thread
From: Luis Henriques @ 2020-02-05 10:28 UTC (permalink / raw)
  To: Jeff Layton, Sage Weil, Ilya Dryomov, Yan, Zheng, Gregory Farnum
  Cc: ceph-devel, linux-kernel, Luis Henriques, stable

When there's an error in the copying loop but some bytes have already been
copied into the destination file, it is necessary to dirty the caps and
eventually update the MDS with the file metadata (timestamps, size).  This
patch fixes this error path.

Cc: stable@vger.kernel.org
Signed-off-by: Luis Henriques <lhenriques@suse.com>
---
 fs/ceph/file.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/fs/ceph/file.c b/fs/ceph/file.c
index 11929d2bb594..7be47d24edb1 100644
--- a/fs/ceph/file.c
+++ b/fs/ceph/file.c
@@ -2104,9 +2104,16 @@ static ssize_t __ceph_copy_file_range(struct file *src_file, loff_t src_off,
 			CEPH_OSD_OP_FLAG_FADVISE_DONTNEED, 0);
 		if (err) {
 			dout("ceph_osdc_copy_from returned %d\n", err);
-			if (!ret)
+			/*
+			 * If we haven't done any copy yet, just exit with the
+			 * error code; otherwise, return the number of bytes
+			 * already copied, update metadata and dirty caps.
+			 */
+			if (!ret) {
 				ret = err;
-			goto out_caps;
+				goto out_caps;
+			}
+			goto out_early;
 		}
 		len -= object_size;
 		src_off += object_size;
@@ -2118,6 +2125,7 @@ static ssize_t __ceph_copy_file_range(struct file *src_file, loff_t src_off,
 		/* We still need one final local copy */
 		do_final_copy = true;
 
+out_early:
 	file_update_time(dst_file);
 	inode_inc_iversion_raw(dst_inode);
 


* Re: [PATCH] ceph: fix copy_file_range error path in short copies
  2020-02-05 10:28 [PATCH] ceph: fix copy_file_range error path in short copies Luis Henriques
@ 2020-02-05 11:16 ` Ilya Dryomov
  2020-02-05 19:24   ` Luis Henriques
  0 siblings, 1 reply; 8+ messages in thread
From: Ilya Dryomov @ 2020-02-05 11:16 UTC (permalink / raw)
  To: Luis Henriques
  Cc: Jeff Layton, Sage Weil, Yan, Zheng, Gregory Farnum,
	Ceph Development, LKML, stable

On Wed, Feb 5, 2020 at 11:28 AM Luis Henriques <lhenriques@suse.com> wrote:
>
> When there's an error in the copying loop but some bytes have already been
> copied into the destination file, it is necessary to dirty the caps and
> eventually update the MDS with the file metadata (timestamps, size).  This
> patch fixes this error path.
>
> Cc: stable@vger.kernel.org
> Signed-off-by: Luis Henriques <lhenriques@suse.com>
> ---
>  fs/ceph/file.c | 12 ++++++++++--
>  1 file changed, 10 insertions(+), 2 deletions(-)
>
> diff --git a/fs/ceph/file.c b/fs/ceph/file.c
> index 11929d2bb594..7be47d24edb1 100644
> --- a/fs/ceph/file.c
> +++ b/fs/ceph/file.c
> @@ -2104,9 +2104,16 @@ static ssize_t __ceph_copy_file_range(struct file *src_file, loff_t src_off,
>                         CEPH_OSD_OP_FLAG_FADVISE_DONTNEED, 0);
>                 if (err) {
>                         dout("ceph_osdc_copy_from returned %d\n", err);
> -                       if (!ret)
> +                       /*
> +                        * If we haven't done any copy yet, just exit with the
> +                        * error code; otherwise, return the number of bytes
> +                        * already copied, update metadata and dirty caps.
> +                        */
> +                       if (!ret) {
>                                 ret = err;
> -                       goto out_caps;
> +                               goto out_caps;
> +                       }
> +                       goto out_early;
>                 }
>                 len -= object_size;
>                 src_off += object_size;
> @@ -2118,6 +2125,7 @@ static ssize_t __ceph_copy_file_range(struct file *src_file, loff_t src_off,
>                 /* We still need one final local copy */
>                 do_final_copy = true;
>
> +out_early:

out_early is misleading, especially given that there already
is out_caps, which just puts caps.  I suggest something like
update_dst_inode.

>         file_update_time(dst_file);
>         inode_inc_iversion_raw(dst_inode);
>

I think this is still buggy.  What follows is this:

        if (endoff > size) {
                int caps_flags = 0;

                /* Let the MDS know about dst file size change */
                if (ceph_quota_is_max_bytes_approaching(dst_inode, endoff))
                        caps_flags |= CHECK_CAPS_NODELAY;
                if (ceph_inode_set_size(dst_inode, endoff))
                        caps_flags |= CHECK_CAPS_AUTHONLY;
                if (caps_flags)
                        ceph_check_caps(dst_ci, caps_flags, NULL);
        }

with endoff being:

        size = i_size_read(dst_inode);
        endoff = dst_off + len;

So a short copy effectively zero-fills the destination file...

Thanks,

                Ilya
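
The size arithmetic described above can be sketched in userspace C.  This
is a minimal model, not the kernel code: size_after_copy() and its
parameters are hypothetical names, and it only assumes that the file size
should grow to cover the bytes actually copied and never shrink.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Userspace model (hypothetical helper, not part of fs/ceph/file.c):
 * size the destination file should have after a copy that started at
 * dst_start and actually transferred `copied` bytes.
 */
static int64_t size_after_copy(int64_t old_size, int64_t dst_start,
			       int64_t copied)
{
	int64_t end;

	assert(old_size >= 0 && dst_start >= 0 && copied >= 0);

	/* End offset of the data that really landed in the file. */
	end = dst_start + copied;

	/* Only ever extend the file; a copy must not shrink it. */
	return end > old_size ? end : old_size;
}
```

For an empty destination and an 8 MiB request of which only 4 MiB were
copied, passing the copied count yields 4194304, while passing the
requested length yields 8388608, so the last 4 MiB would read back as
zeros.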


* Re: [PATCH] ceph: fix copy_file_range error path in short copies
  2020-02-05 11:16 ` Ilya Dryomov
@ 2020-02-05 19:24   ` Luis Henriques
  2020-02-06 10:38     ` [PATCH v2] " Luis Henriques
  0 siblings, 1 reply; 8+ messages in thread
From: Luis Henriques @ 2020-02-05 19:24 UTC (permalink / raw)
  To: Ilya Dryomov
  Cc: Jeff Layton, Sage Weil, Yan, Zheng, Gregory Farnum,
	Ceph Development, LKML, stable

On Wed, Feb 05, 2020 at 12:16:02PM +0100, Ilya Dryomov wrote:
> On Wed, Feb 5, 2020 at 11:28 AM Luis Henriques <lhenriques@suse.com> wrote:
> >
> > When there's an error in the copying loop but some bytes have already been
> > copied into the destination file, it is necessary to dirty the caps and
> > eventually update the MDS with the file metadata (timestamps, size).  This
> > patch fixes this error path.
> >
> > Cc: stable@vger.kernel.org
> > Signed-off-by: Luis Henriques <lhenriques@suse.com>
> > ---
> >  fs/ceph/file.c | 12 ++++++++++--
> >  1 file changed, 10 insertions(+), 2 deletions(-)
> >
> > diff --git a/fs/ceph/file.c b/fs/ceph/file.c
> > index 11929d2bb594..7be47d24edb1 100644
> > --- a/fs/ceph/file.c
> > +++ b/fs/ceph/file.c
> > @@ -2104,9 +2104,16 @@ static ssize_t __ceph_copy_file_range(struct file *src_file, loff_t src_off,
> >                         CEPH_OSD_OP_FLAG_FADVISE_DONTNEED, 0);
> >                 if (err) {
> >                         dout("ceph_osdc_copy_from returned %d\n", err);
> > -                       if (!ret)
> > +                       /*
> > +                        * If we haven't done any copy yet, just exit with the
> > +                        * error code; otherwise, return the number of bytes
> > +                        * already copied, update metadata and dirty caps.
> > +                        */
> > +                       if (!ret) {
> >                                 ret = err;
> > -                       goto out_caps;
> > +                               goto out_caps;
> > +                       }
> > +                       goto out_early;
> >                 }
> >                 len -= object_size;
> >                 src_off += object_size;
> > @@ -2118,6 +2125,7 @@ static ssize_t __ceph_copy_file_range(struct file *src_file, loff_t src_off,
> >                 /* We still need one final local copy */
> >                 do_final_copy = true;
> >
> > +out_early:
> 
> out_early is misleading, especially given that there already
> is out_caps, which just puts caps.  I suggest something like
> update_dst_inode.
> 
> >         file_update_time(dst_file);
> >         inode_inc_iversion_raw(dst_inode);
> >
> 
> I think this is still buggy.  What follows is this:
> 
>         if (endoff > size) {
>                 int caps_flags = 0;
> 
>                 /* Let the MDS know about dst file size change */
>                 if (ceph_quota_is_max_bytes_approaching(dst_inode, endoff))
>                         caps_flags |= CHECK_CAPS_NODELAY;
>                 if (ceph_inode_set_size(dst_inode, endoff))
>                         caps_flags |= CHECK_CAPS_AUTHONLY;
>                 if (caps_flags)
>                         ceph_check_caps(dst_ci, caps_flags, NULL);
>         }
> 
> with endoff being:
> 
>         size = i_size_read(dst_inode);
>         endoff = dst_off + len;
> 
> So a short copy effectively zero-fills the destination file...

Ah!  What a surprise!  Yet another bug in copy_file_range.  /me hides

I guess that replacing 'endoff' with 'dst_off' in the 'if' statement above
(including the condition itself) should fix it.  But I'm starting to think
that I'm biased and unable to see the most obvious issues with this code :-/

Cheers,
--
Luís


* [PATCH v2] ceph: fix copy_file_range error path in short copies
  2020-02-05 19:24   ` Luis Henriques
@ 2020-02-06 10:38     ` Luis Henriques
  2020-02-06 11:46       ` Jeff Layton
  2020-02-10 18:38       ` Ilya Dryomov
  0 siblings, 2 replies; 8+ messages in thread
From: Luis Henriques @ 2020-02-06 10:38 UTC (permalink / raw)
  To: Jeff Layton, Sage Weil, Ilya Dryomov, Yan, Zheng, Gregory Farnum
  Cc: ceph-devel, linux-kernel, Luis Henriques, stable

When there's an error in the copying loop but some bytes have already been
copied into the destination file, it is necessary to dirty the caps and
eventually update the MDS with the file metadata (timestamps, size).  This
patch fixes this error path.

Another issue this patch fixes is the destination file size being reported
to the MDS.  If we're on the error path but the amount of bytes written
has already changed the destination file size, the offset to use is
dst_off and not endoff.

Cc: stable@vger.kernel.org
Signed-off-by: Luis Henriques <lhenriques@suse.com>
---
 fs/ceph/file.c | 18 +++++++++++++-----
 1 file changed, 13 insertions(+), 5 deletions(-)

diff --git a/fs/ceph/file.c b/fs/ceph/file.c
index 11929d2bb594..f7f8cb6c243f 100644
--- a/fs/ceph/file.c
+++ b/fs/ceph/file.c
@@ -2104,9 +2104,16 @@ static ssize_t __ceph_copy_file_range(struct file *src_file, loff_t src_off,
 			CEPH_OSD_OP_FLAG_FADVISE_DONTNEED, 0);
 		if (err) {
 			dout("ceph_osdc_copy_from returned %d\n", err);
-			if (!ret)
+			/*
+			 * If we haven't done any copy yet, just exit with the
+			 * error code; otherwise, return the number of bytes
+			 * already copied, update metadata and dirty caps.
+			 */
+			if (!ret) {
 				ret = err;
-			goto out_caps;
+				goto out_caps;
+			}
+			goto update_dst_inode;
 		}
 		len -= object_size;
 		src_off += object_size;
@@ -2118,16 +2125,17 @@ static ssize_t __ceph_copy_file_range(struct file *src_file, loff_t src_off,
 		/* We still need one final local copy */
 		do_final_copy = true;
 
+update_dst_inode:
 	file_update_time(dst_file);
 	inode_inc_iversion_raw(dst_inode);
 
-	if (endoff > size) {
+	if (dst_off > size) {
 		int caps_flags = 0;
 
 		/* Let the MDS know about dst file size change */
-		if (ceph_quota_is_max_bytes_approaching(dst_inode, endoff))
+		if (ceph_quota_is_max_bytes_approaching(dst_inode, dst_off))
 			caps_flags |= CHECK_CAPS_NODELAY;
-		if (ceph_inode_set_size(dst_inode, endoff))
+		if (ceph_inode_set_size(dst_inode, dst_off))
 			caps_flags |= CHECK_CAPS_AUTHONLY;
 		if (caps_flags)
 			ceph_check_caps(dst_ci, caps_flags, NULL);


* Re: [PATCH v2] ceph: fix copy_file_range error path in short copies
  2020-02-06 10:38     ` [PATCH v2] " Luis Henriques
@ 2020-02-06 11:46       ` Jeff Layton
  2020-02-10 18:38       ` Ilya Dryomov
  1 sibling, 0 replies; 8+ messages in thread
From: Jeff Layton @ 2020-02-06 11:46 UTC (permalink / raw)
  To: Luis Henriques, Sage Weil, Ilya Dryomov, Yan, Zheng, Gregory Farnum
  Cc: ceph-devel, linux-kernel, stable

On Thu, 2020-02-06 at 10:38 +0000, Luis Henriques wrote:
> When there's an error in the copying loop but some bytes have already been
> copied into the destination file, it is necessary to dirty the caps and
> eventually update the MDS with the file metadata (timestamps, size).  This
> patch fixes this error path.
> 
> Another issue this patch fixes is the destination file size being reported
> to the MDS.  If we're on the error path but the amount of bytes written
> has already changed the destination file size, the offset to use is
> dst_off and not endoff.
> 
> Cc: stable@vger.kernel.org
> Signed-off-by: Luis Henriques <lhenriques@suse.com>
> ---
>  fs/ceph/file.c | 18 +++++++++++++-----
>  1 file changed, 13 insertions(+), 5 deletions(-)
> 
> diff --git a/fs/ceph/file.c b/fs/ceph/file.c
> index 11929d2bb594..f7f8cb6c243f 100644
> --- a/fs/ceph/file.c
> +++ b/fs/ceph/file.c
> @@ -2104,9 +2104,16 @@ static ssize_t __ceph_copy_file_range(struct file *src_file, loff_t src_off,
>  			CEPH_OSD_OP_FLAG_FADVISE_DONTNEED, 0);
>  		if (err) {
>  			dout("ceph_osdc_copy_from returned %d\n", err);
> -			if (!ret)
> +			/*
> +			 * If we haven't done any copy yet, just exit with the
> +			 * error code; otherwise, return the number of bytes
> +			 * already copied, update metadata and dirty caps.
> +			 */
> +			if (!ret) {
>  				ret = err;
> -			goto out_caps;
> +				goto out_caps;
> +			}
> +			goto update_dst_inode;
>  		}
>  		len -= object_size;
>  		src_off += object_size;
> @@ -2118,16 +2125,17 @@ static ssize_t __ceph_copy_file_range(struct file *src_file, loff_t src_off,
>  		/* We still need one final local copy */
>  		do_final_copy = true;
>  
> +update_dst_inode:
>  	file_update_time(dst_file);
>  	inode_inc_iversion_raw(dst_inode);
>  
> -	if (endoff > size) {
> +	if (dst_off > size) {
>  		int caps_flags = 0;
>  
>  		/* Let the MDS know about dst file size change */
> -		if (ceph_quota_is_max_bytes_approaching(dst_inode, endoff))
> +		if (ceph_quota_is_max_bytes_approaching(dst_inode, dst_off))
>  			caps_flags |= CHECK_CAPS_NODELAY;
> -		if (ceph_inode_set_size(dst_inode, endoff))
> +		if (ceph_inode_set_size(dst_inode, dst_off))
>  			caps_flags |= CHECK_CAPS_AUTHONLY;
>  		if (caps_flags)
>  			ceph_check_caps(dst_ci, caps_flags, NULL);

Looks good to me. Merged into ceph-client/testing. We'll see about
getting it in before 5.6 ships.

Thanks,
-- 
Jeff Layton <jlayton@kernel.org>



* Re: [PATCH v2] ceph: fix copy_file_range error path in short copies
  2020-02-06 10:38     ` [PATCH v2] " Luis Henriques
  2020-02-06 11:46       ` Jeff Layton
@ 2020-02-10 18:38       ` Ilya Dryomov
  2020-02-11 10:32         ` Luis Henriques
  1 sibling, 1 reply; 8+ messages in thread
From: Ilya Dryomov @ 2020-02-10 18:38 UTC (permalink / raw)
  To: Luis Henriques
  Cc: Jeff Layton, Sage Weil, Yan, Zheng, Gregory Farnum,
	Ceph Development, LKML, stable

On Thu, Feb 6, 2020 at 11:38 AM Luis Henriques <lhenriques@suse.com> wrote:
>
> When there's an error in the copying loop but some bytes have already been
> copied into the destination file, it is necessary to dirty the caps and
> eventually update the MDS with the file metadata (timestamps, size).  This
> patch fixes this error path.
>
> Another issue this patch fixes is the destination file size being reported
> to the MDS.  If we're on the error path but the amount of bytes written
> has already changed the destination file size, the offset to use is
> dst_off and not endoff.
>
> Cc: stable@vger.kernel.org
> Signed-off-by: Luis Henriques <lhenriques@suse.com>
> ---
>  fs/ceph/file.c | 18 +++++++++++++-----
>  1 file changed, 13 insertions(+), 5 deletions(-)
>
> diff --git a/fs/ceph/file.c b/fs/ceph/file.c
> index 11929d2bb594..f7f8cb6c243f 100644
> --- a/fs/ceph/file.c
> +++ b/fs/ceph/file.c
> @@ -2104,9 +2104,16 @@ static ssize_t __ceph_copy_file_range(struct file *src_file, loff_t src_off,
>                         CEPH_OSD_OP_FLAG_FADVISE_DONTNEED, 0);
>                 if (err) {
>                         dout("ceph_osdc_copy_from returned %d\n", err);
> -                       if (!ret)
> +                       /*
> +                        * If we haven't done any copy yet, just exit with the
> +                        * error code; otherwise, return the number of bytes
> +                        * already copied, update metadata and dirty caps.
> +                        */
> +                       if (!ret) {
>                                 ret = err;
> -                       goto out_caps;
> +                               goto out_caps;
> +                       }
> +                       goto update_dst_inode;
>                 }
>                 len -= object_size;
>                 src_off += object_size;
> @@ -2118,16 +2125,17 @@ static ssize_t __ceph_copy_file_range(struct file *src_file, loff_t src_off,
>                 /* We still need one final local copy */
>                 do_final_copy = true;
>
> +update_dst_inode:
>         file_update_time(dst_file);
>         inode_inc_iversion_raw(dst_inode);
>
> -       if (endoff > size) {
> +       if (dst_off > size) {
>                 int caps_flags = 0;
>
>                 /* Let the MDS know about dst file size change */
> -               if (ceph_quota_is_max_bytes_approaching(dst_inode, endoff))
> +               if (ceph_quota_is_max_bytes_approaching(dst_inode, dst_off))
>                         caps_flags |= CHECK_CAPS_NODELAY;
> -               if (ceph_inode_set_size(dst_inode, endoff))
> +               if (ceph_inode_set_size(dst_inode, dst_off))
>                         caps_flags |= CHECK_CAPS_AUTHONLY;
>                 if (caps_flags)
>                         ceph_check_caps(dst_ci, caps_flags, NULL);

Hi Luis,

I think this function still has short copy and file size issues:

- do_splice_direct() may write fewer bytes than requested, including
  nothing at all (i.e. return 0).  While we don't care about the second
  call much, handling the first call is crucial because proceeding to
  the copy-from loop with src/dst_off not at the object boundary will
  corrupt the destination file.

- size is set after caps are acquired for the first time and never
  updated.  But caps are dropped before do_splice_direct(), so by the
  time we get to dst_off > size check, it may be stale.  Again, data
  loss if e.g. old-size < dst_off < new-size because the destination
  file will get truncated...

Also, src/dst_oloc need to be freed with ceph_oloc_destroy() to avoid
leaking memory on namespace layouts.

It seems clear that this function needs to be split, with the new
loop around do_splice_direct() and the copy-from loop each going into
a separate function with clear pre- and post-conditions.

Thanks,

                Ilya
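
A loop around a primitive that may write short, with explicit pre- and
post-conditions, could look roughly like the userspace sketch below.
copy_fn, copy_fully() and the capped_copy() demo callback are all
hypothetical names used for illustration; none of this is the
do_splice_direct() interface itself.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical stand-in for a primitive that may copy fewer bytes than asked. */
typedef ssize_t (*copy_fn)(void *ctx, size_t len);

/*
 * Pre:  fn returns a negative value on error, else bytes copied (<= len).
 * Post: returns len on full success; a smaller non-negative count if fn
 *       stopped making progress or failed after partial progress; the
 *       negative error if nothing was copied at all.
 */
static ssize_t copy_fully(copy_fn fn, void *ctx, size_t len)
{
	size_t done = 0;

	while (done < len) {
		ssize_t n = fn(ctx, len - done);

		if (n < 0)
			return done ? (ssize_t)done : n;
		if (n == 0)
			break;	/* no progress, report the short count */
		assert((size_t)n <= len - done);
		done += (size_t)n;
	}
	return (ssize_t)done;
}

/* Demo callback: copies at most `chunk` bytes per call, `budget` in total. */
struct capped {
	size_t chunk;
	size_t budget;
};

static ssize_t capped_copy(void *ctx, size_t len)
{
	struct capped *c = ctx;
	size_t n = len < c->chunk ? len : c->chunk;

	if (n > c->budget)
		n = c->budget;
	c->budget -= n;
	return (ssize_t)n;
}
```

A caller would move on to whole-object copies only when copy_fully()
returns exactly the length needed to reach the object boundary, and treat
anything shorter as the final result.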


* Re: [PATCH v2] ceph: fix copy_file_range error path in short copies
  2020-02-10 18:38       ` Ilya Dryomov
@ 2020-02-11 10:32         ` Luis Henriques
  2020-02-11 11:04           ` Ilya Dryomov
  0 siblings, 1 reply; 8+ messages in thread
From: Luis Henriques @ 2020-02-11 10:32 UTC (permalink / raw)
  To: Ilya Dryomov
  Cc: Jeff Layton, Sage Weil, Yan, Zheng, Gregory Farnum,
	Ceph Development, LKML, stable

On Mon, Feb 10, 2020 at 07:38:10PM +0100, Ilya Dryomov wrote:
> On Thu, Feb 6, 2020 at 11:38 AM Luis Henriques <lhenriques@suse.com> wrote:
> >
> > When there's an error in the copying loop but some bytes have already been
> > copied into the destination file, it is necessary to dirty the caps and
> > eventually update the MDS with the file metadata (timestamps, size).  This
> > patch fixes this error path.
> >
> > Another issue this patch fixes is the destination file size being reported
> > to the MDS.  If we're on the error path but the amount of bytes written
> > has already changed the destination file size, the offset to use is
> > dst_off and not endoff.
> >
> > Cc: stable@vger.kernel.org
> > Signed-off-by: Luis Henriques <lhenriques@suse.com>
> > ---
> >  fs/ceph/file.c | 18 +++++++++++++-----
> >  1 file changed, 13 insertions(+), 5 deletions(-)
> >
> > diff --git a/fs/ceph/file.c b/fs/ceph/file.c
> > index 11929d2bb594..f7f8cb6c243f 100644
> > --- a/fs/ceph/file.c
> > +++ b/fs/ceph/file.c
> > @@ -2104,9 +2104,16 @@ static ssize_t __ceph_copy_file_range(struct file *src_file, loff_t src_off,
> >                         CEPH_OSD_OP_FLAG_FADVISE_DONTNEED, 0);
> >                 if (err) {
> >                         dout("ceph_osdc_copy_from returned %d\n", err);
> > -                       if (!ret)
> > +                       /*
> > +                        * If we haven't done any copy yet, just exit with the
> > +                        * error code; otherwise, return the number of bytes
> > +                        * already copied, update metadata and dirty caps.
> > +                        */
> > +                       if (!ret) {
> >                                 ret = err;
> > -                       goto out_caps;
> > +                               goto out_caps;
> > +                       }
> > +                       goto update_dst_inode;
> >                 }
> >                 len -= object_size;
> >                 src_off += object_size;
> > @@ -2118,16 +2125,17 @@ static ssize_t __ceph_copy_file_range(struct file *src_file, loff_t src_off,
> >                 /* We still need one final local copy */
> >                 do_final_copy = true;
> >
> > +update_dst_inode:
> >         file_update_time(dst_file);
> >         inode_inc_iversion_raw(dst_inode);
> >
> > -       if (endoff > size) {
> > +       if (dst_off > size) {
> >                 int caps_flags = 0;
> >
> >                 /* Let the MDS know about dst file size change */
> > -               if (ceph_quota_is_max_bytes_approaching(dst_inode, endoff))
> > +               if (ceph_quota_is_max_bytes_approaching(dst_inode, dst_off))
> >                         caps_flags |= CHECK_CAPS_NODELAY;
> > -               if (ceph_inode_set_size(dst_inode, endoff))
> > +               if (ceph_inode_set_size(dst_inode, dst_off))
> >                         caps_flags |= CHECK_CAPS_AUTHONLY;
> >                 if (caps_flags)
> >                         ceph_check_caps(dst_ci, caps_flags, NULL);
> 
> Hi Luis,
> 
> I think this function still has short copy and file size issues:
> 
> - do_splice_direct() may write fewer bytes than requested, including
>   nothing at all (i.e. return 0).  While we don't care about the second
>   call much, handling the first call is crucial because proceeding to
>   the copy-from loop with src/dst_off not at the object boundary will
>   corrupt the destination file.
> 
> - size is set after caps are acquired for the first time and never
>   updated.  But caps are dropped before do_splice_direct(), so by the
>   time we get to dst_off > size check, it may be stale.  Again, data
>   loss if e.g. old-size < dst_off < new-size because the destination
>   file will get truncated...
> 
> Also, src/dst_oloc need to be freed with ceph_oloc_destroy() to avoid
> leaking memory on namespace layouts.
> 
> It seems clear that this function needs to be split, with the new
> loop around do_splice_direct() and the copy-from loop each going into
> a separate function with clear pre- and post-conditions.

Right, it makes sense to refactor this function and fix all these issues
you're pointing out.  It'll be a pain because a lot of parameters will need
to be handed over to these new functions (maybe a new 'struct copy_desc'
can help make it a bit less messy).  Anyway, I'll try to spend some time
working on that and see what I can come up with.

Cheers,
--
Luís
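
One way such a descriptor might bundle the parameters is sketched below in
userspace C.  The field names, struct copy_plan and plan_copy() are
assumptions made for illustration only, and the sketch assumes the source
and destination offsets share the same alignment within an object.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical descriptor; field names are illustrative only. */
struct copy_desc {
	uint64_t off;		/* starting offset (src and dst assumed equally aligned) */
	uint64_t len;		/* bytes requested */
	uint64_t object_size;	/* size of one object, must be > 0 */
};

/* How the requested range splits into the three copy phases. */
struct copy_plan {
	uint64_t head;		/* local copy up to the next object boundary */
	uint64_t objects;	/* whole objects for the copy-from loop */
	uint64_t tail;		/* bytes left for the final local copy */
};

static struct copy_plan plan_copy(const struct copy_desc *d)
{
	struct copy_plan p = { 0, 0, 0 };
	uint64_t rem, left;

	assert(d->object_size > 0);

	rem = d->off % d->object_size;
	left = d->len;

	if (rem) {
		/* Unaligned start: copy locally until the boundary. */
		p.head = d->object_size - rem;
		if (p.head > left)
			p.head = left;
		left -= p.head;
	}
	p.objects = left / d->object_size;
	p.tail = left % d->object_size;
	return p;
}
```

The three fields of the plan always add up to the requested length, which
gives each phase a simple post-condition to check against.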


* Re: [PATCH v2] ceph: fix copy_file_range error path in short copies
  2020-02-11 10:32         ` Luis Henriques
@ 2020-02-11 11:04           ` Ilya Dryomov
  0 siblings, 0 replies; 8+ messages in thread
From: Ilya Dryomov @ 2020-02-11 11:04 UTC (permalink / raw)
  To: Luis Henriques
  Cc: Jeff Layton, Sage Weil, Yan, Zheng, Gregory Farnum,
	Ceph Development, LKML, stable

On Tue, Feb 11, 2020 at 11:32 AM Luis Henriques <lhenriques@suse.com> wrote:
>
> On Mon, Feb 10, 2020 at 07:38:10PM +0100, Ilya Dryomov wrote:
> > On Thu, Feb 6, 2020 at 11:38 AM Luis Henriques <lhenriques@suse.com> wrote:
> > >
> > > When there's an error in the copying loop but some bytes have already been
> > > copied into the destination file, it is necessary to dirty the caps and
> > > eventually update the MDS with the file metadata (timestamps, size).  This
> > > patch fixes this error path.
> > >
> > > Another issue this patch fixes is the destination file size being reported
> > > to the MDS.  If we're on the error path but the amount of bytes written
> > > has already changed the destination file size, the offset to use is
> > > dst_off and not endoff.
> > >
> > > Cc: stable@vger.kernel.org
> > > Signed-off-by: Luis Henriques <lhenriques@suse.com>
> > > ---
> > >  fs/ceph/file.c | 18 +++++++++++++-----
> > >  1 file changed, 13 insertions(+), 5 deletions(-)
> > >
> > > diff --git a/fs/ceph/file.c b/fs/ceph/file.c
> > > index 11929d2bb594..f7f8cb6c243f 100644
> > > --- a/fs/ceph/file.c
> > > +++ b/fs/ceph/file.c
> > > @@ -2104,9 +2104,16 @@ static ssize_t __ceph_copy_file_range(struct file *src_file, loff_t src_off,
> > >                         CEPH_OSD_OP_FLAG_FADVISE_DONTNEED, 0);
> > >                 if (err) {
> > >                         dout("ceph_osdc_copy_from returned %d\n", err);
> > > -                       if (!ret)
> > > +                       /*
> > > +                        * If we haven't done any copy yet, just exit with the
> > > +                        * error code; otherwise, return the number of bytes
> > > +                        * already copied, update metadata and dirty caps.
> > > +                        */
> > > +                       if (!ret) {
> > >                                 ret = err;
> > > -                       goto out_caps;
> > > +                               goto out_caps;
> > > +                       }
> > > +                       goto update_dst_inode;
> > >                 }
> > >                 len -= object_size;
> > >                 src_off += object_size;
> > > @@ -2118,16 +2125,17 @@ static ssize_t __ceph_copy_file_range(struct file *src_file, loff_t src_off,
> > >                 /* We still need one final local copy */
> > >                 do_final_copy = true;
> > >
> > > +update_dst_inode:
> > >         file_update_time(dst_file);
> > >         inode_inc_iversion_raw(dst_inode);
> > >
> > > -       if (endoff > size) {
> > > +       if (dst_off > size) {
> > >                 int caps_flags = 0;
> > >
> > >                 /* Let the MDS know about dst file size change */
> > > -               if (ceph_quota_is_max_bytes_approaching(dst_inode, endoff))
> > > +               if (ceph_quota_is_max_bytes_approaching(dst_inode, dst_off))
> > >                         caps_flags |= CHECK_CAPS_NODELAY;
> > > -               if (ceph_inode_set_size(dst_inode, endoff))
> > > +               if (ceph_inode_set_size(dst_inode, dst_off))
> > >                         caps_flags |= CHECK_CAPS_AUTHONLY;
> > >                 if (caps_flags)
> > >                         ceph_check_caps(dst_ci, caps_flags, NULL);
> >
> > Hi Luis,
> >
> > I think this function still has short copy and file size issues:
> >
> > - do_splice_direct() may write fewer bytes than requested, including
> >   nothing at all (i.e. return 0).  While we don't care about the second
> >   call much, handling the first call is crucial because proceeding to
> >   the copy-from loop with src/dst_off not at the object boundary will
> >   corrupt the destination file.
> >
> > - size is set after caps are acquired for the first time and never
> >   updated.  But caps are dropped before do_splice_direct(), so by the
> >   time we get to dst_off > size check, it may be stale.  Again, data
> >   loss if e.g. old-size < dst_off < new-size because the destination
> >   file will get truncated...
> >
> > Also, src/dst_oloc need to be freed with ceph_oloc_destroy() to avoid
> > leaking memory on namespace layouts.
> >
> > It seems clear that this function needs to be split, with the new
> > loop around do_splice_direct() and the copy-from loop each going into
> > a separate function with clear pre- and post-conditions.
>
> Right, it makes sense to refactor this function and fix all these issues
> you're pointing out.  It'll be a pain because a lot of parameters will need
> to be handed over to these new functions (maybe a new 'struct copy_desc'
> can help make it a bit less messy).  Anyway, I'll try to spend some time
> working on that and see what I can come up with.

Yeah, this code really needs more work and extensive verification.

I'm dropping this patch because it's only a partial fix.  Backporting
it alone, with known data corruption issues remaining (and without the
copy-from2 patch that fixes another data corruption issue that is much
easier to hit), doesn't make sense.

Thanks,

                Ilya


