All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH 0/2] block: Fix dst reading after tail copy offloading
@ 2018-07-04  6:13 Fam Zheng
  2018-07-04  6:13 ` [Qemu-devel] [PATCH 1/2] block: Fix dst total_sectors after " Fam Zheng
  2018-07-04  6:13 ` [Qemu-devel] [PATCH 2/2] block: Add copy offloading trace points Fam Zheng
  0 siblings, 2 replies; 5+ messages in thread
From: Fam Zheng @ 2018-07-04  6:13 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-block, Fam Zheng, Kevin Wolf, Max Reitz, Eric Blake,
	John Snow, Stefan Hajnoczi

Qcow2 allocates new clusters after the end of the file. If it is the destinaton
of copy offloading, we must adjust dst->bs->total_sectors. Otherwise, further
reads will drop to the "beyond EOF" code path and return zeroes, which problem
is caught by iotests 222.

Follow the logic in the normal write code and update bs->total_sectors after
I/O is done.

While at it, add a few convenient trace points to aid future debug experiences
in the topic.

Fam Zheng (2):
  block: Fix dst total_sectors after copy offloading
  block: Add copy offloading trace points

 block/file-posix.c | 2 ++
 block/io.c         | 6 ++++++
 block/iscsi.c      | 3 +++
 block/trace-events | 6 ++++++
 4 files changed, 17 insertions(+)

-- 
2.17.1

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [Qemu-devel] [PATCH 1/2] block: Fix dst total_sectors after copy offloading
  2018-07-04  6:13 [Qemu-devel] [PATCH 0/2] block: Fix dst reading after tail copy offloading Fam Zheng
@ 2018-07-04  6:13 ` Fam Zheng
  2018-07-04  7:44   ` Kevin Wolf
  2018-07-04  6:13 ` [Qemu-devel] [PATCH 2/2] block: Add copy offloading trace points Fam Zheng
  1 sibling, 1 reply; 5+ messages in thread
From: Fam Zheng @ 2018-07-04  6:13 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-block, Fam Zheng, Kevin Wolf, Max Reitz, Eric Blake,
	John Snow, Stefan Hajnoczi

This was noticed by the new image fleecing tests case 222. The issue is
apparent and we should just do the same right things as in
bdrv_aligned_pwritev.

Reported-by: John Snow <jsnow@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
---
 block/io.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/block/io.c b/block/io.c
index 1a2272fad3..8e02f4ab95 100644
--- a/block/io.c
+++ b/block/io.c
@@ -2945,6 +2945,10 @@ static int coroutine_fn bdrv_co_copy_range_internal(BdrvChild *src,
                                                   dst, dst_offset,
                                                   bytes, flags);
     }
+    if (ret == 0) {
+        int64_t end_sector = DIV_ROUND_UP(dst_offset + bytes, BDRV_SECTOR_SIZE);
+        dst->bs->total_sectors = MAX(dst->bs->total_sectors, end_sector);
+    }
     tracked_request_end(&src_req);
     tracked_request_end(&dst_req);
     bdrv_dec_in_flight(src->bs);
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [Qemu-devel] [PATCH 2/2] block: Add copy offloading trace points
  2018-07-04  6:13 [Qemu-devel] [PATCH 0/2] block: Fix dst reading after tail copy offloading Fam Zheng
  2018-07-04  6:13 ` [Qemu-devel] [PATCH 1/2] block: Fix dst total_sectors after " Fam Zheng
@ 2018-07-04  6:13 ` Fam Zheng
  1 sibling, 0 replies; 5+ messages in thread
From: Fam Zheng @ 2018-07-04  6:13 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-block, Fam Zheng, Kevin Wolf, Max Reitz, Eric Blake,
	John Snow, Stefan Hajnoczi

A few trace points that can help reveal what is happening in a copy
offloading I/O path.

Signed-off-by: Fam Zheng <famz@redhat.com>
---
 block/file-posix.c | 2 ++
 block/io.c         | 2 ++
 block/iscsi.c      | 3 +++
 block/trace-events | 6 ++++++
 4 files changed, 13 insertions(+)

diff --git a/block/file-posix.c b/block/file-posix.c
index 829ee538d8..d3b1609410 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -1488,6 +1488,8 @@ static ssize_t handle_aiocb_copy_range(RawPosixAIOData *aiocb)
         ssize_t ret = copy_file_range(aiocb->aio_fildes, &in_off,
                                       aiocb->aio_fd2, &out_off,
                                       bytes, 0);
+        trace_copy_file_range(aiocb->bs, aiocb->aio_fildes, in_off,
+                              aiocb->aio_fd2, out_off, bytes, 0, ret);
         if (ret == 0) {
             /* No progress (e.g. when beyond EOF), let the caller fall back to
              * buffer I/O. */
diff --git a/block/io.c b/block/io.c
index 8e02f4ab95..df930b982f 100644
--- a/block/io.c
+++ b/block/io.c
@@ -2964,6 +2964,7 @@ int coroutine_fn bdrv_co_copy_range_from(BdrvChild *src, uint64_t src_offset,
                                          BdrvChild *dst, uint64_t dst_offset,
                                          uint64_t bytes, BdrvRequestFlags flags)
 {
+    trace_bdrv_co_copy_range_from(src, src_offset, dst, dst_offset, bytes, flags);
     return bdrv_co_copy_range_internal(src, src_offset, dst, dst_offset,
                                        bytes, flags, true);
 }
@@ -2976,6 +2977,7 @@ int coroutine_fn bdrv_co_copy_range_to(BdrvChild *src, uint64_t src_offset,
                                        BdrvChild *dst, uint64_t dst_offset,
                                        uint64_t bytes, BdrvRequestFlags flags)
 {
+    trace_bdrv_co_copy_range_to(src, src_offset, dst, dst_offset, bytes, flags);
     return bdrv_co_copy_range_internal(src, src_offset, dst, dst_offset,
                                        bytes, flags, false);
 }
diff --git a/block/iscsi.c b/block/iscsi.c
index ead2bd5aa7..118555d051 100644
--- a/block/iscsi.c
+++ b/block/iscsi.c
@@ -44,6 +44,7 @@
 #include "qapi/qmp/qstring.h"
 #include "crypto/secret.h"
 #include "scsi/utils.h"
+#include "trace.h"
 
 /* Conflict between scsi/utils.h and libiscsi! :( */
 #define SCSI_XFER_NONE ISCSI_XFER_NONE
@@ -2396,6 +2397,8 @@ retry:
     }
 
 out_unlock:
+
+    trace_iscsi_xcopy(src_lun, src_offset, dst_lun, dst_offset, bytes, r);
     g_free(iscsi_task.task);
     qemu_mutex_unlock(&dst_lun->mutex);
     g_free(iscsi_task.err_str);
diff --git a/block/trace-events b/block/trace-events
index c35287b48a..1a25a997f2 100644
--- a/block/trace-events
+++ b/block/trace-events
@@ -15,6 +15,8 @@ bdrv_co_preadv(void *bs, int64_t offset, int64_t nbytes, unsigned int flags) "bs
 bdrv_co_pwritev(void *bs, int64_t offset, int64_t nbytes, unsigned int flags) "bs %p offset %"PRId64" nbytes %"PRId64" flags 0x%x"
 bdrv_co_pwrite_zeroes(void *bs, int64_t offset, int count, int flags) "bs %p offset %"PRId64" count %d flags 0x%x"
 bdrv_co_do_copy_on_readv(void *bs, int64_t offset, unsigned int bytes, int64_t cluster_offset, int64_t cluster_bytes) "bs %p offset %"PRId64" bytes %u cluster_offset %"PRId64" cluster_bytes %"PRId64
+bdrv_co_copy_range_from(void *src, uint64_t src_offset, void *dst, uint64_t dst_offset, uint64_t bytes, int flags) "src %p offset %"PRIu64" dst %p offset %"PRIu64" bytes %"PRIu64" flags 0x%x"
+bdrv_co_copy_range_to(void *src, uint64_t src_offset, void *dst, uint64_t dst_offset, uint64_t bytes, int flags) "src %p offset %"PRIu64" dst %p offset %"PRIu64" bytes %"PRIu64" flags 0x%x"
 
 # block/stream.c
 stream_one_iteration(void *s, int64_t offset, uint64_t bytes, int is_allocated) "s %p offset %" PRId64 " bytes %" PRIu64 " is_allocated %d"
@@ -57,6 +59,7 @@ qmp_block_stream(void *bs, void *job) "bs %p job %p"
 # block/file-posix.c
 paio_submit_co(int64_t offset, int count, int type) "offset %"PRId64" count %d type %d"
 paio_submit(void *acb, void *opaque, int64_t offset, int count, int type) "acb %p opaque %p offset %"PRId64" count %d type %d"
+copy_file_range(void *bs, int src, int64_t src_off, int dst, int64_t dst_off, int64_t bytes, int flags, int64_t ret) "bs %p src_fd %d offset %"PRIu64" dst_fd %d offset %"PRIu64" bytes %"PRIu64" flags %d ret %"PRId64
 
 # block/qcow2.c
 qcow2_writev_start_req(void *co, int64_t offset, int bytes) "co %p offset 0x%" PRIx64 " bytes %d"
@@ -150,3 +153,6 @@ nvme_free_req_queue_wait(void *q) "q %p"
 nvme_cmd_map_qiov(void *s, void *cmd, void *req, void *qiov, int entries) "s %p cmd %p req %p qiov %p entries %d"
 nvme_cmd_map_qiov_pages(void *s, int i, uint64_t page) "s %p page[%d] 0x%"PRIx64
 nvme_cmd_map_qiov_iov(void *s, int i, void *page, int pages) "s %p iov[%d] %p pages %d"
+
+# block/iscsi.c
+iscsi_xcopy(void *src_lun, uint64_t src_off, void *dst_lun, uint64_t dst_off, uint64_t bytes, int ret) "src_lun %p offset %"PRIu64" dst_lun %p offset %"PRIu64" bytes %"PRIu64" ret %d"
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [Qemu-devel] [PATCH 1/2] block: Fix dst total_sectors after copy offloading
  2018-07-04  6:13 ` [Qemu-devel] [PATCH 1/2] block: Fix dst total_sectors after " Fam Zheng
@ 2018-07-04  7:44   ` Kevin Wolf
  2018-07-04  8:42     ` Fam Zheng
  0 siblings, 1 reply; 5+ messages in thread
From: Kevin Wolf @ 2018-07-04  7:44 UTC (permalink / raw)
  To: Fam Zheng
  Cc: qemu-devel, qemu-block, Max Reitz, Eric Blake, John Snow,
	Stefan Hajnoczi

Am 04.07.2018 um 08:13 hat Fam Zheng geschrieben:
> This was noticed by the new image fleecing tests case 222. The issue is
> apparent and we should just do the same right things as in
> bdrv_aligned_pwritev.
> 
> Reported-by: John Snow <jsnow@redhat.com>
> Signed-off-by: Fam Zheng <famz@redhat.com>

> --- a/block/io.c
> +++ b/block/io.c
> @@ -2945,6 +2945,10 @@ static int coroutine_fn bdrv_co_copy_range_internal(BdrvChild *src,
>                                                    dst, dst_offset,
>                                                    bytes, flags);
>      }
> +    if (ret == 0) {
> +        int64_t end_sector = DIV_ROUND_UP(dst_offset + bytes, BDRV_SECTOR_SIZE);
> +        dst->bs->total_sectors = MAX(dst->bs->total_sectors, end_sector);
> +    }

I think it's time to extract this stuff to a common function. I was
already aware that a write request that extends the image isn't behaving
exactly the same as bdrv_co_truncate(), and this one is bound to diverge
from the other two instances as well.

This is what bdrv_aligned_pwritev() does after the write:

    atomic_inc(&bs->write_gen);
    bdrv_set_dirty(bs, offset, bytes);

    stat64_max(&bs->wr_highest_offset, offset + bytes);

    if (ret >= 0) {
        bs->total_sectors = MAX(bs->total_sectors, end_sector);
        ret = 0;
    }

Before the write, it also calls bs->before_write_notifiers.

This is what bdrv_co_truncate() does after truncating the image:

    ret = refresh_total_sectors(bs, offset >> BDRV_SECTOR_BITS);
    if (ret < 0) {
        error_setg_errno(errp, -ret, "Could not refresh total sector count");
    } else {
        offset = bs->total_sectors * BDRV_SECTOR_SIZE;
    }
    bdrv_dirty_bitmap_truncate(bs, offset);
    bdrv_parent_cb_resize(bs);
    atomic_inc(&bs->write_gen);

This means that we probably have at least the following bugs in
bdrv_co_copy_range_internal():

1. bs->write_gen is not increased, a following flush is ignored
2. Dirty bitmaps are not dirtied
3. Dirty bitmaps are not resized when extending the image
4. bs->wr_highest_offset is not adjusted correctly
5. bs->total_sectors is not resized (the bug this patch fixes)
6. The parent node isn't notified about an image size change

Of these, 3. and 6. also seem to be bugs in bdrv_aligned_pwritev().

Kevin

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [Qemu-devel] [PATCH 1/2] block: Fix dst total_sectors after copy offloading
  2018-07-04  7:44   ` Kevin Wolf
@ 2018-07-04  8:42     ` Fam Zheng
  0 siblings, 0 replies; 5+ messages in thread
From: Fam Zheng @ 2018-07-04  8:42 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: qemu-devel, qemu-block, Max Reitz, Eric Blake, John Snow,
	Stefan Hajnoczi

On Wed, 07/04 09:44, Kevin Wolf wrote:
> Am 04.07.2018 um 08:13 hat Fam Zheng geschrieben:
> > This was noticed by the new image fleecing tests case 222. The issue is
> > apparent and we should just do the same right things as in
> > bdrv_aligned_pwritev.
> > 
> > Reported-by: John Snow <jsnow@redhat.com>
> > Signed-off-by: Fam Zheng <famz@redhat.com>
> 
> > --- a/block/io.c
> > +++ b/block/io.c
> > @@ -2945,6 +2945,10 @@ static int coroutine_fn bdrv_co_copy_range_internal(BdrvChild *src,
> >                                                    dst, dst_offset,
> >                                                    bytes, flags);
> >      }
> > +    if (ret == 0) {
> > +        int64_t end_sector = DIV_ROUND_UP(dst_offset + bytes, BDRV_SECTOR_SIZE);
> > +        dst->bs->total_sectors = MAX(dst->bs->total_sectors, end_sector);
> > +    }
> 
> I think it's time to extract this stuff to a common function. I was
> already aware that a write request that extends the image isn't behaving
> exactly the same as bdrv_co_truncate(), and this one is bound to diverge
> from the other two instances as well.
> 
> This is what bdrv_aligned_pwritev() does after the write:
> 
>     atomic_inc(&bs->write_gen);
>     bdrv_set_dirty(bs, offset, bytes);
> 
>     stat64_max(&bs->wr_highest_offset, offset + bytes);
> 
>     if (ret >= 0) {
>         bs->total_sectors = MAX(bs->total_sectors, end_sector);
>         ret = 0;
>     }
> 
> Before the write, it also calls bs->before_write_notifiers.
> 
> This is what bdrv_co_truncate() does after truncating the image:
> 
>     ret = refresh_total_sectors(bs, offset >> BDRV_SECTOR_BITS);
>     if (ret < 0) {
>         error_setg_errno(errp, -ret, "Could not refresh total sector count");
>     } else {
>         offset = bs->total_sectors * BDRV_SECTOR_SIZE;
>     }
>     bdrv_dirty_bitmap_truncate(bs, offset);
>     bdrv_parent_cb_resize(bs);
>     atomic_inc(&bs->write_gen);
> 
> This means that we probably have at least the following bugs in
> bdrv_co_copy_range_internal():
> 
> 1. bs->write_gen is not increased, a following flush is ignored
> 2. Dirty bitmaps are not dirtied
> 3. Dirty bitmaps are not resized when extending the image
> 4. bs->wr_highest_offset is not adjusted correctly
> 5. bs->total_sectors is not resized (the bug this patch fixes)
> 6. The parent node isn't notified about an image size change
> 
> Of these, 3. and 6. also seem to be bugs in bdrv_aligned_pwritev().
> 

Indeed, great insight. I'll work on v2.

Fam

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2018-07-04  8:43 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-07-04  6:13 [Qemu-devel] [PATCH 0/2] block: Fix dst reading after tail copy offloading Fam Zheng
2018-07-04  6:13 ` [Qemu-devel] [PATCH 1/2] block: Fix dst total_sectors after " Fam Zheng
2018-07-04  7:44   ` Kevin Wolf
2018-07-04  8:42     ` Fam Zheng
2018-07-04  6:13 ` [Qemu-devel] [PATCH 2/2] block: Add copy offloading trace points Fam Zheng

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.