All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v7 00/11] qcow2: fix parallel rewrite and discard (reqlist)
@ 2021-09-04 16:24 Vladimir Sementsov-Ogievskiy
  2021-09-04 16:24 ` [PATCH v7 01/11] block/reqlist: drop extra assertion Vladimir Sementsov-Ogievskiy
                   ` (11 more replies)
  0 siblings, 12 replies; 13+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2021-09-04 16:24 UTC (permalink / raw)
  To: qemu-block; +Cc: qemu-devel, vsementsov, hreitz, kwolf

Hi all!

This is a new (third :)) variant of solving the problem in qcow2 that we
lack synchronisation in reallocating host clusters with refcount=0 and
guest reads/writes that may still be in progress on these clusters.

Previous version was
"[PATCH v6 00/12] qcow2: fix parallel rewrite and discard (lockless)"
Supersedes: <20210422163046.442932-1-vsementsov@virtuozzo.com>

Key features of new solution:

1. Simpler data structure: requests list instead of "dynamic refcount"
   concept. What is good is that request list data structure is
   implemented and used in other my series. So we can improve and reuse
   it.

2. Don't care about discrads: my previous attempts were complicated by
   trying to postpone discards when we have in-flight operations on
   cluster which refcounts becomes 0. Don't bother with it anymore.
   Nothing bad in discard: it similar as when guest does simultaneous
   writes into same cluster: we don't care to serialize these writes.
   So keep discards as is.

So now the whole fix may be described in one sentence: don't consider a
host cluster "free" (for allocation) if there are still some in-flight
guest operations on it.

What patches are:

1-2: simple additions to reqlist (see also *)

3: test. It's unchanged since previous solution

4-5: simpler refactorings

6-7: implement guest requests tracking in qcow2

8: refactoring for [9]
9: BUG FIX: use request tracking to avoid reallocating clusters under
   guest operation

10-11: improvement if request tracking to avoid extra small critical
sections that were added in [7]

[*]:
For this series to work we also need
 "[PATCH v2 06/19] block: intoduce reqlist":
Based-on: <20210827181808.311670-7-vsementsov@virtuozzo.com>

Still, note that we use only simple core of reqlist and don't use its
coroutine-related features. That probably means that I should split a
kind of BlockReqListBase data structure out of BlockReqList, and then
will have separate

CoBlockReqList for "[PATCH v2 00/19] Make image fleecing more usable"
series and Qcow2ReqList for this series..

But that a side question.

Vladimir Sementsov-Ogievskiy (11):
  block/reqlist: drop extra assertion
  block/reqlist: add reqlist_new_req() and reqlist_free_req()
  iotests: add qcow2-discard-during-rewrite
  qcow2: introduce qcow2_parse_compressed_cluster_descriptor()
  qcow2: refactor qcow2_co_preadv_task() to have one return
  qcow2: prepare for tracking guest io requests in data_file
  qcow2: track guest io requests in data_file
  qcow2: introduce is_cluster_free() helper
  qcow2: don't reallocate host clusters under guest operation
  block/reqlist: implement reqlist_mark_req_invalid()
  qcow2: use reqlist_mark_req_invalid()

 block/qcow2.h                                 |  15 ++-
 include/block/reqlist.h                       |  33 ++++++
 block/qcow2-cluster.c                         |  45 ++++++-
 block/qcow2-refcount.c                        |  34 +++++-
 block/qcow2.c                                 | 112 +++++++++++++-----
 block/reqlist.c                               |  25 +++-
 .../tests/qcow2-discard-during-rewrite        |  72 +++++++++++
 .../tests/qcow2-discard-during-rewrite.out    |  21 ++++
 8 files changed, 310 insertions(+), 47 deletions(-)
 create mode 100755 tests/qemu-iotests/tests/qcow2-discard-during-rewrite
 create mode 100644 tests/qemu-iotests/tests/qcow2-discard-during-rewrite.out

-- 
2.29.2



^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH v7 01/11] block/reqlist: drop extra assertion
  2021-09-04 16:24 [PATCH v7 00/11] qcow2: fix parallel rewrite and discard (reqlist) Vladimir Sementsov-Ogievskiy
@ 2021-09-04 16:24 ` Vladimir Sementsov-Ogievskiy
  2021-09-04 16:24 ` [PATCH v7 02/11] block/reqlist: add reqlist_new_req() and reqlist_free_req() Vladimir Sementsov-Ogievskiy
                   ` (10 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2021-09-04 16:24 UTC (permalink / raw)
  To: qemu-block; +Cc: qemu-devel, vsementsov, hreitz, kwolf

We are going to reuse reqlist in conditions where intersecting requests
are possible in the list. So, drop the assertion.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 block/reqlist.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/block/reqlist.c b/block/reqlist.c
index 5e320ba649..c580752db7 100644
--- a/block/reqlist.c
+++ b/block/reqlist.c
@@ -19,8 +19,6 @@
 void reqlist_init_req(BlockReqList *reqs, BlockReq *req, int64_t offset,
                       int64_t bytes)
 {
-    assert(!reqlist_find_conflict(reqs, offset, bytes));
-
     *req = (BlockReq) {
         .offset = offset,
         .bytes = bytes,
-- 
2.29.2



^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v7 02/11] block/reqlist: add reqlist_new_req() and reqlist_free_req()
  2021-09-04 16:24 [PATCH v7 00/11] qcow2: fix parallel rewrite and discard (reqlist) Vladimir Sementsov-Ogievskiy
  2021-09-04 16:24 ` [PATCH v7 01/11] block/reqlist: drop extra assertion Vladimir Sementsov-Ogievskiy
@ 2021-09-04 16:24 ` Vladimir Sementsov-Ogievskiy
  2021-09-04 16:24 ` [PATCH v7 03/11] iotests: add qcow2-discard-during-rewrite Vladimir Sementsov-Ogievskiy
                   ` (9 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2021-09-04 16:24 UTC (permalink / raw)
  To: qemu-block; +Cc: qemu-devel, vsementsov, hreitz, kwolf

Add convenient helpers for heap allocation of requests.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 include/block/reqlist.h | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/include/block/reqlist.h b/include/block/reqlist.h
index b904d80216..32dc87666f 100644
--- a/include/block/reqlist.h
+++ b/include/block/reqlist.h
@@ -64,4 +64,24 @@ void coroutine_fn reqlist_shrink_req(BlockReq *req, int64_t new_bytes);
  */
 void coroutine_fn reqlist_remove_req(BlockReq *req);
 
+/* Allocate, initialize and add to the list new request. */
+static inline BlockReq *reqlist_new_req(BlockReqList *reqs, uint64_t offset,
+                                        uint64_t bytes)
+{
+    BlockReq *req = g_new(BlockReq, 1);
+
+    reqlist_init_req(reqs, req, offset, bytes);
+
+    return req;
+}
+
+/* Remove request and g_free it. */
+static inline void reqlist_free_req(BlockReq *req)
+{
+    if (req) {
+        reqlist_remove_req(req);
+        g_free(req);
+    }
+}
+
 #endif /* REQLIST_H */
-- 
2.29.2



^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v7 03/11] iotests: add qcow2-discard-during-rewrite
  2021-09-04 16:24 [PATCH v7 00/11] qcow2: fix parallel rewrite and discard (reqlist) Vladimir Sementsov-Ogievskiy
  2021-09-04 16:24 ` [PATCH v7 01/11] block/reqlist: drop extra assertion Vladimir Sementsov-Ogievskiy
  2021-09-04 16:24 ` [PATCH v7 02/11] block/reqlist: add reqlist_new_req() and reqlist_free_req() Vladimir Sementsov-Ogievskiy
@ 2021-09-04 16:24 ` Vladimir Sementsov-Ogievskiy
  2021-09-04 16:24 ` [PATCH v7 04/11] qcow2: introduce qcow2_parse_compressed_cluster_descriptor() Vladimir Sementsov-Ogievskiy
                   ` (8 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2021-09-04 16:24 UTC (permalink / raw)
  To: qemu-block; +Cc: qemu-devel, vsementsov, hreitz, kwolf

Simple test:
 - start writing to allocated cluster A
 - discard this cluster
 - write to another unallocated cluster B (it's allocated in same place
   where A was allocated)
 - continue writing to A

For now last action pollutes cluster B which is a bug fixed by the
following commit.

For now, add test to "disabled" group, so that it doesn't run
automatically.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 .../tests/qcow2-discard-during-rewrite        | 72 +++++++++++++++++++
 .../tests/qcow2-discard-during-rewrite.out    | 21 ++++++
 2 files changed, 93 insertions(+)
 create mode 100755 tests/qemu-iotests/tests/qcow2-discard-during-rewrite
 create mode 100644 tests/qemu-iotests/tests/qcow2-discard-during-rewrite.out

diff --git a/tests/qemu-iotests/tests/qcow2-discard-during-rewrite b/tests/qemu-iotests/tests/qcow2-discard-during-rewrite
new file mode 100755
index 0000000000..7f0d8a107a
--- /dev/null
+++ b/tests/qemu-iotests/tests/qcow2-discard-during-rewrite
@@ -0,0 +1,72 @@
+#!/usr/bin/env bash
+# group: quick disabled
+#
+# Test discarding (and reusing) host cluster during writing data to it.
+#
+# Copyright (c) 2021 Virtuozzo International GmbH.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+#
+
+# creator
+owner=vsementsov@virtuozzo.com
+
+seq=`basename $0`
+echo "QA output created by $seq"
+
+status=1	# failure is the default!
+
+_cleanup()
+{
+    _cleanup_test_img
+}
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+# get standard environment, filters and checks
+. ./../common.rc
+. ./../common.filter
+
+_supported_fmt qcow2
+_supported_proto file fuse
+_supported_os Linux
+
+size=1M
+_make_test_img $size
+
+(
+cat <<EOF
+write -P 1 0 64K
+
+break pwritev A
+aio_write -P 2 0 64K
+wait_break A
+
+discard 0 64K
+write -P 3 128K 64K
+read -P 3 128K 64K
+
+break pwritev_done B
+resume A
+wait_break B
+resume B
+
+read -P 0 0 64K
+read -P 3 128K 64K
+EOF
+) | $QEMU_IO blkdebug::$TEST_IMG | _filter_qemu_io
+
+# success, all done
+echo "*** done"
+rm -f $seq.full
+status=0
diff --git a/tests/qemu-iotests/tests/qcow2-discard-during-rewrite.out b/tests/qemu-iotests/tests/qcow2-discard-during-rewrite.out
new file mode 100644
index 0000000000..8e75b2fbff
--- /dev/null
+++ b/tests/qemu-iotests/tests/qcow2-discard-during-rewrite.out
@@ -0,0 +1,21 @@
+QA output created by qcow2-discard-during-rewrite
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1048576
+wrote 65536/65536 bytes at offset 0
+64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+blkdebug: Suspended request 'A'
+discard 65536/65536 bytes at offset 0
+64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 65536/65536 bytes at offset 131072
+64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 65536/65536 bytes at offset 131072
+64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+blkdebug: Resuming request 'A'
+blkdebug: Suspended request 'B'
+blkdebug: Resuming request 'B'
+wrote 65536/65536 bytes at offset 0
+64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 65536/65536 bytes at offset 0
+64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 65536/65536 bytes at offset 131072
+64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+*** done
-- 
2.29.2



^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v7 04/11] qcow2: introduce qcow2_parse_compressed_cluster_descriptor()
  2021-09-04 16:24 [PATCH v7 00/11] qcow2: fix parallel rewrite and discard (reqlist) Vladimir Sementsov-Ogievskiy
                   ` (2 preceding siblings ...)
  2021-09-04 16:24 ` [PATCH v7 03/11] iotests: add qcow2-discard-during-rewrite Vladimir Sementsov-Ogievskiy
@ 2021-09-04 16:24 ` Vladimir Sementsov-Ogievskiy
  2021-09-04 16:24 ` [PATCH v7 05/11] qcow2: refactor qcow2_co_preadv_task() to have one return Vladimir Sementsov-Ogievskiy
                   ` (7 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2021-09-04 16:24 UTC (permalink / raw)
  To: qemu-block; +Cc: qemu-devel, vsementsov, hreitz, kwolf

This functionality will be reused later. Let's make a separate function
now.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 block/qcow2.h |  4 ++++
 block/qcow2.c | 21 ++++++++++++++++-----
 2 files changed, 20 insertions(+), 5 deletions(-)

diff --git a/block/qcow2.h b/block/qcow2.h
index 0fe5f74ed3..4859ca3d0d 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -914,6 +914,10 @@ int qcow2_alloc_compressed_cluster_offset(BlockDriverState *bs,
                                           uint64_t offset,
                                           int compressed_size,
                                           uint64_t *host_offset);
+void qcow2_parse_compressed_cluster_descriptor(BDRVQcow2State *s,
+                                               uint64_t cluster_descriptor,
+                                               uint64_t *coffset,
+                                               int *csize);
 
 int qcow2_alloc_cluster_link_l2(BlockDriverState *bs, QCowL2Meta *m);
 void qcow2_alloc_cluster_abort(BlockDriverState *bs, QCowL2Meta *m);
diff --git a/block/qcow2.c b/block/qcow2.c
index 9f1b6461c8..2095188b6f 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -4691,6 +4691,19 @@ qcow2_co_pwritev_compressed_part(BlockDriverState *bs,
     return ret;
 }
 
+void qcow2_parse_compressed_cluster_descriptor(BDRVQcow2State *s,
+                                               uint64_t cluster_descriptor,
+                                               uint64_t *coffset,
+                                               int *csize)
+{
+    int nb_csectors;
+
+    *coffset = cluster_descriptor & s->cluster_offset_mask;
+    nb_csectors = ((cluster_descriptor >> s->csize_shift) & s->csize_mask) + 1;
+    *csize = nb_csectors * QCOW2_COMPRESSED_SECTOR_SIZE -
+        (*coffset & ~QCOW2_COMPRESSED_SECTOR_MASK);
+}
+
 static int coroutine_fn
 qcow2_co_preadv_compressed(BlockDriverState *bs,
                            uint64_t cluster_descriptor,
@@ -4700,15 +4713,13 @@ qcow2_co_preadv_compressed(BlockDriverState *bs,
                            size_t qiov_offset)
 {
     BDRVQcow2State *s = bs->opaque;
-    int ret = 0, csize, nb_csectors;
+    int ret = 0, csize;
     uint64_t coffset;
     uint8_t *buf, *out_buf;
     int offset_in_cluster = offset_into_cluster(s, offset);
 
-    coffset = cluster_descriptor & s->cluster_offset_mask;
-    nb_csectors = ((cluster_descriptor >> s->csize_shift) & s->csize_mask) + 1;
-    csize = nb_csectors * QCOW2_COMPRESSED_SECTOR_SIZE -
-        (coffset & ~QCOW2_COMPRESSED_SECTOR_MASK);
+    qcow2_parse_compressed_cluster_descriptor(s, cluster_descriptor, &coffset,
+                                              &csize);
 
     buf = g_try_malloc(csize);
     if (!buf) {
-- 
2.29.2



^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v7 05/11] qcow2: refactor qcow2_co_preadv_task() to have one return
  2021-09-04 16:24 [PATCH v7 00/11] qcow2: fix parallel rewrite and discard (reqlist) Vladimir Sementsov-Ogievskiy
                   ` (3 preceding siblings ...)
  2021-09-04 16:24 ` [PATCH v7 04/11] qcow2: introduce qcow2_parse_compressed_cluster_descriptor() Vladimir Sementsov-Ogievskiy
@ 2021-09-04 16:24 ` Vladimir Sementsov-Ogievskiy
  2021-09-04 16:24 ` [PATCH v7 06/11] qcow2: prepare for tracking guest io requests in data_file Vladimir Sementsov-Ogievskiy
                   ` (6 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2021-09-04 16:24 UTC (permalink / raw)
  To: qemu-block; +Cc: qemu-devel, vsementsov, hreitz, kwolf

We are going to add an action before return for this function. Refactor
function now to make further patch simpler.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 block/qcow2.c | 26 +++++++++++++++-----------
 1 file changed, 15 insertions(+), 11 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index 2095188b6f..7fbcc600da 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -2262,6 +2262,7 @@ static coroutine_fn int qcow2_co_preadv_task(BlockDriverState *bs,
                                              size_t qiov_offset)
 {
     BDRVQcow2State *s = bs->opaque;
+    int ret;
 
     switch (subc_type) {
     case QCOW2_SUBCLUSTER_ZERO_PLAIN:
@@ -2274,28 +2275,31 @@ static coroutine_fn int qcow2_co_preadv_task(BlockDriverState *bs,
         assert(bs->backing); /* otherwise handled in qcow2_co_preadv_part */
 
         BLKDBG_EVENT(bs->file, BLKDBG_READ_BACKING_AIO);
-        return bdrv_co_preadv_part(bs->backing, offset, bytes,
-                                   qiov, qiov_offset, 0);
+        ret = bdrv_co_preadv_part(bs->backing, offset, bytes,
+                                  qiov, qiov_offset, 0);
+        break;
 
     case QCOW2_SUBCLUSTER_COMPRESSED:
-        return qcow2_co_preadv_compressed(bs, host_offset,
-                                          offset, bytes, qiov, qiov_offset);
+        ret = qcow2_co_preadv_compressed(bs, host_offset,
+                                         offset, bytes, qiov, qiov_offset);
+        break;
 
     case QCOW2_SUBCLUSTER_NORMAL:
         if (bs->encrypted) {
-            return qcow2_co_preadv_encrypted(bs, host_offset,
-                                             offset, bytes, qiov, qiov_offset);
+            ret = qcow2_co_preadv_encrypted(bs, host_offset,
+                                            offset, bytes, qiov, qiov_offset);
+        } else {
+            BLKDBG_EVENT(bs->file, BLKDBG_READ_AIO);
+            ret = bdrv_co_preadv_part(s->data_file, host_offset,
+                                      bytes, qiov, qiov_offset, 0);
         }
-
-        BLKDBG_EVENT(bs->file, BLKDBG_READ_AIO);
-        return bdrv_co_preadv_part(s->data_file, host_offset,
-                                   bytes, qiov, qiov_offset, 0);
+        break;
 
     default:
         g_assert_not_reached();
     }
 
-    g_assert_not_reached();
+    return ret;
 }
 
 static coroutine_fn int qcow2_co_preadv_task_entry(AioTask *task)
-- 
2.29.2



^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v7 06/11] qcow2: prepare for tracking guest io requests in data_file
  2021-09-04 16:24 [PATCH v7 00/11] qcow2: fix parallel rewrite and discard (reqlist) Vladimir Sementsov-Ogievskiy
                   ` (4 preceding siblings ...)
  2021-09-04 16:24 ` [PATCH v7 05/11] qcow2: refactor qcow2_co_preadv_task() to have one return Vladimir Sementsov-Ogievskiy
@ 2021-09-04 16:24 ` Vladimir Sementsov-Ogievskiy
  2021-09-04 16:24 ` [PATCH v7 07/11] qcow2: track " Vladimir Sementsov-Ogievskiy
                   ` (5 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2021-09-04 16:24 UTC (permalink / raw)
  To: qemu-block; +Cc: qemu-devel, vsementsov, hreitz, kwolf

We are going to fix a bug of reallocating host cluster that are under
guest operation. For this we need to track these operations. Guest io
operations in data_file has 3 entry points:

  qcow2_get_host_offset()
  qcow2_alloc_host_offset()
  qcow2_alloc_compressed_cluster_offset()

These functions provides the offset in data_file. So for now, add a
possibility for these function to start a BlockReq.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 block/qcow2.h         | 11 ++++++++---
 block/qcow2-cluster.c | 45 ++++++++++++++++++++++++++++++++++++++++---
 block/qcow2.c         | 17 ++++++++--------
 3 files changed, 59 insertions(+), 14 deletions(-)

diff --git a/block/qcow2.h b/block/qcow2.h
index 4859ca3d0d..7b9fafc6ec 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -29,6 +29,7 @@
 #include "qemu/coroutine.h"
 #include "qemu/units.h"
 #include "block/block_int.h"
+#include "block/reqlist.h"
 
 //#define DEBUG_ALLOC
 //#define DEBUG_ALLOC2
@@ -420,6 +421,8 @@ typedef struct BDRVQcow2State {
      * is to convert the image with the desired compression type set.
      */
     Qcow2CompressionType compression_type;
+
+    BlockReqList guest_reqs;
 } BDRVQcow2State;
 
 typedef struct Qcow2COWRegion {
@@ -906,14 +909,16 @@ int qcow2_encrypt_sectors(BDRVQcow2State *s, int64_t sector_num,
 
 int qcow2_get_host_offset(BlockDriverState *bs, uint64_t offset,
                           unsigned int *bytes, uint64_t *host_offset,
-                          QCow2SubclusterType *subcluster_type);
+                          QCow2SubclusterType *subcluster_type,
+                          BlockReq **req);
 int qcow2_alloc_host_offset(BlockDriverState *bs, uint64_t offset,
                             unsigned int *bytes, uint64_t *host_offset,
-                            QCowL2Meta **m);
+                            QCowL2Meta **m, BlockReq **req);
 int qcow2_alloc_compressed_cluster_offset(BlockDriverState *bs,
                                           uint64_t offset,
                                           int compressed_size,
-                                          uint64_t *host_offset);
+                                          uint64_t *host_offset,
+                                          BlockReq **req);
 void qcow2_parse_compressed_cluster_descriptor(BDRVQcow2State *s,
                                                uint64_t cluster_descriptor,
                                                uint64_t *coffset,
diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index bd0597842f..9887f80dcc 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -567,11 +567,17 @@ static int coroutine_fn do_perform_cow_write(BlockDriverState *bs,
  * file. The subcluster type is stored in *subcluster_type.
  * Compressed clusters are always processed one by one.
  *
+ * On success if req is non-NULL and resulting subcluster type is
+ * QCOW2_SUBCLUSTER_NORMAL or QCOW2_SUBCLUSTER_COMPRESSED req is allocated and
+ * initialized. For other cluster types req is set to NULL.
+ * On failure req is untouched.
+ *
  * Returns 0 on success, -errno in error cases.
  */
 int qcow2_get_host_offset(BlockDriverState *bs, uint64_t offset,
                           unsigned int *bytes, uint64_t *host_offset,
-                          QCow2SubclusterType *subcluster_type)
+                          QCow2SubclusterType *subcluster_type,
+                          BlockReq **req)
 {
     BDRVQcow2State *s = bs->opaque;
     unsigned int l2_index, sc_index;
@@ -721,6 +727,21 @@ out:
 
     *subcluster_type = type;
 
+    if (req) {
+        if (type == QCOW2_SUBCLUSTER_COMPRESSED) {
+            uint64_t coffset;
+            int csize;
+
+            qcow2_parse_compressed_cluster_descriptor(s, *host_offset, &coffset,
+                                                      &csize);
+            *req = reqlist_new_req(&s->guest_reqs, coffset, csize);
+        } else if (type == QCOW2_SUBCLUSTER_NORMAL) {
+            *req = reqlist_new_req(&s->guest_reqs, *host_offset, *bytes);
+        } else {
+            *req = NULL;
+        }
+    }
+
     return 0;
 
 fail:
@@ -809,11 +830,16 @@ static int get_cluster_table(BlockDriverState *bs, uint64_t offset,
  * already allocated at the offset, return an error.
  *
  * Return 0 on success and -errno in error cases
+ *
+ * On success if req is non-NULL req is allocated and initialized.
+ * On failure req is untouched.
+ *
  */
 int qcow2_alloc_compressed_cluster_offset(BlockDriverState *bs,
                                           uint64_t offset,
                                           int compressed_size,
-                                          uint64_t *host_offset)
+                                          uint64_t *host_offset,
+                                          BlockReq **req)
 {
     BDRVQcow2State *s = bs->opaque;
     int l2_index, ret;
@@ -868,6 +894,11 @@ int qcow2_alloc_compressed_cluster_offset(BlockDriverState *bs,
     qcow2_cache_put(s->l2_table_cache, (void **) &l2_slice);
 
     *host_offset = cluster_offset & s->cluster_offset_mask;
+
+    if (req) {
+        *req = reqlist_new_req(&s->guest_reqs, *host_offset, compressed_size);
+    }
+
     return 0;
 }
 
@@ -1740,10 +1771,14 @@ out:
  * is queued and will be reentered when the dependency has completed.
  *
  * Return 0 on success and -errno in error cases
+ *
+ * On success if req is non-NULL req is allocated and initialized.
+ * On failure req is untouched.
+ *
  */
 int qcow2_alloc_host_offset(BlockDriverState *bs, uint64_t offset,
                             unsigned int *bytes, uint64_t *host_offset,
-                            QCowL2Meta **m)
+                            QCowL2Meta **m, BlockReq **req)
 {
     BDRVQcow2State *s = bs->opaque;
     uint64_t start, remaining;
@@ -1850,6 +1885,10 @@ again:
     assert(offset_into_cluster(s, *host_offset) ==
            offset_into_cluster(s, offset));
 
+    if (req) {
+        *req = reqlist_new_req(&s->guest_reqs, *host_offset, *bytes);
+    }
+
     return 0;
 }
 
diff --git a/block/qcow2.c b/block/qcow2.c
index 7fbcc600da..8aa5679fe9 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -1833,6 +1833,7 @@ static int coroutine_fn qcow2_do_open(BlockDriverState *bs, QDict *options,
 #endif
 
     qemu_co_queue_init(&s->thread_task_queue);
+    QLIST_INIT(&s->guest_reqs);
 
     return ret;
 
@@ -2090,7 +2091,7 @@ static int coroutine_fn qcow2_co_block_status(BlockDriverState *bs,
     }
 
     bytes = MIN(INT_MAX, count);
-    ret = qcow2_get_host_offset(bs, offset, &bytes, &host_offset, &type);
+    ret = qcow2_get_host_offset(bs, offset, &bytes, &host_offset, &type, NULL);
     qemu_co_mutex_unlock(&s->lock);
     if (ret < 0) {
         return ret;
@@ -2335,7 +2336,7 @@ static coroutine_fn int qcow2_co_preadv_part(BlockDriverState *bs,
 
         qemu_co_mutex_lock(&s->lock);
         ret = qcow2_get_host_offset(bs, offset, &cur_bytes,
-                                    &host_offset, &type);
+                                    &host_offset, &type, NULL);
         qemu_co_mutex_unlock(&s->lock);
         if (ret < 0) {
             goto out;
@@ -2629,7 +2630,7 @@ static coroutine_fn int qcow2_co_pwritev_part(
         qemu_co_mutex_lock(&s->lock);
 
         ret = qcow2_alloc_host_offset(bs, offset, &cur_bytes,
-                                      &host_offset, &l2meta);
+                                      &host_offset, &l2meta, NULL);
         if (ret < 0) {
             goto out_locked;
         }
@@ -3170,7 +3171,7 @@ static int coroutine_fn preallocate_co(BlockDriverState *bs, uint64_t offset,
     while (bytes) {
         cur_bytes = MIN(bytes, QEMU_ALIGN_DOWN(INT_MAX, s->cluster_size));
         ret = qcow2_alloc_host_offset(bs, offset, &cur_bytes,
-                                      &host_offset, &meta);
+                                      &host_offset, &meta, NULL);
         if (ret < 0) {
             error_setg_errno(errp, -ret, "Allocating clusters failed");
             goto out;
@@ -3976,7 +3977,7 @@ static coroutine_fn int qcow2_co_pwrite_zeroes(BlockDriverState *bs,
         offset -= head;
         bytes = s->subcluster_size;
         nr = s->subcluster_size;
-        ret = qcow2_get_host_offset(bs, offset, &nr, &off, &type);
+        ret = qcow2_get_host_offset(bs, offset, &nr, &off, &type, NULL);
         if (ret < 0 ||
             (type != QCOW2_SUBCLUSTER_UNALLOCATED_PLAIN &&
              type != QCOW2_SUBCLUSTER_UNALLOCATED_ALLOC &&
@@ -4051,7 +4052,7 @@ qcow2_co_copy_range_from(BlockDriverState *bs,
         cur_write_flags = write_flags;
 
         ret = qcow2_get_host_offset(bs, src_offset, &cur_bytes,
-                                    &copy_offset, &type);
+                                    &copy_offset, &type, NULL);
         if (ret < 0) {
             goto out;
         }
@@ -4138,7 +4139,7 @@ qcow2_co_copy_range_to(BlockDriverState *bs,
          * the refcnt, without copying user data.
          * Or if src->bs == dst->bs->backing->bs, we could copy by discarding. */
         ret = qcow2_alloc_host_offset(bs, dst_offset, &cur_bytes,
-                                      &host_offset, &l2meta);
+                                      &host_offset, &l2meta, NULL);
         if (ret < 0) {
             goto fail;
         }
@@ -4593,7 +4594,7 @@ qcow2_co_pwritev_compressed_task(BlockDriverState *bs,
 
     qemu_co_mutex_lock(&s->lock);
     ret = qcow2_alloc_compressed_cluster_offset(bs, offset, out_len,
-                                                &cluster_offset);
+                                                &cluster_offset, NULL);
     if (ret < 0) {
         qemu_co_mutex_unlock(&s->lock);
         goto fail;
-- 
2.29.2



^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v7 07/11] qcow2: track guest io requests in data_file
  2021-09-04 16:24 [PATCH v7 00/11] qcow2: fix parallel rewrite and discard (reqlist) Vladimir Sementsov-Ogievskiy
                   ` (5 preceding siblings ...)
  2021-09-04 16:24 ` [PATCH v7 06/11] qcow2: prepare for tracking guest io requests in data_file Vladimir Sementsov-Ogievskiy
@ 2021-09-04 16:24 ` Vladimir Sementsov-Ogievskiy
  2021-09-04 16:24 ` [PATCH v7 08/11] qcow2: introduce is_cluster_free() helper Vladimir Sementsov-Ogievskiy
                   ` (4 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2021-09-04 16:24 UTC (permalink / raw)
  To: qemu-block; +Cc: qemu-devel, vsementsov, hreitz, kwolf

We are going to fix a bug of reallocating host cluster that are under
guest operation. For this we need to track these operations.

So, we create BlockReq objects during guest writing and reading data.

That's important for synchronization with further host clusters
reallocation code that we create BlockReq object in same s->lock
critical section where we get an offset.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 block/qcow2.c | 58 +++++++++++++++++++++++++++++++++++++++------------
 1 file changed, 45 insertions(+), 13 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index 8aa5679fe9..aefe6558b6 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -2212,6 +2212,8 @@ typedef struct Qcow2AioTask {
     QEMUIOVector *qiov;
     uint64_t qiov_offset;
     QCowL2Meta *l2meta; /* only for write */
+
+    BlockReq *req;
 } Qcow2AioTask;
 
 static coroutine_fn int qcow2_co_preadv_task_entry(AioTask *task);
@@ -2224,7 +2226,8 @@ static coroutine_fn int qcow2_add_task(BlockDriverState *bs,
                                        uint64_t bytes,
                                        QEMUIOVector *qiov,
                                        size_t qiov_offset,
-                                       QCowL2Meta *l2meta)
+                                       QCowL2Meta *l2meta,
+                                       BlockReq *req)
 {
     Qcow2AioTask local_task;
     Qcow2AioTask *task = pool ? g_new(Qcow2AioTask, 1) : &local_task;
@@ -2239,6 +2242,7 @@ static coroutine_fn int qcow2_add_task(BlockDriverState *bs,
         .bytes = bytes,
         .qiov_offset = qiov_offset,
         .l2meta = l2meta,
+        .req = req,
     };
 
     trace_qcow2_add_task(qemu_coroutine_self(), bs, pool,
@@ -2260,7 +2264,8 @@ static coroutine_fn int qcow2_co_preadv_task(BlockDriverState *bs,
                                              uint64_t host_offset,
                                              uint64_t offset, uint64_t bytes,
                                              QEMUIOVector *qiov,
-                                             size_t qiov_offset)
+                                             size_t qiov_offset,
+                                             BlockReq *req)
 {
     BDRVQcow2State *s = bs->opaque;
     int ret;
@@ -2300,6 +2305,12 @@ static coroutine_fn int qcow2_co_preadv_task(BlockDriverState *bs,
         g_assert_not_reached();
     }
 
+    if (req) {
+        WITH_QEMU_LOCK_GUARD(&s->lock) {
+            reqlist_free_req(req);
+        }
+    }
+
     return ret;
 }
 
@@ -2311,7 +2322,7 @@ static coroutine_fn int qcow2_co_preadv_task_entry(AioTask *task)
 
     return qcow2_co_preadv_task(t->bs, t->subcluster_type,
                                 t->host_offset, t->offset, t->bytes,
-                                t->qiov, t->qiov_offset);
+                                t->qiov, t->qiov_offset, t->req);
 }
 
 static coroutine_fn int qcow2_co_preadv_part(BlockDriverState *bs,
@@ -2327,6 +2338,8 @@ static coroutine_fn int qcow2_co_preadv_part(BlockDriverState *bs,
     AioTaskPool *aio = NULL;
 
     while (bytes != 0 && aio_task_pool_status(aio) == 0) {
+        BlockReq *req = NULL;
+
         /* prepare next request */
         cur_bytes = MIN(bytes, INT_MAX);
         if (s->crypto) {
@@ -2336,7 +2349,7 @@ static coroutine_fn int qcow2_co_preadv_part(BlockDriverState *bs,
 
         qemu_co_mutex_lock(&s->lock);
         ret = qcow2_get_host_offset(bs, offset, &cur_bytes,
-                                    &host_offset, &type, NULL);
+                                    &host_offset, &type, &req);
         qemu_co_mutex_unlock(&s->lock);
         if (ret < 0) {
             goto out;
@@ -2354,7 +2367,7 @@ static coroutine_fn int qcow2_co_preadv_part(BlockDriverState *bs,
             }
             ret = qcow2_add_task(bs, aio, qcow2_co_preadv_task_entry, type,
                                  host_offset, offset, cur_bytes,
-                                 qiov, qiov_offset, NULL);
+                                 qiov, qiov_offset, NULL, req);
             if (ret < 0) {
                 goto out;
             }
@@ -2523,7 +2536,8 @@ static coroutine_fn int qcow2_co_pwritev_task(BlockDriverState *bs,
                                               uint64_t offset, uint64_t bytes,
                                               QEMUIOVector *qiov,
                                               uint64_t qiov_offset,
-                                              QCowL2Meta *l2meta)
+                                              QCowL2Meta *l2meta,
+                                              BlockReq *req)
 {
     int ret;
     BDRVQcow2State *s = bs->opaque;
@@ -2582,6 +2596,9 @@ out_unlocked:
 
 out_locked:
     qcow2_handle_l2meta(bs, &l2meta, false);
+
+    reqlist_free_req(req);
+
     qemu_co_mutex_unlock(&s->lock);
 
     qemu_vfree(crypt_buf);
@@ -2597,7 +2614,7 @@ static coroutine_fn int qcow2_co_pwritev_task_entry(AioTask *task)
 
     return qcow2_co_pwritev_task(t->bs, t->host_offset,
                                  t->offset, t->bytes, t->qiov, t->qiov_offset,
-                                 t->l2meta);
+                                 t->l2meta, t->req);
 }
 
 static coroutine_fn int qcow2_co_pwritev_part(
@@ -2615,6 +2632,7 @@ static coroutine_fn int qcow2_co_pwritev_part(
     trace_qcow2_writev_start_req(qemu_coroutine_self(), offset, bytes);
 
     while (bytes != 0 && aio_task_pool_status(aio) == 0) {
+        BlockReq *req;
 
         l2meta = NULL;
 
@@ -2630,7 +2648,7 @@ static coroutine_fn int qcow2_co_pwritev_part(
         qemu_co_mutex_lock(&s->lock);
 
         ret = qcow2_alloc_host_offset(bs, offset, &cur_bytes,
-                                      &host_offset, &l2meta, NULL);
+                                      &host_offset, &l2meta, &req);
         if (ret < 0) {
             goto out_locked;
         }
@@ -2638,6 +2656,7 @@ static coroutine_fn int qcow2_co_pwritev_part(
         ret = qcow2_pre_write_overlap_check(bs, 0, host_offset,
                                             cur_bytes, true);
         if (ret < 0) {
+            reqlist_free_req(req);
             goto out_locked;
         }
 
@@ -2648,7 +2667,7 @@ static coroutine_fn int qcow2_co_pwritev_part(
         }
         ret = qcow2_add_task(bs, aio, qcow2_co_pwritev_task_entry, 0,
                              host_offset, offset,
-                             cur_bytes, qiov, qiov_offset, l2meta);
+                             cur_bytes, qiov, qiov_offset, l2meta, req);
         l2meta = NULL; /* l2meta is consumed by qcow2_co_pwritev_task() */
         if (ret < 0) {
             goto fail_nometa;
@@ -4045,6 +4064,7 @@ qcow2_co_copy_range_from(BlockDriverState *bs,
     qemu_co_mutex_lock(&s->lock);
 
     while (bytes != 0) {
+        BlockReq *req = NULL;
         uint64_t copy_offset = 0;
         QCow2SubclusterType type;
         /* prepare next request */
@@ -4052,7 +4072,7 @@ qcow2_co_copy_range_from(BlockDriverState *bs,
         cur_write_flags = write_flags;
 
         ret = qcow2_get_host_offset(bs, src_offset, &cur_bytes,
-                                    &copy_offset, &type, NULL);
+                                    &copy_offset, &type, &req);
         if (ret < 0) {
             goto out;
         }
@@ -4080,6 +4100,7 @@ qcow2_co_copy_range_from(BlockDriverState *bs,
             break;
 
         case QCOW2_SUBCLUSTER_COMPRESSED:
+            reqlist_free_req(req);
             ret = -ENOTSUP;
             goto out;
 
@@ -4096,6 +4117,7 @@ qcow2_co_copy_range_from(BlockDriverState *bs,
                                       dst, dst_offset,
                                       cur_bytes, read_flags, cur_write_flags);
         qemu_co_mutex_lock(&s->lock);
+        reqlist_free_req(req);
         if (ret < 0) {
             goto out;
         }
@@ -4129,6 +4151,7 @@ qcow2_co_copy_range_to(BlockDriverState *bs,
     qemu_co_mutex_lock(&s->lock);
 
     while (bytes != 0) {
+        BlockReq *req;
 
         l2meta = NULL;
 
@@ -4139,7 +4162,7 @@ qcow2_co_copy_range_to(BlockDriverState *bs,
          * the refcnt, without copying user data.
          * Or if src->bs == dst->bs->backing->bs, we could copy by discarding. */
         ret = qcow2_alloc_host_offset(bs, dst_offset, &cur_bytes,
-                                      &host_offset, &l2meta, NULL);
+                                      &host_offset, &l2meta, &req);
         if (ret < 0) {
             goto fail;
         }
@@ -4147,6 +4170,7 @@ qcow2_co_copy_range_to(BlockDriverState *bs,
         ret = qcow2_pre_write_overlap_check(bs, 0, host_offset, cur_bytes,
                                             true);
         if (ret < 0) {
+            reqlist_free_req(req);
             goto fail;
         }
 
@@ -4154,6 +4178,7 @@ qcow2_co_copy_range_to(BlockDriverState *bs,
         ret = bdrv_co_copy_range_to(src, src_offset, s->data_file, host_offset,
                                     cur_bytes, read_flags, write_flags);
         qemu_co_mutex_lock(&s->lock);
+        reqlist_free_req(req);
         if (ret < 0) {
             goto fail;
         }
@@ -4565,6 +4590,7 @@ qcow2_co_pwritev_compressed_task(BlockDriverState *bs,
     ssize_t out_len;
     uint8_t *buf, *out_buf;
     uint64_t cluster_offset;
+    BlockReq *req = NULL;
 
     assert(bytes == s->cluster_size || (bytes < s->cluster_size &&
            (offset + bytes == bs->total_sectors << BDRV_SECTOR_BITS)));
@@ -4594,7 +4620,7 @@ qcow2_co_pwritev_compressed_task(BlockDriverState *bs,
 
     qemu_co_mutex_lock(&s->lock);
     ret = qcow2_alloc_compressed_cluster_offset(bs, offset, out_len,
-                                                &cluster_offset, NULL);
+                                                &cluster_offset, &req);
     if (ret < 0) {
         qemu_co_mutex_unlock(&s->lock);
         goto fail;
@@ -4614,6 +4640,11 @@ qcow2_co_pwritev_compressed_task(BlockDriverState *bs,
 success:
     ret = 0;
 fail:
+    if (req) {
+        WITH_QEMU_LOCK_GUARD(&s->lock) {
+            reqlist_free_req(req);
+        }
+    }
     qemu_vfree(buf);
     g_free(out_buf);
     return ret;
@@ -4676,7 +4707,8 @@ qcow2_co_pwritev_compressed_part(BlockDriverState *bs,
         }
 
         ret = qcow2_add_task(bs, aio, qcow2_co_pwritev_compressed_task_entry,
-                             0, 0, offset, chunk_size, qiov, qiov_offset, NULL);
+                             0, 0, offset, chunk_size, qiov, qiov_offset, NULL,
+                             NULL);
         if (ret < 0) {
             break;
         }
-- 
2.29.2



^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v7 08/11] qcow2: introduce is_cluster_free() helper
  2021-09-04 16:24 [PATCH v7 00/11] qcow2: fix parallel rewrite and discard (reqlist) Vladimir Sementsov-Ogievskiy
                   ` (6 preceding siblings ...)
  2021-09-04 16:24 ` [PATCH v7 07/11] qcow2: track " Vladimir Sementsov-Ogievskiy
@ 2021-09-04 16:24 ` Vladimir Sementsov-Ogievskiy
  2021-09-04 16:24 ` [PATCH v7 09/11] qcow2: don't reallocate host clusters under guest operation Vladimir Sementsov-Ogievskiy
                   ` (3 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2021-09-04 16:24 UTC (permalink / raw)
  To: qemu-block; +Cc: qemu-devel, vsementsov, hreitz, kwolf

We are going to change the concept of "free host cluster", so let's
clarify it now and add a helper, which we will modify later.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 block/qcow2-refcount.c | 31 +++++++++++++++++++++++++------
 1 file changed, 25 insertions(+), 6 deletions(-)

diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index 8e649b008e..13b1fed43e 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -961,13 +961,32 @@ int qcow2_update_cluster_refcount(BlockDriverState *bs,
 /* cluster allocation functions */
 
 
+/*
+ * Cluster is free when its refcount is 0
+ *
+ * Return < 0 if failed to get refcount
+ *          0 if cluster is not free
+ *          1 if cluster is free
+ */
+static int is_cluster_free(BlockDriverState *bs, int64_t cluster_index)
+{
+    int ret;
+    uint64_t refcount;
+
+    ret = qcow2_get_refcount(bs, cluster_index, &refcount);
+    if (ret < 0) {
+        return ret;
+    }
+
+    return refcount == 0;
+}
 
 /* return < 0 if error */
 static int64_t alloc_clusters_noref(BlockDriverState *bs, uint64_t size,
                                     uint64_t max)
 {
     BDRVQcow2State *s = bs->opaque;
-    uint64_t i, nb_clusters, refcount;
+    uint64_t i, nb_clusters;
     int ret;
 
     /* We can't allocate clusters if they may still be queued for discard. */
@@ -979,11 +998,11 @@ static int64_t alloc_clusters_noref(BlockDriverState *bs, uint64_t size,
 retry:
     for(i = 0; i < nb_clusters; i++) {
         uint64_t next_cluster_index = s->free_cluster_index++;
-        ret = qcow2_get_refcount(bs, next_cluster_index, &refcount);
 
+        ret = is_cluster_free(bs, next_cluster_index);
         if (ret < 0) {
             return ret;
-        } else if (refcount != 0) {
+        } else if (!ret) {
             goto retry;
         }
     }
@@ -1030,7 +1049,7 @@ int64_t qcow2_alloc_clusters_at(BlockDriverState *bs, uint64_t offset,
                                 int64_t nb_clusters)
 {
     BDRVQcow2State *s = bs->opaque;
-    uint64_t cluster_index, refcount;
+    uint64_t cluster_index;
     uint64_t i;
     int ret;
 
@@ -1043,10 +1062,10 @@ int64_t qcow2_alloc_clusters_at(BlockDriverState *bs, uint64_t offset,
         /* Check how many clusters there are free */
         cluster_index = offset >> s->cluster_bits;
         for(i = 0; i < nb_clusters; i++) {
-            ret = qcow2_get_refcount(bs, cluster_index++, &refcount);
+            ret = is_cluster_free(bs, cluster_index++);
             if (ret < 0) {
                 return ret;
-            } else if (refcount != 0) {
+            } else if (!ret) {
                 break;
             }
         }
-- 
2.29.2



^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v7 09/11] qcow2: don't reallocate host clusters under guest operation
  2021-09-04 16:24 [PATCH v7 00/11] qcow2: fix parallel rewrite and discard (reqlist) Vladimir Sementsov-Ogievskiy
                   ` (7 preceding siblings ...)
  2021-09-04 16:24 ` [PATCH v7 08/11] qcow2: introduce is_cluster_free() helper Vladimir Sementsov-Ogievskiy
@ 2021-09-04 16:24 ` Vladimir Sementsov-Ogievskiy
  2021-09-04 16:24 ` [PATCH v7 10/11] block/reqlist: implement reqlist_mark_req_invalid() Vladimir Sementsov-Ogievskiy
                   ` (2 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2021-09-04 16:24 UTC (permalink / raw)
  To: qemu-block; +Cc: qemu-devel, vsementsov, hreitz, kwolf

We have the following bug:

1. Start write to qcow2. Assume guest cluster G and corresponding host
   cluster is H.

2. The write requests come to the point of data writing to .file. The
   write to .file is started and qcow2 mutex is unlocked.

3. At this time refcount of H becomes 0. For example, it may be due to
   discard operation on qcow2 node, or rewriting compressed data by
   normal write, or some operation with snapshots..

4. Next, some operations occurs and leads to allocation of H for some
   other needs: it may be another write-to-qcow2-node operation, or
   allocation of L2 table or some other data or metadata cluster
   allocation.

5. So, at this point H is used for something other. Assume, L2 table is
   written into H.

6. And now, our write from [2] finishes. And pollutes L2 table in H.
   That's a bug.

To fix the bug we now have guest operations tracking. The remaining
thing to do is to handle these in-flight operations in
is_cluster_free() function which is the core of allocating host
clusters.

iotest qcow2-discard-during-rewrite is enabled, as it works now.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 block/qcow2-refcount.c                                | 7 +++++--
 tests/qemu-iotests/tests/qcow2-discard-during-rewrite | 2 +-
 2 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index 13b1fed43e..e71bb9b089 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -962,7 +962,7 @@ int qcow2_update_cluster_refcount(BlockDriverState *bs,
 
 
 /*
- * Cluster is free when its refcount is 0
+ * Cluster is free when its refcount is 0 and there is no in-flight writes
  *
  * Return < 0 if failed to get refcount
  *          0 if cluster is not free
@@ -970,6 +970,7 @@ int qcow2_update_cluster_refcount(BlockDriverState *bs,
  */
 static int is_cluster_free(BlockDriverState *bs, int64_t cluster_index)
 {
+    BDRVQcow2State *s = bs->opaque;
     int ret;
     uint64_t refcount;
 
@@ -978,7 +979,9 @@ static int is_cluster_free(BlockDriverState *bs, int64_t cluster_index)
         return ret;
     }
 
-    return refcount == 0;
+    return refcount == 0 &&
+        !reqlist_find_conflict(&s->guest_reqs, cluster_index * s->cluster_size,
+                               s->cluster_size);
 }
 
 /* return < 0 if error */
diff --git a/tests/qemu-iotests/tests/qcow2-discard-during-rewrite b/tests/qemu-iotests/tests/qcow2-discard-during-rewrite
index 7f0d8a107a..2e2e0d2cb0 100755
--- a/tests/qemu-iotests/tests/qcow2-discard-during-rewrite
+++ b/tests/qemu-iotests/tests/qcow2-discard-during-rewrite
@@ -1,5 +1,5 @@
 #!/usr/bin/env bash
-# group: quick disabled
+# group: quick
 #
 # Test discarding (and reusing) host cluster during writing data to it.
 #
-- 
2.29.2



^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v7 10/11] block/reqlist: implement reqlist_mark_req_invalid()
  2021-09-04 16:24 [PATCH v7 00/11] qcow2: fix parallel rewrite and discard (reqlist) Vladimir Sementsov-Ogievskiy
                   ` (8 preceding siblings ...)
  2021-09-04 16:24 ` [PATCH v7 09/11] qcow2: don't reallocate host clusters under guest operation Vladimir Sementsov-Ogievskiy
@ 2021-09-04 16:24 ` Vladimir Sementsov-Ogievskiy
  2021-09-04 16:24 ` [PATCH v7 11/11] qcow2: use reqlist_mark_req_invalid() Vladimir Sementsov-Ogievskiy
  2021-09-22  8:24 ` [PATCH v7 00/11] qcow2: fix parallel rewrite and discard (reqlist) Vladimir Sementsov-Ogievskiy
  11 siblings, 0 replies; 13+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2021-09-04 16:24 UTC (permalink / raw)
  To: qemu-block; +Cc: qemu-devel, vsementsov, hreitz, kwolf

We do lock qcow2 s->lock only to remove request from the reqlist.
That's quite inefficient. Let's implement atomic operation to avoid
extra critical section.

So new interface is:

1. Instead of reqlist_free_req() user may call atomic
   reqlist_mark_req_invalid().

2. At some moment under mutex user calls reqlist_free_invalid_reqs() to
   free RAM.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 include/block/reqlist.h | 13 +++++++++++++
 block/reqlist.c         | 23 ++++++++++++++++++++++-
 2 files changed, 35 insertions(+), 1 deletion(-)

diff --git a/include/block/reqlist.h b/include/block/reqlist.h
index 32dc87666f..24d6d93a6e 100644
--- a/include/block/reqlist.h
+++ b/include/block/reqlist.h
@@ -26,6 +26,7 @@
 typedef struct BlockReq {
     int64_t offset;
     int64_t bytes;
+    bool valid;
 
     CoQueue wait_queue; /* coroutines blocked on this req */
     QLIST_ENTRY(BlockReq) list;
@@ -84,4 +85,16 @@ static inline void reqlist_free_req(BlockReq *req)
     }
 }
 
+/*
+ * Invalid request will be ignored when searching for conflicts.
+ * The function modifies .valid atomically and intended for use when we
+ * want to avoid using mutex.
+ * If you use this function don't forget to also call
+ * reqlist_free_invalid_reqs() sometimes, so that list doesn't grow endlessly.
+ */
+void reqlist_mark_req_invalid(BlockReq *req);
+
+/* Remove all invalid requests to free RAM space */
+void reqlist_free_invalid_reqs(BlockReqList *reqs);
+
 #endif /* REQLIST_H */
diff --git a/block/reqlist.c b/block/reqlist.c
index c580752db7..641307d80d 100644
--- a/block/reqlist.c
+++ b/block/reqlist.c
@@ -14,6 +14,8 @@
 
 #include "qemu/osdep.h"
 
+#include "qemu/atomic.h"
+
 #include "block/reqlist.h"
 
 void reqlist_init_req(BlockReqList *reqs, BlockReq *req, int64_t offset,
@@ -22,6 +24,7 @@ void reqlist_init_req(BlockReqList *reqs, BlockReq *req, int64_t offset,
     *req = (BlockReq) {
         .offset = offset,
         .bytes = bytes,
+        .valid = true,
     };
     qemu_co_queue_init(&req->wait_queue);
     QLIST_INSERT_HEAD(reqs, req, list);
@@ -33,7 +36,9 @@ BlockReq *reqlist_find_conflict(BlockReqList *reqs, int64_t offset,
     BlockReq *r;
 
     QLIST_FOREACH(r, reqs, list) {
-        if (offset + bytes > r->offset && offset < r->offset + r->bytes) {
+        if (r->valid &&
+            offset + bytes > r->offset && offset < r->offset + r->bytes)
+        {
             return r;
         }
     }
@@ -72,3 +77,19 @@ void coroutine_fn reqlist_remove_req(BlockReq *req)
     QLIST_REMOVE(req, list);
     qemu_co_queue_restart_all(&req->wait_queue);
 }
+
+void reqlist_mark_req_invalid(BlockReq *req)
+{
+    qatomic_set(&req->valid, false);
+}
+
+void reqlist_free_invalid_reqs(BlockReqList *reqs)
+{
+    BlockReq *r, *next;
+
+    QLIST_FOREACH_SAFE(r, reqs, list, next) {
+        if (!r->valid) {
+            reqlist_free_req(r);
+        }
+    }
+}
-- 
2.29.2



^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v7 11/11] qcow2: use reqlist_mark_req_invalid()
  2021-09-04 16:24 [PATCH v7 00/11] qcow2: fix parallel rewrite and discard (reqlist) Vladimir Sementsov-Ogievskiy
                   ` (9 preceding siblings ...)
  2021-09-04 16:24 ` [PATCH v7 10/11] block/reqlist: implement reqlist_mark_req_invalid() Vladimir Sementsov-Ogievskiy
@ 2021-09-04 16:24 ` Vladimir Sementsov-Ogievskiy
  2021-09-22  8:24 ` [PATCH v7 00/11] qcow2: fix parallel rewrite and discard (reqlist) Vladimir Sementsov-Ogievskiy
  11 siblings, 0 replies; 13+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2021-09-04 16:24 UTC (permalink / raw)
  To: qemu-block; +Cc: qemu-devel, vsementsov, hreitz, kwolf

Instead of small critical sections which wants only to remove a
request from the list let's use new atomic interface. And don't forget
to call reqlist_free_invalid_reqs() when we are in a critical section
anyway, to not overflow the RAM with invalid requests.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 block/qcow2.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index aefe6558b6..f2094c1ecc 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -2306,9 +2306,7 @@ static coroutine_fn int qcow2_co_preadv_task(BlockDriverState *bs,
     }
 
     if (req) {
-        WITH_QEMU_LOCK_GUARD(&s->lock) {
-            reqlist_free_req(req);
-        }
+        reqlist_mark_req_invalid(req);
     }
 
     return ret;
@@ -2348,6 +2346,7 @@ static coroutine_fn int qcow2_co_preadv_part(BlockDriverState *bs,
         }
 
         qemu_co_mutex_lock(&s->lock);
+        reqlist_free_invalid_reqs(&s->guest_reqs);
         ret = qcow2_get_host_offset(bs, offset, &cur_bytes,
                                     &host_offset, &type, &req);
         qemu_co_mutex_unlock(&s->lock);
@@ -2769,6 +2768,8 @@ static void qcow2_close(BlockDriverState *bs)
 
     qcow2_refcount_close(bs);
     qcow2_free_snapshots(bs);
+
+    reqlist_free_invalid_reqs(&s->guest_reqs);
 }
 
 static void coroutine_fn qcow2_co_invalidate_cache(BlockDriverState *bs,
@@ -4619,6 +4620,7 @@ qcow2_co_pwritev_compressed_task(BlockDriverState *bs,
     }
 
     qemu_co_mutex_lock(&s->lock);
+    reqlist_free_invalid_reqs(&s->guest_reqs);
     ret = qcow2_alloc_compressed_cluster_offset(bs, offset, out_len,
                                                 &cluster_offset, &req);
     if (ret < 0) {
@@ -4641,9 +4643,7 @@ success:
     ret = 0;
 fail:
     if (req) {
-        WITH_QEMU_LOCK_GUARD(&s->lock) {
-            reqlist_free_req(req);
-        }
+        reqlist_mark_req_invalid(req);
     }
     qemu_vfree(buf);
     g_free(out_buf);
-- 
2.29.2



^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH v7 00/11] qcow2: fix parallel rewrite and discard (reqlist)
  2021-09-04 16:24 [PATCH v7 00/11] qcow2: fix parallel rewrite and discard (reqlist) Vladimir Sementsov-Ogievskiy
                   ` (10 preceding siblings ...)
  2021-09-04 16:24 ` [PATCH v7 11/11] qcow2: use reqlist_mark_req_invalid() Vladimir Sementsov-Ogievskiy
@ 2021-09-22  8:24 ` Vladimir Sementsov-Ogievskiy
  11 siblings, 0 replies; 13+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2021-09-22  8:24 UTC (permalink / raw)
  To: qemu-block; +Cc: qemu-devel, hreitz, kwolf

Ping.

Hi Kevin!

Could you look at description in cover-letter and 07-09, and say do you like the approach in general? Then I can resend with reqlist included and without "Based-on".

Or do you still prefer the solution with RWLock?

I don't like RWLock-based blocking solution, because:

1. Mirror does discards in parallel with other requests, so blocking discard may decrease mirror performance
2. Backup (in push-backup-with-fleecing scheme) will do discards on temporary image, again blocking discard is bad in this case
3. block-commit and block-stream will at some moment do discard-after-copy to not waste disk space
4. It doesn't seem possible to pass ownership on RWLock to task coroutine (as we can't lock it in one coroutine and release in another)


04.09.2021 19:24, Vladimir Sementsov-Ogievskiy wrote:
> Hi all!
> 
> This is a new (third :)) variant of solving the problem in qcow2 that we
> lack synchronisation in reallocating host clusters with refcount=0 and
> guest reads/writes that may still be in progress on these clusters.
> 
> Previous version was
> "[PATCH v6 00/12] qcow2: fix parallel rewrite and discard (lockless)"
> Supersedes: <20210422163046.442932-1-vsementsov@virtuozzo.com>
> 
> Key features of new solution:
> 
> 1. Simpler data structure: requests list instead of "dynamic refcount"
>     concept. What is good is that request list data structure is
>     implemented and used in other my series. So we can improve and reuse
>     it.
> 
> 2. Don't care about discrads: my previous attempts were complicated by
>     trying to postpone discards when we have in-flight operations on
>     cluster which refcounts becomes 0. Don't bother with it anymore.
>     Nothing bad in discard: it similar as when guest does simultaneous
>     writes into same cluster: we don't care to serialize these writes.
>     So keep discards as is.
> 
> So now the whole fix may be described in one sentence: don't consider a
> host cluster "free" (for allocation) if there are still some in-flight
> guest operations on it.
> 
> What patches are:
> 
> 1-2: simple additions to reqlist (see also *)
> 
> 3: test. It's unchanged since previous solution
> 
> 4-5: simpler refactorings
> 
> 6-7: implement guest requests tracking in qcow2
> 
> 8: refactoring for [9]
> 9: BUG FIX: use request tracking to avoid reallocating clusters under
>     guest operation
> 
> 10-11: improvement if request tracking to avoid extra small critical
> sections that were added in [7]
> 
> [*]:
> For this series to work we also need
>   "[PATCH v2 06/19] block: intoduce reqlist":
> Based-on: <20210827181808.311670-7-vsementsov@virtuozzo.com>
> 
> Still, note that we use only simple core of reqlist and don't use its
> coroutine-related features. That probably means that I should split a
> kind of BlockReqListBase data structure out of BlockReqList, and then
> will have separate
> 
> CoBlockReqList for "[PATCH v2 00/19] Make image fleecing more usable"
> series and Qcow2ReqList for this series..
> 
> But that a side question.
> 
> Vladimir Sementsov-Ogievskiy (11):
>    block/reqlist: drop extra assertion
>    block/reqlist: add reqlist_new_req() and reqlist_free_req()
>    iotests: add qcow2-discard-during-rewrite
>    qcow2: introduce qcow2_parse_compressed_cluster_descriptor()
>    qcow2: refactor qcow2_co_preadv_task() to have one return
>    qcow2: prepare for tracking guest io requests in data_file
>    qcow2: track guest io requests in data_file
>    qcow2: introduce is_cluster_free() helper
>    qcow2: don't reallocate host clusters under guest operation
>    block/reqlist: implement reqlist_mark_req_invalid()
>    qcow2: use reqlist_mark_req_invalid()
> 
>   block/qcow2.h                                 |  15 ++-
>   include/block/reqlist.h                       |  33 ++++++
>   block/qcow2-cluster.c                         |  45 ++++++-
>   block/qcow2-refcount.c                        |  34 +++++-
>   block/qcow2.c                                 | 112 +++++++++++++-----
>   block/reqlist.c                               |  25 +++-
>   .../tests/qcow2-discard-during-rewrite        |  72 +++++++++++
>   .../tests/qcow2-discard-during-rewrite.out    |  21 ++++
>   8 files changed, 310 insertions(+), 47 deletions(-)
>   create mode 100755 tests/qemu-iotests/tests/qcow2-discard-during-rewrite
>   create mode 100644 tests/qemu-iotests/tests/qcow2-discard-during-rewrite.out
> 


-- 
Best regards,
Vladimir


^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2021-09-22  8:26 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-09-04 16:24 [PATCH v7 00/11] qcow2: fix parallel rewrite and discard (reqlist) Vladimir Sementsov-Ogievskiy
2021-09-04 16:24 ` [PATCH v7 01/11] block/reqlist: drop extra assertion Vladimir Sementsov-Ogievskiy
2021-09-04 16:24 ` [PATCH v7 02/11] block/reqlist: add reqlist_new_req() and reqlist_free_req() Vladimir Sementsov-Ogievskiy
2021-09-04 16:24 ` [PATCH v7 03/11] iotests: add qcow2-discard-during-rewrite Vladimir Sementsov-Ogievskiy
2021-09-04 16:24 ` [PATCH v7 04/11] qcow2: introduce qcow2_parse_compressed_cluster_descriptor() Vladimir Sementsov-Ogievskiy
2021-09-04 16:24 ` [PATCH v7 05/11] qcow2: refactor qcow2_co_preadv_task() to have one return Vladimir Sementsov-Ogievskiy
2021-09-04 16:24 ` [PATCH v7 06/11] qcow2: prepare for tracking guest io requests in data_file Vladimir Sementsov-Ogievskiy
2021-09-04 16:24 ` [PATCH v7 07/11] qcow2: track " Vladimir Sementsov-Ogievskiy
2021-09-04 16:24 ` [PATCH v7 08/11] qcow2: introduce is_cluster_free() helper Vladimir Sementsov-Ogievskiy
2021-09-04 16:24 ` [PATCH v7 09/11] qcow2: don't reallocate host clusters under guest operation Vladimir Sementsov-Ogievskiy
2021-09-04 16:24 ` [PATCH v7 10/11] block/reqlist: implement reqlist_mark_req_invalid() Vladimir Sementsov-Ogievskiy
2021-09-04 16:24 ` [PATCH v7 11/11] qcow2: use reqlist_mark_req_invalid() Vladimir Sementsov-Ogievskiy
2021-09-22  8:24 ` [PATCH v7 00/11] qcow2: fix parallel rewrite and discard (reqlist) Vladimir Sementsov-Ogievskiy

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.