QEMU-Devel Archive on lore.kernel.org
From: Alberto Garcia <berto@igalia.com>
To: qemu-devel@nongnu.org
Cc: Kevin Wolf <kwolf@redhat.com>,
	Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>,
	Alberto Garcia <berto@igalia.com>,
	qemu-block@nongnu.org, Derek Su <dereksu@qnap.com>,
	Max Reitz <mreitz@redhat.com>
Subject: [PATCH v9 20/34] qcow2: Add subcluster support to calculate_l2_meta()
Date: Sun, 28 Jun 2020 13:02:29 +0200
Message-ID: <12ee527c0e8a80694fd249a38f106927062e3b44.1593342067.git.berto@igalia.com> (raw)
In-Reply-To: <cover.1593342067.git.berto@igalia.com>

If an image has subclusters, there are more copy-on-write
scenarios that we need to consider. Let's say we have a write request
from the middle of subcluster #3 until the end of the cluster:

1) If we are writing to a newly allocated cluster, we need
   copy-on-write. The previous contents of subclusters #0 to #3 must
   be copied to the new cluster. We can optimize this process by
   skipping all leading unallocated or zero subclusters (the status of
   those skipped subclusters will be reflected in the new L2 bitmap).

2) If we are overwriting an existing cluster:

   2.1) If subcluster #3 is unallocated or has the all-zeroes bit set,
        we need copy-on-write (on subcluster #3 only).

   2.2) If subcluster #3 was already allocated, there is no need
        for any copy-on-write. However, we still need to update the L2
        bitmap to reflect possible changes in the allocation status of
        subclusters #4 to #31. Because of this, calculate_l2_meta()
        returns early without creating a new QCowL2Meta structure only
        if all the overwritten subclusters are already allocated.

After all these changes, l2meta_cow_start() and l2meta_cow_end()
are no longer necessarily cluster-aligned. We need to update the
calculation of old_start and old_end in handle_dependencies() to
guarantee that no two requests try to write to the same cluster.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
---
 block/qcow2-cluster.c | 163 +++++++++++++++++++++++++++++++++---------
 1 file changed, 131 insertions(+), 32 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index ed7b92dbb2..59dd9bda29 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -387,7 +387,6 @@ fail:
  * If the L2 entry is invalid return -errno and set @type to
  * QCOW2_SUBCLUSTER_INVALID.
  */
-G_GNUC_UNUSED
 static int qcow2_get_subcluster_range_type(BlockDriverState *bs,
                                            uint64_t l2_entry,
                                            uint64_t l2_bitmap,
@@ -1110,56 +1109,148 @@ void qcow2_alloc_cluster_abort(BlockDriverState *bs, QCowL2Meta *m)
  * If @keep_old is true it means that the clusters were already
  * allocated and will be overwritten. If false then the clusters are
  * new and we have to decrease the reference count of the old ones.
+ *
+ * Returns 0 on success, -errno on failure.
  */
-static void calculate_l2_meta(BlockDriverState *bs,
-                              uint64_t host_cluster_offset,
-                              uint64_t guest_offset, unsigned bytes,
-                              uint64_t *l2_slice, QCowL2Meta **m, bool keep_old)
+static int calculate_l2_meta(BlockDriverState *bs, uint64_t host_cluster_offset,
+                             uint64_t guest_offset, unsigned bytes,
+                             uint64_t *l2_slice, QCowL2Meta **m, bool keep_old)
 {
     BDRVQcow2State *s = bs->opaque;
-    int l2_index = offset_to_l2_slice_index(s, guest_offset);
-    uint64_t l2_entry;
+    int sc_index, l2_index = offset_to_l2_slice_index(s, guest_offset);
+    uint64_t l2_entry, l2_bitmap;
     unsigned cow_start_from, cow_end_to;
     unsigned cow_start_to = offset_into_cluster(s, guest_offset);
     unsigned cow_end_from = cow_start_to + bytes;
     unsigned nb_clusters = size_to_clusters(s, cow_end_from);
     QCowL2Meta *old_m = *m;
-    QCow2ClusterType type;
+    QCow2SubclusterType type;
+    int i;
+    bool skip_cow = keep_old;
 
     assert(nb_clusters <= s->l2_slice_size - l2_index);
 
-    /* Return if there's no COW (all clusters are normal and we keep them) */
-    if (keep_old) {
-        int i;
-        for (i = 0; i < nb_clusters; i++) {
-            l2_entry = get_l2_entry(s, l2_slice, l2_index + i);
-            if (qcow2_get_cluster_type(bs, l2_entry) != QCOW2_CLUSTER_NORMAL) {
-                break;
+    /* Check the type of all affected subclusters */
+    for (i = 0; i < nb_clusters; i++) {
+        l2_entry = get_l2_entry(s, l2_slice, l2_index + i);
+        l2_bitmap = get_l2_bitmap(s, l2_slice, l2_index + i);
+        if (skip_cow) {
+            unsigned write_from = MAX(cow_start_to, i << s->cluster_bits);
+            unsigned write_to = MIN(cow_end_from, (i + 1) << s->cluster_bits);
+            int first_sc = offset_to_sc_index(s, write_from);
+            int last_sc = offset_to_sc_index(s, write_to - 1);
+            int cnt = qcow2_get_subcluster_range_type(bs, l2_entry, l2_bitmap,
+                                                      first_sc, &type);
+            /* Is any of the subclusters of type != QCOW2_SUBCLUSTER_NORMAL ? */
+            if (type != QCOW2_SUBCLUSTER_NORMAL || first_sc + cnt <= last_sc) {
+                skip_cow = false;
             }
+        } else {
+            /* If we can't skip the cow we can still look for invalid entries */
+            type = qcow2_get_subcluster_type(bs, l2_entry, l2_bitmap, 0);
         }
-        if (i == nb_clusters) {
-            return;
+        if (type == QCOW2_SUBCLUSTER_INVALID) {
+            int l1_index = offset_to_l1_index(s, guest_offset);
+            uint64_t l2_offset = s->l1_table[l1_index] & L1E_OFFSET_MASK;
+            qcow2_signal_corruption(bs, true, -1, -1, "Invalid cluster "
+                                    "entry found (L2 offset: %#" PRIx64
+                                    ", L2 index: %#x)",
+                                    l2_offset, l2_index + i);
+            return -EIO;
         }
     }
 
+    if (skip_cow) {
+        return 0;
+    }
+
     /* Get the L2 entry of the first cluster */
     l2_entry = get_l2_entry(s, l2_slice, l2_index);
-    type = qcow2_get_cluster_type(bs, l2_entry);
+    l2_bitmap = get_l2_bitmap(s, l2_slice, l2_index);
+    sc_index = offset_to_sc_index(s, guest_offset);
+    type = qcow2_get_subcluster_type(bs, l2_entry, l2_bitmap, sc_index);
 
-    if (type == QCOW2_CLUSTER_NORMAL && keep_old) {
-        cow_start_from = cow_start_to;
+    if (!keep_old) {
+        switch (type) {
+        case QCOW2_SUBCLUSTER_COMPRESSED:
+            cow_start_from = 0;
+            break;
+        case QCOW2_SUBCLUSTER_NORMAL:
+        case QCOW2_SUBCLUSTER_ZERO_ALLOC:
+        case QCOW2_SUBCLUSTER_UNALLOCATED_ALLOC:
+            if (has_subclusters(s)) {
+                /* Skip all leading zero and unallocated subclusters */
+                uint32_t alloc_bitmap = l2_bitmap & QCOW_L2_BITMAP_ALL_ALLOC;
+                cow_start_from =
+                    MIN(sc_index, ctz32(alloc_bitmap)) << s->subcluster_bits;
+            } else {
+                cow_start_from = 0;
+            }
+            break;
+        case QCOW2_SUBCLUSTER_ZERO_PLAIN:
+        case QCOW2_SUBCLUSTER_UNALLOCATED_PLAIN:
+            cow_start_from = sc_index << s->subcluster_bits;
+            break;
+        default:
+            g_assert_not_reached();
+        }
     } else {
-        cow_start_from = 0;
+        switch (type) {
+        case QCOW2_SUBCLUSTER_NORMAL:
+            cow_start_from = cow_start_to;
+            break;
+        case QCOW2_SUBCLUSTER_ZERO_ALLOC:
+        case QCOW2_SUBCLUSTER_UNALLOCATED_ALLOC:
+            cow_start_from = sc_index << s->subcluster_bits;
+            break;
+        default:
+            g_assert_not_reached();
+        }
     }
 
     /* Get the L2 entry of the last cluster */
-    l2_entry = get_l2_entry(s, l2_slice, l2_index + nb_clusters - 1);
-    type = qcow2_get_cluster_type(bs, l2_entry);
+    l2_index += nb_clusters - 1;
+    l2_entry = get_l2_entry(s, l2_slice, l2_index);
+    l2_bitmap = get_l2_bitmap(s, l2_slice, l2_index);
+    sc_index = offset_to_sc_index(s, guest_offset + bytes - 1);
+    type = qcow2_get_subcluster_type(bs, l2_entry, l2_bitmap, sc_index);
 
-    if (type == QCOW2_CLUSTER_NORMAL && keep_old) {
-        cow_end_to = cow_end_from;
+    if (!keep_old) {
+        switch (type) {
+        case QCOW2_SUBCLUSTER_COMPRESSED:
+            cow_end_to = ROUND_UP(cow_end_from, s->cluster_size);
+            break;
+        case QCOW2_SUBCLUSTER_NORMAL:
+        case QCOW2_SUBCLUSTER_ZERO_ALLOC:
+        case QCOW2_SUBCLUSTER_UNALLOCATED_ALLOC:
+            cow_end_to = ROUND_UP(cow_end_from, s->cluster_size);
+            if (has_subclusters(s)) {
+                /* Skip all trailing zero and unallocated subclusters */
+                uint32_t alloc_bitmap = l2_bitmap & QCOW_L2_BITMAP_ALL_ALLOC;
+                cow_end_to -=
+                    MIN(s->subclusters_per_cluster - sc_index - 1,
+                        clz32(alloc_bitmap)) << s->subcluster_bits;
+            }
+            break;
+        case QCOW2_SUBCLUSTER_ZERO_PLAIN:
+        case QCOW2_SUBCLUSTER_UNALLOCATED_PLAIN:
+            cow_end_to = ROUND_UP(cow_end_from, s->subcluster_size);
+            break;
+        default:
+            g_assert_not_reached();
+        }
     } else {
-        cow_end_to = ROUND_UP(cow_end_from, s->cluster_size);
+        switch (type) {
+        case QCOW2_SUBCLUSTER_NORMAL:
+            cow_end_to = cow_end_from;
+            break;
+        case QCOW2_SUBCLUSTER_ZERO_ALLOC:
+        case QCOW2_SUBCLUSTER_UNALLOCATED_ALLOC:
+            cow_end_to = ROUND_UP(cow_end_from, s->subcluster_size);
+            break;
+        default:
+            g_assert_not_reached();
+        }
     }
 
     *m = g_malloc0(sizeof(**m));
@@ -1184,6 +1275,8 @@ static void calculate_l2_meta(BlockDriverState *bs,
 
     qemu_co_queue_init(&(*m)->dependent_requests);
     QLIST_INSERT_HEAD(&s->cluster_allocs, *m, next_in_flight);
+
+    return 0;
 }
 
 /*
@@ -1272,8 +1365,8 @@ static int handle_dependencies(BlockDriverState *bs, uint64_t guest_offset,
 
         uint64_t start = guest_offset;
         uint64_t end = start + bytes;
-        uint64_t old_start = l2meta_cow_start(old_alloc);
-        uint64_t old_end = l2meta_cow_end(old_alloc);
+        uint64_t old_start = start_of_cluster(s, l2meta_cow_start(old_alloc));
+        uint64_t old_end = ROUND_UP(l2meta_cow_end(old_alloc), s->cluster_size);
 
         if (end <= old_start || start >= old_end) {
             /* No intersection */
@@ -1398,8 +1491,11 @@ static int handle_copied(BlockDriverState *bs, uint64_t guest_offset,
                  - offset_into_cluster(s, guest_offset));
         assert(*bytes != 0);
 
-        calculate_l2_meta(bs, cluster_offset, guest_offset,
-                          *bytes, l2_slice, m, true);
+        ret = calculate_l2_meta(bs, cluster_offset, guest_offset,
+                                *bytes, l2_slice, m, true);
+        if (ret < 0) {
+            goto out;
+        }
 
         ret = 1;
     } else {
@@ -1575,8 +1671,11 @@ static int handle_alloc(BlockDriverState *bs, uint64_t guest_offset,
     *bytes = MIN(*bytes, nb_bytes - offset_into_cluster(s, guest_offset));
     assert(*bytes != 0);
 
-    calculate_l2_meta(bs, alloc_cluster_offset, guest_offset, *bytes, l2_slice,
-                      m, false);
+    ret = calculate_l2_meta(bs, alloc_cluster_offset, guest_offset, *bytes,
+                            l2_slice, m, false);
+    if (ret < 0) {
+        goto out;
+    }
 
     ret = 1;
 
-- 
2.20.1




Thread overview: 71+ messages
2020-06-28 11:02 [PATCH v9 00/34] Add subcluster allocation to qcow2 Alberto Garcia
2020-06-28 11:02 ` [PATCH v9 01/34] qcow2: Make Qcow2AioTask store the full host offset Alberto Garcia
2020-06-28 11:02 ` [PATCH v9 02/34] qcow2: Convert qcow2_get_cluster_offset() into qcow2_get_host_offset() Alberto Garcia
2020-06-30 10:19   ` Max Reitz
2020-06-30 10:27     ` Alberto Garcia
2020-06-28 11:02 ` [PATCH v9 03/34] qcow2: Add calculate_l2_meta() Alberto Garcia
2020-06-28 11:02 ` [PATCH v9 04/34] qcow2: Split cluster_needs_cow() out of count_cow_clusters() Alberto Garcia
2020-06-28 11:02 ` [PATCH v9 05/34] qcow2: Process QCOW2_CLUSTER_ZERO_ALLOC clusters in handle_copied() Alberto Garcia
2020-06-30 10:38   ` Max Reitz
2020-06-28 11:02 ` [PATCH v9 06/34] qcow2: Add get_l2_entry() and set_l2_entry() Alberto Garcia
2020-06-28 11:02 ` [PATCH v9 07/34] qcow2: Document the Extended L2 Entries feature Alberto Garcia
2020-06-28 11:02 ` [PATCH v9 08/34] qcow2: Add dummy has_subclusters() function Alberto Garcia
2020-06-28 11:02 ` [PATCH v9 09/34] qcow2: Add subcluster-related fields to BDRVQcow2State Alberto Garcia
2020-06-28 11:02 ` [PATCH v9 10/34] qcow2: Add offset_to_sc_index() Alberto Garcia
2020-06-28 11:02 ` [PATCH v9 11/34] qcow2: Add offset_into_subcluster() and size_to_subclusters() Alberto Garcia
2020-07-01 12:23   ` Max Reitz
2020-06-28 11:02 ` [PATCH v9 12/34] qcow2: Add l2_entry_size() Alberto Garcia
2020-06-28 11:02 ` [PATCH v9 13/34] qcow2: Update get/set_l2_entry() and add get/set_l2_bitmap() Alberto Garcia
2020-07-01 12:28   ` Max Reitz
2020-06-28 11:02 ` [PATCH v9 14/34] qcow2: Add QCow2SubclusterType and qcow2_get_subcluster_type() Alberto Garcia
2020-07-01 12:52   ` Max Reitz
2020-07-01 16:26     ` Alberto Garcia
2020-07-02  9:57       ` Max Reitz
2020-07-02 22:00         ` Alberto Garcia
2020-07-03  7:17           ` Max Reitz
2020-06-28 11:02 ` [PATCH v9 15/34] qcow2: Add qcow2_get_subcluster_range_type() Alberto Garcia
2020-07-01 13:37   ` Max Reitz
2020-06-28 11:02 ` [PATCH v9 16/34] qcow2: Add qcow2_cluster_is_allocated() Alberto Garcia
2020-07-01 13:55   ` Max Reitz
2020-06-28 11:02 ` [PATCH v9 17/34] qcow2: Add cluster type parameter to qcow2_get_host_offset() Alberto Garcia
2020-06-28 11:02 ` [PATCH v9 18/34] qcow2: Replace QCOW2_CLUSTER_* with QCOW2_SUBCLUSTER_* Alberto Garcia
2020-06-28 11:02 ` [PATCH v9 19/34] qcow2: Handle QCOW2_SUBCLUSTER_UNALLOCATED_ALLOC Alberto Garcia
2020-06-28 11:02 ` Alberto Garcia [this message]
2020-07-02 11:30   ` [PATCH v9 20/34] qcow2: Add subcluster support to calculate_l2_meta() Max Reitz
2020-06-28 11:02 ` [PATCH v9 21/34] qcow2: Add subcluster support to qcow2_get_host_offset() Alberto Garcia
2020-07-02 12:46   ` Max Reitz
2020-07-02 22:04     ` Alberto Garcia
2020-06-28 11:02 ` [PATCH v9 22/34] qcow2: Add subcluster support to zero_in_l2_slice() Alberto Garcia
2020-07-02 12:56   ` Max Reitz
2020-06-28 11:02 ` [PATCH v9 23/34] qcow2: Add subcluster support to discard_in_l2_slice() Alberto Garcia
2020-07-02 13:24   ` Max Reitz
2020-06-28 11:02 ` [PATCH v9 24/34] qcow2: Add subcluster support to check_refcounts_l2() Alberto Garcia
2020-07-02 13:32   ` Max Reitz
2020-06-28 11:02 ` [PATCH v9 25/34] qcow2: Update L2 bitmap in qcow2_alloc_cluster_link_l2() Alberto Garcia
2020-07-02 14:01   ` Max Reitz
2020-06-28 11:02 ` [PATCH v9 26/34] qcow2: Clear the L2 bitmap when allocating a compressed cluster Alberto Garcia
2020-06-28 11:02 ` [PATCH v9 27/34] qcow2: Add subcluster support to handle_alloc_space() Alberto Garcia
2020-06-28 11:02 ` [PATCH v9 28/34] qcow2: Add subcluster support to qcow2_co_pwrite_zeroes() Alberto Garcia
2020-07-02 14:28   ` Max Reitz
2020-07-02 22:40     ` Alberto Garcia
2020-07-03  7:18       ` Max Reitz
2020-06-28 11:02 ` [PATCH v9 29/34] qcow2: Add subcluster support to qcow2_measure() Alberto Garcia
2020-06-28 11:02 ` [PATCH v9 30/34] qcow2: Add prealloc field to QCowL2Meta Alberto Garcia
2020-07-02 14:50   ` Max Reitz
2020-07-02 14:58     ` Eric Blake
2020-07-02 15:09       ` Max Reitz
2020-07-02 23:05         ` Alberto Garcia
2020-07-03  7:22           ` Max Reitz
2020-06-28 11:02 ` [PATCH v9 31/34] qcow2: Add the 'extended_l2' option and the QCOW2_INCOMPAT_EXTL2 bit Alberto Garcia
2020-07-02 15:13   ` Max Reitz
2020-07-03 12:43     ` Alberto Garcia
2020-06-28 11:02 ` [PATCH v9 32/34] qcow2: Allow preallocation and backing files if extended_l2 is set Alberto Garcia
2020-07-03  7:45   ` Max Reitz
2020-06-28 11:02 ` [PATCH v9 33/34] qcow2: Assert that expand_zero_clusters_in_l1() does not support subclusters Alberto Garcia
2020-07-03  7:46   ` Max Reitz
2020-06-28 11:02 ` [PATCH v9 34/34] iotests: Add tests for qcow2 images with extended L2 entries Alberto Garcia
2020-07-03  9:49   ` Max Reitz
2020-07-03 13:06     ` Alberto Garcia
2020-07-03 13:47       ` Max Reitz
2020-07-03 15:20         ` Alberto Garcia
2020-07-06 13:57       ` Eric Blake
