From: Frediano Ziglio
Date: Fri, 5 Aug 2011 10:28:57 +0200
Subject: Re: [Qemu-devel] [PATCH v2 0.15.0] qcow2: Fix L1 table size after bdrv_snapshot_goto
To: Kevin Wolf
Cc: stefanha@gmail.com, qemu-devel@nongnu.org, hahn@univention.de
In-Reply-To: <4E3BA0EB.9050604@redhat.com>
References: <1312474673-2705-1-git-send-email-kwolf@redhat.com> <1312475099-3323-1-git-send-email-kwolf@redhat.com> <4E3BA0EB.9050604@redhat.com>

2011/8/5 Kevin Wolf:
> On 05.08.2011 08:35, Frediano Ziglio wrote:
>> 2011/8/4 Kevin Wolf:
>>> When loading an internal snapshot whose L1 table is smaller than the
>>> current L1 table, the size of the current L1 would be shrunk to the
>>> snapshot's L1 size in memory, but not on disk. This led to incorrect
>>> refcount updates and eventually to image corruption.
>>>
>>> Instead of writing the new L1 size to disk, this simply retains the
>>> bigger L1 size that is currently in use and makes sure that the
>>> unused part is zeroed.
>>>
>>> Signed-off-by: Kevin Wolf
>>> ---
>>>
>>> And the moment you send it out, you notice that it's wrong... *sigh*
>>>
>>> v2:
>>> - Check for s->l1_size > sn->l1_size in order to avoid disasters...
>>>
>>> Philipp, I think this should fix your corruption. Please give it a
>>> try.
>>>
>>> Anthony, this must go into 0.15. Given the short time until -rc2, do
>>> you prefer to pick it up directly or should I send a pull request
>>> tomorrow? The patch looks obvious, is tested with the given testcase
>>> and survives a basic qemu-iotests run (though qemu-iotests doesn't
>>> exercise snapshots a lot).
>>>
>>> Stefan, please review :-)
>>>
>>>  block/qcow2-snapshot.c |    5 ++++-
>>>  1 files changed, 4 insertions(+), 1 deletions(-)
>>>
>>> diff --git a/block/qcow2-snapshot.c b/block/qcow2-snapshot.c
>>> index 74823a5..6972e66 100644
>>> --- a/block/qcow2-snapshot.c
>>> +++ b/block/qcow2-snapshot.c
>>> @@ -330,8 +330,11 @@ int qcow2_snapshot_goto(BlockDriverState *bs, const char *snapshot_id)
>>>      if (qcow2_grow_l1_table(bs, sn->l1_size, true) < 0)
>>>          goto fail;
>>>
>>> -    s->l1_size = sn->l1_size;
>>> +    if (s->l1_size > sn->l1_size) {
>>> +        memset(s->l1_table + sn->l1_size, 0, s->l1_size - sn->l1_size);
>>> +    }
>>>      l1_size2 = s->l1_size * sizeof(uint64_t);
>>> +
>>>      /* copy the snapshot l1 table to the current l1 table */
>>>      if (bdrv_pread(bs->file, sn->l1_table_offset,
>>>                     s->l1_table, l1_size2) != l1_size2)
>>> --
>>> 1.7.6
>>>
>>
>> This patch looked odd at first sight. First, qcow2_grow_l1_table is
>> called to shrink L1 (so perhaps it should be named
>> qcow2_resize_l1_table).
>
> No, it doesn't shrink the table:
>
>     if (min_size <= s->l1_size)
>         return 0;
>

Yes, but perhaps returning success without clipping anything is not
quite correct; or perhaps qcow2_snapshot_goto should not call
qcow2_grow_l1_table with a shorter value in the first place.

>> Perhaps it would also be better to clear the entries in
>> qcow2_grow_l1_table instead of in qcow2_snapshot_goto, to avoid the
>> same problem in other callers of qcow2_grow_l1_table. The other
>> oddity (still to understand) is: why does some code use l1_table
>> above l1_size?
>
> Which code do you mean specifically?
>
> Kevin
>

I think this is the issue:

# cluster_size=1024 -> 128 entries per L2 table -> 128k per L2 table
# 128 L1 entries -> 16M covered by a full L1 table
qemu-img create -f qcow2 /tmp/sn.qcow2 16m -o cluster_size=1024
qemu-img snapshot -c foo /tmp/sn.qcow2
# extend L1
qemu-io -c 'write -b 0 4M' /tmp/sn.qcow2
# shrink
qemu-img snapshot -a foo /tmp/sn.qcow2
qemu-img check /tmp/sn.qcow2

Well... I was trying to get a leak and got a core instead. With your
patch removed I get leaks.

Also, shouldn't

    memset(s->l1_table + sn->l1_size, 0, s->l1_size - sn->l1_size);

be

    memset(s->l1_table + sn->l1_size, 0,
           (s->l1_size - sn->l1_size) * sizeof(uint64_t));

instead? The l1_size fields count uint64_t entries, while memset takes
its length in bytes.
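To make the byte/entry mix-up concrete, here is a tiny standalone
check (plain C, nothing qcow2-specific; the table size is made up):

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    uint64_t table[8];

    /* Pretend all 8 entries hold stale data. */
    memset(table, 0xff, sizeof(table));

    /* Passing an entry count as the length clears only 6 bytes,
     * i.e. not even one full entry: */
    memset(table + 2, 0, 8 - 2);
    printf("entry 2 after entry-count memset: 0x%016" PRIx64 "\n", table[2]);

    /* Scaling to bytes clears entries 2..7 completely: */
    memset(table + 2, 0, (8 - 2) * sizeof(uint64_t));
    printf("entry 2 after byte-count memset:  0x%016" PRIx64 "\n", table[2]);

    return 0;
}

The first printf still shows stale 0xff bytes in the entry; the second
prints zero.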
Frediano
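PS: with the length scaled to bytes, the hunk would read something
like this (an untested sketch on top of your patch, not a new
submission):

    if (s->l1_size > sn->l1_size) {
        /* Zero the tail entries beyond the snapshot's smaller L1 size;
         * the l1_size fields count entries, memset wants bytes. */
        memset(s->l1_table + sn->l1_size, 0,
               (s->l1_size - sn->l1_size) * sizeof(uint64_t));
    }
    l1_size2 = s->l1_size * sizeof(uint64_t);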