* [Qemu-devel] Strange virtio regression on mainline and stable-0.10
@ 2009-05-05 9:52 Avi Kivity
2009-05-05 10:03 ` [Qemu-devel] " Avi Kivity
` (2 more replies)
0 siblings, 3 replies; 7+ messages in thread
From: Avi Kivity @ 2009-05-05 9:52 UTC (permalink / raw)
To: Kevin Wolf; +Cc: qemu-devel
Running the Fedora 10 installer on a virtio disk on current master and
on v0.10.3 will cause the installer to complain when mounting the
freshly formatted filesystems. A bisect (on stable-0.10) yielded
4df8f71ee5276fb22a810bbb67db3a4288f5cad4 is first bad commit
commit 4df8f71ee5276fb22a810bbb67db3a4288f5cad4
Author: aliguori <aliguori@c046a42c-6fe2-441c-8c8c-71466251a162>
Date: Fri Apr 17 20:44:41 2009 +0000
qcow2 corruption: Fix alloc_cluster_link_l2 (Kevin Wolf)
This patch fixes a qcow2 corruption bug introduced in SVN Rev 5861.
L2 tables
are big endian, so entries must be converted before being passed to
functions.
This bug is easy to trigger. The following script will create and
destroy a
qcow2 image (the header is gone after three loop iterations):
#!/bin/bash
qemu-img create -f qcow2 test.qcow 1M
for i in $(seq 1 10); do
qemu-system-x86_64 -hda test.qcow -monitor stdio > /dev/null
2>&1 <<EOF
savevm test-$i
quit
EOF
done
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
As the guilty commit.
I don't understand it, as free_any_clusters() shouldn't be called while
formatting. Any ideas?
--
error compiling committee.c: too many arguments to function
^ permalink raw reply [flat|nested] 7+ messages in thread
* [Qemu-devel] Re: Strange virtio regression on mainline and stable-0.10
2009-05-05 9:52 [Qemu-devel] Strange virtio regression on mainline and stable-0.10 Avi Kivity
@ 2009-05-05 10:03 ` Avi Kivity
[not found] ` <4A001563.1020604@redhat.com>
2009-05-05 16:18 ` [Qemu-devel] " Avi Kivity
2 siblings, 0 replies; 7+ messages in thread
From: Avi Kivity @ 2009-05-05 10:03 UTC (permalink / raw)
To: Kevin Wolf; +Cc: qemu-devel
Avi Kivity wrote:
>
> I don't understand it, as free_any_clusters() shouldn't be called
> while formatting. Any ideas?
>
Here's the write log with the free_any_clusters() calls:
Formatting '/images/tmp.img', fmt=qcow2, size=10485760 kB
qcow_aio_write: sector 0 count 1
qcow_aio_write: sector 0 count 1
qcow_aio_write: sector 620e1 count 8
qcow_aio_write: sector 620d9 count 8
qcow_aio_write: sector 620e1 count 8
qcow_aio_write: sector 620d9 count 8
qcow_aio_write: sector a302d9 count 8
qcow_aio_write: sector 620e1 count 8
qcow_aio_write: sector 620e1 count 8
qcow_aio_write: sector 620e1 count 8
qcow_aio_write: sector 620e1 count 8
qcow_aio_write: sector 620e1 count 8
qcow_aio_write: sector 620e1 count 8
qcow_aio_write: sector 62259 count 8
qcow_aio_write: sector 620e1 count 8
qcow_aio_write: sector 620e1 count 8
qcow_aio_write: sector ff2259 count 8
qcow_aio_write: sector ff2259 count 32
qcow_aio_write: sector ff2259 count 8
qcow_aio_write: sector 62259 count 8
qcow_aio_write: sector 63201 count 248
qcow_aio_write: sector 632f9 count 248
qcow_aio_write: sector 633f1 count 248
qcow_aio_write: sector 634e9 count 8
free_any_clusters: 800000000007f000 1
free_any_clusters: 800000000005f000 1
free_any_clusters: 800000000003f000 1
--
error compiling committee.c: too many arguments to function
^ permalink raw reply [flat|nested] 7+ messages in thread
* [Qemu-devel] Re: Strange virtio regression on mainline and stable-0.10
[not found] ` <4A001563.1020604@redhat.com>
@ 2009-05-05 11:46 ` Avi Kivity
0 siblings, 0 replies; 7+ messages in thread
From: Avi Kivity @ 2009-05-05 11:46 UTC (permalink / raw)
To: Kevin Wolf; +Cc: qemu-devel
[-- Attachment #1: Type: text/plain, Size: 1425 bytes --]
(re-added qemu-devel)
Kevin Wolf wrote:
> Avi Kivity schrieb:
>
>> Running the Fedora 10 installer on a virtio disk on current master and
>> on v0.10.3 will cause the installer to complain when mounting the
>> freshly formatted filesystems.
>>
>
> Could be related to https://bugzilla.redhat.com/show_bug.cgi?id=498405
>
>
It appears to be the same.
> I encountered IO errors with qcow2 and virtio this morning, so I already
> wanted to start debugging it. Unfortunately, with current master extboot
> seems to be broken for me and so I can't even boot with virtio. Have not
> figured out yet what's going wrong here.
>
I'm using the installer so I don't depend on extboot. Of course the
extboot issue needs to be fixed as well.
>> I don't understand it, as free_any_clusters() shouldn't be called while
>> formatting. Any ideas?
>>
>
> free_any_clusters shouldn't be called at all without snapshots or
> backing files.
>
> qemu-img check notices refcount errors on my broken image though, so
> this could still be the same.
data point: if I force writes to be serialized in block-qcow2.c, then
everything works, and there are no calls to free_any_clusters(). It's
likely that there is some lack of serialization in the allocation code,
some write that is in flight does not update the global state, only
acb-local state.
--
error compiling committee.c: too many arguments to function
[-- Attachment #2: serialize-qcow2.patch --]
[-- Type: text/x-patch, Size: 1154 bytes --]
diff --git a/block-qcow2.c b/block-qcow2.c
index c4cd38d..151e688 100644
--- a/block-qcow2.c
+++ b/block-qcow2.c
@@ -1443,6 +1443,9 @@ static BlockDriverAIOCB *qcow_aio_readv(BlockDriverState *bs,
return &acb->common;
}
+static int nb_writes;
+static BlockDriverAIOCB *pending_writes[300];
+
static void qcow_aio_write_cb(void *opaque, int ret)
{
QCowAIOCB *acb = opaque;
@@ -1518,6 +1521,18 @@ done:
qemu_vfree(acb->orig_buf);
acb->common.cb(acb->common.opaque, ret);
qemu_aio_release(acb);
+
+ {
+ int i;
+
+ for (i = 1; i < nb_writes; ++i) {
+ pending_writes[i - 1] = pending_writes[i];
+ }
+ --nb_writes;
+ if (nb_writes) {
+ qcow_aio_write_cb(pending_writes[0], 0);
+ }
+ }
}
static BlockDriverAIOCB *qcow_aio_writev(BlockDriverState *bs,
@@ -1535,7 +1550,12 @@ static BlockDriverAIOCB *qcow_aio_writev(BlockDriverState *bs,
if (!acb)
return NULL;
- qcow_aio_write_cb(acb, 0);
+ pending_writes[nb_writes++] = acb;
+
+ if (nb_writes == 1) {
+ qcow_aio_write_cb(acb, 0);
+ }
+
return &acb->common;
}
^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [Qemu-devel] Strange virtio regression on mainline and stable-0.10
2009-05-05 9:52 [Qemu-devel] Strange virtio regression on mainline and stable-0.10 Avi Kivity
2009-05-05 10:03 ` [Qemu-devel] " Avi Kivity
[not found] ` <4A001563.1020604@redhat.com>
@ 2009-05-05 16:18 ` Avi Kivity
2009-05-05 18:18 ` Avi Kivity
2009-05-06 8:14 ` Kevin Wolf
2 siblings, 2 replies; 7+ messages in thread
From: Avi Kivity @ 2009-05-05 16:18 UTC (permalink / raw)
To: Kevin Wolf; +Cc: qemu-devel
Avi Kivity wrote:
> Running the Fedora 10 installer on a virtio disk on current master and
> on v0.10.3 will cause the installer to complain when mounting the
> freshly formatted filesystems.
The problem is that qcow2 does a read-modify-write on
non-cluster-aligned writes. So the following sequence triggers the bug:
guest: write(sector=0, count=1) (A)
guest: write(sector=1, count=1) (B)
qemu: read(sector=0, count=16) (A)
qemu: submit aio write (sector=0, count=16) (A)
qemu: read(sector=0, count=16) (B)
qemu: submit aio write (sector=0, count=16) (B) - contains data from
before A's write
This could be solved by maintaining a hash table of refcounted RMW
copies for the disk. When reading for a RMW, look up the hash table, if
there's a copy there, use it instead of reading it yourself.
We should also avoid the RMW for non-compressed, non-encrypted clusters,
as virtually ALL writes will be misaligned.
--
error compiling committee.c: too many arguments to function
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [Qemu-devel] Strange virtio regression on mainline and stable-0.10
2009-05-05 16:18 ` [Qemu-devel] " Avi Kivity
@ 2009-05-05 18:18 ` Avi Kivity
2009-05-06 8:14 ` Kevin Wolf
1 sibling, 0 replies; 7+ messages in thread
From: Avi Kivity @ 2009-05-05 18:18 UTC (permalink / raw)
To: Kevin Wolf; +Cc: qemu-devel
Avi Kivity wrote:
> Avi Kivity wrote:
>> Running the Fedora 10 installer on a virtio disk on current master
>> and on v0.10.3 will cause the installer to complain when mounting the
>> freshly formatted filesystems.
>
> The problem is that qcow2 does a read-modify-write on
> non-cluster-aligned writes. So the following sequence triggers the bug:
No, that's not the problem.
Now I think the problem occurs if the guest does two non-overlapping
writes in parallel that hit the same cluster. Initially the cluster is
not allocated, so the two writes will go into two newly allocated
clusters. When qcow2 tries to update the block pointers, something
strange happens.
--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [Qemu-devel] Strange virtio regression on mainline and stable-0.10
2009-05-05 16:18 ` [Qemu-devel] " Avi Kivity
2009-05-05 18:18 ` Avi Kivity
@ 2009-05-06 8:14 ` Kevin Wolf
2009-05-06 8:34 ` Avi Kivity
1 sibling, 1 reply; 7+ messages in thread
From: Kevin Wolf @ 2009-05-06 8:14 UTC (permalink / raw)
To: Avi Kivity; +Cc: qemu-devel
Avi Kivity schrieb:
> Avi Kivity wrote:
>> Running the Fedora 10 installer on a virtio disk on current master and
>> on v0.10.3 will cause the installer to complain when mounting the
>> freshly formatted filesystems.
>
> The problem is that qcow2 does a read-modify-write on
> non-cluster-aligned writes. So the following sequence triggers the bug:
> [...]
>
> This could be solved by maintaining a hash table of refcounted RMW
> copies for the disk. When reading for a RMW, look up the hash table, if
> there's a copy there, use it instead of reading it yourself.
>
> We should also avoid the RMW for non-compressed, non-encrypted clusters,
> as virtually ALL writes will be misaligned.
I don't think there is a RMW except for the COW case which is
unavoidable and obviously happens only once for each cluster. Do you see
any other places where this happens?
Kevin
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [Qemu-devel] Strange virtio regression on mainline and stable-0.10
2009-05-06 8:14 ` Kevin Wolf
@ 2009-05-06 8:34 ` Avi Kivity
0 siblings, 0 replies; 7+ messages in thread
From: Avi Kivity @ 2009-05-06 8:34 UTC (permalink / raw)
To: Kevin Wolf; +Cc: qemu-devel
Kevin Wolf wrote:
> Avi Kivity schrieb:
>
>> Avi Kivity wrote:
>>
>>> Running the Fedora 10 installer on a virtio disk on current master and
>>> on v0.10.3 will cause the installer to complain when mounting the
>>> freshly formatted filesystems.
>>>
>> The problem is that qcow2 does a read-modify-write on
>> non-cluster-aligned writes. So the following sequence triggers the bug:
>> [...]
>>
>> This could be solved by maintaining a hash table of refcounted RMW
>> copies for the disk. When reading for a RMW, look up the hash table, if
>> there's a copy there, use it instead of reading it yourself.
>>
>> We should also avoid the RMW for non-compressed, non-encrypted clusters,
>> as virtually ALL writes will be misaligned.
>>
>
> I don't think there is a RMW except for the COW case which is
> unavoidable and obviously happens only once for each cluster. Do you see
> any other places where this happens?
>
No, I misread the code. I think the real problem is two parallel writes
for the same cluster (but different sectors) started concurrently, so
get_cluster_offset() places them in different clusters. When the second
write completes we get unexpected results since the metadata now
contains a block where on the start of the operation it was unallocated.
We could probably get away with serializing only writes that hit the
same cluster. A better approach may be to try to place parallel
sequential writes contiguously in the allocating case.
--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2009-05-06 8:35 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-05-05 9:52 [Qemu-devel] Strange virtio regression on mainline and stable-0.10 Avi Kivity
2009-05-05 10:03 ` [Qemu-devel] " Avi Kivity
[not found] ` <4A001563.1020604@redhat.com>
2009-05-05 11:46 ` Avi Kivity
2009-05-05 16:18 ` [Qemu-devel] " Avi Kivity
2009-05-05 18:18 ` Avi Kivity
2009-05-06 8:14 ` Kevin Wolf
2009-05-06 8:34 ` Avi Kivity
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.