All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] Strange virtio regression on mainline and stable-0.10
@ 2009-05-05  9:52 Avi Kivity
  2009-05-05 10:03 ` [Qemu-devel] " Avi Kivity
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Avi Kivity @ 2009-05-05  9:52 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: qemu-devel

Running the Fedora 10 installer on a virtio disk on current master and 
on v0.10.3 will cause the installer to complain when mounting the 
freshly formatted filesystems.  A bisect (on stable-0.10) yielded

4df8f71ee5276fb22a810bbb67db3a4288f5cad4 is first bad commit
commit 4df8f71ee5276fb22a810bbb67db3a4288f5cad4
Author: aliguori <aliguori@c046a42c-6fe2-441c-8c8c-71466251a162>
Date:   Fri Apr 17 20:44:41 2009 +0000

    qcow2 corruption: Fix alloc_cluster_link_l2 (Kevin Wolf)
   
    This patch fixes a qcow2 corruption bug introduced in SVN Rev 5861. 
L2 tables
    are big endian, so entries must be converted before being passed to 
functions.
   
    This bug is easy to trigger. The following script will create and 
destroy a
    qcow2 image (the header is gone after three loop iterations):
   
        #!/bin/bash
        qemu-img create -f qcow2 test.qcow 1M
        for i in $(seq 1 10); do
        qemu-system-x86_64 -hda test.qcow -monitor stdio > /dev/null 
2>&1 <<EOF
        savevm test-$i
        quit
        EOF
        done
   
    Signed-off-by: Kevin Wolf <kwolf@redhat.com>
    Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>

As the guilty commit.

I don't understand it, as free_any_clusters() shouldn't be called while 
formatting.  Any ideas?

-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Qemu-devel] Re: Strange virtio regression on mainline and stable-0.10
  2009-05-05  9:52 [Qemu-devel] Strange virtio regression on mainline and stable-0.10 Avi Kivity
@ 2009-05-05 10:03 ` Avi Kivity
       [not found] ` <4A001563.1020604@redhat.com>
  2009-05-05 16:18 ` [Qemu-devel] " Avi Kivity
  2 siblings, 0 replies; 7+ messages in thread
From: Avi Kivity @ 2009-05-05 10:03 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: qemu-devel

Avi Kivity wrote:
>
> I don't understand it, as free_any_clusters() shouldn't be called 
> while formatting.  Any ideas?
>

Here's the write log with the free_any_clusters() calls:


Formatting '/images/tmp.img', fmt=qcow2, size=10485760 kB
qcow_aio_write: sector 0 count 1
qcow_aio_write: sector 0 count 1
qcow_aio_write: sector 620e1 count 8
qcow_aio_write: sector 620d9 count 8
qcow_aio_write: sector 620e1 count 8
qcow_aio_write: sector 620d9 count 8
qcow_aio_write: sector a302d9 count 8
qcow_aio_write: sector 620e1 count 8
qcow_aio_write: sector 620e1 count 8
qcow_aio_write: sector 620e1 count 8
qcow_aio_write: sector 620e1 count 8
qcow_aio_write: sector 620e1 count 8
qcow_aio_write: sector 620e1 count 8
qcow_aio_write: sector 62259 count 8
qcow_aio_write: sector 620e1 count 8
qcow_aio_write: sector 620e1 count 8
qcow_aio_write: sector ff2259 count 8
qcow_aio_write: sector ff2259 count 32
qcow_aio_write: sector ff2259 count 8
qcow_aio_write: sector 62259 count 8
qcow_aio_write: sector 63201 count 248
qcow_aio_write: sector 632f9 count 248
qcow_aio_write: sector 633f1 count 248
qcow_aio_write: sector 634e9 count 8
free_any_clusters: 800000000007f000 1
free_any_clusters: 800000000005f000 1
free_any_clusters: 800000000003f000 1




-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Qemu-devel] Re: Strange virtio regression on mainline and stable-0.10
       [not found] ` <4A001563.1020604@redhat.com>
@ 2009-05-05 11:46   ` Avi Kivity
  0 siblings, 0 replies; 7+ messages in thread
From: Avi Kivity @ 2009-05-05 11:46 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: qemu-devel

[-- Attachment #1: Type: text/plain, Size: 1425 bytes --]

(re-added qemu-devel)

Kevin Wolf wrote:
> Avi Kivity schrieb:
>   
>> Running the Fedora 10 installer on a virtio disk on current master and 
>> on v0.10.3 will cause the installer to complain when mounting the 
>> freshly formatted filesystems. 
>>     
>
> Could be related to https://bugzilla.redhat.com/show_bug.cgi?id=498405
>
>   

It appears to be the same.

> I encountered IO errors with qcow2 and virtio this morning, so I already
> wanted to start debugging it. Unfortunately, with current master extboot
> seems to be broken for me and so I can't even boot with virtio. Have not
> figured out yet what's going wrong here.
>   

I'm using the installer so I don't depend on extboot.  Of course the 
extboot issue needs to be fixed as well.

>> I don't understand it, as free_any_clusters() shouldn't be called while 
>> formatting.  Any ideas?
>>     
>
> free_any_clusters shouldn't be called at all without snapshots or
> backing files.
>
> qemu-img check notices refcount errors on my broken image though, so
> this could still be the same.

data point: if I force writes to be serialized in block-qcow2.c, then 
everything works, and there are no calls to free_any_clusters().  It's 
likely that there is some lack of serialization in the allocation code, 
some write that is in flight does not update the global state, only 
acb-local state.

-- 
error compiling committee.c: too many arguments to function


[-- Attachment #2: serialize-qcow2.patch --]
[-- Type: text/x-patch, Size: 1154 bytes --]

diff --git a/block-qcow2.c b/block-qcow2.c
index c4cd38d..151e688 100644
--- a/block-qcow2.c
+++ b/block-qcow2.c
@@ -1443,6 +1443,9 @@ static BlockDriverAIOCB *qcow_aio_readv(BlockDriverState *bs,
     return &acb->common;
 }
 
+static int nb_writes;
+static BlockDriverAIOCB *pending_writes[300];
+
 static void qcow_aio_write_cb(void *opaque, int ret)
 {
     QCowAIOCB *acb = opaque;
@@ -1518,6 +1521,18 @@ done:
         qemu_vfree(acb->orig_buf);
     acb->common.cb(acb->common.opaque, ret);
     qemu_aio_release(acb);
+
+    {
+        int i;
+
+        for (i = 1; i < nb_writes; ++i) {
+            pending_writes[i - 1] = pending_writes[i];
+        }
+        --nb_writes;
+        if (nb_writes) {
+            qcow_aio_write_cb(pending_writes[0], 0);
+        }
+    }
 }
 
 static BlockDriverAIOCB *qcow_aio_writev(BlockDriverState *bs,
@@ -1535,7 +1550,12 @@ static BlockDriverAIOCB *qcow_aio_writev(BlockDriverState *bs,
     if (!acb)
         return NULL;
 
-    qcow_aio_write_cb(acb, 0);
+    pending_writes[nb_writes++] = acb;
+
+    if (nb_writes == 1) {
+        qcow_aio_write_cb(acb, 0);
+    }
+
     return &acb->common;
 }
 

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [Qemu-devel] Strange virtio regression on mainline and stable-0.10
  2009-05-05  9:52 [Qemu-devel] Strange virtio regression on mainline and stable-0.10 Avi Kivity
  2009-05-05 10:03 ` [Qemu-devel] " Avi Kivity
       [not found] ` <4A001563.1020604@redhat.com>
@ 2009-05-05 16:18 ` Avi Kivity
  2009-05-05 18:18   ` Avi Kivity
  2009-05-06  8:14   ` Kevin Wolf
  2 siblings, 2 replies; 7+ messages in thread
From: Avi Kivity @ 2009-05-05 16:18 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: qemu-devel

Avi Kivity wrote:
> Running the Fedora 10 installer on a virtio disk on current master and 
> on v0.10.3 will cause the installer to complain when mounting the 
> freshly formatted filesystems.

The problem is that qcow2 does a read-modify-write on 
non-cluster-aligned writes.  So the following sequence triggers the bug:

 guest: write(sector=0, count=1) (A)
 guest: write(sector=1, count=1) (B)
 qemu: read(sector=0, count=16) (A)
 qemu: submit aio write (sector=0, count=16) (A)
 qemu: read(sector=0, count=16) (B)
 qemu: submit aio write (sector=0, count=16) (B) - contains data from 
before A's write

This could be solved by maintaining a hash table of refcounted RMW 
copies for the disk.  When reading for a RMW, look up the hash table, if 
there's a copy there, use it instead of reading it yourself.

We should also avoid the RMW for non-compressed, non-encrypted clusters, 
as virtually ALL writes will be misaligned.

-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Qemu-devel] Strange virtio regression on mainline and stable-0.10
  2009-05-05 16:18 ` [Qemu-devel] " Avi Kivity
@ 2009-05-05 18:18   ` Avi Kivity
  2009-05-06  8:14   ` Kevin Wolf
  1 sibling, 0 replies; 7+ messages in thread
From: Avi Kivity @ 2009-05-05 18:18 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: qemu-devel

Avi Kivity wrote:
> Avi Kivity wrote:
>> Running the Fedora 10 installer on a virtio disk on current master 
>> and on v0.10.3 will cause the installer to complain when mounting the 
>> freshly formatted filesystems.
>
> The problem is that qcow2 does a read-modify-write on 
> non-cluster-aligned writes.  So the following sequence triggers the bug:

No, that's not the problem.

Now I think the problem occurs if the guest does two non-overlapping 
writes in parallel that hit the same cluster.  Initially the cluster is 
not allocated, so the two writes will go into two newly allocated 
clusters.  When qcow2 tries to update the block pointers, something 
strange happens.

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Qemu-devel] Strange virtio regression on mainline and stable-0.10
  2009-05-05 16:18 ` [Qemu-devel] " Avi Kivity
  2009-05-05 18:18   ` Avi Kivity
@ 2009-05-06  8:14   ` Kevin Wolf
  2009-05-06  8:34     ` Avi Kivity
  1 sibling, 1 reply; 7+ messages in thread
From: Kevin Wolf @ 2009-05-06  8:14 UTC (permalink / raw)
  To: Avi Kivity; +Cc: qemu-devel

Avi Kivity schrieb:
> Avi Kivity wrote:
>> Running the Fedora 10 installer on a virtio disk on current master and 
>> on v0.10.3 will cause the installer to complain when mounting the 
>> freshly formatted filesystems.
> 
> The problem is that qcow2 does a read-modify-write on 
> non-cluster-aligned writes.  So the following sequence triggers the bug:
> [...]
> 
> This could be solved by maintaining a hash table of refcounted RMW 
> copies for the disk.  When reading for a RMW, look up the hash table, if 
> there's a copy there, use it instead of reading it yourself.
> 
> We should also avoid the RMW for non-compressed, non-encrypted clusters, 
> as virtually ALL writes will be misaligned.

I don't think there is a RMW except for the COW case which is
unavoidable and obviously happens only once for each cluster. Do you see
any other places where this happens?

Kevin

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Qemu-devel] Strange virtio regression on mainline and stable-0.10
  2009-05-06  8:14   ` Kevin Wolf
@ 2009-05-06  8:34     ` Avi Kivity
  0 siblings, 0 replies; 7+ messages in thread
From: Avi Kivity @ 2009-05-06  8:34 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: qemu-devel

Kevin Wolf wrote:
> Avi Kivity schrieb:
>   
>> Avi Kivity wrote:
>>     
>>> Running the Fedora 10 installer on a virtio disk on current master and 
>>> on v0.10.3 will cause the installer to complain when mounting the 
>>> freshly formatted filesystems.
>>>       
>> The problem is that qcow2 does a read-modify-write on 
>> non-cluster-aligned writes.  So the following sequence triggers the bug:
>> [...]
>>
>> This could be solved by maintaining a hash table of refcounted RMW 
>> copies for the disk.  When reading for a RMW, look up the hash table, if 
>> there's a copy there, use it instead of reading it yourself.
>>
>> We should also avoid the RMW for non-compressed, non-encrypted clusters, 
>> as virtually ALL writes will be misaligned.
>>     
>
> I don't think there is a RMW except for the COW case which is
> unavoidable and obviously happens only once for each cluster. Do you see
> any other places where this happens?
>   

No, I misread the code.  I think the real problem is two parallel writes 
for the same cluster (but different sectors) started concurrently, so 
get_cluster_offset() places them in different clusters.  When the second 
write completes we get unexpected results since the metadata now 
contains a block where on the start of the operation it was unallocated.

We could probably get away with serializing only writes that hit the 
same cluster.  A better approach may be to try to place parallel 
sequential writes contiguously in the allocating case.

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2009-05-06  8:35 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-05-05  9:52 [Qemu-devel] Strange virtio regression on mainline and stable-0.10 Avi Kivity
2009-05-05 10:03 ` [Qemu-devel] " Avi Kivity
     [not found] ` <4A001563.1020604@redhat.com>
2009-05-05 11:46   ` Avi Kivity
2009-05-05 16:18 ` [Qemu-devel] " Avi Kivity
2009-05-05 18:18   ` Avi Kivity
2009-05-06  8:14   ` Kevin Wolf
2009-05-06  8:34     ` Avi Kivity

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.