[Qemu-devel] [Bug 721825] [NEW] VDI block driver bugs

* [Qemu-devel] [Bug 721825] [NEW] VDI block driver bugs
@ 2011-02-19 16:19 Stefan Hajnoczi
  2011-02-20 18:29 ` [Qemu-devel] [Bug 721825] " Stefan Weil
                   ` (6 more replies)
  0 siblings, 7 replies; 9+ messages in thread
From: Stefan Hajnoczi @ 2011-02-19 16:19 UTC (permalink / raw)
  To: qemu-devel

Public bug reported:

Chunqiang Tang reports the following issues with the VDI block driver,
these are present in QEMU 0.14:

"Bug 1. The most serious bug is caused by race condition in updating a new 
bmap entry in memory and on disk. Considering the following operation 
sequence. 
  O1: VM issues a write to sector X
  O2: VDI allocates a new bmap entry and updates in-memory s->bmap
  O3: VDI writes data to disk
  O4: The disk I/O for writing sector X fails
  O5: VDI reports error to VM and returns.

Note that the bmap entry is updated in memory, but not persisted on disk. 
Now consider another write that immediately follows:
  P1: VM issues a write to sector X+1, which locates in the same block as 
the previously used sector X.
  P2: s->bmap already has one entry for the block, and hence VDI writes 
data directly without persisting the new s->bmap entry on disk.
  P3: The write disk I/O succeeds
  P4: VDI report success to VM, but the bitmap entry is still not 
persisted on disk.

Now suppose the VM powers off gracefully (i.e., the QEMU process quits) 
and reboots. The second write to sector X+1, which is reported as finished 
successfully, is simply lost, because the corresponding in-memory s->bmap 
entry is never persisted on disk. This is exactly what FVD's testing tool 
discovers. After the block device is closed and then re-opened, disk 
content verification fails.

This is just one example of the problem. Race condition plus host crash 
also causes problems. Consider another example below.
  Q1: VM issues a write to sector X
  Q2: VDI allocates a new bmap entry and updates in-memory s->bmap
  Q3: VDI writes sector X to disk and waits for the callback
  Q4: VM issues a write to another sector X+1, which is in the same block 
as sector X.
  Q5: VDI sees the bitmap entry in s->bmap is already allocated, and 
writes sector X+1 to disk.
  Q6: Write to sector X+1 finishes, and VDI's callback is invoked.
  Q7: VDI acknowledges to the VM the completion of writing sector X+1
  Q8: After observing the completion of writing sector X+1, VM issues a 
flush to ensure that sector X+1 is persisted on disk.
  Q9: VDI finishes the flush and acknowledge the completion of the 
operation.
  Q10: ... (some other arbitrary operations, but the disk I/O for writing 
sector X is still not finished....)
  Q11: The host crashes

Now the new bitmap entry is not persisted on disk, while both writing to 
sector X+1 and the flush has been acknowledged as finished. Sector X+1 is 
lost, which is a corruption. This problem exists even if it uses O_DSYNC. 
The root cause of the problem is that, if a request updates in-memory 
s->bmap, another request that sees this update assumes that the update is 
already persisted on disk, which is not.

Bug 2: Similar to the bugs the FVD testing tool found for QCOW2, there are 
several cases of the code below on failure handling path without setting 
error return code, which mistakenly reports failure as success. This 
mistake is caught by FVD when doing image content validation.
       if (acb->hd_aiocb == NULL) {
           /* missing     ret = -EIO; */
            goto done; 
        } 

Bug 3: Similar to the bugs the FVD testing tool found for QCOW2, 
vdi_aio_cancel does not perform a complete clean up and there are several 
related bugs. First, memory buffer is not freed, acb->orig_buf and 
acb->block_buffer. Second, acb->bh is not cancelled. Third, 
vdi_aio_setup() does not initialize acb->bh to NULL so that when a request 
acb is cancelled and then later reused for another request, its acb->bh != 
NULL and the new request fails in  vdi_schedule_bh(). This is caught by 
FVD's testing tool, when it observes that no I/O failure is injected but 
VDI reports a failed I/O request, which indicates a bug in the driver."

http://permalink.gmane.org/gmane.comp.emulators.qemu/94340

** Affects: qemu
     Importance: Undecided
         Status: New

** Tags: block vdi

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/721825

Title:
  VDI block driver bugs

Status in QEMU:
  New

Bug description:
  Chunqiang Tang reports the following issues with the VDI block driver,
  these are present in QEMU 0.14:

  "Bug 1. The most serious bug is caused by race condition in updating a new 
  bmap entry in memory and on disk. Considering the following operation 
  sequence. 
    O1: VM issues a write to sector X
    O2: VDI allocates a new bmap entry and updates in-memory s->bmap
    O3: VDI writes data to disk
    O4: The disk I/O for writing sector X fails
    O5: VDI reports error to VM and returns.

  Note that the bmap entry is updated in memory, but not persisted on disk. 
  Now consider another write that immediately follows:
    P1: VM issues a write to sector X+1, which locates in the same block as 
  the previously used sector X.
    P2: s->bmap already has one entry for the block, and hence VDI writes 
  data directly without persisting the new s->bmap entry on disk.
    P3: The write disk I/O succeeds
    P4: VDI report success to VM, but the bitmap entry is still not 
  persisted on disk.

  Now suppose the VM powers off gracefully (i.e., the QEMU process quits) 
  and reboots. The second write to sector X+1, which is reported as finished 
  successfully, is simply lost, because the corresponding in-memory s->bmap 
  entry is never persisted on disk. This is exactly what FVD's testing tool 
  discovers. After the block device is closed and then re-opened, disk 
  content verification fails.

  This is just one example of the problem. Race condition plus host crash 
  also causes problems. Consider another example below.
    Q1: VM issues a write to sector X
    Q2: VDI allocates a new bmap entry and updates in-memory s->bmap
    Q3: VDI writes sector X to disk and waits for the callback
    Q4: VM issues a write to another sector X+1, which is in the same block 
  as sector X.
    Q5: VDI sees the bitmap entry in s->bmap is already allocated, and 
  writes sector X+1 to disk.
    Q6: Write to sector X+1 finishes, and VDI's callback is invoked.
    Q7: VDI acknowledges to the VM the completion of writing sector X+1
    Q8: After observing the completion of writing sector X+1, VM issues a 
  flush to ensure that sector X+1 is persisted on disk.
    Q9: VDI finishes the flush and acknowledge the completion of the 
  operation.
    Q10: ... (some other arbitrary operations, but the disk I/O for writing 
  sector X is still not finished....)
    Q11: The host crashes

  Now the new bitmap entry is not persisted on disk, while both writing to 
  sector X+1 and the flush has been acknowledged as finished. Sector X+1 is 
  lost, which is a corruption. This problem exists even if it uses O_DSYNC. 
  The root cause of the problem is that, if a request updates in-memory 
  s->bmap, another request that sees this update assumes that the update is 
  already persisted on disk, which is not.

  Bug 2: Similar to the bugs the FVD testing tool found for QCOW2, there are 
  several cases of the code below on failure handling path without setting 
  error return code, which mistakenly reports failure as success. This 
  mistake is caught by FVD when doing image content validation.
         if (acb->hd_aiocb == NULL) {
             /* missing     ret = -EIO; */
              goto done; 
          } 

  Bug 3: Similar to the bugs the FVD testing tool found for QCOW2, 
  vdi_aio_cancel does not perform a complete clean up and there are several 
  related bugs. First, memory buffer is not freed, acb->orig_buf and 
  acb->block_buffer. Second, acb->bh is not cancelled. Third, 
  vdi_aio_setup() does not initialize acb->bh to NULL so that when a request 
  acb is cancelled and then later reused for another request, its acb->bh != 
  NULL and the new request fails in  vdi_schedule_bh(). This is caught by 
  FVD's testing tool, when it observes that no I/O failure is injected but 
  VDI reports a failed I/O request, which indicates a bug in the driver."

  http://permalink.gmane.org/gmane.comp.emulators.qemu/94340

^ permalink raw reply	[flat|nested] 9+ messages in thread