Race condition in overlayed qcow2?

* Race condition in overlayed qcow2?
@ 2020-02-19 14:32 dovgaluk
  2020-02-19 16:07 ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 19+ messages in thread
From: dovgaluk @ 2020-02-19 14:32 UTC (permalink / raw)
  To: vsementsov, qemu-devel, mreitz, kwolf

Hi!

I encountered a problem with record/replay of QEMU execution and figured 
out the following, when
QEMU is started with one virtual disk connected to the qcow2 image with 
applied 'snapshot' option.

The patch d710cf575ad5fb3ab329204620de45bfe50caa53 "block/qcow2: 
introduce parallel subrequest handling in read and write"
introduces some kind of race condition, which causes difference in the 
data read from the disk.

I detected this by adding the following code, which logs IO operation 
checksum. And this checksum may be different in different runs of the 
same recorded execution.

logging in blk_aio_complete function:
         qemu_log("%"PRId64": blk_aio_complete\n", 
replay_get_current_icount());
         QEMUIOVector *qiov = acb->rwco.iobuf;
         if (qiov && qiov->iov) {
             size_t i, j;
             uint64_t sum = 0;
             int count = 0;
             for (i = 0 ; i < qiov->niov ; ++i) {
                 for (j = 0 ; j < qiov->iov[i].iov_len ; ++j) {
                     sum += ((uint8_t*)qiov->iov[i].iov_base)[j];
                     ++count;
                 }
             }
             qemu_log("--- iobuf offset %"PRIx64" len %x sum: 
%"PRIx64"\n", acb->rwco.offset, count, sum);
         }

I tried to get rid of aio task by patching qcow2_co_preadv_part:
ret = qcow2_co_preadv_task(bs, ret, cluster_offset, offset, cur_bytes, 
qiov, qiov_offset);

That change fixed a bug, but I have no idea what to debug next to figure 
out the exact reason of the failure.

Do you have any ideas or hints?

Pavel Dovgalyuk

^ permalink raw reply	[flat|nested] 19+ messages in thread