* [Qemu-devel] [PATCH v2 00/11] Block layer thread-safety, part 2
@ 2017-06-29 13:27 Paolo Bonzini
  2017-06-29 13:27 ` [Qemu-devel] [PATCH 01/11] qcow2: call CoQueue APIs under CoMutex Paolo Bonzini
                   ` (14 more replies)
  0 siblings, 15 replies; 21+ messages in thread
From: Paolo Bonzini @ 2017-06-29 13:27 UTC (permalink / raw)
  To: qemu-devel; +Cc: stefanha, qemu-block, kwolf

This part takes care of drivers and devices, making sure that they can
accept concurrent I/O from multiple AioContexts.

The following drivers are thread-safe without using any QemuMutex/CoMutex:
crypto, gluster, null, rbd, win32-aio.  NBD has already been fixed, because
the patch that fixed it also addressed an unrelated testcase.

The following drivers already use mutexes for everything except possibly
snapshots, which do not (yet?) need protection: bochs, cloop, dmg, qcow,
parallels, vhdx, vmdk, curl, iscsi, nfs.

The following drivers already use mutexes for _almost_ everything: vpc
(missing get_block_status), vdi (missing bitmap access), vvfat (commit not
protected), qcow2 (must call CoQueue APIs under CoMutex).
They are fixed by patches 1-5.

The following drivers must be changed to use CoMutex to protect internal
data: qed (patches 6-9), sheepdog (patch 10).

The following driver must be changed to support I/O from any AioContext:
ssh.  It is fixed by patch 11.

Paolo

v1->v2: new patch 8 + adjustments to patch 9 to fix qemu-iotests testcase
        183 (bdrv_invalidate_cache from block migration)

Paolo Bonzini (11):
  qcow2: call CoQueue APIs under CoMutex
  coroutine-lock: add qemu_co_rwlock_downgrade and
    qemu_co_rwlock_upgrade
  vdi: make it thread-safe
  vpc: make it thread-safe
  vvfat: make it thread-safe
  qed: move tail of qed_aio_write_main to qed_aio_write_{cow,alloc}
  block: invoke .bdrv_drain callback in coroutine context and from
    AioContext
  qed: introduce bdrv_qed_init_state
  qed: protect table cache with CoMutex
  sheepdog: add queue_lock
  ssh: support I/O from any AioContext

 block/io.c                 |  42 +++++++--
 block/qcow2.c              |   4 +-
 block/qed-cluster.c        |   4 +-
 block/qed-l2-cache.c       |   6 ++
 block/qed-table.c          |  24 +++--
 block/qed.c                | 214 ++++++++++++++++++++++++++++-----------------
 block/qed.h                |  11 ++-
 block/sheepdog.c           |  21 ++++-
 block/ssh.c                |  24 +++--
 block/vdi.c                |  48 +++++-----
 block/vpc.c                |  20 ++---
 block/vvfat.c              |   8 +-
 include/block/block_int.h  |   2 +-
 include/qemu/coroutine.h   |  18 ++++
 util/qemu-coroutine-lock.c |  35 ++++++++
 15 files changed, 331 insertions(+), 150 deletions(-)

-- 
2.13.0


* [Qemu-devel] [PATCH 01/11] qcow2: call CoQueue APIs under CoMutex
  2017-06-29 13:27 [Qemu-devel] [PATCH v2 00/11] Block layer thread-safety, part 2 Paolo Bonzini
@ 2017-06-29 13:27 ` Paolo Bonzini
  2017-06-29 13:27 ` [Qemu-devel] [PATCH 02/11] coroutine-lock: add qemu_co_rwlock_downgrade and qemu_co_rwlock_upgrade Paolo Bonzini
                   ` (13 subsequent siblings)
  14 siblings, 0 replies; 21+ messages in thread
From: Paolo Bonzini @ 2017-06-29 13:27 UTC (permalink / raw)
  To: qemu-devel; +Cc: stefanha, qemu-block, kwolf

Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 block/qcow2.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index 2f94f0326e..70d3f4a18e 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -1741,8 +1741,6 @@ static coroutine_fn int qcow2_co_pwritev(BlockDriverState *bs, uint64_t offset,
     ret = 0;
 
 fail:
-    qemu_co_mutex_unlock(&s->lock);
-
     while (l2meta != NULL) {
         QCowL2Meta *next;
 
@@ -1756,6 +1754,8 @@ fail:
         l2meta = next;
     }
 
+    qemu_co_mutex_unlock(&s->lock);
+
     qemu_iovec_destroy(&hd_qiov);
     qemu_vfree(cluster_data);
     trace_qcow2_writev_done_req(qemu_coroutine_self(), ret);
-- 
2.13.0


* [Qemu-devel] [PATCH 02/11] coroutine-lock: add qemu_co_rwlock_downgrade and qemu_co_rwlock_upgrade
  2017-06-29 13:27 [Qemu-devel] [PATCH v2 00/11] Block layer thread-safety, part 2 Paolo Bonzini
  2017-06-29 13:27 ` [Qemu-devel] [PATCH 01/11] qcow2: call CoQueue APIs under CoMutex Paolo Bonzini
@ 2017-06-29 13:27 ` Paolo Bonzini
  2017-06-29 13:27 ` [Qemu-devel] [PATCH 03/11] vdi: make it thread-safe Paolo Bonzini
                   ` (12 subsequent siblings)
  14 siblings, 0 replies; 21+ messages in thread
From: Paolo Bonzini @ 2017-06-29 13:27 UTC (permalink / raw)
  To: qemu-devel; +Cc: stefanha, qemu-block, kwolf

These functions are more efficient in the presence of contention.
qemu_co_rwlock_downgrade also guarantees not to block, which may
be useful in some algorithms too.
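
As an illustration of the intended usage pattern (not part of this patch,
but mirroring what the vdi conversion later in this series does): a reader
that discovers it needs to modify the protected data can upgrade in place,
re-check its condition, and then downgrade back to shared mode.  The struct,
field names and alloc_entry() below are invented for the sketch; only the
qemu_co_rwlock_* calls are the real API.

    #include "qemu/osdep.h"
    #include "qemu/coroutine.h"

    typedef struct {
        CoRwlock lock;
        uint64_t *table;
    } Cache;

    /* Hypothetical helper that creates the missing entry. */
    static uint64_t alloc_entry(Cache *c, int i);

    static uint64_t coroutine_fn cache_lookup_or_alloc(Cache *c, int i)
    {
        uint64_t val;

        qemu_co_rwlock_rdlock(&c->lock);
        val = c->table[i];
        if (!val) {
            /* Become a writer; another writer may run while we wait... */
            qemu_co_rwlock_upgrade(&c->lock);
            /* ...so re-check whether somebody else filled the entry. */
            val = c->table[i];
            if (!val) {
                val = c->table[i] = alloc_entry(c, i);
            }
            /* Back to shared mode without ever dropping the lock. */
            qemu_co_rwlock_downgrade(&c->lock);
        }
        qemu_co_rwlock_unlock(&c->lock);
        return val;
    }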

Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 include/qemu/coroutine.h   | 18 ++++++++++++++++++
 util/qemu-coroutine-lock.c | 35 +++++++++++++++++++++++++++++++++++
 2 files changed, 53 insertions(+)

diff --git a/include/qemu/coroutine.h b/include/qemu/coroutine.h
index a4509bd977..9aff9a735e 100644
--- a/include/qemu/coroutine.h
+++ b/include/qemu/coroutine.h
@@ -229,6 +229,24 @@ void qemu_co_rwlock_init(CoRwlock *lock);
 void qemu_co_rwlock_rdlock(CoRwlock *lock);
 
 /**
+ * Write Locks the CoRwlock from a reader.  This is a bit more efficient than
+ * @qemu_co_rwlock_unlock followed by a separate @qemu_co_rwlock_wrlock.
+ * However, if the lock cannot be upgraded immediately, control is transferred
+ * to the caller of the current coroutine.  Also, @qemu_co_rwlock_upgrade
+ * only overrides CoRwlock fairness if there are no concurrent readers, so
+ * another writer might run while @qemu_co_rwlock_upgrade blocks.
+ */
+void qemu_co_rwlock_upgrade(CoRwlock *lock);
+
+/**
+ * Downgrades a write-side critical section to a reader.  Downgrading with
+ * @qemu_co_rwlock_downgrade never blocks, unlike @qemu_co_rwlock_unlock
+ * followed by @qemu_co_rwlock_rdlock.  This makes it more efficient, but
+ * may also sometimes be necessary for correctness.
+ */
+void qemu_co_rwlock_downgrade(CoRwlock *lock);
+
+/**
  * Write Locks the mutex. If the lock cannot be taken immediately because
  * of a parallel reader, control is transferred to the caller of the current
  * coroutine.
diff --git a/util/qemu-coroutine-lock.c b/util/qemu-coroutine-lock.c
index b44b5d55eb..846ff9167f 100644
--- a/util/qemu-coroutine-lock.c
+++ b/util/qemu-coroutine-lock.c
@@ -402,6 +402,21 @@ void qemu_co_rwlock_unlock(CoRwlock *lock)
     qemu_co_mutex_unlock(&lock->mutex);
 }
 
+void qemu_co_rwlock_downgrade(CoRwlock *lock)
+{
+    Coroutine *self = qemu_coroutine_self();
+
+    /* lock->mutex critical section started in qemu_co_rwlock_wrlock or
+     * qemu_co_rwlock_upgrade.
+     */
+    assert(lock->reader == 0);
+    lock->reader++;
+    qemu_co_mutex_unlock(&lock->mutex);
+
+    /* The rest of the read-side critical section is run without the mutex.  */
+    self->locks_held++;
+}
+
 void qemu_co_rwlock_wrlock(CoRwlock *lock)
 {
     qemu_co_mutex_lock(&lock->mutex);
@@ -416,3 +431,23 @@ void qemu_co_rwlock_wrlock(CoRwlock *lock)
      * There is no need to update self->locks_held.
      */
 }
+
+void qemu_co_rwlock_upgrade(CoRwlock *lock)
+{
+    Coroutine *self = qemu_coroutine_self();
+
+    qemu_co_mutex_lock(&lock->mutex);
+    assert(lock->reader > 0);
+    lock->reader--;
+    lock->pending_writer++;
+    while (lock->reader) {
+        qemu_co_queue_wait(&lock->queue, &lock->mutex);
+    }
+    lock->pending_writer--;
+
+    /* The rest of the write-side critical section is run with
+     * the mutex taken, similar to qemu_co_rwlock_wrlock.  Do
+     * not account for the lock twice in self->locks_held.
+     */
+    self->locks_held--;
+}
-- 
2.13.0


* [Qemu-devel] [PATCH 03/11] vdi: make it thread-safe
  2017-06-29 13:27 [Qemu-devel] [PATCH v2 00/11] Block layer thread-safety, part 2 Paolo Bonzini
  2017-06-29 13:27 ` [Qemu-devel] [PATCH 01/11] qcow2: call CoQueue APIs under CoMutex Paolo Bonzini
  2017-06-29 13:27 ` [Qemu-devel] [PATCH 02/11] coroutine-lock: add qemu_co_rwlock_downgrade and qemu_co_rwlock_upgrade Paolo Bonzini
@ 2017-06-29 13:27 ` Paolo Bonzini
  2017-06-29 13:27 ` [Qemu-devel] [PATCH 04/11] vpc: " Paolo Bonzini
                   ` (11 subsequent siblings)
  14 siblings, 0 replies; 21+ messages in thread
From: Paolo Bonzini @ 2017-06-29 13:27 UTC (permalink / raw)
  To: qemu-devel; +Cc: stefanha, qemu-block, kwolf

The VirtualBox driver is using a mutex to order all allocating writes,
but it is not protecting accesses to the bitmap because they implicitly
happen under the AioContext mutex.  Change this to use a CoRwlock
explicitly.

Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 block/vdi.c | 48 ++++++++++++++++++++++++------------------------
 1 file changed, 24 insertions(+), 24 deletions(-)

diff --git a/block/vdi.c b/block/vdi.c
index 79af47763b..080e1562e3 100644
--- a/block/vdi.c
+++ b/block/vdi.c
@@ -172,7 +172,7 @@ typedef struct {
     /* VDI header (converted to host endianness). */
     VdiHeader header;
 
-    CoMutex write_lock;
+    CoRwlock bmap_lock;
 
     Error *migration_blocker;
 } BDRVVdiState;
@@ -485,7 +485,7 @@ static int vdi_open(BlockDriverState *bs, QDict *options, int flags,
         goto fail_free_bmap;
     }
 
-    qemu_co_mutex_init(&s->write_lock);
+    qemu_co_rwlock_init(&s->bmap_lock);
 
     return 0;
 
@@ -557,7 +557,9 @@ vdi_co_preadv(BlockDriverState *bs, uint64_t offset, uint64_t bytes,
                n_bytes, offset);
 
         /* prepare next AIO request */
+        qemu_co_rwlock_rdlock(&s->bmap_lock);
         bmap_entry = le32_to_cpu(s->bmap[block_index]);
+        qemu_co_rwlock_unlock(&s->bmap_lock);
         if (!VDI_IS_ALLOCATED(bmap_entry)) {
             /* Block not allocated, return zeros, no need to wait. */
             qemu_iovec_memset(qiov, bytes_done, 0, n_bytes);
@@ -595,6 +597,7 @@ vdi_co_pwritev(BlockDriverState *bs, uint64_t offset, uint64_t bytes,
     uint32_t block_index;
     uint32_t offset_in_block;
     uint32_t n_bytes;
+    uint64_t data_offset;
     uint32_t bmap_first = VDI_UNALLOCATED;
     uint32_t bmap_last = VDI_UNALLOCATED;
     uint8_t *block = NULL;
@@ -614,10 +617,19 @@ vdi_co_pwritev(BlockDriverState *bs, uint64_t offset, uint64_t bytes,
                n_bytes, offset);
 
         /* prepare next AIO request */
+        qemu_co_rwlock_rdlock(&s->bmap_lock);
         bmap_entry = le32_to_cpu(s->bmap[block_index]);
         if (!VDI_IS_ALLOCATED(bmap_entry)) {
             /* Allocate new block and write to it. */
             uint64_t data_offset;
+            qemu_co_rwlock_upgrade(&s->bmap_lock);
+            bmap_entry = le32_to_cpu(s->bmap[block_index]);
+            if (VDI_IS_ALLOCATED(bmap_entry)) {
+                /* A concurrent allocation did the work for us.  */
+                qemu_co_rwlock_downgrade(&s->bmap_lock);
+                goto nonallocating_write;
+            }
+
             bmap_entry = s->header.blocks_allocated;
             s->bmap[block_index] = cpu_to_le32(bmap_entry);
             s->header.blocks_allocated++;
@@ -635,30 +647,18 @@ vdi_co_pwritev(BlockDriverState *bs, uint64_t offset, uint64_t bytes,
             memset(block + offset_in_block + n_bytes, 0,
                    s->block_size - n_bytes - offset_in_block);
 
-            /* Note that this coroutine does not yield anywhere from reading the
-             * bmap entry until here, so in regards to all the coroutines trying
-             * to write to this cluster, the one doing the allocation will
-             * always be the first to try to acquire the lock.
-             * Therefore, it is also the first that will actually be able to
-             * acquire the lock and thus the padded cluster is written before
-             * the other coroutines can write to the affected area. */
-            qemu_co_mutex_lock(&s->write_lock);
+            /* Write the new block under CoRwLock write-side protection,
+             * so this full-cluster write does not overlap a partial write
+             * of the same cluster, issued from the "else" branch.
+             */
             ret = bdrv_pwrite(bs->file, data_offset, block, s->block_size);
-            qemu_co_mutex_unlock(&s->write_lock);
+            qemu_co_rwlock_unlock(&s->bmap_lock);
         } else {
-            uint64_t data_offset = s->header.offset_data +
-                                   (uint64_t)bmap_entry * s->block_size +
-                                   offset_in_block;
-            qemu_co_mutex_lock(&s->write_lock);
-            /* This lock is only used to make sure the following write operation
-             * is executed after the write issued by the coroutine allocating
-             * this cluster, therefore we do not need to keep it locked.
-             * As stated above, the allocating coroutine will always try to lock
-             * the mutex before all the other concurrent accesses to that
-             * cluster, therefore at this point we can be absolutely certain
-             * that that write operation has returned (there may be other writes
-             * in flight, but they do not concern this very operation). */
-            qemu_co_mutex_unlock(&s->write_lock);
+nonallocating_write:
+            data_offset = s->header.offset_data +
+                           (uint64_t)bmap_entry * s->block_size +
+                           offset_in_block;
+            qemu_co_rwlock_unlock(&s->bmap_lock);
 
             qemu_iovec_reset(&local_qiov);
             qemu_iovec_concat(&local_qiov, qiov, bytes_done, n_bytes);
-- 
2.13.0


* [Qemu-devel] [PATCH 04/11] vpc: make it thread-safe
  2017-06-29 13:27 [Qemu-devel] [PATCH v2 00/11] Block layer thread-safety, part 2 Paolo Bonzini
                   ` (2 preceding siblings ...)
  2017-06-29 13:27 ` [Qemu-devel] [PATCH 03/11] vdi: make it thread-safe Paolo Bonzini
@ 2017-06-29 13:27 ` Paolo Bonzini
  2017-07-10 11:32   ` Fam Zheng
  2017-06-29 13:27 ` [Qemu-devel] [PATCH 05/11] vvfat: " Paolo Bonzini
                   ` (10 subsequent siblings)
  14 siblings, 1 reply; 21+ messages in thread
From: Paolo Bonzini @ 2017-06-29 13:27 UTC (permalink / raw)
  To: qemu-devel; +Cc: stefanha, qemu-block, kwolf

Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 block/vpc.c | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/block/vpc.c b/block/vpc.c
index 4240ba9d1c..0ff686540a 100644
--- a/block/vpc.c
+++ b/block/vpc.c
@@ -496,12 +496,6 @@ static inline int64_t get_image_offset(BlockDriverState *bs, uint64_t offset,
     return block_offset;
 }
 
-static inline int64_t get_sector_offset(BlockDriverState *bs,
-                                        int64_t sector_num, bool write)
-{
-    return get_image_offset(bs, sector_num * BDRV_SECTOR_SIZE, write);
-}
-
 /*
  * Writes the footer to the end of the image file. This is needed when the
  * file grows as it overwrites the old footer
@@ -696,6 +690,7 @@ static int64_t coroutine_fn vpc_co_get_block_status(BlockDriverState *bs,
     VHDFooter *footer = (VHDFooter*) s->footer_buf;
     int64_t start, offset;
     bool allocated;
+    int64_t ret;
     int n;
 
     if (be32_to_cpu(footer->type) == VHD_FIXED) {
@@ -705,10 +700,13 @@ static int64_t coroutine_fn vpc_co_get_block_status(BlockDriverState *bs,
                (sector_num << BDRV_SECTOR_BITS);
     }
 
-    offset = get_sector_offset(bs, sector_num, 0);
+    qemu_co_mutex_lock(&s->lock);
+
+    offset = get_image_offset(bs, sector_num << BDRV_SECTOR_BITS, 0);
     start = offset;
     allocated = (offset != -1);
     *pnum = 0;
+    ret = 0;
 
     do {
         /* All sectors in a block are contiguous (without using the bitmap) */
@@ -723,15 +721,17 @@ static int64_t coroutine_fn vpc_co_get_block_status(BlockDriverState *bs,
          * sectors since there is always a bitmap in between. */
         if (allocated) {
             *file = bs->file->bs;
-            return BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID | start;
+            ret = BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID | start;
+            break;
         }
         if (nb_sectors == 0) {
             break;
         }
-        offset = get_sector_offset(bs, sector_num, 0);
+        offset = get_image_offset(bs, sector_num << BDRV_SECTOR_BITS, 0);
     } while (offset == -1);
 
-    return 0;
+    qemu_co_mutex_unlock(&s->lock);
+    return ret;
 }
 
 /*
-- 
2.13.0


* [Qemu-devel] [PATCH 05/11] vvfat: make it thread-safe
  2017-06-29 13:27 [Qemu-devel] [PATCH v2 00/11] Block layer thread-safety, part 2 Paolo Bonzini
                   ` (3 preceding siblings ...)
  2017-06-29 13:27 ` [Qemu-devel] [PATCH 04/11] vpc: " Paolo Bonzini
@ 2017-06-29 13:27 ` Paolo Bonzini
  2017-06-29 13:27 ` [Qemu-devel] [PATCH 06/11] qed: move tail of qed_aio_write_main to qed_aio_write_{cow, alloc} Paolo Bonzini
                   ` (9 subsequent siblings)
  14 siblings, 0 replies; 21+ messages in thread
From: Paolo Bonzini @ 2017-06-29 13:27 UTC (permalink / raw)
  To: qemu-devel; +Cc: stefanha, qemu-block, kwolf

Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 block/vvfat.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/block/vvfat.c b/block/vvfat.c
index 8ab647c0c6..d2679c2ff5 100644
--- a/block/vvfat.c
+++ b/block/vvfat.c
@@ -2982,8 +2982,14 @@ static int coroutine_fn
 write_target_commit(BlockDriverState *bs, uint64_t offset, uint64_t bytes,
                     QEMUIOVector *qiov, int flags)
 {
+    int ret;
+
     BDRVVVFATState* s = *((BDRVVVFATState**) bs->opaque);
-    return try_commit(s);
+    qemu_co_mutex_lock(&s->lock);
+    ret = try_commit(s);
+    qemu_co_mutex_unlock(&s->lock);
+
+    return ret;
 }
 
 static void write_target_close(BlockDriverState *bs) {
-- 
2.13.0


* [Qemu-devel] [PATCH 06/11] qed: move tail of qed_aio_write_main to qed_aio_write_{cow, alloc}
  2017-06-29 13:27 [Qemu-devel] [PATCH v2 00/11] Block layer thread-safety, part 2 Paolo Bonzini
                   ` (4 preceding siblings ...)
  2017-06-29 13:27 ` [Qemu-devel] [PATCH 05/11] vvfat: " Paolo Bonzini
@ 2017-06-29 13:27 ` Paolo Bonzini
  2017-06-29 13:27 ` [Qemu-devel] [PATCH 07/11] block: invoke .bdrv_drain callback in coroutine context and from AioContext Paolo Bonzini
                   ` (8 subsequent siblings)
  14 siblings, 0 replies; 21+ messages in thread
From: Paolo Bonzini @ 2017-06-29 13:27 UTC (permalink / raw)
  To: qemu-devel; +Cc: stefanha, qemu-block, kwolf

This part is never called for in-place writes; move it away to avoid
the "backwards" coding style typical of callback-based code.

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 block/qed.c | 70 ++++++++++++++++++++++++++++---------------------------------
 1 file changed, 32 insertions(+), 38 deletions(-)

diff --git a/block/qed.c b/block/qed.c
index 385381a78a..d593557522 100644
--- a/block/qed.c
+++ b/block/qed.c
@@ -982,40 +982,12 @@ static int coroutine_fn qed_aio_write_main(QEDAIOCB *acb)
     BDRVQEDState *s = acb_to_s(acb);
     uint64_t offset = acb->cur_cluster +
                       qed_offset_into_cluster(s, acb->cur_pos);
-    int ret;
 
     trace_qed_aio_write_main(s, acb, 0, offset, acb->cur_qiov.size);
 
     BLKDBG_EVENT(s->bs->file, BLKDBG_WRITE_AIO);
-    ret = bdrv_co_pwritev(s->bs->file, offset, acb->cur_qiov.size,
-                          &acb->cur_qiov, 0);
-    if (ret < 0) {
-        return ret;
-    }
-
-    if (acb->find_cluster_ret != QED_CLUSTER_FOUND) {
-        if (s->bs->backing) {
-            /*
-             * Flush new data clusters before updating the L2 table
-             *
-             * This flush is necessary when a backing file is in use.  A crash
-             * during an allocating write could result in empty clusters in the
-             * image.  If the write only touched a subregion of the cluster,
-             * then backing image sectors have been lost in the untouched
-             * region.  The solution is to flush after writing a new data
-             * cluster and before updating the L2 table.
-             */
-            ret = bdrv_co_flush(s->bs->file->bs);
-            if (ret < 0) {
-                return ret;
-            }
-        }
-        ret = qed_aio_write_l2_update(acb, acb->cur_cluster);
-        if (ret < 0) {
-            return ret;
-        }
-    }
-    return 0;
+    return bdrv_co_pwritev(s->bs->file, offset, acb->cur_qiov.size,
+                           &acb->cur_qiov, 0);
 }
 
 /**
@@ -1050,7 +1022,29 @@ static int coroutine_fn qed_aio_write_cow(QEDAIOCB *acb)
         return ret;
     }
 
-    return qed_aio_write_main(acb);
+    ret = qed_aio_write_main(acb);
+    if (ret < 0) {
+        return ret;
+    }
+
+    if (s->bs->backing) {
+        /*
+         * Flush new data clusters before updating the L2 table
+         *
+         * This flush is necessary when a backing file is in use.  A crash
+         * during an allocating write could result in empty clusters in the
+         * image.  If the write only touched a subregion of the cluster,
+         * then backing image sectors have been lost in the untouched
+         * region.  The solution is to flush after writing a new data
+         * cluster and before updating the L2 table.
+         */
+        ret = bdrv_co_flush(s->bs->file->bs);
+        if (ret < 0) {
+            return ret;
+        }
+    }
+
+    return 0;
 }
 
 /**
@@ -1103,6 +1097,7 @@ static int coroutine_fn qed_aio_write_alloc(QEDAIOCB *acb, size_t len)
         if (acb->find_cluster_ret == QED_CLUSTER_ZERO) {
             return 0;
         }
+        acb->cur_cluster = 1;
     } else {
         acb->cur_cluster = qed_alloc_clusters(s, acb->cur_nclusters);
     }
@@ -1115,15 +1110,14 @@ static int coroutine_fn qed_aio_write_alloc(QEDAIOCB *acb, size_t len)
         }
     }
 
-    if (acb->flags & QED_AIOCB_ZERO) {
-        ret = qed_aio_write_l2_update(acb, 1);
-    } else {
+    if (!(acb->flags & QED_AIOCB_ZERO)) {
         ret = qed_aio_write_cow(acb);
+        if (ret < 0) {
+            return ret;
+        }
     }
-    if (ret < 0) {
-        return ret;
-    }
-    return 0;
+
+    return qed_aio_write_l2_update(acb, acb->cur_cluster);
 }
 
 /**
-- 
2.13.0


* [Qemu-devel] [PATCH 07/11] block: invoke .bdrv_drain callback in coroutine context and from AioContext
  2017-06-29 13:27 [Qemu-devel] [PATCH v2 00/11] Block layer thread-safety, part 2 Paolo Bonzini
                   ` (5 preceding siblings ...)
  2017-06-29 13:27 ` [Qemu-devel] [PATCH 06/11] qed: move tail of qed_aio_write_main to qed_aio_write_{cow, alloc} Paolo Bonzini
@ 2017-06-29 13:27 ` Paolo Bonzini
  2017-06-29 13:27 ` [Qemu-devel] [PATCH 08/11] qed: introduce bdrv_qed_init_state Paolo Bonzini
                   ` (7 subsequent siblings)
  14 siblings, 0 replies; 21+ messages in thread
From: Paolo Bonzini @ 2017-06-29 13:27 UTC (permalink / raw)
  To: qemu-devel; +Cc: stefanha, qemu-block, kwolf

This will let the callback take a CoMutex in the next patch.
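
For illustration only (driver name and state struct are invented): once the
callback runs as a coroutine_fn, it may sleep on coroutine locks, which a
plain callback invoked outside coroutine context could not do.

    #include "qemu/osdep.h"
    #include "qemu/coroutine.h"
    #include "block/block_int.h"

    typedef struct BDRVFooState {
        CoMutex lock;               /* hypothetical driver lock */
    } BDRVFooState;

    static void coroutine_fn bdrv_foo_co_drain(BlockDriverState *bs)
    {
        BDRVFooState *s = bs->opaque;

        /* qemu_co_mutex_lock() may yield, so it is only valid from a
         * coroutine; .bdrv_co_drain now guarantees that context. */
        qemu_co_mutex_lock(&s->lock);
        /* ... complete or cancel driver-internal request sources ... */
        qemu_co_mutex_unlock(&s->lock);
    }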

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 block/io.c                | 42 +++++++++++++++++++++++++++++++++---------
 block/qed.c               |  6 +++---
 include/block/block_int.h |  2 +-
 3 files changed, 37 insertions(+), 13 deletions(-)

diff --git a/block/io.c b/block/io.c
index 9bba730a7e..68f19bbe69 100644
--- a/block/io.c
+++ b/block/io.c
@@ -149,6 +149,37 @@ bool bdrv_requests_pending(BlockDriverState *bs)
     return false;
 }
 
+typedef struct {
+    Coroutine *co;
+    BlockDriverState *bs;
+    bool done;
+} BdrvCoDrainData;
+
+static void coroutine_fn bdrv_drain_invoke_entry(void *opaque)
+{
+    BdrvCoDrainData *data = opaque;
+    BlockDriverState *bs = data->bs;
+
+    bs->drv->bdrv_co_drain(bs);
+
+    /* Set data->done before reading bs->wakeup.  */
+    atomic_mb_set(&data->done, true);
+    bdrv_wakeup(bs);
+}
+
+static void bdrv_drain_invoke(BlockDriverState *bs)
+{
+    BdrvCoDrainData data = { .bs = bs, .done = false };
+
+    if (!bs->drv || !bs->drv->bdrv_co_drain) {
+        return;
+    }
+
+    data.co = qemu_coroutine_create(bdrv_drain_invoke_entry, &data);
+    bdrv_coroutine_enter(bs, data.co);
+    BDRV_POLL_WHILE(bs, !data.done);
+}
+
 static bool bdrv_drain_recurse(BlockDriverState *bs)
 {
     BdrvChild *child, *tmp;
@@ -156,9 +187,8 @@ static bool bdrv_drain_recurse(BlockDriverState *bs)
 
     waited = BDRV_POLL_WHILE(bs, atomic_read(&bs->in_flight) > 0);
 
-    if (bs->drv && bs->drv->bdrv_drain) {
-        bs->drv->bdrv_drain(bs);
-    }
+    /* Ensure any pending metadata writes are submitted to bs->file.  */
+    bdrv_drain_invoke(bs);
 
     QLIST_FOREACH_SAFE(child, &bs->children, next, tmp) {
         BlockDriverState *bs = child->bs;
@@ -184,12 +214,6 @@ static bool bdrv_drain_recurse(BlockDriverState *bs)
     return waited;
 }
 
-typedef struct {
-    Coroutine *co;
-    BlockDriverState *bs;
-    bool done;
-} BdrvCoDrainData;
-
 static void bdrv_co_drain_bh_cb(void *opaque)
 {
     BdrvCoDrainData *data = opaque;
diff --git a/block/qed.c b/block/qed.c
index d593557522..db390efdbd 100644
--- a/block/qed.c
+++ b/block/qed.c
@@ -350,7 +350,7 @@ static void bdrv_qed_attach_aio_context(BlockDriverState *bs,
     }
 }
 
-static void bdrv_qed_drain(BlockDriverState *bs)
+static void coroutine_fn bdrv_qed_co_drain(BlockDriverState *bs)
 {
     BDRVQEDState *s = bs->opaque;
 
@@ -359,7 +359,7 @@ static void bdrv_qed_drain(BlockDriverState *bs)
      */
     if (s->need_check_timer && timer_pending(s->need_check_timer)) {
         qed_cancel_need_check_timer(s);
-        qed_need_check_timer_cb(s);
+        qed_need_check_timer_entry(s);
     }
 }
 
@@ -1541,7 +1541,7 @@ static BlockDriver bdrv_qed = {
     .bdrv_check               = bdrv_qed_check,
     .bdrv_detach_aio_context  = bdrv_qed_detach_aio_context,
     .bdrv_attach_aio_context  = bdrv_qed_attach_aio_context,
-    .bdrv_drain               = bdrv_qed_drain,
+    .bdrv_co_drain            = bdrv_qed_co_drain,
 };
 
 static void bdrv_qed_init(void)
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 15fa602150..6d3fbbfc1e 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -321,7 +321,7 @@ struct BlockDriver {
      * Drain and stop any internal sources of requests in the driver, and
      * remain so until next I/O callback (e.g. bdrv_co_writev) is called.
      */
-    void (*bdrv_drain)(BlockDriverState *bs);
+    void coroutine_fn (*bdrv_co_drain)(BlockDriverState *bs);
 
     void (*bdrv_add_child)(BlockDriverState *parent, BlockDriverState *child,
                            Error **errp);
-- 
2.13.0


* [Qemu-devel] [PATCH 08/11] qed: introduce bdrv_qed_init_state
  2017-06-29 13:27 [Qemu-devel] [PATCH v2 00/11] Block layer thread-safety, part 2 Paolo Bonzini
                   ` (6 preceding siblings ...)
  2017-06-29 13:27 ` [Qemu-devel] [PATCH 07/11] block: invoke .bdrv_drain callback in coroutine context and from AioContext Paolo Bonzini
@ 2017-06-29 13:27 ` Paolo Bonzini
  2017-06-29 15:26   ` Eric Blake
  2017-07-17  3:36   ` Fam Zheng
  2017-06-29 13:27 ` [Qemu-devel] [PATCH 09/11] qed: protect table cache with CoMutex Paolo Bonzini
                   ` (6 subsequent siblings)
  14 siblings, 2 replies; 21+ messages in thread
From: Paolo Bonzini @ 2017-06-29 13:27 UTC (permalink / raw)
  To: qemu-devel; +Cc: stefanha, qemu-block, kwolf

This will be used in the next patch, which will call bdrv_qed_do_open
with a CoMutex taken.  bdrv_qed_init_state provides a nice place to
initialize it.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
        RFC->v2: new

 block/qed.c | 15 +++++++++++----
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/block/qed.c b/block/qed.c
index db390efdbd..8228a50f68 100644
--- a/block/qed.c
+++ b/block/qed.c
@@ -363,6 +363,15 @@ static void coroutine_fn bdrv_qed_co_drain(BlockDriverState *bs)
     }
 }
 
+static void bdrv_qed_init_state(BlockDriverState *bs)
+{
+    BDRVQEDState *s = bs->opaque;
+
+    memset(s, 0, sizeof(BDRVQEDState));
+    s->bs = bs;
+    qemu_co_queue_init(&s->allocating_write_reqs);
+}
+
 static int bdrv_qed_do_open(BlockDriverState *bs, QDict *options, int flags,
                             Error **errp)
 {
@@ -371,9 +380,6 @@ static int bdrv_qed_do_open(BlockDriverState *bs, QDict *options, int flags,
     int64_t file_size;
     int ret;
 
-    s->bs = bs;
-    qemu_co_queue_init(&s->allocating_write_reqs);
-
     ret = bdrv_pread(bs->file, 0, &le_header, sizeof(le_header));
     if (ret < 0) {
         return ret;
@@ -507,6 +513,7 @@ static int bdrv_qed_open(BlockDriverState *bs, QDict *options, int flags,
         return -EINVAL;
     }
 
+    bdrv_qed_init_state(bs);
     return bdrv_qed_do_open(bs, options, flags, errp);
 }
 
@@ -1461,7 +1468,7 @@ static void bdrv_qed_invalidate_cache(BlockDriverState *bs, Error **errp)
 
     bdrv_qed_close(bs);
 
-    memset(s, 0, sizeof(BDRVQEDState));
+    bdrv_qed_init_state(bs);
     ret = bdrv_qed_do_open(bs, NULL, bs->open_flags, &local_err);
     if (local_err) {
         error_propagate(errp, local_err);
-- 
2.13.0


* [Qemu-devel] [PATCH 09/11] qed: protect table cache with CoMutex
  2017-06-29 13:27 [Qemu-devel] [PATCH v2 00/11] Block layer thread-safety, part 2 Paolo Bonzini
                   ` (7 preceding siblings ...)
  2017-06-29 13:27 ` [Qemu-devel] [PATCH 08/11] qed: introduce bdrv_qed_init_state Paolo Bonzini
@ 2017-06-29 13:27 ` Paolo Bonzini
  2017-06-29 13:27 ` [Qemu-devel] [PATCH 10/11] sheepdog: add queue_lock Paolo Bonzini
                   ` (5 subsequent siblings)
  14 siblings, 0 replies; 21+ messages in thread
From: Paolo Bonzini @ 2017-06-29 13:27 UTC (permalink / raw)
  To: qemu-devel; +Cc: stefanha, qemu-block, kwolf

This makes the driver thread-safe.  The CoMutex is dropped temporarily
while accessing the data clusters or the backing file.
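
The shape of the locking, stripped of qed specifics (BDRVFooState and the
function below are hypothetical; in the patch, qed_aio_write_cow,
qed_aio_write_inplace and qed_aio_read_data drop the lock around the data
I/O in the same way): metadata lookups and L2 cache updates run with
table_lock held, while the actual data read or write runs outside it.

    #include "qemu/osdep.h"
    #include "qemu/coroutine.h"
    #include "block/block_int.h"

    typedef struct {
        BlockDriverState *bs;
        CoMutex table_lock;         /* protects the L1/L2 table cache */
    } BDRVFooState;

    /* Called with table_lock held. */
    static int coroutine_fn foo_write_data(BDRVFooState *s, uint64_t offset,
                                           QEMUIOVector *qiov)
    {
        int ret;

        /* Drop the lock around the data write... */
        qemu_co_mutex_unlock(&s->table_lock);
        ret = bdrv_co_pwritev(s->bs->file, offset, qiov->size, qiov, 0);
        qemu_co_mutex_lock(&s->table_lock);
        /* ...and retake it before touching the cached tables again. */
        return ret;
    }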

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
        RFC->v2: add bdrv_qed_invalidate_cache change; invalidate_cache
                 can run in a coroutine when called by block migration.

 block/qed-cluster.c  |   4 +-
 block/qed-l2-cache.c |   6 +++
 block/qed-table.c    |  24 +++++++---
 block/qed.c          | 133 +++++++++++++++++++++++++++++++++++----------------
 block/qed.h          |  11 +++--
 5 files changed, 124 insertions(+), 54 deletions(-)

diff --git a/block/qed-cluster.c b/block/qed-cluster.c
index d8d6e66a0f..672e2e654b 100644
--- a/block/qed-cluster.c
+++ b/block/qed-cluster.c
@@ -85,6 +85,8 @@ static unsigned int qed_count_contiguous_clusters(BDRVQEDState *s,
  *
  * On failure QED_CLUSTER_L2 or QED_CLUSTER_L1 is returned for missing L2 or L1
  * table offset, respectively. len is number of contiguous unallocated bytes.
+ *
+ * Called with table_lock held.
  */
 int coroutine_fn qed_find_cluster(BDRVQEDState *s, QEDRequest *request,
                                   uint64_t pos, size_t *len,
@@ -112,7 +114,6 @@ int coroutine_fn qed_find_cluster(BDRVQEDState *s, QEDRequest *request,
     }
 
     ret = qed_read_l2_table(s, request, l2_offset);
-    qed_acquire(s);
     if (ret) {
         goto out;
     }
@@ -137,6 +138,5 @@ int coroutine_fn qed_find_cluster(BDRVQEDState *s, QEDRequest *request,
 
 out:
     *img_offset = offset;
-    qed_release(s);
     return ret;
 }
diff --git a/block/qed-l2-cache.c b/block/qed-l2-cache.c
index 5cba794650..b548362398 100644
--- a/block/qed-l2-cache.c
+++ b/block/qed-l2-cache.c
@@ -101,6 +101,8 @@ CachedL2Table *qed_alloc_l2_cache_entry(L2TableCache *l2_cache)
 /**
  * Decrease an entry's reference count and free if necessary when the reference
  * count drops to zero.
+ *
+ * Called with table_lock held.
  */
 void qed_unref_l2_cache_entry(CachedL2Table *entry)
 {
@@ -122,6 +124,8 @@ void qed_unref_l2_cache_entry(CachedL2Table *entry)
  *
  * For a cached entry, this function increases the reference count and returns
  * the entry.
+ *
+ * Called with table_lock held.
  */
 CachedL2Table *qed_find_l2_cache_entry(L2TableCache *l2_cache, uint64_t offset)
 {
@@ -150,6 +154,8 @@ CachedL2Table *qed_find_l2_cache_entry(L2TableCache *l2_cache, uint64_t offset)
  * N.B. This function steals a reference to the l2_table from the caller so the
  * caller must obtain a new reference by issuing a call to
  * qed_find_l2_cache_entry().
+ *
+ * Called with table_lock held.
  */
 void qed_commit_l2_cache_entry(L2TableCache *l2_cache, CachedL2Table *l2_table)
 {
diff --git a/block/qed-table.c b/block/qed-table.c
index ebee2c50f0..eead8b0fc7 100644
--- a/block/qed-table.c
+++ b/block/qed-table.c
@@ -18,6 +18,7 @@
 #include "qed.h"
 #include "qemu/bswap.h"
 
+/* Called either from qed_check or with table_lock held.  */
 static int qed_read_table(BDRVQEDState *s, uint64_t offset, QEDTable *table)
 {
     QEMUIOVector qiov;
@@ -32,18 +33,22 @@ static int qed_read_table(BDRVQEDState *s, uint64_t offset, QEDTable *table)
 
     trace_qed_read_table(s, offset, table);
 
+    if (qemu_in_coroutine()) {
+        qemu_co_mutex_unlock(&s->table_lock);
+    }
     ret = bdrv_preadv(s->bs->file, offset, &qiov);
+    if (qemu_in_coroutine()) {
+        qemu_co_mutex_lock(&s->table_lock);
+    }
     if (ret < 0) {
         goto out;
     }
 
     /* Byteswap offsets */
-    qed_acquire(s);
     noffsets = qiov.size / sizeof(uint64_t);
     for (i = 0; i < noffsets; i++) {
         table->offsets[i] = le64_to_cpu(table->offsets[i]);
     }
-    qed_release(s);
 
     ret = 0;
 out:
@@ -61,6 +66,8 @@ out:
  * @index:      Index of first element
  * @n:          Number of elements
  * @flush:      Whether or not to sync to disk
+ *
+ * Called either from qed_check or with table_lock held.
  */
 static int qed_write_table(BDRVQEDState *s, uint64_t offset, QEDTable *table,
                            unsigned int index, unsigned int n, bool flush)
@@ -97,16 +104,20 @@ static int qed_write_table(BDRVQEDState *s, uint64_t offset, QEDTable *table,
     /* Adjust for offset into table */
     offset += start * sizeof(uint64_t);
 
+    if (qemu_in_coroutine()) {
+        qemu_co_mutex_unlock(&s->table_lock);
+    }
     ret = bdrv_pwritev(s->bs->file, offset, &qiov);
+    if (qemu_in_coroutine()) {
+        qemu_co_mutex_lock(&s->table_lock);
+    }
     trace_qed_write_table_cb(s, table, flush, ret);
     if (ret < 0) {
         goto out;
     }
 
     if (flush) {
-        qed_acquire(s);
         ret = bdrv_flush(s->bs);
-        qed_release(s);
         if (ret < 0) {
             goto out;
         }
@@ -123,6 +134,7 @@ int qed_read_l1_table_sync(BDRVQEDState *s)
     return qed_read_table(s, s->header.l1_table_offset, s->l1_table);
 }
 
+/* Called either from qed_check or with table_lock held.  */
 int qed_write_l1_table(BDRVQEDState *s, unsigned int index, unsigned int n)
 {
     BLKDBG_EVENT(s->bs->file, BLKDBG_L1_UPDATE);
@@ -136,6 +148,7 @@ int qed_write_l1_table_sync(BDRVQEDState *s, unsigned int index,
     return qed_write_l1_table(s, index, n);
 }
 
+/* Called either from qed_check or with table_lock held.  */
 int qed_read_l2_table(BDRVQEDState *s, QEDRequest *request, uint64_t offset)
 {
     int ret;
@@ -154,7 +167,6 @@ int qed_read_l2_table(BDRVQEDState *s, QEDRequest *request, uint64_t offset)
     BLKDBG_EVENT(s->bs->file, BLKDBG_L2_LOAD);
     ret = qed_read_table(s, offset, request->l2_table->table);
 
-    qed_acquire(s);
     if (ret) {
         /* can't trust loaded L2 table anymore */
         qed_unref_l2_cache_entry(request->l2_table);
@@ -170,7 +182,6 @@ int qed_read_l2_table(BDRVQEDState *s, QEDRequest *request, uint64_t offset)
         request->l2_table = qed_find_l2_cache_entry(&s->l2_cache, offset);
         assert(request->l2_table != NULL);
     }
-    qed_release(s);
 
     return ret;
 }
@@ -180,6 +191,7 @@ int qed_read_l2_table_sync(BDRVQEDState *s, QEDRequest *request, uint64_t offset
     return qed_read_l2_table(s, request, offset);
 }
 
+/* Called either from qed_check or with table_lock held.  */
 int qed_write_l2_table(BDRVQEDState *s, QEDRequest *request,
                        unsigned int index, unsigned int n, bool flush)
 {
diff --git a/block/qed.c b/block/qed.c
index 8228a50f68..afad14fc01 100644
--- a/block/qed.c
+++ b/block/qed.c
@@ -93,6 +93,8 @@ int qed_write_header_sync(BDRVQEDState *s)
  *
  * This function only updates known header fields in-place and does not affect
  * extra data after the QED header.
+ *
+ * No new allocating reqs can start while this function runs.
  */
 static int coroutine_fn qed_write_header(BDRVQEDState *s)
 {
@@ -109,6 +111,8 @@ static int coroutine_fn qed_write_header(BDRVQEDState *s)
     QEMUIOVector qiov;
     int ret;
 
+    assert(s->allocating_acb || s->allocating_write_reqs_plugged);
+
     buf = qemu_blockalign(s->bs, len);
     iov = (struct iovec) {
         .iov_base = buf,
@@ -219,6 +223,8 @@ static int qed_read_string(BdrvChild *file, uint64_t offset, size_t n,
  * This function only produces the offset where the new clusters should be
  * written.  It updates BDRVQEDState but does not make any changes to the image
  * file.
+ *
+ * Called with table_lock held.
  */
 static uint64_t qed_alloc_clusters(BDRVQEDState *s, unsigned int n)
 {
@@ -236,6 +242,8 @@ QEDTable *qed_alloc_table(BDRVQEDState *s)
 
 /**
  * Allocate a new zeroed L2 table
+ *
+ * Called with table_lock held.
  */
 static CachedL2Table *qed_new_l2_table(BDRVQEDState *s)
 {
@@ -249,19 +257,32 @@ static CachedL2Table *qed_new_l2_table(BDRVQEDState *s)
     return l2_table;
 }
 
-static void qed_plug_allocating_write_reqs(BDRVQEDState *s)
+static bool qed_plug_allocating_write_reqs(BDRVQEDState *s)
 {
+    qemu_co_mutex_lock(&s->table_lock);
+
+    /* No reentrancy is allowed.  */
     assert(!s->allocating_write_reqs_plugged);
+    if (s->allocating_acb != NULL) {
+        /* Another allocating write came concurrently.  This cannot happen
+         * from bdrv_qed_co_drain, but it can happen when the timer runs.
+         */
+        qemu_co_mutex_unlock(&s->table_lock);
+        return false;
+    }
 
     s->allocating_write_reqs_plugged = true;
+    qemu_co_mutex_unlock(&s->table_lock);
+    return true;
 }
 
 static void qed_unplug_allocating_write_reqs(BDRVQEDState *s)
 {
+    qemu_co_mutex_lock(&s->table_lock);
     assert(s->allocating_write_reqs_plugged);
-
     s->allocating_write_reqs_plugged = false;
-    qemu_co_enter_next(&s->allocating_write_reqs);
+    qemu_co_queue_next(&s->allocating_write_reqs);
+    qemu_co_mutex_unlock(&s->table_lock);
 }
 
 static void coroutine_fn qed_need_check_timer_entry(void *opaque)
@@ -269,17 +290,14 @@ static void coroutine_fn qed_need_check_timer_entry(void *opaque)
     BDRVQEDState *s = opaque;
     int ret;
 
-    /* The timer should only fire when allocating writes have drained */
-    assert(!s->allocating_acb);
-
     trace_qed_need_check_timer_cb(s);
 
-    qed_acquire(s);
-    qed_plug_allocating_write_reqs(s);
+    if (!qed_plug_allocating_write_reqs(s)) {
+        return;
+    }
 
     /* Ensure writes are on disk before clearing flag */
     ret = bdrv_co_flush(s->bs->file->bs);
-    qed_release(s);
     if (ret < 0) {
         qed_unplug_allocating_write_reqs(s);
         return;
@@ -301,16 +319,6 @@ static void qed_need_check_timer_cb(void *opaque)
     qemu_coroutine_enter(co);
 }
 
-void qed_acquire(BDRVQEDState *s)
-{
-    aio_context_acquire(bdrv_get_aio_context(s->bs));
-}
-
-void qed_release(BDRVQEDState *s)
-{
-    aio_context_release(bdrv_get_aio_context(s->bs));
-}
-
 static void qed_start_need_check_timer(BDRVQEDState *s)
 {
     trace_qed_start_need_check_timer(s);
@@ -369,6 +377,7 @@ static void bdrv_qed_init_state(BlockDriverState *bs)
 
     memset(s, 0, sizeof(BDRVQEDState));
     s->bs = bs;
+    qemu_co_mutex_init(&s->table_lock);
     qemu_co_queue_init(&s->allocating_write_reqs);
 }
 
@@ -688,6 +697,7 @@ typedef struct {
     BlockDriverState **file;
 } QEDIsAllocatedCB;
 
+/* Called with table_lock held.  */
 static void qed_is_allocated_cb(void *opaque, int ret, uint64_t offset, size_t len)
 {
     QEDIsAllocatedCB *cb = opaque;
@@ -735,6 +745,7 @@ static int64_t coroutine_fn bdrv_qed_co_get_block_status(BlockDriverState *bs,
     uint64_t offset;
     int ret;
 
+    qemu_co_mutex_lock(&s->table_lock);
     ret = qed_find_cluster(s, &request, cb.pos, &len, &offset);
     qed_is_allocated_cb(&cb, ret, offset, len);
 
@@ -742,6 +753,7 @@ static int64_t coroutine_fn bdrv_qed_co_get_block_status(BlockDriverState *bs,
     assert(cb.status != BDRV_BLOCK_OFFSET_MASK);
 
     qed_unref_l2_cache_entry(request.l2_table);
+    qemu_co_mutex_unlock(&s->table_lock);
 
     return cb.status;
 }
@@ -872,6 +884,8 @@ out:
  *
  * The cluster offset may be an allocated byte offset in the image file, the
  * zero cluster marker, or the unallocated cluster marker.
+ *
+ * Called with table_lock held.
  */
 static void coroutine_fn qed_update_l2_table(BDRVQEDState *s, QEDTable *table,
                                              int index, unsigned int n,
@@ -887,6 +901,7 @@ static void coroutine_fn qed_update_l2_table(BDRVQEDState *s, QEDTable *table,
     }
 }
 
+/* Called with table_lock held.  */
 static void coroutine_fn qed_aio_complete(QEDAIOCB *acb)
 {
     BDRVQEDState *s = acb_to_s(acb);
@@ -910,7 +925,7 @@ static void coroutine_fn qed_aio_complete(QEDAIOCB *acb)
     if (acb == s->allocating_acb) {
         s->allocating_acb = NULL;
         if (!qemu_co_queue_empty(&s->allocating_write_reqs)) {
-            qemu_co_enter_next(&s->allocating_write_reqs);
+            qemu_co_queue_next(&s->allocating_write_reqs);
         } else if (s->header.features & QED_F_NEED_CHECK) {
             qed_start_need_check_timer(s);
         }
@@ -919,6 +934,8 @@ static void coroutine_fn qed_aio_complete(QEDAIOCB *acb)
 
 /**
  * Update L1 table with new L2 table offset and write it out
+ *
+ * Called with table_lock held.
  */
 static int coroutine_fn qed_aio_write_l1_update(QEDAIOCB *acb)
 {
@@ -947,6 +964,8 @@ static int coroutine_fn qed_aio_write_l1_update(QEDAIOCB *acb)
 
 /**
  * Update L2 table with new cluster offsets and write them out
+ *
+ * Called with table_lock held.
  */
 static int coroutine_fn qed_aio_write_l2_update(QEDAIOCB *acb, uint64_t offset)
 {
@@ -983,6 +1002,8 @@ static int coroutine_fn qed_aio_write_l2_update(QEDAIOCB *acb, uint64_t offset)
 
 /**
  * Write data to the image file
+ *
+ * Called with table_lock *not* held.
  */
 static int coroutine_fn qed_aio_write_main(QEDAIOCB *acb)
 {
@@ -999,6 +1020,8 @@ static int coroutine_fn qed_aio_write_main(QEDAIOCB *acb)
 
 /**
  * Populate untouched regions of new data cluster
+ *
+ * Called with table_lock held.
  */
 static int coroutine_fn qed_aio_write_cow(QEDAIOCB *acb)
 {
@@ -1006,6 +1029,8 @@ static int coroutine_fn qed_aio_write_cow(QEDAIOCB *acb)
     uint64_t start, len, offset;
     int ret;
 
+    qemu_co_mutex_unlock(&s->table_lock);
+
     /* Populate front untouched region of new data cluster */
     start = qed_start_of_cluster(s, acb->cur_pos);
     len = qed_offset_into_cluster(s, acb->cur_pos);
@@ -1013,7 +1038,7 @@ static int coroutine_fn qed_aio_write_cow(QEDAIOCB *acb)
     trace_qed_aio_write_prefill(s, acb, start, len, acb->cur_cluster);
     ret = qed_copy_from_backing_file(s, start, len, acb->cur_cluster);
     if (ret < 0) {
-        return ret;
+        goto out;
     }
 
     /* Populate back untouched region of new data cluster */
@@ -1026,12 +1051,12 @@ static int coroutine_fn qed_aio_write_cow(QEDAIOCB *acb)
     trace_qed_aio_write_postfill(s, acb, start, len, offset);
     ret = qed_copy_from_backing_file(s, start, len, offset);
     if (ret < 0) {
-        return ret;
+        goto out;
     }
 
     ret = qed_aio_write_main(acb);
     if (ret < 0) {
-        return ret;
+        goto out;
     }
 
     if (s->bs->backing) {
@@ -1046,12 +1071,11 @@ static int coroutine_fn qed_aio_write_cow(QEDAIOCB *acb)
          * cluster and before updating the L2 table.
          */
         ret = bdrv_co_flush(s->bs->file->bs);
-        if (ret < 0) {
-            return ret;
-        }
     }
 
-    return 0;
+out:
+    qemu_co_mutex_lock(&s->table_lock);
+    return ret;
 }
 
 /**
@@ -1074,6 +1098,8 @@ static bool qed_should_set_need_check(BDRVQEDState *s)
  * @len:        Length in bytes
  *
  * This path is taken when writing to previously unallocated clusters.
+ *
+ * Called with table_lock held.
  */
 static int coroutine_fn qed_aio_write_alloc(QEDAIOCB *acb, size_t len)
 {
@@ -1088,7 +1114,7 @@ static int coroutine_fn qed_aio_write_alloc(QEDAIOCB *acb, size_t len)
     /* Freeze this request if another allocating write is in progress */
     if (s->allocating_acb != acb || s->allocating_write_reqs_plugged) {
         if (s->allocating_acb != NULL) {
-            qemu_co_queue_wait(&s->allocating_write_reqs, NULL);
+            qemu_co_queue_wait(&s->allocating_write_reqs, &s->table_lock);
             assert(s->allocating_acb == NULL);
         }
         s->allocating_acb = acb;
@@ -1135,10 +1161,17 @@ static int coroutine_fn qed_aio_write_alloc(QEDAIOCB *acb, size_t len)
  * @len:        Length in bytes
  *
  * This path is taken when writing to already allocated clusters.
+ *
+ * Called with table_lock held.
  */
 static int coroutine_fn qed_aio_write_inplace(QEDAIOCB *acb, uint64_t offset,
                                               size_t len)
 {
+    BDRVQEDState *s = acb_to_s(acb);
+    int r;
+
+    qemu_co_mutex_unlock(&s->table_lock);
+
     /* Allocate buffer for zero writes */
     if (acb->flags & QED_AIOCB_ZERO) {
         struct iovec *iov = acb->qiov->iov;
@@ -1146,7 +1179,8 @@ static int coroutine_fn qed_aio_write_inplace(QEDAIOCB *acb, uint64_t offset,
         if (!iov->iov_base) {
             iov->iov_base = qemu_try_blockalign(acb->bs, iov->iov_len);
             if (iov->iov_base == NULL) {
-                return -ENOMEM;
+                r = -ENOMEM;
+                goto out;
             }
             memset(iov->iov_base, 0, iov->iov_len);
         }
@@ -1156,8 +1190,11 @@ static int coroutine_fn qed_aio_write_inplace(QEDAIOCB *acb, uint64_t offset,
     acb->cur_cluster = offset;
     qemu_iovec_concat(&acb->cur_qiov, acb->qiov, acb->qiov_offset, len);
 
-    /* Do the actual write */
-    return qed_aio_write_main(acb);
+    /* Do the actual write.  */
+    r = qed_aio_write_main(acb);
+out:
+    qemu_co_mutex_lock(&s->table_lock);
+    return r;
 }
 
 /**
@@ -1167,6 +1204,8 @@ static int coroutine_fn qed_aio_write_inplace(QEDAIOCB *acb, uint64_t offset,
  * @ret:        QED_CLUSTER_FOUND, QED_CLUSTER_L2 or QED_CLUSTER_L1
  * @offset:     Cluster offset in bytes
  * @len:        Length in bytes
+ *
+ * Called with table_lock held.
  */
 static int coroutine_fn qed_aio_write_data(void *opaque, int ret,
                                            uint64_t offset, size_t len)
@@ -1198,6 +1237,8 @@ static int coroutine_fn qed_aio_write_data(void *opaque, int ret,
  * @ret:        QED_CLUSTER_FOUND, QED_CLUSTER_L2 or QED_CLUSTER_L1
  * @offset:     Cluster offset in bytes
  * @len:        Length in bytes
+ *
+ * Called with table_lock held.
  */
 static int coroutine_fn qed_aio_read_data(void *opaque, int ret,
                                           uint64_t offset, size_t len)
@@ -1205,6 +1246,9 @@ static int coroutine_fn qed_aio_read_data(void *opaque, int ret,
     QEDAIOCB *acb = opaque;
     BDRVQEDState *s = acb_to_s(acb);
     BlockDriverState *bs = acb->bs;
+    int r;
+
+    qemu_co_mutex_unlock(&s->table_lock);
 
     /* Adjust offset into cluster */
     offset += qed_offset_into_cluster(s, acb->cur_pos);
@@ -1213,22 +1257,23 @@ static int coroutine_fn qed_aio_read_data(void *opaque, int ret,
 
     qemu_iovec_concat(&acb->cur_qiov, acb->qiov, acb->qiov_offset, len);
 
-    /* Handle zero cluster and backing file reads */
+    /* Handle zero cluster and backing file reads, otherwise read
+     * data cluster directly.
+     */
     if (ret == QED_CLUSTER_ZERO) {
         qemu_iovec_memset(&acb->cur_qiov, 0, 0, acb->cur_qiov.size);
-        return 0;
+        r = 0;
     } else if (ret != QED_CLUSTER_FOUND) {
-        return qed_read_backing_file(s, acb->cur_pos, &acb->cur_qiov,
-                                     &acb->backing_qiov);
+        r = qed_read_backing_file(s, acb->cur_pos, &acb->cur_qiov,
+                                  &acb->backing_qiov);
+    } else {
+        BLKDBG_EVENT(bs->file, BLKDBG_READ_AIO);
+        r = bdrv_co_preadv(bs->file, offset, acb->cur_qiov.size,
+                           &acb->cur_qiov, 0);
     }
 
-    BLKDBG_EVENT(bs->file, BLKDBG_READ_AIO);
-    ret = bdrv_co_preadv(bs->file, offset, acb->cur_qiov.size,
-                         &acb->cur_qiov, 0);
-    if (ret < 0) {
-        return ret;
-    }
-    return 0;
+    qemu_co_mutex_lock(&s->table_lock);
+    return r;
 }
 
 /**
@@ -1241,6 +1286,7 @@ static int coroutine_fn qed_aio_next_io(QEDAIOCB *acb)
     size_t len;
     int ret;
 
+    qemu_co_mutex_lock(&s->table_lock);
     while (1) {
         trace_qed_aio_next_io(s, acb, 0, acb->cur_pos + acb->cur_qiov.size);
 
@@ -1280,6 +1326,7 @@ static int coroutine_fn qed_aio_next_io(QEDAIOCB *acb)
 
     trace_qed_aio_complete(s, acb, ret);
     qed_aio_complete(acb);
+    qemu_co_mutex_unlock(&s->table_lock);
     return ret;
 }
 
@@ -1469,7 +1516,13 @@ static void bdrv_qed_invalidate_cache(BlockDriverState *bs, Error **errp)
     bdrv_qed_close(bs);
 
     bdrv_qed_init_state(bs);
+    if (qemu_in_coroutine()) {
+        qemu_co_mutex_lock(&s->table_lock);
+    }
     ret = bdrv_qed_do_open(bs, NULL, bs->open_flags, &local_err);
+    if (qemu_in_coroutine()) {
+        qemu_co_mutex_unlock(&s->table_lock);
+    }
     if (local_err) {
         error_propagate(errp, local_err);
         error_prepend(errp, "Could not reopen qed layer: ");
diff --git a/block/qed.h b/block/qed.h
index dd3a2d5519..f35341f134 100644
--- a/block/qed.h
+++ b/block/qed.h
@@ -151,15 +151,21 @@ typedef struct QEDAIOCB {
 
 typedef struct {
     BlockDriverState *bs;           /* device */
-    uint64_t file_size;             /* length of image file, in bytes */
 
+    /* Written only by an allocating write or the timer handler (the latter
+     * while allocating reqs are plugged).
+     */
     QEDHeader header;               /* always cpu-endian */
+
+    /* Protected by table_lock.  */
+    CoMutex table_lock;
     QEDTable *l1_table;
     L2TableCache l2_cache;          /* l2 table cache */
     uint32_t table_nelems;
     uint32_t l1_shift;
     uint32_t l2_shift;
     uint32_t l2_mask;
+    uint64_t file_size;             /* length of image file, in bytes */
 
     /* Allocating write request queue */
     QEDAIOCB *allocating_acb;
@@ -177,9 +183,6 @@ enum {
     QED_CLUSTER_L1,            /* cluster missing in L1 */
 };
 
-void qed_acquire(BDRVQEDState *s);
-void qed_release(BDRVQEDState *s);
-
 /**
  * Header functions
  */
-- 
2.13.0


* [Qemu-devel] [PATCH 10/11] sheepdog: add queue_lock
  2017-06-29 13:27 [Qemu-devel] [PATCH v2 00/11] Block layer thread-safety, part 2 Paolo Bonzini
                   ` (8 preceding siblings ...)
  2017-06-29 13:27 ` [Qemu-devel] [PATCH 09/11] qed: protect table cache with CoMutex Paolo Bonzini
@ 2017-06-29 13:27 ` Paolo Bonzini
  2017-06-29 13:27 ` [Qemu-devel] [PATCH 11/11] ssh: support I/O from any AioContext Paolo Bonzini
                   ` (4 subsequent siblings)
  14 siblings, 0 replies; 21+ messages in thread
From: Paolo Bonzini @ 2017-06-29 13:27 UTC (permalink / raw)
  To: qemu-devel; +Cc: stefanha, qemu-block, kwolf

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 block/sheepdog.c | 21 +++++++++++++++++++--
 1 file changed, 19 insertions(+), 2 deletions(-)

diff --git a/block/sheepdog.c b/block/sheepdog.c
index 08d7b11e9d..a6013f0f17 100644
--- a/block/sheepdog.c
+++ b/block/sheepdog.c
@@ -390,6 +390,7 @@ struct BDRVSheepdogState {
     QLIST_HEAD(inflight_aio_head, AIOReq) inflight_aio_head;
     QLIST_HEAD(failed_aio_head, AIOReq) failed_aio_head;
 
+    CoMutex queue_lock;
     CoQueue overlapping_queue;
     QLIST_HEAD(inflight_aiocb_head, SheepdogAIOCB) inflight_aiocb_head;
 };
@@ -488,7 +489,7 @@ static void wait_for_overlapping_aiocb(BDRVSheepdogState *s, SheepdogAIOCB *acb)
 retry:
     QLIST_FOREACH(cb, &s->inflight_aiocb_head, aiocb_siblings) {
         if (AIOCBOverlapping(acb, cb)) {
-            qemu_co_queue_wait(&s->overlapping_queue, NULL);
+            qemu_co_queue_wait(&s->overlapping_queue, &s->queue_lock);
             goto retry;
         }
     }
@@ -525,8 +526,10 @@ static void sd_aio_setup(SheepdogAIOCB *acb, BDRVSheepdogState *s,
         return;
     }
 
+    qemu_co_mutex_lock(&s->queue_lock);
     wait_for_overlapping_aiocb(s, acb);
     QLIST_INSERT_HEAD(&s->inflight_aiocb_head, acb, aiocb_siblings);
+    qemu_co_mutex_unlock(&s->queue_lock);
 }
 
 static SocketAddress *sd_socket_address(const char *path,
@@ -785,6 +788,7 @@ static coroutine_fn void reconnect_to_sdog(void *opaque)
      * have to move all the inflight requests to the failed queue before
      * resend_aioreq() is called.
      */
+    qemu_co_mutex_lock(&s->queue_lock);
     QLIST_FOREACH_SAFE(aio_req, &s->inflight_aio_head, aio_siblings, next) {
         QLIST_REMOVE(aio_req, aio_siblings);
         QLIST_INSERT_HEAD(&s->failed_aio_head, aio_req, aio_siblings);
@@ -794,8 +798,11 @@ static coroutine_fn void reconnect_to_sdog(void *opaque)
     while (!QLIST_EMPTY(&s->failed_aio_head)) {
         aio_req = QLIST_FIRST(&s->failed_aio_head);
         QLIST_REMOVE(aio_req, aio_siblings);
+        qemu_co_mutex_unlock(&s->queue_lock);
         resend_aioreq(s, aio_req);
+        qemu_co_mutex_lock(&s->queue_lock);
     }
+    qemu_co_mutex_unlock(&s->queue_lock);
 }
 
 /*
@@ -887,7 +894,10 @@ static void coroutine_fn aio_read_response(void *opaque)
     */
     s->co_recv = NULL;
 
+    qemu_co_mutex_lock(&s->queue_lock);
     QLIST_REMOVE(aio_req, aio_siblings);
+    qemu_co_mutex_unlock(&s->queue_lock);
+
     switch (rsp.result) {
     case SD_RES_SUCCESS:
         break;
@@ -1307,7 +1317,9 @@ static void coroutine_fn add_aio_request(BDRVSheepdogState *s, AIOReq *aio_req,
     uint64_t old_oid = aio_req->base_oid;
     bool create = aio_req->create;
 
+    qemu_co_mutex_lock(&s->queue_lock);
     QLIST_INSERT_HEAD(&s->inflight_aio_head, aio_req, aio_siblings);
+    qemu_co_mutex_unlock(&s->queue_lock);
 
     if (!nr_copies) {
         error_report("bug");
@@ -1678,6 +1690,7 @@ static int sd_open(BlockDriverState *bs, QDict *options, int flags,
     bs->total_sectors = s->inode.vdi_size / BDRV_SECTOR_SIZE;
     pstrcpy(s->name, sizeof(s->name), vdi);
     qemu_co_mutex_init(&s->lock);
+    qemu_co_mutex_init(&s->queue_lock);
     qemu_co_queue_init(&s->overlapping_queue);
     qemu_opts_del(opts);
     g_free(buf);
@@ -2431,12 +2444,16 @@ static void coroutine_fn sd_co_rw_vector(SheepdogAIOCB *acb)
 
 static void sd_aio_complete(SheepdogAIOCB *acb)
 {
+    BDRVSheepdogState *s;
     if (acb->aiocb_type == AIOCB_FLUSH_CACHE) {
         return;
     }
 
+    s = acb->s;
+    qemu_co_mutex_lock(&s->queue_lock);
     QLIST_REMOVE(acb, aiocb_siblings);
-    qemu_co_queue_restart_all(&acb->s->overlapping_queue);
+    qemu_co_queue_restart_all(&s->overlapping_queue);
+    qemu_co_mutex_unlock(&s->queue_lock);
 }
 
 static coroutine_fn int sd_co_writev(BlockDriverState *bs, int64_t sector_num,
-- 
2.13.0

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [Qemu-devel] [PATCH 11/11] ssh: support I/O from any AioContext
  2017-06-29 13:27 [Qemu-devel] [PATCH v2 00/11] Block layer thread-safety, part 2 Paolo Bonzini
                   ` (9 preceding siblings ...)
  2017-06-29 13:27 ` [Qemu-devel] [PATCH 10/11] sheepdog: add queue_lock Paolo Bonzini
@ 2017-06-29 13:27 ` Paolo Bonzini
  2017-07-05 21:57 ` [Qemu-devel] [Qemu-block] [PATCH v2 00/11] Block layer thread-safety, part 2 Paolo Bonzini
                   ` (3 subsequent siblings)
  14 siblings, 0 replies; 21+ messages in thread
From: Paolo Bonzini @ 2017-06-29 13:27 UTC (permalink / raw)
  To: qemu-devel; +Cc: stefanha, qemu-block, kwolf

The coroutine may run in a different AioContext, causing the
fd handler to busy wait.  Fix this by resetting the handler
in restart_coroutine, before the coroutine is restarted.

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 block/ssh.c | 24 +++++++++++++++++-------
 1 file changed, 17 insertions(+), 7 deletions(-)

diff --git a/block/ssh.c b/block/ssh.c
index 52964416da..e6a5b08de3 100644
--- a/block/ssh.c
+++ b/block/ssh.c
@@ -888,13 +888,22 @@ static int ssh_has_zero_init(BlockDriverState *bs)
     return has_zero_init;
 }
 
+typedef struct BDRVSSHRestart {
+    BlockDriverState *bs;
+    Coroutine *co;
+} BDRVSSHRestart;
+
 static void restart_coroutine(void *opaque)
 {
-    Coroutine *co = opaque;
+    BDRVSSHRestart *restart = opaque;
+    BlockDriverState *bs = restart->bs;
+    BDRVSSHState *s = bs->opaque;
+    AioContext *ctx = bdrv_get_aio_context(bs);
 
-    DPRINTF("co=%p", co);
+    DPRINTF("co=%p", restart->co);
+    aio_set_fd_handler(ctx, s->sock, false, NULL, NULL, NULL, NULL);
 
-    aio_co_wake(co);
+    aio_co_wake(restart->co);
 }
 
 /* A non-blocking call returned EAGAIN, so yield, ensuring the
@@ -905,7 +914,10 @@ static coroutine_fn void co_yield(BDRVSSHState *s, BlockDriverState *bs)
 {
     int r;
     IOHandler *rd_handler = NULL, *wr_handler = NULL;
-    Coroutine *co = qemu_coroutine_self();
+    BDRVSSHRestart restart = {
+        .bs = bs,
+        .co = qemu_coroutine_self()
+    };
 
     r = libssh2_session_block_directions(s->session);
 
@@ -920,11 +932,9 @@ static coroutine_fn void co_yield(BDRVSSHState *s, BlockDriverState *bs)
             rd_handler, wr_handler);
 
     aio_set_fd_handler(bdrv_get_aio_context(bs), s->sock,
-                       false, rd_handler, wr_handler, NULL, co);
+                       false, rd_handler, wr_handler, NULL, &restart);
     qemu_coroutine_yield();
     DPRINTF("s->sock=%d - back", s->sock);
-    aio_set_fd_handler(bdrv_get_aio_context(bs), s->sock, false,
-                       NULL, NULL, NULL, NULL);
 }
 
 /* SFTP has a function `libssh2_sftp_seek64' which seeks to a position
-- 
2.13.0
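
The ordering is the point here: previously the coroutine removed the fd
handler itself after qemu_coroutine_yield() returned, but aio_co_wake() may
resume the coroutine in a different AioContext, so the AioContext that owns
the socket could keep invoking the now-pointless handler and busy wait.
With this patch the handler unregisters itself, in the owning AioContext,
before waking the coroutine.  A reduced sketch of the protocol, with
illustrative Example* names and error handling omitted:

    /* Sketch only: register a self-removing fd handler, then yield. */
    #include "qemu/osdep.h"
    #include "qemu/coroutine.h"
    #include "block/aio.h"
    #include "block/block_int.h"

    typedef struct ExampleRestart {
        BlockDriverState *bs;
        int fd;
        Coroutine *co;
    } ExampleRestart;

    static void example_restart_cb(void *opaque)
    {
        ExampleRestart *r = opaque;

        /* Unregister in the AioContext that owns the fd, before the wakeup,
         * so the handler cannot keep firing once the coroutine runs elsewhere. */
        aio_set_fd_handler(bdrv_get_aio_context(r->bs), r->fd, false,
                           NULL, NULL, NULL, NULL);
        aio_co_wake(r->co);
    }

    static coroutine_fn void example_yield(BlockDriverState *bs, int fd)
    {
        ExampleRestart restart = {
            .bs = bs,
            .fd = fd,
            .co = qemu_coroutine_self(),
        };

        aio_set_fd_handler(bdrv_get_aio_context(bs), fd, false,
                           example_restart_cb, NULL, NULL, &restart);
        qemu_coroutine_yield();
        /* nothing to clean up: example_restart_cb already removed the handler */
    }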

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [Qemu-devel] [PATCH 08/11] qed: introduce bdrv_qed_init_state
  2017-06-29 13:27 ` [Qemu-devel] [PATCH 08/11] qed: introduce bdrv_qed_init_state Paolo Bonzini
@ 2017-06-29 15:26   ` Eric Blake
  2017-07-17  3:36   ` Fam Zheng
  1 sibling, 0 replies; 21+ messages in thread
From: Eric Blake @ 2017-06-29 15:26 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel; +Cc: kwolf, qemu-block, stefanha

On 06/29/2017 08:27 AM, Paolo Bonzini wrote:
> This will be used in the next patch, which will call bdrv_qed_do_open
> with a CoMutex taken.  bdrv_qed_init_state provides a nice place to
> initialize it.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>         RFC->v2: new
> 
>  block/qed.c | 15 +++++++++++----
>  1 file changed, 11 insertions(+), 4 deletions(-)

Reviewed-by: Eric Blake <eblake@redhat.com>

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [PATCH v2 00/11] Block layer thread-safety, part 2
  2017-06-29 13:27 [Qemu-devel] [PATCH v2 00/11] Block layer thread-safety, part 2 Paolo Bonzini
                   ` (10 preceding siblings ...)
  2017-06-29 13:27 ` [Qemu-devel] [PATCH 11/11] ssh: support I/O from any AioContext Paolo Bonzini
@ 2017-07-05 21:57 ` Paolo Bonzini
  2017-07-06 23:50 ` [Qemu-devel] " no-reply
                   ` (2 subsequent siblings)
  14 siblings, 0 replies; 21+ messages in thread
From: Paolo Bonzini @ 2017-07-05 21:57 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, qemu-block, stefanha

On 29/06/2017 15:27, Paolo Bonzini wrote:
> This part takes care of drivers and devices, making sure that they can
> accept concurrent I/O from multiple AioContext.
> 
> The following drivers are thread-safe without using any QemuMutex/CoMutex:
> crypto, gluster, null, rbd, win32-aio.  NBD has already been fixed,
> because the patch fixed an unrelated testcase.
> 
> The following drivers already use mutexes for everything except possibly
> snapshots, which do not (yet?) need protection: bochs, cloop, dmg, qcow,
> parallels, vhdx, vmdk, curl, iscsi, nfs.
> 
> The following drivers already use mutexes for _almost_ everything: vpc
> (missing get_block_status), vdi (missing bitmap access), vvfat (missing
> commit, not protected), qcow2 (must call CoQueue APIs under CoMutex).
> They are fixed by patches 1-5.
> 
> The following drivers must be changed to use CoMutex to protect internal
> data: qed (patches 6-9), sheepdog (patch 10).
> 
> The following driver must be changed to support I/O from any AioContext:
> ssh.  It is fixed by patch 11.
> 
> Paolo
> 
> v1->v2: new patch 8 + adjustments to patch 9 to fix qemu-iotests testcase
>         183 (bdrv_invalidate_cache from block migration)
> 
> Paolo Bonzini (11):
>   qcow2: call CoQueue APIs under CoMutex
>   coroutine-lock: add qemu_co_rwlock_downgrade and
>     qemu_co_rwlock_upgrade
>   vdi: make it thread-safe
>   vpc: make it thread-safe
>   vvfat: make it thread-safe
>   qed: move tail of qed_aio_write_main to qed_aio_write_{cow,alloc}
>   block: invoke .bdrv_drain callback in coroutine context and from
>     AioContext
>   qed: introduce bdrv_qed_init_state
>   qed: protect table cache with CoMutex
>   sheepdog: add queue_lock
>   ssh: support I/O from any AioContext
> 
>  block/io.c                 |  42 +++++++--
>  block/qcow2.c              |   4 +-
>  block/qed-cluster.c        |   4 +-
>  block/qed-l2-cache.c       |   6 ++
>  block/qed-table.c          |  24 +++--
>  block/qed.c                | 214 ++++++++++++++++++++++++++++-----------------
>  block/qed.h                |  11 ++-
>  block/sheepdog.c           |  21 ++++-
>  block/ssh.c                |  24 +++--
>  block/vdi.c                |  48 +++++-----
>  block/vpc.c                |  20 ++---
>  block/vvfat.c              |   8 +-
>  include/block/block_int.h  |   2 +-
>  include/qemu/coroutine.h   |  18 ++++
>  util/qemu-coroutine-lock.c |  35 ++++++++
>  15 files changed, 331 insertions(+), 150 deletions(-)
> 


ping?

Paolo

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Qemu-devel] [PATCH v2 00/11] Block layer thread-safety, part 2
  2017-06-29 13:27 [Qemu-devel] [PATCH v2 00/11] Block layer thread-safety, part 2 Paolo Bonzini
                   ` (11 preceding siblings ...)
  2017-07-05 21:57 ` [Qemu-devel] [Qemu-block] [PATCH v2 00/11] Block layer thread-safety, part 2 Paolo Bonzini
@ 2017-07-06 23:50 ` no-reply
  2017-07-07  0:05   ` Fam Zheng
  2017-07-10 12:18 ` Stefan Hajnoczi
  2017-07-10 13:09 ` Fam Zheng
  14 siblings, 1 reply; 21+ messages in thread
From: no-reply @ 2017-07-06 23:50 UTC (permalink / raw)
  To: pbonzini; +Cc: famz, qemu-devel, kwolf, qemu-block, stefanha

Hi,

This series seems to have some coding style problems. See output below for
more information:

Subject: [Qemu-devel] [PATCH v2 00/11] Block layer thread-safety, part 2
Message-id: 20170629132749.997-1-pbonzini@redhat.com
Type: series

=== TEST SCRIPT BEGIN ===
#!/bin/bash

BASE=base
n=1
total=$(git log --oneline $BASE.. | wc -l)
failed=0

git config --local diff.renamelimit 0
git config --local diff.renames True

commits="$(git log --format=%H --reverse $BASE..)"
for c in $commits; do
    echo "Checking PATCH $n/$total: $(git log -n 1 --format=%s $c)..."
    if ! git show $c --format=email | ./scripts/checkpatch.pl --mailback -; then
        failed=1
        echo
    fi
    n=$((n+1))
done

exit $failed
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
fatal: Cannot update paths and switch to branch 'test' at the same time.
Did you intend to checkout 'origin/patchew/20170629132749.997-1-pbonzini@redhat.com' which can not be resolved as commit?
Traceback (most recent call last):
  File "/home/fam/bin/patchew", line 440, in test_one
    git_clone_repo(clone, r["repo"], r["head"], logf)
  File "/home/fam/bin/patchew", line 53, in git_clone_repo
    cwd=clone)
  File "/usr/lib64/python3.5/subprocess.py", line 271, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['git', 'checkout', 'origin/patchew/20170629132749.997-1-pbonzini@redhat.com', '-b', 'test']' returned non-zero exit status 128



---
Email generated automatically by Patchew [http://patchew.org/].
Please send your feedback to patchew-devel@freelists.org

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Qemu-devel] [PATCH v2 00/11] Block layer thread-safety, part 2
  2017-07-06 23:50 ` [Qemu-devel] " no-reply
@ 2017-07-07  0:05   ` Fam Zheng
  0 siblings, 0 replies; 21+ messages in thread
From: Fam Zheng @ 2017-07-07  0:05 UTC (permalink / raw)
  To: qemu-devel; +Cc: pbonzini, kwolf, qemu-block, stefanha

On Thu, 07/06 16:50, no-reply@patchew.org wrote:
> Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
> fatal: Cannot update paths and switch to branch 'test' at the same time.
> Did you intend to checkout 'origin/patchew/20170629132749.997-1-pbonzini@redhat.com' which can not be resolved as commit?
> Traceback (most recent call last):
>   File "/home/fam/bin/patchew", line 440, in test_one
>     git_clone_repo(clone, r["repo"], r["head"], logf)
>   File "/home/fam/bin/patchew", line 53, in git_clone_repo
>     cwd=clone)
>   File "/usr/lib64/python3.5/subprocess.py", line 271, in check_call
>     raise CalledProcessError(retcode, cmd)
> subprocess.CalledProcessError: Command '['git', 'checkout', 'origin/patchew/20170629132749.997-1-pbonzini@redhat.com', '-b', 'test']' returned non-zero exit status 128

Please ignore this; patchew is recovering from a bad state.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Qemu-devel] [PATCH 04/11] vpc: make it thread-safe
  2017-06-29 13:27 ` [Qemu-devel] [PATCH 04/11] vpc: " Paolo Bonzini
@ 2017-07-10 11:32   ` Fam Zheng
  0 siblings, 0 replies; 21+ messages in thread
From: Fam Zheng @ 2017-07-10 11:32 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel, kwolf, qemu-block, stefanha

On Thu, 06/29 15:27, Paolo Bonzini wrote:
> Reviewed-by: Eric Blake <eblake@redhat.com>
> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  block/vpc.c | 20 ++++++++++----------
>  1 file changed, 10 insertions(+), 10 deletions(-)
> 
> diff --git a/block/vpc.c b/block/vpc.c
> index 4240ba9d1c..0ff686540a 100644
> --- a/block/vpc.c
> +++ b/block/vpc.c
> @@ -496,12 +496,6 @@ static inline int64_t get_image_offset(BlockDriverState *bs, uint64_t offset,
>      return block_offset;
>  }
>  
> -static inline int64_t get_sector_offset(BlockDriverState *bs,
> -                                        int64_t sector_num, bool write)
> -{
> -    return get_image_offset(bs, sector_num * BDRV_SECTOR_SIZE, write);
> -}
> -
>  /*
>   * Writes the footer to the end of the image file. This is needed when the
>   * file grows as it overwrites the old footer
> @@ -696,6 +690,7 @@ static int64_t coroutine_fn vpc_co_get_block_status(BlockDriverState *bs,
>      VHDFooter *footer = (VHDFooter*) s->footer_buf;
>      int64_t start, offset;
>      bool allocated;
> +    int64_t ret;
>      int n;
>  
>      if (be32_to_cpu(footer->type) == VHD_FIXED) {
> @@ -705,10 +700,13 @@ static int64_t coroutine_fn vpc_co_get_block_status(BlockDriverState *bs,
>                 (sector_num << BDRV_SECTOR_BITS);
>      }
>  
> -    offset = get_sector_offset(bs, sector_num, 0);
> +    qemu_co_mutex_lock(&s->lock);
> +
> +    offset = get_image_offset(bs, sector_num << BDRV_SECTOR_BITS, 0);

You used false instead of 0 in v1, which was better, so this is a regression. :)

Can be fixed when applying.

>      start = offset;
>      allocated = (offset != -1);
>      *pnum = 0;
> +    ret = 0;
>  
>      do {
>          /* All sectors in a block are contiguous (without using the bitmap) */
> @@ -723,15 +721,17 @@ static int64_t coroutine_fn vpc_co_get_block_status(BlockDriverState *bs,
>           * sectors since there is always a bitmap in between. */
>          if (allocated) {
>              *file = bs->file->bs;
> -            return BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID | start;
> +            ret = BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID | start;
> +            break;
>          }
>          if (nb_sectors == 0) {
>              break;
>          }
> -        offset = get_sector_offset(bs, sector_num, 0);
> +        offset = get_image_offset(bs, sector_num << BDRV_SECTOR_BITS, 0);

Ditto.

>      } while (offset == -1);
>  
> -    return 0;
> +    qemu_co_mutex_unlock(&s->lock);
> +    return ret;
>  }
>  
>  /*
> -- 
> 2.13.0
> 
> 
> 

Fam
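
(For clarity, the fix-up requested above is just to pass the boolean literal
at both call sites, i.e. something like

    offset = get_image_offset(bs, sector_num << BDRV_SECTOR_BITS, false);

once the patch is applied.)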

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Qemu-devel] [PATCH v2 00/11] Block layer thread-safety, part 2
  2017-06-29 13:27 [Qemu-devel] [PATCH v2 00/11] Block layer thread-safety, part 2 Paolo Bonzini
                   ` (12 preceding siblings ...)
  2017-07-06 23:50 ` [Qemu-devel] " no-reply
@ 2017-07-10 12:18 ` Stefan Hajnoczi
  2017-07-10 13:09 ` Fam Zheng
  14 siblings, 0 replies; 21+ messages in thread
From: Stefan Hajnoczi @ 2017-07-10 12:18 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel, qemu-block, kwolf

On Thu, Jun 29, 2017 at 03:27:38PM +0200, Paolo Bonzini wrote:
> This part takes care of drivers and devices, making sure that they can
> accept concurrent I/O from multiple AioContext.
> 
> The following drivers are thread-safe without using any QemuMutex/CoMutex:
> crypto, gluster, null, rbd, win32-aio.  NBD has already been fixed,
> because the patch fixed an unrelated testcase.
> 
> The following drivers already use mutexes for everything except possibly
> snapshots, which do not (yet?) need protection: bochs, cloop, dmg, qcow,
> parallels, vhdx, vmdk, curl, iscsi, nfs.
> 
> The following drivers already use mutexes for _almost_ everything: vpc
> (missing get_block_status), vdi (missing bitmap access), vvfat (missing
> commit, not protected), qcow2 (must call CoQueue APIs under CoMutex).
> They are fixed by patches 1-5.
> 
> The following drivers must be changed to use CoMutex to protect internal
> data: qed (patches 6-9), sheepdog (patch 10).
> 
> The following driver must be changed to support I/O from any AioContext:
> ssh.  It is fixed by patch 11.
> 
> Paolo
> 
> v1->v2: new patch 8 + adjustments to patch 9 to fix qemu-iotests testcase
>         183 (bdrv_invalidate_cache from block migration)
> 
> Paolo Bonzini (11):
>   qcow2: call CoQueue APIs under CoMutex
>   coroutine-lock: add qemu_co_rwlock_downgrade and
>     qemu_co_rwlock_upgrade
>   vdi: make it thread-safe
>   vpc: make it thread-safe
>   vvfat: make it thread-safe
>   qed: move tail of qed_aio_write_main to qed_aio_write_{cow,alloc}
>   block: invoke .bdrv_drain callback in coroutine context and from
>     AioContext
>   qed: introduce bdrv_qed_init_state
>   qed: protect table cache with CoMutex
>   sheepdog: add queue_lock
>   ssh: support I/O from any AioContext
> 
>  block/io.c                 |  42 +++++++--
>  block/qcow2.c              |   4 +-
>  block/qed-cluster.c        |   4 +-
>  block/qed-l2-cache.c       |   6 ++
>  block/qed-table.c          |  24 +++--
>  block/qed.c                | 214 ++++++++++++++++++++++++++++-----------------
>  block/qed.h                |  11 ++-
>  block/sheepdog.c           |  21 ++++-
>  block/ssh.c                |  24 +++--
>  block/vdi.c                |  48 +++++-----
>  block/vpc.c                |  20 ++---
>  block/vvfat.c              |   8 +-
>  include/block/block_int.h  |   2 +-
>  include/qemu/coroutine.h   |  18 ++++
>  util/qemu-coroutine-lock.c |  35 ++++++++
>  15 files changed, 331 insertions(+), 150 deletions(-)
> 
> -- 
> 2.13.0
> 

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Qemu-devel] [PATCH v2 00/11] Block layer thread-safety, part 2
  2017-06-29 13:27 [Qemu-devel] [PATCH v2 00/11] Block layer thread-safety, part 2 Paolo Bonzini
                   ` (13 preceding siblings ...)
  2017-07-10 12:18 ` Stefan Hajnoczi
@ 2017-07-10 13:09 ` Fam Zheng
  14 siblings, 0 replies; 21+ messages in thread
From: Fam Zheng @ 2017-07-10 13:09 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel, kwolf, qemu-block, stefanha

On Thu, 06/29 15:27, Paolo Bonzini wrote:
> This part takes care of drivers and devices, making sure that they can
> accept concurrent I/O from multiple AioContext.
> 
> The following drivers are thread-safe without using any QemuMutex/CoMutex:
> crypto, gluster, null, rbd, win32-aio.  NBD has already been fixed,
> because the patch fixed an unrelated testcase.
> 
> The following drivers already use mutexes for everything except possibly
> snapshots, which do not (yet?) need protection: bochs, cloop, dmg, qcow,
> parallels, vhdx, vmdk, curl, iscsi, nfs.
> 
> The following drivers already use mutexes for _almost_ everything: vpc
> (missing get_block_status), vdi (missing bitmap access), vvfat (missing
> commit, not protected), qcow2 (must call CoQueue APIs under CoMutex).
> They are fixed by patches 1-5.
> 
> The following drivers must be changed to use CoMutex to protect internal
> data: qed (patches 6-9), sheepdog (patch 10).
> 
> The following driver must be changed to support I/O from any AioContext:
> ssh.  It is fixed by patch 11.
> 
> Paolo
> 
> v1->v2: new patch 8 + adjustments to patch 9 to fix qemu-iotests testcase
>         183 (bdrv_invalidate_cache from block migration)

Thanks, queued:

https://github.com/famz/qemu/tree/staging

Fam

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Qemu-devel] [PATCH 08/11] qed: introduce bdrv_qed_init_state
  2017-06-29 13:27 ` [Qemu-devel] [PATCH 08/11] qed: introduce bdrv_qed_init_state Paolo Bonzini
  2017-06-29 15:26   ` Eric Blake
@ 2017-07-17  3:36   ` Fam Zheng
  1 sibling, 0 replies; 21+ messages in thread
From: Fam Zheng @ 2017-07-17  3:36 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel, kwolf, qemu-block, stefanha

On Thu, 06/29 15:27, Paolo Bonzini wrote:
> diff --git a/block/qed.c b/block/qed.c
> index db390efdbd..8228a50f68 100644
> --- a/block/qed.c
> +++ b/block/qed.c
> @@ -363,6 +363,15 @@ static void coroutine_fn bdrv_qed_co_drain(BlockDriverState *bs)
>      }
>  }
>  
> +static void bdrv_qed_init_state(BlockDriverState *bs)
> +{
> +    BDRVQEDState *s = bs->opaque;

This line should be in the next patch, to keep the compiler happy
(-Wunused-variable). I'll fix it when sending the pull request.

> +
> +    memset(s, 0, sizeof(BDRVQEDState));
> +    s->bs = bs;
> +    qemu_co_queue_init(&s->allocating_write_reqs);
> +}
> +

Fam

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [Qemu-devel] [PATCH 11/11] ssh: support I/O from any AioContext
  2017-05-31  9:43 [Qemu-devel] [RFC PATCH " Paolo Bonzini
@ 2017-05-31  9:43 ` Paolo Bonzini
  0 siblings, 0 replies; 21+ messages in thread
From: Paolo Bonzini @ 2017-05-31  9:43 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, famz, qemu-block, mreitz

The coroutine may run in a different AioContext, causing the
fd handler to busy wait.  Fix this by resetting the handler
in restart_coroutine, before the coroutine is restarted.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 block/ssh.c | 24 +++++++++++++++++-------
 1 file changed, 17 insertions(+), 7 deletions(-)

diff --git a/block/ssh.c b/block/ssh.c
index 11203fc5a2..b5a35824d7 100644
--- a/block/ssh.c
+++ b/block/ssh.c
@@ -889,13 +889,22 @@ static int ssh_has_zero_init(BlockDriverState *bs)
     return has_zero_init;
 }
 
+typedef struct BDRVSSHRestart {
+    BlockDriverState *bs;
+    Coroutine *co;
+} BDRVSSHRestart;
+
 static void restart_coroutine(void *opaque)
 {
-    Coroutine *co = opaque;
+    BDRVSSHRestart *restart = opaque;
+    BlockDriverState *bs = restart->bs;
+    BDRVSSHState *s = bs->opaque;
+    AioContext *ctx = bdrv_get_aio_context(bs);
 
-    DPRINTF("co=%p", co);
+    DPRINTF("co=%p", restart->co);
+    aio_set_fd_handler(ctx, s->sock, false, NULL, NULL, NULL, NULL);
 
-    aio_co_wake(co);
+    aio_co_wake(restart->co);
 }
 
 /* A non-blocking call returned EAGAIN, so yield, ensuring the
@@ -906,7 +915,10 @@ static coroutine_fn void co_yield(BDRVSSHState *s, BlockDriverState *bs)
 {
     int r;
     IOHandler *rd_handler = NULL, *wr_handler = NULL;
-    Coroutine *co = qemu_coroutine_self();
+    BDRVSSHRestart restart = {
+        .bs = bs,
+        .co = qemu_coroutine_self()
+    };
 
     r = libssh2_session_block_directions(s->session);
 
@@ -921,11 +933,9 @@ static coroutine_fn void co_yield(BDRVSSHState *s, BlockDriverState *bs)
             rd_handler, wr_handler);
 
     aio_set_fd_handler(bdrv_get_aio_context(bs), s->sock,
-                       false, rd_handler, wr_handler, NULL, co);
+                       false, rd_handler, wr_handler, NULL, &restart);
     qemu_coroutine_yield();
     DPRINTF("s->sock=%d - back", s->sock);
-    aio_set_fd_handler(bdrv_get_aio_context(bs), s->sock, false,
-                       NULL, NULL, NULL, NULL);
 }
 
 /* SFTP has a function `libssh2_sftp_seek64' which seeks to a position
-- 
2.13.0

^ permalink raw reply related	[flat|nested] 21+ messages in thread
