On Mon, Aug 16, 2021 at 09:42:46AM -0700, Elena Ufimtseva wrote:
> +void vfio_user_drain_reqs(VFIOProxy *proxy)
> +{
> +    VFIOUserReply *reply;
> +    bool iolock = 0;
> +
> +    /*
> +     * Any DMA map/unmap requests sent in the middle
> +     * of a memory region transaction were sent async.
> +     * Wait for them here.
> +     */
> +    QEMU_LOCK_GUARD(&proxy->lock);
> +    if (proxy->last_nowait != NULL) {
> +        iolock = qemu_mutex_iothread_locked();
> +        if (iolock) {
> +            qemu_mutex_unlock_iothread();
> +        }
> +
> +        reply = proxy->last_nowait;
> +        reply->nowait = 0;
> +        while (reply->complete == 0) {
> +            if (!qemu_cond_timedwait(&reply->cv, &proxy->lock, wait_time)) {
> +                error_printf("vfio_drain_reqs - timed out\n");
> +                break;
> +            }
> +        }
> +
> +        if (reply->msg->flags & VFIO_USER_ERROR) {
> +            error_printf("vfio_user_rcv error reply on async request ");
> +            error_printf("command %x error %s\n", reply->msg->command,
> +                         strerror(reply->msg->error_reply));
> +        }
> +        proxy->last_nowait = NULL;
> +        g_free(reply->msg);
> +        QTAILQ_INSERT_HEAD(&proxy->free, reply, next);
> +    }
> +
> +    if (iolock) {
> +        qemu_mutex_lock_iothread();
> +    }

Not sure this lock ordering is correct. Above we acquire proxy->lock
while holding the BQL and here we acquire the BQL while holding
proxy->lock. If another thread (e.g. a vCPU thread) does something
similar this is the ABBA lock ordering problem.

The more obviously correct way to write this is:

  WITH_QEMU_LOCK_GUARD(&proxy->lock) {
      ...
  }

  if (iolock) {
      qemu_mutex_lock_iothread();
  }

> +}
> +
>  static void vfio_user_request_msg(VFIOUserHdr *hdr, uint16_t cmd,
>                                    uint32_t size, uint32_t flags)
>  {
> @@ -715,6 +756,89 @@ int vfio_user_validate_version(VFIODevice *vbasedev, Error **errp)
>      return 0;
>  }
>  
> +int vfio_user_dma_map(VFIOProxy *proxy, struct vfio_iommu_type1_dma_map *map,
> +                      VFIOUserFDs *fds, bool will_commit)
> +{
> +    VFIOUserDMAMap *msgp = g_malloc(sizeof(*msgp));

Is this zero-initialized anywhere to guarantee that QEMU memory isn't
exposed to the VFIO device emulation program?

> +    int ret, flags;
> +
> +    /* commit will wait, so send async without dropping BQL */
> +    flags = will_commit ? (NOIOLOCK | NOWAIT) : 0;

Why is this distinction between will_commit and !will_commit necessary?
I get a sense that the network communications code drops the BQL and
that's undesirable here for some reason. I wonder why the code doesn't
take the NOWAIT code path all the time?