On Tue, May 19, 2020 at 06:11:32PM +0100, Stefan Hajnoczi wrote:
> A lot of CPU time is spent simply locking/unlocking q->lock during
> polling. Check for completion outside the lock to make q->lock disappear
> from the profile.
> 
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>  block/nvme.c | 12 ++++++++++++
>  1 file changed, 12 insertions(+)
> 
> diff --git a/block/nvme.c b/block/nvme.c
> index eb2f54dd9d..7eb4512666 100644
> --- a/block/nvme.c
> +++ b/block/nvme.c
> @@ -512,6 +512,18 @@ static bool nvme_poll_queues(BDRVNVMeState *s)
>  
>      for (i = 0; i < s->nr_queues; i++) {
>          NVMeQueuePair *q = s->queues[i];
> +        const size_t cqe_offset = q->cq.head * NVME_CQ_ENTRY_BYTES;
> +        NvmeCqe *cqe = (NvmeCqe *)&q->cq.queue[cqe_offset];
> +
> +        /*
> +         * q->lock isn't needed for checking completion because
> +         * nvme_process_completion() only runs in the event loop thread and
> +         * cannot race with itself.
> +         */
> +        if ((le16_to_cpu(cqe->status) & 0x1) == q->cq_phase) {
> +            continue;
> +        }
> +

IIUC, this is introducing an early check of the phase bit to determine
if there is something new in the queue.

I'm fine with this optimization, but I have the feeling that the
comment doesn't properly describe it.

Sergio.

>          qemu_mutex_lock(&q->lock);
>          while (nvme_process_completion(s, q)) {
>              /* Keep polling */
> -- 
> 2.25.3
>