On Tue, May 19, 2020 at 06:11:32PM +0100, Stefan Hajnoczi wrote:
> A lot of CPU time is spent simply locking/unlocking q->lock during
> polling. Check for completion outside the lock to make q->lock disappear
> from the profile.
>
> Signed-off-by: Stefan Hajnoczi
> ---
>  block/nvme.c | 12 ++++++++++++
>  1 file changed, 12 insertions(+)
>
> diff --git a/block/nvme.c b/block/nvme.c
> index eb2f54dd9d..7eb4512666 100644
> --- a/block/nvme.c
> +++ b/block/nvme.c
> @@ -512,6 +512,18 @@ static bool nvme_poll_queues(BDRVNVMeState *s)
>
>      for (i = 0; i < s->nr_queues; i++) {
>          NVMeQueuePair *q = s->queues[i];
> +        const size_t cqe_offset = q->cq.head * NVME_CQ_ENTRY_BYTES;
> +        NvmeCqe *cqe = (NvmeCqe *)&q->cq.queue[cqe_offset];
> +
> +        /*
> +         * q->lock isn't needed for checking completion because
> +         * nvme_process_completion() only runs in the event loop thread and
> +         * cannot race with itself.
> +         */
> +        if ((le16_to_cpu(cqe->status) & 0x1) == q->cq_phase) {
> +            continue;
> +        }
> +

IIUC, this is introducing an early check of the phase bit to determine
if there is something new in the queue.

I'm fine with this optimization, but I have the feeling that the comment
doesn't properly describe it.

Sergio.

>          qemu_mutex_lock(&q->lock);
>          while (nvme_process_completion(s, q)) {
>              /* Keep polling */
> --
> 2.25.3
>
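For reference, here is a minimal, single-threaded sketch of the phase-tag convention that the early check relies on. Everything in it (CompletionQueue, Controller, cq_init(), cq_has_new_entry(), cq_pop(), ctrl_complete(), the 4-entry queue) is hypothetical and simplified. It is not QEMU code, and it ignores DMA, endianness (le16_to_cpu()) and memory ordering. As in the patch, an entry whose status bit 0 equals the tracked phase value is treated as "nothing new". The tracked value flips each time the head wraps, because under the NVMe phase-tag scheme the controller inverts the tag on every pass over the queue.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for the NVMe queue structures. */
#define QUEUE_SIZE 4

typedef struct {
    uint16_t cid;    /* command identifier */
    uint16_t status; /* bit 0 is the phase tag, the rest is status code */
} Cqe;

typedef struct {
    Cqe entries[QUEUE_SIZE];
    unsigned head;        /* next entry the host will inspect */
    unsigned stale_phase; /* phase value meaning "not (re)written yet" */
} CompletionQueue;

/* Simulated controller state: where it writes next and with which tag. */
typedef struct {
    unsigned tail;
    unsigned phase;
} Controller;

/* Zeroed queue memory: every entry starts with phase tag 0 (stale). */
static void cq_init(CompletionQueue *q)
{
    memset(q, 0, sizeof(*q));
}

/* The early check: a single read and compare, no state is modified. */
static bool cq_has_new_entry(const CompletionQueue *q)
{
    return (q->entries[q->head].status & 0x1) != q->stale_phase;
}

/* Consume the entry at head; flip the stale phase when head wraps. */
static uint16_t cq_pop(CompletionQueue *q)
{
    assert(cq_has_new_entry(q));
    uint16_t cid = q->entries[q->head].cid;
    q->head = (q->head + 1) % QUEUE_SIZE;
    if (q->head == 0) {
        q->stale_phase = !q->stale_phase;
    }
    return cid;
}

/* Controller posts a completion with its current phase tag, then advances. */
static void ctrl_complete(Controller *c, CompletionQueue *q, uint16_t cid)
{
    q->entries[c->tail].cid = cid;
    q->entries[c->tail].status = (uint16_t)c->phase; /* success, phase tag */
    c->tail = (c->tail + 1) % QUEUE_SIZE;
    if (c->tail == 0) {
        c->phase = !c->phase;
    }
}
```

A poll loop built on this would call cq_has_new_entry() first and only take a lock and call cq_pop() when it returns true, which is the shape the patch gives nvme_poll_queues(). In the sketch the controller starts with `Controller c = { .tail = 0, .phase = 1 };` so that its first pass is distinguishable from the zeroed memory.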