From: Stefan Hajnoczi <stefanha@redhat.com>
To: Suwan Kim <suwan.kim027@gmail.com>
Cc: mst@redhat.com, jasowang@redhat.com, pbonzini@redhat.com, virtualization@lists.linux-foundation.org, linux-block@vger.kernel.org
Subject: Re: [PATCH] virtio-blk: support polling I/O
Date: Mon, 14 Mar 2022 15:19:01 +0000	[thread overview]
Message-ID: <Yi9c5bhdDrQ1pLDY@stefanha-x1.localdomain> (raw)
In-Reply-To: <20220311152832.17703-1-suwan.kim027@gmail.com>

[-- Attachment #1: Type: text/plain, Size: 4166 bytes --]

On Sat, Mar 12, 2022 at 12:28:32AM +0900, Suwan Kim wrote:
> This patch adds polling I/O support to the virtio-blk driver. Polling
> is enabled based on the "VIRTIO_BLK_F_MQ" feature, and the number
> of polling queues can be set with the QEMU virtio-blk-pci property
> "num-poll-queues=N". This patch improves polling I/O throughput
> and latency.
>
> The virtio-blk driver does not have a poll function or poll
> queues; it has been operating in interrupt-driven mode even when
> the polling function is called from the upper layer.
>
> virtio-blk polling is implemented on top of the block layer's
> 'batched completion'. virtblk_poll() queues completed requests on
> io_comp_batch->req_list and, later, virtblk_complete_batch() calls
> the unmap function and ends the requests in a batch.
>
> virtio-blk reads the number of queues and poll queues from the QEMU
> virtio-blk-pci properties ("num-queues=N", "num-poll-queues=M").
> It allocates N virtqueues to virtio_blk->vqs[N] and uses [0..(N-M-1)]
> as default queues and [(N-M)..(N-1)] as poll queues. Unlike the default
> queues, the poll queues have no callback function.
>
> Regarding HW-SW queue mapping, the default queue mapping uses the
> existing method, which considers the MSI irq vector. But a poll queue
> has no irq, so it uses the regular blk-mq cpu mapping.
>
> To enable poll queues, the "num-poll-queues=N" property of virtio-blk-pci
> needs to be added to the QEMU command line. For that, I temporarily
> implemented the property in QEMU.
> Please refer to the git repository below.
>
> git : https://github.com/asfaca/qemu.git #on master branch commit
>
> To verify the improvement, I ran a fio polling I/O performance test
> with the io_uring engine and the options below.
> (io_uring, hipri, randread, direct=1, bs=512, iodepth=64, numjobs=N)
> I set up the VM with 4 vcpus and 4 virtio-blk queues - 2 default
> queues and 2 poll queues.
> (-device virtio-blk-pci,num-queues=4,num-poll-queues=2)
> As a result, IOPS and average latency improved by about 10%.
>
> Test result:
>
> - Fio io_uring poll without virtio-blk poll support
> -- numjobs=1 : IOPS = 297K, avg latency = 214.59us
> -- numjobs=2 : IOPS = 360K, avg latency = 363.88us
> -- numjobs=4 : IOPS = 289K, avg latency = 885.42us
>
> - Fio io_uring poll with virtio-blk poll support
> -- numjobs=1 : IOPS = 332K, avg latency = 192.61us
> -- numjobs=2 : IOPS = 371K, avg latency = 348.31us
> -- numjobs=4 : IOPS = 321K, avg latency = 795.93us

Last year there was a patch series that switched regular queues into
polling queues when HIPRI requests were in flight:
https://lore.kernel.org/linux-block/20210520141305.355961-1-stefanha@redhat.com/T/

The advantage is that polling is possible without prior device
configuration, making it easier for users. However, the dynamic
approach is a bit more complex, and bugs can result in lost irqs
(hung I/O). Christoph Hellwig asked for dedicated polling queues,
which your patch series now delivers.

I think your patch series is worth merging once the comments others
have already made have been addressed. I'll keep an eye out for the
VIRTIO spec change to extend the virtio-blk configuration space, which
needs to be accepted before the Linux changes can be merged.
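For reference, the fio options listed above correspond to a job file
roughly like this (the job name and target filename are assumptions;
the device path depends on how the disk shows up inside the guest):

```ini
; Sketch of the fio job behind the numbers above.
; hipri=1 submits polled (HIPRI) I/O, which is what exercises
; the virtio-blk poll queues.
[vblk-poll-randread]
ioengine=io_uring
hipri=1
rw=randread
direct=1
bs=512
iodepth=64
numjobs=4          ; varied 1/2/4 in the results above
filename=/dev/vdb  ; hypothetical: the virtio-blk disk in the guest
```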
> @@ -728,16 +749,82 @@ static const struct attribute_group *virtblk_attr_groups[] = {
>  static int virtblk_map_queues(struct blk_mq_tag_set *set)
>  {
>  	struct virtio_blk *vblk = set->driver_data;
> +	int i, qoff;
> +
> +	for (i = 0, qoff = 0; i < set->nr_maps; i++) {
> +		struct blk_mq_queue_map *map = &set->map[i];
> +
> +		map->nr_queues = vblk->io_queues[i];
> +		map->queue_offset = qoff;
> +		qoff += map->nr_queues;
> +
> +		if (map->nr_queues == 0)
> +			continue;
> +
> +		if (i == HCTX_TYPE_DEFAULT)
> +			blk_mq_virtio_map_queues(&set->map[i], vblk->vdev, 0);
> +		else
> +			blk_mq_map_queues(&set->map[i]);

A comment would be nice here to explain that regular queues have
interrupts and hence CPU affinity is defined by the core virtio code,
but polling queues have no interrupts so we let the block layer assign
CPU affinity.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]