From: Christoph Hellwig <hch@infradead.org>
To: Max Gurtovoy <mgurtovoy@nvidia.com>
Cc: hch@infradead.org, mst@redhat.com,
	virtualization@lists.linux-foundation.org, kvm@vger.kernel.org,
	stefanha@redhat.com, israelr@nvidia.com, nitzanc@nvidia.com,
	oren@nvidia.com, linux-block@vger.kernel.org, axboe@kernel.dk
Subject: Re: [PATCH 1/1] virtio-blk: avoid preallocating big SGL for data
Date: Wed, 1 Sep 2021 07:18:55 +0100	[thread overview]
Message-ID: <YS8bT23rdMfT/+AF@infradead.org> (raw)
In-Reply-To: <20210830233500.51395-1-mgurtovoy@nvidia.com>

On Tue, Aug 31, 2021 at 02:35:00AM +0300, Max Gurtovoy wrote:
> No need to pre-allocate a big buffer for the IO SGL anymore. If a device
> has lots of deep queues, preallocation for the sg list can consume
> substantial amounts of memory. For a HW virtio-blk device, nr_hw_queues
> can be 64 or 128 and each queue's depth might be 128. This means the
> resulting preallocation for the data SGLs is big.
> 
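
To put a rough number on "big" (illustrative only, assuming a 64-bit
kernel where sizeof(struct scatterlist) is 32 bytes and a device
advertising 256 segments per request):

	128 queues * 128 requests/queue * 256 entries * 32 bytes
		= 128 MiB of scatterlists preallocated per device
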
> Switch to runtime allocation of the SGL for lists longer than 2 entries.
> This is the approach used by NVMe drivers, so it should be reasonable for
> virtio block as well. Runtime SGL allocation has always been the case
> for the legacy I/O path, so this is nothing new.
> 
> The preallocated small SGL depends on SG_CHAIN, so if the ARCH doesn't
> support SG_CHAIN, use only runtime allocation for the SGL.
> 
> Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
> Reviewed-by: Israel Rukshin <israelr@nvidia.com>
> ---
>  drivers/block/virtio_blk.c | 37 ++++++++++++++++++++++---------------
>  1 file changed, 22 insertions(+), 15 deletions(-)
> 
> diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
> index 77e8468e8593..9a4c5d428b58 100644
> --- a/drivers/block/virtio_blk.c
> +++ b/drivers/block/virtio_blk.c
> @@ -24,6 +24,12 @@
>  /* The maximum number of sg elements that fit into a virtqueue */
>  #define VIRTIO_BLK_MAX_SG_ELEMS 32768
>  
> +#ifdef CONFIG_ARCH_NO_SG_CHAIN
> +#define VIRTIO_BLK_INLINE_SG_CNT	0
> +#else
> +#define VIRTIO_BLK_INLINE_SG_CNT	2
> +#endif
> +
>  static int virtblk_queue_count_set(const char *val,
>  		const struct kernel_param *kp)
>  {
> @@ -99,7 +105,7 @@ struct virtio_blk {
>  struct virtblk_req {
>  	struct virtio_blk_outhdr out_hdr;
>  	u8 status;
> -	struct scatterlist sg[];
> +	struct sg_table sg_table;

Please keep the sg flexible array member here instead of the pointer
arithmetic that is added below.
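
Something like this, keeping the table and the inline chunk together
(untested sketch, just to illustrate the layout):

	struct virtblk_req {
		struct virtio_blk_outhdr out_hdr;
		u8 status;
		struct sg_table sg_table;
		struct scatterlist sg[];	/* inline chunk for short SGLs */
	};

and then pass vbr->sg rather than vbr->sg_table.sgl as the first_chunk
argument to sg_alloc_table_chained().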

> +	err = sg_alloc_table_chained(&vbr->sg_table,
> +				     blk_rq_nr_phys_segments(req),
> +				     vbr->sg_table.sgl,
> +				     VIRTIO_BLK_INLINE_SG_CNT);
> +	if (err)
> +		return BLK_STS_RESOURCE;
> +

This will BUG() for requests without segments (flush and discard).  You
probably want a separate helper that actually maps the data, extending
the big switch on the op.  While we're at it, blk_mq_start_request
should also move as close as possible to the actual sending of the
request to the host.
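
Roughly this (untested, and the helper name is just a placeholder):

	static int virtblk_map_data(struct blk_mq_hw_ctx *hctx,
			struct request *req, struct virtblk_req *vbr)
	{
		/* flush and discard carry no data to map */
		if (!blk_rq_nr_phys_segments(req))
			return 0;

		vbr->sg_table.sgl = vbr->sg;
		if (sg_alloc_table_chained(&vbr->sg_table,
					blk_rq_nr_phys_segments(req),
					vbr->sg_table.sgl,
					VIRTIO_BLK_INLINE_SG_CNT))
			return -ENOMEM;

		/* returns the number of mapped sg entries */
		return blk_rq_map_sg(hctx->queue, req, vbr->sg_table.sgl);
	}

with the caller translating a negative return into BLK_STS_RESOURCE, so
the no-segment case never reaches sg_alloc_table_chained().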

You'll also need to select SG_POOL now that you're using these
functions.
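
That is, something like this in drivers/block/Kconfig (sketch of the
existing entry with the select added):

	config VIRTIO_BLK
		tristate "Virtio block driver"
		depends on VIRTIO
		select SG_POOL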
