From: Denis Plotnikov <dplotnikov@virtuozzo.com>
To: Stefan Hajnoczi <stefanha@redhat.com>
Cc: "kwolf@redhat.com" <kwolf@redhat.com>,
	Denis Lunev <den@virtuozzo.com>,
	"mst@redhat.com" <mst@redhat.com>,
	"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>,
	"mreitz@redhat.com" <mreitz@redhat.com>,
	"kraxel@redhat.com" <kraxel@redhat.com>
Subject: Re: [PATCH] virtio: fix IO request length in virtio SCSI/block #PSBM-78839
Date: Wed, 23 Oct 2019 09:13:12 +0000
Message-ID: <8eabd34d-d94f-cdfd-3cc8-529cee9f6145@virtuozzo.com>
In-Reply-To: <20191021132455.GH22659@stefanha-x1.localdomain>


On 21.10.2019 16:24, Stefan Hajnoczi wrote:
> On Fri, Oct 18, 2019 at 02:55:47PM +0300, Denis Plotnikov wrote:
>> From: "Denis V. Lunev" <den@openvz.org>
>>
>> Linux guests submit IO requests no longer than PAGE_SIZE * max_seg,
>> where max_seg is the field reported by the SCSI controller. Thus a typical
>> sequential read of 1 MB results in the following IO pattern from the guest:
>>    8,16   1    15754     2.766095122  2071  D   R 2095104 + 1008 [dd]
>>    8,16   1    15755     2.766108785  2071  D   R 2096112 + 1008 [dd]
>>    8,16   1    15756     2.766113486  2071  D   R 2097120 + 32 [dd]
>>    8,16   1    15757     2.767668961     0  C   R 2095104 + 1008 [0]
>>    8,16   1    15758     2.768534315     0  C   R 2096112 + 1008 [0]
>>    8,16   1    15759     2.768539782     0  C   R 2097120 + 32 [0]
>> The IO was generated by
>>    dd if=/dev/sda of=/dev/null bs=1024 iflag=direct
>>
>> This effectively means that on rotational disks we will observe 3 IOPS
>> for each 2 MBs processed. This definitely negatively affects both
>> guest and host IO performance.
>>
>> The cure is relatively simple - we should report a larger scatter-gather
>> capability for the SCSI controller. Fortunately the situation here is very
>> good. The VirtIO transport layer can accommodate 1024 items in one request
>> while we are using only 128. This has been the case almost since the
>> very beginning. 2 items are dedicated to request metadata, thus we
>> should publish VIRTQUEUE_MAX_SIZE - 2 as max_seg.
>>
>> The following pattern is observed after the patch:
>>    8,16   1     9921     2.662721340  2063  D   R 2095104 + 1024 [dd]
>>    8,16   1     9922     2.662737585  2063  D   R 2096128 + 1024 [dd]
>>    8,16   1     9923     2.665188167     0  C   R 2095104 + 1024 [0]
>>    8,16   1     9924     2.665198777     0  C   R 2096128 + 1024 [0]
>> which is much better.
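
As a side note, the request sizes in the two traces above can be checked with a little arithmetic. This is only a rough standalone sketch, not QEMU or kernel code: the helper names are made up, PAGE_SIZE = 4096 is an assumption about the guest, and one page per segment is the worst case for non-contiguous buffers. The seg_max of 126 is the "128 - 2" value discussed later in this thread.

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_SIZE 512u /* blktrace reports offsets and lengths in 512-byte sectors */

/* Hypothetical helper: largest request, in sectors, when every
 * scatter-gather segment carries exactly one page. */
static uint32_t max_request_sectors(uint32_t seg_max, uint32_t page_size)
{
    assert(page_size % SECTOR_SIZE == 0);
    return seg_max * (page_size / SECTOR_SIZE);
}

/* Hypothetical helper: number of requests needed to cover total_sectors
 * when a single request holds at most max_sectors (rounds up). */
static uint32_t requests_needed(uint32_t total_sectors, uint32_t max_sectors)
{
    assert(max_sectors > 0);
    return (total_sectors + max_sectors - 1) / max_sectors;
}
```

With seg_max = 126 and 4 KiB pages, max_request_sectors() gives 1008, which matches the "+ 1008" requests in the first trace. That trace covers 1008 + 1008 + 32 = 2048 sectors, and requests_needed(2048, 1008) is 3. With the 1024-sector requests seen after the patch, requests_needed(2048, 1024) is 2. The 1024 figure is simply read from the second trace; this sketch does not derive it from the new max_seg.
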
>>
>> The dark side of this patch is that we are tweaking a guest-visible
>> parameter, though this should be relatively safe as the above transport
>> layer support has been present in QEMU/host Linux for a very long time.
>> The patch adds a configurable property for VirtIO SCSI with a new default
>> and a hardcoded option for VirtBlock, which does not provide a good
>> configurable framework.
>>
>> Unfortunately the commit cannot be applied as is. For the real cure we
>> need the guest to be fixed to accommodate that queue length, which is done
>> only in the latest 4.14 kernel. Thus we are going to expose the property
>> and tweak it on machine type level.
>>
>> The problem with the old kernels is that they have a
>> max_segments <= virtqueue_size restriction, which causes the guest
>> to crash if it is violated.
>> To fix the case described above in the old kernels we can increase
>> virtqueue_size to 256 and max_segments to 254. The pitfall here is
>> that seabios allows the virtqueue_size-s < 128, however, the seabios
>> patch extending that value to 256 is pending.
> If I understand correctly you are relying on Indirect Descriptor support
> in the guest driver in order to exceed the Virtqueue Descriptor Table
> size.
>
> Unfortunately the "max_segments <= virtqueue_size restriction" is
> required by the VIRTIO 1.1 specification:
>
>    2.6.5.3.1 Driver Requirements: Indirect Descriptors
>
>    A driver MUST NOT create a descriptor chain longer than the Queue
>    Size of the device.
>
> So this idea seems to be in violation of the specification?
>
> There is a bug in hw/block/virtio-blk.c:virtio_blk_update_config() and
> hw/scsi/virtio-scsi.c:virtio_scsi_get_config():
>
>    virtio_stl_p(vdev, &blkcfg.seg_max, 128 - 2);
>
> This number should be the minimum of blk_get_max_iov() and
> virtio_queue_get_num(), minus 2 for the header and footer.

Stefan,

It seems VirtIOSCSI doesn't have a direct link to a blk, unlike
VirtIOBlock->blk; the link to a blk comes with each SCSI request. I
suspect the idea here is that a single virtio-scsi controller can serve
several blk-s. If my assumption is correct, then we can't call
blk_get_max_iov() at the virtio-scsi configuration stage, so we shouldn't
take max_iov into account and should limit max_segments by
virtio_queue_get_num() - 2 only.

Is that so, or are there any other details to take into account?
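
To make the question concrete, here is a rough standalone sketch of the rule I have in mind. It is not QEMU code: seg_max_for() and its parameters are made up for illustration, queue_num stands for what virtio_queue_get_num() would return, and max_iov == 0 is my own convention for "no blk is known at configuration time", which is the virtio-scsi case described above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not a QEMU function.
 * queue_num: virtqueue size (what virtio_queue_get_num() would report).
 * max_iov:   backend iov limit (what blk_get_max_iov() would report),
 *            or 0 when no backend is known yet. */
static uint32_t seg_max_for(uint32_t queue_num, uint32_t max_iov)
{
    uint32_t limit = queue_num;

    assert(queue_num > 2); /* need room for the two metadata descriptors */

    /* Apply the backend limit only when one is known. */
    if (max_iov != 0 && max_iov < limit) {
        limit = max_iov;
    }

    /* Two descriptors are reserved for the request header and footer. */
    return limit > 2 ? limit - 2 : 0;
}
```

With this, seg_max_for(128, 0) gives 126 (the current hardcoded value), seg_max_for(256, 1024) gives 254, and a backend with a smaller limit wins, e.g. seg_max_for(1024, 64) gives 62.
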

Thanks!

Denis

>
> I looked at the Linux SCSI driver code and it seems each HBA has a
> single max_segments number - it does not vary on a per-device basis.
> This could be a problem if two host block devices with different
> max_segments are exposed to the guest through the same virtio-scsi
> controller.  Another bug? :(
>
> Anyway, if you want ~1024 descriptors you should set Queue Size to 1024.
> I don't see a spec-compliant way of doing it otherwise.  Hopefully I
> have overlooked something and there is a nice way to solve this.
>
> Stefan

