From: Denis Lunev <den@virtuozzo.com>
To: Stefan Hajnoczi <stefanha@redhat.com>
Cc: "kwolf@redhat.com" <kwolf@redhat.com>,
"mst@redhat.com" <mst@redhat.com>,
"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>,
"mreitz@redhat.com" <mreitz@redhat.com>,
Denis Plotnikov <dplotnikov@virtuozzo.com>,
"kraxel@redhat.com" <kraxel@redhat.com>
Subject: Re: [PATCH] virtio: fix IO request length in virtio SCSI/block #PSBM-78839
Date: Wed, 23 Oct 2019 14:37:56 +0000
Message-ID: <6e1e0b88-25b3-3db1-5fdc-255190720646@virtuozzo.com>
In-Reply-To: <20191023141701.GD9574@stefanha-x1.localdomain>
On 10/23/19 5:17 PM, Stefan Hajnoczi wrote:
> On Tue, Oct 22, 2019 at 04:01:57AM +0000, Denis Lunev wrote:
>> On 10/21/19 4:24 PM, Stefan Hajnoczi wrote:
>>> On Fri, Oct 18, 2019 at 02:55:47PM +0300, Denis Plotnikov wrote:
>>>> From: "Denis V. Lunev" <den@openvz.org>
>>>>
>>>> Linux guests submit IO requests no larger than PAGE_SIZE * max_seg, the
>>>> field reported by the SCSI controller. Thus a typical sequential read with
>>>> 1 MB block size results in the following pattern of IO from the guest:
>>>> 8,16 1 15754 2.766095122 2071 D R 2095104 + 1008 [dd]
>>>> 8,16 1 15755 2.766108785 2071 D R 2096112 + 1008 [dd]
>>>> 8,16 1 15756 2.766113486 2071 D R 2097120 + 32 [dd]
>>>> 8,16 1 15757 2.767668961 0 C R 2095104 + 1008 [0]
>>>> 8,16 1 15758 2.768534315 0 C R 2096112 + 1008 [0]
>>>> 8,16 1 15759 2.768539782 0 C R 2097120 + 32 [0]
>>>> The IO was generated by
>>>> dd if=/dev/sda of=/dev/null bs=1024 iflag=direct
>>>>
>>>> This effectively means that on rotational disks we will observe 3 IO
>>>> requests for each 2 MB processed. This definitely negatively affects both
>>>> guest and host IO performance.
>>>>
>>>> The cure is relatively simple - we should report the longer scatter-gather
>>>> capability of the SCSI controller. Fortunately the situation here is very
>>>> good. The VirtIO transport layer can accommodate 1024 items in one request
>>>> while we are using only 128, and this has been the case since almost the
>>>> very beginning. 2 items are dedicated to request metadata, thus we
>>>> should publish VIRTQUEUE_MAX_SIZE - 2 as max_seg.
>>>>
>>>> The following pattern is observed after the patch:
>>>> 8,16 1 9921 2.662721340 2063 D R 2095104 + 1024 [dd]
>>>> 8,16 1 9922 2.662737585 2063 D R 2096128 + 1024 [dd]
>>>> 8,16 1 9923 2.665188167 0 C R 2095104 + 1024 [0]
>>>> 8,16 1 9924 2.665198777 0 C R 2096128 + 1024 [0]
>>>> which is much better.
>>>>
>>>> The dark side of this patch is that we are tweaking a guest-visible
>>>> parameter, though this should be relatively safe as the above transport
>>>> layer support has been present in QEMU/host Linux for a very long time.
>>>> The patch adds a configurable property for VirtIO SCSI with a new default,
>>>> and a hardcoded value for VirtIO block, which does not provide a good
>>>> framework for configuration.
>>>>
>>>> Unfortunately the commit cannot be applied as is. For the real cure we
>>>> need the guest to be fixed to accommodate that queue length, which is done
>>>> only in the latest 4.14 kernel. Thus we are going to expose the property
>>>> and tweak it at the machine type level.
>>>>
>>>> The problem with the old kernels is that they have the
>>>> max_segments <= virtqueue_size restriction, which causes the guest
>>>> to crash if it is violated.
>>>> To fix the case described above on the old kernels we can increase
>>>> virtqueue_size to 256 and max_segments to 254. The pitfall here is
>>>> that seabios does not allow virtqueue_size-s larger than 128; however,
>>>> the seabios patch extending that limit to 256 is pending.
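For reference, with such a property the tweak for the old-kernel case would
look roughly like this on the QEMU command line. virtqueue_size is an existing
virtio-scsi property, while the max_segments name is only what this patch
would introduce, so treat it as tentative:

    -device virtio-scsi-pci,id=scsi0,virtqueue_size=256,max_segments=254
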
>>> If I understand correctly you are relying on Indirect Descriptor support
>>> in the guest driver in order to exceed the Virtqueue Descriptor Table
>>> size.
>>>
>>> Unfortunately the "max_segments <= virtqueue_size restriction" is
>>> required by the VIRTIO 1.1 specification:
>>>
>>> 2.6.5.3.1 Driver Requirements: Indirect Descriptors
>>>
>>> A driver MUST NOT create a descriptor chain longer than the Queue
>>> Size of the device.
>>>
>>> So this idea seems to be in violation of the specification?
>>>
>>> There is a bug in hw/block/virtio-blk.c:virtio_blk_update_config() and
>>> hw/scsi/virtio-scsi.c:virtio_scsi_get_config():
>>>
>>> virtio_stl_p(vdev, &blkcfg.seg_max, 128 - 2);
>>>
>>> This number should be the minimum of blk_get_max_iov() and
>>> virtio_queue_get_num(), minus 2 for the header and footer.
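For the virtio-blk case that calculation would be roughly the following (a
sketch only, reusing the existing blkcfg and s->blk names from
virtio_blk_update_config(); untested):

    /* Sketch: cap seg_max by both the host iovec limit and the virtqueue
     * size, reserving 2 descriptors for the request header and footer. */
    int seg_max = MIN(blk_get_max_iov(s->blk),
                      virtio_queue_get_num(vdev, 0)) - 2;
    virtio_stl_p(vdev, &blkcfg.seg_max, seg_max);
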
>>>
>>> I looked at the Linux SCSI driver code and it seems each HBA has a
>>> single max_segments number - it does not vary on a per-device basis.
>>> This could be a problem if two host block devices with different
>>> max_segments are exposed to the guest through the same virtio-scsi
>>> controller. Another bug? :(
>>>
>>> Anyway, if you want ~1024 descriptors you should set Queue Size to 1024.
>>> I don't see a spec-compliant way of doing it otherwise. Hopefully I
>>> have overlooked something and there is a nice way to solve this.
>>>
>>> Stefan
>> You are perfectly correct. We actually need 3 changes to improve
>> guest behavior:
>> 1) This patch, which adds the property but does not change anything
>> useful
> This patch is problematic because it causes existing guest drivers to
> violate the VIRTIO specification (or fail) if the value is set too high.
> In addition, it does not take into account the virtqueue size, so the
> default value is too low when the user sets -device ...,queue-size=1024.
>
> Let's calculate blkcfg.seg_max based on the virtqueue size as mentioned
> in my previous email instead.
As far as I understand, the maximum number of segments could be larger than
the virtqueue size for indirect requests (allowed in VirtIO 1.0).
> There is one caveat with my suggestion: drivers are allowed to access
> VIRTIO Configuration Space before virtqueue setup has determined the
> final size. Therefore the value of this field can change after
> virtqueue setup. Drivers that set a custom virtqueue size would need to
> read the value after virtqueue setup. (Linux drivers do not modify the
> virtqueue size so it won't affect them.)
>
> Stefan
I think that we should do this a little bit differently :) We cannot change
max_segs just because the queue size changes; this should somehow be bound
to the machine type.
Thus I propose to add an "automatic" value, i.e. if max_segs is set to 0 the
code should set it to queue size - 2. This should be the default. Otherwise
the value given for max_segs should be taken. Will this work for you?
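For illustration only, the "automatic" handling could look roughly like this
in hw/scsi/virtio-scsi.c:virtio_scsi_get_config() (a sketch, not a tested
change; the max_segs/max_segments property name and the scsiconf/s->conf
variable names are assumptions):

    /* Sketch: 0 means "automatic" - derive seg_max from the virtqueue size,
     * reserving 2 descriptors for request metadata. Otherwise honor the
     * value the user configured. */
    uint32_t seg_max = s->conf.max_segments
        ? s->conf.max_segments
        : virtio_queue_get_num(vdev, 0) - 2;
    virtio_stl_p(vdev, &scsiconf->seg_max, seg_max);
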
Please note that currently the specification could also be violated if we
reduce the queue size to 64 :)
Den
Thread overview: 18+ messages
2019-10-18 11:55 [PATCH] virtio: fix IO request length in virtio SCSI/block #PSBM-78839 Denis Plotnikov
2019-10-21 13:24 ` Stefan Hajnoczi
2019-10-22 4:01 ` Denis Lunev
2019-10-23 14:17 ` Stefan Hajnoczi
2019-10-23 14:37 ` Denis Lunev [this message]
2019-10-23 9:13 ` Denis Plotnikov
2019-10-23 21:50 ` Michael S. Tsirkin
2019-10-23 21:28 ` Michael S. Tsirkin
2019-10-24 11:34 ` Denis Lunev
2019-11-06 12:03 ` Michael S. Tsirkin
2019-11-13 12:38 ` Denis Plotnikov
2019-11-13 13:18 ` Michael S. Tsirkin
2019-11-14 15:33 ` Denis Plotnikov
2019-11-25 9:16 ` Denis Plotnikov
2019-12-05 7:59 ` Denis Plotnikov
2019-12-13 12:24 ` [PING] " Denis Plotnikov
2019-12-13 12:40 ` Michael S. Tsirkin
2019-11-12 10:03 ` Stefan Hajnoczi