On Fri, Oct 18, 2019 at 02:55:47PM +0300, Denis Plotnikov wrote:
> From: "Denis V. Lunev"
>
> Linux guests submit IO requests no longer than PAGE_SIZE * max_seg
> field reported by the SCSI controller. Thus a typical sequential read
> with 1 MB size results in the following pattern of IO from the guest:
>
> 8,16 1 15754 2.766095122 2071 D R 2095104 + 1008 [dd]
> 8,16 1 15755 2.766108785 2071 D R 2096112 + 1008 [dd]
> 8,16 1 15756 2.766113486 2071 D R 2097120 + 32 [dd]
> 8,16 1 15757 2.767668961 0 C R 2095104 + 1008 [0]
> 8,16 1 15758 2.768534315 0 C R 2096112 + 1008 [0]
> 8,16 1 15759 2.768539782 0 C R 2097120 + 32 [0]
>
> The IO was generated by
>
>   dd if=/dev/sda of=/dev/null bs=1024 iflag=direct
>
> This effectively means that on rotational disks we will observe 3 IOPS
> for each 2 MBs processed. This definitely negatively affects both
> guest and host IO performance.
>
> The cure is relatively simple - we should report the lengthy
> scatter-gather ability of the SCSI controller. Fortunately the
> situation here is very good. The VirtIO transport layer can
> accommodate 1024 items in one request while we are using only 128.
> This situation has been present since almost the very beginning. 2
> items are dedicated to request metadata, thus we should publish
> VIRTQUEUE_MAX_SIZE - 2 as max_seg.
>
> The following pattern is observed after the patch:
>
> 8,16 1 9921 2.662721340 2063 D R 2095104 + 1024 [dd]
> 8,16 1 9922 2.662737585 2063 D R 2096128 + 1024 [dd]
> 8,16 1 9923 2.665188167 0 C R 2095104 + 1024 [0]
> 8,16 1 9924 2.665198777 0 C R 2096128 + 1024 [0]
>
> which is much better.
>
> The dark side of this patch is that we are tweaking a guest-visible
> parameter, though this should be relatively safe as the transport
> layer support mentioned above has been present in QEMU/host Linux for
> a very long time. The patch adds a configurable property for VirtIO
> SCSI with a new default, and a hardcoded value for VirtIO Block, which
> does not provide a good configuration framework.
>
> Unfortunately the commit can not be applied as is. For the real cure
> we need the guest to be fixed to accommodate that queue length, which
> is done only in the latest 4.14 kernel. Thus we are going to expose
> the property and tweak it at the machine type level.
>
> The problem with the old kernels is that they have the
> max_segments <= virtqueue_size restriction, which causes the guest to
> crash if it is violated.
> To fix the case described above on the old kernels we can increase
> virtqueue_size to 256 and max_segments to 254. The pitfall here is
> that SeaBIOS allows only virtqueue_size values < 128; however, the
> SeaBIOS patch extending that value to 256 is pending.

If I understand correctly you are relying on Indirect Descriptor
support in the guest driver in order to exceed the Virtqueue Descriptor
Table size.

Unfortunately the "max_segments <= virtqueue_size restriction" is
required by the VIRTIO 1.1 specification:

  2.6.5.3.1 Driver Requirements: Indirect Descriptors

  A driver MUST NOT create a descriptor chain longer than the Queue
  Size of the device.

So this idea seems to be in violation of the specification?

There is a bug in hw/block/virtio-blk.c:virtio_blk_update_config() and
hw/scsi/virtio-scsi.c:virtio_scsi_get_config():

  virtio_stl_p(vdev, &blkcfg.seg_max, 128 - 2);

This number should be the minimum of blk_get_max_iov() and
virtio_queue_get_num(), minus 2 for the header and footer.
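In other words, something along these lines (just a sketch, untested;
it assumes the VirtIOBlock "s" already available in
virtio_blk_update_config() points at the right BlockBackend and that
queue 0 is representative of all queues):

  virtio_stl_p(vdev, &blkcfg.seg_max,
               MIN(blk_get_max_iov(s->blk),
                   virtio_queue_get_num(vdev, 0)) - 2);

with something similar in virtio_scsi_get_config().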
I looked at the Linux SCSI driver code and it seems each HBA has a
single max_segments number - it does not vary on a per-device basis.
This could be a problem if two host block devices with different
max_segments are exposed to the guest through the same virtio-scsi
controller. Another bug? :(

Anyway, if you want ~1024 descriptors you should set the Queue Size to
1024. I don't see a spec-compliant way of doing it otherwise.

Hopefully I have overlooked something and there is a nice way to solve
this.

Stefan
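P.S. If you want to experiment with a bigger Queue Size, I think the
existing device properties already allow it (I haven't double-checked
the property names, so treat these as approximate), e.g.:

  -device virtio-scsi-pci,virtqueue_size=1024
  -device virtio-blk-pci,queue-size=1024

subject, of course, to the SeaBIOS limitation you mentioned.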