Re: [Qemu-devel] [PATCH] block: don't probe zeroes in bs->file by default on block_status - Vladimir Sementsov-Ogievskiy

From: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
To: Eric Blake <eblake@redhat.com>,
	"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>,
	"qemu-block@nongnu.org" <qemu-block@nongnu.org>
Cc: "armbru@redhat.com" <armbru@redhat.com>,
	"fam@euphon.net" <fam@euphon.net>,
	"stefanha@redhat.com" <stefanha@redhat.com>,
	"mreitz@redhat.com" <mreitz@redhat.com>,
	"kwolf@redhat.com" <kwolf@redhat.com>,
	"pbonzini@redhat.com" <pbonzini@redhat.com>,
	Denis Lunev <den@virtuozzo.com>
Subject: Re: [Qemu-devel] [PATCH] block: don't probe zeroes in bs->file by default on block_status
Date: Fri, 11 Jan 2019 10:13:17 +0000	[thread overview]
Message-ID: <b9bcc040-3831-d250-4b7f-ffe27984e243@virtuozzo.com> (raw)
In-Reply-To: <2ad46997-01aa-7a7a-ed53-4463a60ca564@virtuozzo.com>

11.01.2019 10:54, Vladimir Sementsov-Ogievskiy wrote:
> 10.01.2019 23:51, Eric Blake wrote:
>> On 1/10/19 7:20 AM, Vladimir Sementsov-Ogievskiy wrote:
>>> drv_co_block_status digs bs->file for additional, more accurate search
>>> for hole inside region, reported as DATA by bs since 5daa74a6ebc.
>>
>> s/region, reported/regions reported/
>>
>>>
>>> This accuracy is not free: assume we have qcow2 disk. Actually, qcow2
>>> knows, where are holes and where is data. But every block_status
>>
>> Not quite true. qcow2 knows where some holes are, but does not know if
>> there are even more holes hiding in the sections marked data (such as
>> because of how the disk was pre-allocated).
>>
>>> request calls lseek additionally. Assume a big disk, full of
>>> data, in any iterative copying block job (or img convert) we'll call
>>> lseek(HOLE) on every iteration, and each of these lseeks will have to
>>> iterate through all metadata up to the end of file. It's obviously
>>> ineffective behavior. And for many scenarios we don't need this lseek
>>> at all.
>>
>> How much performance can we buy back without any knobs at all, if we
>> just taught posix-file.c to cache lseek() results?  That is, when
>> visiting a file sequentially, if lseek(fd, 0, SEEK_HOLE) returns EOF on
>> our first block status query, then all subsequent block status queries
>> fall within what we know to be data, and we can skip the lseek() calls.
> 
> EOF is bad mark I think. We may have a small hole not far from EOF, which
> will lead to the same performance, but not EOF returned.
> 
> I think, proper implementation of lseek cache should work, but it is more
> difficult than just disable lseek. And in scenarios without preallocation
> we don't need lseek, so again better is disable it.
> 
> And I don't sure that qemu is a proper place for lseek optimizations.
> 
> And another way may be to select cases when fiemap is safe and use it.
> 
> Again, for scenarios when we don't need nor lseek, nor fiemap, neither
> cache for them the best option is not use them.
> 
>>
>>>
>>> So, let's "5daa74a6ebc" by default, leaving an option to return
>>
>> s/let's/let's undo/
>>
>>> previous behavior, which is needed for scenarios with preallocated
>>> images.
>>>
>>> Add iotest illustrating new option semantics.
>>
>> And none of the existing iotests fail due to changed output?  Does that
>> mean our testsuite does not have good coverage of pre-allocation modes
>> where the zero probe mattered?
> 
> To be honest, I didn't run all the tests, only several.
> 
> Do it now, and found that, yes, at least 102 broken. Will fix it with the following
> version. Strange, do patchew run tests on patches in list now?
> 
>>
>>>
>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>>> ---
>>>
>>
>>> +++ b/qapi/block-core.json
>>> @@ -3673,6 +3673,8 @@
>>>   #                 (default: off)
>>>   # @force-share:   force share all permission on added nodes.
>>>   #                 Requires read-only=true. (Since 2.10)
>>> +# @probe-zeroes:  Probe zeroes on protocol layer if format reports data
>>> +#                 (default: false) (since 4.0)
>>
>> Why do we need a new bool?  Can we instead...
>>
>>>   #
>>>   # Remaining options are determined by the block driver.
>>>   #
>>> @@ -3686,7 +3688,8 @@
>>>               '*read-only': 'bool',
>>>               '*auto-read-only': 'bool',
>>>               '*force-share': 'bool',
>>> -            '*detect-zeroes': 'BlockdevDetectZeroesOptions' },
>>> +            '*detect-zeroes': 'BlockdevDetectZeroesOptions',
>>> +            '*probe-zeroes': 'bool' },
>>
>> ...add another enum value to 'detect-zeroes', since after all, what we
>> are really doing is fine-tuning what level of zero probing we are
>> willing to do?
> 
> Should we? Or I just bind old behavior to detect-zeroes=on? And then, if needed,
> we'll add intermediate modes.
> 
>>
>>
>>> +++ b/qemu-options.hx
>>> @@ -615,6 +615,7 @@ DEF("blockdev", HAS_ARG, QEMU_OPTION_blockdev,
>>>       "-blockdev [driver=]driver[,node-name=N][,discard=ignore|unmap]\n"
>>>       "          [,cache.direct=on|off][,cache.no-flush=on|off]\n"
>>>       "          [,read-only=on|off][,detect-zeroes=on|off|unmap]\n"
>>> +    "          [,probe-zeroes=on|off]\n"
>>>       "          [,driver specific parameters...]\n"
>>>       "                configure a block backend\n", QEMU_ARCH_ALL)
>>>   STEXI
>>> @@ -670,6 +671,8 @@ discard requests.
>>>   conversion of plain zero writes by the OS to driver specific optimized
>>>   zero write commands. You may even choose "unmap" if @var{discard} is set
>>>   to "unmap" to allow a zero write to be converted to an @code{unmap} operation.
>>> +@item probe-zeroes=@var{probe-zeroes}
>>> +Probe zeroes on protocol layer if format reports data. Default is "off".
>>
>> Again, fitting this into detect-zeroes seems better than inventing a new
>> knob.
>>
> 
> No objections. But description of detect-zeroes are more about writes, should
> we change them somehow?
> 
> # Describes the operation mode for the automatic conversion of plain
> # zero writes by the OS to driver specific optimized zero write commands.
> 
> ...
> 
> # @detect-zeroes: detect and optimize zero writes (Since 2.1)
> 
> 
> 

Hm, just note, that detect-zeroes was about writes and should be set on target, and
the new thing is about block-status and should be set on source, and this thing should
be described in spec as well.

-- 
Best regards,
Vladimir