* [Qemu-devel] callout to *file in bdrv_co_get_block_status
@ 2017-03-17 10:45 Peter Lieven
  2017-03-17 10:59 ` Paolo Bonzini
  0 siblings, 1 reply; 23+ messages in thread
From: Peter Lieven @ 2017-03-17 10:45 UTC (permalink / raw)
  To: qemu block, qemu-devel; +Cc: Fam Zheng, Paolo Bonzini

Hi,


I tried to debug why qemu-img convert with a VMDK source lying on a tmpfs is horribly slow.
For some reason an lseek on a tmpfs is slow. Strictly speaking, the lseek in find_allocation in file-posix.c
is slow.
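
For readers unfamiliar with this kind of probe, here is a minimal sketch of how an allocation lookup based on lseek can work. It is not the QEMU find_allocation code; the helper name probe_extent is hypothetical, and the fallback SEEK_DATA/SEEK_HOLE values are the Linux ones, assumed here in case the headers do not expose them. The point is only that each lookup costs up to two lseek system calls.

```c
#define _GNU_SOURCE            /* asks glibc to expose SEEK_DATA / SEEK_HOLE */
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Fallback for headers that hide the constants (Linux values, an assumption). */
#ifndef SEEK_DATA
#define SEEK_DATA 3
#define SEEK_HOLE 4
#endif

/*
 * Hypothetical helper: find the first data extent at or after `start`.
 * On success returns 0 and reports the extent as [*data, *hole).
 * On failure returns a negative errno value, e.g. -ENXIO when there is
 * no data at or after `start`.
 */
static int probe_extent(int fd, off_t start, off_t *data, off_t *hole)
{
    off_t d = lseek(fd, start, SEEK_DATA);   /* first system call */
    if (d < 0) {
        return -errno;
    }

    off_t h = lseek(fd, d, SEEK_HOLE);       /* second system call */
    if (h < 0) {
        return -errno;
    }

    *data = d;
    *hole = h;
    return 0;
}
```

If such a probe runs once per cluster, its cost is multiplied by the number of clusters in the image.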


When qemu-img convert iterates over all sectors of a VMDK file to check their allocation status, it ends
up checking the allocation status of all allocated sectors a second time, in *file, due to the following condition in bdrv_co_get_block_status:


    if (*file && *file != bs &&
        (ret & BDRV_BLOCK_DATA) && !(ret & BDRV_BLOCK_ZERO) &&
        (ret & BDRV_BLOCK_OFFSET_VALID)) {
        BlockDriverState *file2;
        int file_pnum;
        ret2 = bdrv_co_get_block_status(*file, ret >> BDRV_SECTOR_BITS,
                                        *pnum, &file_pnum, &file2);
        if (ret2 >= 0) {
            /* Ignore errors.  This is just providing extra information, it
             * is useful but not necessary.
             */
            if (!file_pnum) {
                /* !file_pnum indicates an offset at or beyond the EOF; it is
                 * perfectly valid for the format block driver to point to such
                 * offsets, so catch it and mark everything as zero */
                ret |= BDRV_BLOCK_ZERO;
            } else {
                /* Limit request to the range reported by the protocol driver */
                *pnum = file_pnum;
                ret |= (ret2 & BDRV_BLOCK_ZERO);
            }
        }
    }


Does anybody remember under what circumstances this case was added? For a container format
like VMDK or QCOW2, shouldn't we trust the information from the L2 tables in the VMDK or QCOW2?


Thanks,

Peter

* Re: [Qemu-devel] callout to *file in bdrv_co_get_block_status
  2017-03-17 10:45 [Qemu-devel] callout to *file in bdrv_co_get_block_status Peter Lieven
@ 2017-03-17 10:59 ` Paolo Bonzini
  2017-03-17 11:11   ` Peter Lieven
  0 siblings, 1 reply; 23+ messages in thread
From: Paolo Bonzini @ 2017-03-17 10:59 UTC (permalink / raw)
  To: Peter Lieven, qemu block, qemu-devel; +Cc: Fam Zheng



On 17/03/2017 11:45, Peter Lieven wrote:
> Does anybody remember under what circumstances this case was added? For a container format
> like VMDK or QCOW2, shouldn't we trust the information from the L2 tables in the VMDK or QCOW2?

It provides additional information, for example it works better with
prealloc=metadata.

Paolo

* Re: [Qemu-devel] callout to *file in bdrv_co_get_block_status
  2017-03-17 10:59 ` Paolo Bonzini
@ 2017-03-17 11:11   ` Peter Lieven
  2017-03-17 11:16     ` Paolo Bonzini
  0 siblings, 1 reply; 23+ messages in thread
From: Peter Lieven @ 2017-03-17 11:11 UTC (permalink / raw)
  To: Paolo Bonzini, qemu block, qemu-devel; +Cc: Fam Zheng

On 17.03.2017 at 11:59, Paolo Bonzini wrote:
> It provides additional information, for example it works better with
> prealloc=metadata.

Okay, understood. Can you think of a way to conditionally avoid this second callout? In my case we have an additional
lseek for each cluster. For a 20GB file that is approx. 327k calls to lseek. And if the file has no preallocated metadata,
it will likely not improve anything. And even if the metadata is preallocated, what is the allocation status of the clusters?
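
The 327k figure is consistent with one lseek per cluster if the cluster size is 64 KiB. That cluster size is an assumption made here for illustration; the thread does not state it. A small sketch of the arithmetic, with a hypothetical helper name:

```c
#include <stdint.h>

/* Number of clusters needed to cover image_bytes, rounding up. */
static uint64_t cluster_count(uint64_t image_bytes, uint64_t cluster_bytes)
{
    return (image_bytes + cluster_bytes - 1) / cluster_bytes;
}
```

With image_bytes = 20 GiB (20 * 2^30) and cluster_bytes = 64 KiB (2^16), this gives 20 * 2^14 = 327680 clusters, so one extra callout per cluster would mean roughly 327k extra lookups.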

Peter

* Re: [Qemu-devel] callout to *file in bdrv_co_get_block_status
  2017-03-17 11:11   ` Peter Lieven
@ 2017-03-17 11:16     ` Paolo Bonzini
  2017-03-17 11:20       ` Peter Lieven
  2017-03-17 11:24       ` Fam Zheng
  0 siblings, 2 replies; 23+ messages in thread
From: Paolo Bonzini @ 2017-03-17 11:16 UTC (permalink / raw)
  To: Peter Lieven, qemu block, qemu-devel; +Cc: Fam Zheng



On 17/03/2017 12:11, Peter Lieven wrote:
> Okay, understood. Can you think of a way to conditionally avoid this second callout? In my case we have an additional
> lseek for each cluster. For a 20GB file that is approx. 327k calls to lseek. And if the file has no preallocated metadata,
> it will likely not improve anything. And even if the metadata is preallocated, what is the allocation status of the clusters?

If the metadata is preallocated, the clusters will (or should) show up as
zero, speeding up the copy.

Paolo

* Re: [Qemu-devel] callout to *file in bdrv_co_get_block_status
  2017-03-17 11:16     ` Paolo Bonzini
@ 2017-03-17 11:20       ` Peter Lieven
  2017-03-20  2:46         ` Fam Zheng
  2017-03-17 11:24       ` Fam Zheng
  1 sibling, 1 reply; 23+ messages in thread
From: Peter Lieven @ 2017-03-17 11:20 UTC (permalink / raw)
  To: Paolo Bonzini, qemu block, qemu-devel; +Cc: Fam Zheng

On 17.03.2017 at 12:16, Paolo Bonzini wrote:
> If the metadata is preallocated, the clusters will (or should) show up as
> zero, speeding up the copy.

Okay, in this case the second call out to *file will not happen. It only happens if the metadata says it contains data.
So where does it actually help?

The condition is: (ret & BDRV_BLOCK_DATA) && !(ret & BDRV_BLOCK_ZERO) && (ret & BDRV_BLOCK_OFFSET_VALID)

So from my view it can only have any effect if the metadata returns BDRV_BLOCK_DATA, but the protocol driver returns
BDRV_BLOCK_ZERO.

This can only happen if I partially write to a cluster, or am I wrong here?
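
The reasoning above can be checked with a small stand-alone predicate. This is only a sketch: the flag values below are stand-ins chosen for illustration (the real definitions live in the QEMU headers), and the function name wants_file_callout is hypothetical. The has_distinct_file parameter models the `*file && *file != bs` part of the original condition.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in flag bits for illustration only; assumed, not copied from QEMU. */
#define BDRV_BLOCK_DATA         0x01
#define BDRV_BLOCK_ZERO         0x02
#define BDRV_BLOCK_OFFSET_VALID 0x04

/*
 * Returns true when the format driver's answer `ret` would trigger the
 * second lookup in *file: data is present, it is not already known to be
 * zero, and a valid offset into *file was reported.
 */
static bool wants_file_callout(int64_t ret, bool has_distinct_file)
{
    return has_distinct_file &&
           (ret & BDRV_BLOCK_DATA) &&
           !(ret & BDRV_BLOCK_ZERO) &&
           (ret & BDRV_BLOCK_OFFSET_VALID);
}
```

A cluster that the metadata already reports as zero never reaches *file; only clusters reported as plain data with a valid offset do, which is exactly the case that costs one extra lookup per allocated cluster.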

Peter

* Re: [Qemu-devel] callout to *file in bdrv_co_get_block_status
  2017-03-17 11:16     ` Paolo Bonzini
  2017-03-17 11:20       ` Peter Lieven
@ 2017-03-17 11:24       ` Fam Zheng
  2017-03-17 14:51         ` Paolo Bonzini
  1 sibling, 1 reply; 23+ messages in thread
From: Fam Zheng @ 2017-03-17 11:24 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Peter Lieven, qemu block, qemu-devel

On Fri, 03/17 12:16, Paolo Bonzini wrote:
> If the metadata is preallocated, the clusters will (or should) show up as
> zero, speeding up the copy.

I think from qemu-img convert's perspective, it doesn't care about the *file
status if the metadata already speaks, because, like you said, the data shows up
as zeroes.

In other words I think this can be optimized.

Fam

* Re: [Qemu-devel] callout to *file in bdrv_co_get_block_status
  2017-03-17 11:24       ` Fam Zheng
@ 2017-03-17 14:51         ` Paolo Bonzini
  2017-03-18 16:16           ` Peter Lieven
  0 siblings, 1 reply; 23+ messages in thread
From: Paolo Bonzini @ 2017-03-17 14:51 UTC (permalink / raw)
  To: Fam Zheng; +Cc: Peter Lieven, qemu-devel, qemu block



On 17/03/2017 12:24, Fam Zheng wrote:
> I think from qemu-img convert's perspective, it doesn't care about the *file
> status if the metadata already speaks, because, like you said, the data shows up
> as zeroes.

That's already the case: *file is only examined if the metadata says
BDRV_BLOCK_DATA=1, BDRV_BLOCK_ZERO=0.

Paolo

* Re: [Qemu-devel] callout to *file in bdrv_co_get_block_status
  2017-03-17 14:51         ` Paolo Bonzini
@ 2017-03-18 16:16           ` Peter Lieven
  0 siblings, 0 replies; 23+ messages in thread
From: Peter Lieven @ 2017-03-18 16:16 UTC (permalink / raw)
  To: Paolo Bonzini, Fam Zheng; +Cc: qemu-devel, qemu block

On 17.03.2017 at 15:51, Paolo Bonzini wrote:
> On 17/03/2017 12:24, Fam Zheng wrote:
>> I think from qemu-img convert's perspective, it doesn't care about the *file
>> status if the metadata already speaks, because, like you said, the data shows up
>> as zeroes.
> That's already the case: *file is only examined if the metadata says
> BDRV_BLOCK_DATA=1, BDRV_BLOCK_ZERO=0.

Maybe Fam meant that in qemu-img this info is not that necessary, because it will skip zeroes inside
a data block anyway. I don't know why the lseek is so slow, but the optimization could be to identify
the cases where this extra info about *file containing zeroes is really useful, and then only call out for
it in those cases.
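
The idea that a copy tool can skip zeroes inside a data block on its own can be pictured as a scan of each buffer after it has been read. The following is a simplified sketch of that idea, not qemu-img's actual code; the helper name buf_is_zero is hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Returns true if all `len` bytes of `buf` are zero (true for len == 0). */
static bool buf_is_zero(const unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (buf[i] != 0) {
            return false;
        }
    }
    return true;
}
```

A scan like this only helps after the data has been read, whereas a zero status from the block layer avoids the read entirely. That trade-off is what decides whether the extra callout is worth its cost.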

Peter

* Re: [Qemu-devel] callout to *file in bdrv_co_get_block_status
  2017-03-17 11:20       ` Peter Lieven
@ 2017-03-20  2:46         ` Fam Zheng
  2017-03-20 11:21           ` Paolo Bonzini
  0 siblings, 1 reply; 23+ messages in thread
From: Fam Zheng @ 2017-03-20  2:46 UTC (permalink / raw)
  To: Peter Lieven; +Cc: Paolo Bonzini, qemu block, qemu-devel

On Fri, 03/17 12:20, Peter Lieven wrote:
> Okay, in this case the second call out to *file will not happen. It only happens if the metadata says it contains data.
> So where does it actually help?
> 
> The condition is: (ret & BDRV_BLOCK_DATA) && !(ret & BDRV_BLOCK_ZERO) && (ret & BDRV_BLOCK_OFFSET_VALID)
> 
> So from my view it can only have any effect if the metadata returns BDRV_BLOCK_DATA, but the protocol driver returns
> BDRV_BLOCK_ZERO.
> 
> This can only happen if I partially write to a cluster, or am I wrong here?

I think you have a point. The metadata should have said BDRV_BLOCK_ZERO if
protocol would say BDRV_BLOCK_ZERO - there is no reason the format driver cannot
know.

Fam

* Re: [Qemu-devel] callout to *file in bdrv_co_get_block_status
  2017-03-20  2:46         ` Fam Zheng
@ 2017-03-20 11:21           ` Paolo Bonzini
  2017-03-20 11:49             ` Fam Zheng
  0 siblings, 1 reply; 23+ messages in thread
From: Paolo Bonzini @ 2017-03-20 11:21 UTC (permalink / raw)
  To: Fam Zheng, Peter Lieven; +Cc: qemu block, qemu-devel



On 20/03/2017 03:46, Fam Zheng wrote:
> I think you have a point. The metadata should have said BDRV_BLOCK_ZERO if
> protocol would say BDRV_BLOCK_ZERO - there is no reason the format driver cannot
> know.

That's true of qcow2, but many formats (including raw :)) don't know
about BDRV_BLOCK_ZERO.

Paolo

* Re: [Qemu-devel] callout to *file in bdrv_co_get_block_status
  2017-03-20 11:21           ` Paolo Bonzini
@ 2017-03-20 11:49             ` Fam Zheng
  2017-03-20 12:17               ` Peter Lieven
  2017-03-20 12:47               ` Peter Lieven
  0 siblings, 2 replies; 23+ messages in thread
From: Fam Zheng @ 2017-03-20 11:49 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Peter Lieven, qemu-devel, qemu block

On Mon, 03/20 12:21, Paolo Bonzini wrote:
> That's true of qcow2, but many formats (including raw :)) don't know
> about BDRV_BLOCK_ZERO.

Raw is a little special, it could have forwarded the call to *file in its
BlockDriver callback. Most formats with metadata store zero/nonzero information
in L1/L2 tables. For qcow2 and VMDK I think it's okay to just trust the metadata on
zero/nonzero.

Fam

* Re: [Qemu-devel] callout to *file in bdrv_co_get_block_status
  2017-03-20 11:49             ` Fam Zheng
@ 2017-03-20 12:17               ` Peter Lieven
  2017-03-20 12:47               ` Peter Lieven
  1 sibling, 0 replies; 23+ messages in thread
From: Peter Lieven @ 2017-03-20 12:17 UTC (permalink / raw)
  To: Fam Zheng, Paolo Bonzini; +Cc: qemu-devel, qemu block

On 20.03.2017 at 12:49, Fam Zheng wrote:
> Raw is a little special, it could have forwarded the call to *file in its
> BlockDriver callback. Most formats with metadata store zero/nonzero information

I think that's what the check *file != bs is for, right?


> in L1/L2 tables. For qcow2 and VMDK I think it's okay to just trust the metadata on
> zero/nonzero.

So which formats are the ones we really don't trust? And is it worth
adding this (maybe expensive) callout to lseek via find_allocation for every
format, although we already know the answer from the metadata?

Peter

* Re: [Qemu-devel] callout to *file in bdrv_co_get_block_status
  2017-03-20 11:49             ` Fam Zheng
  2017-03-20 12:17               ` Peter Lieven
@ 2017-03-20 12:47               ` Peter Lieven
  2017-03-20 13:13                 ` Peter Lieven
  1 sibling, 1 reply; 23+ messages in thread
From: Peter Lieven @ 2017-03-20 12:47 UTC (permalink / raw)
  To: Fam Zheng, Paolo Bonzini; +Cc: qemu-devel, qemu block

On 20.03.2017 at 12:49, Fam Zheng wrote:
> Raw is a little special, it could have forwarded the call to *file in its
> BlockDriver callback. Most formats with metadata store zero/nonzero information
> in L1/L2 tables. For qcow2 and VMDK I think it's okay to just trust the metadata on
> zero/nonzero.


BTW, the extra check was added in


commit 5daa74a6ebce7543aaad178c4061dc087bb4c705
Author: Paolo Bonzini <pbonzini@redhat.com>
Date:   Wed Sep 4 19:00:38 2013 +0200

    block: look for zero blocks in bs->file
   
    Reviewed-by: Eric Blake <eblake@redhat.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>


It was added when bdrv_get_block_status was introduced. I don't know what the real
issue was that this patch addressed.


Peter

* Re: [Qemu-devel] callout to *file in bdrv_co_get_block_status
  2017-03-20 12:47               ` Peter Lieven
@ 2017-03-20 13:13                 ` Peter Lieven
  2017-03-20 13:23                   ` Paolo Bonzini
  0 siblings, 1 reply; 23+ messages in thread
From: Peter Lieven @ 2017-03-20 13:13 UTC (permalink / raw)
  To: Fam Zheng, Paolo Bonzini; +Cc: qemu-devel, qemu block

On 20.03.2017 at 13:47, Peter Lieven wrote:
> BTW, the extra check was added in
>
> commit 5daa74a6ebce7543aaad178c4061dc087bb4c705
> Author: Paolo Bonzini <pbonzini@redhat.com>
> Date:   Wed Sep 4 19:00:38 2013 +0200
>
>     block: look for zero blocks in bs->file
>
> It was added when bdrv_get_block_status was introduced. I don't know what the real
> issue was that this patch addressed.

Is it possible that this optimization was added specifically for RAW? I believed that
raw would forward the get_block_status call to bs->file, but it looks like it doesn't.
If this was meant for RAW, would it be an option to move this callout to the raw-format driver
and remove it from the generic code?

Peter

* Re: [Qemu-devel] callout to *file in bdrv_co_get_block_status
  2017-03-20 13:13                 ` Peter Lieven
@ 2017-03-20 13:23                   ` Paolo Bonzini
  2017-03-20 13:35                     ` Peter Lieven
  0 siblings, 1 reply; 23+ messages in thread
From: Paolo Bonzini @ 2017-03-20 13:23 UTC (permalink / raw)
  To: Peter Lieven, Fam Zheng; +Cc: qemu-devel, qemu block



On 20/03/2017 14:13, Peter Lieven wrote:
> Is it possible that this optimization was added specifically for RAW? I believed that
> raw would forward the get_block_status call to bs->file, but it looks like it doesn't.
> If this was meant for RAW, would it be an option to move this callout to the raw-format driver
> and remove it from the generic code?

It was meant for both raw and qcow2.

Paolo


* Re: [Qemu-devel] callout to *file in bdrv_co_get_block_status
  2017-03-20 13:23                   ` Paolo Bonzini
@ 2017-03-20 13:35                     ` Peter Lieven
  2017-03-20 14:05                       ` Paolo Bonzini
  0 siblings, 1 reply; 23+ messages in thread
From: Peter Lieven @ 2017-03-20 13:35 UTC (permalink / raw)
  To: Paolo Bonzini, Fam Zheng; +Cc: qemu-devel, qemu block

Am 20.03.2017 um 14:23 schrieb Paolo Bonzini:
>
> On 20/03/2017 14:13, Peter Lieven wrote:
>> Am 20.03.2017 um 13:47 schrieb Peter Lieven:
>>> Am 20.03.2017 um 12:49 schrieb Fam Zheng:
>>>> On Mon, 03/20 12:21, Paolo Bonzini wrote:
>>>>> On 20/03/2017 03:46, Fam Zheng wrote:
>>>>>> On Fri, 03/17 12:20, Peter Lieven wrote:
>>>>>>> Am 17.03.2017 um 12:16 schrieb Paolo Bonzini:
>>>>>>>> On 17/03/2017 12:11, Peter Lieven wrote:
>>>>>>>>>>> like VMDK or QCOW2 shouldn't we trust the information from the l2 tables in the VMDK or QCOW2?
>>>>>>>>>> It provides additional information, for example it works better with
>>>>>>>>>> prealloc=metadata.
>>>>>>>>> Okay, understood. Can you imagine of a away to conditionally avoid this second callout? In my case we have an additional
>>>>>>>>> lseek for each cluster. For a 20GB file this are approx. 327k calls to lseek. And if the file has no preallocated metadata
>>>>>>>>> it will likely not improve anything. And even if the metadata is prealloced what is the allocation status of the clusters?
>>>>>>>> If the metadata is preallocated, cluster will (or should) show up as
>>>>>>>> zero, speeding up the copy.
>>>>>>> Okay, in this case the second call out to *file will not happen. It only happens if the metadata says it contains data.
>>>>>>> So where does it actually help?
>>>>>>>
>>>>>>> The condition is: (ret & BDRV_BLOCK_DATA) && !(ret & BDRV_BLOCK_ZERO) && (ret & BDRV_BLOCK_OFFSET_VALID))
>>>>>>>
>>>>>>> So from my view it can only have any effect if the metadata returns BDRV_BLOCK_DATA, but the protocol driver returns
>>>>>>> BDRV_BLOCK_ZERO.
>>>>>>>
>>>>>>> This can only happen if I partially write to a cluster, or am I wrong here?
>>>>>> I think you have a point. The metadata should have said BDRV_BLOCK_ZERO if
>>>>>> protocol would say BDRV_BLOCK_ZERO - there is no reason the format driver cannot
>>>>>> know.
>>>>> That's true of qcow2, but many formats (including raw :)) don't know
>>>>> about BDRV_BLOCK_ZERO.
>>>> Raw is a little special, it could have forwarded the call to *file in its
>>>> BlockDriver callback. Most formats with metadata stores zero/nonzero information
>>>> in L1/L2 tables. For qcow2 and VMDK I think it's okay to just trust meta data on
>>>> zero/nonzero.
>>>>
>>>> Fam
>>> BTW, the extra check was added in
>>>
>>>
>>> commit 5daa74a6ebce7543aaad178c4061dc087bb4c705
>>> Author: Paolo Bonzini <pbonzini@redhat.com>
>>> Date:   Wed Sep 4 19:00:38 2013 +0200
>>>
>>>     block: look for zero blocks in bs->file
>>>    
>>>     Reviewed-by: Eric Blake <eblake@redhat.com>
>>>     Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
>>>     Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
>>>
>>>
>>> It was introduced while introducing bdv_get_block_status. I don't know what the real
>>>
>>> issue was that was addressed with this patch?
>> Is it possible that this optimization was added especially for RAW? I was believing that
>> raw would forward the get_block_status call to bs->file, but it looks it doesn't.
>> If this one here was for RAW would it be an option to move this callout to the raw-format driver
>> and remove it from the generic code?
> It was meant for both raw and qcow2.

Okay, but as Fam mentioned, the qcow2 metadata should know that a cluster is zero. Do you remember
what the issue was?

Peter


* Re: [Qemu-devel] callout to *file in bdrv_co_get_block_status
  2017-03-20 13:35                     ` Peter Lieven
@ 2017-03-20 14:05                       ` Paolo Bonzini
  2017-03-20 16:43                         ` Peter Lieven
  0 siblings, 1 reply; 23+ messages in thread
From: Paolo Bonzini @ 2017-03-20 14:05 UTC (permalink / raw)
  To: Peter Lieven, Fam Zheng; +Cc: qemu-devel, qemu block



On 20/03/2017 14:35, Peter Lieven wrote:
> Am 20.03.2017 um 14:23 schrieb Paolo Bonzini:
>> On 20/03/2017 14:13, Peter Lieven wrote:
>>> Am 20.03.2017 um 13:47 schrieb Peter Lieven:
>>>> commit 5daa74a6ebce7543aaad178c4061dc087bb4c705
>>>> Author: Paolo Bonzini <pbonzini@redhat.com>
>>>> Date:   Wed Sep 4 19:00:38 2013 +0200
>>>>
>>>>     block: look for zero blocks in bs->file
>>>>    
>>>>     Reviewed-by: Eric Blake <eblake@redhat.com>
>>>>     Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
>>>>     Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
>>>>
>>>>
>>>> It was introduced while introducing bdv_get_block_status. I don't know what the real
>>>>
>>>> issue was that was addressed with this patch?
>>> Is it possible that this optimization was added especially for RAW? I was believing that
>>> raw would forward the get_block_status call to bs->file, but it looks it doesn't.
>>> If this one here was for RAW would it be an option to move this callout to the raw-format driver
>>> and remove it from the generic code?
>> It was meant for both raw and qcow2.
> 
> Okay, but as Fam mentioned qcow2 Metadata should know that a cluster is zero. Do you remember
> what the issue was?

I said that already---preallocated metadata.  Also, at the time
pre-qcow2v3 was more important.

Are you using libiscsi, block devices or files?

Paolo


* Re: [Qemu-devel] callout to *file in bdrv_co_get_block_status
  2017-03-20 14:05                       ` Paolo Bonzini
@ 2017-03-20 16:43                         ` Peter Lieven
  2017-03-20 16:56                           ` Paolo Bonzini
  0 siblings, 1 reply; 23+ messages in thread
From: Peter Lieven @ 2017-03-20 16:43 UTC (permalink / raw)
  To: Paolo Bonzini, Fam Zheng; +Cc: qemu-devel, qemu block

Am 20.03.2017 um 15:05 schrieb Paolo Bonzini:
>
> On 20/03/2017 14:35, Peter Lieven wrote:
>> Am 20.03.2017 um 14:23 schrieb Paolo Bonzini:
>>> On 20/03/2017 14:13, Peter Lieven wrote:
>>>> Am 20.03.2017 um 13:47 schrieb Peter Lieven:
>>>>> commit 5daa74a6ebce7543aaad178c4061dc087bb4c705
>>>>> Author: Paolo Bonzini <pbonzini@redhat.com>
>>>>> Date:   Wed Sep 4 19:00:38 2013 +0200
>>>>>
>>>>>     block: look for zero blocks in bs->file
>>>>>    
>>>>>     Reviewed-by: Eric Blake <eblake@redhat.com>
>>>>>     Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
>>>>>     Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
>>>>>
>>>>>
>>>>> It was introduced while introducing bdv_get_block_status. I don't know what the real
>>>>>
>>>>> issue was that was addressed with this patch?
>>>> Is it possible that this optimization was added especially for RAW? I was believing that
>>>> raw would forward the get_block_status call to bs->file, but it looks it doesn't.
>>>> If this one here was for RAW would it be an option to move this callout to the raw-format driver
>>>> and remove it from the generic code?
>>> It was meant for both raw and qcow2.
>> Okay, but as Fam mentioned qcow2 Metadata should know that a cluster is zero. Do you remember
>> what the issue was?
> I said that already---preallocated metadata.  Also, at the time
> pre-qcow2v3 was more important.

Yes, but Fam said that with preallocated metadata the clusters should be zero, or was that
not true before qcow2v3?

>
> Are you using libiscsi, block devices or files?

It's a mixture: raw with libiscsi or LVM, and qcow2 and VMDK either with libnfs or on local storage.

I stumbled across the issue with lseek on tmpfs because, in the build process for our templates,
I temporarily keep VMDKs on a tmpfs, and it takes ages before qemu-img convert starts to run (it iterates
over every 64 KiB cluster with that callout to find_allocation, and for some reason lseek is very slow on tmpfs).
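
For a sense of scale, the number of per-cluster callouts can be estimated with a small sketch. The helper name cluster_count is made up for illustration (it is not a QEMU function), the 20 GiB image size is the figure quoted earlier in this thread, and the sketch assumes one allocation query per cluster:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not a QEMU function): number of clusters, and thus
 * per-cluster lseek callouts, needed to walk an image of image_bytes. */
uint64_t cluster_count(uint64_t image_bytes, uint64_t cluster_bytes)
{
    assert(cluster_bytes > 0);
    /* Round up so that a trailing partial cluster is counted as well. */
    return (image_bytes + cluster_bytes - 1) / cluster_bytes;
}
```

With a 20 GiB image and 64 KiB clusters, cluster_count(20ULL << 30, 64ULL << 10) gives 327680, in line with the roughly 327k lseek calls mentioned earlier.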

Peter


* Re: [Qemu-devel] callout to *file in bdrv_co_get_block_status
  2017-03-20 16:43                         ` Peter Lieven
@ 2017-03-20 16:56                           ` Paolo Bonzini
  2017-03-27 13:21                             ` Peter Lieven
  0 siblings, 1 reply; 23+ messages in thread
From: Paolo Bonzini @ 2017-03-20 16:56 UTC (permalink / raw)
  To: Peter Lieven, Fam Zheng; +Cc: qemu-devel, qemu block



On 20/03/2017 17:43, Peter Lieven wrote:
> Am 20.03.2017 um 15:05 schrieb Paolo Bonzini:
>>
>> On 20/03/2017 14:35, Peter Lieven wrote:
>>> Am 20.03.2017 um 14:23 schrieb Paolo Bonzini:
>>>> On 20/03/2017 14:13, Peter Lieven wrote:
>>>>> Am 20.03.2017 um 13:47 schrieb Peter Lieven:
>>>>>> commit 5daa74a6ebce7543aaad178c4061dc087bb4c705
>>>>>> Author: Paolo Bonzini <pbonzini@redhat.com>
>>>>>> Date:   Wed Sep 4 19:00:38 2013 +0200
>>>>>>
>>>>>>     block: look for zero blocks in bs->file
>>>>>>    
>>>>>>     Reviewed-by: Eric Blake <eblake@redhat.com>
>>>>>>     Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
>>>>>>     Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
>>>>>>
>>>>>>
>>>>>> It was introduced while introducing bdv_get_block_status. I don't know what the real
>>>>>>
>>>>>> issue was that was addressed with this patch?
>>>>> Is it possible that this optimization was added especially for RAW? I was believing that
>>>>> raw would forward the get_block_status call to bs->file, but it looks it doesn't.
>>>>> If this one here was for RAW would it be an option to move this callout to the raw-format driver
>>>>> and remove it from the generic code?
>>>> It was meant for both raw and qcow2.
>>> Okay, but as Fam mentioned qcow2 Metadata should know that a cluster is zero. Do you remember
>>> what the issue was?
>> I said that already---preallocated metadata.  Also, at the time
>> pre-qcow2v3 was more important.
> 
> Yes, but Fam said that with preallocated metadata the clusters should be zero, or was that
> not true before qcow2v3?

Zero clusters didn't exist before qcow2v3, I think.

>> Are you using libiscsi, block devices or files?
> 
> Its a mixture. raw with libiscsi or lvm and qcow2 and vmdk either with libnfs or on local storage.
> 
> I stumbled across the issue with lseek on a tmpfs because in the build process for our templates
> I temporarily have vmdks on a tmpfs and it takes ages before qemu-img convert starts to run (it iterates
> over every 64kb cluster with that callout to find_allocation and for some reason lseek is very slow on tmpfs).

Ok, thanks.  Perhaps it's worth benchmarking tmpfs specifically.  Apart
from the system call overhead (which does not really matter if you're
going to do a read), lseek on other filesystems should not be any slower
than read.

Paolo


* Re: [Qemu-devel] callout to *file in bdrv_co_get_block_status
  2017-03-20 16:56                           ` Paolo Bonzini
@ 2017-03-27 13:21                             ` Peter Lieven
  2017-03-27 15:06                               ` Paolo Bonzini
  0 siblings, 1 reply; 23+ messages in thread
From: Peter Lieven @ 2017-03-27 13:21 UTC (permalink / raw)
  To: Paolo Bonzini, Fam Zheng; +Cc: qemu-devel, qemu block

Am 20.03.2017 um 17:56 schrieb Paolo Bonzini:
>
> On 20/03/2017 17:43, Peter Lieven wrote:
>> Am 20.03.2017 um 15:05 schrieb Paolo Bonzini:
>>> On 20/03/2017 14:35, Peter Lieven wrote:
>>>> Am 20.03.2017 um 14:23 schrieb Paolo Bonzini:
>>>>> On 20/03/2017 14:13, Peter Lieven wrote:
>>>>>> Am 20.03.2017 um 13:47 schrieb Peter Lieven:
>>>>>>> commit 5daa74a6ebce7543aaad178c4061dc087bb4c705
>>>>>>> Author: Paolo Bonzini <pbonzini@redhat.com>
>>>>>>> Date:   Wed Sep 4 19:00:38 2013 +0200
>>>>>>>
>>>>>>>      block: look for zero blocks in bs->file
>>>>>>>     
>>>>>>>      Reviewed-by: Eric Blake <eblake@redhat.com>
>>>>>>>      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
>>>>>>>      Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
>>>>>>>
>>>>>>>
>>>>>>> It was introduced while introducing bdv_get_block_status. I don't know what the real
>>>>>>>
>>>>>>> issue was that was addressed with this patch?
>>>>>> Is it possible that this optimization was added especially for RAW? I was believing that
>>>>>> raw would forward the get_block_status call to bs->file, but it looks it doesn't.
>>>>>> If this one here was for RAW would it be an option to move this callout to the raw-format driver
>>>>>> and remove it from the generic code?
>>>>> It was meant for both raw and qcow2.
>>>> Okay, but as Fam mentioned qcow2 Metadata should know that a cluster is zero. Do you remember
>>>> what the issue was?
>>> I said that already---preallocated metadata.  Also, at the time
>>> pre-qcow2v3 was more important.
>> Yes, but Fam said that with preallocated metadata the clusters should be zero, or was that
>> not true before qcow2v3?
> Zero clusters didn't exist before qcow2v3 I think.
>
>>> Are you using libiscsi, block devices or files?
>> Its a mixture. raw with libiscsi or lvm and qcow2 and vmdk either with libnfs or on local storage.
>>
>> I stumbled across the issue with lseek on a tmpfs because in the build process for our templates
>> I temporarily have vmdks on a tmpfs and it takes ages before qemu-img convert starts to run (it iterates
>> over every 64kb cluster with that callout to find_allocation and for some reason lseek is very slow on tmpfs).
> Ok, thanks.  Perhaps it's worth benchmarking tmpfs specifically.  Apart
> from the system call overhead (which does not really matter if you're
> going to do a read), lseek on other filesystems should not be any slower
> than read.

Okay, but even the read is not really necessary if the metadata is correct?
Would it be an idea to introduce an inverse flag like BDRV_BLOCK_NOT_ZERO for
cases where we know that there really is DATA and can thus avoid the second callout?

Peter


* Re: [Qemu-devel] callout to *file in bdrv_co_get_block_status
  2017-03-27 13:21                             ` Peter Lieven
@ 2017-03-27 15:06                               ` Paolo Bonzini
  2017-03-31  7:55                                 ` Peter Lieven
  0 siblings, 1 reply; 23+ messages in thread
From: Paolo Bonzini @ 2017-03-27 15:06 UTC (permalink / raw)
  To: Peter Lieven, Fam Zheng; +Cc: qemu-devel, qemu block



On 27/03/2017 15:21, Peter Lieven wrote:
>>>
>>> I stumbled across the issue with lseek on a tmpfs because in the
>>> build process for our templates
>>> I temporarily have vmdks on a tmpfs and it takes ages before qemu-img
>>> convert starts to run (it iterates
>>> over every 64kb cluster with that callout to find_allocation and for
>>> some reason lseek is very slow on tmpfs).
>> Ok, thanks.  Perhaps it's worth benchmarking tmpfs specifically.  Apart
>> from the system call overhead (which does not really matter if you're
>> going to do a read), lseek on other filesystems should not be any slower
>> than read.
> 
> Okay, but the even the read is not really necessary if the metadata is
> correct?

Yeah, what I mean is:

- if you're going to do a read of non-zero blocks, the lseek you do
before reading those blocks should not matter.

- if you're going to skip the read of non-zero blocks, the lseek you do
is always going to be faster than reading them and then checking with
buffer_is_nonzero.
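
To illustrate the second point: detecting zeroes after the fact means reading the data and then scanning it, which costs time proportional to the amount read, whereas an lseek is answered from filesystem metadata. A naive sketch of such a scan (the name buf_is_all_zero is made up here and this is not QEMU's implementation):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a buffer zero check: every byte has to be
 * inspected until a nonzero one is found, so the worst case (an all-zero
 * buffer) touches the entire buffer. */
bool buf_is_all_zero(const unsigned char *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        if (buf[i] != 0) {
            return false;
        }
    }
    return true;
}
```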

> Would it be an idea to introduce an inverse flag live BDRV_BLOCK_NOT_ZERO
> for cases where we know that there is really DATA and thus can avoid the
> second callout?

How would you know that a block is nonzero?

Paolo


* Re: [Qemu-devel] callout to *file in bdrv_co_get_block_status
  2017-03-27 15:06                               ` Paolo Bonzini
@ 2017-03-31  7:55                                 ` Peter Lieven
  2017-03-31 10:20                                   ` Paolo Bonzini
  0 siblings, 1 reply; 23+ messages in thread
From: Peter Lieven @ 2017-03-31  7:55 UTC (permalink / raw)
  To: Paolo Bonzini, Fam Zheng; +Cc: qemu-devel, qemu block

Am 27.03.2017 um 17:06 schrieb Paolo Bonzini:
>
> On 27/03/2017 15:21, Peter Lieven wrote:
>>>> I stumbled across the issue with lseek on a tmpfs because in the
>>>> build process for our templates
>>>> I temporarily have vmdks on a tmpfs and it takes ages before qemu-img
>>>> convert starts to run (it iterates
>>>> over every 64kb cluster with that callout to find_allocation and for
>>>> some reason lseek is very slow on tmpfs).
>>> Ok, thanks.  Perhaps it's worth benchmarking tmpfs specifically.  Apart
>>> from the system call overhead (which does not really matter if you're
>>> going to do a read), lseek on other filesystems should not be any slower
>>> than read.
>> Okay, but the even the read is not really necessary if the metadata is
>> correct?
> Yeah, what I mean is:
>
> - if you're going to do a read of non-zero blocks, the lseek you do
> before reading those blocks should not matter.
>
> - if you're going to skip the read of non-zero blocks, the lseek you do
> is always going to be faster than reading them and then checking with
> buffer_is_nonzero.
>
>> Would it be an idea to introduce an inverse flag live BDRV_BLOCK_NOT_ZERO
>> for cases where we know that there is really DATA and thus can avoid the
>> second callout?
> How would you know that a block is nonzero?

I would trust the metadata. At least for VMDK and QCOW2v3.
Bad idea?

Peter


* Re: [Qemu-devel] callout to *file in bdrv_co_get_block_status
  2017-03-31  7:55                                 ` Peter Lieven
@ 2017-03-31 10:20                                   ` Paolo Bonzini
  0 siblings, 0 replies; 23+ messages in thread
From: Paolo Bonzini @ 2017-03-31 10:20 UTC (permalink / raw)
  To: Peter Lieven, Fam Zheng; +Cc: qemu-devel, qemu block



On 31/03/2017 09:55, Peter Lieven wrote:
>>> Would it be an idea to introduce an inverse flag live BDRV_BLOCK_NOT_ZERO
>>> for cases where we know that there is really DATA and thus can avoid the
>>> second callout?
>> How would you know that a block is nonzero?
> I would trust the metadata. At least for VMDK and QCOW2v3.
> Bad idea?

The metadata only tells you that a block is zero, not that it's nonzero.
What you are suggesting is really the same as removing the recursion.
However, I still haven't understood clearly if it's a QEMU or tmpfs bug;
if it's a tmpfs bug your suggestion would not fix tmpfs slowness on raw
images.
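
The logic under discussion can be condensed into a small sketch based on the snippet quoted at the start of this thread. The function names wants_file_callout and merge_file_status are hypothetical, the flag values are local stand-ins assumed for this example rather than copied from QEMU's headers, and the clamping of *pnum done by the real code is left out:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local stand-ins for the flags named in this thread; the numeric values
 * are assumptions for this sketch. */
#define BDRV_BLOCK_DATA         0x01
#define BDRV_BLOCK_ZERO         0x02
#define BDRV_BLOCK_OFFSET_VALID 0x04

/* Mirrors the quoted condition: the callout to the underlying file only
 * happens for DATA && !ZERO && OFFSET_VALID on a distinct file. */
bool wants_file_callout(int64_t ret, bool has_distinct_file)
{
    return has_distinct_file &&
           (ret & BDRV_BLOCK_DATA) && !(ret & BDRV_BLOCK_ZERO) &&
           (ret & BDRV_BLOCK_OFFSET_VALID);
}

/* Folds the protocol-level answer (ret2, file_pnum) into the format-level
 * status, as in the quoted snippet. */
int64_t merge_file_status(int64_t ret, int64_t ret2, int file_pnum)
{
    assert(file_pnum >= 0);
    if (ret2 < 0) {
        return ret;                     /* callout errors are ignored */
    }
    if (file_pnum == 0) {
        return ret | BDRV_BLOCK_ZERO;   /* offset at or beyond EOF */
    }
    return ret | (ret2 & BDRV_BLOCK_ZERO);
}
```

Since DATA in the format metadata only means "allocated", treating it as a hypothetical NOT_ZERO would make wants_file_callout() false for every such cluster, which is the same as dropping the callout altogether.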

Paolo


Thread overview: 23+ messages
2017-03-17 10:45 [Qemu-devel] callout to *file in bdrv_co_get_block_status Peter Lieven
2017-03-17 10:59 ` Paolo Bonzini
2017-03-17 11:11   ` Peter Lieven
2017-03-17 11:16     ` Paolo Bonzini
2017-03-17 11:20       ` Peter Lieven
2017-03-20  2:46         ` Fam Zheng
2017-03-20 11:21           ` Paolo Bonzini
2017-03-20 11:49             ` Fam Zheng
2017-03-20 12:17               ` Peter Lieven
2017-03-20 12:47               ` Peter Lieven
2017-03-20 13:13                 ` Peter Lieven
2017-03-20 13:23                   ` Paolo Bonzini
2017-03-20 13:35                     ` Peter Lieven
2017-03-20 14:05                       ` Paolo Bonzini
2017-03-20 16:43                         ` Peter Lieven
2017-03-20 16:56                           ` Paolo Bonzini
2017-03-27 13:21                             ` Peter Lieven
2017-03-27 15:06                               ` Paolo Bonzini
2017-03-31  7:55                                 ` Peter Lieven
2017-03-31 10:20                                   ` Paolo Bonzini
2017-03-17 11:24       ` Fam Zheng
2017-03-17 14:51         ` Paolo Bonzini
2017-03-18 16:16           ` Peter Lieven
