qemu-devel.nongnu.org archive mirror
* Potential regression in 'qemu-img convert' to LVM
@ 2020-09-14 12:25 Stefan Reiter
  2020-09-15  9:08 ` Nir Soffer
  0 siblings, 1 reply; 5+ messages in thread
From: Stefan Reiter @ 2020-09-14 12:25 UTC
  To: qemu-block; +Cc: qemu-devel

Hi list,

The following command fails since QEMU 5.1 (tested on kernel 5.4.60):

# qemu-img convert -p -f raw -O raw /dev/zvol/pool/disk-1 /dev/vg/disk-1
qemu-img: error while writing at byte 2157968896: Device or resource busy

(The source is a ZFS volume here, but that doesn't matter in practice: it
always fails the same way. The offset changes slightly but consistently
hovers around 2^31.)

strace shows the following:
fallocate(13, FALLOC_FL_KEEP_SIZE|FALLOC_FL_PUNCH_HOLE, 2157968896, 4608) = -1 EBUSY (Device or resource busy)
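
For readers who want to see where the failing request sits, the numbers in the strace line can be broken down as follows. This is only arithmetic on the reported offset and length; the 512-byte sector and 4096-byte page sizes are assumptions, and nothing in this thread says that alignment is the cause.

```python
# Arithmetic on the failing fallocate() request from the strace output.
# SECTOR and PAGE are assumed sizes, not values reported in the thread.
SECTOR = 512
PAGE = 4096

offset = 2157968896  # byte offset reported by qemu-img and strace
length = 4608        # length of the hole-punch request


def describe(offset, length):
    """Summarize position and alignment of a byte range."""
    return {
        "bytes_past_2_31": offset - 2**31,
        "start_sector": offset // SECTOR,
        "sectors": length // SECTOR,
        "start_sector_aligned": offset % SECTOR == 0,
        "start_page_aligned": offset % PAGE == 0,
        "end_page_aligned": (offset + length) % PAGE == 0,
    }


info = describe(offset, length)
for key, value in info.items():
    print(f"{key}: {value}")
```

Under these assumptions the request starts 512 bytes before a 4096-byte boundary and covers nine sectors, so it begins inside one page and ends exactly on a page boundary.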

Other fallocate calls leading up to this work fine.

This happens since commit edafc70c0c "qemu-img convert: Don't pre-zero
images"; before that, all fallocate calls happened at the start. Reverting
the commit and calling qemu-img exactly the same way on the same data works
fine. Simply retrying the syscall on EBUSY (like EINTR) does *not* work:
once it fails, it keeps failing with the same error.
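
Since retrying does not help, a different generic strategy is to fall back to writing explicit zeros when the hole punch fails. The sketch below illustrates that idea on a regular temporary file. It is not QEMU's code, and this thread does not state what the eventual fix does; `punch_hole` and `zero_range` are hypothetical helper names, and the ctypes signature assumes a 64-bit Linux libc.

```python
import ctypes
import errno
import os
import tempfile

# Flag values from <linux/falloc.h>.
FALLOC_FL_KEEP_SIZE = 0x01
FALLOC_FL_PUNCH_HOLE = 0x02


def _libc_fallocate():
    """Return libc's fallocate() or None if it is unavailable."""
    try:
        fn = ctypes.CDLL(None, use_errno=True).fallocate
    except (OSError, AttributeError, TypeError):
        return None
    # Assumption: off_t is 64 bits wide (true on 64-bit Linux).
    fn.argtypes = [ctypes.c_int, ctypes.c_int, ctypes.c_int64, ctypes.c_int64]
    fn.restype = ctypes.c_int
    return fn


def punch_hole(fd, offset, length):
    """Deallocate a byte range without changing the file size."""
    fn = _libc_fallocate()
    if fn is None:
        raise OSError(errno.ENOTSUP, "fallocate() not available")
    flags = FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE
    if fn(fd, flags, offset, length) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))


def zero_range(fd, offset, length):
    """Punch a hole; on any OSError (EBUSY included) write zeros instead."""
    try:
        punch_hole(fd, offset, length)
        return "punched"
    except OSError:
        zeros = bytes(min(length, 1 << 20))
        done = 0
        while done < length:
            done += os.pwrite(fd, zeros[: length - done], offset + done)
        return "written"


# Usage on a scratch file; which path is taken depends on the filesystem.
with tempfile.TemporaryFile() as demo:
    demo.write(b"\xff" * 16384)
    demo.flush()
    print(zero_range(demo.fileno(), 3584, 4608))
```

Either path leaves the range reading back as zeros; the fallback simply gives up the space saving for that range.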

I couldn't find anything related to EBUSY on fallocate, and it only
happens on LVM targets... Any ideas or pointers on where to look?

~ Stefan




* Re: Potential regression in 'qemu-img convert' to LVM
  2020-09-14 12:25 Potential regression in 'qemu-img convert' to LVM Stefan Reiter
@ 2020-09-15  9:08 ` Nir Soffer
  2020-09-15 11:51   ` Stefan Reiter
  0 siblings, 1 reply; 5+ messages in thread
From: Nir Soffer @ 2020-09-15  9:08 UTC
  To: Stefan Reiter; +Cc: QEMU Developers, qemu-block

On Mon, Sep 14, 2020 at 3:25 PM Stefan Reiter <s.reiter@proxmox.com> wrote:
>
> Hi list,
>
> following command fails since 5.1 (tested on kernel 5.4.60):
>
> # qemu-img convert -p -f raw -O raw /dev/zvol/pool/disk-1 /dev/vg/disk-1
> qemu-img: error while writing at byte 2157968896: Device or resource busy
>
> (source is ZFS here, but doesn't matter in practice, it always fails the
> same; offset changes slightly but consistently hovers around 2^31)
>
> strace shows the following:
> fallocate(13, FALLOC_FL_KEEP_SIZE|FALLOC_FL_PUNCH_HOLE, 2157968896,
> 4608) = -1 EBUSY (Device or resource busy)

What is the size of the LV?

Does it happen if you change sparse minimum size (-S)?

For example: -S 64k

    qemu-img convert -p -f raw -O raw -S 64k /dev/zvol/pool/disk-1 /dev/vg/disk-1
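
As background on the option: -S sets the number of consecutive zero bytes that must be present before qemu-img treats a range as sparse. The sketch below illustrates that idea on an in-memory buffer. It is a simplified model and not qemu-img's implementation (which works at sector granularity); `zero_runs` is a hypothetical helper name.

```python
def zero_runs(data, min_sparse):
    """Return (offset, length) for each maximal zero run >= min_sparse bytes."""
    runs = []
    start = None
    for i, byte in enumerate(data):
        if byte == 0:
            if start is None:
                start = i  # a new run of zeros begins here
        else:
            if start is not None and i - start >= min_sparse:
                runs.append((start, i - start))
            start = None
    # Handle a run that extends to the end of the buffer.
    if start is not None and len(data) - start >= min_sparse:
        runs.append((start, len(data) - start))
    return runs


# Made-up test data: a 4608-byte hole and a 64 KiB hole between data blocks.
image = b"\x01" * 4096 + bytes(4608) + b"\x02" * 4096 + bytes(65536) + b"\x03"

print(zero_runs(image, 4096))
print(zero_runs(image, 64 * 1024))
```

In this model a larger threshold means small holes are no longer reported, so fewer and larger ranges would be candidates for hole punching.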

> Other fallocate calls leading up to this work fine.
>
> This happens since commit edafc70c0c "qemu-img convert: Don't pre-zero
> images", before that all fallocates happened at the start. Reverting the
> commit and calling qemu-img exactly the same way on the same data works
> fine.

But slowly, doing up to 100% more work for fully allocated images.

> Simply retrying the syscall on EBUSY (like EINTR) does *not* work,
> once it fails it keeps failing with the same error.
>
> I couldn't find anything related to EBUSY on fallocate, and it only
> happens on LVM targets... Any idea or pointers where to look?

Is this a thin LV?

This works for us using regular LVs.

Which kernel? Which distro?

Nir




* Re: Potential regression in 'qemu-img convert' to LVM
  2020-09-15  9:08 ` Nir Soffer
@ 2020-09-15 11:51   ` Stefan Reiter
  2021-01-07 20:03     ` Nir Soffer
  0 siblings, 1 reply; 5+ messages in thread
From: Stefan Reiter @ 2020-09-15 11:51 UTC
  To: Nir Soffer; +Cc: QEMU Developers, qemu-block

On 9/15/20 11:08 AM, Nir Soffer wrote:
> On Mon, Sep 14, 2020 at 3:25 PM Stefan Reiter <s.reiter@proxmox.com> wrote:
>>
>> Hi list,
>>
>> following command fails since 5.1 (tested on kernel 5.4.60):
>>
>> # qemu-img convert -p -f raw -O raw /dev/zvol/pool/disk-1 /dev/vg/disk-1
>> qemu-img: error while writing at byte 2157968896: Device or resource busy
>>
>> (source is ZFS here, but doesn't matter in practice, it always fails the
>> same; offset changes slightly but consistently hovers around 2^31)
>>
>> strace shows the following:
>> fallocate(13, FALLOC_FL_KEEP_SIZE|FALLOC_FL_PUNCH_HOLE, 2157968896,
>> 4608) = -1 EBUSY (Device or resource busy)
> 
> What is the size of the LV?
> 

Same as the source, 5GB in my test case. Created with:

# lvcreate -ay --size 5242880k --name disk-1 vg
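
As a quick sanity check on the sizes, assuming LVM reads the `k` suffix as KiB (1024 bytes), the LV is exactly 5 GiB and the failing range lies well inside it. The variable names below are only for illustration.

```python
KIB = 1024
GIB = 1024**3

lv_bytes = 5242880 * KIB   # --size 5242880k, assuming k means KiB
offset = 2157968896        # failing offset from the original report
length = 4608              # length of the failing request

print(lv_bytes / GIB)                # LV size in GiB
print(offset + length <= lv_bytes)   # is the request inside the LV?
print(offset / lv_bytes)             # position as a fraction of the LV
```

So the request is not near the end of the device; it falls roughly 40% of the way into the LV.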

> Does it happen if you change sparse minimum size (-S)?
> 
> For example: -S 64k
> 
>      qemu-img convert -p -f raw -O raw -S 64k /dev/zvol/pool/disk-1
> /dev/vg/disk-1
> 

Tried a few different values, always the same result: EBUSY at byte 
2157968896.

>> Other fallocate calls leading up to this work fine.
>>
>> This happens since commit edafc70c0c "qemu-img convert: Don't pre-zero
>> images", before that all fallocates happened at the start. Reverting the
>> commit and calling qemu-img exactly the same way on the same data works
>> fine.
> 
> But slowly, doing up to 100% more work for fully allocated images.
> 

Of course. I'm not saying the patch is wrong; reverting it just avoids
triggering the bug.

>> Simply retrying the syscall on EBUSY (like EINTR) does *not* work,
>> once it fails it keeps failing with the same error.
>>
>> I couldn't find anything related to EBUSY on fallocate, and it only
>> happens on LVM targets... Any idea or pointers where to look?
> 
> Is this thin LV?
> 

No, regular LV. See command above.

> This works for us using regular LVs.
> 
> Which kernel? which distro?
> 

Reproducible on:
* PVE w/ kernel 5.4.60 (Ubuntu based)
* Manjaro w/ kernel 5.8.6

I found that it does not happen with all images; I suppose there must be
a certain number of smaller holes for it to happen. I am using a VM
image with a bare-bones Alpine Linux installation, but it's not an
isolated case; two people have reported the issue on our bug tracker:
https://bugzilla.proxmox.com/show_bug.cgi?id=3002

Thanks,
Stefan

> Nir
> 
> 




* Re: Potential regression in 'qemu-img convert' to LVM
  2020-09-15 11:51   ` Stefan Reiter
@ 2021-01-07 20:03     ` Nir Soffer
  2021-03-04 16:07       ` Stefan Reiter
  0 siblings, 1 reply; 5+ messages in thread
From: Nir Soffer @ 2021-01-07 20:03 UTC
  To: Stefan Reiter; +Cc: QEMU Developers, qemu-block, Maxim Levitsky

On Tue, Sep 15, 2020 at 2:51 PM Stefan Reiter <s.reiter@proxmox.com> wrote:
>
> On 9/15/20 11:08 AM, Nir Soffer wrote:
> > On Mon, Sep 14, 2020 at 3:25 PM Stefan Reiter <s.reiter@proxmox.com> wrote:
> >>
> >> Hi list,
> >>
> >> following command fails since 5.1 (tested on kernel 5.4.60):
> >>
> >> # qemu-img convert -p -f raw -O raw /dev/zvol/pool/disk-1 /dev/vg/disk-1
> >> qemu-img: error while writing at byte 2157968896: Device or resource busy
> >>
> >> (source is ZFS here, but doesn't matter in practice, it always fails the
> >> same; offset changes slightly but consistently hovers around 2^31)
> >>
> >> strace shows the following:
> >> fallocate(13, FALLOC_FL_KEEP_SIZE|FALLOC_FL_PUNCH_HOLE, 2157968896,
> >> 4608) = -1 EBUSY (Device or resource busy)
> >
> > What is the size of the LV?
> >
>
> Same as the source, 5GB in my test case. Created with:
>
> # lvcreate -ay --size 5242880k --name disk-1 vg
>
> > Does it happen if you change sparse minimum size (-S)?
> >
> > For example: -S 64k
> >
> >      qemu-img convert -p -f raw -O raw -S 64k /dev/zvol/pool/disk-1
> > /dev/vg/disk-1
> >
>
> Tried a few different values, always the same result: EBUSY at byte
> 2157968896.
>
> >> Other fallocate calls leading up to this work fine.
> >>
> >> This happens since commit edafc70c0c "qemu-img convert: Don't pre-zero
> >> images", before that all fallocates happened at the start. Reverting the
> >> commit and calling qemu-img exactly the same way on the same data works
> >> fine.
> >
> > But slowly, doing up to 100% more work for fully allocated images.
> >
>
> Of course, I'm not saying the patch is wrong, reverting it just avoids
> triggering the bug.
>
> >> Simply retrying the syscall on EBUSY (like EINTR) does *not* work,
> >> once it fails it keeps failing with the same error.
> >>
> >> I couldn't find anything related to EBUSY on fallocate, and it only
> >> happens on LVM targets... Any idea or pointers where to look?
> >
> > Is this thin LV?
> >
>
> No, regular LV. See command above.
>
> > This works for us using regular LVs.
> >
> > Which kernel? which distro?
> >
>
> Reproducible on:
> * PVE w/ kernel 5.4.60 (Ubuntu based)
> * Manjaro w/ kernel 5.8.6
>
> I found that it does not happen with all images, I suppose there must be
> a certain number of smaller holes for it to happen. I am using a VM
> image with a bare-bones Alpine Linux installation, but it's not an
> isolated case, we've had two people report the issue on our bug tracker:
> https://bugzilla.proxmox.com/show_bug.cgi?id=3002

I think that this issue may be fixed by
https://lists.nongnu.org/archive/html/qemu-block/2020-11/msg00358.html

Nir




* Re: Potential regression in 'qemu-img convert' to LVM
  2021-01-07 20:03     ` Nir Soffer
@ 2021-03-04 16:07       ` Stefan Reiter
  0 siblings, 0 replies; 5+ messages in thread
From: Stefan Reiter @ 2021-03-04 16:07 UTC
  To: Nir Soffer; +Cc: QEMU Developers, qemu-block, Maxim Levitsky

On 07/01/2021 21:03, Nir Soffer wrote:
> On Tue, Sep 15, 2020 at 2:51 PM Stefan Reiter <s.reiter@proxmox.com> wrote:
>>
>> On 9/15/20 11:08 AM, Nir Soffer wrote:
>>> On Mon, Sep 14, 2020 at 3:25 PM Stefan Reiter <s.reiter@proxmox.com> wrote:
>>>>
>>>> Hi list,
>>>>
>>>> following command fails since 5.1 (tested on kernel 5.4.60):
>>>>
>>>> # qemu-img convert -p -f raw -O raw /dev/zvol/pool/disk-1 /dev/vg/disk-1
>>>> qemu-img: error while writing at byte 2157968896: Device or resource busy
>>>>
>>>> (source is ZFS here, but doesn't matter in practice, it always fails the
>>>> same; offset changes slightly but consistently hovers around 2^31)
>>>>
>>>> strace shows the following:
>>>> fallocate(13, FALLOC_FL_KEEP_SIZE|FALLOC_FL_PUNCH_HOLE, 2157968896,
>>>> 4608) = -1 EBUSY (Device or resource busy)
>>>
>>> What is the size of the LV?
>>>
>>
>> Same as the source, 5GB in my test case. Created with:
>>
>> # lvcreate -ay --size 5242880k --name disk-1 vg
>>
>>> Does it happen if you change sparse minimum size (-S)?
>>>
>>> For example: -S 64k
>>>
>>>       qemu-img convert -p -f raw -O raw -S 64k /dev/zvol/pool/disk-1
>>> /dev/vg/disk-1
>>>
>>
>> Tried a few different values, always the same result: EBUSY at byte
>> 2157968896.
>>
>>>> Other fallocate calls leading up to this work fine.
>>>>
>>>> This happens since commit edafc70c0c "qemu-img convert: Don't pre-zero
>>>> images", before that all fallocates happened at the start. Reverting the
>>>> commit and calling qemu-img exactly the same way on the same data works
>>>> fine.
>>>
>>> But slowly, doing up to 100% more work for fully allocated images.
>>>
>>
>> Of course, I'm not saying the patch is wrong, reverting it just avoids
>> triggering the bug.
>>
>>>> Simply retrying the syscall on EBUSY (like EINTR) does *not* work,
>>>> once it fails it keeps failing with the same error.
>>>>
>>>> I couldn't find anything related to EBUSY on fallocate, and it only
>>>> happens on LVM targets... Any idea or pointers where to look?
>>>
>>> Is this thin LV?
>>>
>>
>> No, regular LV. See command above.
>>
>>> This works for us using regular LVs.
>>>
>>> Which kernel? which distro?
>>>
>>
>> Reproducible on:
>> * PVE w/ kernel 5.4.60 (Ubuntu based)
>> * Manjaro w/ kernel 5.8.6
>>
>> I found that it does not happen with all images, I suppose there must be
>> a certain number of smaller holes for it to happen. I am using a VM
>> image with a bare-bones Alpine Linux installation, but it's not an
>> isolated case, we've had two people report the issue on our bug tracker:
>> https://bugzilla.proxmox.com/show_bug.cgi?id=3002
> 
> I think that this issue may be fixed by
> https://lists.nongnu.org/archive/html/qemu-block/2020-11/msg00358.html
> 
> Nir
> 
> 

Sorry for the late reply, but yes, I can confirm this fixes the issue.

~ Stefan





Thread overview: 5+ messages
2020-09-14 12:25 Potential regression in 'qemu-img convert' to LVM Stefan Reiter
2020-09-15  9:08 ` Nir Soffer
2020-09-15 11:51   ` Stefan Reiter
2021-01-07 20:03     ` Nir Soffer
2021-03-04 16:07       ` Stefan Reiter
