qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] block/file-posix: Fix problem with fallocate(PUNCH_HOLE) on GPFS
@ 2021-04-16  5:23 Thomas Huth
  2021-04-16 20:34 ` Nir Soffer
  0 siblings, 1 reply; 6+ messages in thread
From: Thomas Huth @ 2021-04-16  5:23 UTC (permalink / raw)
  To: qemu-block, Kevin Wolf, Max Reitz
  Cc: Andrey Shinkevich, Viktor Mihajlovski, qemu-devel, Christian Borntraeger

A customer reported that running

 qemu-img convert -t none -O qcow2 -f qcow2 input.qcow2 output.qcow2

fails for them with the following error message when the images are
stored on a GPFS file system:

 qemu-img: error while writing sector 0: Invalid argument

After analyzing the strace output, it seems like the problem is in
handle_aiocb_write_zeroes(): The call to fallocate(FALLOC_FL_PUNCH_HOLE)
returns EINVAL, which can apparently happen if the file system has
a different idea of the granularity of the operation. It's arguably
a bug in GPFS, since the PUNCH_HOLE mode should not result in EINVAL
according to the man-page of fallocate(), but the file system is out
there in production and so we have to deal with it. In commit 294682cc3a
("block: workaround for unaligned byte range in fallocate()") we also
already applied the a work-around for the same problem to the earlier
fallocate(FALLOC_FL_ZERO_RANGE) call, so do it now similar with the
PUNCH_HOLE call.

Signed-off-by: Thomas Huth <thuth@redhat.com>
---
 block/file-posix.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/block/file-posix.c b/block/file-posix.c
index 20e14f8e96..7a40428d52 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -1675,6 +1675,13 @@ static int handle_aiocb_write_zeroes(void *opaque)
             }
             s->has_fallocate = false;
         } else if (ret != -ENOTSUP) {
+            if (ret == -EINVAL) {
+                /*
+                 * File systems like GPFS do not like unaligned byte ranges,
+                 * treat it like unsupported (so caller falls back to pwrite)
+                 */
+                return -ENOTSUP;
+            }
             return ret;
         } else {
             s->has_discard = false;
-- 
2.27.0



^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH] block/file-posix: Fix problem with fallocate(PUNCH_HOLE) on GPFS
  2021-04-16  5:23 [PATCH] block/file-posix: Fix problem with fallocate(PUNCH_HOLE) on GPFS Thomas Huth
@ 2021-04-16 20:34 ` Nir Soffer
  2021-04-19  5:06   ` Thomas Huth
  0 siblings, 1 reply; 6+ messages in thread
From: Nir Soffer @ 2021-04-16 20:34 UTC (permalink / raw)
  To: Thomas Huth
  Cc: Kevin Wolf, qemu-block, Richard Jones, QEMU Developers,
	Max Reitz, Christian Borntraeger, Andrey Shinkevich,
	Viktor Mihajlovski

On Fri, Apr 16, 2021 at 8:23 AM Thomas Huth <thuth@redhat.com> wrote:
>
> A customer reported that running
>
>  qemu-img convert -t none -O qcow2 -f qcow2 input.qcow2 output.qcow2
>
> fails for them with the following error message when the images are
> stored on a GPFS file system:
>
>  qemu-img: error while writing sector 0: Invalid argument
>
> After analyzing the strace output, it seems like the problem is in
> handle_aiocb_write_zeroes(): The call to fallocate(FALLOC_FL_PUNCH_HOLE)
> returns EINVAL, which can apparently happen if the file system has
> a different idea of the granularity of the operation. It's arguably
> a bug in GPFS, since the PUNCH_HOLE mode should not result in EINVAL
> according to the man-page of fallocate(), but the file system is out
> there in production and so we have to deal with it. In commit 294682cc3a
> ("block: workaround for unaligned byte range in fallocate()") we also
> already applied the a work-around for the same problem to the earlier
> fallocate(FALLOC_FL_ZERO_RANGE) call, so do it now similar with the
> PUNCH_HOLE call.
>
> Signed-off-by: Thomas Huth <thuth@redhat.com>
> ---
>  block/file-posix.c | 7 +++++++
>  1 file changed, 7 insertions(+)
>
> diff --git a/block/file-posix.c b/block/file-posix.c
> index 20e14f8e96..7a40428d52 100644
> --- a/block/file-posix.c
> +++ b/block/file-posix.c
> @@ -1675,6 +1675,13 @@ static int handle_aiocb_write_zeroes(void *opaque)
>              }
>              s->has_fallocate = false;
>          } else if (ret != -ENOTSUP) {
> +            if (ret == -EINVAL) {
> +                /*
> +                 * File systems like GPFS do not like unaligned byte ranges,
> +                 * treat it like unsupported (so caller falls back to pwrite)
> +                 */
> +                return -ENOTSUP;

This skips the next fallback, using plain fallocate(0) if we write
after the end of the file. Is this intended?

We can treat the buggy EINVAL return value as "filesystem is buggy,
let's not try other options", or "let's try the next option". Since falling
back to actually writing zeroes is so much slower, I think it is better to
try the next option.

This issue affects also libnbd (nbdcopy file backend).

Do we have a bug for GFS?

Nir

> +            }
>              return ret;
>          } else {
>              s->has_discard = false;
> --
> 2.27.0
>
>



^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] block/file-posix: Fix problem with fallocate(PUNCH_HOLE) on GPFS
  2021-04-16 20:34 ` Nir Soffer
@ 2021-04-19  5:06   ` Thomas Huth
  2021-04-19 13:13     ` Kevin Wolf
  2021-05-19 10:21     ` Thomas Huth
  0 siblings, 2 replies; 6+ messages in thread
From: Thomas Huth @ 2021-04-19  5:06 UTC (permalink / raw)
  To: Nir Soffer
  Cc: Kevin Wolf, qemu-block, Richard Jones, QEMU Developers,
	Max Reitz, Christian Borntraeger, Andrey Shinkevich,
	Viktor Mihajlovski

On 16/04/2021 22.34, Nir Soffer wrote:
> On Fri, Apr 16, 2021 at 8:23 AM Thomas Huth <thuth@redhat.com> wrote:
>>
>> A customer reported that running
>>
>>   qemu-img convert -t none -O qcow2 -f qcow2 input.qcow2 output.qcow2
>>
>> fails for them with the following error message when the images are
>> stored on a GPFS file system:
>>
>>   qemu-img: error while writing sector 0: Invalid argument
>>
>> After analyzing the strace output, it seems like the problem is in
>> handle_aiocb_write_zeroes(): The call to fallocate(FALLOC_FL_PUNCH_HOLE)
>> returns EINVAL, which can apparently happen if the file system has
>> a different idea of the granularity of the operation. It's arguably
>> a bug in GPFS, since the PUNCH_HOLE mode should not result in EINVAL
>> according to the man-page of fallocate(), but the file system is out
>> there in production and so we have to deal with it. In commit 294682cc3a
>> ("block: workaround for unaligned byte range in fallocate()") we also
>> already applied the a work-around for the same problem to the earlier
>> fallocate(FALLOC_FL_ZERO_RANGE) call, so do it now similar with the
>> PUNCH_HOLE call.
>>
>> Signed-off-by: Thomas Huth <thuth@redhat.com>
>> ---
>>   block/file-posix.c | 7 +++++++
>>   1 file changed, 7 insertions(+)
>>
>> diff --git a/block/file-posix.c b/block/file-posix.c
>> index 20e14f8e96..7a40428d52 100644
>> --- a/block/file-posix.c
>> +++ b/block/file-posix.c
>> @@ -1675,6 +1675,13 @@ static int handle_aiocb_write_zeroes(void *opaque)
>>               }
>>               s->has_fallocate = false;
>>           } else if (ret != -ENOTSUP) {
>> +            if (ret == -EINVAL) {
>> +                /*
>> +                 * File systems like GPFS do not like unaligned byte ranges,
>> +                 * treat it like unsupported (so caller falls back to pwrite)
>> +                 */
>> +                return -ENOTSUP;
> 
> This skips the next fallback, using plain fallocate(0) if we write
> after the end of the file. Is this intended?
> 
> We can treat the buggy EINVAL return value as "filesystem is buggy,
> let's not try other options", or "let's try the next option". Since falling
> back to actually writing zeroes is so much slower, I think it is better to
> try the next option.

I just did the same work-around as in commit 294682cc3a7 ... so if we agree 
to try the other options, too, we should change that spot, too...

However, what is not clear to me, how would you handle s->has_write_zeroes 
and s->has_discard in such a case? Set them to "false"? ... but it could 
still work for some blocks with different alignment ... but if we keep them 
set to "true", the code tries again and again to call these ioctls, maybe 
wasting other precious cycles for this?

Maybe we should do a different approach instead: In case we hit a EINVAL 
here, print an error a la:

  error_report_once("You are running on a buggy file system, please complain 
to the file system vendor");

and return -ENOTSUP ... then it's hopefully clear to the users why they are 
getting a bad performance, and that they should complain to the file system 
vendor instead to get their problem fixed.

> This issue affects also libnbd (nbdcopy file backend).
> 
> Do we have a bug for GFS?

The GPFS-related bug is:
https://bugzilla.redhat.com/show_bug.cgi?id=1944861

  Thomas



^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] block/file-posix: Fix problem with fallocate(PUNCH_HOLE) on GPFS
  2021-04-19  5:06   ` Thomas Huth
@ 2021-04-19 13:13     ` Kevin Wolf
  2021-05-19 10:21     ` Thomas Huth
  1 sibling, 0 replies; 6+ messages in thread
From: Kevin Wolf @ 2021-04-19 13:13 UTC (permalink / raw)
  To: Thomas Huth
  Cc: qemu-block, Richard Jones, QEMU Developers, Max Reitz,
	Nir Soffer, Christian Borntraeger, Andrey Shinkevich,
	Viktor Mihajlovski

Am 19.04.2021 um 07:06 hat Thomas Huth geschrieben:
> On 16/04/2021 22.34, Nir Soffer wrote:
> > On Fri, Apr 16, 2021 at 8:23 AM Thomas Huth <thuth@redhat.com> wrote:
> > > 
> > > A customer reported that running
> > > 
> > >   qemu-img convert -t none -O qcow2 -f qcow2 input.qcow2 output.qcow2
> > > 
> > > fails for them with the following error message when the images are
> > > stored on a GPFS file system:
> > > 
> > >   qemu-img: error while writing sector 0: Invalid argument
> > > 
> > > After analyzing the strace output, it seems like the problem is in
> > > handle_aiocb_write_zeroes(): The call to fallocate(FALLOC_FL_PUNCH_HOLE)
> > > returns EINVAL, which can apparently happen if the file system has
> > > a different idea of the granularity of the operation. It's arguably
> > > a bug in GPFS, since the PUNCH_HOLE mode should not result in EINVAL
> > > according to the man-page of fallocate(), but the file system is out
> > > there in production and so we have to deal with it. In commit 294682cc3a
> > > ("block: workaround for unaligned byte range in fallocate()") we also
> > > already applied the a work-around for the same problem to the earlier
> > > fallocate(FALLOC_FL_ZERO_RANGE) call, so do it now similar with the
> > > PUNCH_HOLE call.
> > > 
> > > Signed-off-by: Thomas Huth <thuth@redhat.com>
> > > ---
> > >   block/file-posix.c | 7 +++++++
> > >   1 file changed, 7 insertions(+)
> > > 
> > > diff --git a/block/file-posix.c b/block/file-posix.c
> > > index 20e14f8e96..7a40428d52 100644
> > > --- a/block/file-posix.c
> > > +++ b/block/file-posix.c
> > > @@ -1675,6 +1675,13 @@ static int handle_aiocb_write_zeroes(void *opaque)
> > >               }
> > >               s->has_fallocate = false;
> > >           } else if (ret != -ENOTSUP) {
> > > +            if (ret == -EINVAL) {
> > > +                /*
> > > +                 * File systems like GPFS do not like unaligned byte ranges,
> > > +                 * treat it like unsupported (so caller falls back to pwrite)
> > > +                 */
> > > +                return -ENOTSUP;
> > 
> > This skips the next fallback, using plain fallocate(0) if we write
> > after the end of the file. Is this intended?
> > 
> > We can treat the buggy EINVAL return value as "filesystem is buggy,
> > let's not try other options", or "let's try the next option". Since falling
> > back to actually writing zeroes is so much slower, I think it is better to
> > try the next option.
> 
> I just did the same work-around as in commit 294682cc3a7 ... so if we agree
> to try the other options, too, we should change that spot, too...

Yes, changing both places to fall back to the next option feels right to
me.

> However, what is not clear to me, how would you handle s->has_write_zeroes
> and s->has_discard in such a case? Set them to "false"? ... but it could
> still work for some blocks with different alignment ... but if we keep them
> set to "true", the code tries again and again to call these ioctls, maybe
> wasting other precious cycles for this?

That it could still work for other requests is a good point. So I think
EINVAL shouldn't disable s->has_*, but otherwise behave the same as
ENOTSUP.

You're right that we're potentially wasting cycles for trying unaligned
requests again and again, but that they fail isn't our fault and the
benefit of having efficient zero writes with aligned requests seems more
important that losing a few cycles on unaligned requests.

> Maybe we should do a different approach instead: In case we hit a EINVAL
> here, print an error a la:
> 
>  error_report_once("You are running on a buggy file system, please complain
> to the file system vendor");
> 
> and return -ENOTSUP ... then it's hopefully clear to the users why they are
> getting a bad performance, and that they should complain to the file system
> vendor instead to get their problem fixed.

Sounds like a reasonable thing to do (probably in addition to the above)
when we know that a file system bug prevents us from getting optimal
performance.

Kevin



^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] block/file-posix: Fix problem with fallocate(PUNCH_HOLE) on GPFS
  2021-04-19  5:06   ` Thomas Huth
  2021-04-19 13:13     ` Kevin Wolf
@ 2021-05-19 10:21     ` Thomas Huth
  2021-05-19 10:41       ` Thomas Huth
  1 sibling, 1 reply; 6+ messages in thread
From: Thomas Huth @ 2021-05-19 10:21 UTC (permalink / raw)
  To: Nir Soffer, qemu-block, Kevin Wolf, Max Reitz
  Cc: Christian Borntraeger, Viktor Mihajlovski, Andrey Shinkevich,
	Richard Jones, QEMU Developers

On 19/04/2021 07.06, Thomas Huth wrote:
> On 16/04/2021 22.34, Nir Soffer wrote:
>> On Fri, Apr 16, 2021 at 8:23 AM Thomas Huth <thuth@redhat.com> wrote:
>>>
>>> A customer reported that running
>>>
>>>   qemu-img convert -t none -O qcow2 -f qcow2 input.qcow2 output.qcow2
>>>
>>> fails for them with the following error message when the images are
>>> stored on a GPFS file system:
>>>
>>>   qemu-img: error while writing sector 0: Invalid argument
>>>
>>> After analyzing the strace output, it seems like the problem is in
>>> handle_aiocb_write_zeroes(): The call to fallocate(FALLOC_FL_PUNCH_HOLE)
>>> returns EINVAL, which can apparently happen if the file system has
>>> a different idea of the granularity of the operation. It's arguably
>>> a bug in GPFS, since the PUNCH_HOLE mode should not result in EINVAL
>>> according to the man-page of fallocate(), but the file system is out
>>> there in production and so we have to deal with it. In commit 294682cc3a
>>> ("block: workaround for unaligned byte range in fallocate()") we also
>>> already applied the a work-around for the same problem to the earlier
>>> fallocate(FALLOC_FL_ZERO_RANGE) call, so do it now similar with the
>>> PUNCH_HOLE call.
>>>
>>> Signed-off-by: Thomas Huth <thuth@redhat.com>
>>> ---
>>>   block/file-posix.c | 7 +++++++
>>>   1 file changed, 7 insertions(+)
>>>
>>> diff --git a/block/file-posix.c b/block/file-posix.c
>>> index 20e14f8e96..7a40428d52 100644
>>> --- a/block/file-posix.c
>>> +++ b/block/file-posix.c
>>> @@ -1675,6 +1675,13 @@ static int handle_aiocb_write_zeroes(void *opaque)
>>>               }
>>>               s->has_fallocate = false;
>>>           } else if (ret != -ENOTSUP) {
>>> +            if (ret == -EINVAL) {
>>> +                /*
>>> +                 * File systems like GPFS do not like unaligned byte 
>>> ranges,
>>> +                 * treat it like unsupported (so caller falls back to 
>>> pwrite)
>>> +                 */
>>> +                return -ENOTSUP;
>>
>> This skips the next fallback, using plain fallocate(0) if we write
>> after the end of the file. Is this intended?
>>
>> We can treat the buggy EINVAL return value as "filesystem is buggy,
>> let's not try other options", or "let's try the next option". Since falling
>> back to actually writing zeroes is so much slower, I think it is better to
>> try the next option.
> 
> I just did the same work-around as in commit 294682cc3a7 ... so if we agree 
> to try the other options, too, we should change that spot, too...
> 
> However, what is not clear to me, how would you handle s->has_write_zeroes 
> and s->has_discard in such a case? Set them to "false"? ... but it could 
> still work for some blocks with different alignment ... but if we keep them 
> set to "true", the code tries again and again to call these ioctls, maybe 
> wasting other precious cycles for this?
> 
> Maybe we should do a different approach instead: In case we hit a EINVAL 
> here, print an error a la:
> 
> error_report_once("You are running on a buggy file system, please complain 
> to the file system vendor");
> 
> and return -ENOTSUP ... then it's hopefully clear to the users why they are 
> getting a bad performance, and that they should complain to the file system 
> vendor instead to get their problem fixed.

Ping!

Any recommendations how to proceed here?

  Thomas



^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] block/file-posix: Fix problem with fallocate(PUNCH_HOLE) on GPFS
  2021-05-19 10:21     ` Thomas Huth
@ 2021-05-19 10:41       ` Thomas Huth
  0 siblings, 0 replies; 6+ messages in thread
From: Thomas Huth @ 2021-05-19 10:41 UTC (permalink / raw)
  To: Nir Soffer, qemu-block, Kevin Wolf, Max Reitz
  Cc: Christian Borntraeger, Viktor Mihajlovski, Andrey Shinkevich,
	Richard Jones, QEMU Developers

On 19/05/2021 12.21, Thomas Huth wrote:
> On 19/04/2021 07.06, Thomas Huth wrote:
>> On 16/04/2021 22.34, Nir Soffer wrote:
>>> On Fri, Apr 16, 2021 at 8:23 AM Thomas Huth <thuth@redhat.com> wrote:
>>>>
>>>> A customer reported that running
>>>>
>>>>   qemu-img convert -t none -O qcow2 -f qcow2 input.qcow2 output.qcow2
>>>>
>>>> fails for them with the following error message when the images are
>>>> stored on a GPFS file system:
>>>>
>>>>   qemu-img: error while writing sector 0: Invalid argument
>>>>
>>>> After analyzing the strace output, it seems like the problem is in
>>>> handle_aiocb_write_zeroes(): The call to fallocate(FALLOC_FL_PUNCH_HOLE)
>>>> returns EINVAL, which can apparently happen if the file system has
>>>> a different idea of the granularity of the operation. It's arguably
>>>> a bug in GPFS, since the PUNCH_HOLE mode should not result in EINVAL
>>>> according to the man-page of fallocate(), but the file system is out
>>>> there in production and so we have to deal with it. In commit 294682cc3a
>>>> ("block: workaround for unaligned byte range in fallocate()") we also
>>>> already applied the a work-around for the same problem to the earlier
>>>> fallocate(FALLOC_FL_ZERO_RANGE) call, so do it now similar with the
>>>> PUNCH_HOLE call.
>>>>
>>>> Signed-off-by: Thomas Huth <thuth@redhat.com>
>>>> ---
>>>>   block/file-posix.c | 7 +++++++
>>>>   1 file changed, 7 insertions(+)
>>>>
>>>> diff --git a/block/file-posix.c b/block/file-posix.c
>>>> index 20e14f8e96..7a40428d52 100644
>>>> --- a/block/file-posix.c
>>>> +++ b/block/file-posix.c
>>>> @@ -1675,6 +1675,13 @@ static int handle_aiocb_write_zeroes(void *opaque)
>>>>               }
>>>>               s->has_fallocate = false;
>>>>           } else if (ret != -ENOTSUP) {
>>>> +            if (ret == -EINVAL) {
>>>> +                /*
>>>> +                 * File systems like GPFS do not like unaligned byte 
>>>> ranges,
>>>> +                 * treat it like unsupported (so caller falls back to 
>>>> pwrite)
>>>> +                 */
>>>> +                return -ENOTSUP;
>>>
>>> This skips the next fallback, using plain fallocate(0) if we write
>>> after the end of the file. Is this intended?
>>>
>>> We can treat the buggy EINVAL return value as "filesystem is buggy,
>>> let's not try other options", or "let's try the next option". Since falling
>>> back to actually writing zeroes is so much slower, I think it is better to
>>> try the next option.
>>
>> I just did the same work-around as in commit 294682cc3a7 ... so if we 
>> agree to try the other options, too, we should change that spot, too...
>>
>> However, what is not clear to me, how would you handle s->has_write_zeroes 
>> and s->has_discard in such a case? Set them to "false"? ... but it could 
>> still work for some blocks with different alignment ... but if we keep 
>> them set to "true", the code tries again and again to call these ioctls, 
>> maybe wasting other precious cycles for this?
>>
>> Maybe we should do a different approach instead: In case we hit a EINVAL 
>> here, print an error a la:
>>
>> error_report_once("You are running on a buggy file system, please complain 
>> to the file system vendor");
>>
>> and return -ENOTSUP ... then it's hopefully clear to the users why they 
>> are getting a bad performance, and that they should complain to the file 
>> system vendor instead to get their problem fixed.
> 
> Ping!
> 
> Any recommendations how to proceed here?

Never mind, Kevin just told me that he already replied - but apparently that 
mail did not make it into my "qemu-devel" folder, so I did not see it :-( 
... Anyway, I'll try to rework my patch accordingly:

  https://lists.gnu.org/archive/html/qemu-block/2021-04/msg00488.html

  Thomas



^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2021-05-19 10:42 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-04-16  5:23 [PATCH] block/file-posix: Fix problem with fallocate(PUNCH_HOLE) on GPFS Thomas Huth
2021-04-16 20:34 ` Nir Soffer
2021-04-19  5:06   ` Thomas Huth
2021-04-19 13:13     ` Kevin Wolf
2021-05-19 10:21     ` Thomas Huth
2021-05-19 10:41       ` Thomas Huth

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).