qemu-devel.nongnu.org archive mirror
* Problems with c8bb23cbdbe3 on ppc64le
@ 2019-10-10 15:17 Max Reitz
  2019-10-10 16:15 ` Anton Nefedov
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Max Reitz @ 2019-10-10 15:17 UTC
  To: Qemu-block
  Cc: Alberto Garcia, Anton Nefedov, Vladimir Sementsov-Ogievskiy, qemu-devel


[-- Attachment #1.1: Type: text/plain, Size: 591 bytes --]

Hi everyone,

(CCs just based on tags in the commit in question)

I have two bug reports which claim problems of qcow2 on XFS on ppc64le
machines since qemu 4.1.0.  One of those is about bad performance
(sorry, it isn’t public :-/), the other about data corruption
(https://bugzilla.redhat.com/show_bug.cgi?id=1751934).

It looks like in both cases reverting c8bb23cbdbe3 solves the problem
(which optimized COW of unallocated areas).

I think I’ve looked at every angle but can’t find what could be wrong
with it.  Do any of you have any idea? :-/


Thanks,

Max


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]


* Re: Problems with c8bb23cbdbe3 on ppc64le
  2019-10-10 15:17 Problems with c8bb23cbdbe3 on ppc64le Max Reitz
@ 2019-10-10 16:15 ` Anton Nefedov
  2019-10-11  7:49   ` Max Reitz
  2019-10-21 13:33 ` Max Reitz
  2019-10-24  9:08 ` Max Reitz
  2 siblings, 1 reply; 7+ messages in thread
From: Anton Nefedov @ 2019-10-10 16:15 UTC
  To: Max Reitz, Qemu-block
  Cc: Vladimir Sementsov-Ogievskiy, Alberto Garcia, qemu-devel

On 10/10/2019 6:17 PM, Max Reitz wrote:
> Hi everyone,
> 
> (CCs just based on tags in the commit in question)
> 
> I have two bug reports which claim problems of qcow2 on XFS on ppc64le
> machines since qemu 4.1.0.  One of those is about bad performance
> (sorry, it isn’t public :-/), the other about data corruption
> (https://bugzilla.redhat.com/show_bug.cgi?id=1751934).
> 
> It looks like in both cases reverting c8bb23cbdbe3 solves the problem
> (which optimized COW of unallocated areas).
> 
> I think I’ve looked at every angle but can’t find what could be wrong
> with it.  Do any of you have any idea? :-/
> 

hi,

oh, that patch strikes again..

I don't quite follow, was this bug confirmed to happen on x86? Comment 8
(https://bugzilla.redhat.com/show_bug.cgi?id=1751934#c8) mentioned that
(or was that mixed up with the old xfsctl bug?)

Regardless of the platform, does it reproduce? That's comforting
already; worst case we can trace each and every request then (unless it
stops reproducing that way).

Also, perhaps it's worth trying to replace fallocate with write(0)?
Either in qcow2 (in the patch, bdrv_co_pwrite_zeroes -> bdrv_co_pwritev)
or in the file driver. It might hint whether it's misbehaving fallocate
(in qemu or in kernel) or something else.
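
As a rough illustration of the "write(0)" idea outside of qemu, the sketch
below zeroes a byte range with explicit pwrite() calls instead of
fallocate(FALLOC_FL_ZERO_RANGE).  The helper name zero_range_by_write and
the 64 kB chunk size are assumptions made for this example; this is not
the qemu code path (there the change would be bdrv_co_pwrite_zeroes ->
bdrv_co_pwritev, as mentioned above).

```c
#define _GNU_SOURCE

#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of qemu): zero [offset, offset + len) by
 * writing a buffer of zeroes rather than calling fallocate().
 * Returns 0 on success or a negative errno value on failure.
 */
static int zero_range_by_write(int fd, off_t offset, size_t len)
{
    enum { CHUNK = 64 * 1024 };   /* arbitrary chunk size for this sketch */
    void *zeroes;
    int ret = 0;

    /* Use an aligned buffer so the helper can also be used on O_DIRECT
     * descriptors, provided offset and len are suitably aligned too. */
    if (posix_memalign(&zeroes, 4096, CHUNK) != 0) {
        return -ENOMEM;
    }
    memset(zeroes, 0, CHUNK);

    while (len > 0) {
        size_t n = len < CHUNK ? len : CHUNK;
        ssize_t written = pwrite(fd, zeroes, n, offset);

        if (written < 0) {
            if (errno == EINTR) {
                continue;         /* interrupted, retry the same chunk */
            }
            ret = -errno;
            break;
        }
        if (written == 0) {
            ret = -EIO;           /* no progress, give up */
            break;
        }

        offset += written;
        len -= (size_t)written;
    }

    free(zeroes);
    return ret;
}
```

If the corruption goes away with explicit writes like these, that only
narrows things down: it could point at fallocate, or merely at a change in
timing or code path.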

/Anton


* Re: Problems with c8bb23cbdbe3 on ppc64le
  2019-10-10 16:15 ` Anton Nefedov
@ 2019-10-11  7:49   ` Max Reitz
  2019-10-21 11:40     ` Max Reitz
  0 siblings, 1 reply; 7+ messages in thread
From: Max Reitz @ 2019-10-11  7:49 UTC
  To: Anton Nefedov, Qemu-block
  Cc: Vladimir Sementsov-Ogievskiy, Alberto Garcia, qemu-devel


[-- Attachment #1.1: Type: text/plain, Size: 1730 bytes --]

On 10.10.19 18:15, Anton Nefedov wrote:
> On 10/10/2019 6:17 PM, Max Reitz wrote:
>> Hi everyone,
>>
>> (CCs just based on tags in the commit in question)
>>
>> I have two bug reports which claim problems of qcow2 on XFS on ppc64le
>> machines since qemu 4.1.0.  One of those is about bad performance
>> (sorry, it isn’t public :-/), the other about data corruption
>> (https://bugzilla.redhat.com/show_bug.cgi?id=1751934).
>>
>> It looks like in both cases reverting c8bb23cbdbe3 solves the problem
>> (which optimized COW of unallocated areas).
>>
>> I think I’ve looked at every angle but can’t find what could be wrong
>> with it.  Do any of you have any idea? :-/
>>
> 
> hi,
> 
> oh, that patch strikes again..
> 
> I don't quite follow, was this bug confirmed to happen on x86? Comment 8
> (https://bugzilla.redhat.com/show_bug.cgi?id=1751934#c8) mentioned that
> (or was that mixed up with the old xfsctl bug?)

I think that was mixed up with the xfsctl bug, yes.

> Regardless of the platform, does it reproduce? That's comforting
> already; worst case we can trace each and every request then (unless it
> stops reproducing that way).

I haven’t been able to reproduce it yet (wrestling with the test system
and getting ppc64 machines provisioned), but as far as I know it
reproduces reliably on ppc64, but only there.

> Also, perhaps it's worth trying to replace fallocate with write(0)?
> Either in qcow2 (in the patch, bdrv_co_pwrite_zeroes -> bdrv_co_pwritev)
> or in the file driver. It might hint whether it's misbehaving fallocate
> (in qemu or in kernel) or something else.

Good idea, that should at least tell us something about the corruption.

Max



* Re: Problems with c8bb23cbdbe3 on ppc64le
  2019-10-11  7:49   ` Max Reitz
@ 2019-10-21 11:40     ` Max Reitz
  0 siblings, 0 replies; 7+ messages in thread
From: Max Reitz @ 2019-10-21 11:40 UTC
  To: Anton Nefedov, Qemu-block
  Cc: Vladimir Sementsov-Ogievskiy, Alberto Garcia, qemu-devel


[-- Attachment #1.1: Type: text/plain, Size: 2764 bytes --]

On 11.10.19 09:49, Max Reitz wrote:
> On 10.10.19 18:15, Anton Nefedov wrote:
>> On 10/10/2019 6:17 PM, Max Reitz wrote:
>>> Hi everyone,
>>>
>>> (CCs just based on tags in the commit in question)
>>>
>>> I have two bug reports which claim problems of qcow2 on XFS on ppc64le
>>> machines since qemu 4.1.0.  One of those is about bad performance
>>> (sorry, it isn’t public :-/), the other about data corruption
>>> (https://bugzilla.redhat.com/show_bug.cgi?id=1751934).
>>>
>>> It looks like in both cases reverting c8bb23cbdbe3 solves the problem
>>> (which optimized COW of unallocated areas).
>>>
>>> I think I’ve looked at every angle but can’t find what could be wrong
>>> with it.  Do any of you have any idea? :-/
>>>
>>
>> hi,
>>
>> oh, that patch strikes again..
>>
>> I don't quite follow, was this bug confirmed to happen on x86? Comment 8
>> (https://bugzilla.redhat.com/show_bug.cgi?id=1751934#c8) mentioned that
>> (or was that mixed up with the old xfsctl bug?)
> 
> I think that was mixed up with the xfsctl bug, yes.
> 
>> Regardless of the platform, does it reproduce? That's comforting
>> already; worst case we can trace each and every request then (unless it
>> stops reproducing that way).
> 
> I haven’t been able to reproduce it yet (wrestling with the test system
> and getting ppc64 machines provisioned), but as far as I know it
> reproduces reliably on ppc64, but only there.
> 
>> Also, perhaps it's worth trying to replace fallocate with write(0)?
>> Either in qcow2 (in the patch, bdrv_co_pwrite_zeroes -> bdrv_co_pwritev)
>> or in the file driver. It might hint whether it's misbehaving fallocate
>> (in qemu or in kernel) or something else.
> 
> Good idea, that should at least tell us something about the corruption.

OK, after a week of debugging I’m not really much wiser.

One thing I know is that I can see the issue on x86-64 now, but not on
ext4, only XFS.

Replacing the zero-write with actually writing zeroes fixes it, but I
still don’t know whether that’s because of the kernel or because the
write is just slower or takes another code path...

The only thing I could narrow it down to is this:

The issue persists if handle_alloc_space() writes zeroes (with a
narrowed aligned zero-write with NO_FALLBACK) only to the non-COW area,
and I keep skip_cow to be false.

So there seems to be some kind of interaction between the zero-write and
the following write of data.  I don’t know what kind of interaction that
is, though.  I have tried to write a test case in qemu-img (basically
rewriting qemu-img bench), but failed so far.

It certainly looks like a kernel issue, but without a simpler reproducer
I just cannot tell.

Max



* Re: Problems with c8bb23cbdbe3 on ppc64le
  2019-10-10 15:17 Problems with c8bb23cbdbe3 on ppc64le Max Reitz
  2019-10-10 16:15 ` Anton Nefedov
@ 2019-10-21 13:33 ` Max Reitz
  2019-10-21 16:24   ` Max Reitz
  2019-10-24  9:08 ` Max Reitz
  2 siblings, 1 reply; 7+ messages in thread
From: Max Reitz @ 2019-10-21 13:33 UTC
  To: Qemu-block
  Cc: Alberto Garcia, Anton Nefedov, Vladimir Sementsov-Ogievskiy, qemu-devel


[-- Attachment #1.1: Type: text/plain, Size: 2580 bytes --]

On 10.10.19 17:17, Max Reitz wrote:
> Hi everyone,
> 
> (CCs just based on tags in the commit in question)
> 
> I have two bug reports which claim problems of qcow2 on XFS on ppc64le
> machines since qemu 4.1.0.  One of those is about bad performance
> (sorry, it isn’t public :-/), the other about data corruption
> (https://bugzilla.redhat.com/show_bug.cgi?id=1751934).
> 
> It looks like in both cases reverting c8bb23cbdbe3 solves the problem
> (which optimized COW of unallocated areas).
> 
> I think I’ve looked at every angle but can’t find what could be wrong
> with it.  Do any of you have any idea? :-/

I now have a reproducer with CentOS, so it’s actually useful outside of
Red Hat:

$ cd $TEST_DIR

(Download CentOS-8-x86_64-1905-dvd1.iso here, e.g. from
http://mirror1.hs-esslingen.de/pub/Mirrors/centos/8.0.1905/isos/x86_64/CentOS-8-x86_64-1905-dvd1.torrent
)

$ wget http://mirror1.hs-esslingen.de/pub/Mirrors/centos/8.0.1905/BaseOS/x86_64/os/isolinux/vmlinuz
$ wget http://mirror1.hs-esslingen.de/pub/Mirrors/centos/8.0.1905/BaseOS/x86_64/os/isolinux/initrd.img

$ mkdir ks
$ cat > ks/ks.cfg <<EOF
rootpw 123456
install
keyboard us
lang en_US.UTF-8
zerombr
autopart
clearpart --all --initlabel

%packages --default
@core
%end
EOF

$ $QEMU_BUILD_DIR/qemu-img create -f qcow2 disk.qcow2 30G
$ $QEMU_BUILD_DIR/x86_64-softmmu/qemu-system-x86_64 \
    -name 'centos' \
    -machine pc \
    -nodefaults \
    -vga std \
    -display gtk \
    -serial stdio \
    -device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pci.0 \
    -blockdev "{'node-name': 'target-disk',
                'driver': 'qcow2',
                'file': {
                    'driver': 'file',
                    'filename': 'disk.qcow2',
                    'cache': {
                        'direct': true
                    },
                    'aio': 'native'
                } }" \
    -device scsi-hd,id=image1,drive=target-disk \
    -blockdev \
     file,node-name=install-cd,filename=CentOS-8-x86_64-1905-dvd1.iso \
    -device scsi-cd,drive=install-cd \
    -blockdev vvfat,node-name=kscfg,dir=ks,label=OEMDRV,read-only=on \
    -device scsi-hd,drive=kscfg \
    -kernel vmlinuz \
    -append 'ks=hd:LABEL=OEMDRV:/ks.cfg delay=60 console=ttyS0' \
    -initrd initrd.img \
    -boot order=cd,menu=off,strict=off \
    -m 2048 \
    -enable-kvm

This installation fails about 50/50 for me.  To retry, just run the last
two steps (qemu-img create and the installation itself).

Max



* Re: Problems with c8bb23cbdbe3 on ppc64le
  2019-10-21 13:33 ` Max Reitz
@ 2019-10-21 16:24   ` Max Reitz
  0 siblings, 0 replies; 7+ messages in thread
From: Max Reitz @ 2019-10-21 16:24 UTC
  To: Qemu-block
  Cc: Alberto Garcia, Anton Nefedov, Vladimir Sementsov-Ogievskiy, qemu-devel


[-- Attachment #1.1: Type: text/plain, Size: 1395 bytes --]

On 21.10.19 15:33, Max Reitz wrote:
> On 10.10.19 17:17, Max Reitz wrote:
>> Hi everyone,
>>
>> (CCs just based on tags in the commit in question)
>>
>> I have two bug reports which claim problems of qcow2 on XFS on ppc64le
>> machines since qemu 4.1.0.  One of those is about bad performance
>> (sorry, it isn’t public :-/), the other about data corruption
>> (https://bugzilla.redhat.com/show_bug.cgi?id=1751934).
>>
>> It looks like in both cases reverting c8bb23cbdbe3 solves the problem
>> (which optimized COW of unallocated areas).
>>
>> I think I’ve looked at every angle but can’t find what could be wrong
>> with it.  Do any of you have any idea? :-/
> 
> I now have a reproducer with CentOS, so it’s actually useful outside of
> Red Hat:

I’ve run this test with various configurations, and the installation
does not fail on tmpfs, ext4, btrfs; or without cache.direct=on,aio=native.

(So the installation only fails on xfs and aio=native.  I’ve tried both
virtio-scsi and virtio-blk, but it fails with both.  I did test a
loop-mounted xfs volume, and the installation fails there, too,
regardless of whether the raw xfs image is placed on ext4 or tmpfs.[1])

Unfortunately I still don’t have a more concise reproducer that would
clearly show that this is a kernel bug.  So for the time being I still
don’t know what causes it.

Max



* Re: Problems with c8bb23cbdbe3 on ppc64le
  2019-10-10 15:17 Problems with c8bb23cbdbe3 on ppc64le Max Reitz
  2019-10-10 16:15 ` Anton Nefedov
  2019-10-21 13:33 ` Max Reitz
@ 2019-10-24  9:08 ` Max Reitz
  2 siblings, 0 replies; 7+ messages in thread
From: Max Reitz @ 2019-10-24  9:08 UTC
  To: Qemu-block
  Cc: Alberto Garcia, Anton Nefedov, Vladimir Sementsov-Ogievskiy, qemu-devel


[-- Attachment #1.1.1: Type: text/plain, Size: 2247 bytes --]

On 10.10.19 17:17, Max Reitz wrote:
> Hi everyone,
> 
> (CCs just based on tags in the commit in question)
> 
> I have two bug reports which claim problems of qcow2 on XFS on ppc64le
> machines since qemu 4.1.0.  One of those is about bad performance
> (sorry, it isn’t public :-/), the other about data corruption
> (https://bugzilla.redhat.com/show_bug.cgi?id=1751934).
> 
> It looks like in both cases reverting c8bb23cbdbe3 solves the problem
> (which optimized COW of unallocated areas).
> 
> I think I’ve looked at every angle but can’t find what could be wrong
> with it.  Do any of you have any idea? :-/

It looks to me like an XFS bug.

On XFS, if you do FALLOC_FL_ZERO_RANGE past the EOF and an AIO pwrite
even further after that range, the pwrite will be discarded if the
fallocate settles after the pwrite (and both have been started before
either has finished).  That is, the file length will be increased as if
only the fallocate had been executed, but not the pwrite, so the
pwrite’s data is lost.

(Interestingly, this is pretty similar to the bug I introduced in qemu
in 50ba5b2d994853b38fed10e0841b119da0f8b8e5, where the ftruncate() would
not consider parallel in-flight writes.)

I’ve attached a C program to show the problem.  It creates an empty
file, issues FALLOC_FL_ZERO_RANGE on the first 4 kB in a thread, and an
AIO pwrite in parallel on the second 4 kB.  It then runs hexdump -C on
the file.

On XFS, the hexdump shows only 4 kB of 0s.  On ext4 and btrfs, it shows
4 kB of 0s and 4 kB of 42s.
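
For scripted runs, the hexdump can also be checked programmatically.  The
following is only a sketch: the helper name check_reproducer_output and
its return convention are made up for this example, and the expected
layout (4 kB of 0s followed by 4 kB of 42s) is taken from the description
above.

```c
#define _GNU_SOURCE

#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

enum { BLOCK = 4096 };

/*
 * Hypothetical helper: inspect the scratch file left by the reproducer.
 * Returns  0 if it holds BLOCK zero bytes followed by BLOCK bytes of 42,
 *          1 if the size or content differs (e.g. the pwrite was lost),
 *         -1 if the file cannot be opened or read.
 */
static int check_reproducer_output(const char *path)
{
    unsigned char buf[2 * BLOCK];
    struct stat st;
    ssize_t n;
    int fd = open(path, O_RDONLY);

    if (fd < 0) {
        return -1;
    }
    if (fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }
    if (st.st_size != 2 * BLOCK) {
        /* A 4 kB file means only the fallocate took effect */
        close(fd);
        return 1;
    }

    n = pread(fd, buf, sizeof(buf), 0);
    close(fd);
    if (n != (ssize_t)sizeof(buf)) {
        return -1;
    }

    for (size_t i = 0; i < sizeof(buf); i++) {
        unsigned char expected = i < (size_t)BLOCK ? 0 : 42;

        if (buf[i] != expected) {
            return 1;
        }
    }

    return 0;
}
```

Calling check_reproducer_output("tmp-file") after the reproducer has run
would then return 1 in the failing XFS case described above and 0 on ext4
and btrfs.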

(You can uncomment the IN_ORDER define to execute the fallocate and
pwrite sequentially; XFS will then show the same output as ext4 and
btrfs.)

(Note that it is possible that pwrite and fallocate are not issued
before the other is finished, or that fallocate settles before pwrite.
In such cases, the file will probably be written correctly.  However, I
see the wrong result pretty much 100 % of the time.  (So on my machine,
pwrite and fallocate pretty much always run in parallel and fallocate
finishes after pwrite.))

Compile the program like so:

$ gcc parallel-falloc-and-pwrite.c -pthread -laio -Wall -Wextra \
    -pedantic -std=c11

And run it like so:

$ ./a.out tmp-file

Max

[-- Attachment #1.1.2: parallel-falloc-and-pwrite.c --]
[-- Type: text/x-csrc, Size: 1794 bytes --]

#define _GNU_SOURCE

#include <assert.h>
#include <fcntl.h>
#include <libaio.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

// Define this to perform the fallocate and the pwrite sequentially
// instead of in parallel

// #define IN_ORDER


int fd;

void *falloc_thread(void *arg)
{
    int ret;

    (void)arg;

    puts("starting fallocate");

    ret = fallocate(fd, FALLOC_FL_ZERO_RANGE, 0, 4096);
    assert(ret == 0);

    puts("fallocate done");

    return NULL;
}

int main(int argc, char *argv[])
{
    pthread_t falloc_thr;
    int ret;

    if (argc != 2) {
        fprintf(stderr, "Usage: %s <scratch file>\n", argv[0]);
        return 1;
    }

    fd = open(argv[1], O_CREAT | O_RDWR | O_TRUNC | O_DIRECT, 0666);
    assert(fd >= 0);

    void *buf = aligned_alloc(4096, 4096);
    memset(buf, 42, 4096);

    io_context_t ctx = 0;
    ret = io_setup(1, &ctx);
    assert(ret == 0);

    ret = pthread_create(&falloc_thr, NULL, &falloc_thread, NULL);
    assert(ret == 0);

#ifdef IN_ORDER
    ret = pthread_join(falloc_thr, NULL);
    assert(ret == 0);
#endif

    struct iocb ior;
    io_prep_pwrite(&ior, fd, buf, 4096, 4096);

    puts("submitting pwrite");

    struct iocb *ios[] = { &ior };
    ret = io_submit(ctx, 1, ios);
    assert(ret == 1);

    struct io_event evs[1];
    ret = io_getevents(ctx, 1, 1, evs, NULL);
    assert(ret == 1);

    puts("pwrite done");

#ifndef IN_ORDER
    ret = pthread_join(falloc_thr, NULL);
    assert(ret == 0);
#endif

    close(fd);
    free(buf);

    puts("\nHexdump should show 4k of 0s and 4k of 42s:\n");

    // The argument list of execlp() must end with a (char *) null pointer
    execlp("hexdump", "hexdump", "-C", argv[1], (char *)NULL);
    return 1;
}
