From: Max Reitz <mreitz@redhat.com>
To: Anton Nefedov <anton.nefedov@virtuozzo.com>,
	Qemu-block <qemu-block@nongnu.org>
Cc: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>,
	Alberto Garcia <berto@igalia.com>,
	"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>
Subject: Re: Problems with c8bb23cbdbe3 on ppc64le
Date: Mon, 21 Oct 2019 13:40:27 +0200
Message-ID: <4c61a0ba-3a75-fffb-a724-4f4700eaa111@redhat.com>
In-Reply-To: <cd53cd86-e93c-297a-c08e-3fc1ae2618ac@redhat.com>

On 11.10.19 09:49, Max Reitz wrote:
> On 10.10.19 18:15, Anton Nefedov wrote:
>> On 10/10/2019 6:17 PM, Max Reitz wrote:
>>> Hi everyone,
>>>
>>> (CCs just based on tags in the commit in question)
>>>
>>> I have two bug reports which claim problems of qcow2 on XFS on ppc64le
>>> machines since qemu 4.1.0.  One of those is about bad performance
>>> (sorry, it isn’t public :-/), the other about data corruption
>>> (https://bugzilla.redhat.com/show_bug.cgi?id=1751934).
>>>
>>> It looks like in both cases reverting c8bb23cbdbe3 solves the problem
>>> (which optimized COW of unallocated areas).
>>>
>>> I think I’ve looked at every angle but can’t find what could be wrong
>>> with it.  Do any of you have any idea? :-/
>>>
>>
>> hi,
>>
>> oh, that patch strikes again..
>>
>> I don't quite follow, was this bug confirmed to happen on x86? Comment 8
>> (https://bugzilla.redhat.com/show_bug.cgi?id=1751934#c8) mentioned that
>> (or was that mixed up with the old xfsctl bug?)
> 
> I think that was mixed up with the xfsctl bug, yes.
> 
>> Regardless of the platform, does it reproduce? That's comforting
>> already; worst case we can trace each and every request then (unless it
>> stops reproducing that way).
> 
> I haven’t been able to reproduce it yet (wrestling with the test system
> and getting ppc64 machines provisioned), but as far as I know it
> reproduces reliably on ppc64, but only there.
> 
>> Also, perhaps it's worth trying to replace fallocate with write(0)?
>> Either in qcow2 (in the patch, bdrv_co_pwrite_zeroes -> bdrv_co_pwritev)
>> or in the file driver. It might hint whether it's misbehaving fallocate
>> (in qemu or in kernel) or something else.
> 
> Good idea, that should at least tell us something about the corruption.

OK, after a week of debugging I’m not really much wiser.

One thing I know is that I can see the issue on x86-64 now, but not on
ext4, only XFS.

Replacing the zero-write with an actual write of zeroes fixes it, but I
still don’t know whether that’s because of the kernel or because the
write is just slower or takes another code path...
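
For readers unfamiliar with the two zeroing strategies being compared,
here is a minimal Python sketch. It assumes that the zero-write ends up
as a fallocate() call with FALLOC_FL_ZERO_RANGE on a 64-bit Linux host,
and it models the alternative as a plain write of a zero-filled buffer.
All function names are hypothetical, and the sequence is sequential and
buffered, so it is not expected to reproduce the corruption. It only
shows the shape of the comparison.

```python
import ctypes
import os
import tempfile

FALLOC_FL_ZERO_RANGE = 0x10  # value from <linux/falloc.h>


def zero_with_fallocate(fd, offset, length):
    """Zero a range via fallocate(FALLOC_FL_ZERO_RANGE).

    Returns True on success and False when the call is unavailable or
    the filesystem refuses it. Assumes a 64-bit off_t.
    """
    try:
        libc = ctypes.CDLL(None, use_errno=True)
        fallocate = libc.fallocate
    except (OSError, AttributeError, TypeError):
        return False
    fallocate.argtypes = [ctypes.c_int, ctypes.c_int,
                          ctypes.c_int64, ctypes.c_int64]
    fallocate.restype = ctypes.c_int
    return fallocate(fd, FALLOC_FL_ZERO_RANGE, offset, length) == 0


def zero_with_write(fd, offset, length):
    """Zero the same range by writing an explicit buffer of zero bytes."""
    os.pwrite(fd, bytes(length), offset)
    return True


def check(zero_fn, cluster=65536, data_off=4096, data_len=8192):
    """Dirty a cluster, zero it, write data into it, then verify it.

    Returns True/False for the comparison, or None when zero_fn is not
    usable on this system.
    """
    with tempfile.TemporaryFile() as f:
        fd = f.fileno()
        os.pwrite(fd, b"\xff" * cluster, 0)          # stale contents
        if not zero_fn(fd, 0, cluster):
            return None
        os.pwrite(fd, b"\xaa" * data_len, data_off)  # following data write
        got = os.pread(fd, cluster, 0)
    expected = bytearray(cluster)
    expected[data_off:data_off + data_len] = b"\xaa" * data_len
    return got == bytes(expected)
```

Calling check(zero_with_write) and check(zero_with_fallocate) on the same
filesystem compares the two paths; a None result means the fallocate
variant could not be exercised there.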

The only thing I could narrow it down to is this:

The issue persists if handle_alloc_space() writes zeroes (with a
narrowed, aligned zero-write with NO_FALLBACK) only to the non-COW area,
and I keep skip_cow set to false.
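
To make the ranges in that experiment concrete, the following Python
sketch splits a single-cluster write into its COW head, data (non-COW)
area, and COW tail, and then shrinks the data area inward to an
alignment boundary. The function name, the 64 KiB cluster size, and the
4 KiB alignment are assumptions for illustration. Reading "narrowed
aligned" as "rounded inward to the alignment" is also an assumption; the
code is not taken from QEMU.

```python
def align_down(x, a):
    return x // a * a


def align_up(x, a):
    return (x + a - 1) // a * a


def split_cluster_write(offset, length, cluster_size=65536, align=4096):
    """Describe the areas touched by a write that fits in one cluster.

    All ranges are half-open (start, end) byte offsets. "zero" is the
    data area narrowed inward to `align`, or None if nothing is left.
    """
    cluster_start = align_down(offset, cluster_size)
    cluster_end = cluster_start + cluster_size
    data_end = offset + length
    if length <= 0 or data_end > cluster_end:
        raise ValueError("write must be non-empty and fit in one cluster")

    zero_start = align_up(offset, align)
    zero_end = align_down(data_end, align)
    return {
        "cow_head": (cluster_start, offset),
        "data": (offset, data_end),
        "cow_tail": (data_end, cluster_end),
        "zero": (zero_start, zero_end) if zero_start < zero_end else None,
    }
```

For example, split_cluster_write(65536 + 1000, 10000) puts the data area
at (66536, 76536) and the narrowed zero range at (69632, 73728). A write
that covers no full aligned block yields "zero": None.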

So there seems to be some kind of interaction between the zero-write and
the following write of data.  I don’t know what kind of interaction that
is, though.  I have tried to write a test case in qemu-img (basically
rewriting qemu-img bench), but have failed so far.

It certainly looks like a kernel issue, but without a simpler reproducer
I just cannot tell.

Max

