All of lore.kernel.org
 help / color / mirror / Atom feed
From: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
To: Bruce Rogers <brogers@suse.com>,
	Claudio Fontana <cfontana@suse.de>,
	Thomas Huth <thuth@redhat.com>,
	Paolo Bonzini <pbonzini@redhat.com>,
	Juan Quintela <quintela@redhat.com>,
	"Dr. David Alan Gilbert" <dgilbert@redhat.com>,
	Kevin Wolf <kwolf@redhat.com>, Max Reitz <mreitz@redhat.com>
Cc: "Jason J. Herne" <jjherne@linux.ibm.com>,
	Fam Zheng <fam@euphon.net>, Liang Yan <lyan@suse.com>,
	Peter Maydell <peter.maydell@linaro.org>,
	Cornelia Huck <cohuck@redhat.com>,
	qemu-devel <qemu-devel@nongnu.org>,
	Stefan Hajnoczi <stefanha@redhat.com>
Subject: Re: migration: broken snapshot saves appear on s390 when small fields in migration stream removed
Date: Tue, 28 Jul 2020 11:15:18 +0300	[thread overview]
Message-ID: <a6d1196b-8236-fbe8-75f4-47137241e10a@virtuozzo.com> (raw)
In-Reply-To: <5f455e82e0a4f3662918dcdc85d1cfc5a3187896.camel@suse.com>

28.07.2020 02:09, Bruce Rogers wrote:
> On Tue, 2020-07-21 at 10:22 +0200, Claudio Fontana wrote:
>> On 7/20/20 8:24 PM, Claudio Fontana wrote:
>>> I have now been able to reproduce this on X86 as well.
>>>
>>> It happens much more rarely, about once every 10 times.
>>>
>>> I will sort out the data and try to make it even more reproducible,
>>> then post my findings in detail.
>>>
>>> Overall I proceeded as follows:
>>>
>>> 1) hooked the savevm code to skip all fields with the exception of
>>> "s390-skeys". So only s390-skeys are actually saved.
>>>
>>> 2) reimplemented "s390-skeys" in a common implementation in cpus.c,
>>> used on both x86 and s390, modeling the behaviour of save/load from
>>> hw/s390
>>>
>>> 3) ran ./check -qcow2 267 on both x86 and s390.
>>>
>>> In the case of s390, failure seems to be reproducible 100% of the
>>> times.
>>> On X86, it is as mentioned failing about 10% of the times.
>>>
>>> Ciao,
>>>
>>> Claudio
>>
>> And here is a small series of two patches that can be used to
>> reproduce the problem.
>>
>> Clearly, this is not directly related to s390 or to skeys or to
>> icount in particular, it is just an issue that happened to be more
>> visible there.
>>
>> If you could help with this, please apply the attached patches.
>>
>> Patch 1 just adds a new "300" iotest. It is way easier to extract the
>> relevant part out of test 267, which does a bit too much in the same
>> file.
>> Also this allows easier use of valgrind, since it does not "require"
>> anything.
>>
>> Patch 2 hooks the savevm code to skip all fields during the snapshot
>> with the exception of "s390-skeys", a new artificial field
>> implemented to
>> model what the real s390-skeys is doing.
>>
>> After applying patch 1 and patch 2, you can test (also on X86), with:
>>
>> ./check -qcow2 300
>>
>> On X86 many runs will be successful, but a certain % of them will
>> instead fail like this:
>>
>>
>> claudio@linux-ch70:~/git/qemu-pristine/qemu-build/tests/qemu-iotests>
>> ./check -qcow2 300
>> QEMU          -- "/home/claudio/git/qemu-pristine/qemu-
>> build/tests/qemu-iotests/../../x86_64-softmmu/qemu-system-x86_64"
>> -nodefaults -display none -accel qtest
>> QEMU_IMG      -- "/home/claudio/git/qemu-pristine/qemu-
>> build/tests/qemu-iotests/../../qemu-img"
>> QEMU_IO       -- "/home/claudio/git/qemu-pristine/qemu-
>> build/tests/qemu-iotests/../../qemu-io"  --cache writeback --aio
>> threads -f qcow2
>> QEMU_NBD      -- "/home/claudio/git/qemu-pristine/qemu-
>> build/tests/qemu-iotests/../../qemu-nbd"
>> IMGFMT        -- qcow2 (compat=1.1)
>> IMGPROTO      -- file
>> PLATFORM      -- Linux/x86_64 linux-ch70 4.12.14-lp151.28.36-default
>> TEST_DIR      -- /home/claudio/git/qemu-pristine/qemu-
>> build/tests/qemu-iotests/scratch
>> SOCK_DIR      -- /tmp/tmp.gdcUu3l0SM
>> SOCKET_SCM_HELPER -- /home/claudio/git/qemu-pristine/qemu-
>> build/tests/qemu-iotests/socket_scm_helper
>>
>> 300      fail       [10:14:05] [10:14:06]      (last: 0s)    output
>> mismatch (see 300.out.bad)
>> --- /home/claudio/git/qemu-pristine/qemu/tests/qemu-
>> iotests/300.out     2020-07-21 10:03:54.468104764 +0200
>> +++ /home/claudio/git/qemu-pristine/qemu-build/tests/qemu-
>> iotests/300.out.bad   2020-07-21 10:14:06.098090543 +0200
>> @@ -12,6 +12,9 @@
>>   ID        TAG                 VM SIZE                DATE       VM
>> CLOCK
>>   --        snap0                  SIZE yyyy-mm-dd
>> hh:mm:ss   00:00:00.000
>>   (qemu) loadvm snap0
>> +Unexpected storage key data: 0
>> +error while loading state for instance 0x0 of device 's390-skeys'
>> +Error: Error -22 while loading VM state
>>   (qemu) quit
>>   
>>   *** done
>> Failures: 300
>> Failed 1 of 1 iotests
>>
>>
>> At this point somebody more knowledgeable about QCOW2, coroutines and
>> backing files could chime in?
>>
> <trim>
> 
> I used the reproducer you provide here to do a git bisect as I assume
> whatever is now broken wasn't always broken, and it pointed to the
> following commit:
> 
> commit df893d25ceea3c0dcbe6d6b425309317fab6b22e (refs/bisect/bad)
> Author: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> Date:   Tue Jun 4 19:15:13 2019 +0300
> 
>      block/qcow2: implement .bdrv_co_preadv_part
> 
> Indeed, I am currently able to reliable reproduce the issue with this
> commit applied, and not reproduce it without it.
> 
> That said, I've not been able to identify exactly what is going wrong.
> I'm fairly confident the savevm data is correctly written out, but on
> the loadvm side, somehow the last part of the s390 data is not
> correctly read in the data (it's in the second pass through the while
> loop in qcow2_co_preadv_part() where that happens.)
> 
> If anyone familiar with this code can have a look or provide some
> pointers, it would be much appreciated.
> 
> Adding commit author to CC.

Ooops :) I'll look at it of course.


-- 
Best regards,
Vladimir


  reply	other threads:[~2020-07-28  8:16 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-07-12 10:00 migration: broken snapshot saves appear on s390 when small fields in migration stream removed Claudio Fontana
2020-07-12 16:11 ` Paolo Bonzini
2020-07-13  9:11   ` Claudio Fontana
2020-07-14 14:29     ` Claudio Fontana
2020-07-14 14:35       ` Thomas Huth
2020-07-15 11:10         ` Claudio Fontana
2020-07-15 12:25           ` Claudio Fontana
2020-07-16 12:58           ` Claudio Fontana
2020-07-20 18:24             ` Claudio Fontana
2020-07-21  8:22               ` Claudio Fontana
2020-07-27 23:09                 ` Bruce Rogers
2020-07-28  8:15                   ` Vladimir Sementsov-Ogievskiy [this message]
2020-07-28  8:43                   ` Vladimir Sementsov-Ogievskiy
2020-07-28 13:23                     ` Bruce Rogers
2020-07-28 11:10                   ` Max Reitz
2020-07-28 11:27                     ` Vladimir Sementsov-Ogievskiy
2020-07-28 11:33                     ` Vladimir Sementsov-Ogievskiy
2020-07-28 11:35                       ` Paolo Bonzini
2020-07-28 11:45                         ` Max Reitz
2020-07-28 12:09                           ` Paolo Bonzini
2020-07-28 12:47                     ` Claudio Fontana
2020-07-13 11:03 ` Dr. David Alan Gilbert
2020-07-13 11:39   ` Cornelia Huck
2020-07-13 11:39   ` Claudio Fontana
2020-07-13 11:45     ` Claudio Fontana

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=a6d1196b-8236-fbe8-75f4-47137241e10a@virtuozzo.com \
    --to=vsementsov@virtuozzo.com \
    --cc=brogers@suse.com \
    --cc=cfontana@suse.de \
    --cc=cohuck@redhat.com \
    --cc=dgilbert@redhat.com \
    --cc=fam@euphon.net \
    --cc=jjherne@linux.ibm.com \
    --cc=kwolf@redhat.com \
    --cc=lyan@suse.com \
    --cc=mreitz@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=peter.maydell@linaro.org \
    --cc=qemu-devel@nongnu.org \
    --cc=quintela@redhat.com \
    --cc=stefanha@redhat.com \
    --cc=thuth@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.