From: Claudio Fontana <cfontana@suse.de>
To: Thomas Huth <thuth@redhat.com>,
	Paolo Bonzini <pbonzini@redhat.com>,
	Juan Quintela <quintela@redhat.com>,
	"Dr. David Alan Gilbert" <dgilbert@redhat.com>,
	Kevin Wolf <kwolf@redhat.com>, Max Reitz <mreitz@redhat.com>
Cc: "Jason J. Herne" <jjherne@linux.ibm.com>,
	Fam Zheng <fam@euphon.net>, Liang Yan <lyan@suse.com>,
	Peter Maydell <peter.maydell@linaro.org>,
	Cornelia Huck <cohuck@redhat.com>,
	qemu-devel <qemu-devel@nongnu.org>,
	Stefan Hajnoczi <stefanha@redhat.com>
Subject: Re: migration: broken snapshot saves appear on s390 when small fields in migration stream removed
Date: Thu, 16 Jul 2020 14:58:54 +0200	[thread overview]
Message-ID: <1db6d502-73d1-5e3d-10d1-796d80ab8f07@suse.de> (raw)
In-Reply-To: <8ff7eeab-bef1-0957-a95c-72819680c431@suse.de>

Small update on this,
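
For context on the reproducer quoted below: icount_state_needed is the .needed
callback of the "timer/icount" vmstate subsection in cpus.c. Returning 0 simply
omits that subsection from the migration stream, which shrinks the stream by a
few bytes and shifts everything that follows. From memory, the subsection looks
roughly like this (a sketch, not a verbatim copy of the source):

static const VMStateDescription icount_vmstate_timers = {
    .name = "timer/icount",
    .version_id = 1,
    .minimum_version_id = 1,
    .needed = icount_state_needed, /* subsection written only if this returns true */
    .fields = (VMStateField[]) {
        VMSTATE_INT64(qemu_icount_bias, TimersState),
        VMSTATE_INT64(qemu_icount, TimersState),
        VMSTATE_END_OF_LIST()
    }
};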

On 7/15/20 1:10 PM, Claudio Fontana wrote:
> Hi Thomas,
> 
> On 7/14/20 4:35 PM, Thomas Huth wrote:
>> On 14/07/2020 16.29, Claudio Fontana wrote:
>>> Hello,
>>>
>>> I have made some small progress in narrowing down this issue. It may be a qcow2 issue,
>>> but that is still unclear, so I am looping in Kevin Wolf and Max Reitz.
>>>
>>>
>>> The reproducer again:
>>>
>>>> --------------------------------------------cut-------------------------------------------
>>>> diff --git a/cpus.c b/cpus.c
>>>> index 41d1c5099f..443b88697a 100644
>>>> --- a/cpus.c
>>>> +++ b/cpus.c
>>>> @@ -643,7 +643,7 @@ static void qemu_account_warp_timer(void)
>>>>  
>>>>  static bool icount_state_needed(void *opaque)
>>>>  {
>>>> -    return use_icount;
>>>> +    return 0;
>>>>  }
>>>>  
>>>>  static bool warp_timer_state_needed(void *opaque)
>>>> --------------------------------------------cut-------------------------------------------
>>>
>>> This issue for now appears on s390 only:
>>>
>>> On s390 hardware, test 267 fails (both kvm and tcg) in the qcow2 backing file part, with broken migration stream data in the s390-skeys vmsave (old style).
>> [...]
>>> If someone has a good idea let me know - first attempts to reproduce on x86 failed, but maybe more work could lead to it.
>>
> 
> small update: in the GOOD case (enough padding added) a qcow_merge() is triggered for the last write of 16202 bytes.
> In the BAD case (not enough padding added) a qcow_merge() is not triggered for the last write of 16201 bytes.
> 
> Note: manually flushing with qemu_fflush in the s390-skeys vmsave also works (this may have got lost in the noise earlier).
> 
> 
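To be concrete, the manual flush mentioned above was along these lines (a sketch
from memory, not the exact patch; the placement at the end of the save handler
is my assumption):

--------------------------------------------cut-------------------------------------------
diff --git a/hw/s390x/s390-skeys.c b/hw/s390x/s390-skeys.c
--- a/hw/s390x/s390-skeys.c
+++ b/hw/s390x/s390-skeys.c
@@ static void s390_storage_keys_save(QEMUFile *f, void *opaque)
     /* ... existing code writing out the storage keys ... */
+
+    /* force the buffered migration stream out before returning */
+    qemu_fflush(f);
 }
--------------------------------------------cut-------------------------------------------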
>> Two questions:
>>
>> 1) Can you also reproduce the issue manually, without running iotest
>> 267? ... I tried, but so far I failed.
> 
> Thanks for the suggestion, will try.

I am currently trying to recreate by hand an environment similar to that of the test;
so far I have not been able to reproduce the issue manually.

I am not very familiar with s390;
I've been running with

export QEMU=/home/cfontana/qemu-build/s390x-softmmu/qemu-system-s390x

$QEMU -nographic -monitor stdio -nodefaults -no-shutdown FILENAME

where FILENAME is the qcow2 produced by the test.

Let me know if you have a suggestion on how to set up something simple properly.
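
To approximate the backing-file part of the test without reusing the test's own
image, I have also tried creating a similar chain by hand and driving savevm
from the monitor (image sizes and snapshot names here are made up):

qemu-img create -f qcow2 base.qcow2 64M
qemu-img create -f qcow2 -b base.qcow2 -F qcow2 overlay.qcow2

$QEMU -nographic -monitor stdio -nodefaults -no-shutdown overlay.qcow2

(qemu) savevm snap0
(qemu) info snapshots
(qemu) loadvm snap0

(For reference, the test itself runs as "./check -qcow2 267" from tests/qemu-iotests.)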


> 
>>
>> 2) Since all the information so far sounds like the problem could be
>> elsewhere in the code, and the skeys just catch it by accident ... have
>> you tried running with valgrind? Maybe it catches something useful?
> 
> Nothing yet, but will fiddle with the options a bit more.
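
For reference, the valgrind run was along these lines (from memory; the
--max-stackframe value is the one suggested by the warnings below):

valgrind --trace-children=yes --max-stackframe=137324010080 \
    $QEMU -nographic -monitor stdio -nodefaults -no-shutdown FILENAME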

The only thing I have seen so far:


+==33321== 
+==33321== Warning: client switching stacks?  SP change: 0x1ffeffe5e8 --> 0x5d9cf60
+==33321==          to suppress, use: --max-stackframe=137324009096 or greater
+==33321== Warning: client switching stacks?  SP change: 0x5d9cd18 --> 0x1ffeffe5e8
+==33321==          to suppress, use: --max-stackframe=137324009680 or greater
+==33321== Warning: client switching stacks?  SP change: 0x1ffeffe8b8 --> 0x5d9ce58
+==33321==          to suppress, use: --max-stackframe=137324010080 or greater
+==33321==          further instances of this message will not be shown.
+==33321== Thread 4:
+==33321== Conditional jump or move depends on uninitialised value(s)
+==33321==    at 0x3AEC70: process_queued_cpu_work (cpus-common.c:331)
+==33321==    by 0x2753E1: qemu_wait_io_event_common (cpus.c:1213)
+==33321==    by 0x2755CD: qemu_wait_io_event (cpus.c:1253)
+==33321==    by 0x27596D: qemu_dummy_cpu_thread_fn (cpus.c:1337)
+==33321==    by 0x725C87: qemu_thread_start (qemu-thread-posix.c:521)
+==33321==    by 0x4D504E9: start_thread (in /lib64/libpthread-2.26.so)
+==33321==    by 0x4E72BBD: ??? (in /lib64/libc-2.26.so)
+==33321== 
+==33321== Conditional jump or move depends on uninitialised value(s)
+==33321==    at 0x3AEC74: process_queued_cpu_work (cpus-common.c:331)
+==33321==    by 0x2753E1: qemu_wait_io_event_common (cpus.c:1213)
+==33321==    by 0x2755CD: qemu_wait_io_event (cpus.c:1253)
+==33321==    by 0x27596D: qemu_dummy_cpu_thread_fn (cpus.c:1337)
+==33321==    by 0x725C87: qemu_thread_start (qemu-thread-posix.c:521)
+==33321==    by 0x4D504E9: start_thread (in /lib64/libpthread-2.26.so)
+==33321==    by 0x4E72BBD: ??? (in /lib64/libc-2.26.so)
+==33321== 
+==33321== 
+==33321== HEAP SUMMARY:
+==33321==     in use at exit: 2,138,442 bytes in 13,935 blocks
+==33321==   total heap usage: 19,089 allocs, 5,154 frees, 5,187,670 bytes allocated
+==33321== 
+==33321== LEAK SUMMARY:
+==33321==    definitely lost: 0 bytes in 0 blocks
+==33321==    indirectly lost: 0 bytes in 0 blocks
+==33321==      possibly lost: 7,150 bytes in 111 blocks
+==33321==    still reachable: 2,131,292 bytes in 13,824 blocks
+==33321==         suppressed: 0 bytes in 0 blocks
+==33321== Rerun with --leak-check=full to see details of leaked memory


> 
>>
>>  Thomas
>>
> 
> Ciao,
> 
> Claudio
> 
> 

A more interesting update, I think, is what follows.

I was able to "fix" the problem shown by the reproducer:

@@ -643,7 +643,7 @@ static void qemu_account_warp_timer(void)
 
 static bool icount_state_needed(void *opaque)
 {
-    return use_icount;
+    return 0;
 }
 
by just slowing down qcow2_co_pwritev_task_entry with some tight loops,
without changing any fields between runs (other than the reproducer icount field removal).
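
The "slowdown" is nothing clever, just a busy loop at the top of the coroutine
entry, something like this (the iteration count is a made-up example, tuned by
trial and error):

--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ static coroutine_fn int qcow2_co_pwritev_task_entry(AioTask *task)
 {
+    volatile unsigned long i;
+
+    /* artificial delay to change the timing of the last write */
+    for (i = 0; i < 100000000UL; i++) {
+        /* busy-wait */
+    }
+
     /* ... original body ... */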

I tried to insert the same slowdown just in savevm.c at the end of save_snapshot, but that does not work; the slowdown needs to be in the coroutine.

Thanks,

Claudio



Thread overview: 25+ messages
2020-07-12 10:00 migration: broken snapshot saves appear on s390 when small fields in migration stream removed Claudio Fontana
2020-07-12 16:11 ` Paolo Bonzini
2020-07-13  9:11   ` Claudio Fontana
2020-07-14 14:29     ` Claudio Fontana
2020-07-14 14:35       ` Thomas Huth
2020-07-15 11:10         ` Claudio Fontana
2020-07-15 12:25           ` Claudio Fontana
2020-07-16 12:58           ` Claudio Fontana [this message]
2020-07-20 18:24             ` Claudio Fontana
2020-07-21  8:22               ` Claudio Fontana
2020-07-27 23:09                 ` Bruce Rogers
2020-07-28  8:15                   ` Vladimir Sementsov-Ogievskiy
2020-07-28  8:43                   ` Vladimir Sementsov-Ogievskiy
2020-07-28 13:23                     ` Bruce Rogers
2020-07-28 11:10                   ` Max Reitz
2020-07-28 11:27                     ` Vladimir Sementsov-Ogievskiy
2020-07-28 11:33                     ` Vladimir Sementsov-Ogievskiy
2020-07-28 11:35                       ` Paolo Bonzini
2020-07-28 11:45                         ` Max Reitz
2020-07-28 12:09                           ` Paolo Bonzini
2020-07-28 12:47                     ` Claudio Fontana
2020-07-13 11:03 ` Dr. David Alan Gilbert
2020-07-13 11:39   ` Cornelia Huck
2020-07-13 11:39   ` Claudio Fontana
2020-07-13 11:45     ` Claudio Fontana
