* How to tame CI?
From: Juan Quintela @ 2023-07-26 12:06 UTC
  To: qemu-devel, peter.maydell, Daniel Berrange, richard.henderson


Hi

Now a note on CI, which has been really bad.  After too many problems
with the last PULLs, I decided to learn to use the QEMU CI.  On one hand,
it is not so difficult; even I can use it O:-)

On the other hand, the number of problems I got is immense.  Some of
them disappear when I rerun the checks, but I never know whether it is
my PULL request, the CI system or the tests themselves.

So it ends up going something like this:

while (true); do
- git pull
- git rebase
- git push ci blah, blah
- The next day comes with too many errors, so I rebase again
done

The last step takes more time than expected, and it is not always
trivial to know where the failure is.

This (last) patch is not part of the PULL request, but I have found
that it _always_ makes gcov fail.  I had to use bisect to find where
the problem was.

https://gitlab.com/juan.quintela/qemu/-/jobs/4571878922
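When a failure is intermittent, one run per commit can make git bisect blame the wrong patch. A minimal sketch of a predicate for `git bisect run`, assuming the test command is passed on the command line (the script name and run count are hypothetical): it repeats the command and reports the commit as bad (exit 1) if any run fails, good (exit 0) otherwise.

```python
import subprocess
import sys


def bisect_verdict(cmd, runs):
    """Run cmd up to `runs` times and return a git-bisect-run exit code.

    0 marks the commit good (every run passed), 1 marks it bad (a run failed).
    """
    for _ in range(runs):
        if subprocess.run(cmd).returncode != 0:
            return 1  # one failure is enough to call the commit bad
    return 0


if __name__ == "__main__":
    # Hypothetical usage: bisect_flaky.py RUNS COMMAND [ARGS...]
    sys.exit(bisect_verdict(sys.argv[2:], int(sys.argv[1])))
```

It would be driven by something like `git bisect run python3 bisect_flaky.py 5 <command that runs the test>`. A "good" verdict still only means "no failure in N runs", so a race that fires rarely can slip through.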

To make things easier, this is the part that shows how it breaks (this is
the gcov test):

357/423 qemu:block / io-qcow2-copy-before-write                            ERROR           6.38s   exit status 1
>>> PYTHON=/builds/juan.quintela/qemu/build/pyvenv/bin/python3 MALLOC_PERTURB_=44 /builds/juan.quintela/qemu/build/pyvenv/bin/python3 /builds/juan.quintela/qemu/build/../tests/qemu-iotests/check -tap -qcow2 copy-before-write --source-dir /builds/juan.quintela/qemu/tests/qemu-iotests --build-dir /builds/juan.quintela/qemu/build/tests/qemu-iotests
――――――――――――――――――――――――――――――――――――― ✀  ―――――――――――――――――――――――――――――――――――――
stderr:
--- /builds/juan.quintela/qemu/tests/qemu-iotests/tests/copy-before-write.out
+++ /builds/juan.quintela/qemu/build/scratch/qcow2-file-copy-before-write/copy-before-write.out.bad
@@ -1,5 +1,21 @@
-....
+...F
+======================================================================
+FAIL: test_timeout_break_snapshot (__main__.TestCbwError)
+----------------------------------------------------------------------
+Traceback (most recent call last):
+  File "/builds/juan.quintela/qemu/tests/qemu-iotests/tests/copy-before-write", line 210, in test_timeout_break_snapshot
+    self.assertEqual(log, """\
+AssertionError: 'wrot[195 chars]read 1048576/1048576 bytes at offset 0\n1 MiB,[46 chars]c)\n' != 'wrot[195 chars]read failed: Permission denied\n'
+  wrote 524288/524288 bytes at offset 0
+  512 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+  wrote 524288/524288 bytes at offset 524288
+  512 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
++ read failed: Permission denied
+- read 1048576/1048576 bytes at offset 0
+- 1 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
+
 ----------------------------------------------------------------------
 Ran 4 tests
-OK
+FAILED (failures=1)
(test program exited with status code 1)
――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――

If anyone can explain how a change in tests/qtest/migration-test.c
can break block layer tests, I am all ears.

This is the commit:

https://gitlab.com/juan.quintela/qemu/-/commit/7455ee794c01662b5efa1ee67396d85943663ded

Yes, I tried several times.  It always fails on that patch.  The
previous commit passes CI with flying colors.

Later, Juan.




* Re: How to tame CI?
From: Peter Maydell @ 2023-07-26 13:00 UTC
  To: quintela; +Cc: qemu-devel, Daniel Berrange, richard.henderson

On Wed, 26 Jul 2023 at 13:06, Juan Quintela <quintela@redhat.com> wrote:
> To make things easier, this is the part that shows how it breaks (this is
> the gcov test):
>
> 357/423 qemu:block / io-qcow2-copy-before-write                            ERROR           6.38s   exit status 1
> >>> PYTHON=/builds/juan.quintela/qemu/build/pyvenv/bin/python3 MALLOC_PERTURB_=44 /builds/juan.quintela/qemu/build/pyvenv/bin/python3 /builds/juan.quintela/qemu/build/../tests/qemu-iotests/check -tap -qcow2 copy-before-write --source-dir /builds/juan.quintela/qemu/tests/qemu-iotests --build-dir /builds/juan.quintela/qemu/build/tests/qemu-iotests
> ――――――――――――――――――――――――――――――――――――― ✀  ―――――――――――――――――――――――――――――――――――――
> stderr:
> --- /builds/juan.quintela/qemu/tests/qemu-iotests/tests/copy-before-write.out
> +++ /builds/juan.quintela/qemu/build/scratch/qcow2-file-copy-before-write/copy-before-write.out.bad
> @@ -1,5 +1,21 @@
> -....
> +...F
> +======================================================================
> +FAIL: test_timeout_break_snapshot (__main__.TestCbwError)
> +----------------------------------------------------------------------
> +Traceback (most recent call last):
> +  File "/builds/juan.quintela/qemu/tests/qemu-iotests/tests/copy-before-write", line 210, in test_timeout_break_snapshot
> +    self.assertEqual(log, """\
> +AssertionError: 'wrot[195 chars]read 1048576/1048576 bytes at offset 0\n1 MiB,[46 chars]c)\n' != 'wrot[195 chars]read failed: Permission denied\n'
> +  wrote 524288/524288 bytes at offset 0
> +  512 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> +  wrote 524288/524288 bytes at offset 524288
> +  512 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> ++ read failed: Permission denied
> +- read 1048576/1048576 bytes at offset 0
> +- 1 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> +

This iotest failing is an intermittent that I've seen running
pullreqs on master. I tend to see it on the s390 host. I
suspect a race condition somewhere where it fails if the host
is heavily loaded.

-- PMM



* Re: How to tame CI?
From: Thomas Huth @ 2023-07-26 13:32 UTC
  To: Peter Maydell, quintela, Kevin Wolf, hreitz,
	Vladimir Sementsov-Ogievskiy
  Cc: qemu-devel, Daniel Berrange, richard.henderson, Qemu-block

On 26/07/2023 15.00, Peter Maydell wrote:
> On Wed, 26 Jul 2023 at 13:06, Juan Quintela <quintela@redhat.com> wrote:
>> To make things easier, this is the part that shows how it breaks (this is
>> the gcov test):
>>
>> 357/423 qemu:block / io-qcow2-copy-before-write                            ERROR           6.38s   exit status 1
>>>>> PYTHON=/builds/juan.quintela/qemu/build/pyvenv/bin/python3 MALLOC_PERTURB_=44 /builds/juan.quintela/qemu/build/pyvenv/bin/python3 /builds/juan.quintela/qemu/build/../tests/qemu-iotests/check -tap -qcow2 copy-before-write --source-dir /builds/juan.quintela/qemu/tests/qemu-iotests --build-dir /builds/juan.quintela/qemu/build/tests/qemu-iotests
>> ――――――――――――――――――――――――――――――――――――― ✀  ―――――――――――――――――――――――――――――――――――――
>> stderr:
>> --- /builds/juan.quintela/qemu/tests/qemu-iotests/tests/copy-before-write.out
>> +++ /builds/juan.quintela/qemu/build/scratch/qcow2-file-copy-before-write/copy-before-write.out.bad
>> @@ -1,5 +1,21 @@
>> -....
>> +...F
>> +======================================================================
>> +FAIL: test_timeout_break_snapshot (__main__.TestCbwError)
>> +----------------------------------------------------------------------
>> +Traceback (most recent call last):
>> +  File "/builds/juan.quintela/qemu/tests/qemu-iotests/tests/copy-before-write", line 210, in test_timeout_break_snapshot
>> +    self.assertEqual(log, """\
>> +AssertionError: 'wrot[195 chars]read 1048576/1048576 bytes at offset 0\n1 MiB,[46 chars]c)\n' != 'wrot[195 chars]read failed: Permission denied\n'
>> +  wrote 524288/524288 bytes at offset 0
>> +  512 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
>> +  wrote 524288/524288 bytes at offset 524288
>> +  512 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
>> ++ read failed: Permission denied
>> +- read 1048576/1048576 bytes at offset 0
>> +- 1 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
>> +
> 
> This iotest failing is an intermittent that I've seen running
> pullreqs on master. I tend to see it on the s390 host. I
> suspect a race condition somewhere where it fails if the host
> is heavily loaded.

It's obviously a failure in an iotest, so let's CC: the corresponding people 
(done now).

  Thomas




* Re: How to tame CI?
From: Daniel P. Berrangé @ 2023-07-26 14:17 UTC
  To: Peter Maydell; +Cc: quintela, qemu-devel, richard.henderson

On Wed, Jul 26, 2023 at 02:00:03PM +0100, Peter Maydell wrote:
> On Wed, 26 Jul 2023 at 13:06, Juan Quintela <quintela@redhat.com> wrote:
> > To make things easier, this is the part that shows how it breaks (this is
> > the gcov test):
> >
> > 357/423 qemu:block / io-qcow2-copy-before-write                            ERROR           6.38s   exit status 1
> > >>> PYTHON=/builds/juan.quintela/qemu/build/pyvenv/bin/python3 MALLOC_PERTURB_=44 /builds/juan.quintela/qemu/build/pyvenv/bin/python3 /builds/juan.quintela/qemu/build/../tests/qemu-iotests/check -tap -qcow2 copy-before-write --source-dir /builds/juan.quintela/qemu/tests/qemu-iotests --build-dir /builds/juan.quintela/qemu/build/tests/qemu-iotests
> > ――――――――――――――――――――――――――――――――――――― ✀  ―――――――――――――――――――――――――――――――――――――
> > stderr:
> > --- /builds/juan.quintela/qemu/tests/qemu-iotests/tests/copy-before-write.out
> > +++ /builds/juan.quintela/qemu/build/scratch/qcow2-file-copy-before-write/copy-before-write.out.bad
> > @@ -1,5 +1,21 @@
> > -....
> > +...F
> > +======================================================================
> > +FAIL: test_timeout_break_snapshot (__main__.TestCbwError)
> > +----------------------------------------------------------------------
> > +Traceback (most recent call last):
> > +  File "/builds/juan.quintela/qemu/tests/qemu-iotests/tests/copy-before-write", line 210, in test_timeout_break_snapshot
> > +    self.assertEqual(log, """\
> > +AssertionError: 'wrot[195 chars]read 1048576/1048576 bytes at offset 0\n1 MiB,[46 chars]c)\n' != 'wrot[195 chars]read failed: Permission denied\n'
> > +  wrote 524288/524288 bytes at offset 0
> > +  512 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> > +  wrote 524288/524288 bytes at offset 524288
> > +  512 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> > ++ read failed: Permission denied
> > +- read 1048576/1048576 bytes at offset 0
> > +- 1 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> > +
> 
> This iotest failing is an intermittent that I've seen running
> pullreqs on master. I tend to see it on the s390 host. I
> suspect a race condition somewhere where it fails if the host
> is heavily loaded.

Since it is known to be flaky, we should just commit this change:

--- a/tests/qemu-iotests/tests/copy-before-write
+++ b/tests/qemu-iotests/tests/copy-before-write
@@ -1,5 +1,5 @@
 #!/usr/bin/env python3
-# group: auto backup
+# group: backup
 #
 # Copyright (c) 2022 Virtuozzo International GmbH
 #


and if someone wants to re-enable it, they get the job of fixing its
reliability first.
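The patch works because the iotests `check` runner selects tests by the groups named on their `# group:` header, and the default run selects the `auto` group. A small sketch of that selection rule, assuming the default run picks only `auto` tests; the helper names are hypothetical and this is not the actual `check` code:

```python
def parse_groups(source: str) -> set:
    """Return the set of groups listed on a test's '# group:' header line."""
    for line in source.splitlines():
        if line.startswith("# group:"):
            return set(line[len("# group:"):].split())
    return set()  # no header: the test belongs to no group


def runs_by_default(source: str) -> bool:
    # Assumption: the default run only selects tests in the 'auto' group.
    return "auto" in parse_groups(source)


before = "#!/usr/bin/env python3\n# group: auto backup\n#\n"
after = "#!/usr/bin/env python3\n# group: backup\n#\n"
print(runs_by_default(before), runs_by_default(after))  # True False
```

The test stays runnable on demand through its remaining `backup` group; it just stops being part of the default run.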

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|




* Re: How to tame CI?
From: Juan Quintela @ 2023-07-26 14:36 UTC
  To: Daniel P. Berrangé; +Cc: Peter Maydell, qemu-devel, richard.henderson

Daniel P. Berrangé <berrange@redhat.com> wrote:
> On Wed, Jul 26, 2023 at 02:00:03PM +0100, Peter Maydell wrote:
>> On Wed, 26 Jul 2023 at 13:06, Juan Quintela <quintela@redhat.com> wrote:
>> > To make things easier, this is the part that shows how it breaks (this is
>> > the gcov test):
>> >
>> > 357/423 qemu:block / io-qcow2-copy-before-write                            ERROR           6.38s   exit status 1
>> > >>> PYTHON=/builds/juan.quintela/qemu/build/pyvenv/bin/python3 MALLOC_PERTURB_=44 /builds/juan.quintela/qemu/build/pyvenv/bin/python3 /builds/juan.quintela/qemu/build/../tests/qemu-iotests/check -tap -qcow2 copy-before-write --source-dir /builds/juan.quintela/qemu/tests/qemu-iotests --build-dir /builds/juan.quintela/qemu/build/tests/qemu-iotests
>> > ――――――――――――――――――――――――――――――――――――― ✀  ―――――――――――――――――――――――――――――――――――――
>> > stderr:
>> > --- /builds/juan.quintela/qemu/tests/qemu-iotests/tests/copy-before-write.out
>> > +++ /builds/juan.quintela/qemu/build/scratch/qcow2-file-copy-before-write/copy-before-write.out.bad
>> > @@ -1,5 +1,21 @@
>> > -....
>> > +...F
>> > +======================================================================
>> > +FAIL: test_timeout_break_snapshot (__main__.TestCbwError)
>> > +----------------------------------------------------------------------
>> > +Traceback (most recent call last):
>> > +  File "/builds/juan.quintela/qemu/tests/qemu-iotests/tests/copy-before-write", line 210, in test_timeout_break_snapshot
>> > +    self.assertEqual(log, """\
>> > +AssertionError: 'wrot[195 chars]read 1048576/1048576 bytes at offset 0\n1 MiB,[46 chars]c)\n' != 'wrot[195 chars]read failed: Permission denied\n'
>> > +  wrote 524288/524288 bytes at offset 0
>> > +  512 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
>> > +  wrote 524288/524288 bytes at offset 524288
>> > +  512 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
>> > ++ read failed: Permission denied
>> > +- read 1048576/1048576 bytes at offset 0
>> > +- 1 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
>> > +
>> 
>> This iotest failing is an intermittent that I've seen running
>> pullreqs on master. I tend to see it on the s390 host. I
>> suspect a race condition somewhere where it fails if the host
>> is heavily loaded.

What is weird to me is that I was unable to reproduce it on the previous
commit.  But with this one it happened every time.  No, I have no clue
why, and as I said, it makes zero sense: the change is to a binary that
is not used by the block tests.

Later, Juan.

>
> Since it is known to be flaky, we should just commit this change:
>
> --- a/tests/qemu-iotests/tests/copy-before-write
> +++ b/tests/qemu-iotests/tests/copy-before-write
> @@ -1,5 +1,5 @@
>  #!/usr/bin/env python3
> -# group: auto backup
> +# group: backup
>  #
>  # Copyright (c) 2022 Virtuozzo International GmbH
>  #
>
>
> and if someone wants to re-enable it, they get the job of fixing its
> reliability first.
>
> With regards,
> Daniel




* Re: How to tame CI?
From: Daniel P. Berrangé @ 2023-07-26 14:43 UTC
  To: Juan Quintela; +Cc: Peter Maydell, qemu-devel, richard.henderson

On Wed, Jul 26, 2023 at 04:36:32PM +0200, Juan Quintela wrote:
> Daniel P. Berrangé <berrange@redhat.com> wrote:
> > On Wed, Jul 26, 2023 at 02:00:03PM +0100, Peter Maydell wrote:
> >> On Wed, 26 Jul 2023 at 13:06, Juan Quintela <quintela@redhat.com> wrote:
> >> > To make things easier, this is the part that shows how it breaks (this is
> >> > the gcov test):
> >> >
> >> > 357/423 qemu:block / io-qcow2-copy-before-write                            ERROR           6.38s   exit status 1
> >> > >>> PYTHON=/builds/juan.quintela/qemu/build/pyvenv/bin/python3 MALLOC_PERTURB_=44 /builds/juan.quintela/qemu/build/pyvenv/bin/python3 /builds/juan.quintela/qemu/build/../tests/qemu-iotests/check -tap -qcow2 copy-before-write --source-dir /builds/juan.quintela/qemu/tests/qemu-iotests --build-dir /builds/juan.quintela/qemu/build/tests/qemu-iotests
> >> > ――――――――――――――――――――――――――――――――――――― ✀  ―――――――――――――――――――――――――――――――――――――
> >> > stderr:
> >> > --- /builds/juan.quintela/qemu/tests/qemu-iotests/tests/copy-before-write.out
> >> > +++ /builds/juan.quintela/qemu/build/scratch/qcow2-file-copy-before-write/copy-before-write.out.bad
> >> > @@ -1,5 +1,21 @@
> >> > -....
> >> > +...F
> >> > +======================================================================
> >> > +FAIL: test_timeout_break_snapshot (__main__.TestCbwError)
> >> > +----------------------------------------------------------------------
> >> > +Traceback (most recent call last):
> >> > +  File "/builds/juan.quintela/qemu/tests/qemu-iotests/tests/copy-before-write", line 210, in test_timeout_break_snapshot
> >> > +    self.assertEqual(log, """\
> >> > +AssertionError: 'wrot[195 chars]read 1048576/1048576 bytes at offset 0\n1 MiB,[46 chars]c)\n' != 'wrot[195 chars]read failed: Permission denied\n'
> >> > +  wrote 524288/524288 bytes at offset 0
> >> > +  512 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> >> > +  wrote 524288/524288 bytes at offset 524288
> >> > +  512 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> >> > ++ read failed: Permission denied
> >> > +- read 1048576/1048576 bytes at offset 0
> >> > +- 1 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> >> > +
> >> 
> >> This iotest failing is an intermittent that I've seen running
> >> pullreqs on master. I tend to see it on the s390 host. I
> >> suspect a race condition somewhere where it fails if the host
> >> is heavily loaded.
> 
> What is weird to me is that I was unable to reproduce it on the previous
> commit.  But with this one it happened every time.  No, I have no clue
> why, and as I said, it makes zero sense: the change is to a binary that
> is not used by the block tests.

Your commit changes the migration test, which could change the overall
test running time, and thus which tests end up running in parallel.
That could be enough to trigger the race more reliably.
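A toy model of that effect, assuming a simple scheduler that starts each test on the first free worker in list order (meson's real scheduling and the CI host's load are more complex). The durations and the test names other than migration-test and copy-before-write are made up; the point is that changing one test's duration changes which tests run alongside another.

```python
import heapq


def schedule(tests, workers):
    """Start each (name, duration) test on the earliest-free worker, in order.

    Returns {name: (start, end)}.
    """
    free = [(0.0, w) for w in range(workers)]  # (time worker becomes free, worker id)
    heapq.heapify(free)
    spans = {}
    for name, duration in tests:
        start, worker = heapq.heappop(free)
        spans[name] = (start, start + duration)
        heapq.heappush(free, (start + duration, worker))
    return spans


def running_alongside(spans, target):
    """Names of tests whose run interval overlaps the target's."""
    start, end = spans[target]
    return sorted(n for n, (s, e) in spans.items()
                  if n != target and s < end and start < e)


def suite(migration_seconds):
    # Hypothetical durations in seconds, in the order the runner starts them.
    return [("migration-test", migration_seconds), ("other-a", 3),
            ("other-b", 4), ("copy-before-write", 6), ("heavy-io", 5)]


print(running_alongside(schedule(suite(10), 2), "copy-before-write"))
# ['heavy-io', 'migration-test']
print(running_alongside(schedule(suite(4), 2), "copy-before-write"))
# ['heavy-io', 'other-b']
```

Nothing in copy-before-write changed between the two runs; only its neighbours did, which is enough to shift a timing-sensitive test from passing to failing.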

With regards,
Daniel




* Re: How to tame CI?
From: Vladimir Sementsov-Ogievskiy @ 2023-10-05 12:35 UTC
  To: Thomas Huth, Peter Maydell, quintela, Kevin Wolf, hreitz
  Cc: qemu-devel, Daniel Berrange, richard.henderson, Qemu-block

On 26.07.23 16:32, Thomas Huth wrote:
> On 26/07/2023 15.00, Peter Maydell wrote:
>> On Wed, 26 Jul 2023 at 13:06, Juan Quintela <quintela@redhat.com> wrote:
>>> To make things easier, this is the part that shows how it breaks (this is
>>> the gcov test):
>>>
>>> 357/423 qemu:block / io-qcow2-copy-before-write                            ERROR           6.38s   exit status 1
>>>>>> PYTHON=/builds/juan.quintela/qemu/build/pyvenv/bin/python3 MALLOC_PERTURB_=44 /builds/juan.quintela/qemu/build/pyvenv/bin/python3 /builds/juan.quintela/qemu/build/../tests/qemu-iotests/check -tap -qcow2 copy-before-write --source-dir /builds/juan.quintela/qemu/tests/qemu-iotests --build-dir /builds/juan.quintela/qemu/build/tests/qemu-iotests
>>> ――――――――――――――――――――――――――――――――――――― ✀  ―――――――――――――――――――――――――――――――――――――
>>> stderr:
>>> --- /builds/juan.quintela/qemu/tests/qemu-iotests/tests/copy-before-write.out
>>> +++ /builds/juan.quintela/qemu/build/scratch/qcow2-file-copy-before-write/copy-before-write.out.bad
>>> @@ -1,5 +1,21 @@
>>> -....
>>> +...F
>>> +======================================================================
>>> +FAIL: test_timeout_break_snapshot (__main__.TestCbwError)
>>> +----------------------------------------------------------------------
>>> +Traceback (most recent call last):
>>> +  File "/builds/juan.quintela/qemu/tests/qemu-iotests/tests/copy-before-write", line 210, in test_timeout_break_snapshot
>>> +    self.assertEqual(log, """\
>>> +AssertionError: 'wrot[195 chars]read 1048576/1048576 bytes at offset 0\n1 MiB,[46 chars]c)\n' != 'wrot[195 chars]read failed: Permission denied\n'
>>> +  wrote 524288/524288 bytes at offset 0
>>> +  512 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
>>> +  wrote 524288/524288 bytes at offset 524288
>>> +  512 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
>>> ++ read failed: Permission denied
>>> +- read 1048576/1048576 bytes at offset 0
>>> +- 1 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
>>> +
>>
>> This iotest failing is an intermittent that I've seen running
>> pullreqs on master. I tend to see it on the s390 host. I
>> suspect a race condition somewhere where it fails if the host
>> is heavily loaded.
> 
> It's obviously a failure in an iotest, so let's CC: the corresponding people (done now).
> 

Sorry for the long delay.

Does it still fail?

In the test we expect the copy-before-write operation to fail (because of throttling and the timeout), so the snapshot is broken and the next read from the snapshot should fail.

But most probably the copy-before-write operation succeeded in this case for some reason. I don't think that throttling and timeouts in the block layer can guarantee determinism, but usually it works.

We use throttling with bps-write = 300 * 1024, i.e. 300 KiB per second, and cbw-timeout is set to 1 second.

Then we write 512K,

then, as the comment says:
# We need second write to trigger throttling

and we write another 512K.

The first 512K are written, and then we should wait 512/300 = 1.7 seconds from the _start_ of that write before the second one is issued. But if the first write was slow, less than a second may remain between the end of the first write and the start of the second one. Then the timeout will not fire.
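The arithmetic above can be checked with a simplified model, assuming (as described) that the throttle lets the first write through and then holds the next one until size / bps-write seconds after the first write started, and that the CBW timeout fires only if that remaining wait exceeds cbw-timeout. The function names are hypothetical, and the model ignores the real throttle's burst accounting.

```python
KIB = 1024


def wait_after_first_write(size, bps_write, first_write_duration):
    """Seconds the second write is still throttled after the first one finishes."""
    # The throttle budget is measured from the *start* of the first write.
    return max(0.0, size / bps_write - first_write_duration)


def cbw_timeout_fires(size, bps_write, first_write_duration, cbw_timeout=1.0):
    """True if the remaining throttle wait exceeds the CBW timeout."""
    return wait_after_first_write(size, bps_write, first_write_duration) > cbw_timeout


size, bps = 512 * KIB, 300 * KIB
print(round(size / bps, 2))               # 1.71 s budget from the start of the first write
print(cbw_timeout_fires(size, bps, 0.1))  # True: fast first write, snapshot gets broken
print(cbw_timeout_fires(size, bps, 0.8))  # False: slow first write, the read succeeds
```

In this model any first write slower than about 0.7 s produces exactly the failure seen in CI: the read from the snapshot succeeds instead of returning "Permission denied".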

====

I see two possible ways to fix that:

1. Decrease bps-write a bit, for example to 200 * 1024 (200 KiB/s).

2. Rework the test to use null-co instead of real images. This way we will not suffer from unstable I/O duration.
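For option 1, the useful number is how slow the first write may be before the timeout stops firing: size / bps-write - cbw-timeout, under the same simplified throttle model (an assumption; the test itself does not compute this). A quick comparison of the current rate and the suggested one:

```python
KIB = 1024


def max_first_write_seconds(size, bps_write, cbw_timeout=1.0):
    """Boundary duration: a faster first write leaves a throttle wait above cbw_timeout."""
    return size / bps_write - cbw_timeout


for rate_kib in (300, 200):
    slack = max_first_write_seconds(512 * KIB, rate_kib * KIB)
    print(f"bps-write = {rate_kib} * 1024: first write may take up to {slack:.2f} s")
```

That is roughly 0.71 s of tolerated I/O latency at 300 KiB/s versus 1.56 s at 200 KiB/s, which widens the margin but, unlike option 2, still depends on how fast the host is.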


So, does the problem still fire sometimes?

-- 
Best regards,
Vladimir




* Re: How to tame CI?
From: Juan Quintela @ 2023-10-05 14:36 UTC
  To: Vladimir Sementsov-Ogievskiy
  Cc: Thomas Huth, Peter Maydell, Kevin Wolf, hreitz, qemu-devel,
	Daniel Berrange, richard.henderson, Qemu-block

Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> wrote:
> On 26.07.23 16:32, Thomas Huth wrote:
>> On 26/07/2023 15.00, Peter Maydell wrote:
>>> On Wed, 26 Jul 2023 at 13:06, Juan Quintela <quintela@redhat.com> wrote:
>>>> To make things easier, this is the part that shows how it breaks (this is
>>>> the gcov test):
>>>>
>>>> 357/423 qemu:block / io-qcow2-copy-before-write                            ERROR           6.38s   exit status 1
>>>>>>> PYTHON=/builds/juan.quintela/qemu/build/pyvenv/bin/python3 MALLOC_PERTURB_=44 /builds/juan.quintela/qemu/build/pyvenv/bin/python3 /builds/juan.quintela/qemu/build/../tests/qemu-iotests/check -tap -qcow2 copy-before-write --source-dir /builds/juan.quintela/qemu/tests/qemu-iotests --build-dir /builds/juan.quintela/qemu/build/tests/qemu-iotests
>>>> ――――――――――――――――――――――――――――――――――――― ✀  ―――――――――――――――――――――――――――――――――――――
>>>> stderr:
>>>> --- /builds/juan.quintela/qemu/tests/qemu-iotests/tests/copy-before-write.out
>>>> +++ /builds/juan.quintela/qemu/build/scratch/qcow2-file-copy-before-write/copy-before-write.out.bad
>>>> @@ -1,5 +1,21 @@
>>>> -....
>>>> +...F
>>>> +======================================================================
>>>> +FAIL: test_timeout_break_snapshot (__main__.TestCbwError)
>>>> +----------------------------------------------------------------------
>>>> +Traceback (most recent call last):
>>>> +  File "/builds/juan.quintela/qemu/tests/qemu-iotests/tests/copy-before-write", line 210, in test_timeout_break_snapshot
>>>> +    self.assertEqual(log, """\
>>>> +AssertionError: 'wrot[195 chars]read 1048576/1048576 bytes at offset 0\n1 MiB,[46 chars]c)\n' != 'wrot[195 chars]read failed: Permission denied\n'
>>>> +  wrote 524288/524288 bytes at offset 0
>>>> +  512 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
>>>> +  wrote 524288/524288 bytes at offset 524288
>>>> +  512 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
>>>> ++ read failed: Permission denied
>>>> +- read 1048576/1048576 bytes at offset 0
>>>> +- 1 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
>>>> +
>>>
>>> This iotest failing is an intermittent that I've seen running
>>> pullreqs on master. I tend to see it on the s390 host. I
>>> suspect a race condition somewhere where it fails if the host
>>> is heavily loaded.
>> It's obviously a failure in an iotest, so let's CC: the
>> corresponding people (done now).
>> 
>
> Sorry for the long delay.
>
> Does it still fail?
>
> In the test we expect the copy-before-write operation to fail
> (because of throttling and the timeout), so the snapshot is broken
> and the next read from the snapshot should fail.
>
> But most probably the copy-before-write operation succeeded in this
> case for some reason. I don't think that throttling and timeouts in
> the block layer can guarantee determinism, but usually it works.
>
> We use throttling with bps-write = 300 * 1024, i.e. 300 KiB per second, and cbw-timeout is set to 1 second.
>
> Then we write 512K,
>
> then, as the comment says:
> # We need second write to trigger throttling
>
> and we write another 512K.
>
> The first 512K are written, and then we should wait 512/300 = 1.7
> seconds from the _start_ of that write before the second one is
> issued. But if the first write was slow, less than a second may
> remain between the end of the first write and the start of the
> second one. Then the timeout will not fire.
>
> ====
>
> I see two possible ways to fix that:
>
> 1. Decrease bps-write a bit, for example to 200 * 1024 (200 KiB/s).
>
> 2. Rework the test to use null-co instead of real images. This way we will not suffer from unstable I/O duration.
>
>
> So, does the problem still fire sometimes?

For me it is random.  When it happens, it happens every time.
And then it stops, and doesn't happen for a while.

It is not happening for me now.

Later, Juan.





Thread overview: 8+ messages
2023-07-26 12:06 How to tame CI? Juan Quintela
2023-07-26 13:00 ` Peter Maydell
2023-07-26 13:32   ` Thomas Huth
2023-10-05 12:35     ` Vladimir Sementsov-Ogievskiy
2023-10-05 14:36       ` Juan Quintela
2023-07-26 14:17   ` Daniel P. Berrangé
2023-07-26 14:36     ` Juan Quintela
2023-07-26 14:43       ` Daniel P. Berrangé
