* Transient fail of iotests 215 and 197
@ 2021-07-21 17:22 Daniel P. Berrangé
  2021-07-27 14:23 ` Thomas Huth
  0 siblings, 1 reply; 2+ messages in thread
From: Daniel P. Berrangé @ 2021-07-21 17:22 UTC (permalink / raw)
  To: qemu-devel, qemu-block; +Cc: Eric Blake, Max Reitz

Peter caught the following transient failure on the staging tree:

  https://gitlab.com/qemu-project/qemu/-/jobs/1438817749

--- /builds/qemu-project/qemu/tests/qemu-iotests/197.out
+++ 197.out.bad
@@ -12,13 +12,12 @@
 128 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 read 0/0 bytes at offset 0
 0 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
-read 2147483136/2147483136 bytes at offset 1024
-2 GiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+./common.rc: Killed                  ( VALGRIND_QEMU="${VALGRIND_QEMU_IO}" _qemu_proc_exec "${VALGRIND_LOGFILE}" "$QEMU_IO_PROG" $QEMU_IO_ARGS "$@" )
 read 1024/1024 bytes at offset 3221226496
 1 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 qemu-io: can't open device TEST_DIR/t.wrap.qcow2: Can't use copy-on-read on read-only device
-2 GiB (0x80010000) bytes     allocated at offset 0 bytes (0x0)
-1023.938 MiB (0x3fff0000) bytes not allocated at offset 2 GiB (0x80010000)
+2 GiB (0x80000000) bytes     allocated at offset 0 bytes (0x0)
+1 GiB (0x40000000) bytes not allocated at offset 2 GiB (0x80000000)
 64 KiB (0x10000) bytes     allocated at offset 3 GiB (0xc0000000)
 1023.938 MiB (0x3fff0000) bytes not allocated at offset 3 GiB (0xc0010000)
 No errors were found on the image.


--- /builds/qemu-project/qemu/tests/qemu-iotests/215.out
+++ 215.out.bad
@@ -12,13 +12,12 @@
 128 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 read 0/0 bytes at offset 0
 0 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
-read 2147483136/2147483136 bytes at offset 1024
-2 GiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+./common.rc: Killed                  ( VALGRIND_QEMU="${VALGRIND_QEMU_IO}" _qemu_proc_exec "${VALGRIND_LOGFILE}" "$QEMU_IO_PROG" $QEMU_IO_ARGS "$@" )
 read 1024/1024 bytes at offset 3221226496
 1 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 qemu-io: can't open device TEST_DIR/t.wrap.qcow2: Block node is read-only
-2 GiB (0x80010000) bytes     allocated at offset 0 bytes (0x0)
-1023.938 MiB (0x3fff0000) bytes not allocated at offset 2 GiB (0x80010000)
+2 GiB (0x80000000) bytes     allocated at offset 0 bytes (0x0)
+1 GiB (0x40000000) bytes not allocated at offset 2 GiB (0x80000000)
 64 KiB (0x10000) bytes     allocated at offset 3 GiB (0xc0000000)
 1023.938 MiB (0x3fff0000) bytes not allocated at offset 3 GiB (0xc0010000)
 No errors were found on the image.
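Assuming qcow2's default 64 KiB cluster size, the numbers in the map lines are consistent with the 2 GiB copy-on-read having populated the cluster that holds its last byte. That read ends 512 bytes past 2 GiB. Rounding up to the next cluster boundary gives the expected 0x80010000, and the gap up to the cluster at 3 GiB is the expected 0x3fff0000. In the .bad output the allocation stops at exactly 0x80000000. A quick shell check of the arithmetic:

```sh
#!/bin/sh
# Consistency check on the map lines in the diffs above.
# Assumption: the image uses qcow2's default 64 KiB cluster size.
cluster=$((64 * 1024))
gib=$((1024 * 1024 * 1024))

# The expected output shows a 2147483136-byte read at offset 1024.
read_off=1024
read_len=2147483136
read_end=$((read_off + read_len))    # one past the last byte read: 2 GiB + 512

# Round the end of the read up to the next cluster boundary.
alloc_end=$(( (read_end + cluster - 1) / cluster * cluster ))

# Unallocated gap between that boundary and the cluster at 3 GiB.
gap=$((3 * gib - alloc_end))

printf 'read ends at     0x%x\n' "$read_end"
printf 'cluster-aligned  0x%x\n' "$alloc_end"
printf 'gap up to 3 GiB  0x%x\n' "$gap"
```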


It looks like the qemu-io process was killed by the OS partway through
the 2 GiB read.
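Here is how such a kill surfaces in the shell. bash and dash report 128 + N for a child killed by signal N. The Linux OOM killer uses SIGKILL (9), so an OOM kill shows up as exit status 137, together with the "Killed" message seen in the diffs. `classify_exit` is a hypothetical helper, and it assumes that any status above 128 means a signal:

```sh
#!/bin/sh
# Hypothetical helper: describe a command's exit status.
# Assumption: statuses above 128 mean "killed by signal (status - 128)",
# which is how bash and dash report signal deaths.
classify_exit() {
    status=$1
    if [ "$status" -eq 0 ]; then
        echo ok
    elif [ "$status" -gt 128 ]; then
        echo "signal $((status - 128))"
    else
        echo "failed ($status)"
    fi
}

# Demonstration: a child shell that sends SIGKILL to itself.
demo_status=0
sh -c 'kill -9 $$' || demo_status=$?
classify_exit "$demo_status"    # SIGKILL is signal 9, so status 137
```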

Interestingly both test cases have a comment:

  #                                        Since a 2G read may exhaust
  # memory on some machines (particularly 32-bit), we skip the test if
  # that fails due to memory pressure.
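If that check only looks for an allocation error in qemu-io's output, an OOM kill would slip past it, because a process killed with SIGKILL gets no chance to report an error. The sketch below is a check that also treats exit status 137 as memory pressure. `run_or_skip` is made up, and matching on "allocate" is an assumption. Neither is known to be what 197 and 215 actually do:

```sh
#!/bin/sh
# Hypothetical sketch: run a memory-hungry command and decide whether a
# failure should count as "skip due to memory pressure".
# Assumptions: an allocation failure prints a message containing
# "allocate", and an OOM kill shows up as exit status 137 (128 + SIGKILL).
run_or_skip() {
    status=0
    output=$("$@" 2>&1) || status=$?
    case $output in
        *allocate*)
            echo "skip: allocation failed"
            return 0 ;;
    esac
    if [ "$status" -eq 137 ]; then
        echo "skip: killed by SIGKILL (possibly the OOM killer)"
        return 0
    fi
    # Anything else is a real result: pass the output and status through.
    printf '%s\n' "$output"
    return "$status"
}
```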


I'm wondering whether the logic for handling this failure is flawed, since
being killed by the OS for exceeding the CI container's memory limits looks
like a plausible explanation for the failure.

The CI shared runners supposedly have 3.75 GB of RAM for the VM as a whole,
so if the tests run in parallel this could still be an issue.
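Here are back-of-the-envelope numbers. They assume that qemu-io holds a single buffer as large as the request, and they read "3.75 GB" as 3.75 GiB, the more generous interpretation. One 2 GiB read fits, but two running concurrently do not, even before counting the OS and everything else in the VM:

```sh
#!/bin/sh
# Rough check: how many concurrent 2 GiB reads fit in the runner's RAM?
# Assumptions: qemu-io holds one buffer as large as the request, and
# "3.75 GB" is taken as 3.75 GiB. The OS and other processes are ignored.
ram=$((375 * 1024 * 1024 * 1024 / 100))    # 3.75 GiB = 4026531840 bytes
buf=2147483136                              # size of the read in the diffs

for jobs in 1 2; do
    need=$((jobs * buf))
    if [ "$need" -le "$ram" ]; then
        verdict="fits"
    else
        verdict="does not fit"
    fi
    echo "$jobs concurrent read(s): $need bytes, $verdict"
done
```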

Maybe we need to skip these tests by default if they are known to require
a significant amount of memory to run?
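One hypothetical way to do that on Linux is a pre-check on MemAvailable from /proc/meminfo. `enough_memory` is an illustrative helper, not an existing iotests function. MemAvailable is only an estimate. Inside a container, /proc/meminfo usually reflects the host rather than the container's cgroup limit, so this check would not catch every OOM scenario:

```sh
#!/bin/sh
# Hypothetical pre-check: only run a memory-hungry test if the kernel
# reports enough available memory. Linux-only; MemAvailable is an
# estimate, and in a container it usually shows the host's memory,
# not the cgroup limit.
# Usage: enough_memory REQUIRED_BYTES [MEMINFO_FILE]
enough_memory() {
    required=$1
    meminfo=${2:-/proc/meminfo}
    avail_kb=$(awk '/^MemAvailable:/ { print $2 }' "$meminfo" 2>/dev/null)
    [ -n "$avail_kb" ] || return 1    # field missing: assume not enough
    [ $((avail_kb * 1024)) -ge "$required" ]
}

if enough_memory 2147483136; then
    echo "enough memory, running test"
else
    echo "not enough memory, skipping"
fi
```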

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|




* Re: Transient fail of iotests 215 and 197
  2021-07-21 17:22 Transient fail of iotests 215 and 197 Daniel P. Berrangé
@ 2021-07-27 14:23 ` Thomas Huth
  0 siblings, 0 replies; 2+ messages in thread
From: Thomas Huth @ 2021-07-27 14:23 UTC (permalink / raw)
  To: Daniel P. Berrangé, qemu-devel, qemu-block
  Cc: Kevin Wolf, Peter Maydell, Eric Blake, Max Reitz

On 21/07/2021 19.22, Daniel P. Berrangé wrote:
> Maybe we need to skip these tests by default if they are known to require
> a significant amount of memory to run?

The tests are not in the "auto" group, so they do not run by default, but
I once added them to the build-tcg-disabled job since they were working
fine in GitLab CI.

If they are now dying because of out-of-memory issues, then either they
use more memory now, or the containers have changed and provide less free
memory. Either way, the tests no longer seem suited to GitLab CI, and since
they are not in the "auto" group anyway, I'd suggest simply disabling them
in the build-tcg-disabled job again. I'll send a patch...

  Thomas



