* [Qemu-devel] [PATCH v3 for-3.0] tests/libqtest: Improve kill_qemu()
@ 2018-07-30 22:08 Eric Blake
  2018-07-31 10:28 ` Richard Henderson
                   ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: Eric Blake @ 2018-07-30 22:08 UTC
  To: qemu-devel; +Cc: thuth, armbru, mst, peter.maydell, alex.bennee, f4bug, rth

In kill_qemu() we have an assert that checks that the QEMU process
didn't dump core:
            assert(!WCOREDUMP(wstatus));

Unfortunately the WCOREDUMP macro here means the resulting message
is not very easy to comprehend on at least some systems:

ahci-test: tests/libqtest.c:113: kill_qemu: Assertion `!(((__extension__ (((union { __typeof(wstatus) __in; int __i; }) { .__in = (wstatus) }).__i))) & 0x80)' failed.

and it doesn't identify what signal the process took.

Furthermore, we are NOT detecting EINTR (while EINTR shouldn't be
happening if we didn't install signal handlers, it's still better
to always be robust), and also want to log unexpected non-zero status
that was not accompanied by a core dump.

Instead of using a raw assert, print the information in an
easier to understand way:

/i386/ahci/sanity: tests/libqtest.c:119: kill_qemu() detected QEMU death with core dump from signal 11 (Segmentation fault)
Aborted (core dumped)

(Of course, the really useful information would be why the QEMU
process dumped core in the first place, but we don't have that
by the time the test program has picked up the exit status.)

Suggested-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Eric Blake <eblake@redhat.com>

---
v3: use TFR() instead of open-coding the retry loop [Thomas]
---
 tests/libqtest.c | 37 ++++++++++++++++++++++++++++++++++---
 1 file changed, 34 insertions(+), 3 deletions(-)

diff --git a/tests/libqtest.c b/tests/libqtest.c
index 098af6aec44..3aa3e4c2a46 100644
--- a/tests/libqtest.c
+++ b/tests/libqtest.c
@@ -105,12 +105,43 @@ static void kill_qemu(QTestState *s)
     if (s->qemu_pid != -1) {
         int wstatus = 0;
         pid_t pid;
+        bool die = false;

         kill(s->qemu_pid, SIGTERM);
-        pid = waitpid(s->qemu_pid, &wstatus, 0);
+        TFR(pid = waitpid(s->qemu_pid, &wstatus, 0));

-        if (pid == s->qemu_pid && WIFSIGNALED(wstatus)) {
-            assert(!WCOREDUMP(wstatus));
+        assert(pid == s->qemu_pid);
+        /*
+         * We expect qemu to exit with status 0; anything else is
+         * fishy and should be logged.  Abort except when death by
+         * signal is not accompanied by a coredump (as that's the only
+         * time it was likely that the user is trying to kill the
+         * testsuite early).
+         */
+        if (wstatus) {
+            die = true;
+            if (WIFEXITED(wstatus)) {
+                fprintf(stderr, "%s:%d: kill_qemu() tried to terminate QEMU "
+                        "process but encountered exit status %d\n",
+                        __FILE__, __LINE__, WEXITSTATUS(wstatus));
+            } else if (WIFSIGNALED(wstatus)) {
+                int sig = WTERMSIG(wstatus);
+                const char *signame = strsignal(sig) ?: "unknown ???";
+
+                if (!WCOREDUMP(wstatus)) {
+                    die = false;
+                    fprintf(stderr, "%s:%d: kill_qemu() ignoring QEMU death "
+                            "by signal %d (%s)\n",
+                            __FILE__, __LINE__, sig, signame);
+                } else {
+                    fprintf(stderr, "%s:%d: kill_qemu() detected QEMU death "
+                            "with core dump from signal %d (%s)\n",
+                            __FILE__, __LINE__, sig, signame);
+                }
+            }
+        }
+        if (die) {
+            abort();
         }
     }
 }
-- 
2.14.4
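
The v3 changelog mentions replacing an open-coded retry loop with TFR().
As a rough illustration of what such a loop does, here is a minimal
sketch; waitpid_retry() is a hypothetical name and is not part of
libqtest or of this patch. It assumes the only error worth retrying is
EINTR, and leaves every other failure to the caller.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>

/*
 * Hypothetical helper (not part of libqtest): wait for one child,
 * restarting the call whenever a signal handler interrupts it.
 * Returns the reaped pid, or -1 with errno set for any other failure.
 */
static pid_t waitpid_retry(pid_t child, int *wstatus)
{
    pid_t pid;

    do {
        pid = waitpid(child, wstatus, 0);
    } while (pid == -1 && errno == EINTR);

    return pid;
}
```

A caller would use it in place of a bare waitpid() and then check that
the returned pid matches the child it asked about, as the patch does
with its assert().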

* Re: [Qemu-devel] [PATCH v3 for-3.0] tests/libqtest: Improve kill_qemu()
  2018-07-30 22:08 [Qemu-devel] [PATCH v3 for-3.0] tests/libqtest: Improve kill_qemu() Eric Blake
@ 2018-07-31 10:28 ` Richard Henderson
  2018-08-01  7:59 ` Thomas Huth
  2018-08-03 15:51 ` Markus Armbruster
  2 siblings, 0 replies; 5+ messages in thread
From: Richard Henderson @ 2018-07-31 10:28 UTC
  To: Eric Blake, qemu-devel
  Cc: thuth, armbru, mst, peter.maydell, alex.bennee, f4bug

On 07/30/2018 06:08 PM, Eric Blake wrote:
> In kill_qemu() we have an assert that checks that the QEMU process
> didn't dump core:
>             assert(!WCOREDUMP(wstatus));
> 
> Unfortunately the WCOREDUMP macro here means the resulting message
> is not very easy to comprehend on at least some systems:
> 
> ahci-test: tests/libqtest.c:113: kill_qemu: Assertion `!(((__extension__ (((union { __typeof(wstatus) __in; int __i; }) { .__in = (wstatus) }).__i))) & 0x80)' failed.
> 
> and it doesn't identify what signal the process took.
> 
> Furthermore, we are NOT detecting EINTR (while EINTR shouldn't be
> happening if we didn't install signal handlers, it's still better
> to always be robust), and also want to log unexpected non-zero status
> that was not accompanied by a core dump.
> 
> Instead of using a raw assert, print the information in an
> easier to understand way:
> 
> /i386/ahci/sanity: tests/libqtest.c:119: kill_qemu() detected QEMU death with core dump from signal 11 (Segmentation fault)
> Aborted (core dumped)
> 
> (Of course, the really useful information would be why the QEMU
> process dumped core in the first place, but we don't have that
> by the time the test program has picked up the exit status.)
> 
> Suggested-by: Peter Maydell <peter.maydell@linaro.org>
> Signed-off-by: Eric Blake <eblake@redhat.com>
> 
> ---
> v3: use TFR() instead of open-coding the retry loop [Thomas]
> ---
>  tests/libqtest.c | 37 ++++++++++++++++++++++++++++++++++---
>  1 file changed, 34 insertions(+), 3 deletions(-)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~

* Re: [Qemu-devel] [PATCH v3 for-3.0] tests/libqtest: Improve kill_qemu()
  2018-07-30 22:08 [Qemu-devel] [PATCH v3 for-3.0] tests/libqtest: Improve kill_qemu() Eric Blake
  2018-07-31 10:28 ` Richard Henderson
@ 2018-08-01  7:59 ` Thomas Huth
  2018-08-03 15:51 ` Markus Armbruster
  2 siblings, 0 replies; 5+ messages in thread
From: Thomas Huth @ 2018-08-01  7:59 UTC
  To: Eric Blake, qemu-devel
  Cc: armbru, mst, peter.maydell, alex.bennee, f4bug, rth

On 07/31/2018 12:08 AM, Eric Blake wrote:
> In kill_qemu() we have an assert that checks that the QEMU process
> didn't dump core:
>             assert(!WCOREDUMP(wstatus));
> 
> Unfortunately the WCOREDUMP macro here means the resulting message
> is not very easy to comprehend on at least some systems:
> 
> ahci-test: tests/libqtest.c:113: kill_qemu: Assertion `!(((__extension__ (((union { __typeof(wstatus) __in; int __i; }) { .__in = (wstatus) }).__i))) & 0x80)' failed.
> 
> and it doesn't identify what signal the process took.
> 
> Furthermore, we are NOT detecting EINTR (while EINTR shouldn't be
> happening if we didn't install signal handlers, it's still better
> to always be robust), and also want to log unexpected non-zero status
> that was not accompanied by a core dump.
> 
> Instead of using a raw assert, print the information in an
> easier to understand way:
> 
> /i386/ahci/sanity: tests/libqtest.c:119: kill_qemu() detected QEMU death with core dump from signal 11 (Segmentation fault)
> Aborted (core dumped)
> 
> (Of course, the really useful information would be why the QEMU
> process dumped core in the first place, but we don't have that
> by the time the test program has picked up the exit status.)
> 
> Suggested-by: Peter Maydell <peter.maydell@linaro.org>
> Signed-off-by: Eric Blake <eblake@redhat.com>

Reviewed-by: Thomas Huth <thuth@redhat.com>

* Re: [Qemu-devel] [PATCH v3 for-3.0] tests/libqtest: Improve kill_qemu()
  2018-07-30 22:08 [Qemu-devel] [PATCH v3 for-3.0] tests/libqtest: Improve kill_qemu() Eric Blake
  2018-07-31 10:28 ` Richard Henderson
  2018-08-01  7:59 ` Thomas Huth
@ 2018-08-03 15:51 ` Markus Armbruster
  2018-08-03 17:11   ` Eric Blake
  2 siblings, 1 reply; 5+ messages in thread
From: Markus Armbruster @ 2018-08-03 15:51 UTC
  To: Eric Blake
  Cc: qemu-devel, peter.maydell, thuth, mst, f4bug, armbru, alex.bennee, rth

Eric Blake <eblake@redhat.com> writes:

> In kill_qemu() we have an assert that checks that the QEMU process
> didn't dump core:
>             assert(!WCOREDUMP(wstatus));
>
> Unfortunately the WCOREDUMP macro here means the resulting message
> is not very easy to comprehend on at least some systems:
>
> ahci-test: tests/libqtest.c:113: kill_qemu: Assertion `!(((__extension__ (((union { __typeof(wstatus) __in; int __i; }) { .__in = (wstatus) }).__i))) & 0x80)' failed.
>
> and it doesn't identify what signal the process took.
>
> Furthermore, we are NOT detecting EINTR (while EINTR shouldn't be
> happening if we didn't install signal handlers, it's still better
> to always be robust), and also want to log unexpected non-zero status
> that was not accompanied by a core dump.
>
> Instead of using a raw assert, print the information in an
> easier to understand way:
>
> /i386/ahci/sanity: tests/libqtest.c:119: kill_qemu() detected QEMU death with core dump from signal 11 (Segmentation fault)
> Aborted (core dumped)
>
> (Of course, the really useful information would be why the QEMU
> process dumped core in the first place, but we don't have that
> by the time the test program has picked up the exit status.)
>
> Suggested-by: Peter Maydell <peter.maydell@linaro.org>
> Signed-off-by: Eric Blake <eblake@redhat.com>
>
> ---
> v3: use TFR() instead of open-coding the retry loop [Thomas]
> ---
>  tests/libqtest.c | 37 ++++++++++++++++++++++++++++++++++---
>  1 file changed, 34 insertions(+), 3 deletions(-)
>
> diff --git a/tests/libqtest.c b/tests/libqtest.c
> index 098af6aec44..3aa3e4c2a46 100644
> --- a/tests/libqtest.c
> +++ b/tests/libqtest.c
> @@ -105,12 +105,43 @@ static void kill_qemu(QTestState *s)
>      if (s->qemu_pid != -1) {
>          int wstatus = 0;
>          pid_t pid;
> +        bool die = false;
>
>          kill(s->qemu_pid, SIGTERM);
> -        pid = waitpid(s->qemu_pid, &wstatus, 0);
> +        TFR(pid = waitpid(s->qemu_pid, &wstatus, 0));
>
> -        if (pid == s->qemu_pid && WIFSIGNALED(wstatus)) {
> -            assert(!WCOREDUMP(wstatus));
> +        assert(pid == s->qemu_pid);
> +        /*
> +         * We expect qemu to exit with status 0; anything else is
> +         * fishy and should be logged.  Abort except when death by
> +         * signal is not accompanied by a coredump (as that's the only
> +         * time it was likely that the user is trying to kill the
> +         * testsuite early).
> +         */
> +        if (wstatus) {
> +            die = true;
> +            if (WIFEXITED(wstatus)) {
> +                fprintf(stderr, "%s:%d: kill_qemu() tried to terminate QEMU "
> +                        "process but encountered exit status %d\n",
> +                        __FILE__, __LINE__, WEXITSTATUS(wstatus));
> +            } else if (WIFSIGNALED(wstatus)) {
> +                int sig = WTERMSIG(wstatus);
> +                const char *signame = strsignal(sig) ?: "unknown ???";
> +
> +                if (!WCOREDUMP(wstatus)) {
> +                    die = false;
> +                    fprintf(stderr, "%s:%d: kill_qemu() ignoring QEMU death "
> +                            "by signal %d (%s)\n",
> +                            __FILE__, __LINE__, sig, signame);
> +                } else {
> +                    fprintf(stderr, "%s:%d: kill_qemu() detected QEMU death "
> +                            "with core dump from signal %d (%s)\n",
> +                            __FILE__, __LINE__, sig, signame);
> +                }
> +            }
> +        }
> +        if (die) {
> +            abort();
>          }
>      }
>  }

In review of v2, we found that WCOREDUMP() depends on the user's
environment.  I doubt we should use it this way.

Why is suppressing the abort() when "the user is trying to kill the
testsuite early" important?

* Re: [Qemu-devel] [PATCH v3 for-3.0] tests/libqtest: Improve kill_qemu()
  2018-08-03 15:51 ` Markus Armbruster
@ 2018-08-03 17:11   ` Eric Blake
  0 siblings, 0 replies; 5+ messages in thread
From: Eric Blake @ 2018-08-03 17:11 UTC
  To: Markus Armbruster
  Cc: qemu-devel, peter.maydell, thuth, mst, f4bug, alex.bennee, rth

On 08/03/2018 10:51 AM, Markus Armbruster wrote:
> Eric Blake <eblake@redhat.com> writes:
> 
>> In kill_qemu() we have an assert that checks that the QEMU process
>> didn't dump core:
>>              assert(!WCOREDUMP(wstatus));
>>
>> Unfortunately the WCOREDUMP macro here means the resulting message
>> is not very easy to comprehend on at least some systems:
>>
>> ahci-test: tests/libqtest.c:113: kill_qemu: Assertion `!(((__extension__ (((union { __typeof(wstatus) __in; int __i; }) { .__in = (wstatus) }).__i))) & 0x80)' failed.
>>
>> and it doesn't identify what signal the process took.
>>
>> -        if (pid == s->qemu_pid && WIFSIGNALED(wstatus)) {
>> -            assert(!WCOREDUMP(wstatus));
>> +        assert(pid == s->qemu_pid);
>> +        /*
>> +         * We expect qemu to exit with status 0; anything else is
>> +         * fishy and should be logged.  Abort except when death by
>> +         * signal is not accompanied by a coredump (as that's the only
>> +         * time it was likely that the user is trying to kill the
>> +         * testsuite early).
>> +         */
>> +        if (wstatus) {
>> +            die = true;
>> +            if (WIFEXITED(wstatus)) {
>> +                fprintf(stderr, "%s:%d: kill_qemu() tried to terminate QEMU "
>> +                        "process but encountered exit status %d\n",
>> +                        __FILE__, __LINE__, WEXITSTATUS(wstatus));
>> +            } else if (WIFSIGNALED(wstatus)) {
>> +                int sig = WTERMSIG(wstatus);

> 
> In review of v2, we found that WCOREDUMP() depends on the user's
> environment.  I doubt we should use it this way.
> 
> Why is suppressing the abort() when "the user is trying to kill the
> testsuite early" important?
> 

Only because the old code special-cased on WCOREDUMP().  I'm not
particularly attached to that, and am fine submitting a v4 that quits
on ALL death-by-signal, regardless of whether a core dump was present.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org
