All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/3] iotests: Fix intermittent 030 hang
@ 2020-03-13  8:36 Kevin Wolf
  2020-03-13  8:36 ` [PATCH 1/3] iotests.py: Enable faulthandler Kevin Wolf
                   ` (4 more replies)
  0 siblings, 5 replies; 12+ messages in thread
From: Kevin Wolf @ 2020-03-13  8:36 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, jsnow, qemu-devel, mreitz

Peter ran into a 030 hang while testing a pull request. This turned out
to be two bugs in the test suite at once: First was the test failing
because a timeout was apparently too short, second was that the timeout
would actually cause the test to hang instead of failing. This series
should fix both.

Kevin Wolf (3):
  iotests.py: Enable faulthandler
  python/qemu: Kill QEMU process if 'quit' doesn't work
  iotests: Increase pause_wait() timeout

 python/qemu/machine.py        | 1 +
 tests/qemu-iotests/iotests.py | 5 ++++-
 2 files changed, 5 insertions(+), 1 deletion(-)

-- 
2.20.1



^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH 1/3] iotests.py: Enable faulthandler
  2020-03-13  8:36 [PATCH 0/3] iotests: Fix intermittent 030 hang Kevin Wolf
@ 2020-03-13  8:36 ` Kevin Wolf
  2020-03-16 19:56   ` John Snow
  2020-03-13  8:36 ` [PATCH 2/3] python/qemu: Kill QEMU process if 'quit' doesn't work Kevin Wolf
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 12+ messages in thread
From: Kevin Wolf @ 2020-03-13  8:36 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, jsnow, qemu-devel, mreitz

With this, you can send SIGABRT to a hanging test case and you'll get a
Python stack trace so you know where it was hanging.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 tests/qemu-iotests/iotests.py | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
index 23043baa26..b859c303a2 100644
--- a/tests/qemu-iotests/iotests.py
+++ b/tests/qemu-iotests/iotests.py
@@ -30,12 +30,15 @@ import logging
 import atexit
 import io
 from collections import OrderedDict
+import faulthandler
 
 sys.path.append(os.path.join(os.path.dirname(__file__), '..', '..', 'python'))
 from qemu import qtest
 
 assert sys.version_info >= (3,6)
 
+faulthandler.enable()
+
 # This will not work if arguments contain spaces but is necessary if we
 # want to support the override options that ./check supports.
 qemu_img_args = [os.environ.get('QEMU_IMG_PROG', 'qemu-img')]
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 2/3] python/qemu: Kill QEMU process if 'quit' doesn't work
  2020-03-13  8:36 [PATCH 0/3] iotests: Fix intermittent 030 hang Kevin Wolf
  2020-03-13  8:36 ` [PATCH 1/3] iotests.py: Enable faulthandler Kevin Wolf
@ 2020-03-13  8:36 ` Kevin Wolf
  2020-03-16 19:57   ` John Snow
  2020-03-20 10:15   ` Philippe Mathieu-Daudé
  2020-03-13  8:36 ` [PATCH 3/3] iotests: Increase pause_wait() timeout Kevin Wolf
                   ` (2 subsequent siblings)
  4 siblings, 2 replies; 12+ messages in thread
From: Kevin Wolf @ 2020-03-13  8:36 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, jsnow, qemu-devel, mreitz

With a QEMU bug, it can happen that the QEMU process doesn't react to a
'quit' QMP command. If we got an exception during previous QMP
communication (e.g. iotests Timeout expiring), we could also be in an
inconsistent state where after sending 'quit' we immediately read an old
response and close the socket even though the 'quit' command wasn't
processed yet. Both cases would lead to a hanging test.

Fix this by waiting for the QEMU process to exit after sending 'quit'
with a timeout, and if it doesn't happen within three seconds, send
SIGKILL.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 python/qemu/machine.py | 1 +
 1 file changed, 1 insertion(+)

diff --git a/python/qemu/machine.py b/python/qemu/machine.py
index 183d8f3d38..c837ee8723 100644
--- a/python/qemu/machine.py
+++ b/python/qemu/machine.py
@@ -358,6 +358,7 @@ class QEMUMachine(object):
                     if not has_quit:
                         self._qmp.cmd('quit')
                     self._qmp.close()
+                    self._popen.wait(timeout=3)
                 except:
                     self._popen.kill()
             self._popen.wait()
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 3/3] iotests: Increase pause_wait() timeout
  2020-03-13  8:36 [PATCH 0/3] iotests: Fix intermittent 030 hang Kevin Wolf
  2020-03-13  8:36 ` [PATCH 1/3] iotests.py: Enable faulthandler Kevin Wolf
  2020-03-13  8:36 ` [PATCH 2/3] python/qemu: Kill QEMU process if 'quit' doesn't work Kevin Wolf
@ 2020-03-13  8:36 ` Kevin Wolf
  2020-03-16 19:58   ` John Snow
  2020-03-20 10:21   ` Philippe Mathieu-Daudé
  2020-03-20 10:12 ` [PATCH 0/3] iotests: Fix intermittent 030 hang Philippe Mathieu-Daudé
  2020-03-22 11:16 ` Peter Maydell
  4 siblings, 2 replies; 12+ messages in thread
From: Kevin Wolf @ 2020-03-13  8:36 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, jsnow, qemu-devel, mreitz

Waiting for only 1 second proved to be too short on a loaded system,
resulting in false positives when testing pull requests. Increase the
timeout a bit to make this less likely.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 tests/qemu-iotests/iotests.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
index b859c303a2..7bc4934cd2 100644
--- a/tests/qemu-iotests/iotests.py
+++ b/tests/qemu-iotests/iotests.py
@@ -925,7 +925,7 @@ class QMPTestCase(unittest.TestCase):
         self.assert_qmp(event, 'data/type', 'mirror')
 
     def pause_wait(self, job_id='job0'):
-        with Timeout(1, "Timeout waiting for job to pause"):
+        with Timeout(3, "Timeout waiting for job to pause"):
             while True:
                 result = self.vm.qmp('query-block-jobs')
                 found = False
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/3] iotests.py: Enable faulthandler
  2020-03-13  8:36 ` [PATCH 1/3] iotests.py: Enable faulthandler Kevin Wolf
@ 2020-03-16 19:56   ` John Snow
  0 siblings, 0 replies; 12+ messages in thread
From: John Snow @ 2020-03-16 19:56 UTC (permalink / raw)
  To: Kevin Wolf, qemu-block; +Cc: qemu-devel, mreitz



On 3/13/20 4:36 AM, Kevin Wolf wrote:
> With this, you can send SIGABRT to a hanging test case and you'll get a
> Python stack trace so you know where it was hanging.
> 
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>

Reviewed-by: John Snow <jsnow@redhat.com>



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 2/3] python/qemu: Kill QEMU process if 'quit' doesn't work
  2020-03-13  8:36 ` [PATCH 2/3] python/qemu: Kill QEMU process if 'quit' doesn't work Kevin Wolf
@ 2020-03-16 19:57   ` John Snow
  2020-03-20 10:15   ` Philippe Mathieu-Daudé
  1 sibling, 0 replies; 12+ messages in thread
From: John Snow @ 2020-03-16 19:57 UTC (permalink / raw)
  To: Kevin Wolf, qemu-block; +Cc: qemu-devel, mreitz



On 3/13/20 4:36 AM, Kevin Wolf wrote:
> With a QEMU bug, it can happen that the QEMU process doesn't react to a
> 'quit' QMP command. If we got an exception during previous QMP
> communication (e.g. iotests Timeout expiring), we could also be in an
> inconsistent state where after sending 'quit' we immediately read an old
> response and close the socket even though the 'quit' command wasn't
> processed yet. Both cases would lead to a hanging test.
> 
> Fix this by waiting for the QEMU process to exit after sending 'quit'
> with a timeout, and if it doesn't happen within three seconds, send
> SIGKILL.
> 
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>

Reviewed-by: John Snow <jsnow@redhat.com>

> ---
>  python/qemu/machine.py | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/python/qemu/machine.py b/python/qemu/machine.py
> index 183d8f3d38..c837ee8723 100644
> --- a/python/qemu/machine.py
> +++ b/python/qemu/machine.py
> @@ -358,6 +358,7 @@ class QEMUMachine(object):
>                      if not has_quit:
>                          self._qmp.cmd('quit')
>                      self._qmp.close()
> +                    self._popen.wait(timeout=3)
>                  except:
>                      self._popen.kill()
>              self._popen.wait()
> 

-- 
—js



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 3/3] iotests: Increase pause_wait() timeout
  2020-03-13  8:36 ` [PATCH 3/3] iotests: Increase pause_wait() timeout Kevin Wolf
@ 2020-03-16 19:58   ` John Snow
  2020-03-20 10:21   ` Philippe Mathieu-Daudé
  1 sibling, 0 replies; 12+ messages in thread
From: John Snow @ 2020-03-16 19:58 UTC (permalink / raw)
  To: Kevin Wolf, qemu-block; +Cc: qemu-devel, mreitz



On 3/13/20 4:36 AM, Kevin Wolf wrote:
> Waiting for only 1 second proved to be too short on a loaded system,
> resulting in false positives when testing pull requests. Increase the
> timeout a bit to make this less likely.
> 
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>

Reviewed-by: John Snow <jsnow@redhat.com>

> ---
>  tests/qemu-iotests/iotests.py | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
> index b859c303a2..7bc4934cd2 100644
> --- a/tests/qemu-iotests/iotests.py
> +++ b/tests/qemu-iotests/iotests.py
> @@ -925,7 +925,7 @@ class QMPTestCase(unittest.TestCase):
>          self.assert_qmp(event, 'data/type', 'mirror')
>  
>      def pause_wait(self, job_id='job0'):
> -        with Timeout(1, "Timeout waiting for job to pause"):
> +        with Timeout(3, "Timeout waiting for job to pause"):
>              while True:
>                  result = self.vm.qmp('query-block-jobs')
>                  found = False
> 



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 0/3] iotests: Fix intermittent 030 hang
  2020-03-13  8:36 [PATCH 0/3] iotests: Fix intermittent 030 hang Kevin Wolf
                   ` (2 preceding siblings ...)
  2020-03-13  8:36 ` [PATCH 3/3] iotests: Increase pause_wait() timeout Kevin Wolf
@ 2020-03-20 10:12 ` Philippe Mathieu-Daudé
  2020-03-22 11:16 ` Peter Maydell
  4 siblings, 0 replies; 12+ messages in thread
From: Philippe Mathieu-Daudé @ 2020-03-20 10:12 UTC (permalink / raw)
  To: Kevin Wolf, qemu-block; +Cc: jsnow, qemu-devel, mreitz

On 3/13/20 9:36 AM, Kevin Wolf wrote:
> Peter ran into a 030 hang while testing a pull request. This turned out
> to be two bugs in the test suite at once: First was the test failing
> because a timeout was apparently too short, second was that the timeout
> would actually cause the test to hang instead of failing. This series
> should fix both.
> 
> Kevin Wolf (3):
>    iotests.py: Enable faulthandler
>    python/qemu: Kill QEMU process if 'quit' doesn't work
>    iotests: Increase pause_wait() timeout
> 
>   python/qemu/machine.py        | 1 +
>   tests/qemu-iotests/iotests.py | 5 ++++-
>   2 files changed, 5 insertions(+), 1 deletion(-)
> 

Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 2/3] python/qemu: Kill QEMU process if 'quit' doesn't work
  2020-03-13  8:36 ` [PATCH 2/3] python/qemu: Kill QEMU process if 'quit' doesn't work Kevin Wolf
  2020-03-16 19:57   ` John Snow
@ 2020-03-20 10:15   ` Philippe Mathieu-Daudé
  1 sibling, 0 replies; 12+ messages in thread
From: Philippe Mathieu-Daudé @ 2020-03-20 10:15 UTC (permalink / raw)
  To: Kevin Wolf, qemu-block; +Cc: jsnow, qemu-devel, mreitz

On 3/13/20 9:36 AM, Kevin Wolf wrote:
> With a QEMU bug, it can happen that the QEMU process doesn't react to a
> 'quit' QMP command. If we got an exception during previous QMP
> communication (e.g. iotests Timeout expiring), we could also be in an
> inconsistent state where after sending 'quit' we immediately read an old
> response and close the socket even though the 'quit' command wasn't
> processed yet. Both cases would lead to a hanging test.
> 
> Fix this by waiting for the QEMU process to exit after sending 'quit'
> with a timeout, and if it doesn't happen within three seconds, send
> SIGKILL.
> 
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> ---
>   python/qemu/machine.py | 1 +
>   1 file changed, 1 insertion(+)
> 
> diff --git a/python/qemu/machine.py b/python/qemu/machine.py
> index 183d8f3d38..c837ee8723 100644
> --- a/python/qemu/machine.py
> +++ b/python/qemu/machine.py
> @@ -358,6 +358,7 @@ class QEMUMachine(object):
>                       if not has_quit:
>                           self._qmp.cmd('quit')
>                       self._qmp.close()
> +                    self._popen.wait(timeout=3)
>                   except:
>                       self._popen.kill()
>               self._popen.wait()
> 

Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 3/3] iotests: Increase pause_wait() timeout
  2020-03-13  8:36 ` [PATCH 3/3] iotests: Increase pause_wait() timeout Kevin Wolf
  2020-03-16 19:58   ` John Snow
@ 2020-03-20 10:21   ` Philippe Mathieu-Daudé
  2020-03-23 18:30     ` John Snow
  1 sibling, 1 reply; 12+ messages in thread
From: Philippe Mathieu-Daudé @ 2020-03-20 10:21 UTC (permalink / raw)
  To: Kevin Wolf, qemu-block; +Cc: jsnow, qemu-devel, mreitz

On 3/13/20 9:36 AM, Kevin Wolf wrote:
> Waiting for only 1 second proved to be too short on a loaded system,
> resulting in false positives when testing pull requests. Increase the
> timeout a bit to make this less likely.
> 
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> ---
>   tests/qemu-iotests/iotests.py | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
> index b859c303a2..7bc4934cd2 100644
> --- a/tests/qemu-iotests/iotests.py
> +++ b/tests/qemu-iotests/iotests.py
> @@ -925,7 +925,7 @@ class QMPTestCase(unittest.TestCase):
>           self.assert_qmp(event, 'data/type', 'mirror')
>   
>       def pause_wait(self, job_id='job0'):
> -        with Timeout(1, "Timeout waiting for job to pause"):
> +        with Timeout(3, "Timeout waiting for job to pause"):
>               while True:
>                   result = self.vm.qmp('query-block-jobs')
>                   found = False
> 

I wonder if this might be more accurate:

   load_timeout = math.ceil(os.getloadavg()[0])
   with Timeout(1 + load_timeout, "Timeout waiting for job to pause"):
     ...

Anyhow:
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 0/3] iotests: Fix intermittent 030 hang
  2020-03-13  8:36 [PATCH 0/3] iotests: Fix intermittent 030 hang Kevin Wolf
                   ` (3 preceding siblings ...)
  2020-03-20 10:12 ` [PATCH 0/3] iotests: Fix intermittent 030 hang Philippe Mathieu-Daudé
@ 2020-03-22 11:16 ` Peter Maydell
  4 siblings, 0 replies; 12+ messages in thread
From: Peter Maydell @ 2020-03-22 11:16 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: John Snow, QEMU Developers, Qemu-block, Max Reitz

On Fri, 13 Mar 2020 at 08:37, Kevin Wolf <kwolf@redhat.com> wrote:
>
> Peter ran into a 030 hang while testing a pull request. This turned out
> to be two bugs in the test suite at once: First was the test failing
> because a timeout was apparently too short, second was that the timeout
> would actually cause the test to hang instead of failing. This series
> should fix both.
>
> Kevin Wolf (3):
>   iotests.py: Enable faulthandler
>   python/qemu: Kill QEMU process if 'quit' doesn't work
>   iotests: Increase pause_wait() timeout
>
>  python/qemu/machine.py        | 1 +
>  tests/qemu-iotests/iotests.py | 5 ++++-
>  2 files changed, 5 insertions(+), 1 deletion(-)

Applied to master since I've been seeing this hang off-and-on.

thanks
-- PMM


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 3/3] iotests: Increase pause_wait() timeout
  2020-03-20 10:21   ` Philippe Mathieu-Daudé
@ 2020-03-23 18:30     ` John Snow
  0 siblings, 0 replies; 12+ messages in thread
From: John Snow @ 2020-03-23 18:30 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé, Kevin Wolf, qemu-block; +Cc: qemu-devel, mreitz



On 3/20/20 6:21 AM, Philippe Mathieu-Daudé wrote:
> On 3/13/20 9:36 AM, Kevin Wolf wrote:
>> Waiting for only 1 second proved to be too short on a loaded system,
>> resulting in false positives when testing pull requests. Increase the
>> timeout a bit to make this less likely.
>>
>> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
>> ---
>>   tests/qemu-iotests/iotests.py | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/tests/qemu-iotests/iotests.py
>> b/tests/qemu-iotests/iotests.py
>> index b859c303a2..7bc4934cd2 100644
>> --- a/tests/qemu-iotests/iotests.py
>> +++ b/tests/qemu-iotests/iotests.py
>> @@ -925,7 +925,7 @@ class QMPTestCase(unittest.TestCase):
>>           self.assert_qmp(event, 'data/type', 'mirror')
>>         def pause_wait(self, job_id='job0'):
>> -        with Timeout(1, "Timeout waiting for job to pause"):
>> +        with Timeout(3, "Timeout waiting for job to pause"):
>>               while True:
>>                   result = self.vm.qmp('query-block-jobs')
>>                   found = False
>>
> 
> I wonder if this might be more accurate:
> 
>   load_timeout = math.ceil(os.getloadavg()[0])
>   with Timeout(1 + load_timeout, "Timeout waiting for job to pause"):
>     ...
> 
> Anyhow:
> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
> 

They're just tests ... and this is just a worst-case timeout. I don't
think we need to get too cute with it. This only affects cases where the
test (or the binary) is broken and we have to wait without getting a
response for 3 seconds before being able to declare that the test is
indeed broken.

Optimizing this waiting time is probably not helpful; as when the test
is passing we'll never see this delay.

--js



^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2020-03-23 18:33 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-03-13  8:36 [PATCH 0/3] iotests: Fix intermittent 030 hang Kevin Wolf
2020-03-13  8:36 ` [PATCH 1/3] iotests.py: Enable faulthandler Kevin Wolf
2020-03-16 19:56   ` John Snow
2020-03-13  8:36 ` [PATCH 2/3] python/qemu: Kill QEMU process if 'quit' doesn't work Kevin Wolf
2020-03-16 19:57   ` John Snow
2020-03-20 10:15   ` Philippe Mathieu-Daudé
2020-03-13  8:36 ` [PATCH 3/3] iotests: Increase pause_wait() timeout Kevin Wolf
2020-03-16 19:58   ` John Snow
2020-03-20 10:21   ` Philippe Mathieu-Daudé
2020-03-23 18:30     ` John Snow
2020-03-20 10:12 ` [PATCH 0/3] iotests: Fix intermittent 030 hang Philippe Mathieu-Daudé
2020-03-22 11:16 ` Peter Maydell

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.