* [Qemu-devel] Consistency of iotests 093 and 136
@ 2019-01-23 17:00 Max Reitz
  2019-01-24 10:11 ` Alberto Garcia
  0 siblings, 1 reply; 8+ messages in thread
From: Max Reitz @ 2019-01-23 17:00 UTC (permalink / raw)
  To: Qemu-block; +Cc: qemu-devel, Alberto Garcia

Hi,

093 and 136 seem really flaky to me.  I can reproduce that by running:

$ dd if=/dev/urandom of=/dev/null

in as many shells as I have CPU cores, and then run the tests:

$ while TEST_DIR=/tmp/t0 ./check -T -raw 93; do :; done

or

$ while TEST_DIR=/tmp/t0 ./check -T -raw 136; do :; done

which usually fail after one or two iterations.


The exact failures vary, but for 093 it's usually something that ends with:

[...]
    self.assertTrue(check_limit(params['iops'], rd_iops + wr_iops))
AssertionError: False is not true

Or:

[...]
    self.assertTrue(check_limit(params['iops_rd'], rd_iops))
AssertionError: False is not true

etc. -- so the 10 % error range doesn't seem to be enough, I'd say.  But
will just increasing it solve the problem?
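
For reference, the kind of tolerance check being discussed can be sketched
as below.  This is a simplified illustration, not the code from 093: the
function signature, the `seconds` and `tolerance` parameters and the
sample numbers are assumptions made for the example.

```python
def check_limit(limit, num, seconds=1, tolerance=0.10):
    """Return True if num operations is within +/- tolerance of the
    amount a throttle of `limit` ops/s would allow in `seconds`.

    A limit of 0 means "unthrottled", so any count is accepted.
    """
    if limit == 0:
        return True
    expected = limit * seconds
    return expected * (1 - tolerance) < num < expected * (1 + tolerance)


# With a 100 IOPS limit over 1 second, 95 completed requests pass,
# while 85 fall outside the 10 % window.
within = check_limit(100, 95)
outside = check_limit(100, 85)
```

Widening `tolerance` would make the assertion pass more often, but it
would not help if the requests simply have not been processed yet when
the statistics are read.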


And for 136 it's usually (always?):

[...]
  File "136", line 278, in do_test_stats
    self.check_values()
  File "136", line 204, in check_values
    self.assertLess(0, stats['idle_time_ns'])
AssertionError: 0 not less than 0


Any ideas on making these more reliable?


Max



* Re: [Qemu-devel] Consistency of iotests 093 and 136
  2019-01-23 17:00 [Qemu-devel] Consistency of iotests 093 and 136 Max Reitz
@ 2019-01-24 10:11 ` Alberto Garcia
  2019-01-24 14:34   ` Alberto Garcia
  0 siblings, 1 reply; 8+ messages in thread
From: Alberto Garcia @ 2019-01-24 10:11 UTC (permalink / raw)
  To: Max Reitz, Qemu-block; +Cc: qemu-devel

On Wed 23 Jan 2019 06:00:49 PM CET, Max Reitz wrote:
> Hi,
>
> 093 and 136 seem really flaky to me.  I can reproduce that by running:

That's interesting, I can make 093 fail quite easily now (I haven't
tested the other one yet), but I don't think this happened earlier. I'll
try to figure out what's going on.

Berto


* Re: [Qemu-devel] Consistency of iotests 093 and 136
  2019-01-24 10:11 ` Alberto Garcia
@ 2019-01-24 14:34   ` Alberto Garcia
  2019-01-24 18:07     ` Eric Blake
  0 siblings, 1 reply; 8+ messages in thread
From: Alberto Garcia @ 2019-01-24 14:34 UTC (permalink / raw)
  To: Max Reitz, Qemu-block; +Cc: qemu-devel, Peter Xu

On Thu 24 Jan 2019 11:11:06 AM CET, Alberto Garcia wrote:
> On Wed 23 Jan 2019 06:00:49 PM CET, Max Reitz wrote:
>> Hi,
>>
>> 093 and 136 seem really flaky to me.  I can reproduce that by running:
>
> That's interesting, I can make 093 fail quite easily now (I haven't
> tested the other one yet), but I don't think this happened
> earlier. I'll try to figure out what's going on.

I bisected this and it seems that 093 started to fail after this:

8258292e monitor: Remove "x-oob", offer capability "oob" unconditionally

I'm not familiar with that option so I need to investigate.

Berto


* Re: [Qemu-devel] Consistency of iotests 093 and 136
  2019-01-24 14:34   ` Alberto Garcia
@ 2019-01-24 18:07     ` Eric Blake
  2019-01-28 15:18       ` Alberto Garcia
  0 siblings, 1 reply; 8+ messages in thread
From: Eric Blake @ 2019-01-24 18:07 UTC (permalink / raw)
  To: Alberto Garcia, Max Reitz, Qemu-block
  Cc: qemu-devel, Peter Xu, Markus Armbruster

On 1/24/19 8:34 AM, Alberto Garcia wrote:
> On Thu 24 Jan 2019 11:11:06 AM CET, Alberto Garcia wrote:
>> On Wed 23 Jan 2019 06:00:49 PM CET, Max Reitz wrote:
>>> Hi,
>>>
>>> 093 and 136 seem really flaky to me.  I can reproduce that by running:
>>
>> That's interesting, I can make 093 fail quite easily now (I haven't
>> tested the other one yet), but I don't think this happened
>> earlier. I'll try to figure out what's going on.
> 
> I bisected this and it seems that 093 started to fail after this:
> 
> 8258292e monitor: Remove "x-oob", offer capability "oob" unconditionally
> 
> I'm not familiar with that option so I need to investigate.

We've got several tests failing after making x-oob unconditional; here's
another thread:

https://lists.gnu.org/archive/html/qemu-devel/2019-01/msg05587.html

Could it be that the test was using some sort of QMP command as an
attempt to synchronize state, but the OOB handling is now making it not
a reliable sync point?

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org



* Re: [Qemu-devel] Consistency of iotests 093 and 136
  2019-01-24 18:07     ` Eric Blake
@ 2019-01-28 15:18       ` Alberto Garcia
  2019-01-28 18:38         ` Markus Armbruster
  0 siblings, 1 reply; 8+ messages in thread
From: Alberto Garcia @ 2019-01-28 15:18 UTC (permalink / raw)
  To: Eric Blake, Max Reitz, Qemu-block; +Cc: qemu-devel, Peter Xu, Markus Armbruster

On Thu 24 Jan 2019 07:07:47 PM CET, Eric Blake wrote:
>>>> 093 and 136 seem really flaky to me.  I can reproduce that by
>>>> running:
>>>
>>> That's interesting, I can make 093 fail quite easily now (I haven't
>>> tested the other one yet), but I don't think this happened
>>> earlier. I'll try to figure out what's going on.
>> 
>> I bisected this and it seems that 093 started to fail after this:
>> 
>> 8258292e monitor: Remove "x-oob", offer capability "oob" unconditionally
>> 
>> I'm not familiar with that option so I need to investigate.
>
> We've got several tests failing after making x-oob unconditional; here's
> another thread:
>
> https://lists.gnu.org/archive/html/qemu-devel/2019-01/msg05587.html
>
> Could it be that the test was using some sort of QMP command as an
> attempt to synchronize state, but the OOB handling is now making it not
> a reliable sync point?

093 submits several I/O requests using aio_read and aio_write with
hmp_qemu_io(), then advances the clock using clock_step and finally
calls query-blockstats to see how much of the I/O has been completed
(it's an I/O throttling test).

The expectation is that by the time query-blockstats is called all
submitted I/O requests have been processed (up to the amount allowed by
the throttling limits).

Are the QMP (hmp_qemu_io, query-blockstats) and qtest (clock_step)
sockets maybe running in different threads?

Berto


* Re: [Qemu-devel] Consistency of iotests 093 and 136
  2019-01-28 15:18       ` Alberto Garcia
@ 2019-01-28 18:38         ` Markus Armbruster
  2019-01-29 10:03           ` Alberto Garcia
  0 siblings, 1 reply; 8+ messages in thread
From: Markus Armbruster @ 2019-01-28 18:38 UTC (permalink / raw)
  To: Alberto Garcia; +Cc: Eric Blake, Max Reitz, Qemu-block, qemu-devel, Peter Xu

Alberto Garcia <berto@igalia.com> writes:

> On Thu 24 Jan 2019 07:07:47 PM CET, Eric Blake wrote:
>>>>> 093 and 136 seem really flaky to me.  I can reproduce that by
>>>>> running:
>>>>
>>>> That's interesting, I can make 093 fail quite easily now (I haven't
>>>> tested the other one yet), but I don't think this happened
>>>> earlier. I'll try to figure out what's going on.
>>> 
>>> I bisected this and it seems that 093 started to fail after this:
>>> 
>>> 8258292e monitor: Remove "x-oob", offer capability "oob" unconditionally
>>> 
>>> I'm not familiar with that option so I need to investigate.
>>
>> We've got several tests failing after making x-oob unconditional; here's
>> another thread:
>>
>> https://lists.gnu.org/archive/html/qemu-devel/2019-01/msg05587.html
>>
>> Could it be that the test was using some sort of QMP command as an
>> attempt to synchronize state, but the OOB handling is now making it not
>> a reliable sync point?
>
> 093 submits several I/O requests using aio_read and aio_write with
> hmp_qemu_io(), then advances the clock using clock_step and finally
> calls query-blockstats to see how much of the I/O has been completed
> (it's an I/O throttling test).
>
> The expectation is that by the time query-blockstats is called all
> submitted I/O requests have been processed (up to the amount allowed by
> the throttling limits).

Assumptions like "when we see the reply to QMP command X, surely the
main loop has completed doing Y" are problematic.  When possible, rely
on something more direct, such as a query command that shows you whether
Y has been completed.
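
A minimal sketch of that "ask directly" approach: poll a query until it
reports the state you are waiting for, with a timeout.  Everything here
is hypothetical scaffolding for illustration.  `wait_until` and
`FakeStats` are made-up names, and the fake query merely stands in for
something like query-blockstats.

```python
import time


def wait_until(query, done, timeout=5.0, interval=0.05):
    """Call query() repeatedly until done(result) is true.

    Returns the last result on success and raises TimeoutError otherwise.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = query()
        if done(result):
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not reached within %.1f s" % timeout)
        time.sleep(interval)


# Hypothetical stand-in for a query command: each call reports one more
# completed request, as if the main loop were making progress.
class FakeStats:
    def __init__(self):
        self.completed = 0

    def query(self):
        self.completed += 1
        return {"rd_operations": self.completed}


stats = FakeStats()
final = wait_until(stats.query, lambda r: r["rd_operations"] >= 3,
                   interval=0.0)
```

The point of the design is that the test waits on the observed state
rather than on the arrival of some unrelated reply.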

Mind, I'm not claiming the assumption you described is invalid.

> Are the QMP (hmp_qemu_io, query-blockstats) and qtest (clock_step)
> sockets maybe running in different threads?

First order approximation: the QMP *monitors* run in the monitor I/O
thread (as of commit 8258292e18c), but the QMP *commands* still run in
the main loop.  The QMP monitor suspends itself after reading a command
and sending it to the main loop.  The main loop resumes the monitor
after sending the reply.

Before this change, the QMP monitors also ran in the main loop, and
executed each command right after reading it.  The monitor suspend /
resume described above is designed to minimize observable differences in
behavior.

A more exact version follows; it may not be relevant to you now.

The QMP monitors run in the monitor I/O thread when the underlying
character device can support that.  The character devices you typically
want to use with QMP, such as socket, all can.  Some character devices
can't, e.g. ringbuf, spice, braille, MUX.

Monitors offer capability "oob" when running in the I/O thread.  For
instance:

    $ qemu-system-x86_64 -nodefaults -S -display none -qmp-pretty stdio
    {
        "QMP": {
            "version": {
                "qemu": {
                    "micro": 50,
                    "minor": 1,
                    "major": 3
                },
                "package": "v3.1.0-1200-g9dd0d8111f"
            },
            "capabilities": [
-->             "oob"
            ]
        }
    }

QMP commands still run in the main loop.  The monitor reads commands and
sends them to the main loop.  The main loop executes them one after the
other, and sends replies.  The number of commands in flight is limited.

Unless the client accepted capability "oob", the limit is one.

Clients that accepted capability "oob" can execute oob-capable commands
out of band.  The monitor executes them right away, jumping the queue.
The only commands that can be executed out of band so far are
migrate-recover and migrate-pause.  See docs/interop/qmp-spec.txt for
more detailed information.
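
As a rough sketch of the client side of this negotiation: after the
greeting, the client sends qmp_capabilities and may list "oob" under
"enable".  The helper names below are made up for illustration, and the
greeting is a trimmed copy of the one shown earlier in this message.

```python
import json


def offers_oob(greeting):
    """Return True if a parsed QMP greeting lists the "oob" capability."""
    return "oob" in greeting.get("QMP", {}).get("capabilities", [])


def capabilities_command(greeting, want_oob):
    """Build the qmp_capabilities command a client sends after the greeting.

    "oob" is only enabled when the client wants it and the server offers it.
    """
    cmd = {"execute": "qmp_capabilities"}
    if want_oob and offers_oob(greeting):
        cmd["arguments"] = {"enable": ["oob"]}
    return json.dumps(cmd)


# Greeting as in the example earlier in this message (version details trimmed).
greeting = json.loads('{"QMP": {"version": {}, "capabilities": ["oob"]}}')

plain = capabilities_command(greeting, want_oob=False)
with_oob = capabilities_command(greeting, want_oob=True)
```

A client that sends the plain form has not accepted "oob", so for it the
limit of one command in flight applies.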


* Re: [Qemu-devel] Consistency of iotests 093 and 136
  2019-01-28 18:38         ` Markus Armbruster
@ 2019-01-29 10:03           ` Alberto Garcia
  2019-01-29 12:11             ` Markus Armbruster
  0 siblings, 1 reply; 8+ messages in thread
From: Alberto Garcia @ 2019-01-29 10:03 UTC (permalink / raw)
  To: Markus Armbruster; +Cc: Eric Blake, Max Reitz, Qemu-block, qemu-devel, Peter Xu

On Mon 28 Jan 2019 07:38:08 PM CET, Markus Armbruster wrote:

>> 093 submits several I/O requests using aio_read and aio_write with
>> hmp_qemu_io(), then advances the clock using clock_step and finally
>> calls query-blockstats to see how much of the I/O has been completed
>> (it's an I/O throttling test).
>>
>> The expectation is that by the time query-blockstats is called all
>> submitted I/O requests have been processed (up to the amount allowed
>> by the throttling limits).
>
> Assumptions like "when we see the reply to QMP command X, surely the
> main loop has completed doing Y" are problematic.  When possible, rely
> on something more direct, such as a query command that shows you
> whether Y has been completed.

Right, but how to do that for aio_read / aio_write ?

Berto


* Re: [Qemu-devel] Consistency of iotests 093 and 136
  2019-01-29 10:03           ` Alberto Garcia
@ 2019-01-29 12:11             ` Markus Armbruster
  0 siblings, 0 replies; 8+ messages in thread
From: Markus Armbruster @ 2019-01-29 12:11 UTC (permalink / raw)
  To: Alberto Garcia; +Cc: Peter Xu, qemu-devel, Qemu-block, Max Reitz

Alberto Garcia <berto@igalia.com> writes:

> On Mon 28 Jan 2019 07:38:08 PM CET, Markus Armbruster wrote:
>
>>> 093 submits several I/O requests using aio_read and aio_write with
>>> hmp_qemu_io(), then advances the clock using clock_step and finally
>>> calls query-blockstats to see how much of the I/O has been completed
>>> (it's an I/O throttling test).
>>>
>>> The expectation is that by the time query-blockstats is called all
>>> submitted I/O requests have been processed (up to the amount allowed
>>> by the throttling limits).
>>
>> Assumptions like "when we see the reply to QMP command X, surely the
>> main loop has completed doing Y" are problematic.  When possible, rely
>> on something more direct, such as a query command that shows you
>> whether Y has been completed.
>
> Right, but how to do that for aio_read / aio_write ?

Fair question.

What exactly do you need to wait for?
