All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH v2 0/2] qemu-iotests: post-QEMU 2.12 fixes for 185
@ 2018-05-08 13:54 Stefan Hajnoczi
  2018-05-08 13:54 ` [Qemu-devel] [PATCH v2 1/2] qemu-iotests: reduce chance of races in 185 Stefan Hajnoczi
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Stefan Hajnoczi @ 2018-05-08 13:54 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, Kevin Wolf, Max Reitz, Jeff Cody, Stefan Hajnoczi

v2:
 * Rebase onto Jeff's block tree to get mirror ratelimit fix [Jeff, Vladimir]
 * Update comments in 185 since drain no longer causes a spurious iteration

The 185 qemu-iotests test case was in a bad state for the QEMU 2.12 release.
We fudged the expected test output to make it pass, except for
non-deterministic behavior.

These patches get us back to pre-QEMU 2.12.  Notably the offsets reported in
block job events now correspond to smaller buffer sizes because the job doesn't
iterate during drain.

It's worth mentioning that the test case is still non-deterministic.  For more
on that, see the first patch.

Stefan Hajnoczi (2):
  qemu-iotests: reduce chance of races in 185
  blockjob: do not cancel timer in resume

 blockjob.c                 | 22 +++++++++++++++-------
 tests/qemu-iotests/185     | 17 +++++++++++++----
 tests/qemu-iotests/185.out | 12 +++++-------
 3 files changed, 33 insertions(+), 18 deletions(-)

-- 
2.14.3

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Qemu-devel] [PATCH v2 1/2] qemu-iotests: reduce chance of races in 185
  2018-05-08 13:54 [Qemu-devel] [PATCH v2 0/2] qemu-iotests: post-QEMU 2.12 fixes for 185 Stefan Hajnoczi
@ 2018-05-08 13:54 ` Stefan Hajnoczi
  2018-05-08 14:26   ` Eric Blake
  2018-05-10 14:01   ` Vladimir Sementsov-Ogievskiy
  2018-05-08 13:54 ` [Qemu-devel] [PATCH v2 2/2] blockjob: do not cancel timer in resume Stefan Hajnoczi
  2018-05-14 11:40 ` [Qemu-devel] [PATCH v2 0/2] qemu-iotests: post-QEMU 2.12 fixes for 185 Jeff Cody
  2 siblings, 2 replies; 10+ messages in thread
From: Stefan Hajnoczi @ 2018-05-08 13:54 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-block, Kevin Wolf, Max Reitz, Jeff Cody, Stefan Hajnoczi,
	Vladimir Sementsov-Ogievskiy

Commit 8565c3ab537e78f3e69977ec2c609dc9417a806e ("qemu-iotests: fix
185") identified a race condition in a sub-test.

Similar issues also affect the other sub-tests.  If disk I/O completes
quickly, it races with the QMP 'quit' command.  This causes spurious
test failures because QMP events are emitted in an unpredictable order.

This test relies on QEMU internals and there is no QMP API for getting
deterministic behavior needed to make this test 100% reliable.  At the
same time, the test is useful and it would be a shame to remove it.

Add sleep 0.5 to reduce the chance of races.  This is not a real fix but
appears to reduce spurious failures in practice.

Cc: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 tests/qemu-iotests/185 | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/tests/qemu-iotests/185 b/tests/qemu-iotests/185
index 298d88d04e..975404c99d 100755
--- a/tests/qemu-iotests/185
+++ b/tests/qemu-iotests/185
@@ -118,6 +118,9 @@ _send_qemu_cmd $h \
                       'speed': 65536 } }" \
     "return"
 
+# If we don't sleep here 'quit' command races with disk I/O
+sleep 0.5
+
 _send_qemu_cmd $h "{ 'execute': 'quit' }" "return"
 wait=1 _cleanup_qemu
 
@@ -137,6 +140,9 @@ _send_qemu_cmd $h \
                       'speed': 65536 } }" \
     "return"
 
+# If we don't sleep here 'quit' command races with disk I/O
+sleep 0.5
+
 _send_qemu_cmd $h "{ 'execute': 'quit' }" "return"
 wait=1 _cleanup_qemu
 
@@ -183,6 +189,9 @@ _send_qemu_cmd $h \
                       'speed': 65536 } }" \
     "return"
 
+# If we don't sleep here 'quit' command races with disk I/O
+sleep 0.5
+
 _send_qemu_cmd $h "{ 'execute': 'quit' }" "return"
 wait=1 _cleanup_qemu
 
@@ -201,6 +210,9 @@ _send_qemu_cmd $h \
                       'speed': 65536 } }" \
     "return"
 
+# If we don't sleep here 'quit' command races with disk I/O
+sleep 0.5
+
 _send_qemu_cmd $h "{ 'execute': 'quit' }" "return"
 wait=1 _cleanup_qemu
 
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [Qemu-devel] [PATCH v2 2/2] blockjob: do not cancel timer in resume
  2018-05-08 13:54 [Qemu-devel] [PATCH v2 0/2] qemu-iotests: post-QEMU 2.12 fixes for 185 Stefan Hajnoczi
  2018-05-08 13:54 ` [Qemu-devel] [PATCH v2 1/2] qemu-iotests: reduce chance of races in 185 Stefan Hajnoczi
@ 2018-05-08 13:54 ` Stefan Hajnoczi
  2018-05-08 14:28   ` Eric Blake
  2018-05-14  8:35   ` QingFeng Hao
  2018-05-14 11:40 ` [Qemu-devel] [PATCH v2 0/2] qemu-iotests: post-QEMU 2.12 fixes for 185 Jeff Cody
  2 siblings, 2 replies; 10+ messages in thread
From: Stefan Hajnoczi @ 2018-05-08 13:54 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-block, Kevin Wolf, Max Reitz, Jeff Cody, Stefan Hajnoczi,
	QingFeng Hao

Currently the timer is cancelled and the block job is entered by
block_job_resume().  This behavior causes drain to run extra blockjob
iterations when the job was sleeping due to the ratelimit.

This patch leaves the job asleep when block_job_resume() is called.
Jobs can still be forcibly woken up using block_job_enter(), which is
used to cancel jobs.

After this patch drain no longer runs extra blockjob iterations.  This
is the expected behavior that qemu-iotests 185 used to rely on.  We
temporarily changed the 185 test output to make it pass for the QEMU
2.12 release but now it's time to address this issue.

Cc: QingFeng Hao <haoqf@linux.vnet.ibm.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 blockjob.c                 | 22 +++++++++++++++-------
 tests/qemu-iotests/185     |  5 +----
 tests/qemu-iotests/185.out | 12 +++++-------
 3 files changed, 21 insertions(+), 18 deletions(-)

diff --git a/blockjob.c b/blockjob.c
index 27f957e571..70643890be 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -209,6 +209,18 @@ static void block_job_txn_del_job(BlockJob *job)
     }
 }
 
+/* Assumes the block_job_mutex is held */
+static bool block_job_timer_pending(BlockJob *job)
+{
+    return timer_pending(&job->sleep_timer);
+}
+
+/* Assumes the block_job_mutex is held */
+static bool block_job_timer_not_pending(BlockJob *job)
+{
+    return !block_job_timer_pending(job);
+}
+
 static void block_job_pause(BlockJob *job)
 {
     job->pause_count++;
@@ -221,7 +233,9 @@ static void block_job_resume(BlockJob *job)
     if (job->pause_count) {
         return;
     }
-    block_job_enter(job);
+
+    /* kick only if no timer is pending */
+    block_job_enter_cond(job, block_job_timer_not_pending);
 }
 
 void block_job_ref(BlockJob *job)
@@ -651,12 +665,6 @@ static void block_job_completed_txn_success(BlockJob *job)
     }
 }
 
-/* Assumes the block_job_mutex is held */
-static bool block_job_timer_pending(BlockJob *job)
-{
-    return timer_pending(&job->sleep_timer);
-}
-
 void block_job_set_speed(BlockJob *job, int64_t speed, Error **errp)
 {
     Error *local_err = NULL;
diff --git a/tests/qemu-iotests/185 b/tests/qemu-iotests/185
index 975404c99d..9a2d317414 100755
--- a/tests/qemu-iotests/185
+++ b/tests/qemu-iotests/185
@@ -101,14 +101,11 @@ echo
 # command to be received (after receiving the command, the rest runs
 # synchronously, so jobs can arbitrarily continue or complete).
 #
-# Jobs present while QEMU is terminating iterate once more due to
-# bdrv_drain_all().
-#
 # The buffer size for commit and streaming is 512k (waiting for 8 seconds after
 # the first request), for active commit and mirror it's large enough to cover
 # the full 4M, and for backup it's the qcow2 cluster size, which we know is
 # 64k. As all of these are at least as large as the speed, we are sure that the
-# offset advances exactly twice before qemu exits.
+# offset advances exactly once before qemu exits.
 
 _send_qemu_cmd $h \
     "{ 'execute': 'block-commit',
diff --git a/tests/qemu-iotests/185.out b/tests/qemu-iotests/185.out
index 992162f418..57eaf8d699 100644
--- a/tests/qemu-iotests/185.out
+++ b/tests/qemu-iotests/185.out
@@ -20,7 +20,7 @@ Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 backing_file=TEST_DIR/t.q
 {"return": {}}
 {"return": {}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "SHUTDOWN", "data": {"guest": false}}
-{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "BLOCK_JOB_CANCELLED", "data": {"device": "disk", "len": 67108864, "offset": 1048576, "speed": 65536, "type": "commit"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "BLOCK_JOB_CANCELLED", "data": {"device": "disk", "len": 67108864, "offset": 524288, "speed": 65536, "type": "commit"}}
 
 === Start active commit job and exit qemu ===
 
@@ -28,8 +28,7 @@ Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 backing_file=TEST_DIR/t.q
 {"return": {}}
 {"return": {}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "SHUTDOWN", "data": {"guest": false}}
-{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "BLOCK_JOB_READY", "data": {"device": "disk", "len": 4194304, "offset": 4194304, "speed": 65536, "type": "commit"}}
-{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "BLOCK_JOB_COMPLETED", "data": {"device": "disk", "len": 4194304, "offset": 4194304, "speed": 65536, "type": "commit"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "BLOCK_JOB_CANCELLED", "data": {"device": "disk", "len": 4194304, "offset": 4194304, "speed": 65536, "type": "commit"}}
 
 === Start mirror job and exit qemu ===
 
@@ -38,8 +37,7 @@ Formatting 'TEST_DIR/t.qcow2.copy', fmt=qcow2 size=67108864 cluster_size=65536 l
 {"return": {}}
 {"return": {}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "SHUTDOWN", "data": {"guest": false}}
-{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "BLOCK_JOB_READY", "data": {"device": "disk", "len": 4194304, "offset": 4194304, "speed": 65536, "type": "mirror"}}
-{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "BLOCK_JOB_COMPLETED", "data": {"device": "disk", "len": 4194304, "offset": 4194304, "speed": 65536, "type": "mirror"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "BLOCK_JOB_CANCELLED", "data": {"device": "disk", "len": 4194304, "offset": 4194304, "speed": 65536, "type": "mirror"}}
 
 === Start backup job and exit qemu ===
 
@@ -48,7 +46,7 @@ Formatting 'TEST_DIR/t.qcow2.copy', fmt=qcow2 size=67108864 cluster_size=65536 l
 {"return": {}}
 {"return": {}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "SHUTDOWN", "data": {"guest": false}}
-{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "BLOCK_JOB_CANCELLED", "data": {"device": "disk", "len": 67108864, "offset": 131072, "speed": 65536, "type": "backup"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "BLOCK_JOB_CANCELLED", "data": {"device": "disk", "len": 67108864, "offset": 65536, "speed": 65536, "type": "backup"}}
 
 === Start streaming job and exit qemu ===
 
@@ -56,6 +54,6 @@ Formatting 'TEST_DIR/t.qcow2.copy', fmt=qcow2 size=67108864 cluster_size=65536 l
 {"return": {}}
 {"return": {}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "SHUTDOWN", "data": {"guest": false}}
-{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "BLOCK_JOB_CANCELLED", "data": {"device": "disk", "len": 67108864, "offset": 1048576, "speed": 65536, "type": "stream"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "BLOCK_JOB_CANCELLED", "data": {"device": "disk", "len": 67108864, "offset": 524288, "speed": 65536, "type": "stream"}}
 No errors were found on the image.
 *** done
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [Qemu-devel] [PATCH v2 1/2] qemu-iotests: reduce chance of races in 185
  2018-05-08 13:54 ` [Qemu-devel] [PATCH v2 1/2] qemu-iotests: reduce chance of races in 185 Stefan Hajnoczi
@ 2018-05-08 14:26   ` Eric Blake
  2018-05-10 10:05     ` Stefan Hajnoczi
  2018-05-10 14:01   ` Vladimir Sementsov-Ogievskiy
  1 sibling, 1 reply; 10+ messages in thread
From: Eric Blake @ 2018-05-08 14:26 UTC (permalink / raw)
  To: Stefan Hajnoczi, qemu-devel
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, qemu-block, Jeff Cody,
	Max Reitz

On 05/08/2018 08:54 AM, Stefan Hajnoczi wrote:
> Commit 8565c3ab537e78f3e69977ec2c609dc9417a806e ("qemu-iotests: fix
> 185") identified a race condition in a sub-test.
> 
> Similar issues also affect the other sub-tests.  If disk I/O completes
> quickly, it races with the QMP 'quit' command.  This causes spurious
> test failures because QMP events are emitted in an unpredictable order.
> 
> This test relies on QEMU internals and there is no QMP API for getting
> deterministic behavior needed to make this test 100% reliable.  At the
> same time, the test is useful and it would be a shame to remove it.
> 
> Add sleep 0.5 to reduce the chance of races.  This is not a real fix but
> appears to reduce spurious failures in practice.
> 
> Cc: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>   tests/qemu-iotests/185 | 12 ++++++++++++
>   1 file changed, 12 insertions(+)

I'm not opposed to this patch, but is there any way to write the test to 
take both events in either order, without logging the events as they 
arrive, but instead summarizing in a deterministic order which events 
were received after the fact?  That way, no matter which way the race is 
won, we merely log that we got two expected events, and could avoid the 
extra sleep.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Qemu-devel] [PATCH v2 2/2] blockjob: do not cancel timer in resume
  2018-05-08 13:54 ` [Qemu-devel] [PATCH v2 2/2] blockjob: do not cancel timer in resume Stefan Hajnoczi
@ 2018-05-08 14:28   ` Eric Blake
  2018-05-14  8:35   ` QingFeng Hao
  1 sibling, 0 replies; 10+ messages in thread
From: Eric Blake @ 2018-05-08 14:28 UTC (permalink / raw)
  To: Stefan Hajnoczi, qemu-devel
  Cc: Kevin Wolf, qemu-block, Jeff Cody, Max Reitz, QingFeng Hao

On 05/08/2018 08:54 AM, Stefan Hajnoczi wrote:
> Currently the timer is cancelled and the block job is entered by
> block_job_resume().  This behavior causes drain to run extra blockjob
> iterations when the job was sleeping due to the ratelimit.
> 
> This patch leaves the job asleep when block_job_resume() is called.
> Jobs can still be forcibly woken up using block_job_enter(), which is
> used to cancel jobs.
> 
> After this patch drain no longer runs extra blockjob iterations.  This
> is the expected behavior that qemu-iotests 185 used to rely on.  We
> temporarily changed the 185 test output to make it pass for the QEMU
> 2.12 release but now it's time to address this issue.
> 
> Cc: QingFeng Hao <haoqf@linux.vnet.ibm.com>
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>   blockjob.c                 | 22 +++++++++++++++-------
>   tests/qemu-iotests/185     |  5 +----
>   tests/qemu-iotests/185.out | 12 +++++-------
>   3 files changed, 21 insertions(+), 18 deletions(-)

Reviewed-by: Eric Blake <eblake@redhat.com>

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Qemu-devel] [PATCH v2 1/2] qemu-iotests: reduce chance of races in 185
  2018-05-08 14:26   ` Eric Blake
@ 2018-05-10 10:05     ` Stefan Hajnoczi
  2018-05-10 13:24       ` Eric Blake
  0 siblings, 1 reply; 10+ messages in thread
From: Stefan Hajnoczi @ 2018-05-10 10:05 UTC (permalink / raw)
  To: Eric Blake
  Cc: qemu-devel, Kevin Wolf, Vladimir Sementsov-Ogievskiy, qemu-block,
	Jeff Cody, Max Reitz

[-- Attachment #1: Type: text/plain, Size: 2239 bytes --]

On Tue, May 08, 2018 at 09:26:03AM -0500, Eric Blake wrote:
> On 05/08/2018 08:54 AM, Stefan Hajnoczi wrote:
> > Commit 8565c3ab537e78f3e69977ec2c609dc9417a806e ("qemu-iotests: fix
> > 185") identified a race condition in a sub-test.
> > 
> > Similar issues also affect the other sub-tests.  If disk I/O completes
> > quickly, it races with the QMP 'quit' command.  This causes spurious
> > test failures because QMP events are emitted in an unpredictable order.
> > 
> > This test relies on QEMU internals and there is no QMP API for getting
> > deterministic behavior needed to make this test 100% reliable.  At the
> > same time, the test is useful and it would be a shame to remove it.
> > 
> > Add sleep 0.5 to reduce the chance of races.  This is not a real fix but
> > appears to reduce spurious failures in practice.
> > 
> > Cc: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> > Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> > ---
> >   tests/qemu-iotests/185 | 12 ++++++++++++
> >   1 file changed, 12 insertions(+)
> 
> I'm not opposed to this patch, but is there any way to write the test to
> take both events in either order, without logging the events as they arrive,
> but instead summarizing in a deterministic order which events were received
> after the fact?  That way, no matter which way the race is won, we merely
> log that we got two expected events, and could avoid the extra sleep.

I don't think there is a practical way of doing that without big changes
to the test.  It could be rewritten in Python to make filtering the QMP
events easier.

Hiding the race doesn't solve the deeper problem though: the test case
doesn't exercise the same code path each time.  The test should really
cover all cancellation points in the block job lifecycle instead of just
one at random.  If we solve this problem then we don't need to filter
the QMP event sequence.

Maybe it can be done with blkdebug.  If not then maybe a blockjobdbg
interface is necessary to perform deterministic tests (eliminating the
need for the ratelimiting trick used by this test!).

Please share ideas, but I think this is a long-term item that shouldn't
block this series.

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 455 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Qemu-devel] [PATCH v2 1/2] qemu-iotests: reduce chance of races in 185
  2018-05-10 10:05     ` Stefan Hajnoczi
@ 2018-05-10 13:24       ` Eric Blake
  0 siblings, 0 replies; 10+ messages in thread
From: Eric Blake @ 2018-05-10 13:24 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: qemu-devel, Kevin Wolf, Vladimir Sementsov-Ogievskiy, qemu-block,
	Jeff Cody, Max Reitz

On 05/10/2018 05:05 AM, Stefan Hajnoczi wrote:

>>> Add sleep 0.5 to reduce the chance of races.  This is not a real fix but
>>> appears to reduce spurious failures in practice.
>>>
>>> Cc: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>>> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
>>> ---
>>>    tests/qemu-iotests/185 | 12 ++++++++++++
>>>    1 file changed, 12 insertions(+)
>>
>> I'm not opposed to this patch, but is there any way to write the test to
>> take both events in either order, without logging the events as they arrive,
>> but instead summarizing in a deterministic order which events were received
>> after the fact?  That way, no matter which way the race is won, we merely
>> log that we got two expected events, and could avoid the extra sleep.
> 
> I don't think there is a practical way of doing that without big changes
> to the test.  It could be rewritten in Python to make filtering the QMP
> events easier.
> 
> Hiding the race doesn't solve the deeper problem though: the test case
> doesn't exercise the same code path each time.  The test should really
> cover all cancellation points in the block job lifecycle instead of just
> one at random.  If we solve this problem then we don't need to filter
> the QMP event sequence.
> 
> Maybe it can be done with blkdebug.  If not then maybe a blockjobdbg
> interface is necessary to perform deterministic tests (eliminating the
> need for the ratelimiting trick used by this test!).

So trying to restate your question - can blkdebug be used to pause I/O, 
or does it just cause an error?  If the rate limiting is in effect, then 
we expect that the job will only write to the first half of a 
destination, so a blkdebug injected error for writes to the second half 
would either not trigger (the normal cancel won the race) or does 
trigger (the job advanced before the normal cancel, but blkdebug's 
injected error also serves as a means of cancelling the job, so while 
we'd have to filter things, we at least have a deterministic way of 
ending the job before it runs to completion). Except that I'm writing 
that without even re-reading the test in question to see how it would 
all fit in.  It may or may not be worth pursuing.

> 
> Please share ideas, but I think this is a long-term item that shouldn't
> block this series.

I agree that any further improvements don't need to hold up this patch 
from making things at least more reliable, even if not perfect.  So:

Reviewed-by: Eric Blake <eblake@redhat.com>

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Qemu-devel] [PATCH v2 1/2] qemu-iotests: reduce chance of races in 185
  2018-05-08 13:54 ` [Qemu-devel] [PATCH v2 1/2] qemu-iotests: reduce chance of races in 185 Stefan Hajnoczi
  2018-05-08 14:26   ` Eric Blake
@ 2018-05-10 14:01   ` Vladimir Sementsov-Ogievskiy
  1 sibling, 0 replies; 10+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2018-05-10 14:01 UTC (permalink / raw)
  To: Stefan Hajnoczi, qemu-devel; +Cc: qemu-block, Kevin Wolf, Max Reitz, Jeff Cody

08.05.2018 16:54, Stefan Hajnoczi wrote:
> Commit 8565c3ab537e78f3e69977ec2c609dc9417a806e ("qemu-iotests: fix
> 185") identified a race condition in a sub-test.
>
> Similar issues also affect the other sub-tests.  If disk I/O completes
> quickly, it races with the QMP 'quit' command.  This causes spurious
> test failures because QMP events are emitted in an unpredictable order.

Hmm, can you give an example? I'm looking at 8565c3ab537. and
it's not similar.

If disk I/O completes quickly, it will have large final offset, and test 
will
fail. And pause of 0.5 will not help.

My case was that quit is handled before the first iteration of mirror.

>
> This test relies on QEMU internals and there is no QMP API for getting
> deterministic behavior needed to make this test 100% reliable.  At the
> same time, the test is useful and it would be a shame to remove it.
>
> Add sleep 0.5 to reduce the chance of races.  This is not a real fix but
> appears to reduce spurious failures in practice.
>
> Cc: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>   tests/qemu-iotests/185 | 12 ++++++++++++
>   1 file changed, 12 insertions(+)
>
> diff --git a/tests/qemu-iotests/185 b/tests/qemu-iotests/185
> index 298d88d04e..975404c99d 100755
> --- a/tests/qemu-iotests/185
> +++ b/tests/qemu-iotests/185
> @@ -118,6 +118,9 @@ _send_qemu_cmd $h \
>                         'speed': 65536 } }" \
>       "return"
>   
> +# If we don't sleep here 'quit' command races with disk I/O
> +sleep 0.5
> +
>   _send_qemu_cmd $h "{ 'execute': 'quit' }" "return"
>   wait=1 _cleanup_qemu
>   
> @@ -137,6 +140,9 @@ _send_qemu_cmd $h \
>                         'speed': 65536 } }" \
>       "return"
>   
> +# If we don't sleep here 'quit' command races with disk I/O
> +sleep 0.5
> +
>   _send_qemu_cmd $h "{ 'execute': 'quit' }" "return"
>   wait=1 _cleanup_qemu
>   
> @@ -183,6 +189,9 @@ _send_qemu_cmd $h \
>                         'speed': 65536 } }" \
>       "return"
>   
> +# If we don't sleep here 'quit' command races with disk I/O
> +sleep 0.5
> +
>   _send_qemu_cmd $h "{ 'execute': 'quit' }" "return"
>   wait=1 _cleanup_qemu
>   
> @@ -201,6 +210,9 @@ _send_qemu_cmd $h \
>                         'speed': 65536 } }" \
>       "return"
>   
> +# If we don't sleep here 'quit' command races with disk I/O
> +sleep 0.5
> +
>   _send_qemu_cmd $h "{ 'execute': 'quit' }" "return"
>   wait=1 _cleanup_qemu
>   
>

-- 
Best regards,
Vladimir

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Qemu-devel] [PATCH v2 2/2] blockjob: do not cancel timer in resume
  2018-05-08 13:54 ` [Qemu-devel] [PATCH v2 2/2] blockjob: do not cancel timer in resume Stefan Hajnoczi
  2018-05-08 14:28   ` Eric Blake
@ 2018-05-14  8:35   ` QingFeng Hao
  1 sibling, 0 replies; 10+ messages in thread
From: QingFeng Hao @ 2018-05-14  8:35 UTC (permalink / raw)
  To: Stefan Hajnoczi, qemu-devel; +Cc: qemu-block, Kevin Wolf, Max Reitz, Jeff Cody



在 2018/5/8 21:54, Stefan Hajnoczi 写道:
> Currently the timer is cancelled and the block job is entered by
> block_job_resume().  This behavior causes drain to run extra blockjob
> iterations when the job was sleeping due to the ratelimit.
> 
> This patch leaves the job asleep when block_job_resume() is called.
> Jobs can still be forcibly woken up using block_job_enter(), which is
> used to cancel jobs.
> 
> After this patch drain no longer runs extra blockjob iterations.  This
> is the expected behavior that qemu-iotests 185 used to rely on.  We
> temporarily changed the 185 test output to make it pass for the QEMU
> 2.12 release but now it's time to address this issue.
> 
Verified on s390x. Thx
Reviewed-by: QingFeng Hao <haoqf@linux.vnet.ibm.com>
> Cc: QingFeng Hao <haoqf@linux.vnet.ibm.com>
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>  blockjob.c                 | 22 +++++++++++++++-------
>  tests/qemu-iotests/185     |  5 +----
>  tests/qemu-iotests/185.out | 12 +++++-------
>  3 files changed, 21 insertions(+), 18 deletions(-)
> 

[...]
>  *** done
> 

-- 
Regards
QingFeng Hao

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Qemu-devel] [PATCH v2 0/2] qemu-iotests: post-QEMU 2.12 fixes for 185
  2018-05-08 13:54 [Qemu-devel] [PATCH v2 0/2] qemu-iotests: post-QEMU 2.12 fixes for 185 Stefan Hajnoczi
  2018-05-08 13:54 ` [Qemu-devel] [PATCH v2 1/2] qemu-iotests: reduce chance of races in 185 Stefan Hajnoczi
  2018-05-08 13:54 ` [Qemu-devel] [PATCH v2 2/2] blockjob: do not cancel timer in resume Stefan Hajnoczi
@ 2018-05-14 11:40 ` Jeff Cody
  2 siblings, 0 replies; 10+ messages in thread
From: Jeff Cody @ 2018-05-14 11:40 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: qemu-devel, qemu-block, Kevin Wolf, Max Reitz

On Tue, May 08, 2018 at 02:54:34PM +0100, Stefan Hajnoczi wrote:
> v2:
>  * Rebase onto Jeff's block tree to get mirror ratelimit fix [Jeff, Vladimir]
>  * Update comments in 185 since drain no longer causes a spurious iteration
> 
> The 185 qemu-iotests test case was in a bad state for the QEMU 2.12 release.
> We fudged the expected test output to make it pass, except for
> non-deterministic behavior.
> 
> These patches get us back to pre-QEMU 2.12.  Notably the offsets reported in
> block job events now correspond to smaller buffer sizes because the job doesn't
> iterate during drain.
> 
> It's worth mentioning that the test case is still non-deterministic.  For more
> on that, see the first patch.
> 
> Stefan Hajnoczi (2):
>   qemu-iotests: reduce chance of races in 185
>   blockjob: do not cancel timer in resume
> 
>  blockjob.c                 | 22 +++++++++++++++-------
>  tests/qemu-iotests/185     | 17 +++++++++++++----
>  tests/qemu-iotests/185.out | 12 +++++-------
>  3 files changed, 33 insertions(+), 18 deletions(-)
> 
> -- 
> 2.14.3
> 

Thanks,

Applied to my block branch:

git://github.com/codyprime/qemu-kvm-jtc block

-Jeff

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2018-05-14 11:40 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-05-08 13:54 [Qemu-devel] [PATCH v2 0/2] qemu-iotests: post-QEMU 2.12 fixes for 185 Stefan Hajnoczi
2018-05-08 13:54 ` [Qemu-devel] [PATCH v2 1/2] qemu-iotests: reduce chance of races in 185 Stefan Hajnoczi
2018-05-08 14:26   ` Eric Blake
2018-05-10 10:05     ` Stefan Hajnoczi
2018-05-10 13:24       ` Eric Blake
2018-05-10 14:01   ` Vladimir Sementsov-Ogievskiy
2018-05-08 13:54 ` [Qemu-devel] [PATCH v2 2/2] blockjob: do not cancel timer in resume Stefan Hajnoczi
2018-05-08 14:28   ` Eric Blake
2018-05-14  8:35   ` QingFeng Hao
2018-05-14 11:40 ` [Qemu-devel] [PATCH v2 0/2] qemu-iotests: post-QEMU 2.12 fixes for 185 Jeff Cody

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.