[Qemu-devel] [PATCH v5 0/7] monitor: enable OOB by default

All of lore.kernel.org
 help / color / mirror / Atom feed

* [Qemu-devel] [PATCH v5 0/7] monitor: enable OOB by default
@ 2018-06-20  7:32 Peter Xu
  2018-06-20  7:32 ` [Qemu-devel] [PATCH v5 1/7] chardev: comment details for CLOSED event Peter Xu
                   ` (8 more replies)
  0 siblings, 9 replies; 48+ messages in thread
From: Peter Xu @ 2018-06-20  7:32 UTC (permalink / raw)
  To: qemu-devel
  Cc: Marc-André Lureau, Daniel P . Berrange, Thomas Huth,
	Christian Borntraeger, Fam Zheng, Kevin Wolf, Max Reitz, peterx,
	Eric Auger, Eric Blake, John Snow, Peter Maydell,
	Markus Armbruster, Stefan Hajnoczi, Dr . David Alan Gilbert

v5:
- more r-bs are collected
- make sure no patch contains extra s-o-bs by doing proper indents for
  diff quotes [Markus]
- misc commit message or comment changes [Markus]

v4:
- collect some r-bs
- remove one extra s-o-b of mine in one patch [Thomas, Markus]
- split the qmp response flush patch into two; apply changes to commit
  message [Markus]
- fix up the doc update patch [Markus]

v3:
- drop patch "tests: iotests: don't compare SHUTDOWN event", replace
  it with "monitor: flush qmp responses when CLOSED" to fix up the
  race. [Eric, Markus]
- tweak the oob revert patch to not break qmp-test [Eric]
- quite a few comment and commit message fix-ups [Eric]
- add more comment in commit message, mention about why MUX can't be
  used for Out-Of-Band [Markus]
- one new patch to comment on chardev CLOSED event [Stefan]
- one new patch to fix iotest breakage on 060 specifically

Tests: make check, iotests

Please review.  Thanks,

Peter Xu (7):
  chardev: comment details for CLOSED event
  monitor: rename *_pop_one to *_pop_any
  monitor: flush qmp responses when CLOSED
  tests: iotests: drop some stderr line
  docs: mention shared state protect for OOB
  monitor: remove "x-oob", turn oob on by default
  Revert "tests: Add parameter to qtest_init_without_qmp_handshake"

 docs/devel/qapi-code-gen.txt | 17 +++++---
 include/chardev/char.h       | 11 +++++-
 include/monitor/monitor.h    |  1 -
 tests/libqtest.h             |  4 +-
 monitor.c                    | 75 +++++++++++++++++++++---------------
 tests/libqtest.c             | 10 ++---
 tests/qmp-test.c             |  6 +--
 vl.c                         |  5 ---
 tests/qemu-iotests/060       | 10 ++++-
 tests/qemu-iotests/060.out   |  1 -
 10 files changed, 83 insertions(+), 57 deletions(-)

-- 
2.17.1

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [Qemu-devel] [PATCH v5 1/7] chardev: comment details for CLOSED event
  2018-06-20  7:32 [Qemu-devel] [PATCH v5 0/7] monitor: enable OOB by default Peter Xu
@ 2018-06-20  7:32 ` Peter Xu
  2018-06-20  7:32 ` [Qemu-devel] [PATCH v5 2/7] monitor: rename *_pop_one to *_pop_any Peter Xu
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 48+ messages in thread
From: Peter Xu @ 2018-06-20  7:32 UTC (permalink / raw)
  To: qemu-devel
  Cc: Marc-André Lureau, Daniel P . Berrange, Thomas Huth,
	Christian Borntraeger, Fam Zheng, Kevin Wolf, Max Reitz, peterx,
	Eric Auger, Eric Blake, John Snow, Peter Maydell,
	Markus Armbruster, Stefan Hajnoczi, Dr . David Alan Gilbert,
	Paolo Bonzini

It was unclear before on what does the CLOSED event mean.  Meanwhile we
add a TODO to fix up the CLOSED event in the future when the in/out
ports are different for a chardev.

CC: Paolo Bonzini <pbonzini@redhat.com>
CC: "Marc-André Lureau" <marcandre.lureau@redhat.com>
CC: Stefan Hajnoczi <stefanha@redhat.com>
CC: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 include/chardev/char.h | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/include/chardev/char.h b/include/chardev/char.h
index 04de45795e..6f0576e214 100644
--- a/include/chardev/char.h
+++ b/include/chardev/char.h
@@ -22,7 +22,16 @@ typedef enum {
     CHR_EVENT_OPENED, /* new connection established */
     CHR_EVENT_MUX_IN, /* mux-focus was set to this terminal */
     CHR_EVENT_MUX_OUT, /* mux-focus will move on */
-    CHR_EVENT_CLOSED /* connection closed */
+    CHR_EVENT_CLOSED /* connection closed.  NOTE: currently this event
+                      * is only bound to the read port of the chardev.
+                      * Normally the read port and write port of a
+                      * chardev should be the same, but it can be
+                      * different, e.g., for fd chardevs, when the two
+                      * fds are different.  So when we received the
+                      * CLOSED event it's still possible that the out
+                      * port is still open.  TODO: we should only send
+                      * the CLOSED event when both ports are closed.
+                      */
 } QEMUChrEvent;
 
 #define CHR_READ_BUF_LEN 4096
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [Qemu-devel] [PATCH v5 2/7] monitor: rename *_pop_one to *_pop_any
  2018-06-20  7:32 [Qemu-devel] [PATCH v5 0/7] monitor: enable OOB by default Peter Xu
  2018-06-20  7:32 ` [Qemu-devel] [PATCH v5 1/7] chardev: comment details for CLOSED event Peter Xu
@ 2018-06-20  7:32 ` Peter Xu
  2018-06-20  7:32 ` [Qemu-devel] [PATCH v5 3/7] monitor: flush qmp responses when CLOSED Peter Xu
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 48+ messages in thread
From: Peter Xu @ 2018-06-20  7:32 UTC (permalink / raw)
  To: qemu-devel
  Cc: Marc-André Lureau, Daniel P . Berrange, Thomas Huth,
	Christian Borntraeger, Fam Zheng, Kevin Wolf, Max Reitz, peterx,
	Eric Auger, Eric Blake, John Snow, Peter Maydell,
	Markus Armbruster, Stefan Hajnoczi, Dr . David Alan Gilbert

The old names are confusing since both of the old functions are popping
an item from multiple queues rather than a single queue.  In that
sense, *_pop_any() suites better than *_pop_one().

Since at it, touch up the function monitor_qmp_response_pop_any() a bit
to let the callers pass in a QMPResponse struct instead of returning a
struct.  Change the return value to boolean to mark whether we have
popped a valid response instead.

Suggested-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 monitor.c | 20 +++++++++-----------
 1 file changed, 9 insertions(+), 11 deletions(-)

diff --git a/monitor.c b/monitor.c
index 885e000f9b..9c89c8695c 100644
--- a/monitor.c
+++ b/monitor.c
@@ -541,10 +541,10 @@ struct QMPResponse {
 typedef struct QMPResponse QMPResponse;
 
 /*
- * Return one QMPResponse.  The response is only valid if
- * response.data is not NULL.
+ * Pop a QMPResponse from any monitor's response queue into @response.
+ * Return false if all the queues are empty; else true.
  */
-static QMPResponse monitor_qmp_response_pop_one(void)
+static bool monitor_qmp_response_pop_any(QMPResponse *response)
 {
     Monitor *mon;
     QObject *data = NULL;
@@ -555,22 +555,20 @@ static QMPResponse monitor_qmp_response_pop_one(void)
         data = g_queue_pop_head(mon->qmp.qmp_responses);
         qemu_mutex_unlock(&mon->qmp.qmp_queue_lock);
         if (data) {
+            response->mon = mon;
+            response->data = data;
             break;
         }
     }
     qemu_mutex_unlock(&monitor_lock);
-    return (QMPResponse) { .mon = mon, .data = data };
+    return data != NULL;
 }
 
 static void monitor_qmp_bh_responder(void *opaque)
 {
     QMPResponse response;
 
-    while (true) {
-        response = monitor_qmp_response_pop_one();
-        if (!response.data) {
-            break;
-        }
+    while (monitor_qmp_response_pop_any(&response)) {
         monitor_json_emitter_raw(response.mon, response.data);
         qobject_unref(response.data);
     }
@@ -4172,7 +4170,7 @@ static void monitor_qmp_dispatch_one(QMPRequest *req_obj)
  * when we process one request on a specific monitor, we put that
  * monitor to the end of mon_list queue.
  */
-static QMPRequest *monitor_qmp_requests_pop_one(void)
+static QMPRequest *monitor_qmp_requests_pop_any(void)
 {
     QMPRequest *req_obj = NULL;
     Monitor *mon;
@@ -4204,7 +4202,7 @@ static QMPRequest *monitor_qmp_requests_pop_one(void)
 
 static void monitor_qmp_bh_dispatcher(void *data)
 {
-    QMPRequest *req_obj = monitor_qmp_requests_pop_one();
+    QMPRequest *req_obj = monitor_qmp_requests_pop_any();
 
     if (req_obj) {
         trace_monitor_qmp_cmd_in_band(qobject_get_try_str(req_obj->id) ?: "");
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [Qemu-devel] [PATCH v5 3/7] monitor: flush qmp responses when CLOSED
  2018-06-20  7:32 [Qemu-devel] [PATCH v5 0/7] monitor: enable OOB by default Peter Xu
  2018-06-20  7:32 ` [Qemu-devel] [PATCH v5 1/7] chardev: comment details for CLOSED event Peter Xu
  2018-06-20  7:32 ` [Qemu-devel] [PATCH v5 2/7] monitor: rename *_pop_one to *_pop_any Peter Xu
@ 2018-06-20  7:32 ` Peter Xu
  2018-06-20  8:33   ` Markus Armbruster
  2018-06-20  7:32 ` [Qemu-devel] [PATCH v5 4/7] tests: iotests: drop some stderr line Peter Xu
                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 48+ messages in thread
From: Peter Xu @ 2018-06-20  7:32 UTC (permalink / raw)
  To: qemu-devel
  Cc: Marc-André Lureau, Daniel P . Berrange, Thomas Huth,
	Christian Borntraeger, Fam Zheng, Kevin Wolf, Max Reitz, peterx,
	Eric Auger, Eric Blake, John Snow, Peter Maydell,
	Markus Armbruster, Stefan Hajnoczi, Dr . David Alan Gilbert

Previously we clean up the queues when we got CLOSED event.  It was used
to make sure we won't send leftover replies/events of a old client to a
new client which makes perfect sense. However this will also drop the
replies/events even if the output port of the previous chardev backend
is still open, which can lead to missing of the last replies/events.
Now this patch does an extra operation to flush the response queue
before cleaning up.

In most cases, a QMP session will be based on a bidirectional channel (a
TCP port, for example, we read/write to the same socket handle), so in
port and out port of the backend chardev are fundamentally the same
port. In these cases, it does not really matter much on whether we'll
flush the response queue since flushing will fail anyway.  However there
can be cases where in & out ports of the QMP monitor's backend chardev
are separated.  Here is an example:

  cat $QMP_COMMANDS | qemu -qmp stdio ... | filter_commands

In this case, the backend is fd-typed, and it is connected to stdio
where in port is stdin and out port is stdout.  Now if we drop all the
events on the response queue then filter_command process might miss some
events that it might expect.  The thing is that, when stdin closes,
stdout might still be there alive!

In practice, I encountered SHUTDOWN event missing when running test with
iotest 087 with Out-Of-Band enabled.  Here is one of the ways that this
can happen (after "quit" command is executed and QEMU quits the main
loop):

1. [main thread] QEMU queues a SHUTDOWN event into response queue.

2. "cat" terminates (to distinguish it from the animal, I quote it).

3. [monitor iothread] QEMU's monitor iothread reads EOF from stdin.

4. [monitor iothread] QEMU's monitor iothread calls the CLOSED event
   hook for the monitor, which will destroy the response queue of the
   monitor, then the SHUTDOWN event is dropped.

5. [main thread] QEMU's main thread cleans up the monitors in
   monitor_cleanup().  When trying to flush pending responses, it sees
   nothing.  SHUTDOWN is lost forever.

Note that before the monitor iothread was introduced, step [4]/[5] could
never happen since the main loop was the only place to detect the EOF
event of stdin and run the CLOSED event hooks.  Now things can happen in
parallel in the iothread.

Without this patch, iotest 087 will have ~10% chance to miss the
SHUTDOWN event and fail when with Out-Of-Band enabled (the output is
manually touched up to suite line width requirement):

  --- /home/peterx/git/qemu/tests/qemu-iotests/087.out
  +++ /home/peterx/git/qemu/bin/tests/qemu-iotests/087.out.bad
  @@ -8,7 +8,6 @@
  {"return": {}}
  {"error": {"class": "GenericError", "desc": "'node-name' must be
  specified for the root node"}}
  {"return": {}}
  -{"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP}, "event": "SHUTDOWN", "data": {"guest": false}}

  === Duplicate ID ===
  @@ -53,7 +52,6 @@
  {"return": {}}
  {"return": {}}
  {"return": {}}

  -{"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP}, "event": "SHUTDOWN", "data": {"guest": false}}

This patch fixes the problem.

Fixes: 6d2d563f8c ("qmp: cleanup qmp queues properly", 2018-03-27)
Suggested-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 monitor.c | 33 ++++++++++++++++++++++++++++++---
 1 file changed, 30 insertions(+), 3 deletions(-)

diff --git a/monitor.c b/monitor.c
index 9c89c8695c..ea93db4857 100644
--- a/monitor.c
+++ b/monitor.c
@@ -540,6 +540,27 @@ struct QMPResponse {
 };
 typedef struct QMPResponse QMPResponse;
 
+static QObject *monitor_qmp_response_pop_one(Monitor *mon)
+{
+    QObject *data;
+
+    qemu_mutex_lock(&mon->qmp.qmp_queue_lock);
+    data = g_queue_pop_head(mon->qmp.qmp_responses);
+    qemu_mutex_unlock(&mon->qmp.qmp_queue_lock);
+
+    return data;
+}
+
+static void monitor_qmp_response_flush(Monitor *mon)
+{
+    QObject *data;
+
+    while ((data = monitor_qmp_response_pop_one(mon))) {
+        monitor_json_emitter_raw(mon, data);
+        qobject_unref(data);
+    }
+}
+
 /*
  * Pop a QMPResponse from any monitor's response queue into @response.
  * Return false if all the queues are empty; else true.
@@ -551,9 +572,7 @@ static bool monitor_qmp_response_pop_any(QMPResponse *response)
 
     qemu_mutex_lock(&monitor_lock);
     QTAILQ_FOREACH(mon, &mon_list, entry) {
-        qemu_mutex_lock(&mon->qmp.qmp_queue_lock);
-        data = g_queue_pop_head(mon->qmp.qmp_responses);
-        qemu_mutex_unlock(&mon->qmp.qmp_queue_lock);
+        data = monitor_qmp_response_pop_one(mon);
         if (data) {
             response->mon = mon;
             response->data = data;
@@ -4429,6 +4448,14 @@ static void monitor_qmp_event(void *opaque, int event)
         mon_refcount++;
         break;
     case CHR_EVENT_CLOSED:
+        /*
+         * Note: this is only useful when the output of the chardev
+         * backend is still open.  For example, when the backend is
+         * stdio, it's possible that stdout is still open when stdin
+         * is closed.  After all, CHR_EVENT_CLOSED event is currently
+         * only bound to stdin.
+         */
+        monitor_qmp_response_flush(mon);
         monitor_qmp_cleanup_queues(mon);
         json_message_parser_destroy(&mon->qmp.parser);
         json_message_parser_init(&mon->qmp.parser, handle_qmp_command);
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [Qemu-devel] [PATCH v5 4/7] tests: iotests: drop some stderr line
  2018-06-20  7:32 [Qemu-devel] [PATCH v5 0/7] monitor: enable OOB by default Peter Xu
                   ` (2 preceding siblings ...)
  2018-06-20  7:32 ` [Qemu-devel] [PATCH v5 3/7] monitor: flush qmp responses when CLOSED Peter Xu
@ 2018-06-20  7:32 ` Peter Xu
  2018-06-20  8:35   ` Markus Armbruster
  2018-06-20  7:32 ` [Qemu-devel] [PATCH v5 5/7] docs: mention shared state protect for OOB Peter Xu
                   ` (4 subsequent siblings)
  8 siblings, 1 reply; 48+ messages in thread
From: Peter Xu @ 2018-06-20  7:32 UTC (permalink / raw)
  To: qemu-devel
  Cc: Marc-André Lureau, Daniel P . Berrange, Thomas Huth,
	Christian Borntraeger, Fam Zheng, Kevin Wolf, Max Reitz, peterx,
	Eric Auger, Eric Blake, John Snow, Peter Maydell,
	Markus Armbruster, Stefan Hajnoczi, Dr . David Alan Gilbert

In my Out-Of-Band test, "check -qcow2 060" fail with this (the output is
manually changed due to line width requirement):

  --- /home/peterx/git/qemu/tests/qemu-iotests/060.out
  +++ /home/peterx/git/qemu/bin/tests/qemu-iotests/060.out.bad
  @@ -427,8 +427,8 @@
  QMP_VERSION
  {"return": {}}
  qcow2: Image is corrupt: L2 table offset 0x2a2a2a00 unaligned (L1
  index: 0); further non-fatal corruption events will be suppressed
  -{"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP}, "event": "BLOCK_IMAGE_CORRUPTED", "data": {"device": "", "msg": "L2 table offset 0x2a2a2a0
  0 unaligned (L1 index: 0)", "node-name": "drive", "fatal": false}}
  read failed: Input/output error
  +{"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP}, "event": "BLOCK_IMAGE_CORRUPTED", "data": {"device": "", "msg": "L2 table offset 0x2a2a2a0
  0 unaligned (L1 index: 0)", "node-name": "drive", "fatal": false}}
  {"return": ""}
  {"return": {}}
  {"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP},
  "event": "SHUTDOWN", "data": {"guest": false}}

The order of the event and the in/out error line is swapped.  I didn't
dig up the reason, but AFAIU what we want to verify is the event rather
than stderr.  Let's drop the stderr line directly for this test.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 tests/qemu-iotests/060     | 10 +++++++++-
 tests/qemu-iotests/060.out |  1 -
 2 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/tests/qemu-iotests/060 b/tests/qemu-iotests/060
index 7bdf609f3f..74ad371885 100755
--- a/tests/qemu-iotests/060
+++ b/tests/qemu-iotests/060
@@ -33,6 +33,14 @@ _cleanup()
 }
 trap "_cleanup; exit \$status" 0 1 2 3 15
 
+# Sometimes the error line might be dumped before/after an event
+# randomly.  Mask it out for specific test that may trigger this
+# uncertainty for current test for now.
+_filter_io_error()
+{
+    sed '/Input\/output error/d'
+}
+
 # get standard environment, filters and checks
 . ./common.rc
 . ./common.filter
@@ -464,7 +472,7 @@ echo "{'execute': 'qmp_capabilities'}
                         }}" \
             -incoming exec:'cat /dev/null' \
             2>&1 \
-    | _filter_qmp | _filter_qemu_io
+    | _filter_qmp | _filter_qemu_io | _filter_io_error
 
 echo
 # Image should not have been marked corrupt
diff --git a/tests/qemu-iotests/060.out b/tests/qemu-iotests/060.out
index bff023d889..d67c6234a4 100644
--- a/tests/qemu-iotests/060.out
+++ b/tests/qemu-iotests/060.out
@@ -428,7 +428,6 @@ QMP_VERSION
 {"return": {}}
 qcow2: Image is corrupt: L2 table offset 0x2a2a2a00 unaligned (L1 index: 0); further non-fatal corruption events will be suppressed
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "BLOCK_IMAGE_CORRUPTED", "data": {"device": "", "msg": "L2 table offset 0x2a2a2a00 unaligned (L1 index: 0)", "node-name": "drive", "fatal": false}}
-read failed: Input/output error
 {"return": ""}
 {"return": {}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "SHUTDOWN", "data": {"guest": false}}
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [Qemu-devel] [PATCH v5 5/7] docs: mention shared state protect for OOB
  2018-06-20  7:32 [Qemu-devel] [PATCH v5 0/7] monitor: enable OOB by default Peter Xu
                   ` (3 preceding siblings ...)
  2018-06-20  7:32 ` [Qemu-devel] [PATCH v5 4/7] tests: iotests: drop some stderr line Peter Xu
@ 2018-06-20  7:32 ` Peter Xu
  2018-06-20  7:32 ` [Qemu-devel] [PATCH v5 6/7] monitor: remove "x-oob", turn oob on by default Peter Xu
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 48+ messages in thread
From: Peter Xu @ 2018-06-20  7:32 UTC (permalink / raw)
  To: qemu-devel
  Cc: Marc-André Lureau, Daniel P . Berrange, Thomas Huth,
	Christian Borntraeger, Fam Zheng, Kevin Wolf, Max Reitz, peterx,
	Eric Auger, Eric Blake, John Snow, Peter Maydell,
	Markus Armbruster, Stefan Hajnoczi, Dr . David Alan Gilbert

Out-Of-Band handlers need to protect shared state if there is any.
Mention it in the document.  Meanwhile, touch up some other places too,
either with better English, or reordering of bullets.

Suggested-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 docs/devel/qapi-code-gen.txt | 17 +++++++++++------
 1 file changed, 11 insertions(+), 6 deletions(-)

diff --git a/docs/devel/qapi-code-gen.txt b/docs/devel/qapi-code-gen.txt
index 1366228b2a..fb486d4050 100644
--- a/docs/devel/qapi-code-gen.txt
+++ b/docs/devel/qapi-code-gen.txt
@@ -664,22 +664,27 @@ command:
 
 - They are executed in order,
 - They run only in main thread of QEMU,
-- They have the BQL taken during execution.
+- They run with the BQL held.
 
 When a command is executed with OOB, the following changes occur:
 
 - They can be completed before a pending in-band command,
 - They run in a dedicated monitor thread,
-- They do not take the BQL during execution.
+- They run with the BQL not held.
 
 OOB command handlers must satisfy the following conditions:
 
-- It executes extremely fast,
-- It does not take any lock, or, it can take very small locks if all
-  critical regions also follow the rules for OOB command handler code,
+- It terminates quickly,
 - It does not invoke system calls that may block,
 - It does not access guest RAM that may block when userfaultfd is
-  enabled for postcopy live migration.
+  enabled for postcopy live migration,
+- It takes only "fast" locks, i.e. all critical sections protected by
+  any lock it takes also satisfy the conditions for OOB command
+  handler code.
+
+The restrictions on locking limit access to shared state.  Such access
+requires synchronization, but OOB commands can't take the BQL or any
+other "slow" lock.
 
 If in doubt, do not implement OOB execution support.
 
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [Qemu-devel] [PATCH v5 6/7] monitor: remove "x-oob", turn oob on by default
  2018-06-20  7:32 [Qemu-devel] [PATCH v5 0/7] monitor: enable OOB by default Peter Xu
                   ` (4 preceding siblings ...)
  2018-06-20  7:32 ` [Qemu-devel] [PATCH v5 5/7] docs: mention shared state protect for OOB Peter Xu
@ 2018-06-20  7:32 ` Peter Xu
  2018-06-20  7:32 ` [Qemu-devel] [PATCH v5 7/7] Revert "tests: Add parameter to qtest_init_without_qmp_handshake" Peter Xu
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 48+ messages in thread
From: Peter Xu @ 2018-06-20  7:32 UTC (permalink / raw)
  To: qemu-devel
  Cc: Marc-André Lureau, Daniel P . Berrange, Thomas Huth,
	Christian Borntraeger, Fam Zheng, Kevin Wolf, Max Reitz, peterx,
	Eric Auger, Eric Blake, John Snow, Peter Maydell,
	Markus Armbruster, Stefan Hajnoczi, Dr . David Alan Gilbert

OOB commands were introduced in commit cf869d53172.  Unfortunately, we
ran into a regression, and had to disable them by default for 2.12
(commit be933ffc23).

  http://lists.gnu.org/archive/html/qemu-devel/2018-03/msg06231.html

The regression has since been fixed (commit 951702f39c7 "monitor: bind
dispatch bh to iohandler context").  Time to re-enable OOB.

This patch partly reverts be933ffc23 (monitor: new parameter "x-oob"),
and turns it on again for non-MUX QMPs.  Note that we can't enable
Out-Of-Band for monitors with MUX-typed chardev backends, because not
all the chardev frontends can run without main thread, or can run in
multiple threads.

Some trivial touch-up in the test code is required to make sure qmp-test
won't break.

Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 include/monitor/monitor.h |  1 -
 monitor.c                 | 22 ++++++----------------
 tests/libqtest.c          |  2 +-
 tests/qmp-test.c          |  2 +-
 vl.c                      |  5 -----
 5 files changed, 8 insertions(+), 24 deletions(-)

diff --git a/include/monitor/monitor.h b/include/monitor/monitor.h
index d6ab70cae2..0cb0538a31 100644
--- a/include/monitor/monitor.h
+++ b/include/monitor/monitor.h
@@ -13,7 +13,6 @@ extern Monitor *cur_mon;
 #define MONITOR_USE_READLINE  0x02
 #define MONITOR_USE_CONTROL   0x04
 #define MONITOR_USE_PRETTY    0x08
-#define MONITOR_USE_OOB       0x10
 
 bool monitor_cur_is_qmp(void);
 
diff --git a/monitor.c b/monitor.c
index ea93db4857..99c5c902f9 100644
--- a/monitor.c
+++ b/monitor.c
@@ -4653,19 +4653,12 @@ void monitor_init(Chardev *chr, int flags)
 {
     Monitor *mon = g_malloc(sizeof(*mon));
     bool use_readline = flags & MONITOR_USE_READLINE;
-    bool use_oob = flags & MONITOR_USE_OOB;
-
-    if (use_oob) {
-        if (CHARDEV_IS_MUX(chr)) {
-            error_report("Monitor Out-Of-Band is not supported with "
-                         "MUX typed chardev backend");
-            exit(1);
-        }
-        if (use_readline) {
-            error_report("Monitor Out-Of-band is only supported by QMP");
-            exit(1);
-        }
-    }
+    /*
+     * Note: we can't enable Out-Of-Band for monitors with MUX-typed
+     * chardev backends, because not all the chardev frontends can run
+     * without main thread, or can run in multiple threads.
+     */
+    bool use_oob = (flags & MONITOR_USE_CONTROL) && !CHARDEV_IS_MUX(chr);
 
     monitor_data_init(mon, false, use_oob);
 
@@ -4767,9 +4760,6 @@ QemuOptsList qemu_mon_opts = {
         },{
             .name = "pretty",
             .type = QEMU_OPT_BOOL,
-        },{
-            .name = "x-oob",
-            .type = QEMU_OPT_BOOL,
         },
         { /* end of list */ }
     },
diff --git a/tests/libqtest.c b/tests/libqtest.c
index 098af6aec4..c5cb3f925c 100644
--- a/tests/libqtest.c
+++ b/tests/libqtest.c
@@ -213,7 +213,7 @@ QTestState *qtest_init_without_qmp_handshake(bool use_oob,
                                   "-display none "
                                   "%s", qemu_binary, socket_path,
                                   getenv("QTEST_LOG") ? "/dev/fd/2" : "/dev/null",
-                                  qmp_socket_path, use_oob ? ",x-oob=on" : "",
+                                  qmp_socket_path, "",
                                   extra_args ?: "");
         execlp("/bin/sh", "sh", "-c", command, NULL);
         exit(1);
diff --git a/tests/qmp-test.c b/tests/qmp-test.c
index a49cbc6fde..3747bf7fbb 100644
--- a/tests/qmp-test.c
+++ b/tests/qmp-test.c
@@ -89,7 +89,7 @@ static void test_qmp_protocol(void)
     g_assert(q);
     test_version(qdict_get(q, "version"));
     capabilities = qdict_get_qlist(q, "capabilities");
-    g_assert(capabilities && qlist_empty(capabilities));
+    g_assert(capabilities);
     qobject_unref(resp);
 
     /* Test valid command before handshake */
diff --git a/vl.c b/vl.c
index b3426e03d0..8ecdd8a1a0 100644
--- a/vl.c
+++ b/vl.c
@@ -2307,11 +2307,6 @@ static int mon_init_func(void *opaque, QemuOpts *opts, Error **errp)
     if (qemu_opt_get_bool(opts, "pretty", 0))
         flags |= MONITOR_USE_PRETTY;
 
-    /* OOB is off by default */
-    if (qemu_opt_get_bool(opts, "x-oob", 0)) {
-        flags |= MONITOR_USE_OOB;
-    }
-
     chardev = qemu_opt_get(opts, "chardev");
     chr = qemu_chr_find(chardev);
     if (chr == NULL) {
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [Qemu-devel] [PATCH v5 7/7] Revert "tests: Add parameter to qtest_init_without_qmp_handshake"
  2018-06-20  7:32 [Qemu-devel] [PATCH v5 0/7] monitor: enable OOB by default Peter Xu
                   ` (5 preceding siblings ...)
  2018-06-20  7:32 ` [Qemu-devel] [PATCH v5 6/7] monitor: remove "x-oob", turn oob on by default Peter Xu
@ 2018-06-20  7:32 ` Peter Xu
  2018-06-26 17:21 ` [Qemu-devel] (no subject) Markus Armbruster
  2018-06-30 16:26 ` [Qemu-devel] [PATCH v5 0/7] " Markus Armbruster
  8 siblings, 0 replies; 48+ messages in thread
From: Peter Xu @ 2018-06-20  7:32 UTC (permalink / raw)
  To: qemu-devel
  Cc: Marc-André Lureau, Daniel P . Berrange, Thomas Huth,
	Christian Borntraeger, Fam Zheng, Kevin Wolf, Max Reitz, peterx,
	Eric Auger, Eric Blake, John Snow, Peter Maydell,
	Markus Armbruster, Stefan Hajnoczi, Dr . David Alan Gilbert

This reverts commit ddee57e0176f6ab53b13c6c97605b62737a8fd7a.

Meanwhile, revert one line from fa198ad9bdef to make sure
qtest_init_without_qmp_handshake() will only pass in one parameter.

Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 tests/libqtest.h |  4 +---
 tests/libqtest.c | 10 ++++------
 tests/qmp-test.c |  4 ++--
 3 files changed, 7 insertions(+), 11 deletions(-)

diff --git a/tests/libqtest.h b/tests/libqtest.h
index ac52872cbe..180d2cc6ff 100644
--- a/tests/libqtest.h
+++ b/tests/libqtest.h
@@ -56,14 +56,12 @@ QTestState *qtest_init(const char *extra_args);
 
 /**
  * qtest_init_without_qmp_handshake:
- * @use_oob: true to have the server advertise OOB support
  * @extra_args: other arguments to pass to QEMU.  CAUTION: these
  * arguments are subject to word splitting and shell evaluation.
  *
  * Returns: #QTestState instance.
  */
-QTestState *qtest_init_without_qmp_handshake(bool use_oob,
-                                             const char *extra_args);
+QTestState *qtest_init_without_qmp_handshake(const char *extra_args);
 
 /**
  * qtest_quit:
diff --git a/tests/libqtest.c b/tests/libqtest.c
index c5cb3f925c..51af4a289e 100644
--- a/tests/libqtest.c
+++ b/tests/libqtest.c
@@ -173,8 +173,7 @@ static const char *qtest_qemu_binary(void)
     return qemu_bin;
 }
 
-QTestState *qtest_init_without_qmp_handshake(bool use_oob,
-                                             const char *extra_args)
+QTestState *qtest_init_without_qmp_handshake(const char *extra_args)
 {
     QTestState *s;
     int sock, qmpsock, i;
@@ -207,13 +206,12 @@ QTestState *qtest_init_without_qmp_handshake(bool use_oob,
         command = g_strdup_printf("exec %s "
                                   "-qtest unix:%s,nowait "
                                   "-qtest-log %s "
-                                  "-chardev socket,path=%s,nowait,id=char0 "
-                                  "-mon chardev=char0,mode=control%s "
+                                  "-qmp unix:%s,nowait "
                                   "-machine accel=qtest "
                                   "-display none "
                                   "%s", qemu_binary, socket_path,
                                   getenv("QTEST_LOG") ? "/dev/fd/2" : "/dev/null",
-                                  qmp_socket_path, "",
+                                  qmp_socket_path,
                                   extra_args ?: "");
         execlp("/bin/sh", "sh", "-c", command, NULL);
         exit(1);
@@ -248,7 +246,7 @@ QTestState *qtest_init_without_qmp_handshake(bool use_oob,
 
 QTestState *qtest_init(const char *extra_args)
 {
-    QTestState *s = qtest_init_without_qmp_handshake(false, extra_args);
+    QTestState *s = qtest_init_without_qmp_handshake(extra_args);
 
     /* Read the QMP greeting and then do the handshake */
     qtest_qmp_discard_response(s, "");
diff --git a/tests/qmp-test.c b/tests/qmp-test.c
index 3747bf7fbb..7d23daba28 100644
--- a/tests/qmp-test.c
+++ b/tests/qmp-test.c
@@ -81,7 +81,7 @@ static void test_qmp_protocol(void)
     QList *capabilities;
     QTestState *qts;
 
-    qts = qtest_init_without_qmp_handshake(false, common_args);
+    qts = qtest_init_without_qmp_handshake(common_args);
 
     /* Test greeting */
     resp = qtest_qmp_receive(qts);
@@ -146,7 +146,7 @@ static void test_qmp_oob(void)
     QString *qstr;
     const char *cmd_id;
 
-    qts = qtest_init_without_qmp_handshake(true, common_args);
+    qts = qtest_init_without_qmp_handshake(common_args);
 
     /* Check the greeting message. */
     resp = qtest_qmp_receive(qts);
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] [PATCH v5 3/7] monitor: flush qmp responses when CLOSED
  2018-06-20  7:32 ` [Qemu-devel] [PATCH v5 3/7] monitor: flush qmp responses when CLOSED Peter Xu
@ 2018-06-20  8:33   ` Markus Armbruster
  2018-06-20  8:38     ` Peter Xu
  0 siblings, 1 reply; 48+ messages in thread
From: Markus Armbruster @ 2018-06-20  8:33 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, Kevin Wolf, Peter Maydell, Thomas Huth, Fam Zheng,
	Eric Auger, John Snow, Max Reitz, Christian Borntraeger,
	Dr . David Alan Gilbert, Stefan Hajnoczi, Marc-André Lureau,
	Markus Armbruster

Peter Xu <peterx@redhat.com> writes:

> Previously we clean up the queues when we got CLOSED event.  It was used
> to make sure we won't send leftover replies/events of a old client to a
> new client which makes perfect sense. However this will also drop the
> replies/events even if the output port of the previous chardev backend
> is still open, which can lead to missing of the last replies/events.
> Now this patch does an extra operation to flush the response queue
> before cleaning up.
>
> In most cases, a QMP session will be based on a bidirectional channel (a
> TCP port, for example, we read/write to the same socket handle), so in
> port and out port of the backend chardev are fundamentally the same
> port. In these cases, it does not really matter much on whether we'll
> flush the response queue since flushing will fail anyway.  However there
> can be cases where in & out ports of the QMP monitor's backend chardev
> are separated.  Here is an example:
>
>   cat $QMP_COMMANDS | qemu -qmp stdio ... | filter_commands
>
> In this case, the backend is fd-typed, and it is connected to stdio
> where in port is stdin and out port is stdout.  Now if we drop all the
> events on the response queue then filter_command process might miss some
> events that it might expect.  The thing is that, when stdin closes,
> stdout might still be there alive!
>
> In practice, I encountered SHUTDOWN event missing when running test with
> iotest 087 with Out-Of-Band enabled.  Here is one of the ways that this
> can happen (after "quit" command is executed and QEMU quits the main
> loop):
>
> 1. [main thread] QEMU queues a SHUTDOWN event into response queue.
>
> 2. "cat" terminates (to distinguish it from the animal, I quote it).
>
> 3. [monitor iothread] QEMU's monitor iothread reads EOF from stdin.
>
> 4. [monitor iothread] QEMU's monitor iothread calls the CLOSED event
>    hook for the monitor, which will destroy the response queue of the
>    monitor, then the SHUTDOWN event is dropped.
>
> 5. [main thread] QEMU's main thread cleans up the monitors in
>    monitor_cleanup().  When trying to flush pending responses, it sees
>    nothing.  SHUTDOWN is lost forever.
>
> Note that before the monitor iothread was introduced, step [4]/[5] could
> never happen since the main loop was the only place to detect the EOF
> event of stdin and run the CLOSED event hooks.  Now things can happen in
> parallel in the iothread.
>
> Without this patch, iotest 087 will have ~10% chance to miss the
> SHUTDOWN event and fail when with Out-Of-Band enabled (the output is
> manually touched up to suite line width requirement):

I think the output is no longer wrapped.  Can drop the parenthesis when
I apply.

>
>   --- /home/peterx/git/qemu/tests/qemu-iotests/087.out
>   +++ /home/peterx/git/qemu/bin/tests/qemu-iotests/087.out.bad
>   @@ -8,7 +8,6 @@
>   {"return": {}}
>   {"error": {"class": "GenericError", "desc": "'node-name' must be
>   specified for the root node"}}
>   {"return": {}}
>   -{"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP}, "event": "SHUTDOWN", "data": {"guest": false}}
>
>   === Duplicate ID ===
>   @@ -53,7 +52,6 @@
>   {"return": {}}
>   {"return": {}}
>   {"return": {}}
>
>   -{"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP}, "event": "SHUTDOWN", "data": {"guest": false}}
>
> This patch fixes the problem.
>
> Fixes: 6d2d563f8c ("qmp: cleanup qmp queues properly", 2018-03-27)
> Suggested-by: Markus Armbruster <armbru@redhat.com>
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  monitor.c | 33 ++++++++++++++++++++++++++++++---
>  1 file changed, 30 insertions(+), 3 deletions(-)
>
> diff --git a/monitor.c b/monitor.c
> index 9c89c8695c..ea93db4857 100644
> --- a/monitor.c
> +++ b/monitor.c
> @@ -540,6 +540,27 @@ struct QMPResponse {
>  };
>  typedef struct QMPResponse QMPResponse;
>  
> +static QObject *monitor_qmp_response_pop_one(Monitor *mon)
> +{
> +    QObject *data;
> +
> +    qemu_mutex_lock(&mon->qmp.qmp_queue_lock);
> +    data = g_queue_pop_head(mon->qmp.qmp_responses);
> +    qemu_mutex_unlock(&mon->qmp.qmp_queue_lock);
> +
> +    return data;
> +}
> +
> +static void monitor_qmp_response_flush(Monitor *mon)
> +{
> +    QObject *data;
> +
> +    while ((data = monitor_qmp_response_pop_one(mon))) {
> +        monitor_json_emitter_raw(mon, data);
> +        qobject_unref(data);
> +    }
> +}
> +
>  /*
>   * Pop a QMPResponse from any monitor's response queue into @response.
>   * Return false if all the queues are empty; else true.
> @@ -551,9 +572,7 @@ static bool monitor_qmp_response_pop_any(QMPResponse *response)
>  
>      qemu_mutex_lock(&monitor_lock);
>      QTAILQ_FOREACH(mon, &mon_list, entry) {
> -        qemu_mutex_lock(&mon->qmp.qmp_queue_lock);
> -        data = g_queue_pop_head(mon->qmp.qmp_responses);
> -        qemu_mutex_unlock(&mon->qmp.qmp_queue_lock);
> +        data = monitor_qmp_response_pop_one(mon);
>          if (data) {
>              response->mon = mon;
>              response->data = data;
> @@ -4429,6 +4448,14 @@ static void monitor_qmp_event(void *opaque, int event)
>          mon_refcount++;
>          break;
>      case CHR_EVENT_CLOSED:
> +        /*
> +         * Note: this is only useful when the output of the chardev
> +         * backend is still open.  For example, when the backend is
> +         * stdio, it's possible that stdout is still open when stdin
> +         * is closed.  After all, CHR_EVENT_CLOSED event is currently
> +         * only bound to stdin.

Did you forget to delete the last sentence, or decide to keep it?

I could drop it when I apply.

> +         */
> +        monitor_qmp_response_flush(mon);
>          monitor_qmp_cleanup_queues(mon);
>          json_message_parser_destroy(&mon->qmp.parser);
>          json_message_parser_init(&mon->qmp.parser, handle_qmp_command);

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] [PATCH v5 4/7] tests: iotests: drop some stderr line
  2018-06-20  7:32 ` [Qemu-devel] [PATCH v5 4/7] tests: iotests: drop some stderr line Peter Xu
@ 2018-06-20  8:35   ` Markus Armbruster
  0 siblings, 0 replies; 48+ messages in thread
From: Markus Armbruster @ 2018-06-20  8:35 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, Kevin Wolf, Peter Maydell, Thomas Huth, Fam Zheng,
	Eric Auger, John Snow, Max Reitz, Christian Borntraeger,
	Dr . David Alan Gilbert, Stefan Hajnoczi, Marc-André Lureau

Peter Xu <peterx@redhat.com> writes:

> In my Out-Of-Band test, "check -qcow2 060" fail with this (the output is
> manually changed due to line width requirement):

I think the output is no longer wrapped.  Can drop the parenthesis when
I apply.

>   --- /home/peterx/git/qemu/tests/qemu-iotests/060.out
>   +++ /home/peterx/git/qemu/bin/tests/qemu-iotests/060.out.bad
>   @@ -427,8 +427,8 @@
>   QMP_VERSION
>   {"return": {}}
>   qcow2: Image is corrupt: L2 table offset 0x2a2a2a00 unaligned (L1
>   index: 0); further non-fatal corruption events will be suppressed
>   -{"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP}, "event": "BLOCK_IMAGE_CORRUPTED", "data": {"device": "", "msg": "L2 table offset 0x2a2a2a0
>   0 unaligned (L1 index: 0)", "node-name": "drive", "fatal": false}}
>   read failed: Input/output error
>   +{"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP}, "event": "BLOCK_IMAGE_CORRUPTED", "data": {"device": "", "msg": "L2 table offset 0x2a2a2a0
>   0 unaligned (L1 index: 0)", "node-name": "drive", "fatal": false}}
>   {"return": ""}
>   {"return": {}}
>   {"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP},
>   "event": "SHUTDOWN", "data": {"guest": false}}
>
> The order of the event and the in/out error line is swapped.  I didn't
> dig up the reason, but AFAIU what we want to verify is the event rather
> than stderr.  Let's drop the stderr line directly for this test.
>
> Signed-off-by: Peter Xu <peterx@redhat.com>

Max, may I have your R-by?

> ---
>  tests/qemu-iotests/060     | 10 +++++++++-
>  tests/qemu-iotests/060.out |  1 -
>  2 files changed, 9 insertions(+), 2 deletions(-)
>
> diff --git a/tests/qemu-iotests/060 b/tests/qemu-iotests/060
> index 7bdf609f3f..74ad371885 100755
> --- a/tests/qemu-iotests/060
> +++ b/tests/qemu-iotests/060
> @@ -33,6 +33,14 @@ _cleanup()
>  }
>  trap "_cleanup; exit \$status" 0 1 2 3 15
>  
> +# Sometimes the error line might be dumped before/after an event
> +# randomly.  Mask it out for specific test that may trigger this
> +# uncertainty for current test for now.
> +_filter_io_error()
> +{
> +    sed '/Input\/output error/d'
> +}
> +
>  # get standard environment, filters and checks
>  . ./common.rc
>  . ./common.filter
> @@ -464,7 +472,7 @@ echo "{'execute': 'qmp_capabilities'}
>                          }}" \
>              -incoming exec:'cat /dev/null' \
>              2>&1 \
> -    | _filter_qmp | _filter_qemu_io
> +    | _filter_qmp | _filter_qemu_io | _filter_io_error
>  
>  echo
>  # Image should not have been marked corrupt
> diff --git a/tests/qemu-iotests/060.out b/tests/qemu-iotests/060.out
> index bff023d889..d67c6234a4 100644
> --- a/tests/qemu-iotests/060.out
> +++ b/tests/qemu-iotests/060.out
> @@ -428,7 +428,6 @@ QMP_VERSION
>  {"return": {}}
>  qcow2: Image is corrupt: L2 table offset 0x2a2a2a00 unaligned (L1 index: 0); further non-fatal corruption events will be suppressed
>  {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "BLOCK_IMAGE_CORRUPTED", "data": {"device": "", "msg": "L2 table offset 0x2a2a2a00 unaligned (L1 index: 0)", "node-name": "drive", "fatal": false}}
> -read failed: Input/output error
>  {"return": ""}
>  {"return": {}}
>  {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "SHUTDOWN", "data": {"guest": false}}

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] [PATCH v5 3/7] monitor: flush qmp responses when CLOSED
  2018-06-20  8:33   ` Markus Armbruster
@ 2018-06-20  8:38     ` Peter Xu
  2018-06-20  9:50       ` Markus Armbruster
  0 siblings, 1 reply; 48+ messages in thread
From: Peter Xu @ 2018-06-20  8:38 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: qemu-devel, Kevin Wolf, Peter Maydell, Thomas Huth, Fam Zheng,
	Eric Auger, John Snow, Max Reitz, Christian Borntraeger,
	Dr . David Alan Gilbert, Stefan Hajnoczi, Marc-André Lureau

On Wed, Jun 20, 2018 at 10:33:37AM +0200, Markus Armbruster wrote:
> Peter Xu <peterx@redhat.com> writes:
> 
> > Previously we clean up the queues when we got CLOSED event.  It was used
> > to make sure we won't send leftover replies/events of a old client to a
> > new client which makes perfect sense. However this will also drop the
> > replies/events even if the output port of the previous chardev backend
> > is still open, which can lead to missing of the last replies/events.
> > Now this patch does an extra operation to flush the response queue
> > before cleaning up.
> >
> > In most cases, a QMP session will be based on a bidirectional channel (a
> > TCP port, for example, we read/write to the same socket handle), so in
> > port and out port of the backend chardev are fundamentally the same
> > port. In these cases, it does not really matter much on whether we'll
> > flush the response queue since flushing will fail anyway.  However there
> > can be cases where in & out ports of the QMP monitor's backend chardev
> > are separated.  Here is an example:
> >
> >   cat $QMP_COMMANDS | qemu -qmp stdio ... | filter_commands
> >
> > In this case, the backend is fd-typed, and it is connected to stdio
> > where in port is stdin and out port is stdout.  Now if we drop all the
> > events on the response queue then filter_command process might miss some
> > events that it might expect.  The thing is that, when stdin closes,
> > stdout might still be there alive!
> >
> > In practice, I encountered SHUTDOWN event missing when running test with
> > iotest 087 with Out-Of-Band enabled.  Here is one of the ways that this
> > can happen (after "quit" command is executed and QEMU quits the main
> > loop):
> >
> > 1. [main thread] QEMU queues a SHUTDOWN event into response queue.
> >
> > 2. "cat" terminates (to distinguish it from the animal, I quote it).
> >
> > 3. [monitor iothread] QEMU's monitor iothread reads EOF from stdin.
> >
> > 4. [monitor iothread] QEMU's monitor iothread calls the CLOSED event
> >    hook for the monitor, which will destroy the response queue of the
> >    monitor, then the SHUTDOWN event is dropped.
> >
> > 5. [main thread] QEMU's main thread cleans up the monitors in
> >    monitor_cleanup().  When trying to flush pending responses, it sees
> >    nothing.  SHUTDOWN is lost forever.
> >
> > Note that before the monitor iothread was introduced, step [4]/[5] could
> > never happen since the main loop was the only place to detect the EOF
> > event of stdin and run the CLOSED event hooks.  Now things can happen in
> > parallel in the iothread.
> >
> > Without this patch, iotest 087 will have ~10% chance to miss the
> > SHUTDOWN event and fail when with Out-Of-Band enabled (the output is
> > manually touched up to suite line width requirement):
> 
> I think the output is no longer wrapped.  Can drop the parenthesis when
> I apply.

Indeed.

> 
> >
> >   --- /home/peterx/git/qemu/tests/qemu-iotests/087.out
> >   +++ /home/peterx/git/qemu/bin/tests/qemu-iotests/087.out.bad
> >   @@ -8,7 +8,6 @@
> >   {"return": {}}
> >   {"error": {"class": "GenericError", "desc": "'node-name' must be
> >   specified for the root node"}}
> >   {"return": {}}
> >   -{"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP}, "event": "SHUTDOWN", "data": {"guest": false}}
> >
> >   === Duplicate ID ===
> >   @@ -53,7 +52,6 @@
> >   {"return": {}}
> >   {"return": {}}
> >   {"return": {}}
> >
> >   -{"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP}, "event": "SHUTDOWN", "data": {"guest": false}}
> >
> > This patch fixes the problem.
> >
> > Fixes: 6d2d563f8c ("qmp: cleanup qmp queues properly", 2018-03-27)
> > Suggested-by: Markus Armbruster <armbru@redhat.com>
> > Signed-off-by: Peter Xu <peterx@redhat.com>
> > ---
> >  monitor.c | 33 ++++++++++++++++++++++++++++++---
> >  1 file changed, 30 insertions(+), 3 deletions(-)
> >
> > diff --git a/monitor.c b/monitor.c
> > index 9c89c8695c..ea93db4857 100644
> > --- a/monitor.c
> > +++ b/monitor.c
> > @@ -540,6 +540,27 @@ struct QMPResponse {
> >  };
> >  typedef struct QMPResponse QMPResponse;
> >  
> > +static QObject *monitor_qmp_response_pop_one(Monitor *mon)
> > +{
> > +    QObject *data;
> > +
> > +    qemu_mutex_lock(&mon->qmp.qmp_queue_lock);
> > +    data = g_queue_pop_head(mon->qmp.qmp_responses);
> > +    qemu_mutex_unlock(&mon->qmp.qmp_queue_lock);
> > +
> > +    return data;
> > +}
> > +
> > +static void monitor_qmp_response_flush(Monitor *mon)
> > +{
> > +    QObject *data;
> > +
> > +    while ((data = monitor_qmp_response_pop_one(mon))) {
> > +        monitor_json_emitter_raw(mon, data);
> > +        qobject_unref(data);
> > +    }
> > +}
> > +
> >  /*
> >   * Pop a QMPResponse from any monitor's response queue into @response.
> >   * Return false if all the queues are empty; else true.
> > @@ -551,9 +572,7 @@ static bool monitor_qmp_response_pop_any(QMPResponse *response)
> >  
> >      qemu_mutex_lock(&monitor_lock);
> >      QTAILQ_FOREACH(mon, &mon_list, entry) {
> > -        qemu_mutex_lock(&mon->qmp.qmp_queue_lock);
> > -        data = g_queue_pop_head(mon->qmp.qmp_responses);
> > -        qemu_mutex_unlock(&mon->qmp.qmp_queue_lock);
> > +        data = monitor_qmp_response_pop_one(mon);
> >          if (data) {
> >              response->mon = mon;
> >              response->data = data;
> > @@ -4429,6 +4448,14 @@ static void monitor_qmp_event(void *opaque, int event)
> >          mon_refcount++;
> >          break;
> >      case CHR_EVENT_CLOSED:
> > +        /*
> > +         * Note: this is only useful when the output of the chardev
> > +         * backend is still open.  For example, when the backend is
> > +         * stdio, it's possible that stdout is still open when stdin
> > +         * is closed.  After all, CHR_EVENT_CLOSED event is currently
> > +         * only bound to stdin.
> 
> Did you forget to delete the last sentence, or decide to keep it?
> 
> I could drop it when I apply.

I'm sorry; this one I forgot.

Please feel free to drop it.

Thanks,

-- 
Peter Xu

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] [PATCH v5 3/7] monitor: flush qmp responses when CLOSED
  2018-06-20  8:38     ` Peter Xu
@ 2018-06-20  9:50       ` Markus Armbruster
  0 siblings, 0 replies; 48+ messages in thread
From: Markus Armbruster @ 2018-06-20  9:50 UTC (permalink / raw)
  To: Peter Xu
  Cc: Markus Armbruster, Kevin Wolf, Peter Maydell, Thomas Huth,
	Fam Zheng, Christian Borntraeger, qemu-devel, Max Reitz,
	Eric Auger, Stefan Hajnoczi, Marc-André Lureau, John Snow,
	Dr . David Alan Gilbert

Peter Xu <peterx@redhat.com> writes:

> On Wed, Jun 20, 2018 at 10:33:37AM +0200, Markus Armbruster wrote:
>> Peter Xu <peterx@redhat.com> writes:
>> 
>> > Previously we clean up the queues when we got CLOSED event.  It was used
>> > to make sure we won't send leftover replies/events of a old client to a
>> > new client which makes perfect sense. However this will also drop the
>> > replies/events even if the output port of the previous chardev backend
>> > is still open, which can lead to missing of the last replies/events.
>> > Now this patch does an extra operation to flush the response queue
>> > before cleaning up.
>> >
>> > In most cases, a QMP session will be based on a bidirectional channel (a
>> > TCP port, for example, we read/write to the same socket handle), so in
>> > port and out port of the backend chardev are fundamentally the same
>> > port. In these cases, it does not really matter much on whether we'll
>> > flush the response queue since flushing will fail anyway.  However there
>> > can be cases where in & out ports of the QMP monitor's backend chardev
>> > are separated.  Here is an example:
>> >
>> >   cat $QMP_COMMANDS | qemu -qmp stdio ... | filter_commands
>> >
>> > In this case, the backend is fd-typed, and it is connected to stdio
>> > where in port is stdin and out port is stdout.  Now if we drop all the
>> > events on the response queue then filter_command process might miss some
>> > events that it might expect.  The thing is that, when stdin closes,
>> > stdout might still be there alive!
>> >
>> > In practice, I encountered SHUTDOWN event missing when running test with
>> > iotest 087 with Out-Of-Band enabled.  Here is one of the ways that this
>> > can happen (after "quit" command is executed and QEMU quits the main
>> > loop):
>> >
>> > 1. [main thread] QEMU queues a SHUTDOWN event into response queue.
>> >
>> > 2. "cat" terminates (to distinguish it from the animal, I quote it).
>> >
>> > 3. [monitor iothread] QEMU's monitor iothread reads EOF from stdin.
>> >
>> > 4. [monitor iothread] QEMU's monitor iothread calls the CLOSED event
>> >    hook for the monitor, which will destroy the response queue of the
>> >    monitor, then the SHUTDOWN event is dropped.
>> >
>> > 5. [main thread] QEMU's main thread cleans up the monitors in
>> >    monitor_cleanup().  When trying to flush pending responses, it sees
>> >    nothing.  SHUTDOWN is lost forever.
>> >
>> > Note that before the monitor iothread was introduced, step [4]/[5] could
>> > never happen since the main loop was the only place to detect the EOF
>> > event of stdin and run the CLOSED event hooks.  Now things can happen in
>> > parallel in the iothread.
>> >
>> > Without this patch, iotest 087 will have ~10% chance to miss the
>> > SHUTDOWN event and fail when with Out-Of-Band enabled (the output is
>> > manually touched up to suite line width requirement):
>> 
>> I think the output is no longer wrapped.  Can drop the parenthesis when
>> I apply.
>
> Indeed.
>
>> 
>> >
>> >   --- /home/peterx/git/qemu/tests/qemu-iotests/087.out
>> >   +++ /home/peterx/git/qemu/bin/tests/qemu-iotests/087.out.bad
>> >   @@ -8,7 +8,6 @@
>> >   {"return": {}}
>> >   {"error": {"class": "GenericError", "desc": "'node-name' must be
>> >   specified for the root node"}}
>> >   {"return": {}}
>> >   -{"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP}, "event": "SHUTDOWN", "data": {"guest": false}}
>> >
>> >   === Duplicate ID ===
>> >   @@ -53,7 +52,6 @@
>> >   {"return": {}}
>> >   {"return": {}}
>> >   {"return": {}}
>> >
>> >   -{"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP}, "event": "SHUTDOWN", "data": {"guest": false}}
>> >
>> > This patch fixes the problem.
>> >
>> > Fixes: 6d2d563f8c ("qmp: cleanup qmp queues properly", 2018-03-27)
>> > Suggested-by: Markus Armbruster <armbru@redhat.com>
>> > Signed-off-by: Peter Xu <peterx@redhat.com>
>> > ---
>> >  monitor.c | 33 ++++++++++++++++++++++++++++++---
>> >  1 file changed, 30 insertions(+), 3 deletions(-)
>> >
>> > diff --git a/monitor.c b/monitor.c
>> > index 9c89c8695c..ea93db4857 100644
>> > --- a/monitor.c
>> > +++ b/monitor.c
>> > @@ -540,6 +540,27 @@ struct QMPResponse {
>> >  };
>> >  typedef struct QMPResponse QMPResponse;
>> >  
>> > +static QObject *monitor_qmp_response_pop_one(Monitor *mon)
>> > +{
>> > +    QObject *data;
>> > +
>> > +    qemu_mutex_lock(&mon->qmp.qmp_queue_lock);
>> > +    data = g_queue_pop_head(mon->qmp.qmp_responses);
>> > +    qemu_mutex_unlock(&mon->qmp.qmp_queue_lock);
>> > +
>> > +    return data;
>> > +}
>> > +
>> > +static void monitor_qmp_response_flush(Monitor *mon)
>> > +{
>> > +    QObject *data;
>> > +
>> > +    while ((data = monitor_qmp_response_pop_one(mon))) {
>> > +        monitor_json_emitter_raw(mon, data);
>> > +        qobject_unref(data);
>> > +    }
>> > +}
>> > +
>> >  /*
>> >   * Pop a QMPResponse from any monitor's response queue into @response.
>> >   * Return false if all the queues are empty; else true.
>> > @@ -551,9 +572,7 @@ static bool monitor_qmp_response_pop_any(QMPResponse *response)
>> >  
>> >      qemu_mutex_lock(&monitor_lock);
>> >      QTAILQ_FOREACH(mon, &mon_list, entry) {
>> > -        qemu_mutex_lock(&mon->qmp.qmp_queue_lock);
>> > -        data = g_queue_pop_head(mon->qmp.qmp_responses);
>> > -        qemu_mutex_unlock(&mon->qmp.qmp_queue_lock);
>> > +        data = monitor_qmp_response_pop_one(mon);
>> >          if (data) {
>> >              response->mon = mon;
>> >              response->data = data;
>> > @@ -4429,6 +4448,14 @@ static void monitor_qmp_event(void *opaque, int event)
>> >          mon_refcount++;
>> >          break;
>> >      case CHR_EVENT_CLOSED:
>> > +        /*
>> > +         * Note: this is only useful when the output of the chardev
>> > +         * backend is still open.  For example, when the backend is
>> > +         * stdio, it's possible that stdout is still open when stdin
>> > +         * is closed.  After all, CHR_EVENT_CLOSED event is currently
>> > +         * only bound to stdin.
>> 
>> Did you forget to delete the last sentence, or decide to keep it?
>> 
>> I could drop it when I apply.
>
> I'm sorry; this one I forgot.
>
> Please feel free to drop it.

Will do.

Reviewed-by: Markus Armbruster <armbru@redhat.com>

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [Qemu-devel] (no subject)
  2018-06-20  7:32 [Qemu-devel] [PATCH v5 0/7] monitor: enable OOB by default Peter Xu
                   ` (6 preceding siblings ...)
  2018-06-20  7:32 ` [Qemu-devel] [PATCH v5 7/7] Revert "tests: Add parameter to qtest_init_without_qmp_handshake" Peter Xu
@ 2018-06-26 17:21 ` Markus Armbruster
  2018-06-27  7:38   ` [Qemu-devel] monitor: enable OOB by default Markus Armbruster
                     ` (3 more replies)
  2018-06-30 16:26 ` [Qemu-devel] [PATCH v5 0/7] " Markus Armbruster
  8 siblings, 4 replies; 48+ messages in thread
From: Markus Armbruster @ 2018-06-26 17:21 UTC (permalink / raw)
  To: Peter Xu; +Cc: qemu-devel

I fooled around a bit, and I think there are a few lose ends.

Lets update the examples in docs/interop/qmp-spec.txt to show the
current greeting (section 3.1) and how to accept a capability (section
3.2).  The capability negotiation documentation could use some polish.
I'll post a patch.

Talking to a QMP monitor that supports OOB:

    $ socat UNIX:test-qmp READLINE,history=$HOME/.qmp_history,prompt='QMP> '
    {"QMP": {"version": {"qemu": {"micro": 50, "minor": 12, "major": 2}, "package": "v2.12.0-1703-gb909799463"}, "capabilities": ["oob"]}}
    QMP> { "execute": "qmp_capabilities", "arguments": { "oob": true } }
    {"error": {"class": "GenericError", "desc": "Parameter 'oob' is unexpected"}}
    QMP> { "execute": "qmp_capabilities", "arguments": { "enable": ["oob"] } }
    {"return": {}}
    QMP> { "execute": "query-qmp-schema" }
    {"error": {"class": "GenericError", "desc": "Out-Of-Band capability requires that every command contains an 'id' field"}}

Why does every command require 'id'?

Talking to a QMP monitor that doesn't support OOB:

    {"QMP": {"version": {"qemu": {"micro": 50, "minor": 12, "major": 2}, "package": "v2.12.0-1703-gb909799463"}, "capabilities": []}}
    QMP> { "execute": "qmp_capabilities", "arguments": { "enable": ["oob"] } }
    {"error": {"class": "GenericError", "desc": "This monitor does not support Out-Of-Band (OOB)"}}
    QMP> { "execute": "qmp_capabilities" }
    {"return": {}}
    QMP> { "execute": "query-kvm" }
    {"return": {"enabled": true, "present": true}}
    QMP> { "execute": "query-kvm", "control": { "run-oob": true } }
    {"error": {"class": "GenericError", "desc": "Please enable Out-Of-Band first for the session during capabilities negotiation"}}

Telling people to enable OOB when that cannot be done is suboptimal.
More so when it cannot be used here anyway.  I'll post a patch.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] monitor: enable OOB by default
  2018-06-26 17:21 ` [Qemu-devel] (no subject) Markus Armbruster
@ 2018-06-27  7:38   ` Markus Armbruster
  2018-06-27  8:41     ` Markus Armbruster
  2018-06-27  7:40   ` Markus Armbruster
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 48+ messages in thread
From: Markus Armbruster @ 2018-06-27  7:38 UTC (permalink / raw)
  To: Peter Xu; +Cc: qemu-devel

Markus Armbruster <armbru@redhat.com> writes:

> I fooled around a bit, and I think there are a few lose ends.
[...]
> Talking to a QMP monitor that supports OOB:
>
>     $ socat UNIX:test-qmp READLINE,history=$HOME/.qmp_history,prompt='QMP> '
>     {"QMP": {"version": {"qemu": {"micro": 50, "minor": 12, "major": 2}, "package": "v2.12.0-1703-gb909799463"}, "capabilities": ["oob"]}}
>     QMP> { "execute": "qmp_capabilities", "arguments": { "oob": true } }
>     {"error": {"class": "GenericError", "desc": "Parameter 'oob' is unexpected"}}
>     QMP> { "execute": "qmp_capabilities", "arguments": { "enable": ["oob"] } }
>     {"return": {}}
>     QMP> { "execute": "query-qmp-schema" }
>     {"error": {"class": "GenericError", "desc": "Out-Of-Band capability requires that every command contains an 'id' field"}}
>
> Why does every command require 'id'?

I found one reason: event COMMAND_DROPPED wants it.  Any other reason?

[...]

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] monitor: enable OOB by default
  2018-06-26 17:21 ` [Qemu-devel] (no subject) Markus Armbruster
  2018-06-27  7:38   ` [Qemu-devel] monitor: enable OOB by default Markus Armbruster
@ 2018-06-27  7:40   ` Markus Armbruster
  2018-06-27  8:35     ` Markus Armbruster
  2018-06-27 13:36     ` Peter Xu
  2018-06-27 11:59   ` [Qemu-devel] your mail Peter Xu
  2018-07-02  5:43   ` [Qemu-devel] monitor: enable OOB by default Markus Armbruster
  3 siblings, 2 replies; 48+ messages in thread
From: Markus Armbruster @ 2018-06-27  7:40 UTC (permalink / raw)
  To: Peter Xu; +Cc: qemu-devel

Another lose end: event COMMAND_DROPPED seems to lack test coverage.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] monitor: enable OOB by default
  2018-06-27  7:40   ` Markus Armbruster
@ 2018-06-27  8:35     ` Markus Armbruster
  2018-06-27 12:32       ` Peter Xu
  2018-06-27 13:36     ` Peter Xu
  1 sibling, 1 reply; 48+ messages in thread
From: Markus Armbruster @ 2018-06-27  8:35 UTC (permalink / raw)
  To: Markus Armbruster; +Cc: Peter Xu, qemu-devel

Markus Armbruster <armbru@redhat.com> writes:

> Another lose end: event COMMAND_DROPPED seems to lack test coverage.

Hmm, dropping commands serves to limit the request queue.  What limits
the response queue?

Before OOB, the monitor read at most one command, and wrote its response
with monitor_puts().

For input, this leaves queueing to the kernel: if the client sends
commands faster than the server can execute them, eventually the kernel
refuses to buffer more, and the client's send either blocks or fails
with EAGAIN.

Output is a mess.  monitor_puts() uses an output buffer.  It tries to
flush at newline.  Issues:

* If flushing runs into a partial write, the unwritten remainder remains
  in the output buffer until the next newline.  That newline may take
  its own sweet time to arrive.  Could even lead to deadlocks, where a
  client awaits complete output before it sends more input.  Bug,
  predates OOB, doesn't block this series.

* If the client fails to read, the output buffer can grow without bound.
  Not a security issue; the client is trusted.  Just bad workmanship.

OOB doesn't change this for monitors running in the main thread.  Only
mux chardevs run there.

Aside: keeping special case code around just for mux is a really bad
idea.  We need to get rid of it.

For monitors running in an I/O thread, we add another buffer: the
response queue.  It's drained by monitor_qmp_bh_responder().  I guess
that means the response queue is effectively bounded by timely draining.
Correct?

Buffering twice seems silly, but that could be addressed in follow-up
patches.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] monitor: enable OOB by default
  2018-06-27  7:38   ` [Qemu-devel] monitor: enable OOB by default Markus Armbruster
@ 2018-06-27  8:41     ` Markus Armbruster
  2018-06-27 10:20       ` Daniel P. Berrangé
  2018-06-27 13:13       ` Markus Armbruster
  0 siblings, 2 replies; 48+ messages in thread
From: Markus Armbruster @ 2018-06-27  8:41 UTC (permalink / raw)
  To: Markus Armbruster; +Cc: Peter Xu, qemu-devel

Markus Armbruster <armbru@redhat.com> writes:

> Markus Armbruster <armbru@redhat.com> writes:
>
>> I fooled around a bit, and I think there are a few lose ends.
> [...]
>> Talking to a QMP monitor that supports OOB:
>>
>>     $ socat UNIX:test-qmp READLINE,history=$HOME/.qmp_history,prompt='QMP> '
>>     {"QMP": {"version": {"qemu": {"micro": 50, "minor": 12, "major": 2}, "package": "v2.12.0-1703-gb909799463"}, "capabilities": ["oob"]}}
>>     QMP> { "execute": "qmp_capabilities", "arguments": { "oob": true } }
>>     {"error": {"class": "GenericError", "desc": "Parameter 'oob' is unexpected"}}
>>     QMP> { "execute": "qmp_capabilities", "arguments": { "enable": ["oob"] } }
>>     {"return": {}}
>>     QMP> { "execute": "query-qmp-schema" }
>>     {"error": {"class": "GenericError", "desc": "Out-Of-Band capability requires that every command contains an 'id' field"}}
>>
>> Why does every command require 'id'?
>
> I found one reason: event COMMAND_DROPPED wants it.  Any other reason?
>
> [...]

Apropos COMMAND_DROPPED: we send an event rather than an error response
because we may send it out-of-order.  Makes sense.

However, broadcasting it to all monitors doesn't make sense.  We could
use a way to send an event to just one monitor.

Another use for that might be QMP "deprecated" notifications, because
those also can't be error responses, and also only make sense for the
client that caused them.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] monitor: enable OOB by default
  2018-06-27  8:41     ` Markus Armbruster
@ 2018-06-27 10:20       ` Daniel P. Berrangé
  2018-06-27 11:23         ` Markus Armbruster
  2018-06-27 13:13       ` Markus Armbruster
  1 sibling, 1 reply; 48+ messages in thread
From: Daniel P. Berrangé @ 2018-06-27 10:20 UTC (permalink / raw)
  To: Markus Armbruster; +Cc: qemu-devel, Peter Xu

On Wed, Jun 27, 2018 at 10:41:38AM +0200, Markus Armbruster wrote:
> Markus Armbruster <armbru@redhat.com> writes:
> 
> > Markus Armbruster <armbru@redhat.com> writes:
> >
> >> I fooled around a bit, and I think there are a few lose ends.
> > [...]
> >> Talking to a QMP monitor that supports OOB:
> >>
> >>     $ socat UNIX:test-qmp READLINE,history=$HOME/.qmp_history,prompt='QMP> '
> >>     {"QMP": {"version": {"qemu": {"micro": 50, "minor": 12, "major": 2}, "package": "v2.12.0-1703-gb909799463"}, "capabilities": ["oob"]}}
> >>     QMP> { "execute": "qmp_capabilities", "arguments": { "oob": true } }
> >>     {"error": {"class": "GenericError", "desc": "Parameter 'oob' is unexpected"}}
> >>     QMP> { "execute": "qmp_capabilities", "arguments": { "enable": ["oob"] } }
> >>     {"return": {}}
> >>     QMP> { "execute": "query-qmp-schema" }
> >>     {"error": {"class": "GenericError", "desc": "Out-Of-Band capability requires that every command contains an 'id' field"}}
> >>
> >> Why does every command require 'id'?
> >
> > I found one reason: event COMMAND_DROPPED wants it.  Any other reason?
> >
> > [...]
> 
> Apropos COMMAND_DROPPED: we send an event rather than an error response
> because we may send it out-of-order.  Makes sense.
> 
> However, broadcasting it to all monitors doesn't make sense.  We could
> use a way to send an event to just one monitor.

Worse than that - broadcasting to all monitors is categorically broken.
Different monitors make use the same "id" formatting scheme, so if you
broadcast COMMAND_DROPPED to a different monitor you might have clashing
"id" and thus incorrectly tell a client its command was dropped when in
fact it was processed. You'd have to be fairly unlucky in timing, but
it could happen.

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] monitor: enable OOB by default
  2018-06-27 10:20       ` Daniel P. Berrangé
@ 2018-06-27 11:23         ` Markus Armbruster
  2018-06-27 12:07           ` Peter Xu
  0 siblings, 1 reply; 48+ messages in thread
From: Markus Armbruster @ 2018-06-27 11:23 UTC (permalink / raw)
  To: Daniel P. Berrangé; +Cc: qemu-devel, Peter Xu

Daniel P. Berrangé <berrange@redhat.com> writes:

> On Wed, Jun 27, 2018 at 10:41:38AM +0200, Markus Armbruster wrote:
>> Markus Armbruster <armbru@redhat.com> writes:
>> 
>> > Markus Armbruster <armbru@redhat.com> writes:
>> >
>> >> I fooled around a bit, and I think there are a few lose ends.
>> > [...]
>> >> Talking to a QMP monitor that supports OOB:
>> >>
>> >>     $ socat UNIX:test-qmp READLINE,history=$HOME/.qmp_history,prompt='QMP> '
>> >>     {"QMP": {"version": {"qemu": {"micro": 50, "minor": 12, "major": 2}, "package": "v2.12.0-1703-gb909799463"}, "capabilities": ["oob"]}}
>> >>     QMP> { "execute": "qmp_capabilities", "arguments": { "oob": true } }
>> >>     {"error": {"class": "GenericError", "desc": "Parameter 'oob' is unexpected"}}
>> >>     QMP> { "execute": "qmp_capabilities", "arguments": { "enable": ["oob"] } }
>> >>     {"return": {}}
>> >>     QMP> { "execute": "query-qmp-schema" }
>> >>     {"error": {"class": "GenericError", "desc": "Out-Of-Band capability requires that every command contains an 'id' field"}}
>> >>
>> >> Why does every command require 'id'?
>> >
>> > I found one reason: event COMMAND_DROPPED wants it.  Any other reason?
>> >
>> > [...]
>> 
>> Apropos COMMAND_DROPPED: we send an event rather than an error response
>> because we may send it out-of-order.  Makes sense.
>> 
>> However, broadcasting it to all monitors doesn't make sense.  We could
>> use a way to send an event to just one monitor.
>
> Worse than that - broadcasting to all monitors is categorically broken.
> Different monitors make use the same "id" formatting scheme, so if you
> broadcast COMMAND_DROPPED to a different monitor you might have clashing
> "id" and thus incorrectly tell a client its command was dropped when in
> fact it was processed. You'd have to be fairly unlucky in timing, but
> it could happen.

Right.  Must fix bug.

I'm glad I went over this one more time, and in public!

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] your mail
  2018-06-26 17:21 ` [Qemu-devel] (no subject) Markus Armbruster
  2018-06-27  7:38   ` [Qemu-devel] monitor: enable OOB by default Markus Armbruster
  2018-06-27  7:40   ` Markus Armbruster
@ 2018-06-27 11:59   ` Peter Xu
  2018-06-28  8:31     ` Markus Armbruster
  2018-07-02  5:43   ` [Qemu-devel] monitor: enable OOB by default Markus Armbruster
  3 siblings, 1 reply; 48+ messages in thread
From: Peter Xu @ 2018-06-27 11:59 UTC (permalink / raw)
  To: Markus Armbruster; +Cc: qemu-devel

On Tue, Jun 26, 2018 at 07:21:49PM +0200, Markus Armbruster wrote:
> I fooled around a bit, and I think there are a few lose ends.
> 
> Lets update the examples in docs/interop/qmp-spec.txt to show the
> current greeting (section 3.1) and how to accept a capability (section
> 3.2).  The capability negotiation documentation could use some polish.
> I'll post a patch.

Thanks for doing that!

> 
> Talking to a QMP monitor that supports OOB:
> 
>     $ socat UNIX:test-qmp READLINE,history=$HOME/.qmp_history,prompt='QMP> '
>     {"QMP": {"version": {"qemu": {"micro": 50, "minor": 12, "major": 2}, "package": "v2.12.0-1703-gb909799463"}, "capabilities": ["oob"]}}
>     QMP> { "execute": "qmp_capabilities", "arguments": { "oob": true } }
>     {"error": {"class": "GenericError", "desc": "Parameter 'oob' is unexpected"}}
>     QMP> { "execute": "qmp_capabilities", "arguments": { "enable": ["oob"] } }
>     {"return": {}}
>     QMP> { "execute": "query-qmp-schema" }
>     {"error": {"class": "GenericError", "desc": "Out-Of-Band capability requires that every command contains an 'id' field"}}
> 
> Why does every command require 'id'?

The COMMAND_DROPPED event is one reason, as you mentioned in the other
thread. Meanwhile as long as we have OOB command it means that we can
have more than one commands running in parallel, then it's a must that
we can mark that out in the response to mark out which command we are
replying to.

> 
> Talking to a QMP monitor that doesn't support OOB:
> 
>     {"QMP": {"version": {"qemu": {"micro": 50, "minor": 12, "major": 2}, "package": "v2.12.0-1703-gb909799463"}, "capabilities": []}}
>     QMP> { "execute": "qmp_capabilities", "arguments": { "enable": ["oob"] } }
>     {"error": {"class": "GenericError", "desc": "This monitor does not support Out-Of-Band (OOB)"}}
>     QMP> { "execute": "qmp_capabilities" }
>     {"return": {}}
>     QMP> { "execute": "query-kvm" }
>     {"return": {"enabled": true, "present": true}}
>     QMP> { "execute": "query-kvm", "control": { "run-oob": true } }
>     {"error": {"class": "GenericError", "desc": "Please enable Out-Of-Band first for the session during capabilities negotiation"}}
> 
> Telling people to enable OOB when that cannot be done is suboptimal.
> More so when it cannot be used here anyway.  I'll post a patch.

Thanks again!

-- 
Peter Xu

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] monitor: enable OOB by default
  2018-06-27 11:23         ` Markus Armbruster
@ 2018-06-27 12:07           ` Peter Xu
  2018-06-27 12:37             ` Eric Blake
  2018-06-28  6:55             ` Markus Armbruster
  0 siblings, 2 replies; 48+ messages in thread
From: Peter Xu @ 2018-06-27 12:07 UTC (permalink / raw)
  To: Markus Armbruster; +Cc: Daniel P. Berrangé, qemu-devel

On Wed, Jun 27, 2018 at 01:23:07PM +0200, Markus Armbruster wrote:
> Daniel P. Berrangé <berrange@redhat.com> writes:
> 
> > On Wed, Jun 27, 2018 at 10:41:38AM +0200, Markus Armbruster wrote:
> >> Markus Armbruster <armbru@redhat.com> writes:
> >> 
> >> > Markus Armbruster <armbru@redhat.com> writes:
> >> >
> >> >> I fooled around a bit, and I think there are a few lose ends.
> >> > [...]
> >> >> Talking to a QMP monitor that supports OOB:
> >> >>
> >> >>     $ socat UNIX:test-qmp READLINE,history=$HOME/.qmp_history,prompt='QMP> '
> >> >>     {"QMP": {"version": {"qemu": {"micro": 50, "minor": 12, "major": 2}, "package": "v2.12.0-1703-gb909799463"}, "capabilities": ["oob"]}}
> >> >>     QMP> { "execute": "qmp_capabilities", "arguments": { "oob": true } }
> >> >>     {"error": {"class": "GenericError", "desc": "Parameter 'oob' is unexpected"}}
> >> >>     QMP> { "execute": "qmp_capabilities", "arguments": { "enable": ["oob"] } }
> >> >>     {"return": {}}
> >> >>     QMP> { "execute": "query-qmp-schema" }
> >> >>     {"error": {"class": "GenericError", "desc": "Out-Of-Band capability requires that every command contains an 'id' field"}}
> >> >>
> >> >> Why does every command require 'id'?
> >> >
> >> > I found one reason: event COMMAND_DROPPED wants it.  Any other reason?
> >> >
> >> > [...]
> >> 
> >> Apropos COMMAND_DROPPED: we send an event rather than an error response
> >> because we may send it out-of-order.  Makes sense.
> >> 
> >> However, broadcasting it to all monitors doesn't make sense.  We could
> >> use a way to send an event to just one monitor.

True.

(Sorry for the late responses; I was on Linuxcon China in the past few
days)

> >
> > Worse than that - broadcasting to all monitors is categorically broken.
> > Different monitors make use the same "id" formatting scheme, so if you
> > broadcast COMMAND_DROPPED to a different monitor you might have clashing
> > "id" and thus incorrectly tell a client its command was dropped when in
> > fact it was processed. You'd have to be fairly unlucky in timing, but
> > it could happen.
> 
> Right.  Must fix bug.

Even more true.

> 
> I'm glad I went over this one more time, and in public!

I had a glance at current qmp-spec, it seems that we don't have any
restriction currently on "we must send events to all the monitors".
Does it mean that we should be free to have per-monitor events
starting from this event?

My current plan is that I can touch up scripts/qapi/events.py and
related stuff to allow QMPEventFuncEmit to take a monitor parameter,
then we pass in NULL when we want to send the event to all monitors.

Would that work?

Thanks,

-- 
Peter Xu

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] monitor: enable OOB by default
  2018-06-27  8:35     ` Markus Armbruster
@ 2018-06-27 12:32       ` Peter Xu
  2018-06-28  9:29         ` Markus Armbruster
  0 siblings, 1 reply; 48+ messages in thread
From: Peter Xu @ 2018-06-27 12:32 UTC (permalink / raw)
  To: Markus Armbruster; +Cc: qemu-devel

On Wed, Jun 27, 2018 at 10:35:15AM +0200, Markus Armbruster wrote:
> Markus Armbruster <armbru@redhat.com> writes:
> 
> > Another lose end: event COMMAND_DROPPED seems to lack test coverage.
> 
> Hmm, dropping commands serves to limit the request queue.  What limits
> the response queue?

As long as we have a request queue limitation, that'll somehow be
"part of" the limitation of response queue.  Since the real responses
(let's not consider the events first) should be no more than the
maximum QMP requests that we allow in the request queue (one response
for one request).  In that sense it seems fine to me.

> 
> Before OOB, the monitor read at most one command, and wrote its response
> with monitor_puts().
> 
> For input, this leaves queueing to the kernel: if the client sends
> commands faster than the server can execute them, eventually the kernel
> refuses to buffer more, and the client's send either blocks or fails
> with EAGAIN.
> 
> Output is a mess.  monitor_puts() uses an output buffer.  It tries to
> flush at newline.  Issues:
> 
> * If flushing runs into a partial write, the unwritten remainder remains
>   in the output buffer until the next newline.  That newline may take
>   its own sweet time to arrive.  Could even lead to deadlocks, where a
>   client awaits complete output before it sends more input.  Bug,
>   predates OOB, doesn't block this series.

True.  Though I noticed that we have a "hackish" line in
monitor_json_emitter_raw():

    qstring_append_chr(json, '\n');

So it seems that at least we should never encounter a deadlock, after
all there will always be a newline there. But I'd say I agree with you
on that it's at least not that "beautiful". :-)

> 
> * If the client fails to read, the output buffer can grow without bound.
>   Not a security issue; the client is trusted.  Just bad workmanship.

True.

> 
> OOB doesn't change this for monitors running in the main thread.  Only
> mux chardevs run there.
> 
> Aside: keeping special case code around just for mux is a really bad
> idea.  We need to get rid of it.

We should be running the same code path even for MUX-ed typed, right?
Do you mean to put MUX-ed typed handling onto iothreads as well when
you say "get rid of it"?

> 
> For monitors running in an I/O thread, we add another buffer: the
> response queue.  It's drained by monitor_qmp_bh_responder().  I guess
> that means the response queue is effectively bounded by timely draining.
> Correct?

I don't see a timely manner to flush it, but as long as we queue
anything (including events) onto the response queue, we'll poke the
bottom half (in monitor_json_emitter() we call qemu_bh_schedule()) so
we'll possibly drain the queue very soon, and there should be no
chance to have a stale message in that queue.

> 
> Buffering twice seems silly, but that could be addressed in follow-up
> patches.

Do you mean that we can write the response immediately into
Monitor.outbuf, then only flush it in iothread?  IMHO that's fine -
after all, the response queue, as mentioned above, should have a
natural restriction as well due to the request queue, then we won't
waste too much resources for that.  Meanwhile using a queue with QMP
response objects seems to be a bit more cleaner to me from design pov
(though I might be wrong).

Regards,

-- 
Peter Xu

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] monitor: enable OOB by default
  2018-06-27 12:07           ` Peter Xu
@ 2018-06-27 12:37             ` Eric Blake
  2018-06-28  7:04               ` Markus Armbruster
  2018-06-28  6:55             ` Markus Armbruster
  1 sibling, 1 reply; 48+ messages in thread
From: Eric Blake @ 2018-06-27 12:37 UTC (permalink / raw)
  To: Peter Xu, Markus Armbruster; +Cc: qemu-devel

On 06/27/2018 07:07 AM, Peter Xu wrote:

>>> Worse than that - broadcasting to all monitors is categorically broken.
>>> Different monitors make use the same "id" formatting scheme, so if you
>>> broadcast COMMAND_DROPPED to a different monitor you might have clashing
>>> "id" and thus incorrectly tell a client its command was dropped when in
>>> fact it was processed. You'd have to be fairly unlucky in timing, but
>>> it could happen.
>>
>> Right.  Must fix bug.
> 

> 
> My current plan is that I can touch up scripts/qapi/events.py and
> related stuff to allow QMPEventFuncEmit to take a monitor parameter,
> then we pass in NULL when we want to send the event to all monitors.
> 
> Would that work?

Makes sense to me. Also, right now, ALL callers of qapi_event_send_* 
pass &error_abort as their final parameter. If you're refactoring 
everything anyways, you could get rid of that parameter on the 
presumption that it doesn't buy us anything.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] monitor: enable OOB by default
  2018-06-27  8:41     ` Markus Armbruster
  2018-06-27 10:20       ` Daniel P. Berrangé
@ 2018-06-27 13:13       ` Markus Armbruster
  2018-06-27 13:28         ` Daniel P. Berrangé
  2018-06-27 13:34         ` Peter Xu
  1 sibling, 2 replies; 48+ messages in thread
From: Markus Armbruster @ 2018-06-27 13:13 UTC (permalink / raw)
  To: Peter Xu; +Cc: qemu-devel

Monitor behavior changes even when the client rejects capability "oob".

Traditionally, the monitor reads, executes and responds to one command
after the other.  If the client sends in-band commands faster than the
server can execute them, the kernel will eventually refuse to buffer
more, and sending blocks or fails with EAGAIN.

To make OOB possible, we need to read and queue commands as we receive
them.  If the client sends in-band commands faster than the server can
execute them, the server will eventually drop commands to limit the
queue length.  The sever sends event COMMAND_DROPPED then.

However, we get the new behavior even when the client rejects capability
"oob".  We get the traditional behavior only when the server doesn't
offer "oob".

Is this what we want?

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] monitor: enable OOB by default
  2018-06-27 13:13       ` Markus Armbruster
@ 2018-06-27 13:28         ` Daniel P. Berrangé
  2018-06-28 13:02           ` Markus Armbruster
  2018-06-27 13:34         ` Peter Xu
  1 sibling, 1 reply; 48+ messages in thread
From: Daniel P. Berrangé @ 2018-06-27 13:28 UTC (permalink / raw)
  To: Markus Armbruster; +Cc: Peter Xu, qemu-devel

On Wed, Jun 27, 2018 at 03:13:57PM +0200, Markus Armbruster wrote:
> Monitor behavior changes even when the client rejects capability "oob".
> 
> Traditionally, the monitor reads, executes and responds to one command
> after the other.  If the client sends in-band commands faster than the
> server can execute them, the kernel will eventually refuse to buffer
> more, and sending blocks or fails with EAGAIN.
> 
> To make OOB possible, we need to read and queue commands as we receive
> them.  If the client sends in-band commands faster than the server can
> execute them, the server will eventually drop commands to limit the
> queue length.  The sever sends event COMMAND_DROPPED then.
> 
> However, we get the new behavior even when the client rejects capability
> "oob".  We get the traditional behavior only when the server doesn't
> offer "oob".
> 
> Is this what we want?

IMHO the key benefit of allowing the client to reject the capability
is to enable backwards compat support. So this behaviour feels wrong.
Rejecting OOB should have same semantics as previous QEMU's without
OOB available, otherwise we now have 3 distinct ways the monitor
operates  (no OOB, OOB rejected, OOB accepted). This can only ever
lead to more bugs due to lack of testing of no OOB vs OOB rejected
scenarios.

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] monitor: enable OOB by default
  2018-06-27 13:13       ` Markus Armbruster
  2018-06-27 13:28         ` Daniel P. Berrangé
@ 2018-06-27 13:34         ` Peter Xu
  2018-06-28 13:20           ` Markus Armbruster
  1 sibling, 1 reply; 48+ messages in thread
From: Peter Xu @ 2018-06-27 13:34 UTC (permalink / raw)
  To: Markus Armbruster; +Cc: qemu-devel

On Wed, Jun 27, 2018 at 03:13:57PM +0200, Markus Armbruster wrote:
> Monitor behavior changes even when the client rejects capability "oob".
> 
> Traditionally, the monitor reads, executes and responds to one command
> after the other.  If the client sends in-band commands faster than the
> server can execute them, the kernel will eventually refuse to buffer
> more, and sending blocks or fails with EAGAIN.
> 
> To make OOB possible, we need to read and queue commands as we receive
> them.  If the client sends in-band commands faster than the server can
> execute them, the server will eventually drop commands to limit the
> queue length.  The sever sends event COMMAND_DROPPED then.
> 
> However, we get the new behavior even when the client rejects capability
> "oob".  We get the traditional behavior only when the server doesn't
> offer "oob".
> 
> Is this what we want?

Not really.  But I thought we kept that old behavior, haven't we?

In handle_qmp_command() we have this:

    /*
     * If OOB is not enabled on the current monitor, we'll emulate the
     * old behavior that we won't process the current monitor any more
     * until it has responded.  This helps make sure that as long as
     * OOB is not enabled, the server will never drop any command.
     */
    if (!qmp_oob_enabled(mon)) {
        monitor_suspend(mon);
        req_obj->need_resume = true;
    } else {
        /* Drop the request if queue is full. */
        if (mon->qmp.qmp_requests->length >= QMP_REQ_QUEUE_LEN_MAX) {
            qemu_mutex_unlock(&mon->qmp.qmp_queue_lock);
            qapi_event_send_command_dropped(id,
                                            COMMAND_DROP_REASON_QUEUE_FULL,
                                            &error_abort);
            qmp_request_free(req_obj);
            return;
        }
    }

Here assuming that we are not enabling OOB, then since we will suspend
the monitor when we receive one command, then monitor_can_read() later
will check with a result that "we should not read the chardev", then
it'll leave all the data in the input buffer.  Then AFAIU the QMP
client that didn't enable OOB will have similar behavior as before.

Also, we will never receive a COMMAND_DROPPED event in that legacy
mode as well since it's in the "else" block.   Am I right?

Regards,

-- 
Peter Xu

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] monitor: enable OOB by default
  2018-06-27  7:40   ` Markus Armbruster
  2018-06-27  8:35     ` Markus Armbruster
@ 2018-06-27 13:36     ` Peter Xu
  1 sibling, 0 replies; 48+ messages in thread
From: Peter Xu @ 2018-06-27 13:36 UTC (permalink / raw)
  To: Markus Armbruster; +Cc: qemu-devel

On Wed, Jun 27, 2018 at 09:40:40AM +0200, Markus Armbruster wrote:
> Another lose end: event COMMAND_DROPPED seems to lack test coverage.

This one I can do.  Since I'll try to prepare some patches to fix the
event issue, I can try to add the test alongside.

Thanks for mentioning!

-- 
Peter Xu

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] monitor: enable OOB by default
  2018-06-27 12:07           ` Peter Xu
  2018-06-27 12:37             ` Eric Blake
@ 2018-06-28  6:55             ` Markus Armbruster
  2018-06-28 11:43               ` Eric Blake
  2018-06-29  8:18               ` Peter Xu
  1 sibling, 2 replies; 48+ messages in thread
From: Markus Armbruster @ 2018-06-28  6:55 UTC (permalink / raw)
  To: Peter Xu; +Cc: Markus Armbruster, qemu-devel

Peter Xu <peterx@redhat.com> writes:

> On Wed, Jun 27, 2018 at 01:23:07PM +0200, Markus Armbruster wrote:
>> Daniel P. Berrangé <berrange@redhat.com> writes:
>> 
>> > On Wed, Jun 27, 2018 at 10:41:38AM +0200, Markus Armbruster wrote:
>> >> Markus Armbruster <armbru@redhat.com> writes:
>> >> 
>> >> > Markus Armbruster <armbru@redhat.com> writes:
>> >> >
>> >> >> I fooled around a bit, and I think there are a few lose ends.
>> >> > [...]
>> >> >> Talking to a QMP monitor that supports OOB:
>> >> >>
>> >> >>     $ socat UNIX:test-qmp READLINE,history=$HOME/.qmp_history,prompt='QMP> '
>> >> >>     {"QMP": {"version": {"qemu": {"micro": 50, "minor": 12, "major": 2}, "package": "v2.12.0-1703-gb909799463"}, "capabilities": ["oob"]}}
>> >> >>     QMP> { "execute": "qmp_capabilities", "arguments": { "oob": true } }
>> >> >>     {"error": {"class": "GenericError", "desc": "Parameter 'oob' is unexpected"}}
>> >> >>     QMP> { "execute": "qmp_capabilities", "arguments": { "enable": ["oob"] } }
>> >> >>     {"return": {}}
>> >> >>     QMP> { "execute": "query-qmp-schema" }
>> >> >>     {"error": {"class": "GenericError", "desc": "Out-Of-Band capability requires that every command contains an 'id' field"}}
>> >> >>
>> >> >> Why does every command require 'id'?
>> >> >
>> >> > I found one reason: event COMMAND_DROPPED wants it.  Any other reason?
>> >> >
>> >> > [...]
>> >> 
>> >> Apropos COMMAND_DROPPED: we send an event rather than an error response
>> >> because we may send it out-of-order.  Makes sense.
>> >> 
>> >> However, broadcasting it to all monitors doesn't make sense.  We could
>> >> use a way to send an event to just one monitor.
>
> True.
>
> (Sorry for the late responses; I was on Linuxcon China in the past few
> days)

No need to be sorry :)

>> >
>> > Worse than that - broadcasting to all monitors is categorically broken.
>> > Different monitors make use the same "id" formatting scheme, so if you
>> > broadcast COMMAND_DROPPED to a different monitor you might have clashing
>> > "id" and thus incorrectly tell a client its command was dropped when in
>> > fact it was processed. You'd have to be fairly unlucky in timing, but
>> > it could happen.
>> 
>> Right.  Must fix bug.
>
> Even more true.
>
>> 
>> I'm glad I went over this one more time, and in public!
>
> I had a glance at current qmp-spec, it seems that we don't have any
> restriction currently on "we must send events to all the monitors".
> Does it mean that we should be free to have per-monitor events
> starting from this event?

Changing an existing event from broadcast to unicast is an observable
change in existing behavior.  Compatibility break unless we can show
nobody can use / uses the observation.

Creating a new event is not a compatibility break by itself[*],
regardless of broadcast vs. unicast.

> My current plan is that I can touch up scripts/qapi/events.py and
> related stuff to allow QMPEventFuncEmit to take a monitor parameter,
> then we pass in NULL when we want to send the event to all monitors.
>
> Would that work?

Think so.

Alternatively, a pair of functions:

    void qapi_event_bcast_EVENT(... event params ..., Error **errp);
    void qapi_event_send_EVENT(Monitor *mon, ... event params ..., Error **errp);

Slightly more compact in the broadcast case, might be a bit easier to
read.


[*] In the case of COMMAND_DROPPED, the compatibility break is dropping
commands (the event's trigger), caused by the command queuing feature.
That's why command queuing has to be opt-in.  Details discussed
elsewhere in this thread.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] monitor: enable OOB by default
  2018-06-27 12:37             ` Eric Blake
@ 2018-06-28  7:04               ` Markus Armbruster
  2018-06-29  7:20                 ` Peter Xu
  0 siblings, 1 reply; 48+ messages in thread
From: Markus Armbruster @ 2018-06-28  7:04 UTC (permalink / raw)
  To: Eric Blake; +Cc: Peter Xu, Markus Armbruster, qemu-devel

Eric Blake <eblake@redhat.com> writes:

> On 06/27/2018 07:07 AM, Peter Xu wrote:
>
>>>> Worse than that - broadcasting to all monitors is categorically broken.
>>>> Different monitors make use the same "id" formatting scheme, so if you
>>>> broadcast COMMAND_DROPPED to a different monitor you might have clashing
>>>> "id" and thus incorrectly tell a client its command was dropped when in
>>>> fact it was processed. You'd have to be fairly unlucky in timing, but
>>>> it could happen.
>>>
>>> Right.  Must fix bug.
>>
>
>>
>> My current plan is that I can touch up scripts/qapi/events.py and
>> related stuff to allow QMPEventFuncEmit to take a monitor parameter,
>> then we pass in NULL when we want to send the event to all monitors.
>>
>> Would that work?
>
> Makes sense to me. Also, right now, ALL callers of qapi_event_send_*
> pass &error_abort as their final parameter. If you're refactoring
> everything anyways, you could get rid of that parameter on the
> presumption that it doesn't buy us anything.

Let me try to turn presumption into fact :)

The generated qapi_event_send_FOO() can detect the following error
conditions:

* Visitor fails.  But the QObject output visitor can't actually fail.

  Since the errp parameter is part of the abstract visitor interface, we
  can't eliminate it.  We can pass &error_abort.

* emit() function fails.  Can't happen, either.

  Let's eliminate the unused errp parameter of QMPEventFuncEmit.

  If we can eliminate the somewhat silly indirection through
  qmp_event_get_func_emit(), even better.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] your mail
  2018-06-27 11:59   ` [Qemu-devel] your mail Peter Xu
@ 2018-06-28  8:31     ` Markus Armbruster
  2018-06-28 11:51       ` Eric Blake
  2018-06-28 12:00       ` Daniel P. Berrangé
  0 siblings, 2 replies; 48+ messages in thread
From: Markus Armbruster @ 2018-06-28  8:31 UTC (permalink / raw)
  To: Peter Xu; +Cc: qemu-devel

Peter Xu <peterx@redhat.com> writes:

> On Tue, Jun 26, 2018 at 07:21:49PM +0200, Markus Armbruster wrote:
>> I fooled around a bit, and I think there are a few lose ends.
[...]
>> Talking to a QMP monitor that supports OOB:
>> 
>>     $ socat UNIX:test-qmp READLINE,history=$HOME/.qmp_history,prompt='QMP> '
>>     {"QMP": {"version": {"qemu": {"micro": 50, "minor": 12, "major": 2}, "package": "v2.12.0-1703-gb909799463"}, "capabilities": ["oob"]}}
>>     QMP> { "execute": "qmp_capabilities", "arguments": { "oob": true } }
>>     {"error": {"class": "GenericError", "desc": "Parameter 'oob' is unexpected"}}
>>     QMP> { "execute": "qmp_capabilities", "arguments": { "enable": ["oob"] } }
>>     {"return": {}}
>>     QMP> { "execute": "query-qmp-schema" }
>>     {"error": {"class": "GenericError", "desc": "Out-Of-Band capability requires that every command contains an 'id' field"}}
>> 
>> Why does every command require 'id'?

Why am I asking?  Requiring 'id' is rather onerous when you play with
QMP by hand.

> The COMMAND_DROPPED event is one reason, as you mentioned in the other
> thread. Meanwhile as long as we have OOB command it means that we can
> have more than one commands running in parallel, then it's a must that
> we can mark that out in the response to mark out which command we are
> replying to.

Without OOB, the client can match response to command, because it'll
receive responses in command issue order.

Example 1: after sending

    { "execute": "cmd1", ... }
    { "execute": "cmd2", ... }
    { "execute": "cmd3", ... }

the client will receive three responses, first one is cmd1's, second one
is cmd2's, third one is cmd3's.

With OOB, we have to independent command streams, in-band and
out-of-band.  Each separate stream's responses arrive in-order, but the
two streams may be interleaved.

Example 2: after sending

    { "execute": "cmd1", ... }
    { "execute": "cmd2", ... }
    { "execute": "cmd3", ... }
    { "execute": "oob1", "control": { "run-oob": true }, ... }
    { "execute": "oob2", "control": { "run-oob": true }, ... }
    { "execute": "oob3", "control": { "run-oob": true }, ... }

the client will receive six responses: cmd1's before cmd2's before
cmd3's, and oob1's before oob2's before oob3's.

If the client sends "id" with each command, it can match responses to
commands by id.

But that's not the only way to match.  Imagine the client had a perfect
oracle to tell it whether a response is in-band or out-of-band.  Then
matching can work pretty much as in example 1: the first in-band
response is cmd1's, the second one is cmd2's, and the third one is
cmd3's; the first out-of-band response is oob1's, the second one is
oob2's and the third one is oob3's.

Possible solutions other than requiring "id":

1. Make out-of-band responses recognizable

   Just like we copy "id" to the response, we could copy "run-oob", or
   maybe "control".

   Solves the "match response to command" problem, doesn't solve the
   "match COMMAND_DROPPED event to command" problem.

   Permit me a digression.  "control": { "run-oob": true } is awfully
   verbose.  Back when we hashed out the OOB design, I proposed to use
   "exec-oob" instead of "execute" to run a command OOB.  I missed when
   that morphed into flag "run-oob" wrapped in object "control".  Was it
   in response to reviewer feedback?  Can you give me a pointer?

   The counterpart to "exec-oob" would be "return-oob" and "error-oob".
   Hmm.

2. Do nothing; it's the client's problem

   If the client needs to match responses and events to commands, it
   should use the feature that was expressly designed for helping with
   that: "id".

   Note that QMP's initial design assumed asynchronous commands,
   i.e. respones that may arrive in any order.  Nevertheless, it did not
   require "id".  Same reasoning: if the client needs to match, it can
   use "id".

As so often when "do nothing" is a viable solution, it looks mighty
attractive to me :)

[...]

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] monitor: enable OOB by default
  2018-06-27 12:32       ` Peter Xu
@ 2018-06-28  9:29         ` Markus Armbruster
  2018-06-29  9:42           ` Peter Xu
  0 siblings, 1 reply; 48+ messages in thread
From: Markus Armbruster @ 2018-06-28  9:29 UTC (permalink / raw)
  To: Peter Xu; +Cc: Markus Armbruster, qemu-devel

Peter Xu <peterx@redhat.com> writes:

> On Wed, Jun 27, 2018 at 10:35:15AM +0200, Markus Armbruster wrote:
>> Markus Armbruster <armbru@redhat.com> writes:
>> 
>> > Another lose end: event COMMAND_DROPPED seems to lack test coverage.
>> 
>> Hmm, dropping commands serves to limit the request queue.  What limits
>> the response queue?
>
> As long as we have a request queue limitation, that'll somehow be
> "part of" the limitation of response queue.  Since the real responses
> (let's not consider the events first) should be no more than the
> maximum QMP requests that we allow in the request queue (one response
> for one request).  In that sense it seems fine to me.

"Normal" flow of the request through the server (QEMU):

    receive
 -> handle_qmp_command()

 -> request queue: mon->qmp.qmp_requests

 -> monitor_qmp_bh_dispatcher()
 -> monitor_qmp_dispatch_one()
 -> monitor_qmp_respond()
 -> monitor_json_emitter()

 -> response queue: mon->qmp.qmp_requests

 -> monitor_qmp_response_pop_one()
 -> monitor_qmp_bh_responder()
 -> monitor_json_emitter_raw()
 -> monitor_puts()

 -> output buffer: mon->outbuf

 -> monitor_flush_locked()

The purpose of event COMMAND_DROPPED is flow control notification: when
the client sends more than we're willing to buffer, we drop the excess
and notify the client.

If the client sends too many requests too quickly, the request queue
fills up, and flow control kicks in.  Good.

As long as the client sends requests at a moderate pace, the request
queue never fills up: the dispatch & execute code sitting between
request queue and response queue drains it just fine.

The response queue proper also doesn't fill up: the emitter code sitting
between response queue and output buffer drains it just fine.

However, output buffer can still grow without bounds!
monitor_flush_locked() requires the client to keep up to make progress.
Our flow control fails then.

Extreme case: a (misbehaving!) client that keeps sending requests at a
moderate pace while not reading any responses.  output buffer grows
without bounds.

Less extreme case: a client sends a small number of requests quickly,
then reads responses very slowly or not at all for some reason, say
because the network goes bad right at this time.  Here, the size of the
request queue does limit the size of output buffer, as you proposed, but
the size multiplier isn't really known.

My point is: the idea "limiting the request queue also limits the
response queue + output buffer" isn't entirely wrong, but it's not
entirely right, either.

Can we improve flow control to cover the complete flow, not just the
flow into the request queue?

>> Before OOB, the monitor read at most one command, and wrote its response
>> with monitor_puts().
>> 
>> For input, this leaves queueing to the kernel: if the client sends
>> commands faster than the server can execute them, eventually the kernel
>> refuses to buffer more, and the client's send either blocks or fails
>> with EAGAIN.
>> 
>> Output is a mess.  monitor_puts() uses an output buffer.  It tries to
>> flush at newline.  Issues:
>> 
>> * If flushing runs into a partial write, the unwritten remainder remains
>>   in the output buffer until the next newline.  That newline may take
>>   its own sweet time to arrive.

Hmm, it's also flushed via mon->out_watch.  If that works how I guess it
does, there's no deadlock.

>>                                  Could even lead to deadlocks, where a
>>   client awaits complete output before it sends more input.  Bug,
>>   predates OOB, doesn't block this series.
>
> True.  Though I noticed that we have a "hackish" line in
> monitor_json_emitter_raw():
>
>     qstring_append_chr(json, '\n');
>
> So it seems that at least we should never encounter a deadlock, after
> all there will always be a newline there. But I'd say I agree with you
> on that it's at least not that "beautiful". :-)

The newline ensures responses arrive on their own line.  JSON doesn't
care about lines (makes sense), and qmp-spec.txt doesn't care, either
(that's wrong, in my opinion).  Anyone playing with QMP by hand will
definitely care.  Even QMP clients might.

The newline also triggers the actual write(), because monitor_puts() is
line-buffered.  That buffering makes sense for HMP, but it's useless for
QMP.  Let's not worry about that right now.

The flushing, however, is not guaranteed to write anything!  If
qemu_chr_fe_write() fails with EAGAIN, mon->outbuf remains unchanged.

>> * If the client fails to read, the output buffer can grow without bound.
>>   Not a security issue; the client is trusted.  Just bad workmanship.
>
> True.
>
>> 
>> OOB doesn't change this for monitors running in the main thread.  Only
>> mux chardevs run there.
>> 
>> Aside: keeping special case code around just for mux is a really bad
>> idea.  We need to get rid of it.
>
> We should be running the same code path even for MUX-ed typed, right?
> Do you mean to put MUX-ed typed handling onto iothreads as well when
> you say "get rid of it"?

I figure I'll cover this in my reply to Daniel.  If not, I'll reply to
this one again.

>> For monitors running in an I/O thread, we add another buffer: the
>> response queue.  It's drained by monitor_qmp_bh_responder().  I guess
>> that means the response queue is effectively bounded by timely draining.
>> Correct?
>
> I don't see a timely manner to flush it, but as long as we queue
> anything (including events) onto the response queue, we'll poke the
> bottom half (in monitor_json_emitter() we call qemu_bh_schedule()) so
> we'll possibly drain the queue very soon, and there should be no
> chance to have a stale message in that queue.

As long as bottom halves work, the response queue remains small.  That's
okay.

>> Buffering twice seems silly, but that could be addressed in follow-up
>> patches.
>
> Do you mean that we can write the response immediately into
> Monitor.outbuf, then only flush it in iothread?  IMHO that's fine -
> after all, the response queue, as mentioned above, should have a
> natural restriction as well due to the request queue, then we won't
> waste too much resources for that.  Meanwhile using a queue with QMP
> response objects seems to be a bit more cleaner to me from design pov
> (though I might be wrong).

Again, let's not worry about this right now.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] monitor: enable OOB by default
  2018-06-28  6:55             ` Markus Armbruster
@ 2018-06-28 11:43               ` Eric Blake
  2018-06-29  8:18               ` Peter Xu
  1 sibling, 0 replies; 48+ messages in thread
From: Eric Blake @ 2018-06-28 11:43 UTC (permalink / raw)
  To: Markus Armbruster, Peter Xu; +Cc: qemu-devel

On 06/28/2018 01:55 AM, Markus Armbruster wrote:

> Changing an existing event from broadcast to unicast is an observable
> change in existing behavior.  Compatibility break unless we can show
> nobody can use / uses the observation.

And no one could have been relying on the broadcast COMMAND_DROPPED 
event semantics, since OOB was previously experimental and since we just 
proved that they are wrong if not limited to one monitor.

> 
> Creating a new event is not a compatibility break by itself[*],
> regardless of broadcast vs. unicast.
> 
>> My current plan is that I can touch up scripts/qapi/events.py and
>> related stuff to allow QMPEventFuncEmit to take a monitor parameter,
>> then we pass in NULL when we want to send the event to all monitors.
>>
>> Would that work?
> 
> Think so.
> 
> Alternatively, a pair of functions:
> 
>      void qapi_event_bcast_EVENT(... event params ..., Error **errp);
>      void qapi_event_send_EVENT(Monitor *mon, ... event params ..., Error **errp);
> 
> Slightly more compact in the broadcast case, might be a bit easier to
> read.

Also, fewer NULL arguments to add into existing call sites (although 
existing call sites would change to use the _bcast_ form, so you're 
already touching all call sites, which brings back the topic from the 
other mail on dropping useless errp while at it).

> 
> 
> [*] In the case of COMMAND_DROPPED, the compatibility break is dropping
> commands (the event's trigger), caused by the command queuing feature.
> That's why command queuing has to be opt-in.  Details discussed
> elsewhere in this thread.

Indeed, and I thought the conclusion there was that we DO promise that 
COMMAND_DROPPED (and dropped commands in general) won't happen unless 
you negotiate oob at initial connection.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] your mail
  2018-06-28  8:31     ` Markus Armbruster
@ 2018-06-28 11:51       ` Eric Blake
  2018-06-28 12:00       ` Daniel P. Berrangé
  1 sibling, 0 replies; 48+ messages in thread
From: Eric Blake @ 2018-06-28 11:51 UTC (permalink / raw)
  To: Markus Armbruster, Peter Xu; +Cc: qemu-devel

On 06/28/2018 03:31 AM, Markus Armbruster wrote:
> Peter Xu <peterx@redhat.com> writes:
> 
>> On Tue, Jun 26, 2018 at 07:21:49PM +0200, Markus Armbruster wrote:
>>> I fooled around a bit, and I think there are a few lose ends.
> [...]
>>> Talking to a QMP monitor that supports OOB:
>>>
>>>      $ socat UNIX:test-qmp READLINE,history=$HOME/.qmp_history,prompt='QMP> '
>>>      {"QMP": {"version": {"qemu": {"micro": 50, "minor": 12, "major": 2}, "package": "v2.12.0-1703-gb909799463"}, "capabilities": ["oob"]}}
>>>      QMP> { "execute": "qmp_capabilities", "arguments": { "oob": true } }
>>>      {"error": {"class": "GenericError", "desc": "Parameter 'oob' is unexpected"}}
>>>      QMP> { "execute": "qmp_capabilities", "arguments": { "enable": ["oob"] } }
>>>      {"return": {}}
>>>      QMP> { "execute": "query-qmp-schema" }
>>>      {"error": {"class": "GenericError", "desc": "Out-Of-Band capability requires that every command contains an 'id' field"}}
>>>
>>> Why does every command require 'id'?
> 
> Why am I asking?  Requiring 'id' is rather onerous when you play with
> QMP by hand.
> 

> Possible solutions other than requiring "id":
> 
> 1. Make out-of-band responses recognizable
> 
>     Just like we copy "id" to the response, we could copy "run-oob", or
>     maybe "control".
> 
>     Solves the "match response to command" problem, doesn't solve the
>     "match COMMAND_DROPPED event to command" problem.
> 
>     Permit me a digression.  "control": { "run-oob": true } is awfully
>     verbose.  Back when we hashed out the OOB design, I proposed to use
>     "exec-oob" instead of "execute" to run a command OOB.  I missed when
>     that morphed into flag "run-oob" wrapped in object "control".  Was it
>     in response to reviewer feedback?  Can you give me a pointer?

It's not too late to change back.  Since OOB is still new to 3.0, we 
could indeed go with a shorter invocation of 'exec-oob' (and I'd 
actually be in favor of such a change).

> 
>     The counterpart to "exec-oob" would be "return-oob" and "error-oob".
>     Hmm.

In other words, the change in the keyword of the response will let you 
know that it is replying to the most recent oob command.  This works 
well if we guarantee that we never have more than one unprocessed oob 
command in the pipeline at a time; but fails if oob commands can be 
threaded against one another and start replying in a different order 
than original submission.  Or, we could state that if you use 
'exec-oob', then 'id' is mandatory (and 'id' in the 'return' or 'error' 
is then sufficient to tie it back); but when using plain 'execute', 'id' 
remains optional.

> 
> 2. Do nothing; it's the client's problem
> 
>     If the client needs to match responses and events to commands, it
>     should use the feature that was expressly designed for helping with
>     that: "id".
> 
>     Note that QMP's initial design assumed asynchronous commands,
>     i.e. respones that may arrive in any order.  Nevertheless, it did not
>     require "id".  Same reasoning: if the client needs to match, it can
>     use "id".
> 
> As so often when "do nothing" is a viable solution, it looks mighty
> attractive to me :)

Indeed, although the documentation should still be explicit on 
recommending the use of 'id' when oob is enabled.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] your mail
  2018-06-28  8:31     ` Markus Armbruster
  2018-06-28 11:51       ` Eric Blake
@ 2018-06-28 12:00       ` Daniel P. Berrangé
  2018-06-29  9:57         ` Peter Xu
  1 sibling, 1 reply; 48+ messages in thread
From: Daniel P. Berrangé @ 2018-06-28 12:00 UTC (permalink / raw)
  To: Markus Armbruster; +Cc: Peter Xu, qemu-devel

On Thu, Jun 28, 2018 at 10:31:42AM +0200, Markus Armbruster wrote:
> Peter Xu <peterx@redhat.com> writes:
> 
> > On Tue, Jun 26, 2018 at 07:21:49PM +0200, Markus Armbruster wrote:
> >> I fooled around a bit, and I think there are a few lose ends.
> [...]
> >> Talking to a QMP monitor that supports OOB:
> >> 
> >>     $ socat UNIX:test-qmp READLINE,history=$HOME/.qmp_history,prompt='QMP> '
> >>     {"QMP": {"version": {"qemu": {"micro": 50, "minor": 12, "major": 2}, "package": "v2.12.0-1703-gb909799463"}, "capabilities": ["oob"]}}
> >>     QMP> { "execute": "qmp_capabilities", "arguments": { "oob": true } }
> >>     {"error": {"class": "GenericError", "desc": "Parameter 'oob' is unexpected"}}
> >>     QMP> { "execute": "qmp_capabilities", "arguments": { "enable": ["oob"] } }
> >>     {"return": {}}
> >>     QMP> { "execute": "query-qmp-schema" }
> >>     {"error": {"class": "GenericError", "desc": "Out-Of-Band capability requires that every command contains an 'id' field"}}
> >> 
> >> Why does every command require 'id'?
> 
> Why am I asking?  Requiring 'id' is rather onerous when you play with
> QMP by hand.
> 
> > The COMMAND_DROPPED event is one reason, as you mentioned in the other
> > thread. Meanwhile as long as we have OOB command it means that we can
> > have more than one commands running in parallel, then it's a must that
> > we can mark that out in the response to mark out which command we are
> > replying to.
> 
> Without OOB, the client can match response to command, because it'll
> receive responses in command issue order.
> 
> Example 1: after sending
> 
>     { "execute": "cmd1", ... }
>     { "execute": "cmd2", ... }
>     { "execute": "cmd3", ... }
> 
> the client will receive three responses, first one is cmd1's, second one
> is cmd2's, third one is cmd3's.
> 
> With OOB, we have to independent command streams, in-band and
> out-of-band.  Each separate stream's responses arrive in-order, but the
> two streams may be interleaved.
> 
> Example 2: after sending
> 
>     { "execute": "cmd1", ... }
>     { "execute": "cmd2", ... }
>     { "execute": "cmd3", ... }
>     { "execute": "oob1", "control": { "run-oob": true }, ... }
>     { "execute": "oob2", "control": { "run-oob": true }, ... }
>     { "execute": "oob3", "control": { "run-oob": true }, ... }
> 
> the client will receive six responses: cmd1's before cmd2's before
> cmd3's, and oob1's before oob2's before oob3's.

I thought the intention was that oob1, oob2, and oob3 are defined
to return in any order. It just happens that we only hve a single
oob processing stream right now, but we may have more in future.

> If the client sends "id" with each command, it can match responses to
> commands by id.
> 
> But that's not the only way to match.  Imagine the client had a perfect
> oracle to tell it whether a response is in-band or out-of-band.  Then
> matching can work pretty much as in example 1: the first in-band
> response is cmd1's, the second one is cmd2's, and the third one is
> cmd3's; the first out-of-band response is oob1's, the second one is
> oob2's and the third one is oob3's.

Not if oob1/2/3 can retunr in any order in future, which I think we
should be allowing fore.

IMHO 'id' should be mandatory for oob usage.

> Possible solutions other than requiring "id":
> 
> 1. Make out-of-band responses recognizable
> 
>    Just like we copy "id" to the response, we could copy "run-oob", or
>    maybe "control".
> 
>    Solves the "match response to command" problem, doesn't solve the
>    "match COMMAND_DROPPED event to command" problem.
> 
>    Permit me a digression.  "control": { "run-oob": true } is awfully
>    verbose.  Back when we hashed out the OOB design, I proposed to use
>    "exec-oob" instead of "execute" to run a command OOB.  I missed when
>    that morphed into flag "run-oob" wrapped in object "control".  Was it
>    in response to reviewer feedback?  Can you give me a pointer?
> 
>    The counterpart to "exec-oob" would be "return-oob" and "error-oob".
>    Hmm.

I do prefer the less verbose proposal too.

> 2. Do nothing; it's the client's problem
> 
>    If the client needs to match responses and events to commands, it
>    should use the feature that was expressly designed for helping with
>    that: "id".
> 
>    Note that QMP's initial design assumed asynchronous commands,
>    i.e. respones that may arrive in any order.  Nevertheless, it did not
>    require "id".  Same reasoning: if the client needs to match, it can
>    use "id".

IMHO we should just mandate 'id' when using "exec-oob", as it is
conceptually broken to use OOB without a way to match responses.
We don't want clients relying on all OOB replies coming in sequence
now, and then breaking when we allow multiple OOB processing threads.
That would require us to add a eec-oob-no-really-i-mean-it command
to allow overlapping OOB responses later.

> 
> As so often when "do nothing" is a viable solution, it looks mighty
> attractive to me :)
> 
> [...]
> 

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] monitor: enable OOB by default
  2018-06-27 13:28         ` Daniel P. Berrangé
@ 2018-06-28 13:02           ` Markus Armbruster
  0 siblings, 0 replies; 48+ messages in thread
From: Markus Armbruster @ 2018-06-28 13:02 UTC (permalink / raw)
  To: Daniel P. Berrangé; +Cc: Markus Armbruster, qemu-devel, Peter Xu

Daniel P. Berrangé <berrange@redhat.com> writes:

> On Wed, Jun 27, 2018 at 03:13:57PM +0200, Markus Armbruster wrote:
>> Monitor behavior changes even when the client rejects capability "oob".
>> 
>> Traditionally, the monitor reads, executes and responds to one command
>> after the other.  If the client sends in-band commands faster than the
>> server can execute them, the kernel will eventually refuse to buffer
>> more, and sending blocks or fails with EAGAIN.
>> 
>> To make OOB possible, we need to read and queue commands as we receive
>> them.  If the client sends in-band commands faster than the server can
>> execute them, the server will eventually drop commands to limit the
>> queue length.  The sever sends event COMMAND_DROPPED then.
>> 
>> However, we get the new behavior even when the client rejects capability
>> "oob".  We get the traditional behavior only when the server doesn't
>> offer "oob".
>> 
>> Is this what we want?
>
> IMHO the key benefit of allowing the client to reject the capability
> is to enable backwards compat support. So this behaviour feels wrong.
> Rejecting OOB should have same semantics as previous QEMU's without
> OOB available, otherwise we now have 3 distinct ways the monitor
> operates  (no OOB, OOB rejected, OOB accepted). This can only ever
> lead to more bugs due to lack of testing of no OOB vs OOB rejected
> scenarios.

Agreed.

We have three configuration cases

* OOB not offered (because MUX)

* OOB offered, but rejected by client

* OOB offered and accepted

We want to map them to two run time cases

* OOB off

* OOB on

We may use "server offered OOB" only for configuration purposes.

Keep that in mind when reading my reply to Peter's reply.

Aside: it would be nice to get rid of the configuration case "OOB not
offered", but that's for later.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] monitor: enable OOB by default
  2018-06-27 13:34         ` Peter Xu
@ 2018-06-28 13:20           ` Markus Armbruster
  2018-06-29  9:01             ` Peter Xu
  0 siblings, 1 reply; 48+ messages in thread
From: Markus Armbruster @ 2018-06-28 13:20 UTC (permalink / raw)
  To: Peter Xu; +Cc: qemu-devel, Daniel P. Berrangé

Peter Xu <peterx@redhat.com> writes:

> On Wed, Jun 27, 2018 at 03:13:57PM +0200, Markus Armbruster wrote:
>> Monitor behavior changes even when the client rejects capability "oob".
>> 
>> Traditionally, the monitor reads, executes and responds to one command
>> after the other.  If the client sends in-band commands faster than the
>> server can execute them, the kernel will eventually refuse to buffer
>> more, and sending blocks or fails with EAGAIN.
>> 
>> To make OOB possible, we need to read and queue commands as we receive
>> them.  If the client sends in-band commands faster than the server can
>> execute them, the server will eventually drop commands to limit the
>> queue length.  The sever sends event COMMAND_DROPPED then.
>> 
>> However, we get the new behavior even when the client rejects capability
>> "oob".  We get the traditional behavior only when the server doesn't
>> offer "oob".
>> 
>> Is this what we want?
>
> Not really.  But I thought we kept that old behavior, haven't we?
>
> In handle_qmp_command() we have this:
>
>     /*
>      * If OOB is not enabled on the current monitor, we'll emulate the
>      * old behavior that we won't process the current monitor any more
>      * until it has responded.  This helps make sure that as long as
>      * OOB is not enabled, the server will never drop any command.
>      */
>     if (!qmp_oob_enabled(mon)) {
>         monitor_suspend(mon);
>         req_obj->need_resume = true;
>     } else {
>         /* Drop the request if queue is full. */
>         if (mon->qmp.qmp_requests->length >= QMP_REQ_QUEUE_LEN_MAX) {
>             qemu_mutex_unlock(&mon->qmp.qmp_queue_lock);
>             qapi_event_send_command_dropped(id,
>                                             COMMAND_DROP_REASON_QUEUE_FULL,
>                                             &error_abort);
>             qmp_request_free(req_obj);
>             return;
>         }
>     }
>
> Here assuming that we are not enabling OOB, then since we will suspend
> the monitor when we receive one command, then monitor_can_read() later
> will check with a result that "we should not read the chardev", then
> it'll leave all the data in the input buffer.  Then AFAIU the QMP
> client that didn't enable OOB will have similar behavior as before.
>
> Also, we will never receive a COMMAND_DROPPED event in that legacy
> mode as well since it's in the "else" block.   Am I right?

Hmm.  Did I get confused?  Let me look again.

Some places check qmp_oob_enabled(), which is true when the server
offered capability "oob" and the client accepted.  In terms of my reply
to Daniel, it distinguishes the two run time cases "OOB off" and "OOB
on".

Other places check ->use_io_thr, which is true when

    (flags & MONITOR_USE_CONTROL) && !CHARDEV_IS_MUX(chr)

in monitor_init().

One of these places is get_qmp_greeting().  It makes the server offer
"oob" when ->use_io_thr.  Makes sense.

In terms of my reply to Daniel, ->use_io_thr distinguishes between "OOB
not offered" and ("OOB offered, but rejected by client" or "OOB offered
and accepted").

Uses could to violate 'may use "server offered OOB" only for
configuration purposes', so let's check them:

* monitor_json_emitter()

  If mon->use_io_thr, push to response queue.  Else emit directly.

  I'm afraid run time behavior differs for "OOB not offered" (emit
  directly) and "OOB offered by rejected" (queue).

* qmp_caps_check()

  If !mon->use_io_thr, reject client's acceptance of "oob".  Implicitly
  recomputing the set of offered capabilities here is less than elegant,
  but it's not wrong.

* monitor_resume()

  If mon->use_io_thr(), kick the monitor I/O thread.

  Again, different behavior for the two variations of "OOB off".

* get_qmp_greeting()

  Covered above, looks good to me.

* monitor_qmp_setup_handlers_bh()

  If mon->use_io_thr(), pass the monitor I/O thread's AIOContext to
  qemu_chr_fe_set_handlers(), else pass NULL.

  Again, different behavior for the two variations of "OOB off".

* monitor_init()

  If mon->use_io_thr, set up bottom half
  monitor_qmp_setup_handlers_bh(), else call qemu_chr_fe_set_handlers()
  directly.

  Again, different behavior for the two variations of "OOB off".

Either I am confused now, or the two variations of "OOB off" execute
different code at run time.  Which is it?

If it's different code, are the differences observable for the client?

Observable or not, I suspect the differences should not exist.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] monitor: enable OOB by default
  2018-06-28  7:04               ` Markus Armbruster
@ 2018-06-29  7:20                 ` Peter Xu
  0 siblings, 0 replies; 48+ messages in thread
From: Peter Xu @ 2018-06-29  7:20 UTC (permalink / raw)
  To: Markus Armbruster; +Cc: Eric Blake, qemu-devel

On Thu, Jun 28, 2018 at 09:04:19AM +0200, Markus Armbruster wrote:
> Eric Blake <eblake@redhat.com> writes:
> 
> > On 06/27/2018 07:07 AM, Peter Xu wrote:
> >
> >>>> Worse than that - broadcasting to all monitors is categorically broken.
> >>>> Different monitors make use the same "id" formatting scheme, so if you
> >>>> broadcast COMMAND_DROPPED to a different monitor you might have clashing
> >>>> "id" and thus incorrectly tell a client its command was dropped when in
> >>>> fact it was processed. You'd have to be fairly unlucky in timing, but
> >>>> it could happen.
> >>>
> >>> Right.  Must fix bug.
> >>
> >
> >>
> >> My current plan is that I can touch up scripts/qapi/events.py and
> >> related stuff to allow QMPEventFuncEmit to take a monitor parameter,
> >> then we pass in NULL when we want to send the event to all monitors.
> >>
> >> Would that work?
> >
> > Makes sense to me. Also, right now, ALL callers of qapi_event_send_*
> > pass &error_abort as their final parameter. If you're refactoring
> > everything anyways, you could get rid of that parameter on the
> > presumption that it doesn't buy us anything.
> 
> Let me try to turn presumption into fact :)
> 
> The generated qapi_event_send_FOO() can detect the following error
> conditions:
> 
> * Visitor fails.  But the QObject output visitor can't actually fail.
> 
>   Since the errp parameter is part of the abstract visitor interface, we
>   can't eliminate it.  We can pass &error_abort.
> 
> * emit() function fails.  Can't happen, either.
> 
>   Let's eliminate the unused errp parameter of QMPEventFuncEmit.

Hmm, I think I got your point.

> 
>   If we can eliminate the somewhat silly indirection through
>   qmp_event_get_func_emit(), even better.

It confused me before on why we had that after all. Let me give it a
shot.

Regards,

-- 
Peter Xu

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] monitor: enable OOB by default
  2018-06-28  6:55             ` Markus Armbruster
  2018-06-28 11:43               ` Eric Blake
@ 2018-06-29  8:18               ` Peter Xu
  1 sibling, 0 replies; 48+ messages in thread
From: Peter Xu @ 2018-06-29  8:18 UTC (permalink / raw)
  To: Markus Armbruster; +Cc: qemu-devel

On Thu, Jun 28, 2018 at 08:55:48AM +0200, Markus Armbruster wrote:
> Peter Xu <peterx@redhat.com> writes:
> 
> > On Wed, Jun 27, 2018 at 01:23:07PM +0200, Markus Armbruster wrote:
> >> Daniel P. Berrangé <berrange@redhat.com> writes:
> >> 
> >> > On Wed, Jun 27, 2018 at 10:41:38AM +0200, Markus Armbruster wrote:
> >> >> Markus Armbruster <armbru@redhat.com> writes:
> >> >> 
> >> >> > Markus Armbruster <armbru@redhat.com> writes:
> >> >> >
> >> >> >> I fooled around a bit, and I think there are a few lose ends.
> >> >> > [...]
> >> >> >> Talking to a QMP monitor that supports OOB:
> >> >> >>
> >> >> >>     $ socat UNIX:test-qmp READLINE,history=$HOME/.qmp_history,prompt='QMP> '
> >> >> >>     {"QMP": {"version": {"qemu": {"micro": 50, "minor": 12, "major": 2}, "package": "v2.12.0-1703-gb909799463"}, "capabilities": ["oob"]}}
> >> >> >>     QMP> { "execute": "qmp_capabilities", "arguments": { "oob": true } }
> >> >> >>     {"error": {"class": "GenericError", "desc": "Parameter 'oob' is unexpected"}}
> >> >> >>     QMP> { "execute": "qmp_capabilities", "arguments": { "enable": ["oob"] } }
> >> >> >>     {"return": {}}
> >> >> >>     QMP> { "execute": "query-qmp-schema" }
> >> >> >>     {"error": {"class": "GenericError", "desc": "Out-Of-Band capability requires that every command contains an 'id' field"}}
> >> >> >>
> >> >> >> Why does every command require 'id'?
> >> >> >
> >> >> > I found one reason: event COMMAND_DROPPED wants it.  Any other reason?
> >> >> >
> >> >> > [...]
> >> >> 
> >> >> Apropos COMMAND_DROPPED: we send an event rather than an error response
> >> >> because we may send it out-of-order.  Makes sense.
> >> >> 
> >> >> However, broadcasting it to all monitors doesn't make sense.  We could
> >> >> use a way to send an event to just one monitor.
> >
> > True.
> >
> > (Sorry for the late responses; I was on Linuxcon China in the past few
> > days)
> 
> No need to be sorry :)
> 
> >> >
> >> > Worse than that - broadcasting to all monitors is categorically broken.
> >> > Different monitors make use the same "id" formatting scheme, so if you
> >> > broadcast COMMAND_DROPPED to a different monitor you might have clashing
> >> > "id" and thus incorrectly tell a client its command was dropped when in
> >> > fact it was processed. You'd have to be fairly unlucky in timing, but
> >> > it could happen.
> >> 
> >> Right.  Must fix bug.
> >
> > Even more true.
> >
> >> 
> >> I'm glad I went over this one more time, and in public!
> >
> > I had a glance at current qmp-spec, it seems that we don't have any
> > restriction currently on "we must send events to all the monitors".
> > Does it mean that we should be free to have per-monitor events
> > starting from this event?
> 
> Changing an existing event from broadcast to unicast is an observable
> change in existing behavior.  Compatibility break unless we can show
> nobody can use / uses the observation.
> 
> Creating a new event is not a compatibility break by itself[*],
> regardless of broadcast vs. unicast.

Yeah.

Though this change will be a bit unique in that basically we should
not break anyone but fix potential problems that the clients might
encounter.

Firstly I really suspect anyone has really started to use OOB
commands...  But let's just assume there is.

Then if a client has implemented the COMMAND_DROPPED event, our change
should still keep that event be there when any of the command is
dropped for that monitor.  What we really changed in the behavior is
that we won't send incorrect COMMAND_DROPPED events to them which
should not belong to them.  From that point of view, it's not really
the spec/api that is changed, we just try to fix a bug from a broken
implementation (from me) of the spec. :)

So if you would agree I'd prefer to directly change that bit to change
COMMAND_DROPPED as per-monitor event.  It should only benefit existing
clients (from not receiving broken events) rather than breaking any of
them.

> 
> > My current plan is that I can touch up scripts/qapi/events.py and
> > related stuff to allow QMPEventFuncEmit to take a monitor parameter,
> > then we pass in NULL when we want to send the event to all monitors.
> >
> > Would that work?
> 
> Think so.
> 
> Alternatively, a pair of functions:
> 
>     void qapi_event_bcast_EVENT(... event params ..., Error **errp);
>     void qapi_event_send_EVENT(Monitor *mon, ... event params ..., Error **errp);
> 
> Slightly more compact in the broadcast case, might be a bit easier to
> read.

Yeah I can try this.

> 
> 
> [*] In the case of COMMAND_DROPPED, the compatibility break is dropping
> commands (the event's trigger), caused by the command queuing feature.
> That's why command queuing has to be opt-in.  Details discussed
> elsewhere in this thread.

Thanks,

-- 
Peter Xu

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] monitor: enable OOB by default
  2018-06-28 13:20           ` Markus Armbruster
@ 2018-06-29  9:01             ` Peter Xu
  2018-07-18 15:08               ` Markus Armbruster
  0 siblings, 1 reply; 48+ messages in thread
From: Peter Xu @ 2018-06-29  9:01 UTC (permalink / raw)
  To: Markus Armbruster; +Cc: qemu-devel, Daniel P. Berrangé

On Thu, Jun 28, 2018 at 03:20:41PM +0200, Markus Armbruster wrote:
> Peter Xu <peterx@redhat.com> writes:
> 
> > On Wed, Jun 27, 2018 at 03:13:57PM +0200, Markus Armbruster wrote:
> >> Monitor behavior changes even when the client rejects capability "oob".
> >> 
> >> Traditionally, the monitor reads, executes and responds to one command
> >> after the other.  If the client sends in-band commands faster than the
> >> server can execute them, the kernel will eventually refuse to buffer
> >> more, and sending blocks or fails with EAGAIN.
> >> 
> >> To make OOB possible, we need to read and queue commands as we receive
> >> them.  If the client sends in-band commands faster than the server can
> >> execute them, the server will eventually drop commands to limit the
> >> queue length.  The sever sends event COMMAND_DROPPED then.
> >> 
> >> However, we get the new behavior even when the client rejects capability
> >> "oob".  We get the traditional behavior only when the server doesn't
> >> offer "oob".
> >> 
> >> Is this what we want?
> >
> > Not really.  But I thought we kept that old behavior, haven't we?
> >
> > In handle_qmp_command() we have this:
> >
> >     /*
> >      * If OOB is not enabled on the current monitor, we'll emulate the
> >      * old behavior that we won't process the current monitor any more
> >      * until it has responded.  This helps make sure that as long as
> >      * OOB is not enabled, the server will never drop any command.
> >      */
> >     if (!qmp_oob_enabled(mon)) {
> >         monitor_suspend(mon);
> >         req_obj->need_resume = true;
> >     } else {
> >         /* Drop the request if queue is full. */
> >         if (mon->qmp.qmp_requests->length >= QMP_REQ_QUEUE_LEN_MAX) {
> >             qemu_mutex_unlock(&mon->qmp.qmp_queue_lock);
> >             qapi_event_send_command_dropped(id,
> >                                             COMMAND_DROP_REASON_QUEUE_FULL,
> >                                             &error_abort);
> >             qmp_request_free(req_obj);
> >             return;
> >         }
> >     }
> >
> > Here assuming that we are not enabling OOB, then since we will suspend
> > the monitor when we receive one command, then monitor_can_read() later
> > will check with a result that "we should not read the chardev", then
> > it'll leave all the data in the input buffer.  Then AFAIU the QMP
> > client that didn't enable OOB will have similar behavior as before.
> >
> > Also, we will never receive a COMMAND_DROPPED event in that legacy
> > mode as well since it's in the "else" block.   Am I right?
> 
> Hmm.  Did I get confused?  Let me look again.
> 
> Some places check qmp_oob_enabled(), which is true when the server
> offered capability "oob" and the client accepted.  In terms of my reply
> to Daniel, it distinguishes the two run time cases "OOB off" and "OOB
> on".

Exactly.

> 
> Other places check ->use_io_thr, which is true when
> 
>     (flags & MONITOR_USE_CONTROL) && !CHARDEV_IS_MUX(chr)
> 
> in monitor_init().
> 
> One of these places is get_qmp_greeting().  It makes the server offer
> "oob" when ->use_io_thr.  Makes sense.
> 
> In terms of my reply to Daniel, ->use_io_thr distinguishes between "OOB
> not offered" and ("OOB offered, but rejected by client" or "OOB offered
> and accepted").

Exactly.

To be more clear, let's name the three cases (as you mentioned):

(A) OOB not offered
(B) OOB offered, but rejected by client
(C) OOB offered, and accepted

Then:

(1) qmp_oob_enabled() will only be return true if (C).
(2) ->use_io_thr will only be true if (B) or (C).

> 
> Uses could to violate 'may use "server offered OOB" only for
> configuration purposes', so let's check them:
> 
> * monitor_json_emitter()
> 
>   If mon->use_io_thr, push to response queue.  Else emit directly.
> 
>   I'm afraid run time behavior differs for "OOB not offered" (emit
>   directly) and "OOB offered by rejected" (queue).

Yeah it's different, but logically it's the same.  IMHO it's only a
fast path for main thread.  In other word, I think it can still work
correctly if we remove that else clause.  In that sense, it's not
really a "true" behavior change.

> 
> * qmp_caps_check()
> 
>   If !mon->use_io_thr, reject client's acceptance of "oob".  Implicitly
>   recomputing the set of offered capabilities here is less than elegant,
>   but it's not wrong.
> 
> * monitor_resume()
> 
>   If mon->use_io_thr(), kick the monitor I/O thread.
> 
>   Again, different behavior for the two variations of "OOB off".

This is another example like above - IMHO we can just kick it no
matter what, but we just don't need to do that for main thread (since
we are in main thread so we are sure it is not asleep).  It's just
another trivial enhancement on top of the main logic.

> 
> * get_qmp_greeting()
> 
>   Covered above, looks good to me.
> 
> * monitor_qmp_setup_handlers_bh()
> 
>   If mon->use_io_thr(), pass the monitor I/O thread's AIOContext to
>   qemu_chr_fe_set_handlers(), else pass NULL.
> 
>   Again, different behavior for the two variations of "OOB off".

This is different indeed, but IMHO it's not really "behavior
difference", it's just the context (thread to run the code) that is
different.  The major code path for case (A) and (B) (which are the
two "off" cases) should be the same (when I say "major", I excluded
those trivial enhancements that I mentioned).

> 
> * monitor_init()
> 
>   If mon->use_io_thr, set up bottom half
>   monitor_qmp_setup_handlers_bh(), else call qemu_chr_fe_set_handlers()
>   directly.

This is indeed a bit tricky (to avoid a potential race that Stefan has
pointed out), however after things are setup it'll be exactly the same
code path for both OFF cases.

And actually when I read the code I noticed that we can actually
simplify the code with this:

diff --git a/monitor.c b/monitor.c
index 0730a27172..5d6bff8d51 100644
--- a/monitor.c
+++ b/monitor.c
@@ -4635,18 +4635,14 @@ static void monitor_qmp_setup_handlers_bh(void *opaque)
     Monitor *mon = opaque;
     GMainContext *context;

-    if (mon->use_io_thr) {
-        /*
-         * When use_io_thr is set, we use the global shared dedicated
-         * IO thread for this monitor to handle input/output.
-         */
-        context = monitor_get_io_context();
-        /* We should have inited globals before reaching here. */
-        assert(context);
-    } else {
-        /* The default main loop, which is the main thread */
-        context = NULL;
-    }
+    assert(mon->use_io_thr);
+    /*
+     * When use_io_thr is set, we use the global shared dedicated
+     * IO thread for this monitor to handle input/output.
+     */
+    context = monitor_get_io_context();
+    /* We should have inited globals before reaching here. */
+    assert(context);

     qemu_chr_fe_set_handlers(&mon->chr, monitor_can_read, monitor_qmp_read,
                              monitor_qmp_event, NULL, mon, context, true);

Since monitor_qmp_setup_handlers_bh() will only be used for IOThread
case.  I can post a patch for that.

> 
>   Again, different behavior for the two variations of "OOB off".
> 
> Either I am confused now, or the two variations of "OOB off" execute
> different code at run time.  Which is it?

As mentioned, they should be running the same code at run time.
Though with some trivial differences, but if you see above most of
them should only be small enhancements which can actually be removed.

> 
> If it's different code, are the differences observable for the client?

AFAIU since the code path should mostly be the same for the two OOF
cases, IMHO there should have no difference observable from the
client, otherwise it's buggy and I'd be willing to fix it.

> 
> Observable or not, I suspect the differences should not exist.

We can remove those "trivial enhancements" if we want to make sure the
code paths are exactly the same, but I'm not sure whether that's
really what we need (or we just need to make sure it's unobservable).

Thanks!

-- 
Peter Xu

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] monitor: enable OOB by default
  2018-06-28  9:29         ` Markus Armbruster
@ 2018-06-29  9:42           ` Peter Xu
  0 siblings, 0 replies; 48+ messages in thread
From: Peter Xu @ 2018-06-29  9:42 UTC (permalink / raw)
  To: Markus Armbruster; +Cc: qemu-devel

On Thu, Jun 28, 2018 at 11:29:30AM +0200, Markus Armbruster wrote:
> Peter Xu <peterx@redhat.com> writes:
> 
> > On Wed, Jun 27, 2018 at 10:35:15AM +0200, Markus Armbruster wrote:
> >> Markus Armbruster <armbru@redhat.com> writes:
> >> 
> >> > Another lose end: event COMMAND_DROPPED seems to lack test coverage.
> >> 
> >> Hmm, dropping commands serves to limit the request queue.  What limits
> >> the response queue?
> >
> > As long as we have a request queue limitation, that'll somehow be
> > "part of" the limitation of response queue.  Since the real responses
> > (let's not consider the events first) should be no more than the
> > maximum QMP requests that we allow in the request queue (one response
> > for one request).  In that sense it seems fine to me.
> 
> "Normal" flow of the request through the server (QEMU):
> 
>     receive
>  -> handle_qmp_command()
> 
>  -> request queue: mon->qmp.qmp_requests
> 
>  -> monitor_qmp_bh_dispatcher()
>  -> monitor_qmp_dispatch_one()
>  -> monitor_qmp_respond()
>  -> monitor_json_emitter()
> 
>  -> response queue: mon->qmp.qmp_requests
> 
>  -> monitor_qmp_response_pop_one()
>  -> monitor_qmp_bh_responder()
>  -> monitor_json_emitter_raw()
>  -> monitor_puts()
> 
>  -> output buffer: mon->outbuf
> 
>  -> monitor_flush_locked()
> 
> The purpose of event COMMAND_DROPPED is flow control notification: when
> the client sends more than we're willing to buffer, we drop the excess
> and notify the client.
> 
> If the client sends too many requests too quickly, the request queue
> fills up, and flow control kicks in.  Good.
> 
> As long as the client sends requests at a moderate pace, the request
> queue never fills up: the dispatch & execute code sitting between
> request queue and response queue drains it just fine.
> 
> The response queue proper also doesn't fill up: the emitter code sitting
> between response queue and output buffer drains it just fine.
> 
> However, output buffer can still grow without bounds!
> monitor_flush_locked() requires the client to keep up to make progress.
> Our flow control fails then.
> 
> Extreme case: a (misbehaving!) client that keeps sending requests at a
> moderate pace while not reading any responses.  output buffer grows
> without bounds.
> 
> Less extreme case: a client sends a small number of requests quickly,
> then reads responses very slowly or not at all for some reason, say
> because the network goes bad right at this time.  Here, the size of the
> request queue does limit the size of output buffer, as you proposed, but
> the size multiplier isn't really known.
> 
> My point is: the idea "limiting the request queue also limits the
> response queue + output buffer" isn't entirely wrong, but it's not
> entirely right, either.

Good point!  I obviously overlooked this.

> 
> Can we improve flow control to cover the complete flow, not just the
> flow into the request queue?

How about we simply apply the queue length to the response queue as
well?  Here's the steps:

(1) A new CommandDropReason called COMMAND_DROP_REASON_RESPONSE_FULL,
    showing that one command is dropped since response queue of the
    monitor is full.

(2) When handle QMP command, we not only check request queue, but also
    response queue.  If response queue length is bigger than
    QMP_REQ_QUEUE_LEN_MAX, then we drop the command with the reason
    COMMAND_DROP_REASON_RESPONSE_FULL.

We can do more fine-grained flow control in the future, though I hope
this would work for us as the first step.

> 
> >> Before OOB, the monitor read at most one command, and wrote its response
> >> with monitor_puts().
> >> 
> >> For input, this leaves queueing to the kernel: if the client sends
> >> commands faster than the server can execute them, eventually the kernel
> >> refuses to buffer more, and the client's send either blocks or fails
> >> with EAGAIN.
> >> 
> >> Output is a mess.  monitor_puts() uses an output buffer.  It tries to
> >> flush at newline.  Issues:
> >> 
> >> * If flushing runs into a partial write, the unwritten remainder remains
> >>   in the output buffer until the next newline.  That newline may take
> >>   its own sweet time to arrive.
> 
> Hmm, it's also flushed via mon->out_watch.  If that works how I guess it
> does, there's no deadlock.
> 
> >>                                  Could even lead to deadlocks, where a
> >>   client awaits complete output before it sends more input.  Bug,
> >>   predates OOB, doesn't block this series.
> >
> > True.  Though I noticed that we have a "hackish" line in
> > monitor_json_emitter_raw():
> >
> >     qstring_append_chr(json, '\n');
> >
> > So it seems that at least we should never encounter a deadlock, after
> > all there will always be a newline there. But I'd say I agree with you
> > on that it's at least not that "beautiful". :-)
> 
> The newline ensures responses arrive on their own line.  JSON doesn't
> care about lines (makes sense), and qmp-spec.txt doesn't care, either
> (that's wrong, in my opinion).  Anyone playing with QMP by hand will
> definitely care.  Even QMP clients might.
> 
> The newline also triggers the actual write(), because monitor_puts() is
> line-buffered.  That buffering makes sense for HMP, but it's useless for
> QMP.  Let's not worry about that right now.

Yeah.

> 
> The flushing, however, is not guaranteed to write anything!  If
> qemu_chr_fe_write() fails with EAGAIN, mon->outbuf remains unchanged.

Ouch!  I know that watch, but I didn't notice that it's actually not
following the general semantics of "flush".  Though it's for sure
understandable since we don't want to hang-death the main thread, but
the name "flush" might be misleading.

> 
> >> * If the client fails to read, the output buffer can grow without bound.
> >>   Not a security issue; the client is trusted.  Just bad workmanship.
> >
> > True.
> >
> >> 
> >> OOB doesn't change this for monitors running in the main thread.  Only
> >> mux chardevs run there.
> >> 
> >> Aside: keeping special case code around just for mux is a really bad
> >> idea.  We need to get rid of it.
> >
> > We should be running the same code path even for MUX-ed typed, right?
> > Do you mean to put MUX-ed typed handling onto iothreads as well when
> > you say "get rid of it"?
> 
> I figure I'll cover this in my reply to Daniel.  If not, I'll reply to
> this one again.
> 
> >> For monitors running in an I/O thread, we add another buffer: the
> >> response queue.  It's drained by monitor_qmp_bh_responder().  I guess
> >> that means the response queue is effectively bounded by timely draining.
> >> Correct?
> >
> > I don't see a timely manner to flush it, but as long as we queue
> > anything (including events) onto the response queue, we'll poke the
> > bottom half (in monitor_json_emitter() we call qemu_bh_schedule()) so
> > we'll possibly drain the queue very soon, and there should be no
> > chance to have a stale message in that queue.
> 
> As long as bottom halves work, the response queue remains small.  That's
> okay.
> 
> >> Buffering twice seems silly, but that could be addressed in follow-up
> >> patches.
> >
> > Do you mean that we can write the response immediately into
> > Monitor.outbuf, then only flush it in iothread?  IMHO that's fine -
> > after all, the response queue, as mentioned above, should have a
> > natural restriction as well due to the request queue, then we won't
> > waste too much resources for that.  Meanwhile using a queue with QMP
> > response objects seems to be a bit more cleaner to me from design pov
> > (though I might be wrong).
> 
> Again, let's not worry about this right now.

Thanks,

-- 
Peter Xu

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] your mail
  2018-06-28 12:00       ` Daniel P. Berrangé
@ 2018-06-29  9:57         ` Peter Xu
  2018-06-29 15:40           ` Eric Blake
  0 siblings, 1 reply; 48+ messages in thread
From: Peter Xu @ 2018-06-29  9:57 UTC (permalink / raw)
  To: Daniel P. Berrangé; +Cc: Markus Armbruster, qemu-devel

On Thu, Jun 28, 2018 at 01:00:44PM +0100, Daniel P. Berrangé wrote:
> On Thu, Jun 28, 2018 at 10:31:42AM +0200, Markus Armbruster wrote:
> > Peter Xu <peterx@redhat.com> writes:
> > 
> > > On Tue, Jun 26, 2018 at 07:21:49PM +0200, Markus Armbruster wrote:
> > >> I fooled around a bit, and I think there are a few lose ends.
> > [...]
> > >> Talking to a QMP monitor that supports OOB:
> > >> 
> > >>     $ socat UNIX:test-qmp READLINE,history=$HOME/.qmp_history,prompt='QMP> '
> > >>     {"QMP": {"version": {"qemu": {"micro": 50, "minor": 12, "major": 2}, "package": "v2.12.0-1703-gb909799463"}, "capabilities": ["oob"]}}
> > >>     QMP> { "execute": "qmp_capabilities", "arguments": { "oob": true } }
> > >>     {"error": {"class": "GenericError", "desc": "Parameter 'oob' is unexpected"}}
> > >>     QMP> { "execute": "qmp_capabilities", "arguments": { "enable": ["oob"] } }
> > >>     {"return": {}}
> > >>     QMP> { "execute": "query-qmp-schema" }
> > >>     {"error": {"class": "GenericError", "desc": "Out-Of-Band capability requires that every command contains an 'id' field"}}
> > >> 
> > >> Why does every command require 'id'?
> > 
> > Why am I asking?  Requiring 'id' is rather onerous when you play with
> > QMP by hand.
> > 
> > > The COMMAND_DROPPED event is one reason, as you mentioned in the other
> > > thread. Meanwhile as long as we have OOB command it means that we can
> > > have more than one commands running in parallel, then it's a must that
> > > we can mark that out in the response to mark out which command we are
> > > replying to.
> > 
> > Without OOB, the client can match response to command, because it'll
> > receive responses in command issue order.
> > 
> > Example 1: after sending
> > 
> >     { "execute": "cmd1", ... }
> >     { "execute": "cmd2", ... }
> >     { "execute": "cmd3", ... }
> > 
> > the client will receive three responses, first one is cmd1's, second one
> > is cmd2's, third one is cmd3's.
> > 
> > With OOB, we have to independent command streams, in-band and
> > out-of-band.  Each separate stream's responses arrive in-order, but the
> > two streams may be interleaved.
> > 
> > Example 2: after sending
> > 
> >     { "execute": "cmd1", ... }
> >     { "execute": "cmd2", ... }
> >     { "execute": "cmd3", ... }
> >     { "execute": "oob1", "control": { "run-oob": true }, ... }
> >     { "execute": "oob2", "control": { "run-oob": true }, ... }
> >     { "execute": "oob3", "control": { "run-oob": true }, ... }
> > 
> > the client will receive six responses: cmd1's before cmd2's before
> > cmd3's, and oob1's before oob2's before oob3's.
> 
> I thought the intention was that oob1, oob2, and oob3 are defined
> to return in any order. It just happens that we only hve a single
> oob processing stream right now, but we may have more in future.

Exactly.

> 
> > If the client sends "id" with each command, it can match responses to
> > commands by id.
> > 
> > But that's not the only way to match.  Imagine the client had a perfect
> > oracle to tell it whether a response is in-band or out-of-band.  Then
> > matching can work pretty much as in example 1: the first in-band
> > response is cmd1's, the second one is cmd2's, and the third one is
> > cmd3's; the first out-of-band response is oob1's, the second one is
> > oob2's and the third one is oob3's.
> 
> Not if oob1/2/3 can retunr in any order in future, which I think we
> should be allowing fore.
> 
> IMHO 'id' should be mandatory for oob usage.
> 
> > Possible solutions other than requiring "id":
> > 
> > 1. Make out-of-band responses recognizable
> > 
> >    Just like we copy "id" to the response, we could copy "run-oob", or
> >    maybe "control".
> > 
> >    Solves the "match response to command" problem, doesn't solve the
> >    "match COMMAND_DROPPED event to command" problem.
> > 
> >    Permit me a digression.  "control": { "run-oob": true } is awfully
> >    verbose.  Back when we hashed out the OOB design, I proposed to use
> >    "exec-oob" instead of "execute" to run a command OOB.  I missed when
> >    that morphed into flag "run-oob" wrapped in object "control".  Was it
> >    in response to reviewer feedback?  Can you give me a pointer?

It's me who proposed this, not from any reviewer's feedback.  It just
happened that no one was against it.

> > 
> >    The counterpart to "exec-oob" would be "return-oob" and "error-oob".
> >    Hmm.
> 
> I do prefer the less verbose proposal too.

But frankly speaking I still prefer current way.  QMP is majorly for
clients (computer programs) rather than users to use it.  Comparing to
verbosity, I would care more about coherency for how we define the
interface.  Say, OOB execution is still one way to execute a command,
so naturally it should still be using the same way that we execute a
QMP command, we just need some extra fields to tell the server that
"this message is more urgent than the other ones".

If we are discussing HMP interfaces, I'll have the same preference
with both of you to consider more about whether it's user-friendly or
not.  But now we are talking about QMP, then I'll prefer "control".

In the future, it's also possible to add some more sub-fields into the
"control" field.  When that happens, do we want to introduce another
standalone wording for that?  I would assume the answer is no.

> 
> > 2. Do nothing; it's the client's problem
> > 
> >    If the client needs to match responses and events to commands, it
> >    should use the feature that was expressly designed for helping with
> >    that: "id".
> > 
> >    Note that QMP's initial design assumed asynchronous commands,
> >    i.e. respones that may arrive in any order.  Nevertheless, it did not
> >    require "id".  Same reasoning: if the client needs to match, it can
> >    use "id".
> 
> IMHO we should just mandate 'id' when using "exec-oob", as it is
> conceptually broken to use OOB without a way to match responses.
> We don't want clients relying on all OOB replies coming in sequence
> now, and then breaking when we allow multiple OOB processing threads.
> That would require us to add a eec-oob-no-really-i-mean-it command
> to allow overlapping OOB responses later.

Agreed, again.

Thanks,

-- 
Peter Xu

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] your mail
  2018-06-29  9:57         ` Peter Xu
@ 2018-06-29 15:40           ` Eric Blake
  0 siblings, 0 replies; 48+ messages in thread
From: Eric Blake @ 2018-06-29 15:40 UTC (permalink / raw)
  To: Peter Xu, Daniel P. Berrangé; +Cc: Markus Armbruster, qemu-devel

On 06/29/2018 04:57 AM, Peter Xu wrote:

>>>     Permit me a digression.  "control": { "run-oob": true } is awfully
>>>     verbose.  Back when we hashed out the OOB design, I proposed to use
>>>     "exec-oob" instead of "execute" to run a command OOB.  I missed when
>>>     that morphed into flag "run-oob" wrapped in object "control".  Was it
>>>     in response to reviewer feedback?  Can you give me a pointer?
> 
> It's me who proposed this, not from any reviewer's feedback.  It just
> happened that no one was against it.

Only because I didn't notice the difference, and was helping clear the 
QAPI backlog before the release.  You've now had both the maintainer and 
the backup express a desire for the shorter form.

> 
>>>
>>>     The counterpart to "exec-oob" would be "return-oob" and "error-oob".
>>>     Hmm.
>>
>> I do prefer the less verbose proposal too.
> 
> But frankly speaking I still prefer current way.  QMP is majorly for
> clients (computer programs) rather than users to use it.  Comparing to
> verbosity, I would care more about coherency for how we define the
> interface.  Say, OOB execution is still one way to execute a command,
> so naturally it should still be using the same way that we execute a
> QMP command, we just need some extra fields to tell the server that
> "this message is more urgent than the other ones".

Code-wise, it's actually simpler for libvirt to write:

if (oob) {
     virJSONValueObjectCreate(&cmd, "s:exec-oob", cmdname, ...);
} else {
     virJSONValueObjectCreate(&cmd, "s:execute", cmdname, ...);
}

that it is to write:

virJSONValueObjectCreate(&cmd, "s:execute", cmdname, ...);
if (oob) {
     virJSONValuePtr control;
     virJSONValueObjectCreate(&control, "b:run-oob", true, NULL);
     virJSONValueObjectAppend(&cmd, "control", control);
}

(plus error-checking that I omitted).

In short, adding extra fields is MORE work than just using the command 
name AS the additional bit of information.

> 
> If we are discussing HMP interfaces, I'll have the same preference
> with both of you to consider more about whether it's user-friendly or
> not.  But now we are talking about QMP, then I'll prefer "control".

If you don't want to write the patch, then probably Markus or I will.

> 
> In the future, it's also possible to add some more sub-fields into the
> "control" field.  When that happens, do we want to introduce another
> standalone wording for that?  I would assume the answer is no.

We may add a "control" field at that time, and may even offer 
convenience syntax to allow "exec-oob" to behave as if a "control" field 
were passed.  But the addition of new control features is rare, so I'd 
rather deal with that when we have something to actually add, rather 
than making us suffer with unneeded verbosity for potentially years 
waiting for that next control field to materialize.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] [PATCH v5 0/7] monitor: enable OOB by default
  2018-06-20  7:32 [Qemu-devel] [PATCH v5 0/7] monitor: enable OOB by default Peter Xu
                   ` (7 preceding siblings ...)
  2018-06-26 17:21 ` [Qemu-devel] (no subject) Markus Armbruster
@ 2018-06-30 16:26 ` Markus Armbruster
  8 siblings, 0 replies; 48+ messages in thread
From: Markus Armbruster @ 2018-06-30 16:26 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, Kevin Wolf, Peter Maydell, Thomas Huth, Fam Zheng,
	Eric Auger, John Snow, Max Reitz, Christian Borntraeger,
	Dr . David Alan Gilbert, Stefan Hajnoczi, Marc-André Lureau,
	Markus Armbruster

I'm going to merge the fixes and cleanups [PATCH 1-5].  We need a few
more fixes before we can enable OOB [PATCH 6-7].  I'm working on some.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] monitor: enable OOB by default
  2018-06-26 17:21 ` [Qemu-devel] (no subject) Markus Armbruster
                     ` (2 preceding siblings ...)
  2018-06-27 11:59   ` [Qemu-devel] your mail Peter Xu
@ 2018-07-02  5:43   ` Markus Armbruster
  2018-07-04  5:44     ` Peter Xu
  3 siblings, 1 reply; 48+ messages in thread
From: Markus Armbruster @ 2018-07-02  5:43 UTC (permalink / raw)
  To: Peter Xu; +Cc: qemu-devel

More lose ends:

* scripts/qmp/ doesn't support OOB, yet.  qmp-shell.py in particular

* test-qmp-cmds neglects to cover the OOB additions to qmp-dispatch.c

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] monitor: enable OOB by default
  2018-07-02  5:43   ` [Qemu-devel] monitor: enable OOB by default Markus Armbruster
@ 2018-07-04  5:44     ` Peter Xu
  2018-07-04  7:03       ` Markus Armbruster
  0 siblings, 1 reply; 48+ messages in thread
From: Peter Xu @ 2018-07-04  5:44 UTC (permalink / raw)
  To: Markus Armbruster; +Cc: qemu-devel

On Mon, Jul 02, 2018 at 07:43:06AM +0200, Markus Armbruster wrote:
> More lose ends:
> 
> * scripts/qmp/ doesn't support OOB, yet.  qmp-shell.py in particular
> 
> * test-qmp-cmds neglects to cover the OOB additions to qmp-dispatch.c

Would you mind I put these aside for now?

I'm afraid things grow then we lose control, so not sure whether I can
just focus on the bugs first (e.g., the COMMAND_DROP event
broadcasting, and also the response queue flow control issue).

Thanks,

-- 
Peter Xu

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] monitor: enable OOB by default
  2018-07-04  5:44     ` Peter Xu
@ 2018-07-04  7:03       ` Markus Armbruster
  0 siblings, 0 replies; 48+ messages in thread
From: Markus Armbruster @ 2018-07-04  7:03 UTC (permalink / raw)
  To: Peter Xu; +Cc: qemu-devel

Peter Xu <peterx@redhat.com> writes:

> On Mon, Jul 02, 2018 at 07:43:06AM +0200, Markus Armbruster wrote:
>> More lose ends:
>> 
>> * scripts/qmp/ doesn't support OOB, yet.  qmp-shell.py in particular
>> 
>> * test-qmp-cmds neglects to cover the OOB additions to qmp-dispatch.c
>
> Would you mind I put these aside for now?
>
> I'm afraid things grow then we lose control, so not sure whether I can
> just focus on the bugs first (e.g., the COMMAND_DROP event
> broadcasting, and also the response queue flow control issue).

I agree we need to focus and prioritize.

Any lose ends we can't fix right away we should document with suitable
TODO or FIXME comments.

We can also try to spread the load onto more shoulders.  Perhaps
qmp-shell could be updated by someone who actually uses it.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] monitor: enable OOB by default
  2018-06-29  9:01             ` Peter Xu
@ 2018-07-18 15:08               ` Markus Armbruster
  2018-07-19 13:00                 ` Peter Xu
  0 siblings, 1 reply; 48+ messages in thread
From: Markus Armbruster @ 2018-07-18 15:08 UTC (permalink / raw)
  To: Peter Xu; +Cc: qemu-devel, Marc-André Lureau

Marc-André, one question for you inline.  Search for your name.

Peter Xu <peterx@redhat.com> writes:

> On Thu, Jun 28, 2018 at 03:20:41PM +0200, Markus Armbruster wrote:
>> Peter Xu <peterx@redhat.com> writes:
>> 
>> > On Wed, Jun 27, 2018 at 03:13:57PM +0200, Markus Armbruster wrote:
>> >> Monitor behavior changes even when the client rejects capability "oob".
>> >> 
>> >> Traditionally, the monitor reads, executes and responds to one command
>> >> after the other.  If the client sends in-band commands faster than the
>> >> server can execute them, the kernel will eventually refuse to buffer
>> >> more, and sending blocks or fails with EAGAIN.
>> >> 
>> >> To make OOB possible, we need to read and queue commands as we receive
>> >> them.  If the client sends in-band commands faster than the server can
>> >> execute them, the server will eventually drop commands to limit the
>> >> queue length.  The sever sends event COMMAND_DROPPED then.
>> >> 
>> >> However, we get the new behavior even when the client rejects capability
>> >> "oob".  We get the traditional behavior only when the server doesn't
>> >> offer "oob".
>> >> 
>> >> Is this what we want?
>> >
>> > Not really.  But I thought we kept that old behavior, haven't we?
>> >
>> > In handle_qmp_command() we have this:
>> >
>> >     /*
>> >      * If OOB is not enabled on the current monitor, we'll emulate the
>> >      * old behavior that we won't process the current monitor any more
>> >      * until it has responded.  This helps make sure that as long as
>> >      * OOB is not enabled, the server will never drop any command.
>> >      */
>> >     if (!qmp_oob_enabled(mon)) {
>> >         monitor_suspend(mon);
>> >         req_obj->need_resume = true;
>> >     } else {
>> >         /* Drop the request if queue is full. */
>> >         if (mon->qmp.qmp_requests->length >= QMP_REQ_QUEUE_LEN_MAX) {
>> >             qemu_mutex_unlock(&mon->qmp.qmp_queue_lock);
>> >             qapi_event_send_command_dropped(id,
>> >                                             COMMAND_DROP_REASON_QUEUE_FULL,
>> >                                             &error_abort);
>> >             qmp_request_free(req_obj);
>> >             return;
>> >         }
>> >     }
>> >
>> > Here assuming that we are not enabling OOB, then since we will suspend
>> > the monitor when we receive one command, then monitor_can_read() later
>> > will check with a result that "we should not read the chardev", then
>> > it'll leave all the data in the input buffer.  Then AFAIU the QMP
>> > client that didn't enable OOB will have similar behavior as before.
>> >
>> > Also, we will never receive a COMMAND_DROPPED event in that legacy
>> > mode as well since it's in the "else" block.   Am I right?
>> 
>> Hmm.  Did I get confused?  Let me look again.
>> 
>> Some places check qmp_oob_enabled(), which is true when the server
>> offered capability "oob" and the client accepted.  In terms of my reply
>> to Daniel, it distinguishes the two run time cases "OOB off" and "OOB
>> on".
>
> Exactly.
>
>> 
>> Other places check ->use_io_thr, which is true when
>> 
>>     (flags & MONITOR_USE_CONTROL) && !CHARDEV_IS_MUX(chr)
>> 
>> in monitor_init().

->use_io_thr is now spelled ->use_io_thread.

>> One of these places is get_qmp_greeting().  It makes the server offer
>> "oob" when ->use_io_thr.  Makes sense.
>> 
>> In terms of my reply to Daniel, ->use_io_thr distinguishes between "OOB
>> not offered" and ("OOB offered, but rejected by client" or "OOB offered
>> and accepted").
>
> Exactly.
>
> To be more clear, let's name the three cases (as you mentioned):
>
> (A) OOB not offered
> (B) OOB offered, but rejected by client
> (C) OOB offered, and accepted
>
> Then:
>
> (1) qmp_oob_enabled() will only be return true if (C).
> (2) ->use_io_thr will only be true if (B) or (C).

Notes on (A) to be kept in mind for the remainder of the discussion:

* I don't expect (A) to occur in production

  (A) means the monitor has a chardev-mux backend.  chardev-mux
  multiplexes multiple character backends, e.g. multiplex a monitor and
  a console on stdio.  C-a c switches focus.  While such multiplexing
  may be convenient for humans, it's an unnecessary complication for
  machines, which could just as well use two UNIX domain sockets and not
  have to worry about focus and escape sequences.

* I do expect (A) to go away eventually

  (A) exists only because "not all the chardev frontends can run without
  main thread, or can run in multiple threads" [PATCH 6].  Hopefully,
  that'll be fixed eventually, so (A) can go away.

  Marc-André, your "[PATCH 04/12] Revert "qmp: isolate responses into io
  thread" states the "chardev write path is thread safe".  What's
  missing to make chardev-mux capable of supporting OOB?

>> Uses could to violate 'may use "server offered OOB" only for
>> configuration purposes', so let's check them:
>> 
>> * monitor_json_emitter()

This is now qmp_queue_response()

>>   If mon->use_io_thr, push to response queue.  Else emit directly.
>> 
>>   I'm afraid run time behavior differs for "OOB not offered" (emit
>>   directly) and "OOB offered by rejected" (queue).
>
> Yeah it's different, but logically it's the same.  IMHO it's only a
> fast path for main thread.  In other word, I think it can still work
> correctly if we remove that else clause.  In that sense, it's not
> really a "true" behavior change.

Remember that (A) should not occur in production, and should go away
eventually.  Maintaining a fast path just for that feels unjustified.

>> * qmp_caps_check()
>> 
>>   If !mon->use_io_thr, reject client's acceptance of "oob".  Implicitly
>>   recomputing the set of offered capabilities here is less than elegant,
>>   but it's not wrong.

This has been replaced by monitor_qmp_caps_reset().

If mon->use_io_thread, offer OOB.  Makes sense.

>> * monitor_resume()
>> 
>>   If mon->use_io_thr(), kick the monitor I/O thread.
>> 
>>   Again, different behavior for the two variations of "OOB off".
>
> This is another example like above - IMHO we can just kick it no
> matter what, but we just don't need to do that for main thread (since
> we are in main thread so we are sure it is not asleep).  It's just
> another trivial enhancement on top of the main logic.

Again, maintaining a special case just for (A) feels unjustified.

>> * get_qmp_greeting()
>> 
>>   Covered above, looks good to me.

Also replaced by monitor_qmp_caps_reset().

>> * monitor_qmp_setup_handlers_bh()
>> 
>>   If mon->use_io_thr(), pass the monitor I/O thread's AIOContext to
>>   qemu_chr_fe_set_handlers(), else pass NULL.
>> 
>>   Again, different behavior for the two variations of "OOB off".
>
> This is different indeed, but IMHO it's not really "behavior
> difference", it's just the context (thread to run the code) that is
> different.  The major code path for case (A) and (B) (which are the
> two "off" cases) should be the same (when I say "major", I excluded
> those trivial enhancements that I mentioned).

As far as I can tell, monitor_qmp_setup_handlers_bh() can run only if
scheduled by monitor_init() when mon->use_io_thread.  If that's true,
then the !mon->use_io_thread case is unreachable.  Ah, I see you've also
noticed that, and you're proposing a patch below.

>> * monitor_init()
>> 
>>   If mon->use_io_thr, set up bottom half
>>   monitor_qmp_setup_handlers_bh(), else call qemu_chr_fe_set_handlers()
>>   directly.
>
> This is indeed a bit tricky (to avoid a potential race that Stefan has
> pointed out), however after things are setup it'll be exactly the same
> code path for both OFF cases.
>
> And actually when I read the code I noticed that we can actually
> simplify the code with this:
>
> diff --git a/monitor.c b/monitor.c
> index 0730a27172..5d6bff8d51 100644
> --- a/monitor.c
> +++ b/monitor.c
> @@ -4635,18 +4635,14 @@ static void monitor_qmp_setup_handlers_bh(void *opaque)
>      Monitor *mon = opaque;
>      GMainContext *context;
>
> -    if (mon->use_io_thr) {
> -        /*
> -         * When use_io_thr is set, we use the global shared dedicated
> -         * IO thread for this monitor to handle input/output.
> -         */
> -        context = monitor_get_io_context();
> -        /* We should have inited globals before reaching here. */
> -        assert(context);
> -    } else {
> -        /* The default main loop, which is the main thread */
> -        context = NULL;
> -    }
> +    assert(mon->use_io_thr);
> +    /*
> +     * When use_io_thr is set, we use the global shared dedicated
> +     * IO thread for this monitor to handle input/output.
> +     */
> +    context = monitor_get_io_context();
> +    /* We should have inited globals before reaching here. */
> +    assert(context);
>
>      qemu_chr_fe_set_handlers(&mon->chr, monitor_can_read, monitor_qmp_read,
>                               monitor_qmp_event, NULL, mon, context, true);
>
> Since monitor_qmp_setup_handlers_bh() will only be used for IOThread
> case.  I can post a patch for that.

Yes, please.

>> 
>>   Again, different behavior for the two variations of "OOB off".

The difference is

* timing: right away for (A) vs. bottom half for (B) and (C)

* context argument for qemu_chr_fe_set_handlers(): NULL, i.e. main loop
  thread for (A), monitor_get_io_context(), i.e. the I/O thread for (B)
  and (C)

Okay, I think.

>> Either I am confused now, or the two variations of "OOB off" execute
>> different code at run time.  Which is it?
>
> As mentioned, they should be running the same code at run time.
> Though with some trivial differences, but if you see above most of
> them should only be small enhancements which can actually be removed.

Please do.

>> If it's different code, are the differences observable for the client?
>
> AFAIU since the code path should mostly be the same for the two OOF
> cases, IMHO there should have no difference observable from the
> client, otherwise it's buggy and I'd be willing to fix it.
>
>> 
>> Observable or not, I suspect the differences should not exist.
>
> We can remove those "trivial enhancements" if we want to make sure the
> code paths are exactly the same, but I'm not sure whether that's
> really what we need (or we just need to make sure it's unobservable).

As long as we're running separate code for (A), "unobservable
difference" is a proposition we need to prove.

Reducing separate code should simplify the proof.

Eliminiating it will simplify it maximally :)

> Thanks!

Thank *you* for persevering :)

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] monitor: enable OOB by default
  2018-07-18 15:08               ` Markus Armbruster
@ 2018-07-19 13:00                 ` Peter Xu
  0 siblings, 0 replies; 48+ messages in thread
From: Peter Xu @ 2018-07-19 13:00 UTC (permalink / raw)
  To: Markus Armbruster; +Cc: qemu-devel, Marc-André Lureau

On Wed, Jul 18, 2018 at 05:08:21PM +0200, Markus Armbruster wrote:
> Marc-André, one question for you inline.  Search for your name.
> 
> Peter Xu <peterx@redhat.com> writes:
> 
> > On Thu, Jun 28, 2018 at 03:20:41PM +0200, Markus Armbruster wrote:
> >> Peter Xu <peterx@redhat.com> writes:
> >> 
> >> > On Wed, Jun 27, 2018 at 03:13:57PM +0200, Markus Armbruster wrote:
> >> >> Monitor behavior changes even when the client rejects capability "oob".
> >> >> 
> >> >> Traditionally, the monitor reads, executes and responds to one command
> >> >> after the other.  If the client sends in-band commands faster than the
> >> >> server can execute them, the kernel will eventually refuse to buffer
> >> >> more, and sending blocks or fails with EAGAIN.
> >> >> 
> >> >> To make OOB possible, we need to read and queue commands as we receive
> >> >> them.  If the client sends in-band commands faster than the server can
> >> >> execute them, the server will eventually drop commands to limit the
> >> >> queue length.  The sever sends event COMMAND_DROPPED then.
> >> >> 
> >> >> However, we get the new behavior even when the client rejects capability
> >> >> "oob".  We get the traditional behavior only when the server doesn't
> >> >> offer "oob".
> >> >> 
> >> >> Is this what we want?
> >> >
> >> > Not really.  But I thought we kept that old behavior, haven't we?
> >> >
> >> > In handle_qmp_command() we have this:
> >> >
> >> >     /*
> >> >      * If OOB is not enabled on the current monitor, we'll emulate the
> >> >      * old behavior that we won't process the current monitor any more
> >> >      * until it has responded.  This helps make sure that as long as
> >> >      * OOB is not enabled, the server will never drop any command.
> >> >      */
> >> >     if (!qmp_oob_enabled(mon)) {
> >> >         monitor_suspend(mon);
> >> >         req_obj->need_resume = true;
> >> >     } else {
> >> >         /* Drop the request if queue is full. */
> >> >         if (mon->qmp.qmp_requests->length >= QMP_REQ_QUEUE_LEN_MAX) {
> >> >             qemu_mutex_unlock(&mon->qmp.qmp_queue_lock);
> >> >             qapi_event_send_command_dropped(id,
> >> >                                             COMMAND_DROP_REASON_QUEUE_FULL,
> >> >                                             &error_abort);
> >> >             qmp_request_free(req_obj);
> >> >             return;
> >> >         }
> >> >     }
> >> >
> >> > Here assuming that we are not enabling OOB, then since we will suspend
> >> > the monitor when we receive one command, then monitor_can_read() later
> >> > will check with a result that "we should not read the chardev", then
> >> > it'll leave all the data in the input buffer.  Then AFAIU the QMP
> >> > client that didn't enable OOB will have similar behavior as before.
> >> >
> >> > Also, we will never receive a COMMAND_DROPPED event in that legacy
> >> > mode as well since it's in the "else" block.   Am I right?
> >> 
> >> Hmm.  Did I get confused?  Let me look again.
> >> 
> >> Some places check qmp_oob_enabled(), which is true when the server
> >> offered capability "oob" and the client accepted.  In terms of my reply
> >> to Daniel, it distinguishes the two run time cases "OOB off" and "OOB
> >> on".
> >
> > Exactly.
> >
> >> 
> >> Other places check ->use_io_thr, which is true when
> >> 
> >>     (flags & MONITOR_USE_CONTROL) && !CHARDEV_IS_MUX(chr)
> >> 
> >> in monitor_init().
> 
> ->use_io_thr is now spelled ->use_io_thread.
> 
> >> One of these places is get_qmp_greeting().  It makes the server offer
> >> "oob" when ->use_io_thr.  Makes sense.
> >> 
> >> In terms of my reply to Daniel, ->use_io_thr distinguishes between "OOB
> >> not offered" and ("OOB offered, but rejected by client" or "OOB offered
> >> and accepted").
> >
> > Exactly.
> >
> > To be more clear, let's name the three cases (as you mentioned):
> >
> > (A) OOB not offered
> > (B) OOB offered, but rejected by client
> > (C) OOB offered, and accepted
> >
> > Then:
> >
> > (1) qmp_oob_enabled() will only be return true if (C).
> > (2) ->use_io_thr will only be true if (B) or (C).
> 
> Notes on (A) to be kept in mind for the remainder of the discussion:
> 
> * I don't expect (A) to occur in production
> 
>   (A) means the monitor has a chardev-mux backend.  chardev-mux
>   multiplexes multiple character backends, e.g. multiplex a monitor and
>   a console on stdio.  C-a c switches focus.  While such multiplexing
>   may be convenient for humans, it's an unnecessary complication for
>   machines, which could just as well use two UNIX domain sockets and not
>   have to worry about focus and escape sequences.
> 
> * I do expect (A) to go away eventually
> 
>   (A) exists only because "not all the chardev frontends can run without
>   main thread, or can run in multiple threads" [PATCH 6].  Hopefully,
>   that'll be fixed eventually, so (A) can go away.
> 
>   Marc-André, your "[PATCH 04/12] Revert "qmp: isolate responses into io
>   thread" states the "chardev write path is thread safe".  What's
>   missing to make chardev-mux capable of supporting OOB?
> 
> >> Uses could to violate 'may use "server offered OOB" only for
> >> configuration purposes', so let's check them:
> >> 
> >> * monitor_json_emitter()
> 
> This is now qmp_queue_response()
> 
> >>   If mon->use_io_thr, push to response queue.  Else emit directly.
> >> 
> >>   I'm afraid run time behavior differs for "OOB not offered" (emit
> >>   directly) and "OOB offered by rejected" (queue).
> >
> > Yeah it's different, but logically it's the same.  IMHO it's only a
> > fast path for main thread.  In other word, I think it can still work
> > correctly if we remove that else clause.  In that sense, it's not
> > really a "true" behavior change.
> 
> Remember that (A) should not occur in production, and should go away
> eventually.  Maintaining a fast path just for that feels unjustified.
> 
> >> * qmp_caps_check()
> >> 
> >>   If !mon->use_io_thr, reject client's acceptance of "oob".  Implicitly
> >>   recomputing the set of offered capabilities here is less than elegant,
> >>   but it's not wrong.
> 
> This has been replaced by monitor_qmp_caps_reset().
> 
> If mon->use_io_thread, offer OOB.  Makes sense.
> 
> >> * monitor_resume()
> >> 
> >>   If mon->use_io_thr(), kick the monitor I/O thread.
> >> 
> >>   Again, different behavior for the two variations of "OOB off".
> >
> > This is another example like above - IMHO we can just kick it no
> > matter what, but we just don't need to do that for main thread (since
> > we are in main thread so we are sure it is not asleep).  It's just
> > another trivial enhancement on top of the main logic.
> 
> Again, maintaining a special case just for (A) feels unjustified.
> 
> >> * get_qmp_greeting()
> >> 
> >>   Covered above, looks good to me.
> 
> Also replaced by monitor_qmp_caps_reset().
> 
> >> * monitor_qmp_setup_handlers_bh()
> >> 
> >>   If mon->use_io_thr(), pass the monitor I/O thread's AIOContext to
> >>   qemu_chr_fe_set_handlers(), else pass NULL.
> >> 
> >>   Again, different behavior for the two variations of "OOB off".
> >
> > This is different indeed, but IMHO it's not really "behavior
> > difference", it's just the context (thread to run the code) that is
> > different.  The major code path for case (A) and (B) (which are the
> > two "off" cases) should be the same (when I say "major", I excluded
> > those trivial enhancements that I mentioned).
> 
> As far as I can tell, monitor_qmp_setup_handlers_bh() can run only if
> scheduled by monitor_init() when mon->use_io_thread.  If that's true,
> then the !mon->use_io_thread case is unreachable.  Ah, I see you've also
> noticed that, and you're proposing a patch below.

Yeah, and...

> 
> >> * monitor_init()
> >> 
> >>   If mon->use_io_thr, set up bottom half
> >>   monitor_qmp_setup_handlers_bh(), else call qemu_chr_fe_set_handlers()
> >>   directly.
> >
> > This is indeed a bit tricky (to avoid a potential race that Stefan has
> > pointed out), however after things are setup it'll be exactly the same
> > code path for both OFF cases.
> >
> > And actually when I read the code I noticed that we can actually
> > simplify the code with this:
> >
> > diff --git a/monitor.c b/monitor.c
> > index 0730a27172..5d6bff8d51 100644
> > --- a/monitor.c
> > +++ b/monitor.c
> > @@ -4635,18 +4635,14 @@ static void monitor_qmp_setup_handlers_bh(void *opaque)
> >      Monitor *mon = opaque;
> >      GMainContext *context;
> >
> > -    if (mon->use_io_thr) {
> > -        /*
> > -         * When use_io_thr is set, we use the global shared dedicated
> > -         * IO thread for this monitor to handle input/output.
> > -         */
> > -        context = monitor_get_io_context();
> > -        /* We should have inited globals before reaching here. */
> > -        assert(context);
> > -    } else {
> > -        /* The default main loop, which is the main thread */
> > -        context = NULL;
> > -    }
> > +    assert(mon->use_io_thr);
> > +    /*
> > +     * When use_io_thr is set, we use the global shared dedicated
> > +     * IO thread for this monitor to handle input/output.
> > +     */
> > +    context = monitor_get_io_context();
> > +    /* We should have inited globals before reaching here. */
> > +    assert(context);
> >
> >      qemu_chr_fe_set_handlers(&mon->chr, monitor_can_read, monitor_qmp_read,
> >                               monitor_qmp_event, NULL, mon, context, true);
> >
> > Since monitor_qmp_setup_handlers_bh() will only be used for IOThread
> > case.  I can post a patch for that.
> 
> Yes, please.

... actually I have had that patch in my "turn-oob-on-default" series,
and even with your Reviewed-by. :)

  https://patchwork.kernel.org/patch/10506173/

I suppose I'll just repost it and the series after the release.

> 
> >> 
> >>   Again, different behavior for the two variations of "OOB off".
> 
> The difference is
> 
> * timing: right away for (A) vs. bottom half for (B) and (C)
> 
> * context argument for qemu_chr_fe_set_handlers(): NULL, i.e. main loop
>   thread for (A), monitor_get_io_context(), i.e. the I/O thread for (B)
>   and (C)
> 
> Okay, I think.
> 
> >> Either I am confused now, or the two variations of "OOB off" execute
> >> different code at run time.  Which is it?
> >
> > As mentioned, they should be running the same code at run time.
> > Though with some trivial differences, but if you see above most of
> > them should only be small enhancements which can actually be removed.
> 
> Please do.
> 
> >> If it's different code, are the differences observable for the client?
> >
> > AFAIU since the code path should mostly be the same for the two OOF
> > cases, IMHO there should have no difference observable from the
> > client, otherwise it's buggy and I'd be willing to fix it.
> >
> >> 
> >> Observable or not, I suspect the differences should not exist.
> >
> > We can remove those "trivial enhancements" if we want to make sure the
> > code paths are exactly the same, but I'm not sure whether that's
> > really what we need (or we just need to make sure it's unobservable).
> 
> As long as we're running separate code for (A), "unobservable
> difference" is a proposition we need to prove.
> 
> Reducing separate code should simplify the proof.
> 
> Eliminiating it will simplify it maximally :)

Ok, this I can try to do too after the release.

> 
> > Thanks!
> 
> Thank *you* for persevering :)

I should thank you for your continuous support on out-of-band (or even
before it was proposed and named by you :).

(I hope I didn't miss anything that I should comment on; let me know
 if I have)

Regards,

-- 
Peter Xu

^ permalink raw reply	[flat|nested] 48+ messages in thread

end of thread, other threads:[~2018-07-19 13:04 UTC | newest]

Thread overview: 48+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-06-20  7:32 [Qemu-devel] [PATCH v5 0/7] monitor: enable OOB by default Peter Xu
2018-06-20  7:32 ` [Qemu-devel] [PATCH v5 1/7] chardev: comment details for CLOSED event Peter Xu
2018-06-20  7:32 ` [Qemu-devel] [PATCH v5 2/7] monitor: rename *_pop_one to *_pop_any Peter Xu
2018-06-20  7:32 ` [Qemu-devel] [PATCH v5 3/7] monitor: flush qmp responses when CLOSED Peter Xu
2018-06-20  8:33   ` Markus Armbruster
2018-06-20  8:38     ` Peter Xu
2018-06-20  9:50       ` Markus Armbruster
2018-06-20  7:32 ` [Qemu-devel] [PATCH v5 4/7] tests: iotests: drop some stderr line Peter Xu
2018-06-20  8:35   ` Markus Armbruster
2018-06-20  7:32 ` [Qemu-devel] [PATCH v5 5/7] docs: mention shared state protect for OOB Peter Xu
2018-06-20  7:32 ` [Qemu-devel] [PATCH v5 6/7] monitor: remove "x-oob", turn oob on by default Peter Xu
2018-06-20  7:32 ` [Qemu-devel] [PATCH v5 7/7] Revert "tests: Add parameter to qtest_init_without_qmp_handshake" Peter Xu
2018-06-26 17:21 ` [Qemu-devel] (no subject) Markus Armbruster
2018-06-27  7:38   ` [Qemu-devel] monitor: enable OOB by default Markus Armbruster
2018-06-27  8:41     ` Markus Armbruster
2018-06-27 10:20       ` Daniel P. Berrangé
2018-06-27 11:23         ` Markus Armbruster
2018-06-27 12:07           ` Peter Xu
2018-06-27 12:37             ` Eric Blake
2018-06-28  7:04               ` Markus Armbruster
2018-06-29  7:20                 ` Peter Xu
2018-06-28  6:55             ` Markus Armbruster
2018-06-28 11:43               ` Eric Blake
2018-06-29  8:18               ` Peter Xu
2018-06-27 13:13       ` Markus Armbruster
2018-06-27 13:28         ` Daniel P. Berrangé
2018-06-28 13:02           ` Markus Armbruster
2018-06-27 13:34         ` Peter Xu
2018-06-28 13:20           ` Markus Armbruster
2018-06-29  9:01             ` Peter Xu
2018-07-18 15:08               ` Markus Armbruster
2018-07-19 13:00                 ` Peter Xu
2018-06-27  7:40   ` Markus Armbruster
2018-06-27  8:35     ` Markus Armbruster
2018-06-27 12:32       ` Peter Xu
2018-06-28  9:29         ` Markus Armbruster
2018-06-29  9:42           ` Peter Xu
2018-06-27 13:36     ` Peter Xu
2018-06-27 11:59   ` [Qemu-devel] your mail Peter Xu
2018-06-28  8:31     ` Markus Armbruster
2018-06-28 11:51       ` Eric Blake
2018-06-28 12:00       ` Daniel P. Berrangé
2018-06-29  9:57         ` Peter Xu
2018-06-29 15:40           ` Eric Blake
2018-07-02  5:43   ` [Qemu-devel] monitor: enable OOB by default Markus Armbruster
2018-07-04  5:44     ` Peter Xu
2018-07-04  7:03       ` Markus Armbruster
2018-06-30 16:26 ` [Qemu-devel] [PATCH v5 0/7] " Markus Armbruster

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.