* [PATCH for-8.2 v2 0/2] migration: Add max-switchover-bandwidth parameter
@ 2023-08-03 15:53 Peter Xu
  2023-08-03 15:53 ` [PATCH for-8.2 v2 1/2] qapi/migration: Deduplicate migration parameter field comments Peter Xu
  2023-08-03 15:53 ` [PATCH for-8.2 v2 2/2] migration: Allow user to specify migration switchover bandwidth Peter Xu
  0 siblings, 2 replies; 25+ messages in thread
From: Peter Xu @ 2023-08-03 15:53 UTC (permalink / raw)
  To: qemu-devel
  Cc: Zhiyi Guo, peterx, Daniel P . Berrangé,
	Markus Armbruster, Leonardo Bras Soares Passos, Fabiano Rosas,
	Juan Quintela, Eric Blake, Chensheng Dong

v2:
- Fix wording, reindent qapi doc [Markus]
- Add a prerequisite patch to deduplicate the docs in qapi/migration
- Rename available-bandwidth to max-switchover-bandwidth [Dan]

This is the v2 series to add the new parameter to guide migration
switchover calculations.

For more information on the new parameter and why we need it, please read
the commit message of patch 2.

Please have a look, thanks.

Peter Xu (2):
  qapi/migration: Deduplicate migration parameter field comments
  migration: Allow user to specify migration switchover bandwidth

 qapi/migration.json            | 297 +++------------------------------
 migration/migration.h          |   2 +-
 migration/options.h            |   1 +
 migration/migration-hmp-cmds.c |  14 ++
 migration/migration.c          |  19 ++-
 migration/options.c            |  28 ++++
 migration/trace-events         |   2 +-
 7 files changed, 80 insertions(+), 283 deletions(-)

-- 
2.41.0




* [PATCH for-8.2 v2 1/2] qapi/migration: Deduplicate migration parameter field comments
  2023-08-03 15:53 [PATCH for-8.2 v2 0/2] migration: Add max-switchover-bandwidth parameter Peter Xu
@ 2023-08-03 15:53 ` Peter Xu
  2023-08-04 12:28   ` Markus Armbruster
  2023-08-03 15:53 ` [PATCH for-8.2 v2 2/2] migration: Allow user to specify migration switchover bandwidth Peter Xu
  1 sibling, 1 reply; 25+ messages in thread
From: Peter Xu @ 2023-08-03 15:53 UTC (permalink / raw)
  To: qemu-devel
  Cc: Zhiyi Guo, peterx, Daniel P . Berrangé,
	Markus Armbruster, Leonardo Bras Soares Passos, Fabiano Rosas,
	Juan Quintela, Eric Blake, Chensheng Dong

We used to have three objects that have always the same list of parameters
and comments are always duplicated:

  - @MigrationParameter
  - @MigrationParameters
  - @MigrateSetParameters

Before we can deduplicate the code, it's fairly straightforward to
deduplicate the comments first, so for each time we add a new migration
parameter we don't need to copy the same paragraphs three times.

Make @MigrationParameter the main source of truth, and have the other two
refer to it.

This does cause a slight problem in the generated man/html pages: for the
latter two objects we get a list of Members, all of them saying
"Not documented":

   Members
       announce-initial: int (optional)
              Not documented

       announce-max: int (optional)
              Not documented

       announce-rounds: int (optional)
              Not documented

       [...]

However, there will be a reference telling the reader to read the
@MigrationParameter section instead, for example:

   MigrationParameters (Object)

       The object structure to represent a list of migration parameters.
       The optional members aren't actually optional.  For detailed
       explanation for each of the field, please refer to the documentation
       of MigrationParameter.

So hopefully that's not too bad, and we can improve it later.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 qapi/migration.json | 283 ++------------------------------------------
 1 file changed, 7 insertions(+), 276 deletions(-)

diff --git a/qapi/migration.json b/qapi/migration.json
index 8843e74b59..bb798f87a5 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -854,142 +854,9 @@
 ##
 # @MigrateSetParameters:
 #
-# @announce-initial: Initial delay (in milliseconds) before sending
-#     the first announce (Since 4.0)
-#
-# @announce-max: Maximum delay (in milliseconds) between packets in
-#     the announcement (Since 4.0)
-#
-# @announce-rounds: Number of self-announce packets sent after
-#     migration (Since 4.0)
-#
-# @announce-step: Increase in delay (in milliseconds) between
-#     subsequent packets in the announcement (Since 4.0)
-#
-# @compress-level: compression level
-#
-# @compress-threads: compression thread count
-#
-# @compress-wait-thread: Controls behavior when all compression
-#     threads are currently busy.  If true (default), wait for a free
-#     compression thread to become available; otherwise, send the page
-#     uncompressed.  (Since 3.1)
-#
-# @decompress-threads: decompression thread count
-#
-# @throttle-trigger-threshold: The ratio of bytes_dirty_period and
-#     bytes_xfer_period to trigger throttling.  It is expressed as
-#     percentage.  The default value is 50. (Since 5.0)
-#
-# @cpu-throttle-initial: Initial percentage of time guest cpus are
-#     throttled when migration auto-converge is activated.  The
-#     default value is 20. (Since 2.7)
-#
-# @cpu-throttle-increment: throttle percentage increase each time
-#     auto-converge detects that migration is not making progress.
-#     The default value is 10. (Since 2.7)
-#
-# @cpu-throttle-tailslow: Make CPU throttling slower at tail stage At
-#     the tail stage of throttling, the Guest is very sensitive to CPU
-#     percentage while the @cpu-throttle -increment is excessive
-#     usually at tail stage.  If this parameter is true, we will
-#     compute the ideal CPU percentage used by the Guest, which may
-#     exactly make the dirty rate match the dirty rate threshold.
-#     Then we will choose a smaller throttle increment between the one
-#     specified by @cpu-throttle-increment and the one generated by
-#     ideal CPU percentage.  Therefore, it is compatible to
-#     traditional throttling, meanwhile the throttle increment won't
-#     be excessive at tail stage.  The default value is false.  (Since
-#     5.1)
-#
-# @tls-creds: ID of the 'tls-creds' object that provides credentials
-#     for establishing a TLS connection over the migration data
-#     channel.  On the outgoing side of the migration, the credentials
-#     must be for a 'client' endpoint, while for the incoming side the
-#     credentials must be for a 'server' endpoint.  Setting this to a
-#     non-empty string enables TLS for all migrations.  An empty
-#     string means that QEMU will use plain text mode for migration,
-#     rather than TLS (Since 2.9) Previously (since 2.7), this was
-#     reported by omitting tls-creds instead.
-#
-# @tls-hostname: hostname of the target host for the migration.  This
-#     is required when using x509 based TLS credentials and the
-#     migration URI does not already include a hostname.  For example
-#     if using fd: or exec: based migration, the hostname must be
-#     provided so that the server's x509 certificate identity can be
-#     validated.  (Since 2.7) An empty string means that QEMU will use
-#     the hostname associated with the migration URI, if any.  (Since
-#     2.9) Previously (since 2.7), this was reported by omitting
-#     tls-hostname instead.
-#
-# @max-bandwidth: to set maximum speed for migration.  maximum speed
-#     in bytes per second.  (Since 2.8)
-#
-# @downtime-limit: set maximum tolerated downtime for migration.
-#     maximum downtime in milliseconds (Since 2.8)
-#
-# @x-checkpoint-delay: the delay time between two COLO checkpoints.
-#     (Since 2.8)
-#
-# @block-incremental: Affects how much storage is migrated when the
-#     block migration capability is enabled.  When false, the entire
-#     storage backing chain is migrated into a flattened image at the
-#     destination; when true, only the active qcow2 layer is migrated
-#     and the destination must already have access to the same backing
-#     chain as was used on the source.  (since 2.10)
-#
-# @multifd-channels: Number of channels used to migrate data in
-#     parallel.  This is the same number that the number of sockets
-#     used for migration.  The default value is 2 (since 4.0)
-#
-# @xbzrle-cache-size: cache size to be used by XBZRLE migration.  It
-#     needs to be a multiple of the target page size and a power of 2
-#     (Since 2.11)
-#
-# @max-postcopy-bandwidth: Background transfer bandwidth during
-#     postcopy.  Defaults to 0 (unlimited).  In bytes per second.
-#     (Since 3.0)
-#
-# @max-cpu-throttle: maximum cpu throttle percentage.  The default
-#     value is 99. (Since 3.1)
-#
-# @multifd-compression: Which compression method to use.  Defaults to
-#     none.  (Since 5.0)
-#
-# @multifd-zlib-level: Set the compression level to be used in live
-#     migration, the compression level is an integer between 0 and 9,
-#     where 0 means no compression, 1 means the best compression
-#     speed, and 9 means best compression ratio which will consume
-#     more CPU. Defaults to 1. (Since 5.0)
-#
-# @multifd-zstd-level: Set the compression level to be used in live
-#     migration, the compression level is an integer between 0 and 20,
-#     where 0 means no compression, 1 means the best compression
-#     speed, and 20 means best compression ratio which will consume
-#     more CPU. Defaults to 1. (Since 5.0)
-#
-# @block-bitmap-mapping: Maps block nodes and bitmaps on them to
-#     aliases for the purpose of dirty bitmap migration.  Such aliases
-#     may for example be the corresponding names on the opposite site.
-#     The mapping must be one-to-one, but not necessarily complete: On
-#     the source, unmapped bitmaps and all bitmaps on unmapped nodes
-#     will be ignored.  On the destination, encountering an unmapped
-#     alias in the incoming migration stream will result in a report,
-#     and all further bitmap migration data will then be discarded.
-#     Note that the destination does not know about bitmaps it does
-#     not receive, so there is no limitation or requirement regarding
-#     the number of bitmaps received, or how they are named, or on
-#     which nodes they are placed.  By default (when this parameter
-#     has never been set), bitmap names are mapped to themselves.
-#     Nodes are mapped to their block device name if there is one, and
-#     to their node name otherwise.  (Since 5.2)
-#
-# @x-vcpu-dirty-limit-period: Periodic time (in milliseconds) of dirty
-#     limit during live migration.  Should be in the range 1 to 1000ms.
-#     Defaults to 1000ms.  (Since 8.1)
-#
-# @vcpu-dirty-limit: Dirtyrate limit (MB/s) during live migration.
-#     Defaults to 1.  (Since 8.1)
+# Object structure to set migration parameters.  For detailed
+# explanation of each of the field, please refer to the documentation
+# of @MigrationParameter.
 #
 # Features:
 #
@@ -1053,146 +920,10 @@
 ##
 # @MigrationParameters:
 #
-# The optional members aren't actually optional.
-#
-# @announce-initial: Initial delay (in milliseconds) before sending
-#     the first announce (Since 4.0)
-#
-# @announce-max: Maximum delay (in milliseconds) between packets in
-#     the announcement (Since 4.0)
-#
-# @announce-rounds: Number of self-announce packets sent after
-#     migration (Since 4.0)
-#
-# @announce-step: Increase in delay (in milliseconds) between
-#     subsequent packets in the announcement (Since 4.0)
-#
-# @compress-level: compression level
-#
-# @compress-threads: compression thread count
-#
-# @compress-wait-thread: Controls behavior when all compression
-#     threads are currently busy.  If true (default), wait for a free
-#     compression thread to become available; otherwise, send the page
-#     uncompressed.  (Since 3.1)
-#
-# @decompress-threads: decompression thread count
-#
-# @throttle-trigger-threshold: The ratio of bytes_dirty_period and
-#     bytes_xfer_period to trigger throttling.  It is expressed as
-#     percentage.  The default value is 50. (Since 5.0)
-#
-# @cpu-throttle-initial: Initial percentage of time guest cpus are
-#     throttled when migration auto-converge is activated.  (Since
-#     2.7)
-#
-# @cpu-throttle-increment: throttle percentage increase each time
-#     auto-converge detects that migration is not making progress.
-#     (Since 2.7)
-#
-# @cpu-throttle-tailslow: Make CPU throttling slower at tail stage At
-#     the tail stage of throttling, the Guest is very sensitive to CPU
-#     percentage while the @cpu-throttle -increment is excessive
-#     usually at tail stage.  If this parameter is true, we will
-#     compute the ideal CPU percentage used by the Guest, which may
-#     exactly make the dirty rate match the dirty rate threshold.
-#     Then we will choose a smaller throttle increment between the one
-#     specified by @cpu-throttle-increment and the one generated by
-#     ideal CPU percentage.  Therefore, it is compatible to
-#     traditional throttling, meanwhile the throttle increment won't
-#     be excessive at tail stage.  The default value is false.  (Since
-#     5.1)
-#
-# @tls-creds: ID of the 'tls-creds' object that provides credentials
-#     for establishing a TLS connection over the migration data
-#     channel.  On the outgoing side of the migration, the credentials
-#     must be for a 'client' endpoint, while for the incoming side the
-#     credentials must be for a 'server' endpoint.  An empty string
-#     means that QEMU will use plain text mode for migration, rather
-#     than TLS (Since 2.7) Note: 2.8 reports this by omitting
-#     tls-creds instead.
-#
-# @tls-hostname: hostname of the target host for the migration.  This
-#     is required when using x509 based TLS credentials and the
-#     migration URI does not already include a hostname.  For example
-#     if using fd: or exec: based migration, the hostname must be
-#     provided so that the server's x509 certificate identity can be
-#     validated.  (Since 2.7) An empty string means that QEMU will use
-#     the hostname associated with the migration URI, if any.  (Since
-#     2.9) Note: 2.8 reports this by omitting tls-hostname instead.
-#
-# @tls-authz: ID of the 'authz' object subclass that provides access
-#     control checking of the TLS x509 certificate distinguished name.
-#     (Since 4.0)
-#
-# @max-bandwidth: to set maximum speed for migration.  maximum speed
-#     in bytes per second.  (Since 2.8)
-#
-# @downtime-limit: set maximum tolerated downtime for migration.
-#     maximum downtime in milliseconds (Since 2.8)
-#
-# @x-checkpoint-delay: the delay time between two COLO checkpoints.
-#     (Since 2.8)
-#
-# @block-incremental: Affects how much storage is migrated when the
-#     block migration capability is enabled.  When false, the entire
-#     storage backing chain is migrated into a flattened image at the
-#     destination; when true, only the active qcow2 layer is migrated
-#     and the destination must already have access to the same backing
-#     chain as was used on the source.  (since 2.10)
-#
-# @multifd-channels: Number of channels used to migrate data in
-#     parallel.  This is the same number that the number of sockets
-#     used for migration.  The default value is 2 (since 4.0)
-#
-# @xbzrle-cache-size: cache size to be used by XBZRLE migration.  It
-#     needs to be a multiple of the target page size and a power of 2
-#     (Since 2.11)
-#
-# @max-postcopy-bandwidth: Background transfer bandwidth during
-#     postcopy.  Defaults to 0 (unlimited).  In bytes per second.
-#     (Since 3.0)
-#
-# @max-cpu-throttle: maximum cpu throttle percentage.  Defaults to 99.
-#     (Since 3.1)
-#
-# @multifd-compression: Which compression method to use.  Defaults to
-#     none.  (Since 5.0)
-#
-# @multifd-zlib-level: Set the compression level to be used in live
-#     migration, the compression level is an integer between 0 and 9,
-#     where 0 means no compression, 1 means the best compression
-#     speed, and 9 means best compression ratio which will consume
-#     more CPU. Defaults to 1. (Since 5.0)
-#
-# @multifd-zstd-level: Set the compression level to be used in live
-#     migration, the compression level is an integer between 0 and 20,
-#     where 0 means no compression, 1 means the best compression
-#     speed, and 20 means best compression ratio which will consume
-#     more CPU. Defaults to 1. (Since 5.0)
-#
-# @block-bitmap-mapping: Maps block nodes and bitmaps on them to
-#     aliases for the purpose of dirty bitmap migration.  Such aliases
-#     may for example be the corresponding names on the opposite site.
-#     The mapping must be one-to-one, but not necessarily complete: On
-#     the source, unmapped bitmaps and all bitmaps on unmapped nodes
-#     will be ignored.  On the destination, encountering an unmapped
-#     alias in the incoming migration stream will result in a report,
-#     and all further bitmap migration data will then be discarded.
-#     Note that the destination does not know about bitmaps it does
-#     not receive, so there is no limitation or requirement regarding
-#     the number of bitmaps received, or how they are named, or on
-#     which nodes they are placed.  By default (when this parameter
-#     has never been set), bitmap names are mapped to themselves.
-#     Nodes are mapped to their block device name if there is one, and
-#     to their node name otherwise.  (Since 5.2)
-#
-# @x-vcpu-dirty-limit-period: Periodic time (in milliseconds) of dirty
-#     limit during live migration.  Should be in the range 1 to 1000ms.
-#     Defaults to 1000ms.  (Since 8.1)
-#
-# @vcpu-dirty-limit: Dirtyrate limit (MB/s) during live migration.
-#     Defaults to 1.  (Since 8.1)
+# The object structure to represent a list of migration parameters.
+# The optional members aren't actually optional.  For detailed
+# explanation for each of the field, please refer to the documentation
+# of @MigrationParameter.
 #
 # Features:
 #
-- 
2.41.0




* [PATCH for-8.2 v2 2/2] migration: Allow user to specify migration switchover bandwidth
  2023-08-03 15:53 [PATCH for-8.2 v2 0/2] migration: Add max-switchover-bandwidth parameter Peter Xu
  2023-08-03 15:53 ` [PATCH for-8.2 v2 1/2] qapi/migration: Deduplicate migration parameter field comments Peter Xu
@ 2023-08-03 15:53 ` Peter Xu
  2023-08-31 18:14   ` Joao Martins
                     ` (2 more replies)
  1 sibling, 3 replies; 25+ messages in thread
From: Peter Xu @ 2023-08-03 15:53 UTC (permalink / raw)
  To: qemu-devel
  Cc: Zhiyi Guo, peterx, Daniel P . Berrangé,
	Markus Armbruster, Leonardo Bras Soares Passos, Fabiano Rosas,
	Juan Quintela, Eric Blake, Chensheng Dong

Migration bandwidth is a very important value for live migration, because
it is one of the major factors in deciding when to switch over to the
destination in a precopy process.

This value is currently estimated by QEMU during the whole live migration
process by monitoring how fast we were sending the data.  In an ideal
world, where we always feed unlimited data to the migration channel so
that the send rate is limited only by the available bandwidth, this would
be the most accurate value.

However, reality may be very different.  For example, over a 10Gbps network
we can see query-migrate showing a migration bandwidth of only a few tens
of MB/s, just because there are plenty of other things the migration thread
might be doing.  The migration thread can be busy scanning zero pages, or
it can be fetching dirty bitmaps from external dirty sources (like vhost or
KVM).  It means we may not be pushing as much data as possible to the
migration channel, so the bandwidth estimated from "how much data we sent
in the channel" can sometimes be dramatically inaccurate (e.g. a few tens
of MB/s even though 10Gbps is available), and the decision to switch over
is affected by this in turn.

With such a wrong bandwidth estimate, the migration may not converge at
all within the specified downtime.

The issue is that QEMU itself may not be able to avoid those uncertainties
when measuring the real "available migration bandwidth".  At least I can't
think of a way so far.

One way to fix this: when the user is fully aware of the available
bandwidth, we can allow the user to provide an accurate value.

For example, if the user has a dedicated channel of 10Gbps for migration
for this specific VM, the user can specify this bandwidth so QEMU can
always do the calculation based on this fact, trusting the user for as long
as the value is specified.

A new parameter "max-switchover-bandwidth" is introduced just for this.
When the user specifies this parameter, instead of trusting QEMU's own
estimate (based on the QEMUFile send speed), we trust the user and use
this value to decide when to switch over, assuming that such bandwidth
will be available then.

When the user wants migration to use only 5Gbps out of that 10Gbps, one
can set max-bandwidth to 5Gbps, along with max-switchover-bandwidth set to
5Gbps, so that migration never uses more than 5Gbps (and the user keeps the
remaining 5Gbps for other things).  So the parameter can be useful even if
the network is not dedicated, as long as the user knows a solid value.

This can resolve issues such as a non-converging migration caused by a
ridiculously low detected "migration bandwidth", whatever the reason.

Reported-by: Zhiyi Guo <zhguo@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 qapi/migration.json            | 14 +++++++++++++-
 migration/migration.h          |  2 +-
 migration/options.h            |  1 +
 migration/migration-hmp-cmds.c | 14 ++++++++++++++
 migration/migration.c          | 19 +++++++++++++++----
 migration/options.c            | 28 ++++++++++++++++++++++++++++
 migration/trace-events         |  2 +-
 7 files changed, 73 insertions(+), 7 deletions(-)

diff --git a/qapi/migration.json b/qapi/migration.json
index bb798f87a5..6a04fb7d36 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -759,6 +759,16 @@
 # @max-bandwidth: to set maximum speed for migration.  maximum speed
 #     in bytes per second.  (Since 2.8)
 #
+# @max-switchover-bandwidth: to set available bandwidth for migration.
+#     By default, this value is zero, means the user is not aware of
+#     the available bandwidth that can be used by QEMU migration, so
+#     QEMU will estimate the bandwidth automatically.  This can be set
+#     when the estimated value is not accurate, while the user is able
+#     to guarantee such bandwidth is available for migration purpose
+#     during the migration procedure.  When specified correctly, this
+#     can make the switchover decision much more accurate, which will
+#     also be based on the max downtime specified.  (Since 8.2)
+#
 # @downtime-limit: set maximum tolerated downtime for migration.
 #     maximum downtime in milliseconds (Since 2.8)
 #
@@ -840,7 +850,7 @@
            'cpu-throttle-initial', 'cpu-throttle-increment',
            'cpu-throttle-tailslow',
            'tls-creds', 'tls-hostname', 'tls-authz', 'max-bandwidth',
-           'downtime-limit',
+           'max-switchover-bandwidth', 'downtime-limit',
            { 'name': 'x-checkpoint-delay', 'features': [ 'unstable' ] },
            'block-incremental',
            'multifd-channels',
@@ -885,6 +895,7 @@
             '*tls-hostname': 'StrOrNull',
             '*tls-authz': 'StrOrNull',
             '*max-bandwidth': 'size',
+            '*max-switchover-bandwidth': 'size',
             '*downtime-limit': 'uint64',
             '*x-checkpoint-delay': { 'type': 'uint32',
                                      'features': [ 'unstable' ] },
@@ -949,6 +960,7 @@
             '*tls-hostname': 'str',
             '*tls-authz': 'str',
             '*max-bandwidth': 'size',
+            '*max-switchover-bandwidth': 'size',
             '*downtime-limit': 'uint64',
             '*x-checkpoint-delay': { 'type': 'uint32',
                                      'features': [ 'unstable' ] },
diff --git a/migration/migration.h b/migration/migration.h
index 6eea18db36..f18cee27f7 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -283,7 +283,7 @@ struct MigrationState {
     /*
      * The final stage happens when the remaining data is smaller than
      * this threshold; it's calculated from the requested downtime and
-     * measured bandwidth
+     * measured bandwidth, or max-switchover-bandwidth if specified.
      */
     int64_t threshold_size;
 
diff --git a/migration/options.h b/migration/options.h
index 045e2a41a2..a510ca94c9 100644
--- a/migration/options.h
+++ b/migration/options.h
@@ -80,6 +80,7 @@ int migrate_decompress_threads(void);
 uint64_t migrate_downtime_limit(void);
 uint8_t migrate_max_cpu_throttle(void);
 uint64_t migrate_max_bandwidth(void);
+uint64_t migrate_max_switchover_bandwidth(void);
 uint64_t migrate_max_postcopy_bandwidth(void);
 int migrate_multifd_channels(void);
 MultiFDCompression migrate_multifd_compression(void);
diff --git a/migration/migration-hmp-cmds.c b/migration/migration-hmp-cmds.c
index c115ef2d23..d7572d4c0a 100644
--- a/migration/migration-hmp-cmds.c
+++ b/migration/migration-hmp-cmds.c
@@ -321,6 +321,10 @@ void hmp_info_migrate_parameters(Monitor *mon, const QDict *qdict)
         monitor_printf(mon, "%s: %" PRIu64 " bytes/second\n",
             MigrationParameter_str(MIGRATION_PARAMETER_MAX_BANDWIDTH),
             params->max_bandwidth);
+        assert(params->has_max_switchover_bandwidth);
+        monitor_printf(mon, "%s: %" PRIu64 " bytes/second\n",
+            MigrationParameter_str(MIGRATION_PARAMETER_MAX_SWITCHOVER_BANDWIDTH),
+            params->max_switchover_bandwidth);
         assert(params->has_downtime_limit);
         monitor_printf(mon, "%s: %" PRIu64 " ms\n",
             MigrationParameter_str(MIGRATION_PARAMETER_DOWNTIME_LIMIT),
@@ -574,6 +578,16 @@ void hmp_migrate_set_parameter(Monitor *mon, const QDict *qdict)
         }
         p->max_bandwidth = valuebw;
         break;
+    case MIGRATION_PARAMETER_MAX_SWITCHOVER_BANDWIDTH:
+        p->has_max_switchover_bandwidth = true;
+        ret = qemu_strtosz_MiB(valuestr, NULL, &valuebw);
+        if (ret < 0 || valuebw > INT64_MAX
+            || (size_t)valuebw != valuebw) {
+            error_setg(&err, "Invalid size %s", valuestr);
+            break;
+        }
+        p->max_switchover_bandwidth = valuebw;
+        break;
     case MIGRATION_PARAMETER_DOWNTIME_LIMIT:
         p->has_downtime_limit = true;
         visit_type_size(v, param, &p->downtime_limit, &err);
diff --git a/migration/migration.c b/migration/migration.c
index 5528acb65e..8493e3ca49 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -2684,7 +2684,7 @@ static void migration_update_counters(MigrationState *s,
 {
     uint64_t transferred, transferred_pages, time_spent;
     uint64_t current_bytes; /* bytes transferred since the beginning */
-    double bandwidth;
+    double bandwidth, avail_bw;
 
     if (current_time < s->iteration_start_time + BUFFER_DELAY) {
         return;
@@ -2694,7 +2694,17 @@ static void migration_update_counters(MigrationState *s,
     transferred = current_bytes - s->iteration_initial_bytes;
     time_spent = current_time - s->iteration_start_time;
     bandwidth = (double)transferred / time_spent;
-    s->threshold_size = bandwidth * migrate_downtime_limit();
+    if (migrate_max_switchover_bandwidth()) {
+        /*
+         * If the user specified an available bandwidth, let's trust the
+         * user so that can be more accurate than what we estimated.
+         */
+        avail_bw = migrate_max_switchover_bandwidth();
+    } else {
+        /* If the user doesn't specify bandwidth, we use the estimated */
+        avail_bw = bandwidth;
+    }
+    s->threshold_size = avail_bw * migrate_downtime_limit();
 
     s->mbps = (((double) transferred * 8.0) /
                ((double) time_spent / 1000.0)) / 1000.0 / 1000.0;
@@ -2711,7 +2721,7 @@ static void migration_update_counters(MigrationState *s,
     if (stat64_get(&mig_stats.dirty_pages_rate) &&
         transferred > 10000) {
         s->expected_downtime =
-            stat64_get(&mig_stats.dirty_bytes_last_sync) / bandwidth;
+            stat64_get(&mig_stats.dirty_bytes_last_sync) / avail_bw;
     }
 
     migration_rate_reset(s->to_dst_file);
@@ -2719,7 +2729,8 @@ static void migration_update_counters(MigrationState *s,
     update_iteration_initial_status(s);
 
     trace_migrate_transferred(transferred, time_spent,
-                              bandwidth, s->threshold_size);
+                              bandwidth, migrate_max_switchover_bandwidth(),
+                              s->threshold_size);
 }
 
 static bool migration_can_switchover(MigrationState *s)
diff --git a/migration/options.c b/migration/options.c
index 1d1e1321b0..19d87ab812 100644
--- a/migration/options.c
+++ b/migration/options.c
@@ -125,6 +125,8 @@ Property migration_properties[] = {
                       parameters.cpu_throttle_tailslow, false),
     DEFINE_PROP_SIZE("x-max-bandwidth", MigrationState,
                       parameters.max_bandwidth, MAX_THROTTLE),
+    DEFINE_PROP_SIZE("max-switchover-bandwidth", MigrationState,
+                      parameters.max_switchover_bandwidth, 0),
     DEFINE_PROP_UINT64("x-downtime-limit", MigrationState,
                       parameters.downtime_limit,
                       DEFAULT_MIGRATE_SET_DOWNTIME),
@@ -780,6 +782,13 @@ uint64_t migrate_max_bandwidth(void)
     return s->parameters.max_bandwidth;
 }
 
+uint64_t migrate_max_switchover_bandwidth(void)
+{
+    MigrationState *s = migrate_get_current();
+
+    return s->parameters.max_switchover_bandwidth;
+}
+
 uint64_t migrate_max_postcopy_bandwidth(void)
 {
     MigrationState *s = migrate_get_current();
@@ -917,6 +926,8 @@ MigrationParameters *qmp_query_migrate_parameters(Error **errp)
                                  s->parameters.tls_authz : "");
     params->has_max_bandwidth = true;
     params->max_bandwidth = s->parameters.max_bandwidth;
+    params->has_max_switchover_bandwidth = true;
+    params->max_switchover_bandwidth = s->parameters.max_switchover_bandwidth;
     params->has_downtime_limit = true;
     params->downtime_limit = s->parameters.downtime_limit;
     params->has_x_checkpoint_delay = true;
@@ -1056,6 +1067,15 @@ bool migrate_params_check(MigrationParameters *params, Error **errp)
         return false;
     }
 
+    if (params->has_max_switchover_bandwidth &&
+        (params->max_switchover_bandwidth > SIZE_MAX)) {
+        error_setg(errp, QERR_INVALID_PARAMETER_VALUE,
+                   "max_switchover_bandwidth",
+                   "an integer in the range of 0 to "stringify(SIZE_MAX)
+                   " bytes/second");
+        return false;
+    }
+
     if (params->has_downtime_limit &&
         (params->downtime_limit > MAX_MIGRATE_DOWNTIME)) {
         error_setg(errp, QERR_INVALID_PARAMETER_VALUE,
@@ -1225,6 +1245,10 @@ static void migrate_params_test_apply(MigrateSetParameters *params,
         dest->max_bandwidth = params->max_bandwidth;
     }
 
+    if (params->has_max_switchover_bandwidth) {
+        dest->max_switchover_bandwidth = params->max_switchover_bandwidth;
+    }
+
     if (params->has_downtime_limit) {
         dest->downtime_limit = params->downtime_limit;
     }
@@ -1341,6 +1365,10 @@ static void migrate_params_apply(MigrateSetParameters *params, Error **errp)
         }
     }
 
+    if (params->has_max_switchover_bandwidth) {
+        s->parameters.max_switchover_bandwidth = params->max_switchover_bandwidth;
+    }
+
     if (params->has_downtime_limit) {
         s->parameters.downtime_limit = params->downtime_limit;
     }
diff --git a/migration/trace-events b/migration/trace-events
index 4666f19325..1296b8db5b 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -185,7 +185,7 @@ source_return_path_thread_shut(uint32_t val) "0x%x"
 source_return_path_thread_resume_ack(uint32_t v) "%"PRIu32
 source_return_path_thread_switchover_acked(void) ""
 migration_thread_low_pending(uint64_t pending) "%" PRIu64
-migrate_transferred(uint64_t transferred, uint64_t time_spent, uint64_t bandwidth, uint64_t size) "transferred %" PRIu64 " time_spent %" PRIu64 " bandwidth %" PRIu64 " max_size %" PRId64
+migrate_transferred(uint64_t transferred, uint64_t time_spent, uint64_t bandwidth, uint64_t avail_bw, uint64_t size) "transferred %" PRIu64 " time_spent %" PRIu64 " bandwidth %" PRIu64 " avail_bw %" PRIu64 " max_size %" PRId64
 process_incoming_migration_co_end(int ret, int ps) "ret=%d postcopy-state=%d"
 process_incoming_migration_co_postcopy_end_main(void) ""
 postcopy_preempt_enabled(bool value) "%d"
-- 
2.41.0
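
The range check this patch adds to migrate_params_check() follows the usual "optional member plus limit" pattern. Below is a minimal standalone sketch of that pattern, not QEMU code: DemoParams, demo_params_check() and DEMO_MAX_BW are hypothetical names invented for illustration. The limit is deliberately smaller than the SIZE_MAX used in the patch, because on hosts where size_t is 64 bits wide a uint64_t value can never exceed SIZE_MAX, so the rejection path would not be observable there.

```c
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical limit for illustration only; the patch compares against SIZE_MAX. */
#define DEMO_MAX_BW (UINT64_C(1) << 40)

/* Hypothetical stand-in for a QAPI-generated struct with one optional member. */
typedef struct DemoParams {
    bool has_max_switchover_bandwidth;
    uint64_t max_switchover_bandwidth;   /* bytes/second */
} DemoParams;

/*
 * Return true when the parameters are acceptable.  Otherwise write a
 * message into err (at most errlen bytes) and return false.
 */
static bool demo_params_check(const DemoParams *p, char *err, size_t errlen)
{
    /* An absent member is never range-checked, whatever its value field holds. */
    if (p->has_max_switchover_bandwidth &&
        p->max_switchover_bandwidth > DEMO_MAX_BW) {
        snprintf(err, errlen,
                 "max_switchover_bandwidth must be in the range 0 to %" PRIu64
                 " bytes/second", (uint64_t)DEMO_MAX_BW);
        return false;
    }
    return true;
}
```

A caller would run the check before copying the value into the live state, mirroring how the patch validates in migrate_params_check() and only then assigns in migrate_params_apply().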




* Re: [PATCH for-8.2 v2 1/2] qapi/migration: Deduplicate migration parameter field comments
  2023-08-03 15:53 ` [PATCH for-8.2 v2 1/2] qapi/migration: Deduplicate migration parameter field comments Peter Xu
@ 2023-08-04 12:28   ` Markus Armbruster
  2023-08-04 13:59     ` Daniel P. Berrangé
  0 siblings, 1 reply; 25+ messages in thread
From: Markus Armbruster @ 2023-08-04 12:28 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, Zhiyi Guo, Daniel P . Berrangé,
	Leonardo Bras Soares Passos, Fabiano Rosas, Juan Quintela,
	Eric Blake, Chensheng Dong

Peter Xu <peterx@redhat.com> writes:

> We used to have three objects that have always the same list of parameters

We have!

> and comments are always duplicated:
>
>   - @MigrationParameter
>   - @MigrationParameters
>   - @MigrateSetParameters
>
> Before we can deduplicate the code, it's fairly straightforward to
> deduplicate the comments first, so for each time we add a new migration
> parameter we don't need to copy the same paragraphs three times.

De-duplicating the code would be nice, but we haven't done so in years,
which suggests it's hard enough not to be worth the trouble.

De-duplicating the documentation is certainly easier.

Is that what you're trying to say?

Our discussion of the pros and cons, which is happening in the review of v1,
should be captured in the commit message, right here.

> Make the @MigrationParameter the major source of truth, while leaving the
> rest two to reference to it.

Any particular reason for picking this one?

> We do have a slight problem in the man/html pages generated, that for the
> latter two objects we'll get a list of Members but with all of them saying
> "Not documented":
>
>    Members
>        announce-initial: int (optional)
>               Not documented
>
>        announce-max: int (optional)
>               Not documented
>
>        announce-rounds: int (optional)
>               Not documented
>
>        [...]
>
> Even though we'll have a reference there telling the reader to jump over to
> read the @MigrationParameter sections instead, for example:
>
>    MigrationParameters (Object)
>
>        The object structure to represent a list of migration parameters.
>        The optional members aren't actually optional.  For detailed
>        explanation for each of the field, please refer to the documentation
>        of MigrationParameter.
>
> So hopefully that's not too bad.. and we can leave it for later to make it
> even better.

It's plenty bad, I'm afraid.  It comes out as a short paragraph "don't
look here, look there", followed by screenfuls claiming "not
documented."  Embarrassing.  Worse, *misleading*, because the short
paragraph is easy to miss.

Also discussed in review of v1.  Let's continue there, to avoid
splitting the thread.

> Signed-off-by: Peter Xu <peterx@redhat.com>




* Re: [PATCH for-8.2 v2 1/2] qapi/migration: Deduplicate migration parameter field comments
  2023-08-04 12:28   ` Markus Armbruster
@ 2023-08-04 13:59     ` Daniel P. Berrangé
  2023-08-04 16:01       ` Peter Xu
  0 siblings, 1 reply; 25+ messages in thread
From: Daniel P. Berrangé @ 2023-08-04 13:59 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: Peter Xu, qemu-devel, Zhiyi Guo, Leonardo Bras Soares Passos,
	Fabiano Rosas, Juan Quintela, Eric Blake, Chensheng Dong

On Fri, Aug 04, 2023 at 02:28:05PM +0200, Markus Armbruster wrote:
> Peter Xu <peterx@redhat.com> writes:
> 
> > We used to have three objects that have always the same list of parameters
> 
> We have!
> 
> > and comments are always duplicated:
> >
> >   - @MigrationParameter
> >   - @MigrationParameters
> >   - @MigrateSetParameters
> >
> > Before we can deduplicate the code, it's fairly straightforward to
> > deduplicate the comments first, so for each time we add a new migration
> > parameter we don't need to copy the same paragraphs three times.
> 
> De-duplicating the code would be nice, but we haven't done so in years,
> which suggests it's hard enough not to be worth the trouble.

The "MigrationParameter" enumeration isn't actually used in
QMP at all.

It is only used in HMP for hmp_migrate_set_parameter and
hmp_info_migrate_parameters. So it is questionable documenting
that enum in the QMP reference docs at all.

1c1
< { 'struct': 'MigrationParameters',
---
> { 'struct': 'MigrateSetParameters',
14,16c14,16
<             '*tls-creds': 'str',
<             '*tls-hostname': 'str',
<             '*tls-authz': 'str',
---
>             '*tls-creds': 'StrOrNull',
>             '*tls-hostname': 'StrOrNull',
>             '*tls-authz': 'StrOrNull',

Is it not valid to use StrOrNull in both cases and thus
delete the duplication here ?

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|




* Re: [PATCH for-8.2 v2 1/2] qapi/migration: Deduplicate migration parameter field comments
  2023-08-04 13:59     ` Daniel P. Berrangé
@ 2023-08-04 16:01       ` Peter Xu
  2023-08-04 16:29         ` Daniel P. Berrangé
  0 siblings, 1 reply; 25+ messages in thread
From: Peter Xu @ 2023-08-04 16:01 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: Markus Armbruster, qemu-devel, Zhiyi Guo,
	Leonardo Bras Soares Passos, Fabiano Rosas, Juan Quintela,
	Eric Blake, Chensheng Dong

On Fri, Aug 04, 2023 at 02:59:07PM +0100, Daniel P. Berrangé wrote:
> On Fri, Aug 04, 2023 at 02:28:05PM +0200, Markus Armbruster wrote:
> > Peter Xu <peterx@redhat.com> writes:
> > 
> > > We used to have three objects that have always the same list of parameters
> > 
> > We have!
> > 
> > > and comments are always duplicated:
> > >
> > >   - @MigrationParameter
> > >   - @MigrationParameters
> > >   - @MigrateSetParameters
> > >
> > > Before we can deduplicate the code, it's fairly straightforward to
> > > deduplicate the comments first, so for each time we add a new migration
> > > parameter we don't need to copy the same paragraphs three times.
> > 
> > De-duplicating the code would be nice, but we haven't done so in years,
> > which suggests it's hard enough not to be worth the trouble.
> 
> The "MigrationParameter" enumeration isn't actually used in
> QMP at all.
> 
> It is only used in HMP for hmp_migrate_set_parameter and
> hmp_info_migrate_parameters. So it is questionable documenting
> that enum in the QMP reference docs at all.
> 
> 1c1
> < { 'struct': 'MigrationParameters',
> ---
> > { 'struct': 'MigrateSetParameters',
> 14,16c14,16
> <             '*tls-creds': 'str',
> <             '*tls-hostname': 'str',
> <             '*tls-authz': 'str',
> ---
> >             '*tls-creds': 'StrOrNull',
> >             '*tls-hostname': 'StrOrNull',
> >             '*tls-authz': 'StrOrNull',
> 
> Is it not valid to use StrOrNull in both cases and thus
> delete the duplication here ?

I tested removing MigrateSetParameters by replacing it with
MigrationParameters and it looks all fine here... I manually tested qmp/hmp
on set/query parameters, and qtests are all happy.

The only thing I see that may affect it is we used to logically allow
taking things like '"tls-authz": null' in the json input, but now we won't
allow that because we'll be asking for a string type only.

Since we have query-qmp-schema I suppose we're all fine, because logically
the mgmt app (libvirt?) will still query that to understand the protocol,
so now we'll have (response of query-qmp-schema):

        {
            "arg-type": "144",
            "meta-type": "command",
            "name": "migrate-set-parameters",
            "ret-type": "0"
        },

Where 144 can start to point to MigrationParameters, rather than
MigrateSetParameters.

Ok, then what if the mgmt app doesn't care and just used "null" in tls-*
fields when setting?  Funnily I tried it and actually anything that does
migrate-set-parameters with a "null" passed over to tls-* fields will
already crash qemu...

./migration/options.c:1333: migrate_params_apply: Assertion `params->tls_authz->type == QTYPE_QSTRING' failed.

#0  0x00007f72f4b2a844 in __pthread_kill_implementation () at /lib64/libc.so.6
#1  0x00007f72f4ad9abe in raise () at /lib64/libc.so.6
#2  0x00007f72f4ac287f in abort () at /lib64/libc.so.6
#3  0x00007f72f4ac279b in _nl_load_domain.cold () at /lib64/libc.so.6
#4  0x00007f72f4ad2147 in  () at /lib64/libc.so.6
#5  0x00005573308740e6 in migrate_params_apply (params=0x7ffc74fd09d0, errp=0x7ffc74fd0998) at ../migration/options.c:1333
#6  0x0000557330874591 in qmp_migrate_set_parameters (params=0x7ffc74fd09d0, errp=0x7ffc74fd0998) at ../migration/options.c:1433
#7  0x0000557330cb9132 in qmp_marshal_migrate_set_parameters (args=0x7f72e00036d0, ret=0x7f72f133cd98, errp=0x7f72f133cd90) at qapi/qapi-commands-migration.c:214
#8  0x0000557330d07fab in do_qmp_dispatch_bh (opaque=0x7f72f133ce30) at ../qapi/qmp-dispatch.c:128
#9  0x0000557330d33bbb in aio_bh_call (bh=0x5573337d7920) at ../util/async.c:169
#10 0x0000557330d33cd8 in aio_bh_poll (ctx=0x55733356e7d0) at ../util/async.c:216
#11 0x0000557330d17a19 in aio_dispatch (ctx=0x55733356e7d0) at ../util/aio-posix.c:423
#12 0x0000557330d34117 in aio_ctx_dispatch (source=0x55733356e7d0, callback=0x0, user_data=0x0) at ../util/async.c:358
#13 0x00007f72f5a8848c in g_main_context_dispatch () at /lib64/libglib-2.0.so.0
#14 0x0000557330d358d4 in glib_pollfds_poll () at ../util/main-loop.c:290
#15 0x0000557330d35951 in os_host_main_loop_wait (timeout=0) at ../util/main-loop.c:313
#16 0x0000557330d35a5f in main_loop_wait (nonblocking=0) at ../util/main-loop.c:592
#17 0x000055733083aee0 in qemu_main_loop () at ../softmmu/runstate.c:732
#18 0x0000557330b0921b in qemu_default_main () at ../softmmu/main.c:37
#19 0x0000557330b09251 in main (argc=35, argv=0x7ffc74fd0ec8) at ../softmmu/main.c:48

Then I suppose it means all mgmt apps are not using "null" anyway, and it
makes more sense to me to just remove MigrateSetParameters (by replacing it
with MigrationParameters).

Then if we can also replace MigrationParameter enum with an internal enum
(alongside with a _str[] array for it) it seems we're all fine to dedup the
3 objects into 1 in qapi schema.

Thanks,

-- 
Peter Xu
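
The idea above of replacing the QAPI MigrationParameter enum with an internal enum plus a _str[] array could look roughly like the sketch below. This is only an illustration under assumptions: the Demo* names and demo_mig_param_parse() are hypothetical, and only three parameters are listed for brevity.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical internal enum; only a few parameters are listed for brevity. */
typedef enum DemoMigParam {
    DEMO_MIG_PARAM_MAX_BANDWIDTH,
    DEMO_MIG_PARAM_DOWNTIME_LIMIT,
    DEMO_MIG_PARAM_TLS_AUTHZ,
    DEMO_MIG_PARAM__MAX,
} DemoMigParam;

/* Name table indexed by the enum, playing the role of the "_str[]" array. */
static const char *const demo_mig_param_str[DEMO_MIG_PARAM__MAX] = {
    [DEMO_MIG_PARAM_MAX_BANDWIDTH]  = "max-bandwidth",
    [DEMO_MIG_PARAM_DOWNTIME_LIMIT] = "downtime-limit",
    [DEMO_MIG_PARAM_TLS_AUTHZ]      = "tls-authz",
};

/* Map a user-supplied name (for example from an HMP command) to the enum, or -1 if unknown. */
static int demo_mig_param_parse(const char *name)
{
    for (int i = 0; i < DEMO_MIG_PARAM__MAX; i++) {
        if (strcmp(demo_mig_param_str[i], name) == 0) {
            return i;
        }
    }
    return -1;
}
```

With such a table kept private to the HMP code, the enum would no longer need to appear in the QAPI schema or its generated documentation.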




* Re: [PATCH for-8.2 v2 1/2] qapi/migration: Deduplicate migration parameter field comments
  2023-08-04 16:01       ` Peter Xu
@ 2023-08-04 16:29         ` Daniel P. Berrangé
  2023-08-04 16:46           ` Peter Xu
  0 siblings, 1 reply; 25+ messages in thread
From: Daniel P. Berrangé @ 2023-08-04 16:29 UTC (permalink / raw)
  To: Peter Xu
  Cc: Markus Armbruster, qemu-devel, Zhiyi Guo,
	Leonardo Bras Soares Passos, Fabiano Rosas, Juan Quintela,
	Eric Blake, Chensheng Dong

On Fri, Aug 04, 2023 at 12:01:54PM -0400, Peter Xu wrote:
> On Fri, Aug 04, 2023 at 02:59:07PM +0100, Daniel P. Berrangé wrote:
> > On Fri, Aug 04, 2023 at 02:28:05PM +0200, Markus Armbruster wrote:
> > > Peter Xu <peterx@redhat.com> writes:
> > > 
> > > > We used to have three objects that have always the same list of parameters
> > > 
> > > We have!
> > > 
> > > > and comments are always duplicated:
> > > >
> > > >   - @MigrationParameter
> > > >   - @MigrationParameters
> > > >   - @MigrateSetParameters
> > > >
> > > > Before we can deduplicate the code, it's fairly straightforward to
> > > > deduplicate the comments first, so for each time we add a new migration
> > > > parameter we don't need to copy the same paragraphs three times.
> > > 
> > > De-duplicating the code would be nice, but we haven't done so in years,
> > > which suggests it's hard enough not to be worth the trouble.
> > 
> > The "MigrationParameter" enumeration isn't actually used in
> > QMP at all.
> > 
> > It is only used in HMP for hmp_migrate_set_parameter and
> > hmp_info_migrate_parameters. So it is questionable documenting
> > that enum in the QMP reference docs at all.
> > 
> > 1c1
> > < { 'struct': 'MigrationParameters',
> > ---
> > > { 'struct': 'MigrateSetParameters',
> > 14,16c14,16
> > <             '*tls-creds': 'str',
> > <             '*tls-hostname': 'str',
> > <             '*tls-authz': 'str',
> > ---
> > >             '*tls-creds': 'StrOrNull',
> > >             '*tls-hostname': 'StrOrNull',
> > >             '*tls-authz': 'StrOrNull',
> > 
> > Is it not valid to use StrOrNull in both cases and thus
> > delete the duplication here ?
> 
> I tested removing MigrateSetParameters by replacing it with
> MigrationParameters and it looks all fine here... I manually tested qmp/hmp
> on set/query parameters, and qtests are all happy.

I meant the other way around, such that we would be using 'StrOrNull'
in all scenarios.

> 
> The only thing I see that may affect it is we used to logically allow
> taking things like '"tls-authz": null' in the json input, but now we won't
> allow that because we'll be asking for a string type only.
> 
> Since we have query-qmp-schema I suppose we're all fine, because logically
> the mgmt app (libvirt?) will still query that to understand the protocol,
> so now we'll have (response of query-qmp-schema):
> 
>         {
>             "arg-type": "144",
>             "meta-type": "command",
>             "name": "migrate-set-parameters",
>             "ret-type": "0"
>         },
> 
> Where 144 can start to point to MigrationParameters, rather than
> MigrateSetParameters.
> 
> Ok, then what if the mgmt app doesn't care and just used "null" in tls-*
> fields when setting?  Funnily I tried it and actually anything that does
> migrate-set-parameters with a "null" passed over to tls-* fields will
> already crash qemu...
> 
> ./migration/options.c:1333: migrate_params_apply: Assertion `params->tls_authz->type == QTYPE_QSTRING' failed.
> 
> #0  0x00007f72f4b2a844 in __pthread_kill_implementation () at /lib64/libc.so.6
> #1  0x00007f72f4ad9abe in raise () at /lib64/libc.so.6
> #2  0x00007f72f4ac287f in abort () at /lib64/libc.so.6
> #3  0x00007f72f4ac279b in _nl_load_domain.cold () at /lib64/libc.so.6
> #4  0x00007f72f4ad2147 in  () at /lib64/libc.so.6
> #5  0x00005573308740e6 in migrate_params_apply (params=0x7ffc74fd09d0, errp=0x7ffc74fd0998) at ../migration/options.c:1333
> #6  0x0000557330874591 in qmp_migrate_set_parameters (params=0x7ffc74fd09d0, errp=0x7ffc74fd0998) at ../migration/options.c:1433
> #7  0x0000557330cb9132 in qmp_marshal_migrate_set_parameters (args=0x7f72e00036d0, ret=0x7f72f133cd98, errp=0x7f72f133cd90) at qapi/qapi-commands-migration.c:214
> #8  0x0000557330d07fab in do_qmp_dispatch_bh (opaque=0x7f72f133ce30) at ../qapi/qmp-dispatch.c:128
> #9  0x0000557330d33bbb in aio_bh_call (bh=0x5573337d7920) at ../util/async.c:169
> #10 0x0000557330d33cd8 in aio_bh_poll (ctx=0x55733356e7d0) at ../util/async.c:216
> #11 0x0000557330d17a19 in aio_dispatch (ctx=0x55733356e7d0) at ../util/aio-posix.c:423
> #12 0x0000557330d34117 in aio_ctx_dispatch (source=0x55733356e7d0, callback=0x0, user_data=0x0) at ../util/async.c:358
> #13 0x00007f72f5a8848c in g_main_context_dispatch () at /lib64/libglib-2.0.so.0
> #14 0x0000557330d358d4 in glib_pollfds_poll () at ../util/main-loop.c:290
> #15 0x0000557330d35951 in os_host_main_loop_wait (timeout=0) at ../util/main-loop.c:313
> #16 0x0000557330d35a5f in main_loop_wait (nonblocking=0) at ../util/main-loop.c:592
> #17 0x000055733083aee0 in qemu_main_loop () at ../softmmu/runstate.c:732
> #18 0x0000557330b0921b in qemu_default_main () at ../softmmu/main.c:37
> #19 0x0000557330b09251 in main (argc=35, argv=0x7ffc74fd0ec8) at ../softmmu/main.c:48
> 
> Then I suppose it means all mgmt apps are not using "null" anyway, and it
> makes more sense to me to just remove MigrateSetParameters (by replacing it
> with MigrationParameters).

It shouldn't be crashing, because qmp_migrate_set_parameters()
is turning 'null' into "", which means the assert ought to
never fire. Did you have a local modification that caused
this crash perhaps?

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|




* Re: [PATCH for-8.2 v2 1/2] qapi/migration: Deduplicate migration parameter field comments
  2023-08-04 16:29         ` Daniel P. Berrangé
@ 2023-08-04 16:46           ` Peter Xu
  2023-08-04 16:48             ` Daniel P. Berrangé
  0 siblings, 1 reply; 25+ messages in thread
From: Peter Xu @ 2023-08-04 16:46 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: Markus Armbruster, qemu-devel, Zhiyi Guo,
	Leonardo Bras Soares Passos, Fabiano Rosas, Juan Quintela,
	Eric Blake, Chensheng Dong

On Fri, Aug 04, 2023 at 05:29:19PM +0100, Daniel P. Berrangé wrote:
> On Fri, Aug 04, 2023 at 12:01:54PM -0400, Peter Xu wrote:
> > On Fri, Aug 04, 2023 at 02:59:07PM +0100, Daniel P. Berrangé wrote:
> > > On Fri, Aug 04, 2023 at 02:28:05PM +0200, Markus Armbruster wrote:
> > > > Peter Xu <peterx@redhat.com> writes:
> > > > 
> > > > > We used to have three objects that have always the same list of parameters
> > > > 
> > > > We have!
> > > > 
> > > > > and comments are always duplicated:
> > > > >
> > > > >   - @MigrationParameter
> > > > >   - @MigrationParameters
> > > > >   - @MigrateSetParameters
> > > > >
> > > > > Before we can deduplicate the code, it's fairly straightforward to
> > > > > deduplicate the comments first, so for each time we add a new migration
> > > > > parameter we don't need to copy the same paragraphs three times.
> > > > 
> > > > De-duplicating the code would be nice, but we haven't done so in years,
> > > > which suggests it's hard enough not to be worth the trouble.
> > > 
> > > The "MigrationParameter" enumeration isn't actually used in
> > > QMP at all.
> > > 
> > > It is only used in HMP for hmp_migrate_set_parameter and
> > > hmp_info_migrate_parameters. So it is questionable documenting
> > > that enum in the QMP reference docs at all.
> > > 
> > > 1c1
> > > < { 'struct': 'MigrationParameters',
> > > ---
> > > > { 'struct': 'MigrateSetParameters',
> > > 14,16c14,16
> > > <             '*tls-creds': 'str',
> > > <             '*tls-hostname': 'str',
> > > <             '*tls-authz': 'str',
> > > ---
> > > >             '*tls-creds': 'StrOrNull',
> > > >             '*tls-hostname': 'StrOrNull',
> > > >             '*tls-authz': 'StrOrNull',
> > > 
> > > Is it not valid to use StrOrNull in both cases and thus
> > > delete the duplication here ?
> > 
> > I tested removing MigrateSetParameters by replacing it with
> > MigrationParameters and it looks all fine here... I manually tested qmp/hmp
> > on set/query parameters, and qtests are all happy.
> 
> I meant the other way around, such that we would be using 'StrOrNull'
> in all scenarios.

Yes, that should also work and even without worrying on nulls.  I just took
a random one replacing the other.

> 
> > 
> > The only thing I see that may affect it is we used to logically allow
> > taking things like '"tls-authz": null' in the json input, but now we won't
> > allow that because we'll be asking for a string type only.
> > 
> > Since we have query-qmp-schema I suppose we're all fine, because logically
> > the mgmt app (libvirt?) will still query that to understand the protocol,
> > so now we'll have (response of query-qmp-schema):
> > 
> >         {
> >             "arg-type": "144",
> >             "meta-type": "command",
> >             "name": "migrate-set-parameters",
> >             "ret-type": "0"
> >         },
> > 
> > Where 144 can start to point to MigrationParameters, rather than
> > MigrateSetParameters.
> > 
> > Ok, then what if the mgmt app doesn't care and just used "null" in tls-*
> > fields when setting?  Funnily I tried it and actually anything that does
> > migrate-set-parameters with a "null" passed over to tls-* fields will
> > already crash qemu...
> > 
> > ./migration/options.c:1333: migrate_params_apply: Assertion `params->tls_authz->type == QTYPE_QSTRING' failed.
> > 
> > #0  0x00007f72f4b2a844 in __pthread_kill_implementation () at /lib64/libc.so.6
> > #1  0x00007f72f4ad9abe in raise () at /lib64/libc.so.6
> > #2  0x00007f72f4ac287f in abort () at /lib64/libc.so.6
> > #3  0x00007f72f4ac279b in _nl_load_domain.cold () at /lib64/libc.so.6
> > #4  0x00007f72f4ad2147 in  () at /lib64/libc.so.6
> > #5  0x00005573308740e6 in migrate_params_apply (params=0x7ffc74fd09d0, errp=0x7ffc74fd0998) at ../migration/options.c:1333
> > #6  0x0000557330874591 in qmp_migrate_set_parameters (params=0x7ffc74fd09d0, errp=0x7ffc74fd0998) at ../migration/options.c:1433
> > #7  0x0000557330cb9132 in qmp_marshal_migrate_set_parameters (args=0x7f72e00036d0, ret=0x7f72f133cd98, errp=0x7f72f133cd90) at qapi/qapi-commands-migration.c:214
> > #8  0x0000557330d07fab in do_qmp_dispatch_bh (opaque=0x7f72f133ce30) at ../qapi/qmp-dispatch.c:128
> > #9  0x0000557330d33bbb in aio_bh_call (bh=0x5573337d7920) at ../util/async.c:169
> > #10 0x0000557330d33cd8 in aio_bh_poll (ctx=0x55733356e7d0) at ../util/async.c:216
> > #11 0x0000557330d17a19 in aio_dispatch (ctx=0x55733356e7d0) at ../util/aio-posix.c:423
> > #12 0x0000557330d34117 in aio_ctx_dispatch (source=0x55733356e7d0, callback=0x0, user_data=0x0) at ../util/async.c:358
> > #13 0x00007f72f5a8848c in g_main_context_dispatch () at /lib64/libglib-2.0.so.0
> > #14 0x0000557330d358d4 in glib_pollfds_poll () at ../util/main-loop.c:290
> > #15 0x0000557330d35951 in os_host_main_loop_wait (timeout=0) at ../util/main-loop.c:313
> > #16 0x0000557330d35a5f in main_loop_wait (nonblocking=0) at ../util/main-loop.c:592
> > #17 0x000055733083aee0 in qemu_main_loop () at ../softmmu/runstate.c:732
> > #18 0x0000557330b0921b in qemu_default_main () at ../softmmu/main.c:37
> > #19 0x0000557330b09251 in main (argc=35, argv=0x7ffc74fd0ec8) at ../softmmu/main.c:48
> > 
> > Then I suppose it means all mgmt apps are not using "null" anyway, and it
> > makes more sense to me to just remove MigrateSetParameters (by replacing it
> > with MigrationParameters).
> 
> It shouldn't be crashing, because qmp_migrate_set_parameters()
> is turning 'null' into "", which means the assert ought to
> never fire. Did you have a local modification that caused
> this crash perhaps?

I think adding that special code to qmp_migrate_set_parameters() simply got
overlooked when tls-authz was introduced; the other two are fine.

Thanks,

-- 
Peter Xu
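
To illustrate the bug discussed above: per the thread, the set path converts a JSON null into "" for tls-creds and tls-hostname, but the same conversion is missing for tls-authz, so the later "must be a string" assertion fires. The sketch below is a simplified standalone model, not QEMU code; DemoStrOrNull, DemoSetParams and the demo_* functions are hypothetical names. It normalises all three members through one helper so that none can be forgotten.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified model of an alternate type: a string or JSON null. */
typedef enum DemoKind { DEMO_KIND_NULL, DEMO_KIND_STRING } DemoKind;

typedef struct DemoStrOrNull {
    DemoKind kind;
    char *s;                      /* valid only when kind == DEMO_KIND_STRING */
} DemoStrOrNull;

typedef struct DemoSetParams {
    DemoStrOrNull *tls_creds;     /* a NULL pointer means "member absent" */
    DemoStrOrNull *tls_hostname;
    DemoStrOrNull *tls_authz;
} DemoSetParams;

/* Rewrite a JSON null into the empty string; leave absent or string values alone. */
static void demo_null_to_empty(DemoStrOrNull *v)
{
    if (v && v->kind == DEMO_KIND_NULL) {
        char *empty = calloc(1, 1);   /* one zero byte: the empty string */
        if (!empty) {
            abort();
        }
        v->kind = DEMO_KIND_STRING;
        v->s = empty;
    }
}

/* Normalise every tls-* member in one place. */
static void demo_normalise_tls(DemoSetParams *p)
{
    demo_null_to_empty(p->tls_creds);
    demo_null_to_empty(p->tls_hostname);
    demo_null_to_empty(p->tls_authz);
}

/* Stand-in for the apply step, which expects strings only. */
static const char *demo_apply_tls_authz(const DemoSetParams *p)
{
    if (!p->tls_authz) {
        return NULL;                  /* member absent: nothing to apply */
    }
    assert(p->tls_authz->kind == DEMO_KIND_STRING);
    return p->tls_authz->s;
}
```

Calling demo_normalise_tls() before demo_apply_tls_authz() keeps the assertion from firing when a client sends '"tls-authz": null'.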




* Re: [PATCH for-8.2 v2 1/2] qapi/migration: Deduplicate migration parameter field comments
  2023-08-04 16:46           ` Peter Xu
@ 2023-08-04 16:48             ` Daniel P. Berrangé
  2023-08-04 21:02               ` Peter Xu
  0 siblings, 1 reply; 25+ messages in thread
From: Daniel P. Berrangé @ 2023-08-04 16:48 UTC (permalink / raw)
  To: Peter Xu
  Cc: Markus Armbruster, qemu-devel, Zhiyi Guo,
	Leonardo Bras Soares Passos, Fabiano Rosas, Juan Quintela,
	Eric Blake, Chensheng Dong

On Fri, Aug 04, 2023 at 12:46:18PM -0400, Peter Xu wrote:
> On Fri, Aug 04, 2023 at 05:29:19PM +0100, Daniel P. Berrangé wrote:
> > On Fri, Aug 04, 2023 at 12:01:54PM -0400, Peter Xu wrote:
> > > On Fri, Aug 04, 2023 at 02:59:07PM +0100, Daniel P. Berrangé wrote:
> > > > On Fri, Aug 04, 2023 at 02:28:05PM +0200, Markus Armbruster wrote:
> > > > > Peter Xu <peterx@redhat.com> writes:
> > > > > 
> > > > > > We used to have three objects that have always the same list of parameters
> > > > > 
> > > > > We have!
> > > > > 
> > > > > > and comments are always duplicated:
> > > > > >
> > > > > >   - @MigrationParameter
> > > > > >   - @MigrationParameters
> > > > > >   - @MigrateSetParameters
> > > > > >
> > > > > > Before we can deduplicate the code, it's fairly straightforward to
> > > > > > deduplicate the comments first, so for each time we add a new migration
> > > > > > parameter we don't need to copy the same paragraphs three times.
> > > > > 
> > > > > De-duplicating the code would be nice, but we haven't done so in years,
> > > > > which suggests it's hard enough not to be worth the trouble.
> > > > 
> > > > The "MigrationParameter" enumeration isn't actually used in
> > > > QMP at all.
> > > > 
> > > > It is only used in HMP for hmp_migrate_set_parameter and
> > > > hmp_info_migrate_parameters. So it is questionable documenting
> > > > that enum in the QMP reference docs at all.
> > > > 
> > > > 1c1
> > > > < { 'struct': 'MigrationParameters',
> > > > ---
> > > > > { 'struct': 'MigrateSetParameters',
> > > > 14,16c14,16
> > > > <             '*tls-creds': 'str',
> > > > <             '*tls-hostname': 'str',
> > > > <             '*tls-authz': 'str',
> > > > ---
> > > > >             '*tls-creds': 'StrOrNull',
> > > > >             '*tls-hostname': 'StrOrNull',
> > > > >             '*tls-authz': 'StrOrNull',
> > > > 
> > > > Is it not valid to use StrOrNull in both cases and thus
> > > > delete the duplication here ?
> > > 
> > > I tested removing MigrateSetParameters by replacing it with
> > > MigrationParameters and it looks all fine here... I manually tested qmp/hmp
> > > on set/query parameters, and qtests are all happy.
> > 
> > I meant the other way around, such that we would be using 'StrOrNull'
> > in all scenarios.
> 
> Yes, that should also work and even without worrying on nulls.  I just took
> a random one replacing the other.
> 
> > 
> > > 
> > > The only thing I see that may affect it is we used to logically allow
> > > taking things like '"tls-authz": null' in the json input, but now we won't
> > > allow that because we'll be asking for a string type only.
> > > 
> > > Since we have query-qmp-schema I suppose we're all fine, because logically
> > > the mgmt app (libvirt?) will still query that to understand the protocol,
> > > so now we'll have (response of query-qmp-schema):
> > > 
> > >         {
> > >             "arg-type": "144",
> > >             "meta-type": "command",
> > >             "name": "migrate-set-parameters",
> > >             "ret-type": "0"
> > >         },
> > > 
> > > Where 144 can start to point to MigrationParameters, rather than
> > > MigrateSetParameters.
> > > 
> > > Ok, then what if the mgmt app doesn't care and just used "null" in tls-*
> > > fields when setting?  Funnily I tried it and actually anything that does
> > > migrate-set-parameters with a "null" passed over to tls-* fields will
> > > already crash qemu...
> > > 
> > > ./migration/options.c:1333: migrate_params_apply: Assertion `params->tls_authz->type == QTYPE_QSTRING' failed.
> > > 
> > > #0  0x00007f72f4b2a844 in __pthread_kill_implementation () at /lib64/libc.so.6
> > > #1  0x00007f72f4ad9abe in raise () at /lib64/libc.so.6
> > > #2  0x00007f72f4ac287f in abort () at /lib64/libc.so.6
> > > #3  0x00007f72f4ac279b in _nl_load_domain.cold () at /lib64/libc.so.6
> > > #4  0x00007f72f4ad2147 in  () at /lib64/libc.so.6
> > > #5  0x00005573308740e6 in migrate_params_apply (params=0x7ffc74fd09d0, errp=0x7ffc74fd0998) at ../migration/options.c:1333
> > > #6  0x0000557330874591 in qmp_migrate_set_parameters (params=0x7ffc74fd09d0, errp=0x7ffc74fd0998) at ../migration/options.c:1433
> > > #7  0x0000557330cb9132 in qmp_marshal_migrate_set_parameters (args=0x7f72e00036d0, ret=0x7f72f133cd98, errp=0x7f72f133cd90) at qapi/qapi-commands-migration.c:214
> > > #8  0x0000557330d07fab in do_qmp_dispatch_bh (opaque=0x7f72f133ce30) at ../qapi/qmp-dispatch.c:128
> > > #9  0x0000557330d33bbb in aio_bh_call (bh=0x5573337d7920) at ../util/async.c:169
> > > #10 0x0000557330d33cd8 in aio_bh_poll (ctx=0x55733356e7d0) at ../util/async.c:216
> > > #11 0x0000557330d17a19 in aio_dispatch (ctx=0x55733356e7d0) at ../util/aio-posix.c:423
> > > #12 0x0000557330d34117 in aio_ctx_dispatch (source=0x55733356e7d0, callback=0x0, user_data=0x0) at ../util/async.c:358
> > > #13 0x00007f72f5a8848c in g_main_context_dispatch () at /lib64/libglib-2.0.so.0
> > > #14 0x0000557330d358d4 in glib_pollfds_poll () at ../util/main-loop.c:290
> > > #15 0x0000557330d35951 in os_host_main_loop_wait (timeout=0) at ../util/main-loop.c:313
> > > #16 0x0000557330d35a5f in main_loop_wait (nonblocking=0) at ../util/main-loop.c:592
> > > #17 0x000055733083aee0 in qemu_main_loop () at ../softmmu/runstate.c:732
> > > #18 0x0000557330b0921b in qemu_default_main () at ../softmmu/main.c:37
> > > #19 0x0000557330b09251 in main (argc=35, argv=0x7ffc74fd0ec8) at ../softmmu/main.c:48
> > > 
> > > Then I suppose it means all mgmt apps are not using "null" anyway, and it
> > > makes more sense to me to just remove MigrateSetParameters (by replacing it
> > > with MigrationParameters).
> > 
> > It shouldn't be crashing, because qmp_migrate_set_parameters()
> > is turning 'null' into "", which means the assert ought to
> > never fire. Did you have a local modification that caused
> > this crash perhaps?
> 
> I think adding that special code to qmp_migrate_set_parameters() simply got
> overlooked when tls-authz was introduced; the other two are fine.

Oh right yes, pre-existing bug.

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|




* Re: [PATCH for-8.2 v2 1/2] qapi/migration: Deduplicate migration parameter field comments
  2023-08-04 16:48             ` Daniel P. Berrangé
@ 2023-08-04 21:02               ` Peter Xu
  2023-08-05  8:12                 ` Markus Armbruster
  0 siblings, 1 reply; 25+ messages in thread
From: Peter Xu @ 2023-08-04 21:02 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: Markus Armbruster, qemu-devel, Zhiyi Guo,
	Leonardo Bras Soares Passos, Fabiano Rosas, Juan Quintela,
	Eric Blake, Chensheng Dong

On Fri, Aug 04, 2023 at 05:48:49PM +0100, Daniel P. Berrangé wrote:
> On Fri, Aug 04, 2023 at 12:46:18PM -0400, Peter Xu wrote:
> > On Fri, Aug 04, 2023 at 05:29:19PM +0100, Daniel P. Berrangé wrote:
> > > On Fri, Aug 04, 2023 at 12:01:54PM -0400, Peter Xu wrote:
> > > > On Fri, Aug 04, 2023 at 02:59:07PM +0100, Daniel P. Berrangé wrote:
> > > > > On Fri, Aug 04, 2023 at 02:28:05PM +0200, Markus Armbruster wrote:
> > > > > > Peter Xu <peterx@redhat.com> writes:
> > > > > > 
> > > > > > > We used to have three objects that have always the same list of parameters
> > > > > > 
> > > > > > We have!
> > > > > > 
> > > > > > > and comments are always duplicated:
> > > > > > >
> > > > > > >   - @MigrationParameter
> > > > > > >   - @MigrationParameters
> > > > > > >   - @MigrateSetParameters
> > > > > > >
> > > > > > > Before we can deduplicate the code, it's fairly straightforward to
> > > > > > > deduplicate the comments first, so for each time we add a new migration
> > > > > > > parameter we don't need to copy the same paragraphs three times.
> > > > > > 
> > > > > > De-duplicating the code would be nice, but we haven't done so in years,
> > > > > > which suggests it's hard enough not to be worth the trouble.
> > > > > 
> > > > > The "MigrationParameter" enumeration isn't actually used in
> > > > > QMP at all.
> > > > > 
> > > > > It is only used in HMP for hmp_migrate_set_parameter and
> > > > > hmp_info_migrate_parameters. So it is questionable documenting
> > > > > that enum in the QMP reference docs at all.
> > > > > 
> > > > > 1c1
> > > > > < { 'struct': 'MigrationParameters',
> > > > > ---
> > > > > > { 'struct': 'MigrateSetParameters',
> > > > > 14,16c14,16
> > > > > <             '*tls-creds': 'str',
> > > > > <             '*tls-hostname': 'str',
> > > > > <             '*tls-authz': 'str',
> > > > > ---
> > > > > >             '*tls-creds': 'StrOrNull',
> > > > > >             '*tls-hostname': 'StrOrNull',
> > > > > >             '*tls-authz': 'StrOrNull',
> > > > > 
> > > > > Is it not valid to use StrOrNull in both cases and thus
> > > > > delete the duplication here ?
> > > > 
> > > > I tested removing MigrateSetParameters by replacing it with
> > > > MigrationParameters and it looks all fine here... I manually tested qmp/hmp
> > > > on set/query parameters, and qtests are all happy.
> > > 
> > > I meant the other way around, such we would be using 'StrOrNull'
> > > in all scenarios.
> > 
> > Yes, that should also work and even without worrying on nulls.  I just took
> > a random one replacing the other.
> > 
> > > 
> > > > 
> > > > The only thing I see that may affect it is we used to logically allow
> > > > taking things like '"tls-authz": null' in the json input, but now we won't
> > > > allow that because we'll be asking for a string type only.
> > > > 
> > > > Since we have query-qmp-schema I suppose we're all fine, because logically
> > > > the mgmt app (libvirt?) will still query that to understand the protocol,
> > > > so now we'll have (response of query-qmp-schema):
> > > > 
> > > >         {
> > > >             "arg-type": "144",
> > > >             "meta-type": "command",
> > > >             "name": "migrate-set-parameters",
> > > >             "ret-type": "0"
> > > >         },
> > > > 
> > > > Where 144 can start to point to MigrationParameters, rather than
> > > > MigrateSetParameters.
> > > > 
> > > > Ok, then what if the mgmt app doesn't care and just used "null" in tls-*
> > > > fields when setting?  Funnily I tried it and actually anything that does
> > > > migrate-set-parameters with a "null" passed over to tls-* fields will
> > > > already crash qemu...
> > > > 
> > > > ./migration/options.c:1333: migrate_params_apply: Assertion `params->tls_authz->type == QTYPE_QSTRING' failed.
> > > > 
> > > > #0  0x00007f72f4b2a844 in __pthread_kill_implementation () at /lib64/libc.so.6
> > > > #1  0x00007f72f4ad9abe in raise () at /lib64/libc.so.6
> > > > #2  0x00007f72f4ac287f in abort () at /lib64/libc.so.6
> > > > #3  0x00007f72f4ac279b in _nl_load_domain.cold () at /lib64/libc.so.6
> > > > #4  0x00007f72f4ad2147 in  () at /lib64/libc.so.6
> > > > #5  0x00005573308740e6 in migrate_params_apply (params=0x7ffc74fd09d0, errp=0x7ffc74fd0998) at ../migration/options.c:1333
> > > > #6  0x0000557330874591 in qmp_migrate_set_parameters (params=0x7ffc74fd09d0, errp=0x7ffc74fd0998) at ../migration/options.c:1433
> > > > #7  0x0000557330cb9132 in qmp_marshal_migrate_set_parameters (args=0x7f72e00036d0, ret=0x7f72f133cd98, errp=0x7f72f133cd90) at qapi/qapi-commands-migration.c:214
> > > > #8  0x0000557330d07fab in do_qmp_dispatch_bh (opaque=0x7f72f133ce30) at ../qapi/qmp-dispatch.c:128
> > > > #9  0x0000557330d33bbb in aio_bh_call (bh=0x5573337d7920) at ../util/async.c:169
> > > > #10 0x0000557330d33cd8 in aio_bh_poll (ctx=0x55733356e7d0) at ../util/async.c:216
> > > > #11 0x0000557330d17a19 in aio_dispatch (ctx=0x55733356e7d0) at ../util/aio-posix.c:423
> > > > #12 0x0000557330d34117 in aio_ctx_dispatch (source=0x55733356e7d0, callback=0x0, user_data=0x0) at ../util/async.c:358
> > > > #13 0x00007f72f5a8848c in g_main_context_dispatch () at /lib64/libglib-2.0.so.0
> > > > #14 0x0000557330d358d4 in glib_pollfds_poll () at ../util/main-loop.c:290
> > > > #15 0x0000557330d35951 in os_host_main_loop_wait (timeout=0) at ../util/main-loop.c:313
> > > > #16 0x0000557330d35a5f in main_loop_wait (nonblocking=0) at ../util/main-loop.c:592
> > > > #17 0x000055733083aee0 in qemu_main_loop () at ../softmmu/runstate.c:732
> > > > #18 0x0000557330b0921b in qemu_default_main () at ../softmmu/main.c:37
> > > > #19 0x0000557330b09251 in main (argc=35, argv=0x7ffc74fd0ec8) at ../softmmu/main.c:48
> > > > 
> > > > Then I suppose it means all mgmt apps are not using "null" anyway, and it
> > > > makes more sense to me to just remove MigrateSetParameters (by replacing it
> > > > with MigrationParameters).
> > > 
> > > It shouldn't be crashing,  because qmp_migrate_set_parameters()
> > > is turning 'null' into  "", which means the assert ought to
> > > never fire. Did you have a local modification that caused
> > > this crash perhaps ?
> > 
> > I think it just got overlooked when introducing tls-authz to not have added
> > that special code in qmp_migrate_set_parameters(), the other two are fine.
> 
> Oh right yes, pre-existing bug.

So do we really care about "null" in any form over "" (empty str) here for
tls-* parameters?

To fix this tls-authz bug we can add one more QTYPE_QNULL to QTYPE_QSTRING
conversion, but I'd rather just use "str" for all tls-* fields and remove
the other two objects instead, if "null" is not important to anyone.
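
Just to illustrate what that conversion would amount to, here is a
standalone sketch.  The Mock* types and mock_* helpers are made up for this
mail and only imitate StrOrNull; this is not the actual QEMU code:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for QEMU's QType and StrOrNull (simplified). */
typedef enum { MOCK_QTYPE_QNULL, MOCK_QTYPE_QSTRING } MockQType;

typedef struct {
    MockQType type;
    char *s;                /* only meaningful for MOCK_QTYPE_QSTRING */
} MockStrOrNull;

/* Portable replacement for strdup(); aborts on allocation failure. */
static char *mock_strdup(const char *src)
{
    size_t len = strlen(src) + 1;
    char *copy = malloc(len);

    if (!copy) {
        abort();
    }
    memcpy(copy, src, len);
    return copy;
}

/* Rewrite a JSON null into "" so later code only ever sees a string. */
static void mock_null_to_empty(MockStrOrNull *v)
{
    if (v && v->type == MOCK_QTYPE_QNULL) {
        v->type = MOCK_QTYPE_QSTRING;
        v->s = mock_strdup("");
    }
}

/* Same shape as the assert that fires in migrate_params_apply(). */
static char *mock_apply(const MockStrOrNull *v)
{
    assert(v->type == MOCK_QTYPE_QSTRING);
    return mock_strdup(v->s);
}
```

The point is only that the null-to-"" rewrite has to run for every tls-*
field before anything asserts on the type; if one field is skipped, the
assert fires.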

In any case, I've appended the two patches I'm currently testing with.
They should also fix the tls-authz crash over 'null' by just rejecting
that.  But I'm open to anything - the patches (more than RFC) are more for
reference of whether we can drop the two objects in qapi/migration.

Thanks,

===8<===

From cd07ae2c048fe2265845bcf3f1ef4529854b71a1 Mon Sep 17 00:00:00 2001
From: Peter Xu <peterx@redhat.com>
Date: Fri, 4 Aug 2023 11:02:26 -0400
Subject: [PATCH 1/2] migration/qapi: Replace @MigrateSetParameters with
 @MigrationParameters
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

These two structs are mostly identical except for a few fields (quoting
Daniel P. Berrangé's reply):

1c1
< { 'struct': 'MigrationParameters',
---
> { 'struct': 'MigrateSetParameters',
14,16c14,16
<             '*tls-creds': 'str',
<             '*tls-hostname': 'str',
<             '*tls-authz': 'str',
---
>             '*tls-creds': 'StrOrNull',
>             '*tls-hostname': 'StrOrNull',
>             '*tls-authz': 'StrOrNull',

The difference is that the @MigrateSetParameters object allows 'null'
values for any tls-* field passed in.

Is that really important?  It seems not, because right now if anyone tries
to pass over a 'null' value to any of them, QEMU will already crash:

./migration/options.c:1333: migrate_params_apply: Assertion `params->tls_authz->type == QTYPE_QSTRING' failed.

And it's actually important to fix this crash instead.

To fix it, we can either change the code to handle QTYPE_QNULL, or we can
directly replace all @MigrateSetParameters references with
@MigrationParameters, knowing that no user is using 'null' as an input
anyway.

This greatly deduplicates the code not only in qapi/migration.json, but
also in the generic migration code.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 qapi/migration.json            | 185 +--------------------------------
 migration/migration-hmp-cmds.c |  16 +--
 migration/options.c            | 140 ++-----------------------
 3 files changed, 12 insertions(+), 329 deletions(-)

diff --git a/qapi/migration.json b/qapi/migration.json
index 8843e74b59..0416da65b5 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -851,189 +851,6 @@
            { 'name': 'x-vcpu-dirty-limit-period', 'features': ['unstable'] },
            'vcpu-dirty-limit'] }
 
-##
-# @MigrateSetParameters:
-#
-# @announce-initial: Initial delay (in milliseconds) before sending
-#     the first announce (Since 4.0)
-#
-# @announce-max: Maximum delay (in milliseconds) between packets in
-#     the announcement (Since 4.0)
-#
-# @announce-rounds: Number of self-announce packets sent after
-#     migration (Since 4.0)
-#
-# @announce-step: Increase in delay (in milliseconds) between
-#     subsequent packets in the announcement (Since 4.0)
-#
-# @compress-level: compression level
-#
-# @compress-threads: compression thread count
-#
-# @compress-wait-thread: Controls behavior when all compression
-#     threads are currently busy.  If true (default), wait for a free
-#     compression thread to become available; otherwise, send the page
-#     uncompressed.  (Since 3.1)
-#
-# @decompress-threads: decompression thread count
-#
-# @throttle-trigger-threshold: The ratio of bytes_dirty_period and
-#     bytes_xfer_period to trigger throttling.  It is expressed as
-#     percentage.  The default value is 50. (Since 5.0)
-#
-# @cpu-throttle-initial: Initial percentage of time guest cpus are
-#     throttled when migration auto-converge is activated.  The
-#     default value is 20. (Since 2.7)
-#
-# @cpu-throttle-increment: throttle percentage increase each time
-#     auto-converge detects that migration is not making progress.
-#     The default value is 10. (Since 2.7)
-#
-# @cpu-throttle-tailslow: Make CPU throttling slower at tail stage At
-#     the tail stage of throttling, the Guest is very sensitive to CPU
-#     percentage while the @cpu-throttle -increment is excessive
-#     usually at tail stage.  If this parameter is true, we will
-#     compute the ideal CPU percentage used by the Guest, which may
-#     exactly make the dirty rate match the dirty rate threshold.
-#     Then we will choose a smaller throttle increment between the one
-#     specified by @cpu-throttle-increment and the one generated by
-#     ideal CPU percentage.  Therefore, it is compatible to
-#     traditional throttling, meanwhile the throttle increment won't
-#     be excessive at tail stage.  The default value is false.  (Since
-#     5.1)
-#
-# @tls-creds: ID of the 'tls-creds' object that provides credentials
-#     for establishing a TLS connection over the migration data
-#     channel.  On the outgoing side of the migration, the credentials
-#     must be for a 'client' endpoint, while for the incoming side the
-#     credentials must be for a 'server' endpoint.  Setting this to a
-#     non-empty string enables TLS for all migrations.  An empty
-#     string means that QEMU will use plain text mode for migration,
-#     rather than TLS (Since 2.9) Previously (since 2.7), this was
-#     reported by omitting tls-creds instead.
-#
-# @tls-hostname: hostname of the target host for the migration.  This
-#     is required when using x509 based TLS credentials and the
-#     migration URI does not already include a hostname.  For example
-#     if using fd: or exec: based migration, the hostname must be
-#     provided so that the server's x509 certificate identity can be
-#     validated.  (Since 2.7) An empty string means that QEMU will use
-#     the hostname associated with the migration URI, if any.  (Since
-#     2.9) Previously (since 2.7), this was reported by omitting
-#     tls-hostname instead.
-#
-# @max-bandwidth: to set maximum speed for migration.  maximum speed
-#     in bytes per second.  (Since 2.8)
-#
-# @downtime-limit: set maximum tolerated downtime for migration.
-#     maximum downtime in milliseconds (Since 2.8)
-#
-# @x-checkpoint-delay: the delay time between two COLO checkpoints.
-#     (Since 2.8)
-#
-# @block-incremental: Affects how much storage is migrated when the
-#     block migration capability is enabled.  When false, the entire
-#     storage backing chain is migrated into a flattened image at the
-#     destination; when true, only the active qcow2 layer is migrated
-#     and the destination must already have access to the same backing
-#     chain as was used on the source.  (since 2.10)
-#
-# @multifd-channels: Number of channels used to migrate data in
-#     parallel.  This is the same number that the number of sockets
-#     used for migration.  The default value is 2 (since 4.0)
-#
-# @xbzrle-cache-size: cache size to be used by XBZRLE migration.  It
-#     needs to be a multiple of the target page size and a power of 2
-#     (Since 2.11)
-#
-# @max-postcopy-bandwidth: Background transfer bandwidth during
-#     postcopy.  Defaults to 0 (unlimited).  In bytes per second.
-#     (Since 3.0)
-#
-# @max-cpu-throttle: maximum cpu throttle percentage.  The default
-#     value is 99. (Since 3.1)
-#
-# @multifd-compression: Which compression method to use.  Defaults to
-#     none.  (Since 5.0)
-#
-# @multifd-zlib-level: Set the compression level to be used in live
-#     migration, the compression level is an integer between 0 and 9,
-#     where 0 means no compression, 1 means the best compression
-#     speed, and 9 means best compression ratio which will consume
-#     more CPU. Defaults to 1. (Since 5.0)
-#
-# @multifd-zstd-level: Set the compression level to be used in live
-#     migration, the compression level is an integer between 0 and 20,
-#     where 0 means no compression, 1 means the best compression
-#     speed, and 20 means best compression ratio which will consume
-#     more CPU. Defaults to 1. (Since 5.0)
-#
-# @block-bitmap-mapping: Maps block nodes and bitmaps on them to
-#     aliases for the purpose of dirty bitmap migration.  Such aliases
-#     may for example be the corresponding names on the opposite site.
-#     The mapping must be one-to-one, but not necessarily complete: On
-#     the source, unmapped bitmaps and all bitmaps on unmapped nodes
-#     will be ignored.  On the destination, encountering an unmapped
-#     alias in the incoming migration stream will result in a report,
-#     and all further bitmap migration data will then be discarded.
-#     Note that the destination does not know about bitmaps it does
-#     not receive, so there is no limitation or requirement regarding
-#     the number of bitmaps received, or how they are named, or on
-#     which nodes they are placed.  By default (when this parameter
-#     has never been set), bitmap names are mapped to themselves.
-#     Nodes are mapped to their block device name if there is one, and
-#     to their node name otherwise.  (Since 5.2)
-#
-# @x-vcpu-dirty-limit-period: Periodic time (in milliseconds) of dirty
-#     limit during live migration.  Should be in the range 1 to 1000ms.
-#     Defaults to 1000ms.  (Since 8.1)
-#
-# @vcpu-dirty-limit: Dirtyrate limit (MB/s) during live migration.
-#     Defaults to 1.  (Since 8.1)
-#
-# Features:
-#
-# @unstable: Members @x-checkpoint-delay and @x-vcpu-dirty-limit-period
-#     are experimental.
-#
-# TODO: either fuse back into MigrationParameters, or make
-#     MigrationParameters members mandatory
-#
-# Since: 2.4
-##
-{ 'struct': 'MigrateSetParameters',
-  'data': { '*announce-initial': 'size',
-            '*announce-max': 'size',
-            '*announce-rounds': 'size',
-            '*announce-step': 'size',
-            '*compress-level': 'uint8',
-            '*compress-threads': 'uint8',
-            '*compress-wait-thread': 'bool',
-            '*decompress-threads': 'uint8',
-            '*throttle-trigger-threshold': 'uint8',
-            '*cpu-throttle-initial': 'uint8',
-            '*cpu-throttle-increment': 'uint8',
-            '*cpu-throttle-tailslow': 'bool',
-            '*tls-creds': 'StrOrNull',
-            '*tls-hostname': 'StrOrNull',
-            '*tls-authz': 'StrOrNull',
-            '*max-bandwidth': 'size',
-            '*downtime-limit': 'uint64',
-            '*x-checkpoint-delay': { 'type': 'uint32',
-                                     'features': [ 'unstable' ] },
-            '*block-incremental': 'bool',
-            '*multifd-channels': 'uint8',
-            '*xbzrle-cache-size': 'size',
-            '*max-postcopy-bandwidth': 'size',
-            '*max-cpu-throttle': 'uint8',
-            '*multifd-compression': 'MultiFDCompression',
-            '*multifd-zlib-level': 'uint8',
-            '*multifd-zstd-level': 'uint8',
-            '*block-bitmap-mapping': [ 'BitmapMigrationNodeAlias' ],
-            '*x-vcpu-dirty-limit-period': { 'type': 'uint64',
-                                            'features': [ 'unstable' ] },
-            '*vcpu-dirty-limit': 'uint64'} }
-
 ##
 # @migrate-set-parameters:
 #
@@ -1048,7 +865,7 @@
 # <- { "return": {} }
 ##
 { 'command': 'migrate-set-parameters', 'boxed': true,
-  'data': 'MigrateSetParameters' }
+  'data': 'MigrationParameters' }
 
 ##
 # @MigrationParameters:
diff --git a/migration/migration-hmp-cmds.c b/migration/migration-hmp-cmds.c
index c115ef2d23..a64672f640 100644
--- a/migration/migration-hmp-cmds.c
+++ b/migration/migration-hmp-cmds.c
@@ -497,7 +497,7 @@ void hmp_migrate_set_parameter(Monitor *mon, const QDict *qdict)
     const char *param = qdict_get_str(qdict, "parameter");
     const char *valuestr = qdict_get_str(qdict, "value");
     Visitor *v = string_input_visitor_new(valuestr);
-    MigrateSetParameters *p = g_new0(MigrateSetParameters, 1);
+    MigrationParameters *p = g_new0(MigrationParameters, 1);
     uint64_t valuebw = 0;
     uint64_t cache_size;
     Error *err = NULL;
@@ -546,19 +546,13 @@ void hmp_migrate_set_parameter(Monitor *mon, const QDict *qdict)
         visit_type_uint8(v, param, &p->max_cpu_throttle, &err);
         break;
     case MIGRATION_PARAMETER_TLS_CREDS:
-        p->tls_creds = g_new0(StrOrNull, 1);
-        p->tls_creds->type = QTYPE_QSTRING;
-        visit_type_str(v, param, &p->tls_creds->u.s, &err);
+        visit_type_str(v, param, &p->tls_creds, &err);
         break;
     case MIGRATION_PARAMETER_TLS_HOSTNAME:
-        p->tls_hostname = g_new0(StrOrNull, 1);
-        p->tls_hostname->type = QTYPE_QSTRING;
-        visit_type_str(v, param, &p->tls_hostname->u.s, &err);
+        visit_type_str(v, param, &p->tls_hostname, &err);
         break;
     case MIGRATION_PARAMETER_TLS_AUTHZ:
-        p->tls_authz = g_new0(StrOrNull, 1);
-        p->tls_authz->type = QTYPE_QSTRING;
-        visit_type_str(v, param, &p->tls_authz->u.s, &err);
+        visit_type_str(v, param, &p->tls_authz, &err);
         break;
     case MIGRATION_PARAMETER_MAX_BANDWIDTH:
         p->has_max_bandwidth = true;
@@ -657,7 +651,7 @@ void hmp_migrate_set_parameter(Monitor *mon, const QDict *qdict)
     qmp_migrate_set_parameters(p, &err);
 
  cleanup:
-    qapi_free_MigrateSetParameters(p);
+    qapi_free_MigrationParameters(p);
     visit_free(v);
     hmp_handle_error(mon, err);
 }
diff --git a/migration/options.c b/migration/options.c
index 1d1e1321b0..7967c572fc 100644
--- a/migration/options.c
+++ b/migration/options.c
@@ -1172,113 +1172,7 @@ bool migrate_params_check(MigrationParameters *params, Error **errp)
     return true;
 }
 
-static void migrate_params_test_apply(MigrateSetParameters *params,
-                                      MigrationParameters *dest)
-{
-    *dest = migrate_get_current()->parameters;
-
-    /* TODO use QAPI_CLONE() instead of duplicating it inline */
-
-    if (params->has_compress_level) {
-        dest->compress_level = params->compress_level;
-    }
-
-    if (params->has_compress_threads) {
-        dest->compress_threads = params->compress_threads;
-    }
-
-    if (params->has_compress_wait_thread) {
-        dest->compress_wait_thread = params->compress_wait_thread;
-    }
-
-    if (params->has_decompress_threads) {
-        dest->decompress_threads = params->decompress_threads;
-    }
-
-    if (params->has_throttle_trigger_threshold) {
-        dest->throttle_trigger_threshold = params->throttle_trigger_threshold;
-    }
-
-    if (params->has_cpu_throttle_initial) {
-        dest->cpu_throttle_initial = params->cpu_throttle_initial;
-    }
-
-    if (params->has_cpu_throttle_increment) {
-        dest->cpu_throttle_increment = params->cpu_throttle_increment;
-    }
-
-    if (params->has_cpu_throttle_tailslow) {
-        dest->cpu_throttle_tailslow = params->cpu_throttle_tailslow;
-    }
-
-    if (params->tls_creds) {
-        assert(params->tls_creds->type == QTYPE_QSTRING);
-        dest->tls_creds = params->tls_creds->u.s;
-    }
-
-    if (params->tls_hostname) {
-        assert(params->tls_hostname->type == QTYPE_QSTRING);
-        dest->tls_hostname = params->tls_hostname->u.s;
-    }
-
-    if (params->has_max_bandwidth) {
-        dest->max_bandwidth = params->max_bandwidth;
-    }
-
-    if (params->has_downtime_limit) {
-        dest->downtime_limit = params->downtime_limit;
-    }
-
-    if (params->has_x_checkpoint_delay) {
-        dest->x_checkpoint_delay = params->x_checkpoint_delay;
-    }
-
-    if (params->has_block_incremental) {
-        dest->block_incremental = params->block_incremental;
-    }
-    if (params->has_multifd_channels) {
-        dest->multifd_channels = params->multifd_channels;
-    }
-    if (params->has_multifd_compression) {
-        dest->multifd_compression = params->multifd_compression;
-    }
-    if (params->has_xbzrle_cache_size) {
-        dest->xbzrle_cache_size = params->xbzrle_cache_size;
-    }
-    if (params->has_max_postcopy_bandwidth) {
-        dest->max_postcopy_bandwidth = params->max_postcopy_bandwidth;
-    }
-    if (params->has_max_cpu_throttle) {
-        dest->max_cpu_throttle = params->max_cpu_throttle;
-    }
-    if (params->has_announce_initial) {
-        dest->announce_initial = params->announce_initial;
-    }
-    if (params->has_announce_max) {
-        dest->announce_max = params->announce_max;
-    }
-    if (params->has_announce_rounds) {
-        dest->announce_rounds = params->announce_rounds;
-    }
-    if (params->has_announce_step) {
-        dest->announce_step = params->announce_step;
-    }
-
-    if (params->has_block_bitmap_mapping) {
-        dest->has_block_bitmap_mapping = true;
-        dest->block_bitmap_mapping = params->block_bitmap_mapping;
-    }
-
-    if (params->has_x_vcpu_dirty_limit_period) {
-        dest->x_vcpu_dirty_limit_period =
-            params->x_vcpu_dirty_limit_period;
-    }
-    if (params->has_vcpu_dirty_limit) {
-        dest->vcpu_dirty_limit = params->vcpu_dirty_limit;
-    }
-}
-
-static void migrate_params_apply(MigrateSetParameters *params, Error **errp)
+static void migrate_params_apply(MigrationParameters *params, Error **errp)
 {
     MigrationState *s = migrate_get_current();
 
@@ -1318,20 +1212,17 @@ static void migrate_params_apply(MigrateSetParameters *params, Error **errp)
 
     if (params->tls_creds) {
         g_free(s->parameters.tls_creds);
-        assert(params->tls_creds->type == QTYPE_QSTRING);
-        s->parameters.tls_creds = g_strdup(params->tls_creds->u.s);
+        s->parameters.tls_creds = g_strdup(params->tls_creds);
     }
 
     if (params->tls_hostname) {
         g_free(s->parameters.tls_hostname);
-        assert(params->tls_hostname->type == QTYPE_QSTRING);
-        s->parameters.tls_hostname = g_strdup(params->tls_hostname->u.s);
+        s->parameters.tls_hostname = g_strdup(params->tls_hostname);
     }
 
     if (params->tls_authz) {
         g_free(s->parameters.tls_authz);
-        assert(params->tls_authz->type == QTYPE_QSTRING);
-        s->parameters.tls_authz = g_strdup(params->tls_authz->u.s);
+        s->parameters.tls_authz = g_strdup(params->tls_authz);
     }
 
     if (params->has_max_bandwidth) {
@@ -1404,28 +1295,9 @@ static void migrate_params_apply(MigrateSetParameters *params, Error **errp)
     }
 }
 
-void qmp_migrate_set_parameters(MigrateSetParameters *params, Error **errp)
+void qmp_migrate_set_parameters(MigrationParameters *params, Error **errp)
 {
-    MigrationParameters tmp;
-
-    /* TODO Rewrite "" to null instead */
-    if (params->tls_creds
-        && params->tls_creds->type == QTYPE_QNULL) {
-        qobject_unref(params->tls_creds->u.n);
-        params->tls_creds->type = QTYPE_QSTRING;
-        params->tls_creds->u.s = strdup("");
-    }
-    /* TODO Rewrite "" to null instead */
-    if (params->tls_hostname
-        && params->tls_hostname->type == QTYPE_QNULL) {
-        qobject_unref(params->tls_hostname->u.n);
-        params->tls_hostname->type = QTYPE_QSTRING;
-        params->tls_hostname->u.s = strdup("");
-    }
-
-    migrate_params_test_apply(params, &tmp);
-
-    if (!migrate_params_check(&tmp, errp)) {
+    if (!migrate_params_check(params, errp)) {
         /* Invalid parameter */
         return;
     }
-- 
2.41.0

===8<===

From 4d0661be6be85631f64c6bcb8cec9f30a49bc805 Mon Sep 17 00:00:00 2001
From: Peter Xu <peterx@redhat.com>
Date: Fri, 4 Aug 2023 12:32:21 -0400
Subject: [PATCH 2/2] migration/qapi: Drop @MigrationParameter enum

Drop the enum in qapi because it is never used in QMP APIs.  Instead, make
it an internal definition for QEMU so that we can decouple it from QAPI,
and also deduplicate the QAPI documentation.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 qapi/migration.json            | 179 ---------------------------------
 migration/options.h            |  47 +++++++++
 migration/migration-hmp-cmds.c |   3 +-
 migration/options.c            |  51 ++++++++++
 4 files changed, 100 insertions(+), 180 deletions(-)

diff --git a/qapi/migration.json b/qapi/migration.json
index 0416da65b5..4846b2a98e 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -672,185 +672,6 @@
       'bitmaps': [ 'BitmapMigrationBitmapAlias' ]
   } }
 
-##
-# @MigrationParameter:
-#
-# Migration parameters enumeration
-#
-# @announce-initial: Initial delay (in milliseconds) before sending
-#     the first announce (Since 4.0)
-#
-# @announce-max: Maximum delay (in milliseconds) between packets in
-#     the announcement (Since 4.0)
-#
-# @announce-rounds: Number of self-announce packets sent after
-#     migration (Since 4.0)
-#
-# @announce-step: Increase in delay (in milliseconds) between
-#     subsequent packets in the announcement (Since 4.0)
-#
-# @compress-level: Set the compression level to be used in live
-#     migration, the compression level is an integer between 0 and 9,
-#     where 0 means no compression, 1 means the best compression
-#     speed, and 9 means best compression ratio which will consume
-#     more CPU.
-#
-# @compress-threads: Set compression thread count to be used in live
-#     migration, the compression thread count is an integer between 1
-#     and 255.
-#
-# @compress-wait-thread: Controls behavior when all compression
-#     threads are currently busy.  If true (default), wait for a free
-#     compression thread to become available; otherwise, send the page
-#     uncompressed.  (Since 3.1)
-#
-# @decompress-threads: Set decompression thread count to be used in
-#     live migration, the decompression thread count is an integer
-#     between 1 and 255. Usually, decompression is at least 4 times as
-#     fast as compression, so set the decompress-threads to the number
-#     about 1/4 of compress-threads is adequate.
-#
-# @throttle-trigger-threshold: The ratio of bytes_dirty_period and
-#     bytes_xfer_period to trigger throttling.  It is expressed as
-#     percentage.  The default value is 50. (Since 5.0)
-#
-# @cpu-throttle-initial: Initial percentage of time guest cpus are
-#     throttled when migration auto-converge is activated.  The
-#     default value is 20. (Since 2.7)
-#
-# @cpu-throttle-increment: throttle percentage increase each time
-#     auto-converge detects that migration is not making progress.
-#     The default value is 10. (Since 2.7)
-#
-# @cpu-throttle-tailslow: Make CPU throttling slower at tail stage At
-#     the tail stage of throttling, the Guest is very sensitive to CPU
-#     percentage while the @cpu-throttle -increment is excessive
-#     usually at tail stage.  If this parameter is true, we will
-#     compute the ideal CPU percentage used by the Guest, which may
-#     exactly make the dirty rate match the dirty rate threshold.
-#     Then we will choose a smaller throttle increment between the one
-#     specified by @cpu-throttle-increment and the one generated by
-#     ideal CPU percentage.  Therefore, it is compatible to
-#     traditional throttling, meanwhile the throttle increment won't
-#     be excessive at tail stage.  The default value is false.  (Since
-#     5.1)
-#
-# @tls-creds: ID of the 'tls-creds' object that provides credentials
-#     for establishing a TLS connection over the migration data
-#     channel.  On the outgoing side of the migration, the credentials
-#     must be for a 'client' endpoint, while for the incoming side the
-#     credentials must be for a 'server' endpoint.  Setting this will
-#     enable TLS for all migrations.  The default is unset, resulting
-#     in unsecured migration at the QEMU level.  (Since 2.7)
-#
-# @tls-hostname: hostname of the target host for the migration.  This
-#     is required when using x509 based TLS credentials and the
-#     migration URI does not already include a hostname.  For example
-#     if using fd: or exec: based migration, the hostname must be
-#     provided so that the server's x509 certificate identity can be
-#     validated.  (Since 2.7)
-#
-# @tls-authz: ID of the 'authz' object subclass that provides access
-#     control checking of the TLS x509 certificate distinguished name.
-#     This object is only resolved at time of use, so can be deleted
-#     and recreated on the fly while the migration server is active.
-#     If missing, it will default to denying access (Since 4.0)
-#
-# @max-bandwidth: to set maximum speed for migration.  maximum speed
-#     in bytes per second.  (Since 2.8)
-#
-# @downtime-limit: set maximum tolerated downtime for migration.
-#     maximum downtime in milliseconds (Since 2.8)
-#
-# @x-checkpoint-delay: The delay time (in ms) between two COLO
-#     checkpoints in periodic mode.  (Since 2.8)
-#
-# @block-incremental: Affects how much storage is migrated when the
-#     block migration capability is enabled.  When false, the entire
-#     storage backing chain is migrated into a flattened image at the
-#     destination; when true, only the active qcow2 layer is migrated
-#     and the destination must already have access to the same backing
-#     chain as was used on the source.  (since 2.10)
-#
-# @multifd-channels: Number of channels used to migrate data in
-#     parallel.  This is the same number that the number of sockets
-#     used for migration.  The default value is 2 (since 4.0)
-#
-# @xbzrle-cache-size: cache size to be used by XBZRLE migration.  It
-#     needs to be a multiple of the target page size and a power of 2
-#     (Since 2.11)
-#
-# @max-postcopy-bandwidth: Background transfer bandwidth during
-#     postcopy.  Defaults to 0 (unlimited).  In bytes per second.
-#     (Since 3.0)
-#
-# @max-cpu-throttle: maximum cpu throttle percentage.  Defaults to 99.
-#     (Since 3.1)
-#
-# @multifd-compression: Which compression method to use.  Defaults to
-#     none.  (Since 5.0)
-#
-# @multifd-zlib-level: Set the compression level to be used in live
-#     migration, the compression level is an integer between 0 and 9,
-#     where 0 means no compression, 1 means the best compression
-#     speed, and 9 means best compression ratio which will consume
-#     more CPU. Defaults to 1. (Since 5.0)
-#
-# @multifd-zstd-level: Set the compression level to be used in live
-#     migration, the compression level is an integer between 0 and 20,
-#     where 0 means no compression, 1 means the best compression
-#     speed, and 20 means best compression ratio which will consume
-#     more CPU. Defaults to 1. (Since 5.0)
-#
-# @block-bitmap-mapping: Maps block nodes and bitmaps on them to
-#     aliases for the purpose of dirty bitmap migration.  Such aliases
-#     may for example be the corresponding names on the opposite site.
-#     The mapping must be one-to-one, but not necessarily complete: On
-#     the source, unmapped bitmaps and all bitmaps on unmapped nodes
-#     will be ignored.  On the destination, encountering an unmapped
-#     alias in the incoming migration stream will result in a report,
-#     and all further bitmap migration data will then be discarded.
-#     Note that the destination does not know about bitmaps it does
-#     not receive, so there is no limitation or requirement regarding
-#     the number of bitmaps received, or how they are named, or on
-#     which nodes they are placed.  By default (when this parameter
-#     has never been set), bitmap names are mapped to themselves.
-#     Nodes are mapped to their block device name if there is one, and
-#     to their node name otherwise.  (Since 5.2)
-#
-# @x-vcpu-dirty-limit-period: Periodic time (in milliseconds) of dirty
-#     limit during live migration.  Should be in the range 1 to 1000ms.
-#     Defaults to 1000ms.  (Since 8.1)
-#
-# @vcpu-dirty-limit: Dirtyrate limit (MB/s) during live migration.
-#     Defaults to 1.  (Since 8.1)
-#
-# Features:
-#
-# @unstable: Members @x-checkpoint-delay and @x-vcpu-dirty-limit-period
-#     are experimental.
-#
-# Since: 2.4
-##
-{ 'enum': 'MigrationParameter',
-  'data': ['announce-initial', 'announce-max',
-           'announce-rounds', 'announce-step',
-           'compress-level', 'compress-threads', 'decompress-threads',
-           'compress-wait-thread', 'throttle-trigger-threshold',
-           'cpu-throttle-initial', 'cpu-throttle-increment',
-           'cpu-throttle-tailslow',
-           'tls-creds', 'tls-hostname', 'tls-authz', 'max-bandwidth',
-           'downtime-limit',
-           { 'name': 'x-checkpoint-delay', 'features': [ 'unstable' ] },
-           'block-incremental',
-           'multifd-channels',
-           'xbzrle-cache-size', 'max-postcopy-bandwidth',
-           'max-cpu-throttle', 'multifd-compression',
-           'multifd-zlib-level', 'multifd-zstd-level',
-           'block-bitmap-mapping',
-           { 'name': 'x-vcpu-dirty-limit-period', 'features': ['unstable'] },
-           'vcpu-dirty-limit'] }
-
 ##
 # @migrate-set-parameters:
 #
diff --git a/migration/options.h b/migration/options.h
index 045e2a41a2..b1b3a26604 100644
--- a/migration/options.h
+++ b/migration/options.h
@@ -65,6 +65,53 @@ bool migrate_cap_set(int cap, bool value, Error **errp);
 
 /* parameters */
 
+typedef enum {
+    MIGRATION_PARAMETER_ANNOUNCE_INITIAL,
+    MIGRATION_PARAMETER_ANNOUNCE_MAX,
+    MIGRATION_PARAMETER_ANNOUNCE_ROUNDS,
+    MIGRATION_PARAMETER_ANNOUNCE_STEP,
+    MIGRATION_PARAMETER_COMPRESS_LEVEL,
+    MIGRATION_PARAMETER_COMPRESS_THREADS,
+    MIGRATION_PARAMETER_DECOMPRESS_THREADS,
+    MIGRATION_PARAMETER_COMPRESS_WAIT_THREAD,
+    MIGRATION_PARAMETER_THROTTLE_TRIGGER_THRESHOLD,
+    MIGRATION_PARAMETER_CPU_THROTTLE_INITIAL,
+    MIGRATION_PARAMETER_CPU_THROTTLE_INCREMENT,
+    MIGRATION_PARAMETER_CPU_THROTTLE_TAILSLOW,
+    MIGRATION_PARAMETER_TLS_CREDS,
+    MIGRATION_PARAMETER_TLS_HOSTNAME,
+    MIGRATION_PARAMETER_TLS_AUTHZ,
+    MIGRATION_PARAMETER_MAX_BANDWIDTH,
+    MIGRATION_PARAMETER_DOWNTIME_LIMIT,
+    MIGRATION_PARAMETER_X_CHECKPOINT_DELAY,
+    MIGRATION_PARAMETER_BLOCK_INCREMENTAL,
+    MIGRATION_PARAMETER_MULTIFD_CHANNELS,
+    MIGRATION_PARAMETER_XBZRLE_CACHE_SIZE,
+    MIGRATION_PARAMETER_MAX_POSTCOPY_BANDWIDTH,
+    MIGRATION_PARAMETER_MAX_CPU_THROTTLE,
+    MIGRATION_PARAMETER_MULTIFD_COMPRESSION,
+    MIGRATION_PARAMETER_MULTIFD_ZLIB_LEVEL,
+    MIGRATION_PARAMETER_MULTIFD_ZSTD_LEVEL,
+    MIGRATION_PARAMETER_BLOCK_BITMAP_MAPPING,
+    MIGRATION_PARAMETER_X_VCPU_DIRTY_LIMIT_PERIOD,
+    MIGRATION_PARAMETER_VCPU_DIRTY_LIMIT,
+    MIGRATION_PARAMETER__MAX,
+} MigrationParameter;
+
+extern const char *MigrationParameter_string[MIGRATION_PARAMETER__MAX];
+#define  MigrationParameter_str(p)  MigrationParameter_string[p]
+
+/**
+ * @MigrationParameter_from_str(): Parse string into a MigrationParameter
+ *
+ * @param: input string
+ * @errp: error message if failed to parse the string
+ *
+ * Returns the MigrationParameter enum value (>= 0) on success, or a
+ * negative value on failure, in which case @errp is always set.
+ */
+int MigrationParameter_from_str(const char *param, Error **errp);
+
 const BitmapMigrationNodeAliasList *migrate_block_bitmap_mapping(void);
 bool migrate_has_block_bitmap_mapping(void);
 
diff --git a/migration/migration-hmp-cmds.c b/migration/migration-hmp-cmds.c
index a64672f640..68c68079c2 100644
--- a/migration/migration-hmp-cmds.c
+++ b/migration/migration-hmp-cmds.c
@@ -31,6 +31,7 @@
 #include "ui/qemu-spice.h"
 #include "sysemu/sysemu.h"
 #include "migration.h"
+#include "migration/options.h"
 
 static void migration_global_dump(Monitor *mon)
 {
@@ -503,7 +504,7 @@ void hmp_migrate_set_parameter(Monitor *mon, const QDict *qdict)
     Error *err = NULL;
     int val, ret;
 
-    val = qapi_enum_parse(&MigrationParameter_lookup, param, -1, &err);
+    val = MigrationParameter_from_str(param, &err);
     if (val < 0) {
         goto cleanup;
     }
diff --git a/migration/options.c b/migration/options.c
index 7967c572fc..0e661bc251 100644
--- a/migration/options.c
+++ b/migration/options.c
@@ -84,6 +84,57 @@
 #define DEFAULT_MIGRATE_VCPU_DIRTY_LIMIT_PERIOD     1000    /* milliseconds */
 #define DEFAULT_MIGRATE_VCPU_DIRTY_LIMIT            1       /* MB/s */
 
+const char *MigrationParameter_string[MIGRATION_PARAMETER__MAX] = {
+    [MIGRATION_PARAMETER_ANNOUNCE_INITIAL] = "announce-initial",
+    [MIGRATION_PARAMETER_ANNOUNCE_MAX] = "announce-max",
+    [MIGRATION_PARAMETER_ANNOUNCE_ROUNDS] = "announce-rounds",
+    [MIGRATION_PARAMETER_ANNOUNCE_STEP] = "announce-step",
+    [MIGRATION_PARAMETER_COMPRESS_LEVEL] = "compress-level",
+    [MIGRATION_PARAMETER_COMPRESS_THREADS] = "compress-threads",
+    [MIGRATION_PARAMETER_DECOMPRESS_THREADS] = "decompress-threads",
+    [MIGRATION_PARAMETER_COMPRESS_WAIT_THREAD] = "compress-wait-thread",
+    [MIGRATION_PARAMETER_THROTTLE_TRIGGER_THRESHOLD] = "throttle-trigger-threshold",
+    [MIGRATION_PARAMETER_CPU_THROTTLE_INITIAL] = "cpu-throttle-initial",
+    [MIGRATION_PARAMETER_CPU_THROTTLE_INCREMENT] = "cpu-throttle-increment",
+    [MIGRATION_PARAMETER_CPU_THROTTLE_TAILSLOW] = "cpu-throttle-tailslow",
+    [MIGRATION_PARAMETER_TLS_CREDS] = "tls-creds",
+    [MIGRATION_PARAMETER_TLS_HOSTNAME] = "tls-hostname",
+    [MIGRATION_PARAMETER_TLS_AUTHZ] = "tls-authz",
+    [MIGRATION_PARAMETER_MAX_BANDWIDTH] = "max-bandwidth",
+    [MIGRATION_PARAMETER_DOWNTIME_LIMIT] = "downtime-limit",
+    [MIGRATION_PARAMETER_X_CHECKPOINT_DELAY] = "x-checkpoint-delay",
+    [MIGRATION_PARAMETER_BLOCK_INCREMENTAL] = "block-incremental",
+    [MIGRATION_PARAMETER_MULTIFD_CHANNELS] = "multifd-channels",
+    [MIGRATION_PARAMETER_XBZRLE_CACHE_SIZE] = "xbzrle-cache-size",
+    [MIGRATION_PARAMETER_MAX_POSTCOPY_BANDWIDTH] = "max-postcopy-bandwidth",
+    [MIGRATION_PARAMETER_MAX_CPU_THROTTLE] = "max-cpu-throttle",
+    [MIGRATION_PARAMETER_MULTIFD_COMPRESSION] = "multifd-compression",
+    [MIGRATION_PARAMETER_MULTIFD_ZLIB_LEVEL] = "multifd-zlib-level",
+    [MIGRATION_PARAMETER_MULTIFD_ZSTD_LEVEL] = "multifd-zstd-level",
+    [MIGRATION_PARAMETER_BLOCK_BITMAP_MAPPING] = "block-bitmap-mapping",
+    [MIGRATION_PARAMETER_X_VCPU_DIRTY_LIMIT_PERIOD] = "x-vcpu-dirty-limit-period",
+    [MIGRATION_PARAMETER_VCPU_DIRTY_LIMIT] = "vcpu-dirty-limit",
+};
+
+int MigrationParameter_from_str(const char *param, Error **errp)
+{
+    int i;
+
+    if (!param) {
+        error_setg(errp, "Missing parameter value");
+        return -1;
+    }
+
+    for (i = 0; i < MIGRATION_PARAMETER__MAX; i++) {
+        if (!strcmp(param, MigrationParameter_string[i])) {
+            return i;
+        }
+    }
+
+    error_setg(errp, "Invalid parameter value: %s", param);
+    return -1;
+}
+
 Property migration_properties[] = {
     DEFINE_PROP_BOOL("store-global-state", MigrationState,
                      store_global_state, true),
-- 
2.41.0

===8<===
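
For reference, the string-table lookup that the patch above open-codes can
be tried outside QEMU.  What follows is only a minimal sketch under stated
assumptions: DemoParam, demo_param_string, demo_param_check_table() and
demo_param_from_str() are hypothetical names, the enum is cut down to three
entries, and the Error **errp reporting of the real helper is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical three-entry enum; the patch lists every migration parameter. */
typedef enum {
    DEMO_PARAM_MAX_BANDWIDTH,
    DEMO_PARAM_DOWNTIME_LIMIT,
    DEMO_PARAM_TLS_AUTHZ,
    DEMO_PARAM__MAX,
} DemoParam;

/* Designated initializers keep each name next to its enum value. */
static const char *const demo_param_string[DEMO_PARAM__MAX] = {
    [DEMO_PARAM_MAX_BANDWIDTH] = "max-bandwidth",
    [DEMO_PARAM_DOWNTIME_LIMIT] = "downtime-limit",
    [DEMO_PARAM_TLS_AUTHZ] = "tls-authz",
};

/* Sanity check: an enum value added without a name leaves a NULL hole. */
static void demo_param_check_table(void)
{
    for (int i = 0; i < DEMO_PARAM__MAX; i++) {
        assert(demo_param_string[i] != NULL);
    }
}

/* Returns the enum value (>= 0) on a match, or -1 for NULL or unknown names. */
static int demo_param_from_str(const char *name)
{
    if (!name) {
        return -1;
    }
    for (int i = 0; i < DEMO_PARAM__MAX; i++) {
        /* The NULL test guards strcmp() against a hole in the table. */
        if (demo_param_string[i] && !strcmp(name, demo_param_string[i])) {
            return i;
        }
    }
    return -1;
}
```

The NULL test inside the loop is an addition of this sketch; the patch
relies on every table entry being populated, which holds as long as the
enum and the table are updated together.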

-- 
Peter Xu




* Re: [PATCH for-8.2 v2 1/2] qapi/migration: Deduplicate migration parameter field comments
  2023-08-04 21:02               ` Peter Xu
@ 2023-08-05  8:12                 ` Markus Armbruster
  2023-08-06 15:49                   ` Peter Xu
  0 siblings, 1 reply; 25+ messages in thread
From: Markus Armbruster @ 2023-08-05  8:12 UTC (permalink / raw)
  To: Peter Xu
  Cc: Daniel P. Berrangé,
	Markus Armbruster, qemu-devel, Zhiyi Guo,
	Leonardo Bras Soares Passos, Fabiano Rosas, Juan Quintela,
	Eric Blake, Chensheng Dong

Peter Xu <peterx@redhat.com> writes:

> On Fri, Aug 04, 2023 at 05:48:49PM +0100, Daniel P. Berrangé wrote:
>> On Fri, Aug 04, 2023 at 12:46:18PM -0400, Peter Xu wrote:
>> > On Fri, Aug 04, 2023 at 05:29:19PM +0100, Daniel P. Berrangé wrote:
>> > > On Fri, Aug 04, 2023 at 12:01:54PM -0400, Peter Xu wrote:
>> > > > On Fri, Aug 04, 2023 at 02:59:07PM +0100, Daniel P. Berrangé wrote:
>> > > > > On Fri, Aug 04, 2023 at 02:28:05PM +0200, Markus Armbruster wrote:
>> > > > > > Peter Xu <peterx@redhat.com> writes:
>> > > > > > 
>> > > > > > > We used to have three objects that have always the same list of parameters
>> > > > > > 
>> > > > > > We have!
>> > > > > > 
>> > > > > > > and comments are always duplicated:
>> > > > > > >
>> > > > > > >   - @MigrationParameter
>> > > > > > >   - @MigrationParameters
>> > > > > > >   - @MigrateSetParameters
>> > > > > > >
>> > > > > > > Before we can deduplicate the code, it's fairly straightforward to
>> > > > > > > deduplicate the comments first, so for each time we add a new migration
>> > > > > > > parameter we don't need to copy the same paragraphs three times.
>> > > > > > 
>> > > > > > De-duplicating the code would be nice, but we haven't done so in years,
>> > > > > > which suggests it's hard enough not to be worth the trouble.
>> > > > > 
>> > > > > The "MigrationParameter" enumeration isn't actually used in
>> > > > > QMP at all.
>> > > > > 
>> > > > > It is only used in HMP for hmp_migrate_set_parameter and
>> > > > > hmp_info_migrate_parameters. So it is questionable documenting
>> > > > > that enum in the QMP reference docs at all.
>> > > > > 
>> > > > > 1c1
>> > > > > < { 'struct': 'MigrationParameters',
>> > > > > ---
>> > > > > > { 'struct': 'MigrateSetParameters',
>> > > > > 14,16c14,16
>> > > > > <             '*tls-creds': 'str',
>> > > > > <             '*tls-hostname': 'str',
>> > > > > <             '*tls-authz': 'str',
>> > > > > ---
>> > > > > >             '*tls-creds': 'StrOrNull',
>> > > > > >             '*tls-hostname': 'StrOrNull',
>> > > > > >             '*tls-authz': 'StrOrNull',
>> > > > > 
>> > > > > Is it not valid to use StrOrNull in both cases and thus
>> > > > > delete the duplication here ?
>> > > > 
>> > > > I tested removing MigrateSetParameters by replacing it with
>> > > > MigrationParameters and it looks all fine here... I manually tested qmp/hmp
>> > > > on set/query parameters, and qtests are all happy.
>> > > 
>> > > I meant the other way around, such we would be using 'StrOrNull'
>> > > in all scenarios.
>> > 
>> > Yes, that should also work and even without worrying on nulls.  I just took
>> > a random one replacing the other.
>> > 
>> > > 
>> > > > 
>> > > > The only thing I see that may affect it is we used to logically allow
>> > > > taking things like '"tls-authz": null' in the json input, but now we won't
>> > > > allow that because we'll be asking for a string type only.
>> > > > 
>> > > > Since we have query-qmp-schema I suppose we're all fine, because logically
>> > > > the mgmt app (libvirt?) will still query that to understand the protocol,
>> > > > so now we'll have (response of query-qmp-schema):
>> > > > 
>> > > >         {
>> > > >             "arg-type": "144",
>> > > >             "meta-type": "command",
>> > > >             "name": "migrate-set-parameters",
>> > > >             "ret-type": "0"
>> > > >         },
>> > > > 
>> > > > Where 144 can start to point to MigrationParameters, rather than
>> > > > MigrateSetParameters.
>> > > > 
>> > > > Ok, then what if the mgmt app doesn't care and just used "null" in tls-*
>> > > > fields when setting?  Funnily I tried it and actually anything that does
>> > > > migrate-set-parameters with a "null" passed over to tls-* fields will
>> > > > already crash qemu...
>> > > > 
>> > > > ./migration/options.c:1333: migrate_params_apply: Assertion `params->tls_authz->type == QTYPE_QSTRING' failed.
>> > > > 
>> > > > #0  0x00007f72f4b2a844 in __pthread_kill_implementation () at /lib64/libc.so.6
>> > > > #1  0x00007f72f4ad9abe in raise () at /lib64/libc.so.6
>> > > > #2  0x00007f72f4ac287f in abort () at /lib64/libc.so.6
>> > > > #3  0x00007f72f4ac279b in _nl_load_domain.cold () at /lib64/libc.so.6
>> > > > #4  0x00007f72f4ad2147 in  () at /lib64/libc.so.6
>> > > > #5  0x00005573308740e6 in migrate_params_apply (params=0x7ffc74fd09d0, errp=0x7ffc74fd0998) at ../migration/options.c:1333
>> > > > #6  0x0000557330874591 in qmp_migrate_set_parameters (params=0x7ffc74fd09d0, errp=0x7ffc74fd0998) at ../migration/options.c:1433
>> > > > #7  0x0000557330cb9132 in qmp_marshal_migrate_set_parameters (args=0x7f72e00036d0, ret=0x7f72f133cd98, errp=0x7f72f133cd90) at qapi/qapi-commands-migration.c:214
>> > > > #8  0x0000557330d07fab in do_qmp_dispatch_bh (opaque=0x7f72f133ce30) at ../qapi/qmp-dispatch.c:128
>> > > > #9  0x0000557330d33bbb in aio_bh_call (bh=0x5573337d7920) at ../util/async.c:169
>> > > > #10 0x0000557330d33cd8 in aio_bh_poll (ctx=0x55733356e7d0) at ../util/async.c:216
>> > > > #11 0x0000557330d17a19 in aio_dispatch (ctx=0x55733356e7d0) at ../util/aio-posix.c:423
>> > > > #12 0x0000557330d34117 in aio_ctx_dispatch (source=0x55733356e7d0, callback=0x0, user_data=0x0) at ../util/async.c:358
>> > > > #13 0x00007f72f5a8848c in g_main_context_dispatch () at /lib64/libglib-2.0.so.0
>> > > > #14 0x0000557330d358d4 in glib_pollfds_poll () at ../util/main-loop.c:290
>> > > > #15 0x0000557330d35951 in os_host_main_loop_wait (timeout=0) at ../util/main-loop.c:313
>> > > > #16 0x0000557330d35a5f in main_loop_wait (nonblocking=0) at ../util/main-loop.c:592
>> > > > #17 0x000055733083aee0 in qemu_main_loop () at ../softmmu/runstate.c:732
>> > > > #18 0x0000557330b0921b in qemu_default_main () at ../softmmu/main.c:37
>> > > > #19 0x0000557330b09251 in main (argc=35, argv=0x7ffc74fd0ec8) at ../softmmu/main.c:48
>> > > > 
>> > > > Then I suppose it means all mgmt apps are not using "null" anyway, and it
>> > > > makes more sense to me to just remove MigrateSetParameters (by replacing it
>> > > > with MigrationParameters).
>> > > 
>> > > It shouldn't be crashing,  because qmp_migrate_set_parameters()
>> > > is turning 'null' into  "", which means the assert ought to
>> > > never fire. Did you have a local modification that caused
>> > > this crash perhaps ?
>> > 
>> > I think it just got overlooked when introducing tls-authz to not have added
>> > that special code in qmp_migrate_set_parameters(), the other two are fine.
>> 
>> Oh right yes, pre-existing bug.
>
> So do we really care about "null" in any form over "" (empty str) here for
> tls-* parameters?

In my opinion, the use of "" was a design mistake.  Here's my argument:

commit 01fa55982692fb51a16049b63b571651a1053989
Author: Markus Armbruster <armbru@redhat.com>
Date:   Tue Jul 18 14:42:04 2017 +0200

    migration: Use JSON null instead of "" to reset parameter to default
    
    migrate-set-parameters sets migration parameters according to its
    arguments like this:
    
    * Present means "set the parameter to this value"
    
    * Absent means "leave the parameter unchanged"
    
    * Except for parameters tls_creds and tls_hostname, "" means "reset
      the parameter to its default value"
    
    The first two are perfectly normal: presence of the parameter makes
    the command do something.
    
    The third one overloads the parameter with a second meaning.  The
    overloading is *implicit*, i.e. it's not visible in the types.  Works
    here, because "" is neither a valid TLS credentials ID, nor a valid
    host name.
    
    Pressing argument values the schema accepts, but are semantically
    invalid, into service to mean "reset to default" is not general, as
    suitable invalid values need not exist.  I also find it ugly.
    
    To clean this up, we could add a separate flag argument to ask for
    "reset to default", or add a distinct value to @tls_creds and
    @tls_hostname.  This commit implements the latter: add JSON null to
    the values of @tls_creds and @tls_hostname, deprecate "".
    
    Because we're so close to the 2.10 freeze, implement it in the
    stupidest way possible: have qmp_migrate_set_parameters() rewrite null
    to "" before anything else can see the null.  The proper way to do it
    would be rewriting "" to null, but that requires fixing up code to
    work with null.  Add TODO comments for that.
    
    Signed-off-by: Markus Armbruster <armbru@redhat.com>
    Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
    Reviewed-by: Eric Blake <eblake@redhat.com>
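
To make the three cases from that commit message concrete, here is a
minimal standalone sketch in C.  It is an illustration under assumptions,
not QEMU code: DemoStrOrNull, demo_normalize() and demo_apply() are
hypothetical stand-ins for the generated StrOrNull type and for the
null-to-"" rewrite that the commit message places in
qmp_migrate_set_parameters().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for StrOrNull: either a string or JSON null. */
typedef enum { DEMO_QNULL, DEMO_QSTRING } DemoKind;

typedef struct {
    DemoKind kind;
    const char *s;          /* only meaningful when kind == DEMO_QSTRING */
} DemoStrOrNull;

/* Rewrite null to "" up front, so later code only ever sees strings. */
static void demo_normalize(DemoStrOrNull *v)
{
    if (v->kind == DEMO_QNULL) {
        v->kind = DEMO_QSTRING;
        v->s = "";
    }
}

/*
 * Apply one optional argument to the current value:
 *   arg == NULL     -> absent, leave the parameter unchanged
 *   arg is a string -> set the parameter to that string
 *   arg is null     -> reset to the default, represented here as ""
 */
static const char *demo_apply(const char *current, DemoStrOrNull *arg)
{
    if (!arg) {
        return current;
    }
    demo_normalize(arg);
    /* Skipping demo_normalize() for a field would trip this assertion. */
    assert(arg->kind == DEMO_QSTRING);
    return arg->s;
}
```

The assertion in demo_apply() mirrors the shape of the check that fired in
the backtrace quoted above: a field that misses the rewrite step reaches
the apply stage still holding null.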

> To fix this tls-authz bug we can add one more QTYPE_QNULL to QTYPE_QSTRING
> conversion, but I'd rather just use "str" for all tls* fields and remove
> the other two instead, if "null" is not important to anyone.

"Important" sounds too much like absolutes :)

I think we have a tradeoff here.  If perpetuating the unclean and ugly
use of "" is what it takes to de-triplicate migration parameters, we may
decide to accept that.

> In all cases, I've appended with the two patches I'm currently testing
> with.  It should also fix the tls-authz crash over 'null' by just rejecting
> that.  But I'm open to anything - the patch (more than RFC) is more for
> reference of whether we can drop the two objects in qapi/migration.
>
> Thanks,




* Re: [PATCH for-8.2 v2 1/2] qapi/migration: Deduplicate migration parameter field comments
  2023-08-05  8:12                 ` Markus Armbruster
@ 2023-08-06 15:49                   ` Peter Xu
  2023-08-08 20:03                     ` Peter Xu
  0 siblings, 1 reply; 25+ messages in thread
From: Peter Xu @ 2023-08-06 15:49 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: Daniel P. Berrangé,
	qemu-devel, Zhiyi Guo, Leonardo Bras Soares Passos,
	Fabiano Rosas, Juan Quintela, Eric Blake, Chensheng Dong

On Sat, Aug 05, 2023 at 10:12:00AM +0200, Markus Armbruster wrote:
> Peter Xu <peterx@redhat.com> writes:
> 
> > On Fri, Aug 04, 2023 at 05:48:49PM +0100, Daniel P. Berrangé wrote:
> >> On Fri, Aug 04, 2023 at 12:46:18PM -0400, Peter Xu wrote:
> >> > On Fri, Aug 04, 2023 at 05:29:19PM +0100, Daniel P. Berrangé wrote:
> >> > > On Fri, Aug 04, 2023 at 12:01:54PM -0400, Peter Xu wrote:
> >> > > > On Fri, Aug 04, 2023 at 02:59:07PM +0100, Daniel P. Berrangé wrote:
> >> > > > > On Fri, Aug 04, 2023 at 02:28:05PM +0200, Markus Armbruster wrote:
> >> > > > > > Peter Xu <peterx@redhat.com> writes:
> >> > > > > > 
> >> > > > > > > We used to have three objects that have always the same list of parameters
> >> > > > > > 
> >> > > > > > We have!
> >> > > > > > 
> >> > > > > > > and comments are always duplicated:
> >> > > > > > >
> >> > > > > > >   - @MigrationParameter
> >> > > > > > >   - @MigrationParameters
> >> > > > > > >   - @MigrateSetParameters
> >> > > > > > >
> >> > > > > > > Before we can deduplicate the code, it's fairly straightforward to
> >> > > > > > > deduplicate the comments first, so for each time we add a new migration
> >> > > > > > > parameter we don't need to copy the same paragraphs three times.
> >> > > > > > 
> >> > > > > > De-duplicating the code would be nice, but we haven't done so in years,
> >> > > > > > which suggests it's hard enough not to be worth the trouble.
> >> > > > > 
> >> > > > > The "MigrationParameter" enumeration isn't actually used in
> >> > > > > QMP at all.
> >> > > > > 
> >> > > > > It is only used in HMP for hmp_migrate_set_parameter and
> >> > > > > hmp_info_migrate_parameters. So it is questionable documenting
> >> > > > > that enum in the QMP reference docs at all.
> >> > > > > 
> >> > > > > 1c1
> >> > > > > < { 'struct': 'MigrationParameters',
> >> > > > > ---
> >> > > > > > { 'struct': 'MigrateSetParameters',
> >> > > > > 14,16c14,16
> >> > > > > <             '*tls-creds': 'str',
> >> > > > > <             '*tls-hostname': 'str',
> >> > > > > <             '*tls-authz': 'str',
> >> > > > > ---
> >> > > > > >             '*tls-creds': 'StrOrNull',
> >> > > > > >             '*tls-hostname': 'StrOrNull',
> >> > > > > >             '*tls-authz': 'StrOrNull',
> >> > > > > 
> >> > > > > Is it not valid to use StrOrNull in both cases and thus
> >> > > > > delete the duplication here ?
> >> > > > 
> >> > > > I tested removing MigrateSetParameters by replacing it with
> >> > > > MigrationParameters and it looks all fine here... I manually tested qmp/hmp
> >> > > > on set/query parameters, and qtests are all happy.
> >> > > 
> >> > > I meant the other way around, such we would be using 'StrOrNull'
> >> > > in all scenarios.
> >> > 
> >> > Yes, that should also work and even without worrying on nulls.  I just took
> >> > a random one replacing the other.
> >> > 
> >> > > 
> >> > > > 
> >> > > > The only thing I see that may affect it is we used to logically allow
> >> > > > taking things like '"tls-authz": null' in the json input, but now we won't
> >> > > > allow that because we'll be asking for a string type only.
> >> > > > 
> >> > > > Since we have query-qmp-schema I suppose we're all fine, because logically
> >> > > > the mgmt app (libvirt?) will still query that to understand the protocol,
> >> > > > so now we'll have (response of query-qmp-schema):
> >> > > > 
> >> > > >         {
> >> > > >             "arg-type": "144",
> >> > > >             "meta-type": "command",
> >> > > >             "name": "migrate-set-parameters",
> >> > > >             "ret-type": "0"
> >> > > >         },
> >> > > > 
> >> > > > Where 144 can start to point to MigrationParameters, rather than
> >> > > > MigrateSetParameters.
> >> > > > 
> >> > > > Ok, then what if the mgmt app doesn't care and just used "null" in tls-*
> >> > > > fields when setting?  Funnily I tried it and actually anything that does
> >> > > > migrate-set-parameters with a "null" passed over to tls-* fields will
> >> > > > already crash qemu...
> >> > > > 
> >> > > > ./migration/options.c:1333: migrate_params_apply: Assertion `params->tls_authz->type == QTYPE_QSTRING' failed.
> >> > > > 
> >> > > > #0  0x00007f72f4b2a844 in __pthread_kill_implementation () at /lib64/libc.so.6
> >> > > > #1  0x00007f72f4ad9abe in raise () at /lib64/libc.so.6
> >> > > > #2  0x00007f72f4ac287f in abort () at /lib64/libc.so.6
> >> > > > #3  0x00007f72f4ac279b in _nl_load_domain.cold () at /lib64/libc.so.6
> >> > > > #4  0x00007f72f4ad2147 in  () at /lib64/libc.so.6
> >> > > > #5  0x00005573308740e6 in migrate_params_apply (params=0x7ffc74fd09d0, errp=0x7ffc74fd0998) at ../migration/options.c:1333
> >> > > > #6  0x0000557330874591 in qmp_migrate_set_parameters (params=0x7ffc74fd09d0, errp=0x7ffc74fd0998) at ../migration/options.c:1433
> >> > > > #7  0x0000557330cb9132 in qmp_marshal_migrate_set_parameters (args=0x7f72e00036d0, ret=0x7f72f133cd98, errp=0x7f72f133cd90) at qapi/qapi-commands-migration.c:214
> >> > > > #8  0x0000557330d07fab in do_qmp_dispatch_bh (opaque=0x7f72f133ce30) at ../qapi/qmp-dispatch.c:128
> >> > > > #9  0x0000557330d33bbb in aio_bh_call (bh=0x5573337d7920) at ../util/async.c:169
> >> > > > #10 0x0000557330d33cd8 in aio_bh_poll (ctx=0x55733356e7d0) at ../util/async.c:216
> >> > > > #11 0x0000557330d17a19 in aio_dispatch (ctx=0x55733356e7d0) at ../util/aio-posix.c:423
> >> > > > #12 0x0000557330d34117 in aio_ctx_dispatch (source=0x55733356e7d0, callback=0x0, user_data=0x0) at ../util/async.c:358
> >> > > > #13 0x00007f72f5a8848c in g_main_context_dispatch () at /lib64/libglib-2.0.so.0
> >> > > > #14 0x0000557330d358d4 in glib_pollfds_poll () at ../util/main-loop.c:290
> >> > > > #15 0x0000557330d35951 in os_host_main_loop_wait (timeout=0) at ../util/main-loop.c:313
> >> > > > #16 0x0000557330d35a5f in main_loop_wait (nonblocking=0) at ../util/main-loop.c:592
> >> > > > #17 0x000055733083aee0 in qemu_main_loop () at ../softmmu/runstate.c:732
> >> > > > #18 0x0000557330b0921b in qemu_default_main () at ../softmmu/main.c:37
> >> > > > #19 0x0000557330b09251 in main (argc=35, argv=0x7ffc74fd0ec8) at ../softmmu/main.c:48
> >> > > > 
> >> > > > Then I suppose it means all mgmt apps are not using "null" anyway, and it
> >> > > > makes more sense to me to just remove MigrateSetParameters (by replacing it
> >> > > > with MigrationParameters).
> >> > > 
> >> > > It shouldn't be crashing,  because qmp_migrate_set_parameters()
> >> > > is turning 'null' into  "", which means the assert ought to
> >> > > never fire. Did you have a local modification that caused
> >> > > this crash perhaps ?
> >> > 
> >> > I think it just got overlooked when introducing tls-authz to not have added
> >> > that special code in qmp_migrate_set_parameters(), the other two are fine.
> >> 
> >> Oh right yes, pre-existing bug.
> >
> > So do we really care about "null" in any form over "" (empty str) here for
> > tls-* parameters?
> 
> In my opinion, the use of "" was a design mistake.  Here's my argument:
> 
> commit 01fa55982692fb51a16049b63b571651a1053989
> Author: Markus Armbruster <armbru@redhat.com>
> Date:   Tue Jul 18 14:42:04 2017 +0200
> 
>     migration: Use JSON null instead of "" to reset parameter to default
>     
>     migrate-set-parameters sets migration parameters according to its
>     arguments like this:
>     
>     * Present means "set the parameter to this value"
>     
>     * Absent means "leave the parameter unchanged"
>     
>     * Except for parameters tls_creds and tls_hostname, "" means "reset
>       the parameter to its default value"
>     
>     The first two are perfectly normal: presence of the parameter makes
>     the command do something.
>     
>     The third one overloads the parameter with a second meaning.  The
>     overloading is *implicit*, i.e. it's not visible in the types.  Works
>     here, because "" is neither a valid TLS credentials ID, nor a valid
>     host name.
>     
>     Pressing argument values the schema accepts, but are semantically
>     invalid, into service to mean "reset to default" is not general, as
>     suitable invalid values need not exist.  I also find it ugly.
>     
>     To clean this up, we could add a separate flag argument to ask for
>     "reset to default", or add a distinct value to @tls_creds and
>     @tls_hostname.  This commit implements the latter: add JSON null to
>     the values of @tls_creds and @tls_hostname, deprecate "".
>     
>     Because we're so close to the 2.10 freeze, implement it in the
>     stupidest way possible: have qmp_migrate_set_parameters() rewrite null
>     to "" before anything else can see the null.  The proper way to do it
>     would be rewriting "" to null, but that requires fixing up code to
>     work with null.  Add TODO comments for that.
>     
>     Signed-off-by: Markus Armbruster <armbru@redhat.com>
>     Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
>     Reviewed-by: Eric Blake <eblake@redhat.com>

I see.  Personally, as long as the interface is unambiguous (say, no
possible misuse of ""), I'm fine with it.  But keeping StrOrNull is
probably cleaner.

> 
> > To fix this tls-authz bug we can add one more QTYPE_QNULL to QTYPE_QSTRING
> > conversion, but I'd rather just use "str" for all tls* fields and remove
> > the other two instead, if "null" is not important to anyone.
> 
> "Important" sounds too much like absolutes :)
> 
> I think we have a tradeoff here.  If perpetuating the unclean and ugly
> use of "" is what it takes to de-triplicate migration parameters, we may
> decide to accept that.

I don't think it's a must.  As Dan raised, we can convert str -> StrOrNull
for MigrationParameters. I assume it won't affect query-migrate-parameters
anyway OTOH.

I take it that means we haven't overlooked anything obvious in the overall
idea.  Let me post the formal patchset early next week.  It'll be mostly
the patch I attached plus the extra logic for StrOrNull, so the diffstat
might be less attractive, but hopefully still good enough to be accepted.

Thanks,

-- 
Peter Xu




* Re: [PATCH for-8.2 v2 1/2] qapi/migration: Deduplicate migration parameter field comments
  2023-08-06 15:49                   ` Peter Xu
@ 2023-08-08 20:03                     ` Peter Xu
  2023-08-14 22:24                       ` Peter Xu
  0 siblings, 1 reply; 25+ messages in thread
From: Peter Xu @ 2023-08-08 20:03 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: Daniel P. Berrangé,
	qemu-devel, Zhiyi Guo, Leonardo Bras Soares Passos,
	Fabiano Rosas, Juan Quintela, Eric Blake, Chensheng Dong

On Sun, Aug 06, 2023 at 11:49:46AM -0400, Peter Xu wrote:
> > I think we have a tradeoff here.  If perpetuating the unclean and ugly
> > use of "" is what it takes to de-triplicate migration parameters, we may
> > decide to accept that.
> 
> I don't think it's a must.  As Dan raised, we can convert str -> StrOrNull
> for MigrationParameters. I assume it won't affect query-migrate-parameters
> anyway OTOH.
> 
> I assume it means there's nothing yet obvious that we overlooked on the
> whole idea.  Let me propose the formal patchset early next week.  It'll be
> mostly the patch I attached but just add those extra logics for StrOrNull,
> so the diffstat might be less attractive but hopefully still good enough to
> be accepted.

The new StrOrNull approach doesn't work with the current migration object
properties: a StrOrNull member of @MigrationParameters has to be a pointer
rather than an embedded (statically allocated) field, so offsetof() can no
longer be applied to it:

../migration/options.c:218:5: error: cannot apply ‘offsetof’ to a non constant address
  218 |     DEFINE_PROP_STRING("tls-creds", MigrationState, parameters.tls_creds->u.s),
      |     ^~~~~~~~~~~~~~~~~~
../migration/options.c:219:5: error: cannot apply ‘offsetof’ to a non constant address
  219 |     DEFINE_PROP_STRING("tls-hostname", MigrationState, parameters.tls_hostname->u.s),
      |     ^~~~~~~~~~~~~~~~~~
../migration/options.c:220:5: error: cannot apply ‘offsetof’ to a non constant address
  220 |     DEFINE_PROP_STRING("tls-authz", MigrationState, parameters.tls_authz->u.s),
      |     ^~~~~~~~~~~~~~~~~~

Is there an easy way to fix this?  I.e., is there a way to declare
StrOrNull (in MigrationParameters of qapi/migration.json) to be statically
allocated rather than a pointer (as is the default for the uint* types)?
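
To show the constraint in isolation, here is a small sketch that is
independent of QEMU.  DemoInner, DemoOuter, demo_embedded_s_offset and
demo_field_at() are hypothetical names; the sketch only assumes that a
property-style accessor reaches a field as "base pointer plus fixed
offset", which is what the offsetof() in the errors above is computing.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    char *s;
} DemoInner;

typedef struct {
    int other;
    DemoInner embedded;     /* stored inside DemoOuter itself */
    DemoInner *pointed;     /* separately allocated, reached via a pointer */
} DemoOuter;

/*
 * Fine: the member path never leaves DemoOuter, so the offset is a
 * compile-time constant.
 */
static const size_t demo_embedded_s_offset = offsetof(DemoOuter, embedded.s);

/*
 * Not possible: offsetof(DemoOuter, pointed->s).  The string lives in
 * another allocation, so no fixed offset from a DemoOuter exists, and the
 * compiler rejects the expression.
 */

/* Property-style access: base pointer plus a fixed byte offset. */
static char **demo_field_at(DemoOuter *o, size_t offset)
{
    assert(offset + sizeof(char *) <= sizeof(DemoOuter));
    return (char **)((char *)o + offset);
}
```

With the embedded member the accessor can read and write the string
through the offset alone; with the pointer member it would first have to
dereference "pointed" at run time, which a plain offset cannot express.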

Thanks,

-- 
Peter Xu




* Re: [PATCH for-8.2 v2 1/2] qapi/migration: Deduplicate migration parameter field comments
  2023-08-08 20:03                     ` Peter Xu
@ 2023-08-14 22:24                       ` Peter Xu
  0 siblings, 0 replies; 25+ messages in thread
From: Peter Xu @ 2023-08-14 22:24 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: Daniel P. Berrangé,
	qemu-devel, Zhiyi Guo, Leonardo Bras Soares Passos,
	Fabiano Rosas, Juan Quintela, Eric Blake, Chensheng Dong

On Tue, Aug 08, 2023 at 04:03:46PM -0400, Peter Xu wrote:
> On Sun, Aug 06, 2023 at 11:49:46AM -0400, Peter Xu wrote:
> > > I think we have a tradeoff here.  If perpetuating the unclean and ugly
> > > use of "" is what it takes to de-triplicate migration parameters, we may
> > > decide to accept that.
> > 
> > I don't think it's a must.  As Dan raised, we can convert str -> StrOrNull
> > for MigrationParameters. I assume it won't affect query-migrate-parameters
> > anyway OTOH.
> > 
> > I assume it means there's nothing yet obvious that we overlooked on the
> > whole idea.  Let me propose the formal patchset early next week.  It'll be
> > mostly the patch I attached but just add those extra logics for StrOrNull,
> > so the diffstat might be less attractive but hopefully still good enough to
> > be accepted.
> 
> The new StrOrNull approach doesn't work with current migration object
> properties.. as StrOrNull must be a pointer for @MigrationParameters not
> static, and it stops working with offsetof():
> 
> ../migration/options.c:218:5: error: cannot apply ‘offsetof’ to a non constant address
>   218 |     DEFINE_PROP_STRING("tls-creds", MigrationState, parameters.tls_creds->u.s),
>       |     ^~~~~~~~~~~~~~~~~~
> ../migration/options.c:219:5: error: cannot apply ‘offsetof’ to a non constant address
>   219 |     DEFINE_PROP_STRING("tls-hostname", MigrationState, parameters.tls_hostname->u.s),
>       |     ^~~~~~~~~~~~~~~~~~
> ../migration/options.c:220:5: error: cannot apply ‘offsetof’ to a non constant address
>   220 |     DEFINE_PROP_STRING("tls-authz", MigrationState, parameters.tls_authz->u.s),
>       |     ^~~~~~~~~~~~~~~~~~
> 
> Any easy way to fix this?  I.e., is there a way to declare StrOrNull (in
> MigrationParameters of qapi/migration.json) to be statically allocated
> rather than a pointer (just like default behavior of any uint* types)?

Posted a version with 'str' replacing 'StrOrNull'.  Let's move the
discussion there:

https://lore.kernel.org/r/20230814221947.353093-1-peterx@redhat.com

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH for-8.2 v2 2/2] migration: Allow user to specify migration switchover bandwidth
  2023-08-03 15:53 ` [PATCH for-8.2 v2 2/2] migration: Allow user to specify migration switchover bandwidth Peter Xu
@ 2023-08-31 18:14   ` Joao Martins
  2023-08-31 18:34     ` Peter Xu
  2023-09-01  6:55   ` Wang, Lei
  2023-09-01 17:59   ` Joao Martins
  2 siblings, 1 reply; 25+ messages in thread
From: Joao Martins @ 2023-08-31 18:14 UTC (permalink / raw)
  To: Peter Xu, qemu-devel
  Cc: Zhiyi Guo, Daniel P . Berrangé,
	Markus Armbruster, Leonardo Bras Soares Passos, Fabiano Rosas,
	Juan Quintela, Eric Blake, Chensheng Dong

On 03/08/2023 16:53, Peter Xu wrote:
> @@ -2719,7 +2729,8 @@ static void migration_update_counters(MigrationState *s,
>      update_iteration_initial_status(s);
>  
>      trace_migrate_transferred(transferred, time_spent,
> -                              bandwidth, s->threshold_size);
> +                              bandwidth, migrate_max_switchover_bandwidth(),
> +                              s->threshold_size);
>  }

(...)

> diff --git a/migration/trace-events b/migration/trace-events
> index 4666f19325..1296b8db5b 100644
> --- a/migration/trace-events
> +++ b/migration/trace-events
> @@ -185,7 +185,7 @@ source_return_path_thread_shut(uint32_t val) "0x%x"
>  source_return_path_thread_resume_ack(uint32_t v) "%"PRIu32
>  source_return_path_thread_switchover_acked(void) ""
>  migration_thread_low_pending(uint64_t pending) "%" PRIu64
> -migrate_transferred(uint64_t transferred, uint64_t time_spent, uint64_t bandwidth, uint64_t size) "transferred %" PRIu64 " time_spent %" PRIu64 " bandwidth %" PRIu64 " max_size %" PRId64
> +migrate_transferred(uint64_t transferred, uint64_t time_spent, uint64_t bandwidth, uint64_t avail_bw, uint64_t size) "transferred %" PRIu64 " time_spent %" PRIu64 " bandwidth %" PRIu64 " avail_bw %" PRIu64 " max_size %" PRId64

Given your previous snippet, perhaps you meant to name the new arg
'max_switchover_bandwidth'?  Unless, of course, you meant for the call site
of the tracepoint to pass @avail_bw instead?


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH for-8.2 v2 2/2] migration: Allow user to specify migration switchover bandwidth
  2023-08-31 18:14   ` Joao Martins
@ 2023-08-31 18:34     ` Peter Xu
  0 siblings, 0 replies; 25+ messages in thread
From: Peter Xu @ 2023-08-31 18:34 UTC (permalink / raw)
  To: Joao Martins
  Cc: qemu-devel, Zhiyi Guo, Daniel P . Berrangé,
	Markus Armbruster, Leonardo Bras Soares Passos, Fabiano Rosas,
	Juan Quintela, Eric Blake, Chensheng Dong

On Thu, Aug 31, 2023 at 07:14:47PM +0100, Joao Martins wrote:
> On 03/08/2023 16:53, Peter Xu wrote:
> > @@ -2719,7 +2729,8 @@ static void migration_update_counters(MigrationState *s,
> >      update_iteration_initial_status(s);
> >  
> >      trace_migrate_transferred(transferred, time_spent,
> > -                              bandwidth, s->threshold_size);
> > +                              bandwidth, migrate_max_switchover_bandwidth(),
> > +                              s->threshold_size);
> >  }
> 
> (...)
> 
> > diff --git a/migration/trace-events b/migration/trace-events
> > index 4666f19325..1296b8db5b 100644
> > --- a/migration/trace-events
> > +++ b/migration/trace-events
> > @@ -185,7 +185,7 @@ source_return_path_thread_shut(uint32_t val) "0x%x"
> >  source_return_path_thread_resume_ack(uint32_t v) "%"PRIu32
> >  source_return_path_thread_switchover_acked(void) ""
> >  migration_thread_low_pending(uint64_t pending) "%" PRIu64
> > -migrate_transferred(uint64_t transferred, uint64_t time_spent, uint64_t bandwidth, uint64_t size) "transferred %" PRIu64 " time_spent %" PRIu64 " bandwidth %" PRIu64 " max_size %" PRId64
> > +migrate_transferred(uint64_t transferred, uint64_t time_spent, uint64_t bandwidth, uint64_t avail_bw, uint64_t size) "transferred %" PRIu64 " time_spent %" PRIu64 " bandwidth %" PRIu64 " avail_bw %" PRIu64 " max_size %" PRId64
> 
> Given your previous snippet, perhaps you meant to introduce
> 'max_switchover_bandwidth' arg, unless of course you meant in the callpath of
> the tracepoint to instead use @avail_bw as the variable to use?

Yeah, I overlooked that...  I'll fix it when I repost, thanks.

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH for-8.2 v2 2/2] migration: Allow user to specify migration switchover bandwidth
  2023-08-03 15:53 ` [PATCH for-8.2 v2 2/2] migration: Allow user to specify migration switchover bandwidth Peter Xu
  2023-08-31 18:14   ` Joao Martins
@ 2023-09-01  6:55   ` Wang, Lei
  2023-09-01  8:37     ` Daniel P. Berrangé
  2023-09-01 17:59   ` Joao Martins
  2 siblings, 1 reply; 25+ messages in thread
From: Wang, Lei @ 2023-09-01  6:55 UTC (permalink / raw)
  To: Peter Xu, qemu-devel
  Cc: Zhiyi Guo, Daniel P . Berrangé,
	Markus Armbruster, Leonardo Bras Soares Passos, Fabiano Rosas,
	Juan Quintela, Eric Blake, Chensheng Dong

On 8/3/2023 23:53, Peter Xu wrote:
> Migration bandwidth is a very important value to live migration.  It's
> because it's one of the major factors that we'll make decision on when to
> switchover to destination in a precopy process.
> 
> This value is currently estimated by QEMU during the whole live migration
> process by monitoring how fast we were sending the data.  This can be the
> most accurate bandwidth if in the ideal world, where we're always feeding
> unlimited data to the migration channel, and then it'll be limited to the
> bandwidth that is available.
> 
> However in reality it may be very different, e.g., over a 10Gbps network we
> can see query-migrate showing migration bandwidth of only a few tens of
> MB/s just because there are plenty of other things the migration thread
> might be doing.  For example, the migration thread can be busy scanning
> zero pages, or it can be fetching dirty bitmap from other external dirty
> sources (like vhost or KVM).  It means we may not be pushing data as much
> as possible to migration channel, so the bandwidth estimated from "how many
> data we sent in the channel" can be dramatically inaccurate sometimes,
> e.g., that a few tens of MB/s even if 10Gbps available, and then the
> decision to switchover will be further affected by this.
> 
> The migration may not even converge at all with the downtime specified,
> with that wrong estimation of bandwidth.
> 
> The issue is QEMU itself may not be able to avoid those uncertainties on
> measuring the real "available migration bandwidth".  At least not something
> I can think of so far.
> 
> One way to fix this is when the user is fully aware of the available
> bandwidth, then we can allow the user to help providing an accurate value.
> 
> For example, if the user has a dedicated channel of 10Gbps for migration
> for this specific VM, the user can specify this bandwidth so QEMU can
> always do the calculation based on this fact, trusting the user as long as
> specified.
> 
> A new parameter "max-switchover-bandwidth" is introduced just for this. So
> when the user specified this parameter, instead of trusting the estimated
> value from QEMU itself (based on the QEMUFile send speed), let's trust the
> user more by using this value to decide when to switchover, assuming that
> we'll have such bandwidth available then.
> 
> When the user wants to have migration only use 5Gbps out of that 10Gbps,
> one can set max-bandwidth to 5Gbps, along with max-switchover-bandwidth to
> 5Gbps so it'll never use over 5Gbps too (so the user can have the rest

Hi Peter. I'm curious: if we set max-switchover-bandwidth to 5Gbps on a
10Gbps network, will the completion stage send the remaining data at 5Gbps,
taking downtime_limit, or at 10Gbps (saturating the network), taking
downtime_limit / 2? It seems this parameter won't rate limit the final stage:)

> 5Gbps for other things).  So it can be useful even if the network is not
> dedicated, but as long as the user can know a solid value.
> 
> This can resolve issues like "unconvergence migration" which is caused by
> hilarious low "migration bandwidth" detected for whatever reason.
> 
> Reported-by: Zhiyi Guo <zhguo@redhat.com>
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  qapi/migration.json            | 14 +++++++++++++-
>  migration/migration.h          |  2 +-
>  migration/options.h            |  1 +
>  migration/migration-hmp-cmds.c | 14 ++++++++++++++
>  migration/migration.c          | 19 +++++++++++++++----
>  migration/options.c            | 28 ++++++++++++++++++++++++++++
>  migration/trace-events         |  2 +-
>  7 files changed, 73 insertions(+), 7 deletions(-)
> 
> diff --git a/qapi/migration.json b/qapi/migration.json
> index bb798f87a5..6a04fb7d36 100644
> --- a/qapi/migration.json
> +++ b/qapi/migration.json
> @@ -759,6 +759,16 @@
>  # @max-bandwidth: to set maximum speed for migration.  maximum speed
>  #     in bytes per second.  (Since 2.8)
>  #
> +# @max-switchover-bandwidth: to set available bandwidth for migration.
> +#     By default, this value is zero, means the user is not aware of
> +#     the available bandwidth that can be used by QEMU migration, so
> +#     QEMU will estimate the bandwidth automatically.  This can be set
> +#     when the estimated value is not accurate, while the user is able
> +#     to guarantee such bandwidth is available for migration purpose
> +#     during the migration procedure.  When specified correctly, this
> +#     can make the switchover decision much more accurate, which will
> +#     also be based on the max downtime specified.  (Since 8.2)
> +#
>  # @downtime-limit: set maximum tolerated downtime for migration.
>  #     maximum downtime in milliseconds (Since 2.8)
>  #
> @@ -840,7 +850,7 @@
>             'cpu-throttle-initial', 'cpu-throttle-increment',
>             'cpu-throttle-tailslow',
>             'tls-creds', 'tls-hostname', 'tls-authz', 'max-bandwidth',
> -           'downtime-limit',
> +           'max-switchover-bandwidth', 'downtime-limit',
>             { 'name': 'x-checkpoint-delay', 'features': [ 'unstable' ] },
>             'block-incremental',
>             'multifd-channels',
> @@ -885,6 +895,7 @@
>              '*tls-hostname': 'StrOrNull',
>              '*tls-authz': 'StrOrNull',
>              '*max-bandwidth': 'size',
> +            '*max-switchover-bandwidth': 'size',
>              '*downtime-limit': 'uint64',
>              '*x-checkpoint-delay': { 'type': 'uint32',
>                                       'features': [ 'unstable' ] },
> @@ -949,6 +960,7 @@
>              '*tls-hostname': 'str',
>              '*tls-authz': 'str',
>              '*max-bandwidth': 'size',
> +            '*max-switchover-bandwidth': 'size',
>              '*downtime-limit': 'uint64',
>              '*x-checkpoint-delay': { 'type': 'uint32',
>                                       'features': [ 'unstable' ] },
> diff --git a/migration/migration.h b/migration/migration.h
> index 6eea18db36..f18cee27f7 100644
> --- a/migration/migration.h
> +++ b/migration/migration.h
> @@ -283,7 +283,7 @@ struct MigrationState {
>      /*
>       * The final stage happens when the remaining data is smaller than
>       * this threshold; it's calculated from the requested downtime and
> -     * measured bandwidth
> +     * measured bandwidth, or max-switchover-bandwidth if specified.
>       */
>      int64_t threshold_size;
>  
> diff --git a/migration/options.h b/migration/options.h
> index 045e2a41a2..a510ca94c9 100644
> --- a/migration/options.h
> +++ b/migration/options.h
> @@ -80,6 +80,7 @@ int migrate_decompress_threads(void);
>  uint64_t migrate_downtime_limit(void);
>  uint8_t migrate_max_cpu_throttle(void);
>  uint64_t migrate_max_bandwidth(void);
> +uint64_t migrate_max_switchover_bandwidth(void);
>  uint64_t migrate_max_postcopy_bandwidth(void);
>  int migrate_multifd_channels(void);
>  MultiFDCompression migrate_multifd_compression(void);
> diff --git a/migration/migration-hmp-cmds.c b/migration/migration-hmp-cmds.c
> index c115ef2d23..d7572d4c0a 100644
> --- a/migration/migration-hmp-cmds.c
> +++ b/migration/migration-hmp-cmds.c
> @@ -321,6 +321,10 @@ void hmp_info_migrate_parameters(Monitor *mon, const QDict *qdict)
>          monitor_printf(mon, "%s: %" PRIu64 " bytes/second\n",
>              MigrationParameter_str(MIGRATION_PARAMETER_MAX_BANDWIDTH),
>              params->max_bandwidth);
> +        assert(params->has_max_switchover_bandwidth);
> +        monitor_printf(mon, "%s: %" PRIu64 " bytes/second\n",
> +            MigrationParameter_str(MIGRATION_PARAMETER_MAX_SWITCHOVER_BANDWIDTH),
> +            params->max_switchover_bandwidth);
>          assert(params->has_downtime_limit);
>          monitor_printf(mon, "%s: %" PRIu64 " ms\n",
>              MigrationParameter_str(MIGRATION_PARAMETER_DOWNTIME_LIMIT),
> @@ -574,6 +578,16 @@ void hmp_migrate_set_parameter(Monitor *mon, const QDict *qdict)
>          }
>          p->max_bandwidth = valuebw;
>          break;
> +    case MIGRATION_PARAMETER_MAX_SWITCHOVER_BANDWIDTH:
> +        p->has_max_switchover_bandwidth = true;
> +        ret = qemu_strtosz_MiB(valuestr, NULL, &valuebw);
> +        if (ret < 0 || valuebw > INT64_MAX
> +            || (size_t)valuebw != valuebw) {
> +            error_setg(&err, "Invalid size %s", valuestr);
> +            break;
> +        }
> +        p->max_switchover_bandwidth = valuebw;
> +        break;
>      case MIGRATION_PARAMETER_DOWNTIME_LIMIT:
>          p->has_downtime_limit = true;
>          visit_type_size(v, param, &p->downtime_limit, &err);
> diff --git a/migration/migration.c b/migration/migration.c
> index 5528acb65e..8493e3ca49 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -2684,7 +2684,7 @@ static void migration_update_counters(MigrationState *s,
>  {
>      uint64_t transferred, transferred_pages, time_spent;
>      uint64_t current_bytes; /* bytes transferred since the beginning */
> -    double bandwidth;
> +    double bandwidth, avail_bw;
>  
>      if (current_time < s->iteration_start_time + BUFFER_DELAY) {
>          return;
> @@ -2694,7 +2694,17 @@ static void migration_update_counters(MigrationState *s,
>      transferred = current_bytes - s->iteration_initial_bytes;
>      time_spent = current_time - s->iteration_start_time;
>      bandwidth = (double)transferred / time_spent;
> -    s->threshold_size = bandwidth * migrate_downtime_limit();
> +    if (migrate_max_switchover_bandwidth()) {
> +        /*
> +         * If the user specified an available bandwidth, let's trust the
> +         * user so that can be more accurate than what we estimated.
> +         */
> +        avail_bw = migrate_max_switchover_bandwidth();
> +    } else {
> +        /* If the user doesn't specify bandwidth, we use the estimated */
> +        avail_bw = bandwidth;
> +    }
> +    s->threshold_size = avail_bw * migrate_downtime_limit();
>  
>      s->mbps = (((double) transferred * 8.0) /
>                 ((double) time_spent / 1000.0)) / 1000.0 / 1000.0;
> @@ -2711,7 +2721,7 @@ static void migration_update_counters(MigrationState *s,
>      if (stat64_get(&mig_stats.dirty_pages_rate) &&
>          transferred > 10000) {
>          s->expected_downtime =
> -            stat64_get(&mig_stats.dirty_bytes_last_sync) / bandwidth;
> +            stat64_get(&mig_stats.dirty_bytes_last_sync) / avail_bw;
>      }
>  
>      migration_rate_reset(s->to_dst_file);
> @@ -2719,7 +2729,8 @@ static void migration_update_counters(MigrationState *s,
>      update_iteration_initial_status(s);
>  
>      trace_migrate_transferred(transferred, time_spent,
> -                              bandwidth, s->threshold_size);
> +                              bandwidth, migrate_max_switchover_bandwidth(),
> +                              s->threshold_size);
>  }
>  
>  static bool migration_can_switchover(MigrationState *s)
> diff --git a/migration/options.c b/migration/options.c
> index 1d1e1321b0..19d87ab812 100644
> --- a/migration/options.c
> +++ b/migration/options.c
> @@ -125,6 +125,8 @@ Property migration_properties[] = {
>                        parameters.cpu_throttle_tailslow, false),
>      DEFINE_PROP_SIZE("x-max-bandwidth", MigrationState,
>                        parameters.max_bandwidth, MAX_THROTTLE),
> +    DEFINE_PROP_SIZE("max-switchover-bandwidth", MigrationState,
> +                      parameters.max_switchover_bandwidth, 0),
>      DEFINE_PROP_UINT64("x-downtime-limit", MigrationState,
>                        parameters.downtime_limit,
>                        DEFAULT_MIGRATE_SET_DOWNTIME),
> @@ -780,6 +782,13 @@ uint64_t migrate_max_bandwidth(void)
>      return s->parameters.max_bandwidth;
>  }
>  
> +uint64_t migrate_max_switchover_bandwidth(void)
> +{
> +    MigrationState *s = migrate_get_current();
> +
> +    return s->parameters.max_switchover_bandwidth;
> +}
> +
>  uint64_t migrate_max_postcopy_bandwidth(void)
>  {
>      MigrationState *s = migrate_get_current();
> @@ -917,6 +926,8 @@ MigrationParameters *qmp_query_migrate_parameters(Error **errp)
>                                   s->parameters.tls_authz : "");
>      params->has_max_bandwidth = true;
>      params->max_bandwidth = s->parameters.max_bandwidth;
> +    params->has_max_switchover_bandwidth = true;
> +    params->max_switchover_bandwidth = s->parameters.max_switchover_bandwidth;
>      params->has_downtime_limit = true;
>      params->downtime_limit = s->parameters.downtime_limit;
>      params->has_x_checkpoint_delay = true;
> @@ -1056,6 +1067,15 @@ bool migrate_params_check(MigrationParameters *params, Error **errp)
>          return false;
>      }
>  
> +    if (params->has_max_switchover_bandwidth &&
> +        (params->max_switchover_bandwidth > SIZE_MAX)) {
> +        error_setg(errp, QERR_INVALID_PARAMETER_VALUE,
> +                   "max_switchover_bandwidth",
> +                   "an integer in the range of 0 to "stringify(SIZE_MAX)
> +                   " bytes/second");
> +        return false;
> +    }
> +
>      if (params->has_downtime_limit &&
>          (params->downtime_limit > MAX_MIGRATE_DOWNTIME)) {
>          error_setg(errp, QERR_INVALID_PARAMETER_VALUE,
> @@ -1225,6 +1245,10 @@ static void migrate_params_test_apply(MigrateSetParameters *params,
>          dest->max_bandwidth = params->max_bandwidth;
>      }
>  
> +    if (params->has_max_switchover_bandwidth) {
> +        dest->max_switchover_bandwidth = params->max_switchover_bandwidth;
> +    }
> +
>      if (params->has_downtime_limit) {
>          dest->downtime_limit = params->downtime_limit;
>      }
> @@ -1341,6 +1365,10 @@ static void migrate_params_apply(MigrateSetParameters *params, Error **errp)
>          }
>      }
>  
> +    if (params->has_max_switchover_bandwidth) {
> +        s->parameters.max_switchover_bandwidth = params->max_switchover_bandwidth;
> +    }
> +
>      if (params->has_downtime_limit) {
>          s->parameters.downtime_limit = params->downtime_limit;
>      }
> diff --git a/migration/trace-events b/migration/trace-events
> index 4666f19325..1296b8db5b 100644
> --- a/migration/trace-events
> +++ b/migration/trace-events
> @@ -185,7 +185,7 @@ source_return_path_thread_shut(uint32_t val) "0x%x"
>  source_return_path_thread_resume_ack(uint32_t v) "%"PRIu32
>  source_return_path_thread_switchover_acked(void) ""
>  migration_thread_low_pending(uint64_t pending) "%" PRIu64
> -migrate_transferred(uint64_t transferred, uint64_t time_spent, uint64_t bandwidth, uint64_t size) "transferred %" PRIu64 " time_spent %" PRIu64 " bandwidth %" PRIu64 " max_size %" PRId64
> +migrate_transferred(uint64_t transferred, uint64_t time_spent, uint64_t bandwidth, uint64_t avail_bw, uint64_t size) "transferred %" PRIu64 " time_spent %" PRIu64 " bandwidth %" PRIu64 " avail_bw %" PRIu64 " max_size %" PRId64
>  process_incoming_migration_co_end(int ret, int ps) "ret=%d postcopy-state=%d"
>  process_incoming_migration_co_postcopy_end_main(void) ""
>  postcopy_preempt_enabled(bool value) "%d"


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH for-8.2 v2 2/2] migration: Allow user to specify migration switchover bandwidth
  2023-09-01  6:55   ` Wang, Lei
@ 2023-09-01  8:37     ` Daniel P. Berrangé
  2023-09-01 14:40       ` Peter Xu
  2023-09-05 16:46       ` Peter Xu
  0 siblings, 2 replies; 25+ messages in thread
From: Daniel P. Berrangé @ 2023-09-01  8:37 UTC (permalink / raw)
  To: Wang, Lei
  Cc: Peter Xu, qemu-devel, Zhiyi Guo, Markus Armbruster,
	Leonardo Bras Soares Passos, Fabiano Rosas, Juan Quintela,
	Eric Blake, Chensheng Dong

On Fri, Sep 01, 2023 at 02:55:08PM +0800, Wang, Lei wrote:
> On 8/3/2023 23:53, Peter Xu wrote:
> > Migration bandwidth is a very important value to live migration.  It's
> > because it's one of the major factors that we'll make decision on when to
> > switchover to destination in a precopy process.
> > 
> > This value is currently estimated by QEMU during the whole live migration
> > process by monitoring how fast we were sending the data.  This can be the
> > most accurate bandwidth if in the ideal world, where we're always feeding
> > unlimited data to the migration channel, and then it'll be limited to the
> > bandwidth that is available.
> > 
> > However in reality it may be very different, e.g., over a 10Gbps network we
> > can see query-migrate showing migration bandwidth of only a few tens of
> > MB/s just because there are plenty of other things the migration thread
> > might be doing.  For example, the migration thread can be busy scanning
> > zero pages, or it can be fetching dirty bitmap from other external dirty
> > sources (like vhost or KVM).  It means we may not be pushing data as much
> > as possible to migration channel, so the bandwidth estimated from "how many
> > data we sent in the channel" can be dramatically inaccurate sometimes,
> > e.g., that a few tens of MB/s even if 10Gbps available, and then the
> > decision to switchover will be further affected by this.
> > 
> > The migration may not even converge at all with the downtime specified,
> > with that wrong estimation of bandwidth.
> > 
> > The issue is QEMU itself may not be able to avoid those uncertainties on
> > measuring the real "available migration bandwidth".  At least not something
> > I can think of so far.
> > 
> > One way to fix this is when the user is fully aware of the available
> > bandwidth, then we can allow the user to help providing an accurate value.
> > 
> > For example, if the user has a dedicated channel of 10Gbps for migration
> > for this specific VM, the user can specify this bandwidth so QEMU can
> > always do the calculation based on this fact, trusting the user as long as
> > specified.
> > 
> > A new parameter "max-switchover-bandwidth" is introduced just for this. So
> > when the user specified this parameter, instead of trusting the estimated
> > value from QEMU itself (based on the QEMUFile send speed), let's trust the
> > user more by using this value to decide when to switchover, assuming that
> > we'll have such bandwidth available then.
> > 
> > When the user wants to have migration only use 5Gbps out of that 10Gbps,
> > one can set max-bandwidth to 5Gbps, along with max-switchover-bandwidth to
> > 5Gbps so it'll never use over 5Gbps too (so the user can have the rest
> 
> Hi Peter. I'm curious if we specify max-switchover-bandwidth to 5Gbps over a
> 10Gbps network, in the completion stage will it send the remaining data in 5Gbps
> using downtime_limit time or in 10Gbps (saturate the network) using the
> downtime_limit / 2 time? Seems this parameter won't rate limit the final stage:)

Effectively the mgmt app is telling QEMU to assume that this
much bandwidth is available for use during switchover. If QEMU
determines that, given this available bandwidth, the remaining
data can be sent over the link within the downtime limit, it
will perform the switchover. When sending this switchover data,
it will actually transmit the data at full line rate IIUC.


With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH for-8.2 v2 2/2] migration: Allow user to specify migration switchover bandwidth
  2023-09-01  8:37     ` Daniel P. Berrangé
@ 2023-09-01 14:40       ` Peter Xu
  2023-09-05 16:46       ` Peter Xu
  1 sibling, 0 replies; 25+ messages in thread
From: Peter Xu @ 2023-09-01 14:40 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: Wang, Lei, qemu-devel, Zhiyi Guo, Markus Armbruster,
	Leonardo Bras Soares Passos, Fabiano Rosas, Juan Quintela,
	Eric Blake, Chensheng Dong

On Fri, Sep 01, 2023 at 09:37:32AM +0100, Daniel P. Berrangé wrote:
> > Hi Peter. I'm curious if we specify max-switchover-bandwidth to 5Gbps over a
> > 10Gbps network, in the completion stage will it send the remaining data in 5Gbps
> > using downtime_limit time or in 10Gbps (saturate the network) using the
> > downtime_limit / 2 time? Seems this parameter won't rate limit the final stage:)
> 
> Effectively the mgmt app is telling QEMU to assume that this
> much bandwidth is available for use during switchover. If QEMU
> determines that, given this available bandwidth, the remaining
> data can be sent over the link within the downtime limit, it
> will perform the switchover. When sending this switchover data,
> it will actually transmit the data at full line rate IIUC.

Right, currently it's only a way for QEMU to do more accurate calculations
on the switchover decision, while we always use full speed to transfer
during switchover.

The old name "available-bandwidth" might reflect that side better (it only
tells QEMU how much bandwidth it can use), but it would be unclear about
when the value is used (only when making the switchover decision).
So it seems there's no ideal name for it.

To be explicit, migration_completion() has this call:

        migration_rate_set(RATE_LIMIT_DISABLED);

And this patch won't change that behavior (to use full line speed).
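
As a rough sketch of how the pieces fit together: the demo_* helpers below
are made up for illustration and are not QEMU functions, and bandwidth is
taken in bytes/ms here so that it matches a downtime limit in ms.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical: 0 means "not specified", so fall back to the estimate. */
static double demo_pick_bandwidth(double estimated_bytes_per_ms,
                                  double user_bytes_per_ms)
{
    return user_bytes_per_ms > 0 ? user_bytes_per_ms : estimated_bytes_per_ms;
}

/*
 * Hypothetical: the value only feeds the decision.  Switchover is allowed
 * once the pending data is below what fits into the downtime limit.
 */
static bool demo_can_switchover(uint64_t pending_bytes,
                                double bw_bytes_per_ms,
                                uint64_t downtime_limit_ms)
{
    double threshold = bw_bytes_per_ms * (double)downtime_limit_ms;

    return (double)pending_bytes < threshold;
}

/* Hypothetical: the final transfer is not rate limited, so it drains at
 * whatever the link really delivers. */
static double demo_transfer_ms(uint64_t bytes, double line_bytes_per_ms)
{
    assert(line_bytes_per_ms > 0);
    return (double)bytes / line_bytes_per_ms;
}
```

With 5Gbps (625000 bytes/ms) supplied and a 300ms downtime limit, the
threshold is 187500000 bytes; if the link really delivers 10Gbps
(1250000 bytes/ms), that amount drains in about 150ms.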

Interestingly, this question also made me notice that switchover for
postcopy is done slightly differently.  I believe postcopy also uses line
speed, because we put almost everything needed into the package and flush
it in qemu_savevm_send_packaged(), at line speed too.

Thanks,

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH for-8.2 v2 2/2] migration: Allow user to specify migration switchover bandwidth
  2023-08-03 15:53 ` [PATCH for-8.2 v2 2/2] migration: Allow user to specify migration switchover bandwidth Peter Xu
  2023-08-31 18:14   ` Joao Martins
  2023-09-01  6:55   ` Wang, Lei
@ 2023-09-01 17:59   ` Joao Martins
  2023-09-01 18:39     ` Joao Martins
  2 siblings, 1 reply; 25+ messages in thread
From: Joao Martins @ 2023-09-01 17:59 UTC (permalink / raw)
  To: Peter Xu, qemu-devel
  Cc: Zhiyi Guo, Daniel P . Berrangé,
	Markus Armbruster, Leonardo Bras Soares Passos, Fabiano Rosas,
	Juan Quintela, Eric Blake, Chensheng Dong

On 03/08/2023 16:53, Peter Xu wrote:
> @@ -2694,7 +2694,17 @@ static void migration_update_counters(MigrationState *s,
>      transferred = current_bytes - s->iteration_initial_bytes;
>      time_spent = current_time - s->iteration_start_time;
>      bandwidth = (double)transferred / time_spent;
> -    s->threshold_size = bandwidth * migrate_downtime_limit();
> +    if (migrate_max_switchover_bandwidth()) {
> +        /*
> +         * If the user specified an available bandwidth, let's trust the
> +         * user so that can be more accurate than what we estimated.
> +         */
> +        avail_bw = migrate_max_switchover_bandwidth();
> +    } else {
> +        /* If the user doesn't specify bandwidth, we use the estimated */
> +        avail_bw = bandwidth;
> +    }
> +    s->threshold_size = avail_bw * migrate_downtime_limit();
>  

[ sorry for giving review comments in piecemeal :/ ]

There might be something odd with the calculation. It would be right if
downtime_limit was in seconds. But we are multiplying a value that is in
bytes/sec by a time that is in milliseconds. When avail_bw is set to
switchover_bandwidth, it sounds to me like this should be:

	/* bytes/msec; @max-switchover-bandwidth is per-seconds */
	avail_bw = migrate_max_switchover_bandwidth() / 1000.0;

Otherwise it looks like that we end up overestimating how much we can still send
during switchover? If this is correct and I am not missing some assumption, then
same is applicable to the threshold_size calculation in general without
switchover-bandwidth but likely in a different way:

	/* bytes/msec; but @bandwidth is calculated in 100msec quantas */
	avail_bw = bandwidth / 100.0;

There's a very good chance I'm missing details, so apologies beforehand on
wasting your time if I didn't pick up on it through the code.

	Joao


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH for-8.2 v2 2/2] migration: Allow user to specify migration switchover bandwidth
  2023-09-01 17:59   ` Joao Martins
@ 2023-09-01 18:39     ` Joao Martins
  2023-09-05 15:31       ` Peter Xu
  0 siblings, 1 reply; 25+ messages in thread
From: Joao Martins @ 2023-09-01 18:39 UTC (permalink / raw)
  To: Peter Xu, qemu-devel
  Cc: Zhiyi Guo, Daniel P . Berrangé,
	Markus Armbruster, Leonardo Bras Soares Passos, Fabiano Rosas,
	Juan Quintela, Eric Blake, Chensheng Dong



On 01/09/2023 18:59, Joao Martins wrote:
> On 03/08/2023 16:53, Peter Xu wrote:
>> @@ -2694,7 +2694,17 @@ static void migration_update_counters(MigrationState *s,
>>      transferred = current_bytes - s->iteration_initial_bytes;
>>      time_spent = current_time - s->iteration_start_time;
>>      bandwidth = (double)transferred / time_spent;
>> -    s->threshold_size = bandwidth * migrate_downtime_limit();
>> +    if (migrate_max_switchover_bandwidth()) {
>> +        /*
>> +         * If the user specified an available bandwidth, let's trust the
>> +         * user so that can be more accurate than what we estimated.
>> +         */
>> +        avail_bw = migrate_max_switchover_bandwidth();
>> +    } else {
>> +        /* If the user doesn't specify bandwidth, we use the estimated */
>> +        avail_bw = bandwidth;
>> +    }
>> +    s->threshold_size = avail_bw * migrate_downtime_limit();
>>  
> 
> [ sorry for giving review comments in piecemeal :/ ]
> 
> There might be something odd with the calculation. It would be right if
> downtime_limit was in seconds. But we are multiplying a value that is in
> bytes/sec with a time unit that is in milliseconds. When avail_bw is set to
> switchover_bandwidth, it sounds to me this should be a:
> 
> 	/* bytes/msec; @max-switchover-bandwidth is per-seconds */
> 	avail_bw = migrate_max_switchover_bandwidth() / 1000.0;
> 
> Otherwise it looks like that we end up overestimating how much we can still send
> during switchover? If this is correct and I am not missing some assumption, 

(...)

> then
> same is applicable to the threshold_size calculation in general without
> switchover-bandwidth but likely in a different way:
> 
> 	/* bytes/msec; but @bandwidth is calculated in 100msec quantas */
> 	avail_bw = bandwidth / 100.0;
> 

Never mind this part. I was wrong about the @bandwidth adjustment, as it is
already calculated in bytes/ms. It seems that only max_switchover_bandwidth
needs an adjustment.

> There's a very good chance I'm missing details, so apologies beforehand on
> wasting your time if I didn't pick up on it through the code.
> 
> 	Joao
> 



* Re: [PATCH for-8.2 v2 2/2] migration: Allow user to specify migration switchover bandwidth
  2023-09-01 18:39     ` Joao Martins
@ 2023-09-05 15:31       ` Peter Xu
  0 siblings, 0 replies; 25+ messages in thread
From: Peter Xu @ 2023-09-05 15:31 UTC (permalink / raw)
  To: Joao Martins
  Cc: qemu-devel, Zhiyi Guo, Daniel P . Berrangé,
	Markus Armbruster, Leonardo Bras Soares Passos, Fabiano Rosas,
	Juan Quintela, Eric Blake, Chensheng Dong

On Fri, Sep 01, 2023 at 07:39:07PM +0100, Joao Martins wrote:
> 
> 
> On 01/09/2023 18:59, Joao Martins wrote:
> > On 03/08/2023 16:53, Peter Xu wrote:
> >> @@ -2694,7 +2694,17 @@ static void migration_update_counters(MigrationState *s,
> >>      transferred = current_bytes - s->iteration_initial_bytes;
> >>      time_spent = current_time - s->iteration_start_time;
> >>      bandwidth = (double)transferred / time_spent;
> >> -    s->threshold_size = bandwidth * migrate_downtime_limit();
> >> +    if (migrate_max_switchover_bandwidth()) {
> >> +        /*
> >> +         * If the user specified an available bandwidth, let's trust the
> >> +         * user so that can be more accurate than what we estimated.
> >> +         */
> >> +        avail_bw = migrate_max_switchover_bandwidth();
> >> +    } else {
> >> +        /* If the user doesn't specify bandwidth, we use the estimated */
> >> +        avail_bw = bandwidth;
> >> +    }
> >> +    s->threshold_size = avail_bw * migrate_downtime_limit();
> >>  
> > 
> > [ sorry for giving review comments in piecemeal :/ ]

This is never a problem.

> > 
> > There might be something odd with the calculation. It would be right if
> > > downtime_limit was in seconds. But we are multiplying a value that is in
> > > bytes/sec with a time unit that is in milliseconds. When avail_bw is set to
> > switchover_bandwidth, it sounds to me this should be a:
> > 
> > 	/* bytes/msec; @max-switchover-bandwidth is per-seconds */
> > 	avail_bw = migrate_max_switchover_bandwidth() / 1000.0;
> > 
> > Otherwise it looks like that we end up overestimating how much we can still send
> > during switchover? If this is correct and I am not missing some assumption, 
> 
> (...)
> 
> > then
> > same is applicable to the threshold_size calculation in general without
> > switchover-bandwidth but likely in a different way:
> > 
> > 	/* bytes/msec; but @bandwidth is calculated in 100msec quantas */
> > 	avail_bw = bandwidth / 100.0;
> > 
> 
> Nevermind this part. I was wrong in the @bandwidth adjustment as it is already
> calculated in bytes/ms. It's max_switchover_bandwidth that needs an adjustment
> it seems.
> 
> > There's a very good chance I'm missing details, so apologies beforehand on
> > wasting your time if I didn't pick up on it through the code.

My fault, thanks for catching this.  So it seems that even though the test
does switch over with this patch, the switchover might be too aggressive,
because we calculate with a number 1000x larger than the bandwidth the user
actually provided.

I'll also rename this to expected_bw_per_ms when I repost, to make the unit
clear.
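
For reference, a minimal sketch of the unit fix discussed above, assuming the
user-supplied value is in bytes/sec while the estimated bandwidth is already in
bytes/ms and the downtime limit is in ms.  The function names and signatures
are hypothetical and only for illustration; this is not the reposted patch:

```c
#include <stdint.h>

/*
 * Hypothetical helper (not actual QEMU code): convert a bandwidth given
 * in bytes/second into bytes/millisecond.
 */
static double bw_per_sec_to_per_ms(uint64_t bytes_per_sec)
{
    return bytes_per_sec / 1000.0;
}

/*
 * Sketch of the threshold calculation with consistent units.
 *
 *   estimated_bw_per_ms: measured bandwidth, already in bytes/ms
 *   user_bw_per_sec:     user-provided switchover bandwidth in bytes/sec,
 *                        where 0 is assumed to mean "not set"
 *   downtime_limit_ms:   downtime limit in milliseconds
 *
 * Returns the number of bytes expected to fit within the downtime limit.
 */
static uint64_t switchover_threshold(double estimated_bw_per_ms,
                                     uint64_t user_bw_per_sec,
                                     uint64_t downtime_limit_ms)
{
    double expected_bw_per_ms;

    if (user_bw_per_sec) {
        /* Trust the user's value, but scale it to bytes/ms first */
        expected_bw_per_ms = bw_per_sec_to_per_ms(user_bw_per_sec);
    } else {
        /* Fall back to the estimate, which needs no scaling */
        expected_bw_per_ms = estimated_bw_per_ms;
    }

    return (uint64_t)(expected_bw_per_ms * downtime_limit_ms);
}
```

With a 1 GB/s setting and a 300ms downtime limit this yields a 300 MB
threshold, where the unscaled multiplication would have yielded 300 GB.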

Thanks,

-- 
Peter Xu




* Re: [PATCH for-8.2 v2 2/2] migration: Allow user to specify migration switchover bandwidth
  2023-09-01  8:37     ` Daniel P. Berrangé
  2023-09-01 14:40       ` Peter Xu
@ 2023-09-05 16:46       ` Peter Xu
  2023-09-05 17:39         ` Daniel P. Berrangé
  2023-09-06  2:27         ` Wang, Lei
  1 sibling, 2 replies; 25+ messages in thread
From: Peter Xu @ 2023-09-05 16:46 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: Wang, Lei, qemu-devel, Zhiyi Guo, Markus Armbruster,
	Leonardo Bras Soares Passos, Fabiano Rosas, Juan Quintela,
	Eric Blake, Chensheng Dong, Joao Martins

On Fri, Sep 01, 2023 at 09:37:32AM +0100, Daniel P. Berrangé wrote:
> > > When the user wants to have migration only use 5Gbps out of that 10Gbps,
> > > one can set max-bandwidth to 5Gbps, along with max-switchover-bandwidth to
> > > 5Gbps so it'll never use over 5Gbps too (so the user can have the rest
> > 
> > Hi Peter. I'm curious if we specify max-switchover-bandwidth to 5Gbps over a
> > 10Gbps network, in the completion stage will it send the remaining data in 5Gbps
> > using downtime_limit time or in 10Gbps (saturate the network) using the
> > downtime_limit / 2 time? Seems this parameter won't rate limit the final stage:)
> 
> Effectively the mgmt app is telling QEMU to assume that this
> much bandwidth is available for use during switchover. If QEMU
> determines that, given this available bandwidth, the remaining
> data can be sent over the link within the downtime limit, it
> will perform the switchover. When sending this switchover data,
> it will actually transmit the data at full line rate IIUC.

I was about to repost this patch, but then I realized that the name
max-switchover-bandwidth is indeed confusing (as Lei's question shows).

All our bandwidth throttling parameters follow the max-*-bandwidth pattern,
and this one would be the outlier, since it doesn't actually throttle the
network.

If the old name "available-bandwidth" is too general, I'm now considering
"avail-switchover-bandwidth", leaving max- out of the name to differentiate.
That way, if some day we want to add a real throttle for switchover, we can
still have a sane name for it.

Any objections before I repost?

Thanks,

-- 
Peter Xu




* Re: [PATCH for-8.2 v2 2/2] migration: Allow user to specify migration switchover bandwidth
  2023-09-05 16:46       ` Peter Xu
@ 2023-09-05 17:39         ` Daniel P. Berrangé
  2023-09-06  2:27         ` Wang, Lei
  1 sibling, 0 replies; 25+ messages in thread
From: Daniel P. Berrangé @ 2023-09-05 17:39 UTC (permalink / raw)
  To: Peter Xu
  Cc: Wang, Lei, qemu-devel, Zhiyi Guo, Markus Armbruster,
	Leonardo Bras Soares Passos, Fabiano Rosas, Juan Quintela,
	Eric Blake, Chensheng Dong, Joao Martins

On Tue, Sep 05, 2023 at 12:46:03PM -0400, Peter Xu wrote:
> On Fri, Sep 01, 2023 at 09:37:32AM +0100, Daniel P. Berrangé wrote:
> > > > When the user wants to have migration only use 5Gbps out of that 10Gbps,
> > > > one can set max-bandwidth to 5Gbps, along with max-switchover-bandwidth to
> > > > 5Gbps so it'll never use over 5Gbps too (so the user can have the rest
> > > 
> > > Hi Peter. I'm curious if we specify max-switchover-bandwidth to 5Gbps over a
> > > 10Gbps network, in the completion stage will it send the remaining data in 5Gbps
> > > using downtime_limit time or in 10Gbps (saturate the network) using the
> > > downtime_limit / 2 time? Seems this parameter won't rate limit the final stage:)
> > 
> > Effectively the mgmt app is telling QEMU to assume that this
> > much bandwidth is available for use during switchover. If QEMU
> > determines that, given this available bandwidth, the remaining
> > data can be sent over the link within the downtime limit, it
> > > will perform the switchover. When sending this switchover data,
> > it will actually transmit the data at full line rate IIUC.
> 
> I'm right at reposting this patch, but then I found that the
> max-available-bandwidth is indeed confusing (as Lei's question shows).
> 
> We do have all the bandwidth throttling values in the pattern of
> max-*-bandwidth and this one will start to be the outlier that it won't
> really throttle the network.
> 
> If the old name "available-bandwidth" is too general, I'm now considering
> "avail-switchover-bandwidth" just to leave max- out of the name to
> differentiate, if some day we want to add a real throttle for switchover we
> can still have a sane name.
> 
> Any objections before I repost?

I think the 'avail-' prefix is good given the confusion Lei pointed out.

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|




* Re: [PATCH for-8.2 v2 2/2] migration: Allow user to specify migration switchover bandwidth
  2023-09-05 16:46       ` Peter Xu
  2023-09-05 17:39         ` Daniel P. Berrangé
@ 2023-09-06  2:27         ` Wang, Lei
  1 sibling, 0 replies; 25+ messages in thread
From: Wang, Lei @ 2023-09-06  2:27 UTC (permalink / raw)
  To: Peter Xu, Daniel P. Berrangé
  Cc: qemu-devel, Zhiyi Guo, Markus Armbruster,
	Leonardo Bras Soares Passos, Fabiano Rosas, Juan Quintela,
	Eric Blake, Chensheng Dong, Joao Martins

On 9/6/2023 0:46, Peter Xu wrote:
> On Fri, Sep 01, 2023 at 09:37:32AM +0100, Daniel P. Berrangé wrote:
>>>> When the user wants to have migration only use 5Gbps out of that 10Gbps,
>>>> one can set max-bandwidth to 5Gbps, along with max-switchover-bandwidth to
>>>> 5Gbps so it'll never use over 5Gbps too (so the user can have the rest
>>>
>>> Hi Peter. I'm curious if we specify max-switchover-bandwidth to 5Gbps over a
>>> 10Gbps network, in the completion stage will it send the remaining data in 5Gbps
>>> using downtime_limit time or in 10Gbps (saturate the network) using the
>>> downtime_limit / 2 time? Seems this parameter won't rate limit the final stage:)
>>
>> Effectively the mgmt app is telling QEMU to assume that this
>> much bandwidth is available for use during switchover. If QEMU
>> determines that, given this available bandwidth, the remaining
>> data can be sent over the link within the downtime limit, it
>> will perform the switchover. When sending this switchover data,
>> it will actually transmit the data at full line rate IIUC.
> 
> I'm right at reposting this patch, but then I found that the
> max-available-bandwidth is indeed confusing (as Lei's question shows).
> 
> We do have all the bandwidth throttling values in the pattern of
> max-*-bandwidth and this one will start to be the outlier that it won't
> really throttle the network.
> 
> If the old name "available-bandwidth" is too general, I'm now considering
> "avail-switchover-bandwidth" just to leave max- out of the name to
> differentiate, if some day we want to add a real throttle for switchover we
> can still have a sane name.
> 
> Any objections before I repost?

I'm also OK with it. "avail" implies a lower bound on the bandwidth during
switchover: we promise that at least that much bandwidth can be used, so it
covers both the throttling and non-throttling cases. "switchover" means this
parameter only applies to the switchover phase rather than the bulk stage.
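
To make the semantics concrete, here is a small sketch of the distinction
above, under the assumption that the parameter only feeds the switchover
decision and the data is then sent at whatever rate the link sustains. All
names are hypothetical, and the comparison is simplified compared to what
migration.c really does:

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical decision helper: may we switch over now?
 *
 *   pending_bytes:     data still to be sent
 *   avail_bw_per_sec:  bandwidth assumed available during switchover,
 *                      in bytes/sec (a hint, not a throttle)
 *   downtime_limit_ms: allowed downtime in milliseconds
 */
static bool can_switchover(uint64_t pending_bytes,
                           uint64_t avail_bw_per_sec,
                           uint64_t downtime_limit_ms)
{
    /* Scale to bytes/ms so the units match the downtime limit */
    double avail_bw_per_ms = avail_bw_per_sec / 1000.0;
    uint64_t threshold = (uint64_t)(avail_bw_per_ms * downtime_limit_ms);

    return pending_bytes <= threshold;
}

/*
 * Hypothetical helper: time in ms to send the remaining data when the
 * link actually delivers line_rate_per_sec bytes/sec.  The hint passed to
 * can_switchover() plays no role here.
 */
static double transfer_time_ms(uint64_t pending_bytes,
                               uint64_t line_rate_per_sec)
{
    return pending_bytes / (line_rate_per_sec / 1000.0);
}
```

Taking the numbers from the question earlier in the thread: with 5Gbps
(625000000 bytes/sec) assumed and a 300ms downtime limit (a value picked just
for this example), switchover is allowed once at most 187500000 bytes remain,
and on a 10Gbps link those bytes take about 150ms, i.e. downtime_limit / 2.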

> 
> Thanks,
> 



Thread overview: 25+ messages
2023-08-03 15:53 [PATCH for-8.2 v2 0/2] migration: Add max-switchover-bandwidth parameter Peter Xu
2023-08-03 15:53 ` [PATCH for-8.2 v2 1/2] qapi/migration: Deduplicate migration parameter field comments Peter Xu
2023-08-04 12:28   ` Markus Armbruster
2023-08-04 13:59     ` Daniel P. Berrangé
2023-08-04 16:01       ` Peter Xu
2023-08-04 16:29         ` Daniel P. Berrangé
2023-08-04 16:46           ` Peter Xu
2023-08-04 16:48             ` Daniel P. Berrangé
2023-08-04 21:02               ` Peter Xu
2023-08-05  8:12                 ` Markus Armbruster
2023-08-06 15:49                   ` Peter Xu
2023-08-08 20:03                     ` Peter Xu
2023-08-14 22:24                       ` Peter Xu
2023-08-03 15:53 ` [PATCH for-8.2 v2 2/2] migration: Allow user to specify migration switchover bandwidth Peter Xu
2023-08-31 18:14   ` Joao Martins
2023-08-31 18:34     ` Peter Xu
2023-09-01  6:55   ` Wang, Lei
2023-09-01  8:37     ` Daniel P. Berrangé
2023-09-01 14:40       ` Peter Xu
2023-09-05 16:46       ` Peter Xu
2023-09-05 17:39         ` Daniel P. Berrangé
2023-09-06  2:27         ` Wang, Lei
2023-09-01 17:59   ` Joao Martins
2023-09-01 18:39     ` Joao Martins
2023-09-05 15:31       ` Peter Xu
