All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Dr. David Alan Gilbert (git)" <dgilbert@redhat.com>
To: qemu-devel@nongnu.org, leobras@redhat.com, quintela@redhat.com,
	berrange@redhat.com, peterx@redhat.com, iii@linux.ibm.com,
	huangy81@chinatelecom.cn
Subject: [PULL 19/30] migration: Respect postcopy request order in preemption mode
Date: Wed, 20 Jul 2022 12:19:15 +0100	[thread overview]
Message-ID: <20220720111926.107055-20-dgilbert@redhat.com> (raw)
In-Reply-To: <20220720111926.107055-1-dgilbert@redhat.com>

From: Peter Xu <peterx@redhat.com>

With preemption mode on, when we see a postcopy request that was requesting
for exactly the page that we have preempted before (so we've partially sent
the page already via PRECOPY channel and it got preempted by another
postcopy request), currently we drop the request so that after all the
other postcopy requests are serviced then we'll go back to precopy stream
and start to handle that.

We dropped the request because we can't send it via postcopy channel since
the precopy channel already contains partial of the data, and we can only
send a huge page via one channel as a whole.  We can't split a huge page
into two channels.

That's a very corner case and that works, but there's a change on the order
of postcopy requests that we handle since we're postponing this (unlucky)
postcopy request to be later than the other queued postcopy requests.  The
problem is there's a possibility that when the guest was very busy, the
postcopy queue can be always non-empty, it means this dropped request will
never be handled until the end of postcopy migration. So, there's a chance
that there's one dest QEMU vcpu thread waiting for a page fault for an
extremely long time just because it's unluckily accessing the specific page
that was preempted before.

The worst case time it needs can be as long as the whole postcopy migration
procedure.  It's extremely unlikely to happen, but when it happens it's not
good.

The root cause of this problem is because we treat pss->postcopy_requested
variable as with two meanings bound together, as the variable shows:

  1. Whether this page request is urgent, and,
  2. Which channel we should use for this page request.

With the old code, when we set postcopy_requested it means either both (1)
and (2) are true, or both (1) and (2) are false.  We can never have (1)
and (2) to have different values.

However it doesn't necessarily need to be like that.  It's very legal that
there's one request that has (1) very high urgency, but (2) we'd like to
use the precopy channel.  Just like the corner case we were discussing
above.

To differenciate the two meanings better, introduce a new field called
postcopy_target_channel, showing which channel we should use for this page
request, so as to cover the old meaning (2) only.  Then we leave the
postcopy_requested variable to stand only for meaning (1), which is the
urgency of this page request.

With this change, we can easily boost priority of a preempted precopy page
as long as we know that page is also requested as a postcopy page.  So with
the new approach in get_queued_page() instead of dropping that request, we
send it right away with the precopy channel so we get back the ordering of
the page faults just like how they're requested on dest.

Reported-by: Manish Mishra <manish.mishra@nutanix.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Manish Mishra <manish.mishra@nutanix.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20220707185520.27583-1-peterx@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 migration/ram.c | 65 +++++++++++++++++++++++++++++++++++++++----------
 1 file changed, 52 insertions(+), 13 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index 7cbe9c310d..4fbad74c6c 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -442,8 +442,28 @@ struct PageSearchStatus {
     unsigned long page;
     /* Set once we wrap around */
     bool         complete_round;
-    /* Whether current page is explicitly requested by postcopy */
+    /*
+     * [POSTCOPY-ONLY] Whether current page is explicitly requested by
+     * postcopy.  When set, the request is "urgent" because the dest QEMU
+     * threads are waiting for us.
+     */
     bool         postcopy_requested;
+    /*
+     * [POSTCOPY-ONLY] The target channel to use to send current page.
+     *
+     * Note: This may _not_ match with the value in postcopy_requested
+     * above. Let's imagine the case where the postcopy request is exactly
+     * the page that we're sending in progress during precopy. In this case
+     * we'll have postcopy_requested set to true but the target channel
+     * will be the precopy channel (so that we don't split brain on that
+     * specific page since the precopy channel already contains partial of
+     * that page data).
+     *
+     * Besides that specific use case, postcopy_target_channel should
+     * always be equal to postcopy_requested, because by default we send
+     * postcopy pages via postcopy preempt channel.
+     */
+    bool         postcopy_target_channel;
 };
 typedef struct PageSearchStatus PageSearchStatus;
 
@@ -495,6 +515,9 @@ static QemuCond decomp_done_cond;
 static bool do_compress_ram_page(QEMUFile *f, z_stream *stream, RAMBlock *block,
                                  ram_addr_t offset, uint8_t *source_buf);
 
+static void postcopy_preempt_restore(RAMState *rs, PageSearchStatus *pss,
+                                     bool postcopy_requested);
+
 static void *do_data_compress(void *opaque)
 {
     CompressParam *param = opaque;
@@ -1516,8 +1539,12 @@ retry:
  */
 static bool find_dirty_block(RAMState *rs, PageSearchStatus *pss, bool *again)
 {
-    /* This is not a postcopy requested page */
+    /*
+     * This is not a postcopy requested page, mark it "not urgent", and use
+     * precopy channel to send it.
+     */
     pss->postcopy_requested = false;
+    pss->postcopy_target_channel = RAM_CHANNEL_PRECOPY;
 
     pss->page = migration_bitmap_find_dirty(rs, pss->block, pss->page);
     if (pss->complete_round && pss->block == rs->last_seen_block &&
@@ -2038,15 +2065,20 @@ static bool get_queued_page(RAMState *rs, PageSearchStatus *pss)
     RAMBlock  *block;
     ram_addr_t offset;
 
-again:
     block = unqueue_page(rs, &offset);
 
     if (block) {
         /* See comment above postcopy_preempted_contains() */
         if (postcopy_preempted_contains(rs, block, offset)) {
             trace_postcopy_preempt_hit(block->idstr, offset);
-            /* This request is dropped */
-            goto again;
+            /*
+             * If what we preempted previously was exactly what we're
+             * requesting right now, restore the preempted precopy
+             * immediately, boosting its priority as it's requested by
+             * postcopy.
+             */
+            postcopy_preempt_restore(rs, pss, true);
+            return true;
         }
     } else {
         /*
@@ -2070,7 +2102,9 @@ again:
          * really rare.
          */
         pss->complete_round = false;
+        /* Mark it an urgent request, meanwhile using POSTCOPY channel */
         pss->postcopy_requested = true;
+        pss->postcopy_target_channel = RAM_CHANNEL_POSTCOPY;
     }
 
     return !!block;
@@ -2324,7 +2358,8 @@ static bool postcopy_preempt_triggered(RAMState *rs)
     return rs->postcopy_preempt_state.preempted;
 }
 
-static void postcopy_preempt_restore(RAMState *rs, PageSearchStatus *pss)
+static void postcopy_preempt_restore(RAMState *rs, PageSearchStatus *pss,
+                                     bool postcopy_requested)
 {
     PostcopyPreemptState *state = &rs->postcopy_preempt_state;
 
@@ -2332,8 +2367,15 @@ static void postcopy_preempt_restore(RAMState *rs, PageSearchStatus *pss)
 
     pss->block = state->ram_block;
     pss->page = state->ram_page;
-    /* This is not a postcopy request but restoring previous precopy */
-    pss->postcopy_requested = false;
+
+    /* Whether this is a postcopy request? */
+    pss->postcopy_requested = postcopy_requested;
+    /*
+     * When restoring a preempted page, the old data resides in PRECOPY
+     * slow channel, even if postcopy_requested is set.  So always use
+     * PRECOPY channel here.
+     */
+    pss->postcopy_target_channel = RAM_CHANNEL_PRECOPY;
 
     trace_postcopy_preempt_restored(pss->block->idstr, pss->page);
 
@@ -2344,12 +2386,9 @@ static void postcopy_preempt_restore(RAMState *rs, PageSearchStatus *pss)
 static void postcopy_preempt_choose_channel(RAMState *rs, PageSearchStatus *pss)
 {
     MigrationState *s = migrate_get_current();
-    unsigned int channel;
+    unsigned int channel = pss->postcopy_target_channel;
     QEMUFile *next;
 
-    channel = pss->postcopy_requested ?
-        RAM_CHANNEL_POSTCOPY : RAM_CHANNEL_PRECOPY;
-
     if (channel != rs->postcopy_channel) {
         if (channel == RAM_CHANNEL_PRECOPY) {
             next = s->to_dst_file;
@@ -2505,7 +2544,7 @@ static int ram_find_and_save_block(RAMState *rs)
              * preempted precopy.  Otherwise find the next dirty bit.
              */
             if (postcopy_preempt_triggered(rs)) {
-                postcopy_preempt_restore(rs, &pss);
+                postcopy_preempt_restore(rs, &pss, false);
                 found = true;
             } else {
                 /* priority queue empty, so just search for something dirty */
-- 
2.36.1



  parent reply	other threads:[~2022-07-20 11:34 UTC|newest]

Thread overview: 37+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-07-20 11:18 [PULL 00/30] migration queue Dr. David Alan Gilbert (git)
2022-07-20 11:18 ` [PULL 01/30] accel/kvm/kvm-all: Refactor per-vcpu dirty ring reaping Dr. David Alan Gilbert (git)
2022-07-20 11:18 ` [PULL 02/30] cpus: Introduce cpu_list_generation_id Dr. David Alan Gilbert (git)
2022-07-20 11:18 ` [PULL 03/30] migration/dirtyrate: Refactor dirty page rate calculation Dr. David Alan Gilbert (git)
2022-07-20 11:19 ` [PULL 04/30] softmmu/dirtylimit: Implement vCPU dirtyrate calculation periodically Dr. David Alan Gilbert (git)
2022-07-20 11:19 ` [PULL 05/30] accel/kvm/kvm-all: Introduce kvm_dirty_ring_size function Dr. David Alan Gilbert (git)
2022-07-20 11:19 ` [PULL 06/30] softmmu/dirtylimit: Implement virtual CPU throttle Dr. David Alan Gilbert (git)
2022-07-29 13:31   ` Peter Maydell
2022-07-29 14:14     ` Richard Henderson
2022-07-29 15:14       ` Hyman
2022-07-20 11:19 ` [PULL 07/30] softmmu/dirtylimit: Implement dirty page rate limit Dr. David Alan Gilbert (git)
2022-07-20 11:19 ` [PULL 08/30] tests: Add dirty page rate limit test Dr. David Alan Gilbert (git)
2022-08-01 10:41   ` migration test fails for aarch64 target (was: Re: [PULL 08/30] tests: Add dirty page rate limit test) Thomas Huth
2022-08-01 11:20     ` Peter Maydell
2022-07-20 11:19 ` [PULL 09/30] multifd: Copy pages before compressing them with zlib Dr. David Alan Gilbert (git)
2022-07-20 11:19 ` [PULL 10/30] migration: Add postcopy-preempt capability Dr. David Alan Gilbert (git)
2022-07-20 11:19 ` [PULL 11/30] migration: Postcopy preemption preparation on channel creation Dr. David Alan Gilbert (git)
2022-07-20 11:19 ` [PULL 12/30] migration: Postcopy preemption enablement Dr. David Alan Gilbert (git)
2022-07-20 11:19 ` [PULL 13/30] migration: Postcopy recover with preempt enabled Dr. David Alan Gilbert (git)
2022-07-20 11:19 ` [PULL 14/30] migration: Create the postcopy preempt channel asynchronously Dr. David Alan Gilbert (git)
2022-07-20 11:19 ` [PULL 15/30] migration: Add property x-postcopy-preempt-break-huge Dr. David Alan Gilbert (git)
2022-07-20 11:19 ` [PULL 16/30] migration: Add helpers to detect TLS capability Dr. David Alan Gilbert (git)
2022-07-20 11:19 ` [PULL 17/30] migration: Export tls-[creds|hostname|authz] params to cmdline too Dr. David Alan Gilbert (git)
2022-07-20 11:19 ` [PULL 18/30] migration: Enable TLS for preempt channel Dr. David Alan Gilbert (git)
2022-07-20 11:19 ` Dr. David Alan Gilbert (git) [this message]
2022-07-20 11:19 ` [PULL 20/30] tests: Move MigrateCommon upper Dr. David Alan Gilbert (git)
2022-07-20 11:19 ` [PULL 21/30] tests: Add postcopy tls migration test Dr. David Alan Gilbert (git)
2022-07-20 11:19 ` [PULL 22/30] tests: Add postcopy tls recovery " Dr. David Alan Gilbert (git)
2022-07-20 11:19 ` [PULL 23/30] tests: Add postcopy preempt tests Dr. David Alan Gilbert (git)
2022-07-20 11:19 ` [PULL 24/30] migration: remove unreachable code after reading data Dr. David Alan Gilbert (git)
2022-07-20 11:19 ` [PULL 25/30] QIOChannelSocket: Fix zero-copy flush returning code 1 when nothing sent Dr. David Alan Gilbert (git)
2022-07-20 11:19 ` [PULL 26/30] Add dirty-sync-missed-zero-copy migration stat Dr. David Alan Gilbert (git)
2022-07-20 11:19 ` [PULL 27/30] migration/multifd: Report to user when zerocopy not working Dr. David Alan Gilbert (git)
2022-07-20 11:19 ` [PULL 28/30] multifd: Document the locking of MultiFD{Send/Recv}Params Dr. David Alan Gilbert (git)
2022-07-20 11:19 ` [PULL 29/30] migration: Avoid false-positive on non-supported scenarios for zero-copy-send Dr. David Alan Gilbert (git)
2022-07-20 11:19 ` [PULL 30/30] Revert "gitlab: disable accelerated zlib for s390x" Dr. David Alan Gilbert (git)
2022-07-21 10:11 ` [PULL 00/30] migration queue Peter Maydell

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20220720111926.107055-20-dgilbert@redhat.com \
    --to=dgilbert@redhat.com \
    --cc=berrange@redhat.com \
    --cc=huangy81@chinatelecom.cn \
    --cc=iii@linux.ibm.com \
    --cc=leobras@redhat.com \
    --cc=peterx@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=quintela@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.