qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Peter Xu <peterx@redhat.com>
To: David Hildenbrand <dhildenb@redhat.com>
Cc: "Eduardo Habkost" <ehabkost@redhat.com>,
	"Juan Quintela" <quintela@redhat.com>,
	"Michael S. Tsirkin" <mst@redhat.com>,
	"Richard Henderson" <richard.henderson@linaro.org>,
	"Dr. David Alan Gilbert" <dgilbert@redhat.com>,
	"Shameerali Kolothum Thodi"
	<shameerali.kolothum.thodi@huawei.com>,
	qemu-devel@nongnu.org, "Shannon Zhao" <shannon.zhao@linaro.org>,
	"Paolo Bonzini" <pbonzini@redhat.com>,
	"Igor Mammedov" <imammedo@redhat.com>,
	"David Hildenbrand" <david@redhat.com>,
	"Alex Bennée" <alex.bennee@linaro.org>
Subject: Re: [PATCH RFC] memory: Don't allow to resize RAM while migrating
Date: Fri, 14 Feb 2020 15:38:57 -0500	[thread overview]
Message-ID: <20200214203857.GA1195634@xz-x1> (raw)
In-Reply-To: <A5C9F372-A9A6-4D6C-8C08-798F4ED15C10@redhat.com>

On Fri, Feb 14, 2020 at 03:04:23PM -0500, David Hildenbrand wrote:
> 
> 
> > Am 14.02.2020 um 20:45 schrieb Peter Xu <peterx@redhat.com>:
> > 
> > On Fri, Feb 14, 2020 at 07:26:59PM +0100, David Hildenbrand wrote:
> >>>>>> +    if (!postcopy_is_running()) {
> >>>>>> +        Error *err = NULL;
> >>>>>> +
> >>>>>> +        /*
> >>>>>> +         * Precopy code cannot deal with the size of ram blocks changing at
> >>>>>> +         * random points in time. We're still running on the source, abort
> >>>>>> +         * the migration and continue running here. Make sure to wait until
> >>>>>> +         * migration was canceled.
> >>>>>> +         */
> >>>>>> +        error_setg(&err, "RAM resized during precopy.");
> >>>>>> +        migrate_set_error(migrate_get_current(), err);
> >>>>>> +        error_free(err);
> >>>>>> +        migration_cancel();
> >>>>>> +    } else {
> >>>>>> +        /*
> >>>>>> +         * Postcopy code cannot deal with the size of ram blocks changing at
> >>>>>> +         * random points in time. We're running on the target. Fail hard.
> >>>>>> +         *
> >>>>>> +         * TODO: How to handle this in a better way?
> >>>>>> +         */
> >>>>>> +        error_report("RAM resized during postcopy.");
> >>>>>> +        exit(-1);
> >>>>> 
> >>>>> Now I'm rethinking the postcopy case....
> >>>>> 
> >>>>> ram_dirty_bitmap_reload() should only happen during the postcopy
> >>>>> recovery, and when that happens the VM should be stopped on both
> >>>>> sides.  Which means, ram resizing should not trigger there...
> >>>> 
> >>>> But that guest got the chance to run for a bit and eventually reboot
> >>>> AFAIK. Also, there are other data races possible when used_length
> >>>> suddenly changes, this is just the most obvious one where things will;
> >>>> get screwed up.
> >>> 
> >>> Right, the major one could be in ram_load_postcopy() where we call
> >>> host_from_ram_block_offset().  However if FW extension is the major
> >>> use case then it seems to still work (still better than crashing,
> >>> isn't it? :).
> >> 
> >> "Let's close our eyes and hope it will never happen" ? :) No, I don't
> >> like that. This screams for a better solution long term, and until then
> >> a proper fencing IMHO. We're making here wild guesses about data races
> >> and why they might not be that bad in certain cases (did I mention
> >> load/store tearing? used_length is not an atomic value ...).
> > 
> > Yeah fencing is good, but crashing a VM while it wasn't going to crash
> > is another thing, imho.  You can dump an error message if you really
> > like, but instead of exit() I really prefer we either still let the
> > old way to at least work or really fix it.
> 
> I‘ll do whatever Juan/Dave think is best. I am not convinced that there is no way to corrupt data or crash later when the guest is already running again post-reboot and doing real work.

Yeah I never said it will always work. :)

However it does not mean it'll break every time.  My guess is that for
the happened cases it might still survive quite a few, confessing that
is without much clue.  I just prefer to avoid having an explicit patch
to bail out like that, because it doesn't really help that much by
crashing earlier.

That's something I learnt when I started to work on migration, that
is, we don't call exit() on source VM when we really, really needed
to.  For postcopy, it's the destination VM that matters here.

Yeh not a big deal since this is really corner case even if it
happened.  Let's follow the maintainers' judgement.

Thanks,

-- 
Peter Xu



      reply	other threads:[~2020-02-14 20:39 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-02-13 17:20 [PATCH RFC] memory: Don't allow to resize RAM while migrating David Hildenbrand
2020-02-13 18:32 ` Peter Xu
2020-02-13 19:42   ` David Hildenbrand
2020-02-13 20:56     ` Peter Xu
2020-02-14  9:17       ` David Hildenbrand
2020-02-14 15:56         ` Peter Xu
2020-02-14 16:45           ` David Hildenbrand
2020-02-13 19:09 ` Juan Quintela
2020-02-13 19:38   ` David Hildenbrand
2020-02-14 10:25 ` Dr. David Alan Gilbert
2020-02-14 10:32   ` David Hildenbrand
2020-02-14 10:42     ` Dr. David Alan Gilbert
2020-02-14 10:46       ` David Hildenbrand
2020-02-14 11:02         ` Dr. David Alan Gilbert
2020-02-14 11:08           ` David Hildenbrand
2020-02-14 12:02             ` David Hildenbrand
2020-02-14 12:46               ` Juan Quintela
2020-02-14 14:00                 ` David Hildenbrand
2020-02-14 15:14                   ` Dr. David Alan Gilbert
2020-02-14 17:29               ` Peter Xu
2020-02-14 17:32                 ` David Hildenbrand
2020-02-14 18:11                   ` Peter Xu
2020-02-14 18:26                     ` David Hildenbrand
2020-02-14 19:44                       ` Peter Xu
2020-02-14 20:04                         ` David Hildenbrand
2020-02-14 20:38                           ` Peter Xu [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200214203857.GA1195634@xz-x1 \
    --to=peterx@redhat.com \
    --cc=alex.bennee@linaro.org \
    --cc=david@redhat.com \
    --cc=dgilbert@redhat.com \
    --cc=dhildenb@redhat.com \
    --cc=ehabkost@redhat.com \
    --cc=imammedo@redhat.com \
    --cc=mst@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=quintela@redhat.com \
    --cc=richard.henderson@linaro.org \
    --cc=shameerali.kolothum.thodi@huawei.com \
    --cc=shannon.zhao@linaro.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).