From: David Hildenbrand <david@redhat.com>
To: Peter Xu <peterx@redhat.com>
Cc: "Eduardo Habkost" <ehabkost@redhat.com>,
	"Michael S. Tsirkin" <mst@redhat.com>,
	"Juan Quintela" <quintela@redhat.com>,
	"Richard Henderson" <richard.henderson@linaro.org>,
	qemu-devel@nongnu.org,
	"Shameerali Kolothum Thodi"
	<shameerali.kolothum.thodi@huawei.com>,
	"Dr. David Alan Gilbert" <dgilbert@redhat.com>,
	"Shannon Zhao" <shannon.zhao@linaro.org>,
	"Paolo Bonzini" <pbonzini@redhat.com>,
	"Igor Mammedov" <imammedo@redhat.com>,
	"Alex Bennée" <alex.bennee@linaro.org>
Subject: Re: [PATCH RFC] memory: Don't allow to resize RAM while migrating
Date: Fri, 14 Feb 2020 17:45:57 +0100	[thread overview]
Message-ID: <e013b2f1-7153-323f-33d7-f6f3af985fe8@redhat.com> (raw)
In-Reply-To: <20200214155623.GA1163818@xz-x1>

>> a) In precopy during the second migration.
>> b) In postcopy during the first migration.
> 
> After reading your reply - even the 1st migration of precopy?  Say,
> when the source QEMU resets and finds changed FW during the precopy?

I think the FW will only change during migration (depending on the other
QEMU version) - but yeah, it might be possible - I'm no expert.

> 
>>
>>>
>>> And is this patch trying to fix/warn when there's a reboot during (3)
>>> so the new size is discovered at a wrong time?  Is my understanding
>>> correct?
>>
>> It's trying to bail out early instead of failing at other random points
>> (with an unclear outcome).
> 
> Yeah, I am just uncertain whether in some cases it could be a
> silent success (when used_length changed but migration still
> completed without a reported error) and now we're changing it to a VM
> crash... Could that happen?
> 
>   - before the patch, when precopy triggers this,
> 
>     - when it didn't encounter an issue with the changed used_length, it
>       could get silently ignored.  Lucky enough & good case.
> 
>     - when it triggered an error, precopy failed.  _However_, we can
>       simply restart... so still not so bad.
> 
>   - after the patch, when precopy detects this, we abort
>     immediately...  Which is really not good...

See the other sub-thread (below): we're thinking about canceling
precopy, which could work just fine.

> 
> If you see, that's the major thing I was worrying about...
> 
> And since used_length is growing in most cases, as you said (at least
> before virtio-mem comes? :), I suspect that could be the major

hah! :) The thing about virtio-mem is that it can actually decide not to
resize during migration (and I have that implemented right now) - ACPI
code can't.
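
For illustration, the virtio-mem approach boils down to a guard like
the following (just a sketch; the helper name is mine, and
migration_is_idle() is the existing QEMU helper I'd expect it to use):

/*
 * Sketch only: a device that controls its own resizes can simply
 * refuse to grow/shrink its RAM block while a migration is in
 * flight and retry once migration is done.
 */
static bool device_can_resize_ram_now(void)
{
    return migration_is_idle();
}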

> case where there will be a silent success.

The thing is, it might not be a silent success but a very strange
error/crash. We have a data race here. But yeah, I agree that at the
very least precopy should not crash.
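
To make the tearing concern concrete (illustration only, not current
QEMU code): a plain load of used_length that races with a resize may
observe a torn or stale value, so reader and writer would both have to
use QEMU's atomic helpers:

/*
 * Illustration: a racy plain load vs. an atomic load of a field
 * that another thread may resize concurrently. atomic_read() and
 * atomic_set() are QEMU's existing helpers for this.
 */
ram_addr_t racy = block->used_length;               /* may tear/reload */
ram_addr_t safe = atomic_read(&block->used_length); /* single atomic load */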

>>>> In the precopy case it would be easier to abort (although, not simple
>>>> AFAIKS), in the postcopy not so easy - because you're already partially
>>>> running on the migration target.
>>>
>>> Prior to this patch, would a precopy still survive such an
>>> accident (asked because I _feel_ like migrating a ramblock with a
>>> smaller used_length to the same ramblock with a bigger used_length
>>> should be fine)?  Or we can stop the precopy and restart.  After this
>>
>> I assume growing the region is the usual case (not shrinking). FW blobs
>> tend to get bigger.
>>
>> Migrating while growing a ram block on the source won't work. The source
>> would try to send a dirty page that's outside of the used_length on the
>> target, making e.g., ram_load_postcopy()/ram_load_precopy() fail with
>> "Illegal RAM offset...".
> 
> Right.
> 
>>
>> In the postcopy case, e.g., ram_dirty_bitmap_reload() will fail in case
>> there is a mismatch between ram block size on source/target.
> 
> IMHO that's an extremely rare case (one example I can think of):
> 
>   - we start a postcopy after a precopy
>   - system reset, noticed a firmware update
>   - we got a network failure, postcopy interrupted
>   - we try to recover a postcopy
> 
> So are you using postcopy recovery?  I'd be surprised if so, because
> then you'd be the first user I know of who really used it, besides QE. :)

One of my strengths is reading code and finding flaws :P
Good to know that this should be "barely" affected for now :)

>> Another issue is if the used_length changes while in ram_save_setup(),
>> just between storing ram_bytes_total_common(true) and storing
>> block->used_length. A mismatch will screw up the migration stream.
> 
> Yes, this seems to be another issue then.  IIUC the ramblocks are
> protected by RCU, and the migration code has always held the read
> lock there, so logically it should see a consistent view of the system
> ramblocks in ram_save_setup().  IMHO the real inconsistency is that
> RCU is not safe enough for changing the used_length of a ramblock.

Yes.
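
For reference, the window is roughly this (paraphrased from
migration/ram.c, not verbatim):

/*
 * Paraphrased sketch of ram_save_setup(): the total is written
 * first, the per-block used_length values afterwards. A resize in
 * between makes the announced total and the block sizes disagree.
 */
qemu_put_be64(f, ram_bytes_total_common(true) | RAM_SAVE_FLAG_MEM_SIZE);
/* <- a used_length change here tears the stream apart */
RAMBLOCK_FOREACH_MIGRATABLE(block) {
    qemu_put_byte(f, strlen(block->idstr));
    qemu_put_buffer(f, (uint8_t *)block->idstr, strlen(block->idstr));
    qemu_put_be64(f, block->used_length);
}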

> 
>>
>> But these are just the immediately visible issues. I am more concerned
>> about used_length changing at random points in time, resulting in more
>> harm. (e.g., non-obvious load-store tearing when accessing the used length)
>>
>> Migration code is inherently racy when it comes to ram block resizes.
>> And that might become more dangerous once we want to size the migration
>> bitmaps smaller (used_length instead of max_length) or disallow access
>> to ram blocks beyond the used_length. Both are things I am working on :)
> 
> Right. Now I start to wonder whether migration is the only special guy
> here.  I noticed at least we've got:
> 
> struct RAMBlockNotifier {
>     void (*ram_block_added)(RAMBlockNotifier *n, void *host, size_t size);
>     void (*ram_block_removed)(RAMBlockNotifier *n, void *host, size_t size);
>     QLIST_ENTRY(RAMBlockNotifier) next;
> };
> 
> I suspect at least all these users could also break in some way if a
> resize happens.

Hah! You should read

https://lore.kernel.org/qemu-devel/20200212134254.11073-1-david@redhat.com/

:)

VFIO is indeed broken on resizes - and fixed in that series (I assume
nobody migrates ...). HAX and SEV simply pin all memory and don't care
about used_length changes. So far the callbacks were called with
max_length, which works but is not extensible.

See my suggestion in

https://lore.kernel.org/qemu-devel/bb33b209-2b15-4bbd-7fe9-3aa813e4c194@redhat.com/

which builds on a ram resize notifier.
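
Concretely, the direction is a resize-aware notifier along these lines
(a sketch; the exact signature in the series may differ):

struct RAMBlockNotifier {
    void (*ram_block_added)(RAMBlockNotifier *n, void *host, size_t size);
    void (*ram_block_removed)(RAMBlockNotifier *n, void *host, size_t size);
    /* New: fired on used_length changes, so users (VFIO, migration, ...)
     * can react instead of having to assume max_length. */
    void (*ram_block_resized)(RAMBlockNotifier *n, void *host,
                              size_t old_size, size_t new_size);
    QLIST_ENTRY(RAMBlockNotifier) next;
};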

> 
>>
>>> patch, it'll crash the source VM (&error_abort specified in
>>> memory_region_ram_resize()), which seems a bit more harsh?
>>
>> There seems to be no easy way to abort migration from outside the
>> migration thread. As Juan said, you actually don't want to fail
>> migration but instead soft-abort migration and continue running the
>> guest on the target after a reset. But that's not easy either.
>>
>> One could think about extending the ram block notifiers to notify
>> migration code before resizes, so that migration code can work around
>> the resize (how is TBD). Not easy either :)
> 
> True.  But as you see, my worry still stands: whether such a patch
> would make things worse by crashing the VM when it could still have a
> chance to survive.  Shall we loosen the penalty even if we want
> to warn the user earlier?

Canceling migration in the precopy case should be fine. Postcopy needs
more thought.

I certainly don't want to live with strange data races in migration code
because "it could work sometimes eventually".

Thanks for all the comments and thoughts!

-- 
Thanks,

David / dhildenb


