Re: [PATCH 04/12] libxenguest: avoid allocating unused deferred-pages bitmap

From: Jan Beulich <jbeulich@suse.com>
To: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: "Wei Liu" <wl@xen.org>, "Roger Pau Monné" <roger.pau@citrix.com>,
	"Juergen Gross" <jgross@suse.com>,
	"George Dunlap" <george.dunlap@citrix.com>,
	"Ian Jackson" <iwj@xenproject.org>,
	"xen-devel@lists.xenproject.org" <xen-devel@lists.xenproject.org>
Subject: Re: [PATCH 04/12] libxenguest: avoid allocating unused deferred-pages bitmap
Date: Mon, 28 Jun 2021 10:47:06 +0200
Message-ID: <3aea7472-6c1d-f786-db5f-ead60eb03240@suse.com>
In-Reply-To: <44825600-c27b-34ac-01b2-1ffb5e0bf0be@citrix.com>

On 25.06.2021 20:08, Andrew Cooper wrote:
> On 25/06/2021 14:19, Jan Beulich wrote:
>> Like for the dirty bitmap, it is unnecessary to allocate the deferred-
>> pages bitmap when all that's ever going to happen is a single all-dirty
>> run.
>>
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>> ---
>> The clearing of the bitmap at the end of suspend_and_send_dirty() also
>> looks unnecessary - am I overlooking anything?
> 
> Yes: Remus and COLO.  You don't want to accumulate successfully-sent
> deferred pages over checkpoints; otherwise you'll eventually be sending
> the entire VM every checkpoint.

Oh, so what I really missed is that save() loops over these
functions.
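To make the accumulation concrete, here is a minimal, self-contained C model. It is not libxenguest code: `struct ckpt_model`, `checkpoint()` and `NR_PFNS` are hypothetical. Each call to `checkpoint()` stands for one Remus/COLO iteration of save(): it sends every page that is dirty or deferred, and the `clear_deferred` flag models the bitmap clearing at the end of suspend_and_send_dirty().

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define NR_PFNS 64   /* hypothetical guest size, in pages */

/* Hypothetical model of the per-stream save state (not the real types). */
struct ckpt_model {
    bool dirty[NR_PFNS];      /* dirtied by the guest since the last checkpoint */
    bool deferred[NR_PFNS];   /* could not be sent earlier, retry pending */
};

/*
 * One checkpoint, loosely following suspend_and_send_dirty(): the deferred
 * pages are merged into the dirty set, that set is sent, and the state is
 * reset.  Returns the number of pages this checkpoint sends.
 */
static unsigned int checkpoint(struct ckpt_model *m, bool clear_deferred)
{
    unsigned int sent = 0;

    for ( unsigned int pfn = 0; pfn < NR_PFNS; pfn++ )
        if ( m->dirty[pfn] || m->deferred[pfn] )
            sent++;

    memset(m->dirty, 0, sizeof(m->dirty));

    /* Models the bitmap_clear() of the deferred pages after a good send. */
    if ( clear_deferred )
        memset(m->deferred, 0, sizeof(m->deferred));

    return sent;
}
```

If every round defers one new page and dirties one page, clearing keeps each checkpoint at two pages, while skipping the clear makes the count grow by one per checkpoint (2, 3, 4, ...), which is the "eventually sending the entire VM" effect.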

> Answering out of patch order...
>> @@ -791,24 +797,31 @@ static int setup(struct xc_sr_context *c
>>  {
>>      xc_interface *xch = ctx->xch;
>>      int rc;
>> -    DECLARE_HYPERCALL_BUFFER_SHADOW(unsigned long, dirty_bitmap,
>> -                                    &ctx->save.dirty_bitmap_hbuf);
>>  
>>      rc = ctx->save.ops.setup(ctx);
>>      if ( rc )
>>          goto err;
>>  
>> -    dirty_bitmap = ctx->save.live || ctx->stream_type != XC_STREAM_PLAIN
>> -        ? xc_hypercall_buffer_alloc_pages(
>> -              xch, dirty_bitmap, NRPAGES(bitmap_size(ctx->save.p2m_size)))
>> -        : (void *)-1L;
>> +    if ( ctx->save.live || ctx->stream_type != XC_STREAM_PLAIN )
>> +    {
>> +        DECLARE_HYPERCALL_BUFFER_SHADOW(unsigned long, dirty_bitmap,
>> +                                        &ctx->save.dirty_bitmap_hbuf);
>> +
>> +        dirty_bitmap =
>> +            xc_hypercall_buffer_alloc_pages(
>> +                xch, dirty_bitmap, NRPAGES(bitmap_size(ctx->save.p2m_size)));
>> +        ctx->save.deferred_pages = bitmap_alloc(ctx->save.p2m_size);
>> +
>> +        if ( !dirty_bitmap || !ctx->save.deferred_pages )
>> +            goto enomem;
>> +    }
> 
> So this is better than the previous patch.  At least we've got a clean
> NULL pointer now.
> 
> I could in principle get on board with the optimisation, except it's not
> safe (see below).
> 
>> --- a/tools/libs/guest/xg_sr_save.c
>> +++ b/tools/libs/guest/xg_sr_save.c
>> @@ -130,7 +130,7 @@ static int write_batch(struct xc_sr_cont
>>                                                        ctx->save.batch_pfns[i]);
>>  
>>          /* Likely a ballooned page. */
>> -        if ( mfns[i] == INVALID_MFN )
>> +        if ( mfns[i] == INVALID_MFN && ctx->save.deferred_pages )
>>          {
>>              set_bit(ctx->save.batch_pfns[i], ctx->save.deferred_pages);
>>              ++ctx->save.nr_deferred_pages;
>> @@ -196,8 +196,12 @@ static int write_batch(struct xc_sr_cont
>>              {
>>                  if ( rc == -1 && errno == EAGAIN )
>>                  {
>> -                    set_bit(ctx->save.batch_pfns[i], ctx->save.deferred_pages);
>> -                    ++ctx->save.nr_deferred_pages;
>> +                    if ( ctx->save.deferred_pages )
>> +                    {
>> +                        set_bit(ctx->save.batch_pfns[i],
>> +                                ctx->save.deferred_pages);
>> +                        ++ctx->save.nr_deferred_pages;
>> +                    }
> 
> These two blocks are the only two which modify deferred_pages.
> 
> It occurs to me that this means deferred_pages is PV-only, because of
> the stub implementations of x86_hvm_pfn_to_gfn() and
> x86_hvm_normalise_page().  Furthermore, this is likely to be true for
> any HVM-like domains even on other architectures.

IOW, are you suggesting we also avoid the allocation for HVM live
migration, thus effectively assuming that the two hooks are just stubs
in that case, which can never fail?
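The allocation condition in the patch, and the narrower one this question is about, can be written as two predicates. The enum and function names below are hypothetical stand-ins (the patch itself tests ctx->save.live and ctx->stream_type != XC_STREAM_PLAIN). The PV-only variant holds only under the assumption discussed above: that the HVM pfn_to_gfn/normalise_page hooks are stubs that can never produce INVALID_MFN or EAGAIN.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the libxenguest values referenced above. */
enum stream_type { STREAM_PLAIN, STREAM_REMUS, STREAM_COLO };
enum guest_type { GUEST_PV, GUEST_HVM };

/* Condition used by the patch: allocate for live or checkpointed saves. */
static bool need_deferred_bitmap_patch(bool live, enum stream_type st)
{
    return live || st != STREAM_PLAIN;
}

/*
 * Hypothetical narrower condition: additionally skip the allocation for
 * HVM guests, ASSUMING their hooks can never cause a page to be deferred.
 */
static bool need_deferred_bitmap_pv_only(bool live, enum stream_type st,
                                         enum guest_type gt)
{
    return need_deferred_bitmap_patch(live, st) && gt == GUEST_PV;
}
```

The second predicate bakes the "hooks never fail" assumption into the allocation decision, which is exactly what makes it fragile if those hooks ever stop being stubs.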

> If these instead were hard errors when !deferred_pages, then that would at
> least get the logic into an acceptable state.

But the goal here isn't to change the logic, just to avoid allocating
memory that's effectively never used. What you suggest could be a
separate patch, yes, but I'm afraid I don't understand well enough why
you think this needs changing, so I'd prefer to leave such a change to
you. (If I were to guess at what you mean, I'd deduce that you think
->nr_deferred_pages may still need maintaining, with a non-zero value
at the end of the last step causing migration to fail. But even then
there would be no need for the bitmap itself in the cases where it no
longer gets allocated.)
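The two alternatives being discussed (keep counting without a bitmap, or treat a missing bitmap as a hard error) can be sketched side by side. This is a hypothetical simplification: `struct save_ctx` only mirrors the two fields named in the thread, and `defer_page()`, `defer_page_strict()` and `check_no_deferred()` are made-up helpers, not existing libxenguest functions.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical, heavily simplified stand-in for the save context. */
struct save_ctx {
    unsigned long *deferred_pages;    /* NULL when the bitmap isn't allocated */
    unsigned long nr_deferred_pages;
};

/*
 * Variant 1 (the guess above): keep counting deferred pages even without
 * a bitmap, so the final step can still notice them.
 */
static void defer_page(struct save_ctx *ctx, unsigned long pfn)
{
    assert(ctx);
    if ( ctx->deferred_pages )
        ctx->deferred_pages[pfn / BITS_PER_LONG] |=
            1UL << (pfn % BITS_PER_LONG);
    ++ctx->nr_deferred_pages;
}

/*
 * Variant 2 (the review suggestion): with no bitmap, a page that would need
 * deferring is a hard error.  Returns 0 on success, -1 on error.
 */
static int defer_page_strict(struct save_ctx *ctx, unsigned long pfn)
{
    if ( !ctx->deferred_pages )
        return -1;

    defer_page(ctx, pfn);
    return 0;
}

/* Hypothetical check after the last sending step: leftovers fail the save. */
static int check_no_deferred(const struct save_ctx *ctx)
{
    return ctx->nr_deferred_pages ? -1 : 0;
}
```

In variant 1 the counter alone is enough to fail the save at the end, so the bitmap is still only needed where deferred pages are later re-sent.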

> However, the first hunk demonstrates that deferred_pages gets used even
> in the non-live case.  In particular, it is sensitive to errors in the
> guest's handling of its own P2M.  Also, I can't obviously spot anything
> which will correctly fail migration if deferred pages survive the final
> iteration.

How does the first hunk demonstrate this? The question isn't when
the bitmap gets updated, but under what conditions it gets consumed.
If the only sending function ever called is suspend_and_send_dirty(),
then nothing will have had a chance to set any bit before it runs. And
any bits set while suspend_and_send_dirty() is running get cleared
again at its end.
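The difference between updating and consuming the bitmap can be modelled in a few lines of C. This is a hypothetical sketch, not the libxenguest code: `struct deferred_model`, `defer()` and `final_pass()` are made up, and `final_pass()` only loosely follows suspend_and_send_dirty() (merge deferred pages, send, clear). The `nr_consumed` counter records how many deferred bits ever influenced what gets sent.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define NR_PFNS 16   /* hypothetical guest size, in pages */

/* Hypothetical model: the deferred bitmap plus a count of bits ever read. */
struct deferred_model {
    bool bit[NR_PFNS];
    unsigned int nr_consumed;   /* deferred bits that fed into a send */
};

/* Write side, cf. write_batch(): remember a page that couldn't be sent. */
static void defer(struct deferred_model *m, unsigned int pfn)
{
    assert(pfn < NR_PFNS);
    m->bit[pfn] = true;
}

/*
 * Read side: deferred pages are merged into the send set first; a page may
 * fail (and be deferred) while sending; the bitmap is cleared when the pass
 * completes.  Pass fail_pfn >= NR_PFNS for a pass in which nothing fails.
 */
static void final_pass(struct deferred_model *m, unsigned int fail_pfn)
{
    for ( unsigned int pfn = 0; pfn < NR_PFNS; pfn++ )
        if ( m->bit[pfn] )
            m->nr_consumed++;

    if ( fail_pfn < NR_PFNS )
        defer(m, fail_pfn);

    memset(m->bit, 0, sizeof(m->bit));
}
```

When the final pass is the only one, a page deferred during it is cleared before anything reads it, so nr_consumed stays 0; only bits set by an earlier pass are consumed. In this model the page deferred during the final pass is also dropped without any error, which is the concern about deferred pages surviving the final iteration.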

Jan