From: Jan Beulich <jbeulich@suse.com>
To: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: "Wei Liu" <wl@xen.org>, "Roger Pau Monné" <roger.pau@citrix.com>,
	"Juergen Gross" <jgross@suse.com>,
	"George Dunlap" <george.dunlap@citrix.com>,
	"Ian Jackson" <iwj@xenproject.org>,
	"xen-devel@lists.xenproject.org" <xen-devel@lists.xenproject.org>
Subject: Re: [PATCH 04/12] libxenguest: avoid allocating unused deferred-pages bitmap
Date: Mon, 28 Jun 2021 10:47:06 +0200
Message-ID: <3aea7472-6c1d-f786-db5f-ead60eb03240@suse.com>
In-Reply-To: <44825600-c27b-34ac-01b2-1ffb5e0bf0be@citrix.com>

On 25.06.2021 20:08, Andrew Cooper wrote:
> On 25/06/2021 14:19, Jan Beulich wrote:
>> Like for the dirty bitmap, it is unnecessary to allocate the deferred-
>> pages bitmap when all that's ever going to happen is a single all-dirty
>> run.
>>
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>> ---
>> The clearing of the bitmap at the end of suspend_and_send_dirty() also
>> looks unnecessary - am I overlooking anything?
> 
> Yes. Remus and COLO.  You don't want to accumulate successfully-sent
> deferred pages over checkpoints, otherwise you'll eventually be sending
> the entire VM every checkpoint.

Oh, so what I've really missed is save() being a loop over these
functions.
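
To spell out the accumulation effect, here is a toy model, not the
libxenguest code. All names (toy_save, toy_checkpoint, toy_popcount) are
made up for illustration, and the bitmap is shrunk to a single 64-bit word.
It only assumes that each checkpoint sends the union of the dirty and
deferred sets, and either clears the deferred set afterwards or doesn't.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model, not libxenguest code: one bit per pfn, at most 64 pfns. */
struct toy_save {
    uint64_t deferred;   /* stands in for ctx->save.deferred_pages */
};

/* Count the set bits in v. */
static unsigned int toy_popcount(uint64_t v)
{
    unsigned int n = 0;

    for ( ; v; v &= v - 1 )
        ++n;

    return n;
}

/*
 * One checkpoint: pages deferred so far are merged with the dirty set and
 * sent.  Returns how many pages were sent.  With clear_at_end == false the
 * deferred set keeps growing from one checkpoint to the next.
 */
static unsigned int toy_checkpoint(struct toy_save *s, uint64_t dirty,
                                   uint64_t newly_deferred, bool clear_at_end)
{
    uint64_t to_send;

    assert(s != NULL);

    s->deferred |= newly_deferred;
    to_send = dirty | s->deferred;

    if ( clear_at_end )
        s->deferred = 0;

    return toy_popcount(to_send);
}
```

Running two checkpoints with one dirty and one deferred page each, the
non-clearing variant re-sends the first checkpoint's deferred page on the
second round, while the clearing variant does not.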

> Answering out of patch order...
>> @@ -791,24 +797,31 @@ static int setup(struct xc_sr_context *c
>>  {
>>      xc_interface *xch = ctx->xch;
>>      int rc;
>> -    DECLARE_HYPERCALL_BUFFER_SHADOW(unsigned long, dirty_bitmap,
>> -                                    &ctx->save.dirty_bitmap_hbuf);
>>  
>>      rc = ctx->save.ops.setup(ctx);
>>      if ( rc )
>>          goto err;
>>  
>> -    dirty_bitmap = ctx->save.live || ctx->stream_type != XC_STREAM_PLAIN
>> -        ? xc_hypercall_buffer_alloc_pages(
>> -              xch, dirty_bitmap, NRPAGES(bitmap_size(ctx->save.p2m_size)))
>> -        : (void *)-1L;
>> +    if ( ctx->save.live || ctx->stream_type != XC_STREAM_PLAIN )
>> +    {
>> +        DECLARE_HYPERCALL_BUFFER_SHADOW(unsigned long, dirty_bitmap,
>> +                                        &ctx->save.dirty_bitmap_hbuf);
>> +
>> +        dirty_bitmap =
>> +            xc_hypercall_buffer_alloc_pages(
>> +                xch, dirty_bitmap, NRPAGES(bitmap_size(ctx->save.p2m_size)));
>> +        ctx->save.deferred_pages = bitmap_alloc(ctx->save.p2m_size);
>> +
>> +        if ( !dirty_bitmap || !ctx->save.deferred_pages )
>> +            goto enomem;
>> +    }
> 
> So this is better than the previous patch.  At least we've got a clean
> NULL pointer now.
> 
> I could in principle get on board with the optimisation, except it's not
> safe (see below).
> 
>> --- a/tools/libs/guest/xg_sr_save.c
>> +++ b/tools/libs/guest/xg_sr_save.c
>> @@ -130,7 +130,7 @@ static int write_batch(struct xc_sr_cont
>>                                                        ctx->save.batch_pfns[i]);
>>  
>>          /* Likely a ballooned page. */
>> -        if ( mfns[i] == INVALID_MFN )
>> +        if ( mfns[i] == INVALID_MFN && ctx->save.deferred_pages )
>>          {
>>              set_bit(ctx->save.batch_pfns[i], ctx->save.deferred_pages);
>>              ++ctx->save.nr_deferred_pages;
>> @@ -196,8 +196,12 @@ static int write_batch(struct xc_sr_cont
>>              {
>>                  if ( rc == -1 && errno == EAGAIN )
>>                  {
>> -                    set_bit(ctx->save.batch_pfns[i], ctx->save.deferred_pages);
>> -                    ++ctx->save.nr_deferred_pages;
>> +                    if ( ctx->save.deferred_pages )
>> +                    {
>> +                        set_bit(ctx->save.batch_pfns[i],
>> +                                ctx->save.deferred_pages);
>> +                        ++ctx->save.nr_deferred_pages;
>> +                    }
> 
> These two blocks are the only two which modify deferred_pages.
> 
> It occurs to me that this means deferred_pages is PV-only, because of
> the stub implementations of x86_hvm_pfn_to_gfn() and
> x86_hvm_normalise_page().  Furthermore, this is likely to be true for
> any HVM-like domains even on other architectures.

IOW are you suggesting to also avoid allocation for HVM live
migration, thus effectively making assumptions on the two hooks
being just stubs in that case, which can't ever fail?

> If these instead were hard errors when !deferred_pages, then that would
> at least get the logic into an acceptable state.

But the goal here isn't to change the logic, just to avoid allocating
memory that's effectively never used. What you suggest could be a
separate patch, yes, but I'm afraid I'm not feeling confident enough
in understanding why you think this needs changing, so I'd prefer to
leave such a change to you. (If I was to apply some guessing to what
you may mean, I could deduce that you think ->nr_deferred_pages may
still need maintaining, with it being non-zero at the end of the last
step causing migration to fail. But there would then still not be any
need for the bitmap itself in the cases where it no longer gets
allocated.)
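
Purely to illustrate that guess, a sketch of what "counter without bitmap"
could look like. This is not proposed code: toy_defer_page() and
toy_final_check() are hypothetical helpers, and treating a non-zero count
at the end as a failure is an assumption about what you may have in mind.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define TOY_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/*
 * Hypothetical helper, not part of libxenguest: note one deferred page.
 * 'bitmap' may be NULL when no deferred-pages bitmap was allocated; the
 * counter is bumped either way.
 */
static void toy_defer_page(unsigned long *bitmap, unsigned long *nr_deferred,
                           unsigned long pfn)
{
    assert(nr_deferred != NULL);

    if ( bitmap )
        bitmap[pfn / TOY_BITS_PER_LONG] |= 1UL << (pfn % TOY_BITS_PER_LONG);

    ++*nr_deferred;
}

/* Hypothetical end-of-migration check: leftover deferred pages are an error. */
static int toy_final_check(unsigned long nr_deferred)
{
    return nr_deferred ? -1 : 0;
}
```

The point being that the count can be kept up to date independently of
whether the bitmap exists.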

> However, the first hunk demonstrates that deferred_pages gets used even
> in the non-live case.  In particular, it is sensitive to errors with the
> guest's handling of its own P2M.  Also, I can't obviously spot anything
> which will correctly fail migration if deferred pages survive the final
> iteration.

How does the first hunk demonstrate this? The question isn't when
the bitmap gets updated, but under what conditions it gets consumed.
If the only sending function ever called is suspend_and_send_dirty(),
then nothing would ever have had a chance to set any bit. And any
bits set in the course of suspend_and_send_dirty() running will get
cleared again at the end of suspend_and_send_dirty().

Jan



