From: Peter Xu <peterx@redhat.com>
To: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: qemu-devel@nongnu.org, Manish Mishra <manish.mishra@nutanix.com>,
	Juan Quintela <quintela@redhat.com>,
	ani@anisinha.ca,
	Leonardo Bras Soares Passos <lsoaresp@redhat.com>,
	"Daniel P . Berrange" <berrange@redhat.com>
Subject: Re: [PATCH 05/14] migration: Yield bitmap_mutex properly when sending/sleeping
Date: Wed, 5 Oct 2022 15:48:30 -0400
Message-ID: <Yz3fjhBSNRuq/PjS@x1n>
In-Reply-To: <Yz2JZT55uhTdP7+m@x1n>

On Wed, Oct 05, 2022 at 09:40:53AM -0400, Peter Xu wrote:
> On Wed, Oct 05, 2022 at 12:18:00PM +0100, Dr. David Alan Gilbert wrote:
> > * Peter Xu (peterx@redhat.com) wrote:
> > > On Tue, Oct 04, 2022 at 02:55:10PM +0100, Dr. David Alan Gilbert wrote:
> > > > * Peter Xu (peterx@redhat.com) wrote:
> > > > > Don't take the bitmap mutex when sending pages, or when being throttled by
> > > > > migration_rate_limit() (which is a bit tricky to call here in ram code,
> > > > > but still seems helpful).
> > > > > 
> > > > > This prepares for concurrently sending pages from more than one thread
> > > > > via ram_save_host_page().  All of those threads may need bitmap_mutex
> > > > > to operate on the bitmaps, so dropping it ensures that a sendmsg() or
> > > > > any kind of qemu_sem_wait() blocking in one thread will not stop the
> > > > > others from making progress.
> > > > > 
> > > > > Signed-off-by: Peter Xu <peterx@redhat.com>
> > > > 
> > > > I generally don't like taking locks conditionally, but this looks
> > > > OK; I think it needs a big comment at the start of the function saying
> > > > that it's entered and left with the lock held, but that it might drop
> > > > it temporarily.
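As a rough illustration of the pattern under discussion (not the actual patch), the sketch below models a ram_save_host_page()-style loop in plain C with pthreads and carries the kind of header comment suggested above: the function is entered and left with the bitmap mutex held, and drops it only around the step that may block. struct ram_state, clear_dirty(), send_page(), NPAGES and the yield parameter are hypothetical stand-ins for RAMState, migration_bitmap_clear_dirty(), the sendmsg()/rate-limit path and postcopy_preempt_active().

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define NPAGES 16   /* hypothetical bitmap size */

/* Hypothetical, simplified stand-in for QEMU's RAMState. */
struct ram_state {
    pthread_mutex_t bitmap_mutex;   /* protects dirty[] */
    bool dirty[NPAGES];             /* one flag per target page */
    size_t pages_sent;              /* only touched by the sending thread */
};

/* Hypothetical stand-in for the sendmsg()/rate-limit path that may block. */
static void send_page(struct ram_state *rs, size_t page)
{
    (void)page;
    rs->pages_sent++;
}

/* Test-and-clear one dirty flag; caller must hold bitmap_mutex. */
static bool clear_dirty(struct ram_state *rs, size_t page)
{
    bool was_dirty = rs->dirty[page];

    rs->dirty[page] = false;
    return was_dirty;
}

/*
 * Send every dirty page in [start, end).
 *
 * Locking: called with rs->bitmap_mutex held and returns with it held.
 * When @yield is true, the mutex is dropped temporarily around the step
 * that may block, so other threads can use the bitmap meanwhile.
 * @yield is fixed for the whole call, so every unlock has a matching lock.
 */
static int save_host_page(struct ram_state *rs, size_t start, size_t end,
                          bool yield)
{
    int pages = 0;

    assert(start <= end && end <= NPAGES);
    for (size_t page = start; page < end; page++) {
        bool page_dirty = clear_dirty(rs, page);

        if (yield) {
            pthread_mutex_unlock(&rs->bitmap_mutex);
        }
        if (page_dirty) {
            send_page(rs, page);
            pages++;
        }
        if (yield) {
            pthread_mutex_lock(&rs->bitmap_mutex);
        }
    }
    return pages;
}
```

With yield set to false the mutex stays held for the whole call, which corresponds to keeping the lock in one shot for precopy.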
> > > 
> > > Right, the code is slightly hard to read; I just haven't seen a good and
> > > easy solution for it yet.  It's just that we may still want to hold the
> > > lock for as long as possible in one shot for precopy.
> > > 
> > > > 
> > > > > ---
> > > > >  migration/ram.c | 42 +++++++++++++++++++++++++++++++-----------
> > > > >  1 file changed, 31 insertions(+), 11 deletions(-)
> > > > > 
> > > > > diff --git a/migration/ram.c b/migration/ram.c
> > > > > index 8303252b6d..6e7de6087a 100644
> > > > > --- a/migration/ram.c
> > > > > +++ b/migration/ram.c
> > > > > @@ -2463,6 +2463,7 @@ static void postcopy_preempt_reset_channel(RAMState *rs)
> > > > >   */
> > > > >  static int ram_save_host_page(RAMState *rs, PageSearchStatus *pss)
> > > > >  {
> > > > > +    bool page_dirty, release_lock = postcopy_preempt_active();
> > > > 
> > > > Could you rename that to something like 'drop_lock' - you are taking the
> > > > lock at the end even when you have 'release_lock' set - which is a bit
> > > > strange naming.
> > > 
> > > Is there any difference between "drop" and "release"?  I'll change the name
> > > anyway since I definitely trust you on any English comments, but please
> > > still let me know - I'd love to learn more about those! :)
> > 
> > I'm not sure 'drop' is much better either; I was struggling to find a
> > good name.
> 
> I can also call it "preempt_enabled".
> 
> Actually I could directly replace it with a call to postcopy_preempt_active()
> every time, but I just want to make it crystal clear that the value doesn't
> change and that lock & unlock are always paired.  In our case I don't think
> it changes, but the variable makes it 100% sure there's no possible bug,
> e.g. a deadlock caused by the state changing.
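To make the pairing argument concrete, here is a hypothetical single-threaded sketch that contrasts re-querying the mode flag at unlock and relock time with snapshotting it once into a local. All names (preempt_flag, preempt_active(), bitmap_lock(), bitmap_unlock(), lock_depth, send_page()) are made up for illustration, and the "locks" only count depth rather than block. The flag flip inside send_page() models the kind of state change the local variable guards against; as noted above, it is not expected to happen in practice.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mode flag standing in for postcopy_preempt_active(). */
static bool preempt_flag;

static bool preempt_active(void)
{
    return preempt_flag;
}

/*
 * Hypothetical lock wrappers that only count the lock depth, so an
 * unbalanced unlock/lock pair is observable in a single thread.
 */
static int lock_depth;

static void bitmap_lock(void)
{
    lock_depth++;
}

static void bitmap_unlock(void)
{
    assert(lock_depth > 0);
    lock_depth--;
}

/* Hypothetical send step; flips the flag to model a mid-call state change. */
static void send_page(void)
{
    preempt_flag = !preempt_flag;
}

/* Re-queries the mode before unlocking and again before relocking. */
static void send_requery(void)
{
    if (preempt_active()) {
        bitmap_unlock();
    }
    send_page();
    if (preempt_active()) {
        bitmap_lock();
    }
}

/* Snapshots the mode once, so unlock and relock always pair up. */
static void send_snapshot(void)
{
    const bool yield = preempt_active();

    if (yield) {
        bitmap_unlock();
    }
    send_page();
    if (yield) {
        bitmap_lock();
    }
}
```

If the flag flips mid-call, send_requery() either releases the lock without retaking it or takes it twice, while send_snapshot() always returns at the depth it started with.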
> 
> > 
> > > > 
> > > > >      int tmppages, pages = 0;
> > > > >      size_t pagesize_bits =
> > > > >          qemu_ram_pagesize(pss->block) >> TARGET_PAGE_BITS;
> > > > > @@ -2486,22 +2487,41 @@ static int ram_save_host_page(RAMState *rs, PageSearchStatus *pss)
> > > > >              break;
> > > > >          }
> > > > >  
> > > > > +        page_dirty = migration_bitmap_clear_dirty(rs, pss->block, pss->page);
> > > > > +        /*
> > > > > +         * Properly yield the lock only in postcopy preempt mode because
> > > > > +         * both migration thread and rp-return thread can operate on the
> > > > > +         * bitmaps.
> > > > > +         */
> > > > > +        if (release_lock) {
> > > > > +            qemu_mutex_unlock(&rs->bitmap_mutex);
> > > > > +        }
> > > > 
> > > > Shouldn't the unlock/lock move inside the 'if (page_dirty) {' ?
> > > 
> > > I think we can move it in there, but it may not be as optimal as keeping
> > > it as-is.
> > >
> > > Consider a case where the bitmap has a run of continuous zero bits.
> > > During postcopy, the migration thread could be spinning here with the lock
> > > held even though it doesn't send anything.  That could still block the
> > > return path thread from sending urgent pages that lie outside the zero
> > > zones.
> > 
> > OK, that reason needs commenting then - you're going to do a lot of
> > release/take pairs in that case, which is going to show up as very hot;
> > so hmm, if it was just for that type of 'yield' behaviour you wouldn't
> > normally do it for each bit.
> 
> Hold on... I think the case I assumed won't trigger easily, because at the
> end of the loop we'll look for the next "dirty" page.  So a run of clean
> pages is unlikely; I even think it's impossible, because we hold the mutex
> during both scanning and clear-dirty, so no one else is able to flip the
> bit.
> 
> So yeah, I think it's okay to move it into "page_dirty", but since we'll
> almost always go into the dirty branch, maybe it won't help much either,
> because it'll be mostly the same as keeping it outside?

IOW, maybe I should drop page_dirty entirely and replace it with a check
that fails migration if migration_bitmap_clear_dirty() returns false?
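A minimal sketch of that alternative follows. It assumes, as argued above, that the scan for the next dirty page and the clear both run under a single hold of the mutex. The page_dirty flag disappears, and a failed clear is treated as an error rather than a skip. All names (struct ram_state, find_next_dirty(), clear_dirty(), send_page(), NPAGES) are hypothetical stand-ins. Returning -1 stands in for failing the migration, and for brevity the mutex is always dropped around send_page(), as if preempt mode were on.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define NPAGES 8   /* hypothetical bitmap size */

/* Hypothetical, simplified state: a dirty bitmap guarded by a mutex. */
struct ram_state {
    pthread_mutex_t bitmap_mutex;   /* protects dirty[] */
    bool dirty[NPAGES];
    size_t pages_sent;
};

/* First dirty page at or after @start, or NPAGES; caller holds the mutex. */
static size_t find_next_dirty(const struct ram_state *rs, size_t start)
{
    while (start < NPAGES && !rs->dirty[start]) {
        start++;
    }
    return start;
}

/* Test-and-clear one dirty flag; caller holds the mutex. */
static bool clear_dirty(struct ram_state *rs, size_t page)
{
    bool was_dirty = rs->dirty[page];

    rs->dirty[page] = false;
    return was_dirty;
}

/* Hypothetical stand-in for the blocking send path. */
static void send_page(struct ram_state *rs, size_t page)
{
    (void)page;
    rs->pages_sent++;
}

/*
 * Send all dirty pages in [start, end).  Called and returns with
 * bitmap_mutex held; drops it only around send_page().  Returns the
 * number of pages sent, or -1 if a page found dirty could not be
 * cleared, which would mean the bitmap changed without the lock.
 */
static int save_host_page(struct ram_state *rs, size_t start, size_t end)
{
    int pages = 0;
    size_t page;

    assert(start <= end && end <= NPAGES);
    page = find_next_dirty(rs, start);
    while (page < end) {
        /* Scan and clear happen under the same lock hold. */
        if (!clear_dirty(rs, page)) {
            return -1;
        }
        pthread_mutex_unlock(&rs->bitmap_mutex);
        send_page(rs, page);
        pthread_mutex_lock(&rs->bitmap_mutex);
        pages++;
        page = find_next_dirty(rs, page + 1);
    }
    return pages;
}
```

Because the bit is only ever tested and cleared while the lock is held, the -1 path acts purely as an assertion-style sanity check rather than a normal branch.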

-- 
Peter Xu




Thread overview: 44+ messages
2022-09-20 22:50 [PATCH 00/14] migration: Postcopy Preempt-Full Peter Xu
2022-09-20 22:50 ` [PATCH 01/14] migration: Add postcopy_preempt_active() Peter Xu
2022-09-20 22:50 ` [PATCH 02/14] migration: Cleanup xbzrle zero page cache update logic Peter Xu
2022-10-04 10:33   ` Dr. David Alan Gilbert
2022-09-20 22:50 ` [PATCH 03/14] migration: Trivial cleanup save_page_header() on same block check Peter Xu
2022-10-04 10:41   ` Dr. David Alan Gilbert
2022-09-20 22:50 ` [PATCH 04/14] migration: Remove RAMState.f references in compression code Peter Xu
2022-10-04 10:54   ` Dr. David Alan Gilbert
2022-10-04 14:36     ` Peter Xu
2022-09-20 22:52 ` [PATCH 05/14] migration: Yield bitmap_mutex properly when sending/sleeping Peter Xu
2022-10-04 13:55   ` Dr. David Alan Gilbert
2022-10-04 19:13     ` Peter Xu
2022-10-05 11:18       ` Dr. David Alan Gilbert
2022-10-05 13:40         ` Peter Xu
2022-10-05 19:48           ` Peter Xu [this message]
2022-09-20 22:52 ` [PATCH 06/14] migration: Use atomic ops properly for page accountings Peter Xu
2022-10-04 16:59   ` Dr. David Alan Gilbert
2022-10-04 19:23     ` Peter Xu
2022-10-05 11:38       ` Dr. David Alan Gilbert
2022-10-05 13:53         ` Peter Xu
2022-10-06 20:40           ` Peter Xu
2022-09-20 22:52 ` [PATCH 07/14] migration: Teach PSS about host page Peter Xu
2022-10-05 11:12   ` Dr. David Alan Gilbert
2022-09-20 22:52 ` [PATCH 08/14] migration: Introduce pss_channel Peter Xu
2022-10-05 13:03   ` Dr. David Alan Gilbert
2022-09-20 22:52 ` [PATCH 09/14] migration: Add pss_init() Peter Xu
2022-10-05 13:09   ` Dr. David Alan Gilbert
2022-09-20 22:52 ` [PATCH 10/14] migration: Make PageSearchStatus part of RAMState Peter Xu
2022-10-05 18:51   ` Dr. David Alan Gilbert
2022-10-05 19:41     ` Peter Xu
2022-10-06  8:36       ` Dr. David Alan Gilbert
2022-10-06  8:37   ` Dr. David Alan Gilbert
2022-09-20 22:52 ` [PATCH 11/14] migration: Move last_sent_block into PageSearchStatus Peter Xu
2022-10-06 16:59   ` Dr. David Alan Gilbert
2022-10-06 18:34     ` Peter Xu
2022-10-06 18:38       ` Dr. David Alan Gilbert
2022-09-20 22:52 ` [PATCH 12/14] migration: Send requested page directly in rp-return thread Peter Xu
2022-10-06 17:51   ` Dr. David Alan Gilbert
2022-09-20 22:52 ` [PATCH 13/14] migration: Remove old preempt code around state maintainance Peter Xu
2022-09-21  0:47   ` Peter Xu
2022-09-21 13:54     ` Peter Xu
2022-10-06 17:56       ` Dr. David Alan Gilbert
2022-09-20 22:52 ` [PATCH 14/14] migration: Drop rs->f Peter Xu
2022-10-06 17:57   ` Dr. David Alan Gilbert
