From: "Wang, Wei W" <wei.w.wang@intel.com>
To: Peter Xu <peterx@redhat.com>
Cc: Hailiang Zhang <zhang.zhanghailiang@huawei.com>,
	David Hildenbrand <david@redhat.com>,
	Juan Quintela <quintela@redhat.com>,
	"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>,
	"Dr . David Alan Gilbert" <dgilbert@redhat.com>,
	Leonardo Bras Soares Passos <lsoaresp@redhat.com>
Subject: RE: [PATCH] migration: Move bitmap_mutex out of migration_bitmap_clear_dirty()
Date: Tue, 13 Jul 2021 08:20:08 +0000
Message-ID: <22867e1aa6fe4533943e912b4b2e080f@intel.com>
In-Reply-To: <YOhhoHJFyiQAEBRZ@t490s>

On Friday, July 9, 2021 10:48 PM, Peter Xu wrote:
> On Fri, Jul 09, 2021 at 08:58:08AM +0000, Wang, Wei W wrote:
> > On Friday, July 9, 2021 2:31 AM, Peter Xu wrote:
> > > > > Yes I think this is the place I didn't make myself clear.  It's
> > > > > not about sleeping, it's about the cmpxchg being expensive
> > > > > already when the vm is huge.
> > > >
> > > > OK.
> > > > How did you root-cause it to cmpxchg, instead of lock contention
> > > > (i.e. syscall and sleep) or some other code inside
> > > > pthread_mutex_lock()?  Do you have cycle counts for cmpxchg vs.
> > > > cycles spent in pthread_mutex_lock()?
> > >
> > > We've got "perf top -g" showing a huge amount of stacks lying in
> > > pthread_mutex_lock().
> >
> > This only shows that pthread_mutex_lock is the cause; it doesn't root-cause it
> > to cmpxchg.
> 
> I think that's enough already to prove we can move the lock elsewhere.
> 
> It's not really a heavy race between threads; it's the pure overhead of calling it
> too many times.  So it's not really a question yet of "what type of lock we
> should use" (mutex or spin lock) or "how this lock is implemented" (say, whether
> using cmpxchg only or optimizing with test + test-and-set, as that sounds like a
> good optimization of pure test-and-set spinlocks), because the lock is not busy at
> all.

Just FYI:
there is a big while (1) {} loop inside pthread_mutex_lock; I'm not sure whether
the hotspot is inside that loop.
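
For reference, the test + test-and-set idea looks roughly like this (just a
sketch with made-up names, not the actual QEMU code): the plain version does an
atomic exchange on every spin, while the optimized one spins on a cheap load and
only attempts the expensive atomic exchange once the lock looks free.

#include <stdatomic.h>
#include <stdbool.h>

typedef struct {
    atomic_bool locked;
} demo_spinlock;

/* Pure test-and-set: every iteration is an atomic RMW, so the cache line
 * keeps bouncing between CPUs even while the lock is held. */
static void demo_spin_lock_tas(demo_spinlock *l)
{
    while (atomic_exchange_explicit(&l->locked, true, memory_order_acquire)) {
        /* busy wait */
    }
}

/* Test + test-and-set: spin on a plain load first (stays in the local
 * cache), and only pay for the atomic exchange when the lock looks free. */
static void demo_spin_lock_ttas(demo_spinlock *l)
{
    for (;;) {
        while (atomic_load_explicit(&l->locked, memory_order_relaxed)) {
            /* busy wait */
        }
        if (!atomic_exchange_explicit(&l->locked, true, memory_order_acquire)) {
            return;
        }
    }
}

static void demo_spin_unlock(demo_spinlock *l)
{
    atomic_store_explicit(&l->locked, false, memory_order_release);
}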


> > What if the guest gets stopped and then the migration thread goes to sleep?
> 
> Isn't the balloon code run in a standalone iothread?  Why would stopping the
> guest block the migration thread?

Yes, it is async, as you know. The guest puts hints into the vq and then gets paused;
the device then takes the mutex, and the migration thread gets blocked.
In general, when we use a mutex, we need to consider the case where it could be blocked.
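
To make the blocking order concrete, something like the following is what I have
in mind (plain C with made-up names, just a sketch of the two paths, not the real
code): the device-side hint handler and the migration thread serialize on the same
bitmap mutex, so if the handler holds it while the guest is paused, the migration
thread simply sits in pthread_mutex_lock().

#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

static pthread_mutex_t bitmap_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Device/iothread side: called when the guest has posted free-page hints. */
static void handle_free_page_hint(uint64_t gpa, uint64_t len)
{
    (void)gpa;
    (void)len;
    pthread_mutex_lock(&bitmap_mutex);
    /* ... clear the hinted range in the dirty bitmap ...
     * Nothing bounds how long this section lasts relative to the guest's
     * state, and the migration thread below may already be queued here. */
    pthread_mutex_unlock(&bitmap_mutex);
}

/* Migration thread side: per-page (or per-iteration) dirty bitmap handling. */
static bool migration_clear_dirty(uint64_t gpa)
{
    bool was_dirty;

    (void)gpa;
    pthread_mutex_lock(&bitmap_mutex);  /* blocks while the handler holds it */
    was_dirty = true;                   /* placeholder for the bitmap test */
    pthread_mutex_unlock(&bitmap_mutex);
    return was_dirty;
}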

> From what I've learned in the past few years, funnily enough, "speed of migration"
> is normally not what people care about the most.  Issues are mostly with convergence
> and with being transparent to the users of the VMs, so they aren't even aware.

Yes, migration time isn't critically important, but shorter is better than longer.
Skipping those free pages saves network bandwidth, which is also good;
otherwise the zero-page optimization in migration would be meaningless as well.
In theory, free pages in the last round could also be skipped to reduce downtime
(I just haven't got a good test case to show it).
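
To spell out the bandwidth point, roughly (a sketch with made-up names, not the
QEMU implementation): the zero-page optimization still has to read every byte of
a page before it can avoid sending it, while a hinted free page can be skipped
with a single bitmap lookup and no memory access at all.

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_PAGE_SIZE 4096

/* Zero-page path: the whole 4 KiB page must be read and scanned. */
static bool demo_page_is_zero(const uint8_t *page)
{
    for (size_t i = 0; i < DEMO_PAGE_SIZE; i++) {
        if (page[i] != 0) {
            return false;
        }
    }
    return true;
}

/* Free-page-hint path: one bitmap lookup, the page memory is never touched
 * (and a free page need not even contain zeros). */
static bool demo_page_is_hinted_free(const unsigned long *free_bitmap, size_t pfn)
{
    size_t bits = 8 * sizeof(unsigned long);

    return (free_bitmap[pfn / bits] >> (pfn % bits)) & 1UL;
}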

> >
> > It seems we lack resources for those tests right now. If you urgently need a
> > decision to have it work first, I'm also OK with you merging it.
> 
> No, I can't merge it myself as I'm not the maintainer. :) I haven't received any ack
> yet, so at least I'll need to see what Dave and Juan think.  It's just that I don't
> think qemuspin could help much in this case, and I don't want to muddle the
> issue either.
> 

Yes, I'm also OK if they want to merge it.
If anyone from your testing team (QA?) could help run a regression test of free page
hinting, checking the difference (e.g. migration time of an idle guest) after applying
this patch, that would be great.
Thanks!

Best,
Wei
