Re: [PATCH RFC] vfio: Revise and update the migration uAPI description

From: Jason Gunthorpe <jgg@nvidia.com>
To: Cornelia Huck <cohuck@redhat.com>
Cc: Alex Williamson <alex.williamson@redhat.com>,
	"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
	"farman@linux.ibm.com" <farman@linux.ibm.com>,
	"mjrosato@linux.ibm.com" <mjrosato@linux.ibm.com>,
	"pasic@linux.ibm.com" <pasic@linux.ibm.com>,
	"Tian, Kevin" <kevin.tian@intel.com>,
	Yishai Hadas <yishaih@nvidia.com>
Subject: Re: [PATCH RFC] vfio: Revise and update the migration uAPI description
Date: Mon, 24 Jan 2022 13:57:12 -0400
Message-ID: <20220124175712.GB84788@nvidia.com>
In-Reply-To: <87fspdl9cv.fsf@redhat.com>

On Mon, Jan 24, 2022 at 11:24:32AM +0100, Cornelia Huck wrote:
> On Wed, Jan 19 2022, Jason Gunthorpe <jgg@nvidia.com> wrote:
> 
> > So, OK, I drafted a new series that just replaces the whole v1
> > protocol. If we are agreed on breaking everything then I'd like to
> > clean the other troublesome bits too, already we have some future
> > topics on our radar that will benefit from doing this.
> 
> Can you share something about those "future topics"? It will help us
> understand what you are trying to do, and maybe others might be going
> into that direction as well.

We are concerned that the region API has no way to notify userspace
that it has data ready. We discussed this before and Alex was thinking
qemu should be busy looping, but we are expecting to have many devices
in a VM at any time and this seems inefficient.

E.g., currently it looks like qemu will enter STOP_COPY serially on
every device, and for something like mlx5 this means it sits around
doing nothing while the snapshot is prepared.

It would be better if qemu put all the devices into STOP_COPY, let
them run their work in the background, and then used poll() to wait
for data to come out. Then we can parallelize all the device steps and
support a model where the device streams the STOP_COPY data more
slowly than the CPU can consume it, which we are also thinking about
for a future mlx5 revision.
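A minimal userspace sketch of this model, assuming each device exposes its STOP_COPY data as a blocking, readable FD whose read() returns 0 once that device's image is complete. drain_all() and demo_with_pipes() are hypothetical names, not part of the vfio uAPI or of qemu, and ordinary pipes stand in for the device FDs so the example can run anywhere.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: read every device's stream to EOF, servicing
 * whichever FD poll() reports ready rather than one device at a time.
 * Returns 0 on success and stores the bytes read per FD in totals[].
 */
static int drain_all(const int *fds, size_t n, size_t *totals)
{
    struct pollfd *pfds;
    size_t remaining = n;
    char buf[4096];

    assert(n == 0 || (fds != NULL && totals != NULL));
    if (n == 0)
        return 0;
    pfds = calloc(n, sizeof(*pfds));
    if (pfds == NULL)
        return -1;
    for (size_t i = 0; i < n; i++) {
        pfds[i].fd = fds[i];
        pfds[i].events = POLLIN;
        totals[i] = 0;
    }

    while (remaining > 0) {
        /* Sleep until at least one device has data or has hit EOF. */
        if (poll(pfds, (nfds_t)n, -1) < 0) {
            if (errno == EINTR)
                continue;
            free(pfds);
            return -1;
        }
        for (size_t i = 0; i < n; i++) {
            /* poll() ignores entries with fd < 0, i.e. finished devices. */
            if (pfds[i].fd < 0)
                continue;
            if (pfds[i].revents & POLLNVAL) {
                free(pfds);
                return -1;
            }
            if (!(pfds[i].revents & (POLLIN | POLLHUP | POLLERR)))
                continue;
            ssize_t r = read(pfds[i].fd, buf, sizeof(buf));
            if (r < 0) {
                if (errno == EINTR)
                    continue;
                free(pfds);
                return -1;
            }
            if (r == 0) {
                /* EOF: this device is done, stop polling it. */
                pfds[i].fd = -1;
                remaining--;
                continue;
            }
            /* A real VMM would forward buf to the migration stream here. */
            totals[i] += (size_t)r;
        }
    }
    free(pfds);
    return 0;
}

/* Simulate three "devices" with pipes carrying 10, 300 and 1000 bytes. */
static int demo_with_pipes(size_t totals[3])
{
    static const size_t sizes[3] = {10, 300, 1000};
    int rfds[3] = {-1, -1, -1};
    char payload[1000];
    int ret = -1;

    memset(payload, 'x', sizeof(payload));
    for (size_t i = 0; i < 3; i++) {
        int p[2];
        if (pipe(p) < 0)
            goto out;
        rfds[i] = p[0];
        ssize_t w = write(p[1], payload, sizes[i]);
        close(p[1]); /* closing the write end makes read() report EOF */
        if (w != (ssize_t)sizes[i])
            goto out;
    }
    ret = drain_all(rfds, 3, totals);
out:
    for (size_t i = 0; i < 3; i++)
        if (rfds[i] >= 0)
            close(rfds[i]);
    return ret;
}
```

Because the loop sleeps in poll() instead of spinning, a device that is still preparing its snapshot only delays its own stream, while data from the other devices is consumed as soon as it becomes available.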

Basically, all of this is to speed up migration in cases with
multiple STOP_COPY-type devices.

From what I can see, qemu doesn't have the event loop infrastructure
to support this in migration, but we can get the kernel side set up as
part of the simplification process.

> > Aside from being a more unixy interface, an FD can be used with
> > poll/io_uring/splice/etc and opens up better avenues to optimize for
> > operating migrations of multiple devices in parallel. It kills a whack
> > of goofy tricky driver code too.
> 
> Cleaner code certainly sounds compelling. It will be easier to review a
> more concrete proposal, though, so I'll reserve judgment until then.

Sure, we have patches now and are just going through testing.

A full series should be posted in the next few days, but if you want
to look ahead:

https://github.com/jgunthorpe/linux/commits/for-yishai

We have also made the matching qemu changes.

Thanks,
Jason