RE: [RFC PATCH] vfio: Update/Clarify migration uAPI, add NDMA state

From: "Tian, Kevin" <kevin.tian@intel.com>
To: Jason Gunthorpe <jgg@nvidia.com>
Cc: Alex Williamson <alex.williamson@redhat.com>,
	"cohuck@redhat.com" <cohuck@redhat.com>,
	"corbet@lwn.net" <corbet@lwn.net>,
	"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
	"linux-doc@vger.kernel.org" <linux-doc@vger.kernel.org>,
	"farman@linux.ibm.com" <farman@linux.ibm.com>,
	"mjrosato@linux.ibm.com" <mjrosato@linux.ibm.com>,
	"pasic@linux.ibm.com" <pasic@linux.ibm.com>,
	"Lu, Baolu" <baolu.lu@intel.com>
Subject: RE: [RFC PATCH] vfio: Update/Clarify migration uAPI, add NDMA state
Date: Fri, 7 Jan 2022 02:01:55 +0000
Message-ID: <BN9PR11MB5276177829EE5ED89AAD82398C4D9@BN9PR11MB5276.namprd11.prod.outlook.com>
In-Reply-To: <20220107002950.GO2328285@nvidia.com>

> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Friday, January 7, 2022 8:30 AM
> 
> On Fri, Jan 07, 2022 at 12:00:13AM +0000, Tian, Kevin wrote:
> > > Devices that are poorly designed here will have very long migration
> > > downtime latencies and people simply won't want to use them.
> >
> > Different usages have different latency requirement. Do we just want
> > people to decide whether to manage state for a device by
> > measurement?
> 
> It doesn't seem unreasonable to allow userspace to set max timer for
> NDMA for SLA purposes on devices that have unbounded NDMA times. It
> would probably be some new optional ioctl for devices that can
> implement it.

Yes, that's my point.

> 
> However, this basically gives up on the idea that a VM can be migrated
> as any migration can timeout and fail under this philosophy. I think
> that is still very poor.
> 
> Optional migration really can't be sane path forward.
> 

How is that different from the scenario where the guest dirties memory
so fast that the precopy phase can never converge to a pre-defined
threshold, and the migration is aborted after a certain timeout?
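
To make the comparison concrete, the precopy case can be sketched with a
toy model. This is only an illustration under simplifying assumptions,
not how QEMU or any real migration stack computes convergence: the
function name, its parameters, and the numbers are all hypothetical, and
a round limit stands in for the timeout.

```python
def precopy(total_pages, dirty_rate, bandwidth, threshold, max_rounds):
    """Toy precopy loop; every parameter here is a hypothetical knob.

    total_pages: guest memory size, in pages
    dirty_rate:  pages the guest dirties per second
    bandwidth:   pages the migration link can send per second
    threshold:   most pages we accept leaving for the stop-and-copy phase
    max_rounds:  stand-in for the migration timeout
    Returns (converged, rounds_used, remaining_pages).
    """
    remaining = total_pages
    for rounds in range(1, max_rounds + 1):
        # Sending `remaining` pages takes remaining / bandwidth seconds;
        # the guest dirties dirty_rate pages per second meanwhile.
        # Integer math, capped at the size of guest memory.
        remaining = min(total_pages, remaining * dirty_rate // bandwidth)
        if remaining <= threshold:
            return True, rounds, remaining
    # Never got under the threshold: the caller aborts and may retry later.
    return False, max_rounds, remaining


# Dirty rate well below link bandwidth: each round shrinks the dirty set.
slow_guest = precopy(1_000_000, 10_000, 100_000, 1_000, 30)

# Dirty rate above link bandwidth: the dirty set never shrinks.
busy_guest = precopy(1_000_000, 120_000, 100_000, 1_000, 30)
```

In this model the first call converges after a few rounds, while the
second exhausts its rounds with all of guest memory still dirty, which
is the "abort after timeout" outcome described above.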

IMHO live migration is always a best-effort operation that may fail. A
failed migration doesn't prevent the orchestration stack from retrying
at a later point.

In the meantime people do explore various optimizations to increase
the success rate. Having the device stop DMA quickly is one such
optimization on the hardware side.

Thanks
Kevin