kvm.vger.kernel.org archive mirror
From: Jason Gunthorpe <jgg@nvidia.com>
To: "Tian, Kevin" <kevin.tian@intel.com>
Cc: Alex Williamson <alex.williamson@redhat.com>,
	"cohuck@redhat.com" <cohuck@redhat.com>,
	"corbet@lwn.net" <corbet@lwn.net>,
	"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
	"linux-doc@vger.kernel.org" <linux-doc@vger.kernel.org>,
	"farman@linux.ibm.com" <farman@linux.ibm.com>,
	"mjrosato@linux.ibm.com" <mjrosato@linux.ibm.com>,
	"pasic@linux.ibm.com" <pasic@linux.ibm.com>,
	"Lu, Baolu" <baolu.lu@intel.com>
Subject: Re: [RFC PATCH] vfio: Update/Clarify migration uAPI, add NDMA state
Date: Fri, 7 Jan 2022 13:23:24 -0400
Message-ID: <20220107172324.GV2328285@nvidia.com>
In-Reply-To: <BN9PR11MB5276177829EE5ED89AAD82398C4D9@BN9PR11MB5276.namprd11.prod.outlook.com>

On Fri, Jan 07, 2022 at 02:01:55AM +0000, Tian, Kevin wrote:
> > From: Jason Gunthorpe <jgg@nvidia.com>
> > Sent: Friday, January 7, 2022 8:30 AM
> > 
> > On Fri, Jan 07, 2022 at 12:00:13AM +0000, Tian, Kevin wrote:
> > > > Devices that are poorly designed here will have very long migration
> > > > downtime latencies and people simply won't want to use them.
> > >
> > > Different usages have different latency requirements. Do we just want
> > > people to decide whether to manage state for a device by
> > > measurement?
> > 
> > It doesn't seem unreasonable to allow userspace to set a max timer for
> > NDMA for SLA purposes on devices that have unbounded NDMA times. It
> > would probably be some new optional ioctl for devices that can
> > implement it.
> 
> Yes, that's my point.
> 
> > 
> > However, this basically gives up on the idea that a VM can always be
> > migrated, since any migration can time out and fail under this
> > philosophy. I think that is still very poor.
> > 
> > Optional migration really can't be a sane path forward.
> > 
> 
> How is it different from the scenario where the guest generates such a
> high dirty rate that the precopy phase never converges to a pre-defined
> threshold, and the migration is aborted after a certain timeout?

The hypervisor can halt the VCPU and put a stop to this and complete
the migration.

There is a difference between optional migration under an SLA and
mandatory migration with no SLA - I think both must be supported to be
sane.
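To make the contrast concrete, here is a toy model of precopy convergence. It is a sketch under stated assumptions, not QEMU's actual algorithm: the names Policy and precopy_migrate, and the fixed round budget standing in for the SLA timeout, are all hypothetical. When the budget runs out, an SLA-bound migration gives up, while a mandatory one halts the vCPUs so nothing more gets dirtied, and stop-and-copy always finishes at the cost of longer downtime.

```python
from enum import Enum
from typing import Sequence


class Policy(Enum):
    """Hypothetical labels for the two stances discussed in this thread."""
    ABORT_ON_TIMEOUT = "optional migration under an SLA"
    FORCE_COMPLETE = "mandatory migration, SLA ignored"


def precopy_migrate(dirty_per_round: Sequence[int], threshold: int,
                    policy: Policy) -> tuple[bool, int]:
    """Toy precopy model: each round re-sends the pages the guest dirtied in
    the previous round. Returns (migrated, pages copied with vCPUs halted)."""
    if not dirty_per_round:
        raise ValueError("need at least one precopy round")
    for dirty in dirty_per_round:
        if dirty <= threshold:
            # Converged: halt the vCPUs and copy the small remainder.
            return True, dirty
    # The round budget (standing in for the SLA timeout) ran out.
    if policy is Policy.ABORT_ON_TIMEOUT:
        return False, 0
    # Halting the vCPUs stops all further dirtying, so stop-and-copy always
    # completes; the price is a longer downtime.
    return True, dirty_per_round[-1]


hot_guest = [5000, 4800, 4900, 5100]  # dirty rate never converges
print(precopy_migrate(hot_guest, 1000, Policy.ABORT_ON_TIMEOUT))  # (False, 0)
print(precopy_migrate(hot_guest, 1000, Policy.FORCE_COMPLETE))    # (True, 5100)
```

The point the model captures is that the CPU side always has a hard stop available to the hypervisor, so failure under an SLA is a policy choice rather than a necessity.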

> IMHO live migration is always try-and-fail in flavor. A previous migration
> failure doesn't prevent the orchestration stack from retrying at a later point.

An operator might need to emergency-migrate a VM without the
possibility of failure, for instance when there is something wrong with
the base HW. SLA ignored, migration must be done.

IMHO it is completely wrong to view migration as optional; that is a
terrible standard to design HW to.
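A device that has not yet reached NDMA offers no equivalent of halting the vCPUs: the hypervisor can only keep waiting or give up. The sketch below models the optional SLA timer floated earlier in the thread. It assumes a hypothetical device_quiesced() probe rather than any real VFIO interface, and the proposed ioctl for setting the timer is represented only by the max_wait_s parameter: None means mandatory migration (wait as long as it takes), while a finite value fails the migration once the timer expires.

```python
import time
from typing import Callable, Optional


class MigrationTimeout(Exception):
    """Raised when the hypothetical SLA timer expires before NDMA is reached."""


def wait_for_ndma(device_quiesced: Callable[[], bool],
                  max_wait_s: Optional[float] = None,
                  poll_s: float = 0.001) -> float:
    """Poll until the device reports no outstanding DMA.

    max_wait_s=None models mandatory migration: wait as long as it takes.
    A number models the optional SLA timer: give up and fail the migration.
    Returns the time spent waiting, in seconds."""
    start = time.monotonic()
    while not device_quiesced():
        waited = time.monotonic() - start
        if max_wait_s is not None and waited >= max_wait_s:
            raise MigrationTimeout(f"NDMA not reached after {waited:.3f}s")
        time.sleep(poll_s)
    return time.monotonic() - start


def fake_device(polls_needed: int) -> Callable[[], bool]:
    """Hypothetical stand-in device that quiesces after polls_needed polls."""
    remaining = [polls_needed]

    def quiesced() -> bool:
        remaining[0] -= 1
        return remaining[0] < 0

    return quiesced


print(wait_for_ndma(fake_device(3)) >= 0.0)
```

In this model the mandatory (None) path only terminates if the hardware itself bounds the time to reach NDMA, which is why the argument above puts the burden on device design rather than on an SLA knob.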

Jason


Thread overview: 35+ messages
2021-12-09 23:34 [RFC PATCH] vfio: Update/Clarify migration uAPI, add NDMA state Alex Williamson
2021-12-10  1:25 ` Jason Gunthorpe
2021-12-13 20:40   ` Alex Williamson
2021-12-14 12:08     ` Cornelia Huck
2021-12-14 16:26     ` Jason Gunthorpe
2021-12-20 22:26       ` Alex Williamson
2022-01-04 20:28         ` Jason Gunthorpe
2022-01-06 18:17           ` Alex Williamson
2022-01-06 21:20             ` Jason Gunthorpe
2022-01-10  7:55               ` Tian, Kevin
2022-01-10 17:34                 ` Alex Williamson
2022-01-11  2:41                   ` Tian, Kevin
2022-01-10 18:11                 ` Jason Gunthorpe
2022-01-11  3:14                   ` Tian, Kevin
2022-01-11 18:19                     ` Jason Gunthorpe
2022-01-04  3:49       ` Tian, Kevin
2022-01-04 16:09         ` Jason Gunthorpe
2022-01-05  1:59           ` Tian, Kevin
2022-01-05 12:45             ` Jason Gunthorpe
2022-01-06  6:32               ` Tian, Kevin
2022-01-06 15:42                 ` Jason Gunthorpe
2022-01-07  0:00                   ` Tian, Kevin
2022-01-07  0:29                     ` Jason Gunthorpe
2022-01-07  2:01                       ` Tian, Kevin
2022-01-07 17:23                         ` Jason Gunthorpe [this message]
2022-01-10  3:14                           ` Tian, Kevin
2022-01-10 17:52                             ` Jason Gunthorpe
2022-01-11  2:57                               ` Tian, Kevin
2022-01-05  3:06           ` Tian, Kevin
2021-12-20 17:38 ` Cornelia Huck
2021-12-20 22:49   ` Alex Williamson
2021-12-21 11:24     ` Cornelia Huck
2022-01-07  8:03 ` Tian, Kevin
2022-01-07 16:36   ` Alex Williamson
2022-01-10  6:01     ` Tian, Kevin