netdev.vger.kernel.org archive mirror
From: "Tian, Kevin" <kevin.tian@intel.com>
To: Jason Gunthorpe <jgg@nvidia.com>
Cc: Brett Creeley <brett.creeley@amd.com>,
	"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
	"netdev@vger.kernel.org" <netdev@vger.kernel.org>,
	"alex.williamson@redhat.com" <alex.williamson@redhat.com>,
	"yishaih@nvidia.com" <yishaih@nvidia.com>,
	"shameerali.kolothum.thodi@huawei.com"
	<shameerali.kolothum.thodi@huawei.com>,
	"shannon.nelson@amd.com" <shannon.nelson@amd.com>
Subject: RE: [PATCH v10 vfio 4/7] vfio/pds: Add VFIO live migration support
Date: Tue, 20 Jun 2023 02:02:44 +0000
Message-ID: <BN9PR11MB5276DD9E2B791EE2C06046348C5CA@BN9PR11MB5276.namprd11.prod.outlook.com>
In-Reply-To: <ZJBONrx5LOgpTr1U@nvidia.com>

> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Monday, June 19, 2023 8:47 PM
> 
> On Fri, Jun 16, 2023 at 08:06:21AM +0000, Tian, Kevin wrote:
> 
> > Ideally the VMM has an estimation how long a VM can be paused based on
> > SLA, to-be-migrated state size, available network bandwidth, etc. and that
> > hint should be passed to the kernel so any state transition which may violate
> > that expectation can fail quickly to break the migration process and put the
> > VM back to the running state.
> >
> > Jason/Shameer, is there similar concern in mlx/hisilicon drivers?
> 
> It is handled through the vfio_device_feature_mig_data_size mechanism..

That is only for estimating the size of the copied data.

IMHO the stop time when the VM is paused includes both the time of
stopping the device and the time of migrating the VM state.

For a software-emulated device the time of stopping the device is negligible.

But certainly for an assigned device, the worst-case hard-coded 5s timeout
in this patch will kill whatever reasonable 'VM dead time' SLA (usually
in milliseconds) CSPs try to meet purely based on the size of copied
data.

Wouldn't a user-specified stop-device timeout be required to at least allow
breaking migration early according to the desired SLA?
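
To make the arithmetic concrete, here is a minimal sketch of the check I
have in mind. All names below (struct downtime_estimate,
estimated_pause_ms, fits_sla) are hypothetical and not part of any VFIO
uAPI or of this patch; the point is only that the pause estimate has two
terms and that the device-stop term can dominate.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical inputs a VMM might gather; none of this is VFIO uAPI. */
struct downtime_estimate {
	uint64_t stop_device_ms; /* worst-case time to stop the device */
	uint64_t state_bytes;    /* state still to be copied while paused */
	uint64_t bandwidth_bps;  /* available bandwidth in bytes per second */
};

/*
 * Estimated pause = time to stop the device + time to copy the state.
 * The copy time is rounded up to a whole millisecond. The multiplication
 * assumes state_bytes is well below UINT64_MAX / 1000.
 */
static uint64_t estimated_pause_ms(const struct downtime_estimate *e)
{
	uint64_t copy_ms;

	if (e->bandwidth_bps == 0)
		return UINT64_MAX; /* no bandwidth: treat as unbounded */

	copy_ms = (e->state_bytes * 1000 + e->bandwidth_bps - 1) /
		  e->bandwidth_bps;
	return e->stop_device_ms + copy_ms;
}

/* True if the estimated pause stays within the SLA budget. */
static bool fits_sla(const struct downtime_estimate *e, uint64_t sla_ms)
{
	return estimated_pause_ms(e) <= sla_ms;
}
```

With 100 MB of state over a 1 GB/s link the copy term is 100 ms, so a
20 ms device stop fits a 300 ms budget, while a 5000 ms worst-case stop
does not, regardless of how small the copied state is.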

> 
> > > +	if (cur == VFIO_DEVICE_STATE_RUNNING_P2P && next == VFIO_DEVICE_STATE_STOP)
> > > +		return NULL;
> >
> > I'm not sure whether P2P is actually supported here. By definition
> > P2P means the device is stopped but still responds to p2p request
> > from other devices. If you look at mlx example it uses different
> > cmds between RUNNING->RUNNING_P2P and RUNNING_P2P->STOP.
> >
> > But in your case seems you simply move what is required in STOP
> > into P2P. Probably you can just remove the support of P2P like
> > hisilicon does.
> 
> We want new devices to get their architecture right, they need to
> support P2P. Didn't we talk about this already and Brett was going to
> fix it?
> 

Looks like it's not fixed, since RUNNING_P2P->STOP is a nop in this patch.
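
For illustration only, a rough sketch of what I would expect from a
device that really supports P2P: each arc maps to its own device action,
so RUNNING_P2P->STOP does real work. The enum and function names are
made up for this example and are not taken from the pds, mlx5 or
hisilicon drivers.

```c
/* Hypothetical, simplified state and action sets for this sketch. */
enum mig_state { ST_STOP, ST_RUNNING, ST_RUNNING_P2P };

enum dev_action {
	ACT_STOP_INITIATOR,   /* stop issuing DMA, still respond to P2P */
	ACT_STOP_RESPONDER,   /* stop responding to P2P as well */
	ACT_RESUME_RESPONDER, /* respond to P2P again */
	ACT_RESUME_INITIATOR, /* issue DMA again */
	ACT_UNSUPPORTED,      /* arc not modeled in this sketch */
};

/*
 * Map one state arc to the device action it requires. Only the four
 * adjacent arcs are modeled; anything else is reported as unsupported.
 */
static enum dev_action arc_action(enum mig_state cur, enum mig_state next)
{
	if (cur == ST_RUNNING && next == ST_RUNNING_P2P)
		return ACT_STOP_INITIATOR;
	if (cur == ST_RUNNING_P2P && next == ST_STOP)
		return ACT_STOP_RESPONDER; /* distinct from the arc above */
	if (cur == ST_STOP && next == ST_RUNNING_P2P)
		return ACT_RESUME_RESPONDER;
	if (cur == ST_RUNNING_P2P && next == ST_RUNNING)
		return ACT_RESUME_INITIATOR;
	return ACT_UNSUPPORTED;
}
```

If the device has no way to separate the two stop steps, then dropping
P2P support would be the honest way to describe it.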
