All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jason Gunthorpe <jgg@nvidia.com>
To: Alex Williamson <alex.williamson@redhat.com>
Cc: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>,
	"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"linux-crypto@vger.kernel.org" <linux-crypto@vger.kernel.org>,
	"cohuck@redhat.com" <cohuck@redhat.com>,
	"mgurtovoy@nvidia.com" <mgurtovoy@nvidia.com>,
	"yishaih@nvidia.com" <yishaih@nvidia.com>,
	Linuxarm <linuxarm@huawei.com>,
	liulongfang <liulongfang@huawei.com>,
	"Zengtao (B)" <prime.zeng@hisilicon.com>,
	Jonathan Cameron <jonathan.cameron@huawei.com>,
	"Wangzhou (B)" <wangzhou1@hisilicon.com>
Subject: Re: [PATCH v6 09/10] hisi_acc_vfio_pci: Add support for VFIO live migration
Date: Tue, 1 Mar 2022 09:15:28 -0400	[thread overview]
Message-ID: <20220301131528.GW219866@nvidia.com> (raw)
In-Reply-To: <20220228214110.4deb551f.alex.williamson@redhat.com>

On Mon, Feb 28, 2022 at 09:41:10PM -0700, Alex Williamson wrote:

> > + * returning readable. ENOMSG may not be returned in STOP_COPY. Support
> > + * for this ioctl is required when VFIO_MIGRATION_PRE_COPY is set.
> 
> This entire ioctl on the data_fd seems a bit strange given the previous
> fuss about how difficult it is for a driver to estimate their migration
> data size.  Now drivers are forced to provide those estimates even if
> they only intend to use PRE_COPY as an early compatibility test?

Well, yes. PRE_COPY is designed to be general, not just to serve for
compatability. Qemu needs data to understand how the internal dirty
accumulation in the device is progressing. So everything has to
provide estimates, and at least for acc this is trivial.

> Obviously it's trivial for the acc driver that doesn't support dirty
> tracking and only has a fixed size migration structure, but it seems to
> contradict your earlier statements. 

mlx5 knows exactly this data size once it completes entering
STOP_COPY, it has a migf->total_size just like acc, so no problem to
generate this ioctl. We just don't have a use case for it and qemu
would never call it, so trying not to add dead things to the kernel.

Are you are talking about the prior discussion about getting this data
before reaching STOP_COPY?

> For instance, can mlx5 implement a PRE_COPY solely for compatibility
> testing or is it blocked by an inability to provide data estimates
> for this ioctl?

I expect it can, it works very similar to acc. It just doesn't match
where we are planning for compatability. mlx5 has a more dynamic
compatability requirement, it needs to be exposed to orchestration not
hidden in pre_copy. acc looks like it is static, so 'have acc' is
enough info for orchestration.

> Now if we propose that this ioctl is useful during the STOP_COPY phase,
> how does a non-PRE_COPY driver opt-in to that beneficial use case?  

Just implement it - userspace will learn if the driver supports it on
the first ioctl = ENOTTY means no support.

> Do we later add a different, optional ioctl for non-PRE_COPY and
> then require userspace to support two different methods of getting
> remaining data estimates for a device in STOP_COPY?

I wouldn't add a new ioctl unless we discover a new requirement when
an implementation is made.

> If our primary goal is to simplify the FSM, I'm actually a little
> surprised we support the PRE_COPY* -> STOP_COPY transition directly
> versus passing through STOP.  

A FSM should not have a hidden 'memory' of its past states.

STOP->STOP_COPY should always do exactly the same thing, regardless of
how the FSM reached STOP

The goal is to make the driver easy to implement, which has been done
by mapping the FSM arcs onto the logical actions a driver needs to
take. 

We can't avoid that the driver does different things on
PRE_COPY->STOP_COPY vs STOP->STOP_COPY, so the input arcs should
reflect these situations directly. Notice the drivers are not deciding
what to do in these arcs based on hidden internal state.

> It seems this exists due to our policy that we can only generate one
> data_fd as a result of any sequence of state transitions, but I
> think there might also be an option to achieve similar if the
> PRE_COPY* states are skipped if they aren't the ultimate end state
> of the arc. 

Are you talking about the other path? The FSM today requires drivers
to implement RUNNING->STOP_COPY - the FSM could also do:

RUNNING->PRE_COPY->STOP_COPY

And just carry forward the data_fd. This would eliminate one arc from
the driver at the cost of always having to turn on internal dirty
tracking/etc.

> I'm sure that raises questions about how we correlate a
> PRE_COPY* session to a STOP_COPY session though, but this PRE_COPY*
> specific but ongoing usage in STOP_COPY ioctl seems ad-hoc.

I do not think it is "pre_copy specific" - the ioctl returns the
estimated length of the data_fd, this is always a valid concept.

Jason

  reply	other threads:[~2022-03-01 13:15 UTC|newest]

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-02-28  9:01 [PATCH v6 00/10] vfio/hisilicon: add ACC live migration driver Shameer Kolothum
2022-02-28  9:01 ` [PATCH v6 01/10] crypto: hisilicon/qm: Move the QM header to include/linux Shameer Kolothum
2022-02-28  9:01 ` [PATCH v6 02/10] crypto: hisilicon/qm: Move few definitions to common header Shameer Kolothum
2022-02-28  9:01 ` [PATCH v6 03/10] hisi_acc_qm: Move PCI device IDs " Shameer Kolothum
2022-02-28 17:33   ` Alex Williamson
2022-02-28 20:12     ` Bjorn Helgaas
2022-02-28 20:23       ` Alex Williamson
2022-02-28 20:55         ` Bjorn Helgaas
2022-02-28  9:01 ` [PATCH v6 04/10] hisi_acc_vfio_pci: add new vfio_pci driver for HiSilicon ACC devices Shameer Kolothum
2022-02-28  9:01 ` [PATCH v6 05/10] hisi_acc_vfio_pci: Restrict access to VF dev BAR2 migration region Shameer Kolothum
2022-02-28  9:01 ` [PATCH v6 06/10] hisi_acc_vfio_pci: Add helper to retrieve the struct pci_driver Shameer Kolothum
2022-02-28  9:01 ` [PATCH v6 07/10] vfio: Extend the device migration protocol with PRE_COPY Shameer Kolothum
2022-02-28  9:01 ` [PATCH v6 08/10] crypto: hisilicon/qm: Set the VF QM state register Shameer Kolothum
2022-02-28  9:01 ` [PATCH v6 09/10] hisi_acc_vfio_pci: Add support for VFIO live migration Shameer Kolothum
2022-02-28 14:57   ` Jason Gunthorpe
2022-02-28 18:01     ` Shameerali Kolothum Thodi
2022-02-28 18:05       ` Jason Gunthorpe
2022-02-28 20:16         ` Alex Williamson
2022-02-28 20:29           ` Jason Gunthorpe
2022-02-28 21:20             ` Alex Williamson
2022-02-28 23:47               ` Jason Gunthorpe
2022-03-01  4:41                 ` Alex Williamson
2022-03-01 13:15                   ` Jason Gunthorpe [this message]
2022-03-01 19:30                     ` Alex Williamson
2022-03-01 20:39                       ` Jason Gunthorpe
2022-03-01 22:44                         ` Alex Williamson
2022-03-02  0:03                           ` Jason Gunthorpe
2022-03-02  9:07                             ` Shameerali Kolothum Thodi
2022-02-28  9:01 ` [PATCH v6 10/10] hisi_acc_vfio_pci: Use its own PCI reset_done error handler Shameer Kolothum

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20220301131528.GW219866@nvidia.com \
    --to=jgg@nvidia.com \
    --cc=alex.williamson@redhat.com \
    --cc=cohuck@redhat.com \
    --cc=jonathan.cameron@huawei.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-crypto@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linuxarm@huawei.com \
    --cc=liulongfang@huawei.com \
    --cc=mgurtovoy@nvidia.com \
    --cc=prime.zeng@hisilicon.com \
    --cc=shameerali.kolothum.thodi@huawei.com \
    --cc=wangzhou1@hisilicon.com \
    --cc=yishaih@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.