From: Jason Gunthorpe <jgg@nvidia.com>
To: Alex Williamson <alex.williamson@redhat.com>
Cc: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>,
	"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"linux-crypto@vger.kernel.org" <linux-crypto@vger.kernel.org>,
	"cohuck@redhat.com" <cohuck@redhat.com>,
	"mgurtovoy@nvidia.com" <mgurtovoy@nvidia.com>,
	"yishaih@nvidia.com" <yishaih@nvidia.com>,
	Linuxarm <linuxarm@huawei.com>,
	liulongfang <liulongfang@huawei.com>,
	"Zengtao (B)" <prime.zeng@hisilicon.com>,
	Jonathan Cameron <jonathan.cameron@huawei.com>,
	"Wangzhou (B)" <wangzhou1@hisilicon.com>
Subject: Re: [PATCH v6 09/10] hisi_acc_vfio_pci: Add support for VFIO live migration
Date: Tue, 1 Mar 2022 20:03:29 -0400
Message-ID: <20220302000329.GZ219866@nvidia.com>
In-Reply-To: <20220301154431.42b27278.alex.williamson@redhat.com>

On Tue, Mar 01, 2022 at 03:44:31PM -0700, Alex Williamson wrote:
> On Tue, 1 Mar 2022 16:39:38 -0400
> Jason Gunthorpe <jgg@nvidia.com> wrote:
> 
> > On Tue, Mar 01, 2022 at 12:30:47PM -0700, Alex Williamson wrote:
> > > Wouldn't it make more sense if initial-bytes started at QM_MATCH_SIZE
> > > and dirty-bytes was always sizeof(vf_data) - QM_MATCH_SIZE?  ie. QEMU
> > > would know that it has sizeof(vf_data) - QM_MATCH_SIZE remaining even
> > > while it's getting ENOMSG after reading QM_MATCH_SIZE bytes of data.  
> > 
> > The purpose of this ioctl is to help userspace guess when moving on to
> > STOP_COPY is a good idea ie when the device has done almost all the
> > work it is going to be able to do in PRE_COPY. ENOMSG is a similar
> > indicator.
> > 
> > I expect all devices to have some additional STOP_COPY trailer_data in
> > addition to their PRE_COPY initial_data and dirty_data
> > 
> > There is a choice to make if we report the trailer_data during
> > PRE_COPY or not. As this is all estimates, it doesn't matter unless
> > the trailer_data is very big.
> > 
> > Having all devices trend toward a 0 dirty_bytes to say they have
> > done all the pre-copy they can do makes sense from an API
> > perspective. If one device trends toward 10MB due to a big
> > trailer_data and one trends toward 0 bytes, how will qemu consistently
> > decide when best to trigger STOP_COPY? It makes the API less useful.
> >
> > So, I would not include trailer_data in the dirty_bytes.
> 
> That assumes that it's possible to keep up with the device dirty
> rate.

It keeps options open so we have this choice someday.

We already see that implementations are using vCPU throttling as part
of their migration strategy, and we are seriously looking at DMA
throttling. It is not a big leap to imagine that
internal-state-dirtying throttling will happen someday.

With throttling, iterations would ratchet up the throttle until they
reach a small absolute amount of dirty data, then cut over to STOP_COPY.
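
A rough sketch of that loop, as a toy model only. Nothing here is a real
VFIO or QEMU interface: struct precopy_model, the halving throttle and
every name below are assumptions made up for illustration. It shows why
an absolute threshold only works when the reported number can trend
toward 0:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model, not a VFIO API. */
struct precopy_model {
	uint64_t dirty_bytes;       /* dirty state left after the last pass */
	uint64_t base_rate;         /* bytes dirtied per pass when unthrottled */
	uint64_t trailer_in_report; /* STOP_COPY trailer folded into the report, 0 if excluded */
	unsigned int throttle;      /* assumed: each step halves the dirty rate */
};

/* What userspace would see as "dirty bytes" in this model. */
static uint64_t reported_dirty(const struct precopy_model *m)
{
	return m->dirty_bytes + m->trailer_in_report;
}

/* One PRE_COPY pass: drain what is dirty, then re-dirty at the throttled rate. */
static void precopy_pass(struct precopy_model *m)
{
	m->dirty_bytes = m->base_rate >> m->throttle;
}

/*
 * Ratchet the throttle until the reported amount is at or below an
 * absolute threshold. Returns the number of passes taken, or -1 if the
 * throttle limit is hit while still above the threshold.
 */
static int passes_until_stop_copy(struct precopy_model *m, uint64_t threshold,
				  unsigned int max_throttle)
{
	int passes = 0;

	assert(max_throttle < 64); /* keep the shift well defined */
	while (reported_dirty(m) > threshold) {
		if (m->throttle == max_throttle)
			return -1;
		m->throttle++;
		precopy_pass(m);
		passes++;
	}
	return passes;
}
```

In this model a device dirtying 1MB per pass with no trailer in the
report reaches a 4096 byte threshold after 8 throttle steps, while the
same device with a 10MB trailer folded into the report never gets there.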

> It seems like a better approach for userspace would be to look at how
> dirty_bytes is trending.  

It may be, but this approach doesn't care whether the trailing_bytes
are included or not, so let's leave them out and preserve the other
operating model.
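
To illustrate why a trend check is indifferent to the trailer, here is a
minimal sketch. The helper name, the window and the tolerance are all
hypothetical, not part of any uAPI or of QEMU. It only looks at the
change between successive samples, so a constant added to every sample
drops out:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: returns 1 when the last `window` sample-to-sample
 * changes in dirty_bytes are all within `tolerance`, i.e. the value has
 * stopped moving, and 0 otherwise (including when there are too few
 * samples to judge).
 */
static int dirty_trend_is_flat(const uint64_t *samples, size_t n,
			       size_t window, uint64_t tolerance)
{
	size_t i;

	assert(samples != NULL && window > 0);
	if (n < window + 1)
		return 0;

	for (i = n - window; i < n; i++) {
		uint64_t prev = samples[i - 1];
		uint64_t cur = samples[i];
		uint64_t delta = prev > cur ? prev - cur : cur - prev;

		if (delta > tolerance)
			return 0;
	}
	return 1;
}
```

Feeding it the same series with and without a fixed trailer size added
to each sample gives the same answer, which is the point: only the
absolute-threshold model needs the trailer left out.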

> If we exclude STOP_COPY trailing data from the VFIO_DEVICE_MIG_PRECOPY
> ioctl, it seems even more of a disconnect that when we enter the
> STOP_COPY state, suddenly we start getting new data out of a PRECOPY
> ioctl.

Why? The amount can go up at any time; how does it matter if it goes
up after STOP_COPY or immediately before?

> BTW, "VFIO_DEVICE" should be reserved for ioctls and data structures
> relative to the device FD, appending it with _MIG is too subtle for me.
> This is also a GET operation for INFO, so I'd think for consistency
> with the existing vfio uAPI we'd name this something like
> VFIO_MIG_GET_PRECOPY_INFO where the structure might be named
> vfio_precopy_info.

Sure

> So if we don't think this is the right approach for STOP_COPY, then why
> are we pushing that it has any purpose outside of PRECOPY or might be
> implemented by a non-PRECOPY driver for use in STOP_COPY?

It is just simpler and more consistent to implement the math under
this ioctl in all cases than to try and artificially restrict it.

But I don't have a use case for it, so let's block it if you prefer.

Shameerali, will you make these adjustments to the PRE_COPY patch?

Thanks,
Jason
