All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jason Gunthorpe <jgg@nvidia.com>
To: Joao Martins <joao.m.martins@oracle.com>
Cc: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>,
	"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"linux-crypto@vger.kernel.org" <linux-crypto@vger.kernel.org>,
	"alex.williamson@redhat.com" <alex.williamson@redhat.com>,
	"mgurtovoy@nvidia.com" <mgurtovoy@nvidia.com>,
	Linuxarm <linuxarm@huawei.com>,
	liulongfang <liulongfang@huawei.com>,
	"Zengtao (B)" <prime.zeng@hisilicon.com>,
	yuzenghui <yuzenghui@huawei.com>,
	Jonathan Cameron <jonathan.cameron@huawei.com>,
	"Wangzhou (B)" <wangzhou1@hisilicon.com>
Subject: Re: [RFC v2 0/4] vfio/hisilicon: add acc live migration driver
Date: Mon, 28 Feb 2022 17:01:32 -0400	[thread overview]
Message-ID: <20220228210132.GT219866@nvidia.com> (raw)
In-Reply-To: <30066724-b100-a14e-e3d8-092645298d8a@oracle.com>

On Mon, Feb 28, 2022 at 01:01:39PM +0000, Joao Martins wrote:
> On 2/25/22 20:44, Jason Gunthorpe wrote:
> > On Fri, Feb 25, 2022 at 07:18:37PM +0000, Joao Martins wrote:
> >> On 2/23/22 01:03, Jason Gunthorpe wrote:
> >>> On Tue, Feb 22, 2022 at 11:55:55AM +0000, Joao Martins wrote:
> >>>>>> If by conclusion you mean the whole thing to be merged, how can the work be
> >>>>>> broken up to pieces if we busy-waiting on the new subsystem? Or maybe you meant
> >>>>>> in terms of direction...
> >>>>>
> >>>>> I think go ahead and build it on top of iommufd, start working out the
> >>>>> API details, etc. I think once the direction is concluded the new APIs
> >>>>> will go forward.
> >>>>>
> >>>> /me nods, will do. Looking at your repository it is looking good.
> >>>
> >>> I would like to come with some plan for dirty tracking on iommufd and
> >>> combine that with a plan for dirty tracking inside the new migration
> >>> drivers.
> >>>
> >> I had a few things going on my end over the past weeks, albeit it is
> >> getting a bit better now and I will be coming back to this topic. I hope/want
> >> to give you a more concrete update/feedback over the coming week or two wrt
> >> to dirty-tracking+iommufd+amd.
> >>
> >> So far, I am not particularly concerned that this will affect overall iommufd
> >> design. The main thing is really lookups to get vendor iopte, upon on what might
> >> be a iommu_sync_dirty_bitmap(domain, iova, size) API. For toggling
> >> the tracking,
> > 
> > I'm not very keen on these multiplexer interfaces. I think you should
> > just add a new ops to the new iommu_domain_ops 'set_dirty_tracking'
> > 'read_dirty_bits'
> > 
> > NULL op means not supported.
> > 
> > IMHO we don't need a kapi wrapper if only iommufd is going to call the
> > op.
> > 
> 
> OK, good point.
> 
> Even without a kapi wrapper I am still wondering whether the iommu op needs to
> be something like a generic iommu feature toggling (e.g. .set_feature()), rather
> than one that sits "hardcoded" as set_dirty(). Unless dirty right now is about
> the only feature we will change out-of-band in the protection-domain.
> I guess I can stay with set_ad_tracking/set_dirty_tracking and if should
> need arise we will expand with a generic .set_feature(dom, IOMMU_DIRTY | IOMMU_ACCESS).

I just generally dislike multiplexers like this. We are already
calling through a function pointer struct, why should the driver
implement another switch/case just to find out which function pointer
the caller really ment to use? It doesn't make things faster, it
doesn't make things smaller, it doesn't use less LOC. Why do it?

> Regarding the dirty 'data' that's one that I am wondering about. I called it 'sync'
> because the sync() doesn't only read, but also "writes" to root pagetable to update
> the dirty bit (and then IOTLB flush). And that's about what VFIO current interface
> does (i.e. there's only a GET_BITMAP in the ioctl) and no explicit interface to clear.

'read and clear' is what I'd suggest

> And TBH, the question on whether we need a clear op isn't immediately obvious: reading
> the access/dirty bit is cheap for the IOMMU, the problem OTOH is the expensive
> io page table walk thus expensive in sw. The clear-dirty part, though, is precise on what
> it wants to clear (in principle cheaper on io-page-table walk as you just iterate over
> sets of bits to clear) but then it incurs a DMA perf hit given that we need to flush the
> IOTLBs. Given the IOTLB flush is batched (over a course of a dirty updates) perhaps this
> isn't immediately clear that is a problem in terms of total overall ioctl cost. Hence my
> thinking in merging both in one sync_dirty_bitmap() as opposed to more KVM-style of
> get_dirty_bitmap() and clear_dirty_ditmap().

Yes, I wouldn't split them.

> > Questions I have:
> >  - Do we need ranges for some reason? You mentioned ARM SMMU wants
> >    ranges? how/what/why?
> > 
> Ignore that. I got mislead by the implementation and when I read the SDM
> I realized that the implementation was doing the same thing I was
> doing

Ok

> >  - What about the unmap and read dirty without races operation that
> >    vfio has?
> > 
> I am afraid that might need a new unmap iommu op that reads the dirty bit
> after clearing the page table entry. It's marshalling the bits from
> iopte into a bitmap as opposed to some logic added on top. As far as I
> looked for AMD this isn't difficult to add, (same for Intel) it can
> *I think* reuse all of the unmap code.

Ok. It feels necessary to be complete

> > Yes, this is a point that needs some answering. One option is to pass
> > in the tracking range list from userspace. Another is to query it in
> > the driver from the currently mapped areas in IOAS.
> > 
> I sort of like the second given that we de-duplicate info that is already
> tracked by IOAS -- it would be mostly internal and then it would be a
> matter of when does this device tracker is set up, and whether we should
> differentiate tracker "start"/"stop" vs "setup"/"teardown".

One problem with this is that devices that don't support dynamic
tracking are stuck in vIOMMU cases where the IOAS will have some tiny
set of all memory mapped. 

So I'd probably say qemu should forward the entire guest CPU memory
space, as well as any future memory hotplug area, as ranges and not
rely on the IOAS already mapping anything.

> > I know devices have limitations here in terms of how many/how big the
> > ranges can be, and devices probably can't track dynamic changes.
> > 
> /me nods
> 
> Should this be some sort of capability perhaps? Given that this may limit
> how many concurrent VFs can be migrated and how much ranges it can store,
> for example.
> 
> (interestingly and speaking of VF capabilities, it's yet another thing to
> tackle in the migration stream between src and dst hosts)

I don't know what to do with these limitations right now..

Jason

  reply	other threads:[~2022-02-28 21:01 UTC|newest]

Thread overview: 55+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-07-02  9:58 [RFC v2 0/4] vfio/hisilicon: add acc live migration driver Shameer Kolothum
2021-07-02  9:58 ` [RFC v2 1/4] hisi-acc-vfio-pci: add new vfio_pci driver for HiSilicon ACC devices Shameer Kolothum
2021-07-02 20:29   ` Alex Williamson
2021-07-05  7:20     ` Shameerali Kolothum Thodi
2021-07-04  7:03   ` Leon Romanovsky
2021-07-05  8:47     ` Shameerali Kolothum Thodi
2021-07-05  9:41       ` Max Gurtovoy
2021-07-05 10:18         ` Shameerali Kolothum Thodi
2021-07-05 18:27           ` Leon Romanovsky
2021-07-05 18:32             ` Jason Gunthorpe
2021-07-06  3:59               ` Leon Romanovsky
2021-07-06  4:39               ` Christoph Hellwig
2021-07-06 11:51                 ` Jason Gunthorpe
2021-07-02  9:58 ` [RFC v2 2/4] hisi_acc_vfio_pci: Override ioctl method to limit BAR2 region size Shameer Kolothum
2021-07-02 20:29   ` Alex Williamson
2021-07-05  7:22     ` Shameerali Kolothum Thodi
2021-07-02  9:58 ` [RFC v2 3/4] crypto: hisilicon/qm - Export mailbox functions for common use Shameer Kolothum
2021-07-04  9:34   ` Max Gurtovoy
2021-07-05 10:23     ` Shameerali Kolothum Thodi
2021-07-02  9:58 ` [RFC v2 4/4] hisi_acc_vfio_pci: Add support for vfio live migration Shameer Kolothum
2022-02-02 13:14 ` [RFC v2 0/4] vfio/hisilicon: add acc live migration driver Jason Gunthorpe
2022-02-02 14:34   ` Shameerali Kolothum Thodi
2022-02-02 15:39     ` Jason Gunthorpe
2022-02-02 16:10       ` Shameerali Kolothum Thodi
2022-02-02 17:03         ` Jason Gunthorpe
2022-02-02 19:05           ` Joao Martins
2022-02-03 15:18             ` Jason Gunthorpe
2022-02-04 19:53               ` Joao Martins
2022-02-04 23:07                 ` Jason Gunthorpe
2022-02-11 17:28                   ` Joao Martins
2022-02-11 17:49                     ` Jason Gunthorpe
2022-02-11 21:43                       ` Joao Martins
2022-02-12  0:01                         ` Jason Gunthorpe
2022-02-14 13:34                           ` Joao Martins
2022-02-14 14:06                             ` Jason Gunthorpe
2022-02-15 16:00                               ` Joao Martins
2022-02-15 16:21                                 ` Jason Gunthorpe
2022-02-22 11:55                                   ` Joao Martins
2022-02-23  1:03                                     ` Jason Gunthorpe
2022-02-25 19:18                                       ` Joao Martins
2022-02-25 20:44                                         ` Jason Gunthorpe
2022-02-28 13:01                                           ` Joao Martins
2022-02-28 21:01                                             ` Jason Gunthorpe [this message]
2022-03-01 13:06                                               ` Joao Martins
2022-03-01 13:54                                                 ` Jason Gunthorpe
2022-03-01 14:27                                                   ` Joao Martins
2022-03-11 13:51                                             ` iommufd(+vfio-compat) dirty tracking (Was: Re: [RFC v2 0/4] vfio/hisilicon: add acc live migration driver) Joao Martins
2022-03-15 19:29                                               ` Jason Gunthorpe
2022-03-16 16:36                                                 ` iommufd(+vfio-compat) dirty tracking Joao Martins
2022-03-16 20:37                                                   ` Joao Martins
2022-03-18 17:12                                                     ` Joao Martins
2022-03-18 17:34                                                       ` Jason Gunthorpe
2022-02-02 17:30     ` [RFC v2 0/4] vfio/hisilicon: add acc live migration driver Alex Williamson
2022-02-02 18:04       ` Jason Gunthorpe
2022-02-18 16:37 ` Jason Gunthorpe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20220228210132.GT219866@nvidia.com \
    --to=jgg@nvidia.com \
    --cc=alex.williamson@redhat.com \
    --cc=joao.m.martins@oracle.com \
    --cc=jonathan.cameron@huawei.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-crypto@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linuxarm@huawei.com \
    --cc=liulongfang@huawei.com \
    --cc=mgurtovoy@nvidia.com \
    --cc=prime.zeng@hisilicon.com \
    --cc=shameerali.kolothum.thodi@huawei.com \
    --cc=wangzhou1@hisilicon.com \
    --cc=yuzenghui@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.