linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jason Gunthorpe <jgg@nvidia.com>
To: Joao Martins <joao.m.martins@oracle.com>
Cc: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>,
	"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"linux-crypto@vger.kernel.org" <linux-crypto@vger.kernel.org>,
	"alex.williamson@redhat.com" <alex.williamson@redhat.com>,
	"mgurtovoy@nvidia.com" <mgurtovoy@nvidia.com>,
	Linuxarm <linuxarm@huawei.com>,
	liulongfang <liulongfang@huawei.com>,
	"Zengtao (B)" <prime.zeng@hisilicon.com>,
	yuzenghui <yuzenghui@huawei.com>,
	Jonathan Cameron <jonathan.cameron@huawei.com>,
	"Wangzhou (B)" <wangzhou1@hisilicon.com>
Subject: Re: [RFC v2 0/4] vfio/hisilicon: add acc live migration driver
Date: Fri, 11 Feb 2022 20:01:17 -0400	[thread overview]
Message-ID: <20220212000117.GS4160@nvidia.com> (raw)
In-Reply-To: <34ac016b-000f-d6d6-acf1-f39ca1ca9c54@oracle.com>

On Fri, Feb 11, 2022 at 09:43:56PM +0000, Joao Martins wrote:
> The plumbing for the hw-accelerated vIOMMU is a little different that
> a regular vIOMMU, at least IIUC host does not take an active part in the
> GVA -> GPA translation. Suravee's preso explains it nicely, if you don't
> have time to fiddle with the SDM:
> 
> https://static.sched.com/hosted_files/kvmforum2021/da/vIOMMU%20KVM%20Forum%202021%20-%20v4.pdf

That looks like the same nesting everyone else is talking about.

I've been refering to this as 'user space page tables' to avoid a lot
of ambiguity becuase what it fundamentally does is allow userspace to
directly manage an IO page table - building the IO PTEs, issue
invalidations, etc.

To be secure the userspace page table is nested under a kernel owned
page table and all the IO PTE offsets/etc are understood to be within
the address space of the kernel owned page table.

When combined with KVM the userspace is just inside a VM.

> 1. Decodes the read and write intent from the memory access.
> 2. If P=0 in the page descriptor, fail the access.
> 3. Compare the A & D bits in the descriptor with the read and write intent in the request.
> 4. If the A or D bits need to be updated in the descriptor:

Ah, so the dirty update is actually atomic on the first write before
any DMA happens - and I suppose all of this happens when the entry is
first loaded into the IOTLB.

So the flush is to allow the IOTLB to see the cleared D bit..

> > split/collapse seems kind of orthogonal to me it doesn't really
> > connect to dirty tracking other than being mostly useful during dirty
> > tracking.
> > 
> > And I wonder how hard split is when trying to atomically preserve any
> > dirty bit..
> > 
> Would would it be hard? The D bit is supposed to be replicated when you
> split to smaller page size.

I guess it depends on how the 'acquire' is done, as the CPU has to
atomically replace a large entry with a pointer to a small entry,
flush the IOTLB then 'acquire' the dirty bit. If the dirty bit is set
in the old entry then it has to sprinkle it into the new entries with
atomics.

> This is just preemptive longterm thinking about the overal problem
> space (probably unnecessary noise at this stage). Particularly
> whenever I need to migrate 1 to 2TB VMs.  Particular that the stage
> *prior* to precopy takes way too long to transfer the whole
> memory. So I was thinking say only transfer the pages that are
> populated[*] in the second-stage page tables (for the CPU) coupled
> with IOMMU tracking from the beginning (prior to vcpus even
> entering). That could probably decrease 1024 1GB Dirtied IOVA
> entries, to maybe only dirty a smaller subset, saving a whole
> bootload of time.

Oh, you want to effectively optimize zero page detection..

> I wonder if we could start progressing the dirty tracking as a first initial series and
> then have the split + collapse handling as a second part? That would be quite
> nice to get me going! :D

I think so, and I think we should. It is such a big problem space, it
needs to get broken up.

Jason

  reply	other threads:[~2022-02-12  0:01 UTC|newest]

Thread overview: 55+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-07-02  9:58 [RFC v2 0/4] vfio/hisilicon: add acc live migration driver Shameer Kolothum
2021-07-02  9:58 ` [RFC v2 1/4] hisi-acc-vfio-pci: add new vfio_pci driver for HiSilicon ACC devices Shameer Kolothum
2021-07-02 20:29   ` Alex Williamson
2021-07-05  7:20     ` Shameerali Kolothum Thodi
2021-07-04  7:03   ` Leon Romanovsky
2021-07-05  8:47     ` Shameerali Kolothum Thodi
2021-07-05  9:41       ` Max Gurtovoy
2021-07-05 10:18         ` Shameerali Kolothum Thodi
2021-07-05 18:27           ` Leon Romanovsky
2021-07-05 18:32             ` Jason Gunthorpe
2021-07-06  3:59               ` Leon Romanovsky
2021-07-06  4:39               ` Christoph Hellwig
2021-07-06 11:51                 ` Jason Gunthorpe
2021-07-02  9:58 ` [RFC v2 2/4] hisi_acc_vfio_pci: Override ioctl method to limit BAR2 region size Shameer Kolothum
2021-07-02 20:29   ` Alex Williamson
2021-07-05  7:22     ` Shameerali Kolothum Thodi
2021-07-02  9:58 ` [RFC v2 3/4] crypto: hisilicon/qm - Export mailbox functions for common use Shameer Kolothum
2021-07-04  9:34   ` Max Gurtovoy
2021-07-05 10:23     ` Shameerali Kolothum Thodi
2021-07-02  9:58 ` [RFC v2 4/4] hisi_acc_vfio_pci: Add support for vfio live migration Shameer Kolothum
2022-02-02 13:14 ` [RFC v2 0/4] vfio/hisilicon: add acc live migration driver Jason Gunthorpe
2022-02-02 14:34   ` Shameerali Kolothum Thodi
2022-02-02 15:39     ` Jason Gunthorpe
2022-02-02 16:10       ` Shameerali Kolothum Thodi
2022-02-02 17:03         ` Jason Gunthorpe
2022-02-02 19:05           ` Joao Martins
2022-02-03 15:18             ` Jason Gunthorpe
2022-02-04 19:53               ` Joao Martins
2022-02-04 23:07                 ` Jason Gunthorpe
2022-02-11 17:28                   ` Joao Martins
2022-02-11 17:49                     ` Jason Gunthorpe
2022-02-11 21:43                       ` Joao Martins
2022-02-12  0:01                         ` Jason Gunthorpe [this message]
2022-02-14 13:34                           ` Joao Martins
2022-02-14 14:06                             ` Jason Gunthorpe
2022-02-15 16:00                               ` Joao Martins
2022-02-15 16:21                                 ` Jason Gunthorpe
2022-02-22 11:55                                   ` Joao Martins
2022-02-23  1:03                                     ` Jason Gunthorpe
2022-02-25 19:18                                       ` Joao Martins
2022-02-25 20:44                                         ` Jason Gunthorpe
2022-02-28 13:01                                           ` Joao Martins
2022-02-28 21:01                                             ` Jason Gunthorpe
2022-03-01 13:06                                               ` Joao Martins
2022-03-01 13:54                                                 ` Jason Gunthorpe
2022-03-01 14:27                                                   ` Joao Martins
2022-03-11 13:51                                             ` iommufd(+vfio-compat) dirty tracking (Was: Re: [RFC v2 0/4] vfio/hisilicon: add acc live migration driver) Joao Martins
2022-03-15 19:29                                               ` Jason Gunthorpe
2022-03-16 16:36                                                 ` iommufd(+vfio-compat) dirty tracking Joao Martins
2022-03-16 20:37                                                   ` Joao Martins
2022-03-18 17:12                                                     ` Joao Martins
2022-03-18 17:34                                                       ` Jason Gunthorpe
2022-02-02 17:30     ` [RFC v2 0/4] vfio/hisilicon: add acc live migration driver Alex Williamson
2022-02-02 18:04       ` Jason Gunthorpe
2022-02-18 16:37 ` Jason Gunthorpe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20220212000117.GS4160@nvidia.com \
    --to=jgg@nvidia.com \
    --cc=alex.williamson@redhat.com \
    --cc=joao.m.martins@oracle.com \
    --cc=jonathan.cameron@huawei.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-crypto@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linuxarm@huawei.com \
    --cc=liulongfang@huawei.com \
    --cc=mgurtovoy@nvidia.com \
    --cc=prime.zeng@hisilicon.com \
    --cc=shameerali.kolothum.thodi@huawei.com \
    --cc=wangzhou1@hisilicon.com \
    --cc=yuzenghui@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).