Re: [PATCH v3 0/6] Expose GPU memory as coherently CPU accessible

From: Jason Gunthorpe <jgg@nvidia.com>
To: Marc Zyngier <maz@kernel.org>
Cc: ankita@nvidia.com, alex.williamson@redhat.com,
	naoya.horiguchi@nec.com, oliver.upton@linux.dev,
	aniketa@nvidia.com, cjia@nvidia.com, kwankhede@nvidia.com,
	targupta@nvidia.com, vsethi@nvidia.com, acurrid@nvidia.com,
	apopple@nvidia.com, jhubbard@nvidia.com, danw@nvidia.com,
	kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org, linux-mm@kvack.org
Subject: Re: [PATCH v3 0/6] Expose GPU memory as coherently CPU accessible
Date: Thu, 13 Apr 2023 10:19:12 -0300
Message-ID: <ZDgBULRSNXgs7Bmo@nvidia.com>
In-Reply-To: <86ile0kt2t.wl-maz@kernel.org>

On Thu, Apr 13, 2023 at 10:52:10AM +0100, Marc Zyngier wrote:

> > IMHO, from the mm perspective, the bug is using pfn_is_map_memory() to
> > determine the cachability or device memory status of a PFN in a
> > VMA. That is not what that API is for.
> 
> It is the right API for what KVM/arm64 has been designed for. RAM gets
> a normal memory mapping, and everything else gets device. 

The MM has a pretty flexible definition of "RAM" these days. For
instance, I don't think pfn_is_map_memory() works correctly for all
the cases we can do now with devm_memremap_pages().

> That may not suit your *new* use case, but that doesn't make it
> broken.

I've now spent a lot of time working on improving VFIO and the related
ecosystem. I would like to get to a point where we have a consistent VFIO
experience on all the platforms.

Currently, real NIC and GPU HW with wide VFIO deployments on x86 do
not work fully correctly on KVM/arm64. Write-combining in the VM is
the big problem for existing HW, and this new CXL-like stuff has
problems with cachability.

I don't really care what we call it, as long as we can agree that VFIO
devices not working fully in VMs is a problem that should be fixed.

> Only if you insist on not losing coherency between the two aliases
> used at the same time (something that would seem pretty improbable).

This is VFIO so there is DMA involved. My understanding has been that
the SMMU is allowed to pull data out of the cache. So if the
hypervisor's cachable side has pulled a line into the cache and the VM's
uncached side has dirtied the physical memory, is the SMMU then allowed
to read the stale cache data? If so, the VM will experience data
corruption on its DMAs.
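
To make the sequence concrete, here is a toy C model of it. This is
purely illustrative: none of these names exist in the kernel, there is
a single word of memory and a single cache line, and the rule that a
coherent DMA read is served from the cache whenever the line is present
is an assumption that mirrors my understanding above, not a statement
of what the architecture guarantees.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model: one word of physical memory plus one cache line that may
 * hold a copy of it. All names here are hypothetical. */
struct toy_system {
	uint32_t dram;		/* value in physical memory */
	uint32_t cache_data;	/* copy held in the CPU cache */
	bool cache_valid;	/* is the line present in the cache? */
};

/* Hypervisor side: cachable alias. A read miss allocates the line. */
static uint32_t cacheable_read(struct toy_system *s)
{
	if (!s->cache_valid) {
		s->cache_data = s->dram;
		s->cache_valid = true;
	}
	return s->cache_data;
}

/* VM side: uncached alias. The write goes straight to memory and, in
 * this model, does not touch the cached copy. */
static void uncached_write(struct toy_system *s, uint32_t val)
{
	s->dram = val;
}

/* Coherent DMA read: assumed to be served from the cache when the line
 * is present, otherwise from memory. */
static uint32_t dma_read(const struct toy_system *s)
{
	return s->cache_valid ? s->cache_data : s->dram;
}

/* Replay the sequence and return the value the device would see. */
static uint32_t replay_stale_dma(uint32_t old_val, uint32_t new_val)
{
	struct toy_system s = { .dram = old_val, .cache_valid = false };

	(void)cacheable_read(&s);	/* hypervisor pulls the line in */
	uncached_write(&s, new_val);	/* VM dirties physical memory */
	assert(s.dram == new_val);	/* memory itself is up to date */
	return dma_read(&s);		/* device reads via the cache */
}
```

In this model replay_stale_dma(1, 2) returns the old value even though
memory holds the new one, which is the corruption I am worried about.
Without the hypervisor's cachable read the same DMA returns the new
value.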

With VFIO live migration I expect the hypervisor qemu side to be
actively reading from the cachable memory while the VM is running to
migrate it, so it does not seem improbable.

Jason