linux-kernel.vger.kernel.org archive mirror
From: Jason Gunthorpe <jgg@nvidia.com>
To: Marc Zyngier <maz@kernel.org>
Cc: ankita@nvidia.com, alex.williamson@redhat.com,
	naoya.horiguchi@nec.com, oliver.upton@linux.dev,
	aniketa@nvidia.com, cjia@nvidia.com, kwankhede@nvidia.com,
	targupta@nvidia.com, vsethi@nvidia.com, acurrid@nvidia.com,
	apopple@nvidia.com, jhubbard@nvidia.com, danw@nvidia.com,
	kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org, linux-mm@kvack.org
Subject: Re: [PATCH v3 0/6] Expose GPU memory as coherently CPU accessible
Date: Thu, 13 Apr 2023 10:19:12 -0300
Message-ID: <ZDgBULRSNXgs7Bmo@nvidia.com>
In-Reply-To: <86ile0kt2t.wl-maz@kernel.org>

On Thu, Apr 13, 2023 at 10:52:10AM +0100, Marc Zyngier wrote:

> > IMHO, from the mm perspective, the bug is using pfn_is_map_memory() to
> > determine the cachability or device memory status of a PFN in a
> > VMA. That is not what that API is for.
> 
> It is the right API for what KVM/arm64 has been designed for. RAM gets
> a normal memory mapping, and everything else gets device. 

The MM has a pretty flexible definition of "RAM" these days. For
instance, I don't think pfn_is_map_memory() works correctly for all
the cases we can do now with devm_memremap_pages().
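A minimal C sketch of the two policies being argued over. It models the decision rather than reproducing KVM/arm64 code: `struct pfn_info`, `memtype_from_linear_map()`, `memtype_from_vma()` and the `S2_*` values are hypothetical names. `in_linear_map` stands in for what pfn_is_map_memory() reports, and `vma_cacheable` stands in for the caching attribute the host already chose for the VMA.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stage-2 memory types (not the kernel's enum). */
enum s2_memtype { S2_DEVICE, S2_NORMAL_CACHED };

/* Hypothetical summary of what a stage-2 fault handler could know about a PFN. */
struct pfn_info {
	bool in_linear_map; /* stand-in for the answer of pfn_is_map_memory() */
	bool vma_cacheable; /* stand-in for the caching attribute of the host VMA */
};

/* Policy from the quoted text: linear-map RAM is Normal, everything else Device. */
static enum s2_memtype memtype_from_linear_map(const struct pfn_info *p)
{
	return p->in_linear_map ? S2_NORMAL_CACHED : S2_DEVICE;
}

/* Alternative: reuse the attribute the host already chose when mapping the VMA. */
static enum s2_memtype memtype_from_vma(const struct pfn_info *p)
{
	return p->vma_cacheable ? S2_NORMAL_CACHED : S2_DEVICE;
}
```

For a PFN that is outside the linear map but mapped cacheable by the host (assumed here to describe the coherent GPU memory in this series), `memtype_from_linear_map()` yields `S2_DEVICE` while `memtype_from_vma()` yields `S2_NORMAL_CACHED`. The two policies only diverge for memory like this, which is the point of contention.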

> That may not suit your *new* use case, but that doesn't make it
> broken.

I've now spent a lot of time working on improving VFIO and the related
ecosystem. I would like to get to a point where we have a consistent VFIO
experience on all platforms.

Currently, real NIC and GPU HW with wide VFIO deployments on x86 do
not work fully correctly on KVM/arm64. Write-combining in the VM is
the big problem for existing HW, and this new CXL-like stuff has
problems with cacheability.
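Write-combining needs a third outcome beyond "Normal" or "Device". A hedged C sketch, with all names hypothetical: on arm64 a write-combining mapping normally uses the Normal Non-cacheable memory type, and because the effective type of a guest access combines the stage-1 and stage-2 attributes with the more restrictive one winning, a stage-2 Device mapping turns the guest's write-combining mapping into Device.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified caching attribute chosen by the host for a VMA. */
enum host_attr { HOST_CACHED, HOST_WRITE_COMBINE, HOST_UNCACHED };

/* Hypothetical, simplified arm64 stage-2 memory types. */
enum s2_attr { S2_NORMAL_WB, S2_NORMAL_NC, S2_DEVICE_nGnRE };

/* Binary policy: anything that is not linear-map RAM becomes Device. */
static enum s2_attr s2_binary(bool is_ram)
{
	return is_ram ? S2_NORMAL_WB : S2_DEVICE_nGnRE;
}

/* Attribute-preserving policy: host WC maps to Normal Non-cacheable at stage 2. */
static enum s2_attr s2_from_host(enum host_attr a)
{
	switch (a) {
	case HOST_CACHED:
		return S2_NORMAL_WB;
	case HOST_WRITE_COMBINE:
		return S2_NORMAL_NC;
	case HOST_UNCACHED:
	default:
		return S2_DEVICE_nGnRE;
	}
}
```

Under the binary policy a WC-capable BAR can only ever reach `S2_DEVICE_nGnRE`, so the guest's WC mapping is downgraded; the attribute-preserving variant keeps `S2_NORMAL_NC` available.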

I don't really care what we call it, as long as we can agree that VFIO
devices not working fully in VMs is a problem that should be fixed.

> Only if you insist on not losing coherency between the two aliases
> used at the same time (something that would seem pretty improbable).

This is VFIO, so there is DMA involved. My understanding has been that
the SMMU is allowed to pull data out of the cache. So if the
hypervisor's cacheable alias has pulled a line into the cache and the
VM's uncached alias has dirtied the physical memory, is the SMMU then
allowed to read the stale cached data? If so, the VM will experience
data corruption on its DMAs.

With VFIO live migration, I expect the hypervisor (QEMU) side to be
actively reading from the cacheable memory while the VM is running, in
order to migrate it, so this scenario does not seem improbable.
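The stale-data scenario can be illustrated with a deliberately simplified model of one cache line shadowing one word of memory. This is a toy under stated assumptions, not a model of any real SMMU or of the Arm architecture's rules for mismatched memory attributes; `struct toy_system` and the three helper functions are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model: one cache line shadowing one word of physical memory. */
struct toy_system {
	uint32_t dram;        /* contents of physical memory */
	uint32_t cache_data;  /* contents of the single cache line */
	bool     cache_valid; /* whether the line currently holds a copy */
};

/* Cacheable-alias read (e.g. the VMM reading memory for migration): allocates the line. */
static uint32_t cached_read(struct toy_system *s)
{
	if (!s->cache_valid) {
		s->cache_data = s->dram;
		s->cache_valid = true;
	}
	return s->cache_data;
}

/* Non-cacheable-alias write (e.g. the guest's uncached mapping): goes straight to memory. */
static void uncached_write(struct toy_system *s, uint32_t v)
{
	s->dram = v; /* the cached copy is neither updated nor invalidated */
}

/* Coherent DMA read: may be served from a valid cache line instead of memory. */
static uint32_t coherent_dma_read(const struct toy_system *s)
{
	return s->cache_valid ? s->cache_data : s->dram;
}
```

Starting from `dram == 1` with an empty cache, a `cached_read()` followed by `uncached_write(&s, 2)` leaves memory holding 2, yet `coherent_dma_read()` still returns the stale 1 from the cache line.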

Jason
