linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Benjamin Herrenschmidt <benh@kernel.crashing.org>
To: Catalin Marinas <catalin.marinas@arm.com>,
	Jason Gunthorpe <jgg@nvidia.com>
Cc: Marc Zyngier <maz@kernel.org>,
	ankita@nvidia.com, alex.williamson@redhat.com,
	naoya.horiguchi@nec.com, oliver.upton@linux.dev,
	aniketa@nvidia.com, cjia@nvidia.com, kwankhede@nvidia.com,
	targupta@nvidia.com, vsethi@nvidia.com, acurrid@nvidia.com,
	apopple@nvidia.com, jhubbard@nvidia.com, danw@nvidia.com,
	kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org, linux-mm@kvack.org,
	Lorenzo Pieralisi <lpieralisi@kernel.org>,
	Clint Sbisa <csbisa@amazon.com>,
	osamaabb@amazon.com
Subject: Re: [PATCH v3 1/6] kvm: determine memory type from VMA
Date: Fri, 14 Jul 2023 18:10:39 +1000	[thread overview]
Message-ID: <67a7374a72053107661ecc2b2f36fdb3ff6cc6ae.camel@kernel.crashing.org> (raw)
In-Reply-To: <ZHcxHbCb439I1Uk2@arm.com>

On Wed, 2023-05-31 at 12:35 +0100, Catalin Marinas wrote:
> There were several off-list discussions, I'm trying to summarise my
> understanding here. This series aims to relax the VFIO mapping to
> cacheable and have KVM map it into the guest with the same attributes.
> Somewhat related past threads also tried to relax the KVM device
> pass-through mapping from Device_nGnRnE (pgprot_noncached) to Normal_NC
> (pgprot_writecombine). Those were initially using the PCIe prefetchable
> BAR attribute but that's not a reliable means to infer whether Normal vs
> Device is safe. Anyway, I think we'd need to unify these threads and
> come up with some common handling that can cater for various attributes
> required by devices/drivers. Therefore replying in this thread.

So picking up on this as I was just trying to start a separate
discussion
on the subject for write combine :-)

In this case, not so much for KVM as much as for VFIO to userspace
though.

The rough idea is that the "userspace driver" (ie DPDK or equivalent)
for the device is the one to "know" wether a BAR or portion of a BAR
can/should be mapped write-combine, and is expected to also "know"
what to do to enforce ordering when necessary.

So the userspace component needs to be responsible for selecting the
mapping, the same way using the PCI sysfs resource files today allows
to do that by selecting the _wc variant.

I posted a separate message that Lorenzo CCed back to some of you, but
let's recap here to keep the discussion localized.

I don't know how much of this makes sense for KVM, but I think what we
really want is for userspace to be able to specify some "attributes"
(which we can initially limit to writecombine, full cachability
probably requires a device specific kernel driver providing adequate
authority, separate discussion in any case), for all or a portion of a
BAR mapping.

The easy way is an ioctl to affect the attributes of the next mmap but
it's a rather gross interface.

A better approach (still requires some coordination but not nearly as
bad) would be to have an ioctl to create "subregions", ie, dynamically
add new "struct vfio_pci_region" (using the existing dynamic index
API), which are children of existing regions (including real BARs) and
provide different attributes, which mmap can then honor.

This is particularly suited for the case (which used to exist, I don't
know if it still does) where the buffer that wants write combining
reside in the same BAR as registers that otherwise don't.

A simpler compromise if that latter case is deemed irrelevant would be
an ioctl to selectively set a region index (including BARs) to be WC
prior to mmap.

I don't know if that fits in the ideas you have for KVM, I think it
could by having the userspace component require mappings using a
"special" attribute which we could define as being the most relaxed
allowed to pass to a VM, which can then be architecture defined. The
guest can then enforce specifics. Does this make sense ?

Cheers
Ben.


  parent reply	other threads:[~2023-07-14  8:11 UTC|newest]

Thread overview: 31+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-04-05 18:01 [PATCH v3 0/6] Expose GPU memory as coherently CPU accessible ankita
2023-04-05 18:01 ` [PATCH v3 1/6] kvm: determine memory type from VMA ankita
2023-04-12 12:43   ` Marc Zyngier
2023-04-12 13:01     ` Jason Gunthorpe
2023-05-31 11:35       ` Catalin Marinas
2023-06-14 12:44         ` Jason Gunthorpe
2023-07-14  8:10         ` Benjamin Herrenschmidt [this message]
2023-07-16 15:09           ` Catalin Marinas
2023-07-16 22:30             ` Jason Gunthorpe
2023-07-17 18:35               ` Alex Williamson
2023-07-25  6:18                 ` Benjamin Herrenschmidt
2023-04-05 18:01 ` [PATCH v3 2/6] vfio/nvgpu: expose GPU device memory as BAR1 ankita
2023-04-05 21:07   ` kernel test robot
2023-04-05 18:01 ` [PATCH v3 3/6] mm: handle poisoning of pfn without struct pages ankita
2023-04-05 21:07   ` kernel test robot
2023-05-09  9:51   ` HORIGUCHI NAOYA(堀口 直也)
2023-05-15 11:18     ` Ankit Agrawal
2023-05-23  5:43       ` HORIGUCHI NAOYA(堀口 直也)
2023-04-05 18:01 ` [PATCH v3 4/6] mm: Add poison error check in fixup_user_fault() for mapped PFN ankita
2023-04-05 18:01 ` [PATCH v3 5/6] mm: Change ghes code to allow poison of non-struct PFN ankita
2023-04-05 18:01 ` [PATCH v3 6/6] vfio/nvgpu: register device memory for poison handling ankita
2023-04-05 20:24   ` Zhi Wang
2023-04-05 21:50   ` kernel test robot
2023-05-24  9:53   ` Dan Carpenter
2023-04-06 12:07 ` [PATCH v3 0/6] Expose GPU memory as coherently CPU accessible David Hildenbrand
2023-04-12  8:43   ` Ankit Agrawal
2023-04-12  9:48     ` Marc Zyngier
2023-04-12 12:28 ` Marc Zyngier
2023-04-12 12:53   ` Jason Gunthorpe
2023-04-13  9:52     ` Marc Zyngier
2023-04-13 13:19       ` Jason Gunthorpe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=67a7374a72053107661ecc2b2f36fdb3ff6cc6ae.camel@kernel.crashing.org \
    --to=benh@kernel.crashing.org \
    --cc=acurrid@nvidia.com \
    --cc=alex.williamson@redhat.com \
    --cc=aniketa@nvidia.com \
    --cc=ankita@nvidia.com \
    --cc=apopple@nvidia.com \
    --cc=catalin.marinas@arm.com \
    --cc=cjia@nvidia.com \
    --cc=csbisa@amazon.com \
    --cc=danw@nvidia.com \
    --cc=jgg@nvidia.com \
    --cc=jhubbard@nvidia.com \
    --cc=kvm@vger.kernel.org \
    --cc=kwankhede@nvidia.com \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=lpieralisi@kernel.org \
    --cc=maz@kernel.org \
    --cc=naoya.horiguchi@nec.com \
    --cc=oliver.upton@linux.dev \
    --cc=osamaabb@amazon.com \
    --cc=targupta@nvidia.com \
    --cc=vsethi@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).