LKML Archive on lore.kernel.org
 help / color / Atom feed
From: Alex Williamson <alex.williamson@redhat.com>
To: Vikram Sethi <vsethi@nvidia.com>
Cc: Mark Kettenis <mark.kettenis@xs4all.nl>,
	Marc Zyngier <maz@kernel.org>,
	Shanker Donthineni <sdonthineni@nvidia.com>,
	"will@kernel.org" <will@kernel.org>,
	"catalin.marinas@arm.com" <catalin.marinas@arm.com>,
	"christoffer.dall@arm.com" <christoffer.dall@arm.com>,
	"linux-arm-kernel@lists.infradead.org" 
	<linux-arm-kernel@lists.infradead.org>,
	"kvmarm@lists.cs.columbia.edu" <kvmarm@lists.cs.columbia.edu>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
	Jason Sequeira <jsequeira@nvidia.com>
Subject: Re: [RFC 1/2] vfio/pci: keep the prefetchable attribute of a BAR region in VMA
Date: Tue, 4 May 2021 12:03:48 -0600
Message-ID: <20210504120348.2eec075b@redhat.com> (raw)
In-Reply-To: <BL0PR12MB2532BEAE226E7D68A8A2F97EBD5B9@BL0PR12MB2532.namprd12.prod.outlook.com>

On Mon, 3 May 2021 22:03:59 +0000
Vikram Sethi <vsethi@nvidia.com> wrote:

> Hi Alex,
> > From: Alex Williamson <alex.williamson@redhat.com>
> > On Mon, 3 May 2021 13:59:43 +0000
> > Vikram Sethi <vsethi@nvidia.com> wrote:  
> > > > From: Mark Kettenis <mark.kettenis@xs4all.nl>  
> > > > > From: Marc Zyngier <maz@kernel.org>  
> > >
> > > snip  
> > > > > If, by enumerating the properties of Prefetchable, you can show
> > > > > that they are a strict superset of Normal_NC, I'm on board. I
> > > > > haven't seen such an enumeration so far.
> > > > >  
> > > snip  
> > > > > Right, so we have made a small step in the direction of mapping
> > > > > "prefetchable" onto "Normal_NC", thanks for that. What about all
> > > > > the other properties (unaligned accesses, ordering, gathering)?  
> > > >  
> > > Regarding gathering/write combining, that is also allowed to
> > > prefetchable per PCI spec  
> > 
> > As others have stated, gather/write combining itself is not well defined.
> >   
> > > From 1.3.2.2 of 5/0 base spec:
> > > A PCI Express Endpoint requesting memory resources through a BAR must
> > > set the BAR's Prefetchable bit unless the range contains locations
> > > with read side-effects or locations in which the Function does not tolerate  
> > write merging.
> > 
> > "write merging"  This is a very specific thing, per PCI 3.0, 3.2.6:
> > 
> >   Byte Merging – occurs when a sequence of individual memory writes
> >   (bytes or words) are merged into a single DWORD.
> > 
> > The semantics suggest quadword support in addition to dword, but don't
> > require it.  Writes to bytes within a dword can be merged, but duplicate
> > writes cannot.
> > 
> > It seems like an extremely liberal application to suggest that this one write
> > semantic encompasses full write combining semantics, which itself is not
> > clearly defined.
> >  
> Talking to our PCIe SIG representative, PCIe switches are not allowed do any of the byte
> Merging/combining etc as defined in the PCI spec, and per a rather poorly
> worded Implementation note in the spec says that no known PCIe Host Briddges/Root 
> ports do it either. 
> So for PCIe we don't think believe there is any byte merging that happens in the PCIe
> fabric so it's really a matter of what happens in the CPU core and interconnect
> before it gets to the PCIe hierarchy.

Yes, but merged writes, no matter where they happen, are still the only
type of write combining that a prefetchable BAR on an endpoint is
required to support.

> Stepping back from this patchset, do you agree that it is desirable to support
> Write combining as understood by ioremap_wc to work in all ISA guests including
> ARMv8?

Yes, a userspace vfio driver should be able to take advantage of the
hardware capabilities.  I think where we disagree is whether it's
universally safe to assume write combining based on the PCI
prefetchable capability of a BAR.  If that's something that can be
assumed universally for ARMv8 based on the architecture specification
compatibility with the PCI definition of a prefetchable BAR, then I
would expect a helper somewhere in arch code that returns the right
page protection flags, so that arch maintainers don't need to scour
device drivers for architecture hacks.  Otherwise, it needs to be
exposed through the vfio uAPI to allow the userspace device driver
itself to select these semantics.

> You note that x86 virtualization doesn't have this issue, but KVM-ARM does
> because KVM maps all device BARs as Device Memory type nGnRE which 
> doesn't allow ioremap_wc from within the guest to get the actual semantics desired.
> 
> Marc and others have suggested that userspace should provide the hints. But the
> question is how would qemu vfio do this either? We would be stuck in the same
> arguments as here, as to what is the correct way to determine the desired attributes
> for a given BAR such that eventually when a driver in the guest asks for
> ioremap_wc it actually has a chance of working in the guest, in all ISAs. 
> Do you have any suggestions on how to make progress here?

We do need some way for userspace drivers to also make use of WC
semantics, there were some discussions in the past, I think others have
referenced them as well, but nothing has been proposed for a vfio API.

If we had that API, QEMU deciding to universally enable WC for all
vfio prefetchable BARs seems only marginally better than this approach.
Ultimately the mapping should be based on the guest driver semantics,
and if you don't have any visibility to that on KVM/arm like we have on
KVM/x86, then it seems like there's nothing to trigger a vfio API here
anyway.

If that's the case, I'd probably go back to letting the arch/arm64 folks
declare that WC is compatible with the definition of PCI prefetchable
and export some sort of pgprot_pci_prefetchable() helper where the
default would be to #define it as pgproc_noncached() #ifndef by the
arch.

> A device specific list of which BARs are OK to allow ioremap_wc for seems terrible
> and I'm not sure if a commandline qemu option is any better. Is the user of device 
> assignment/sysadmin supposed to know which BAR of which device is OK to allow 
> ioremap_wc for?

No, a device specific userspace driver should know such device
semantics, but QEMU is not such a driver.  Burdening the hypervisor
user/admin is not a good solution either.  I'd lean on KVM/arm64 folks
to know how the guest driver semantics can be exposed to the
hypervisor.  Thanks,

Alex


  parent reply index

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-04-29 16:29 [RFC 0/2] [RFC] Honor PCI prefetchable attributes for a virtual machine on ARM64 Shanker Donthineni
2021-04-29 16:29 ` [RFC 1/2] vfio/pci: keep the prefetchable attribute of a BAR region in VMA Shanker Donthineni
2021-04-29 18:28   ` Alex Williamson
2021-04-29 19:14     ` Shanker R Donthineni
2021-04-29 19:46       ` Alex Williamson
2021-04-29 22:08         ` Vikram Sethi
2021-04-30 11:25         ` Shanker R Donthineni
     [not found]           ` <87czucngdc.wl-maz@kernel.org>
2021-04-30 13:07             ` Shanker R Donthineni
2021-04-30 14:58             ` Shanker R Donthineni
     [not found]               ` <878s4zokll.wl-maz@kernel.org>
2021-04-30 16:57                 ` Vikram Sethi
2021-05-01  9:30                   ` Marc Zyngier
2021-05-01 11:36                     ` Shanker R Donthineni
     [not found]                       ` <87czu8uowe.wl-maz@kernel.org>
2021-05-03 12:08                         ` Shanker R Donthineni
2021-05-02 17:56                     ` Vikram Sethi
2021-05-03 10:17                       ` Marc Zyngier
2021-05-03 13:35                         ` Mark Kettenis
2021-05-03 13:59                           ` Vikram Sethi
2021-05-03 14:44                             ` Alex Williamson
2021-05-03 22:03                               ` Vikram Sethi
2021-05-04  8:30                                 ` Will Deacon
2021-05-05 18:02                                   ` Catalin Marinas
2021-05-06  7:22                                     ` Christoph Hellwig
2021-05-08 16:33                                     ` Shanker R Donthineni
2021-06-02  9:37                                       ` Marc Zyngier
2021-05-04 18:03                                 ` Alex Williamson [this message]
2021-06-02  9:11                                   ` Marc Zyngier
2021-04-30  9:54   ` Lorenzo Pieralisi
2021-04-30 12:38     ` Jason Gunthorpe
2021-04-29 16:29 ` [RFC 2/2] KVM: arm64: Add write-combine support for stage-2 entries Shanker Donthineni
2021-05-03  7:01 ` [RFC 0/2] [RFC] Honor PCI prefetchable attributes for a virtual machine on ARM64 Christoph Hellwig

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210504120348.2eec075b@redhat.com \
    --to=alex.williamson@redhat.com \
    --cc=catalin.marinas@arm.com \
    --cc=christoffer.dall@arm.com \
    --cc=jsequeira@nvidia.com \
    --cc=kvm@vger.kernel.org \
    --cc=kvmarm@lists.cs.columbia.edu \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mark.kettenis@xs4all.nl \
    --cc=maz@kernel.org \
    --cc=sdonthineni@nvidia.com \
    --cc=vsethi@nvidia.com \
    --cc=will@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

LKML Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/lkml/0 lkml/git/0.git
	git clone --mirror https://lore.kernel.org/lkml/1 lkml/git/1.git
	git clone --mirror https://lore.kernel.org/lkml/2 lkml/git/2.git
	git clone --mirror https://lore.kernel.org/lkml/3 lkml/git/3.git
	git clone --mirror https://lore.kernel.org/lkml/4 lkml/git/4.git
	git clone --mirror https://lore.kernel.org/lkml/5 lkml/git/5.git
	git clone --mirror https://lore.kernel.org/lkml/6 lkml/git/6.git
	git clone --mirror https://lore.kernel.org/lkml/7 lkml/git/7.git
	git clone --mirror https://lore.kernel.org/lkml/8 lkml/git/8.git
	git clone --mirror https://lore.kernel.org/lkml/9 lkml/git/9.git
	git clone --mirror https://lore.kernel.org/lkml/10 lkml/git/10.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 lkml lkml/ https://lore.kernel.org/lkml \
		linux-kernel@vger.kernel.org
	public-inbox-index lkml

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-kernel


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git