LKML Archive on lore.kernel.org
 help / color / Atom feed
From: Vikram Sethi <vsethi@nvidia.com>
To: Alex Williamson <alex.williamson@redhat.com>,
	Shanker Donthineni <sdonthineni@nvidia.com>
Cc: Marc Zyngier <maz@kernel.org>, Will Deacon <will@kernel.org>,
	Catalin Marinas <catalin.marinas@arm.com>,
	Christoffer Dall <christoffer.dall@arm.com>,
	"linux-arm-kernel@lists.infradead.org" 
	<linux-arm-kernel@lists.infradead.org>,
	"kvmarm@lists.cs.columbia.edu" <kvmarm@lists.cs.columbia.edu>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
	Jason Sequeira <jsequeira@nvidia.com>
Subject: RE: [RFC 1/2] vfio/pci: keep the prefetchable attribute of a BAR region in VMA
Date: Thu, 29 Apr 2021 22:08:26 +0000
Message-ID: <BL0PR12MB2532C10511989BE9EAD2A032BD5F9@BL0PR12MB2532.namprd12.prod.outlook.com> (raw)
In-Reply-To: <20210429134659.321a5c3c@redhat.com>

Hi Alex,

> From: Alex Williamson <alex.williamson@redhat.com>
> Subject: Re: [RFC 1/2] vfio/pci: keep the prefetchable attribute of a BAR region
> in VMA
> On Thu, 29 Apr 2021 14:14:50 -0500
> Shanker R Donthineni <sdonthineni@nvidia.com> wrote:
> 
> > Thanks Alex for quick reply.
> >
> > On 4/29/21 1:28 PM, Alex Williamson wrote:
> > > If this were a valid thing to do, it should be done for all
> > > architectures, not just ARM64.  However, a prefetchable range only
> > > necessarily allows merged writes, which seems like a subset of the
> > > semantics implied by a WC attribute, therefore this doesn't seem
> > > universally valid.
> > >
I didn't get your exact concern. If we removed the check for ARM arch
and simply stored that this is a prefetchable region in VMA, then each arch KVM
port could decide which PTE mappings are OK for prefetchable BAR.
KVM doesn't want to go through PCIe enumeration, and would rather
have the properties stored in VMA.
Beyond that, on arm64 specifically there is no WC Memtype, but we use
Normal Non Cacheable mapping for ioremap_wc which can be prefetched 
and can be write combined. What semantics break for a device if
its prefetchable BAR is marked as Normal Noncacheable on arm64?

We need a way for write combining to work in a KVM-ARM guest, as it is
an important usecase for GPUs and NICs and also NVMe CMB IIRC. So
*some* way is needed of letting KVM know to map as write combine 
(Normal NC) at stage2. Do you have a better solution in mind? 

> > > I'm also a bit confused by your problem statement that indicates
> > > that without WC you're seeing unaligned accesses, does this suggest
> > > that your driver is actually relying on WC semantics to perform
> > > merging to achieve alignment?  That seems rather like a driver bug,
> > > I'd expect UC vs WC is largely a difference in performance, not a
> > > means to enforce proper driver access patterns.  Per the PCI spec,
> > > the bridge itself can merge writes to prefetchable areas, presumably
> > > regardless of this processor attribute, perhaps that's the feature
> > > your driver is relying on that might be missing here.  Thanks,
> > The driver uses WC semantics, It's mapping PCI prefetchable BARS using
> > ioremap_wc().  We don't see any issue for x86 architecture, driver
> > works fine in the host and guest kernel. The same driver works on
> > ARM64 kernel but crashes inside VM. GPU driver uses the architecture
> > agnostic function ioremap_wc() like other drivers. This limitation
> > applies to all the drivers if they use WC memory and follow ARM64
> > NORMAL-NC access rules.
> 
> x86 KVM works for other reasons, KVM will trust the vCPU attributes for the
> memory range rather than relying only on the host mapping.
> 
> > On ARM64, ioremap_wc() is mapped to non-cacheable memory-type, no
> side
> > effects on reads and unaligned accesses are allowed as per ARM-ARM
> > architecture. The driver behavior is different in host vs guest on
> > ARM64.
> 
> Per the PCI spec, prefetchable memory only necessarily allows the bridge to
> merge writes.  I believe this is only a subset of what WC mappings allow,
> therefore I expect this is incompatible with drivers that do not use WC
> mappings.
> 
> > ARM CPU generating alignment faults before transaction reaches the
> > PCI-RC/switch/end-point-device.
> 
> If an alignment fault is fixed by configuring a WC mapping, doesn't that
> suggest that the driver performed an unaligned access itself and is relying on
> write combining by the processor to correct that error?
> That's wrong.  Fix the driver or please offer another explanation of how the
> WC mapping resolves this.  I suspect you could enable tracing in QEMU,
> disable MMIO mmaps on the vfio-pci device and find the invalid access.
> 
> > We've two concerns here:
> >    - Performance impacts for pass-through devices.
> >    - The definition of ioremap_wc() function doesn't match the host
> > kernel on ARM64
> 
> Performance I can understand, but I think you're also using it to mask a driver
> bug which should be resolved first.  Thanks,
> 
> Alex


  reply index

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-04-29 16:29 [RFC 0/2] [RFC] Honor PCI prefetchable attributes for a virtual machine on ARM64 Shanker Donthineni
2021-04-29 16:29 ` [RFC 1/2] vfio/pci: keep the prefetchable attribute of a BAR region in VMA Shanker Donthineni
2021-04-29 18:28   ` Alex Williamson
2021-04-29 19:14     ` Shanker R Donthineni
2021-04-29 19:46       ` Alex Williamson
2021-04-29 22:08         ` Vikram Sethi [this message]
2021-04-30 11:25         ` Shanker R Donthineni
     [not found]           ` <87czucngdc.wl-maz@kernel.org>
2021-04-30 13:07             ` Shanker R Donthineni
2021-04-30 14:58             ` Shanker R Donthineni
     [not found]               ` <878s4zokll.wl-maz@kernel.org>
2021-04-30 16:57                 ` Vikram Sethi
2021-05-01  9:30                   ` Marc Zyngier
2021-05-01 11:36                     ` Shanker R Donthineni
     [not found]                       ` <87czu8uowe.wl-maz@kernel.org>
2021-05-03 12:08                         ` Shanker R Donthineni
2021-05-02 17:56                     ` Vikram Sethi
2021-05-03 10:17                       ` Marc Zyngier
2021-05-03 13:35                         ` Mark Kettenis
2021-05-03 13:59                           ` Vikram Sethi
2021-05-03 14:44                             ` Alex Williamson
2021-05-03 22:03                               ` Vikram Sethi
2021-05-04  8:30                                 ` Will Deacon
2021-05-05 18:02                                   ` Catalin Marinas
2021-05-06  7:22                                     ` Christoph Hellwig
2021-05-08 16:33                                     ` Shanker R Donthineni
2021-06-02  9:37                                       ` Marc Zyngier
2021-05-04 18:03                                 ` Alex Williamson
2021-06-02  9:11                                   ` Marc Zyngier
2021-04-30  9:54   ` Lorenzo Pieralisi
2021-04-30 12:38     ` Jason Gunthorpe
2021-04-29 16:29 ` [RFC 2/2] KVM: arm64: Add write-combine support for stage-2 entries Shanker Donthineni
2021-05-03  7:01 ` [RFC 0/2] [RFC] Honor PCI prefetchable attributes for a virtual machine on ARM64 Christoph Hellwig

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=BL0PR12MB2532C10511989BE9EAD2A032BD5F9@BL0PR12MB2532.namprd12.prod.outlook.com \
    --to=vsethi@nvidia.com \
    --cc=alex.williamson@redhat.com \
    --cc=catalin.marinas@arm.com \
    --cc=christoffer.dall@arm.com \
    --cc=jsequeira@nvidia.com \
    --cc=kvm@vger.kernel.org \
    --cc=kvmarm@lists.cs.columbia.edu \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=maz@kernel.org \
    --cc=sdonthineni@nvidia.com \
    --cc=will@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

LKML Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/lkml/0 lkml/git/0.git
	git clone --mirror https://lore.kernel.org/lkml/1 lkml/git/1.git
	git clone --mirror https://lore.kernel.org/lkml/2 lkml/git/2.git
	git clone --mirror https://lore.kernel.org/lkml/3 lkml/git/3.git
	git clone --mirror https://lore.kernel.org/lkml/4 lkml/git/4.git
	git clone --mirror https://lore.kernel.org/lkml/5 lkml/git/5.git
	git clone --mirror https://lore.kernel.org/lkml/6 lkml/git/6.git
	git clone --mirror https://lore.kernel.org/lkml/7 lkml/git/7.git
	git clone --mirror https://lore.kernel.org/lkml/8 lkml/git/8.git
	git clone --mirror https://lore.kernel.org/lkml/9 lkml/git/9.git
	git clone --mirror https://lore.kernel.org/lkml/10 lkml/git/10.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 lkml lkml/ https://lore.kernel.org/lkml \
		linux-kernel@vger.kernel.org
	public-inbox-index lkml

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-kernel


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git