LKML Archive on lore.kernel.org
From: Mark Kettenis <mark.kettenis@xs4all.nl>
To: Marc Zyngier <maz@kernel.org>
Cc: vsethi@nvidia.com, sdonthineni@nvidia.com,
	alex.williamson@redhat.com, will@kernel.org,
	catalin.marinas@arm.com, christoffer.dall@arm.com,
	linux-arm-kernel@lists.infradead.org,
	kvmarm@lists.cs.columbia.edu, linux-kernel@vger.kernel.org,
	kvm@vger.kernel.org, jsequeira@nvidia.com
Subject: Re: [RFC 1/2] vfio/pci: keep the prefetchable attribute of a BAR region in VMA
Date: Mon, 3 May 2021 15:35:25 +0200 (CEST)
Message-ID: <c1bd514a531988c9@bloch.sibelius.xs4all.nl>
In-Reply-To: <87bl9sunnw.wl-maz@kernel.org> (message from Marc Zyngier on Mon, 03 May 2021 11:17:23 +0100)

> Date: Mon, 03 May 2021 11:17:23 +0100
> From: Marc Zyngier <maz@kernel.org>
> 
> Hi Vikram,
> 
> On Sun, 02 May 2021 18:56:31 +0100,
> Vikram Sethi <vsethi@nvidia.com> wrote:
> > 
> > Hi Marc, 
> > 
> > > From: Marc Zyngier <maz@kernel.org>
> > > Hi Vikram,
> > > 
> >  
> > > The problem I see is that we have VM and userspace being written in terms
> > > of Write-Combine, which is:
> > > 
> > > - loosely defined even on x86
> > > 
> > > - subject to interpretations in the way it maps to PCI
> > > 
> > > - has no direct equivalent in the ARMv8 collection of memory
> > >   attributes (and Normal_NC comes with speculation capabilities which
> > >   strikes me as extremely undesirable on arbitrary devices)
> > 
> > If speculation with Normal NC to prefetchable BARs in devices was a
> > problem, those devices would already be broken in baremetal with
> > ioremap_wc on arm64, and we would need quirks there to not do Normal
> > NC for them but Device GRE, and if such a quirk was needed on
> > baremetal, it could be picked up by vfio/KVM as well. But we haven't
> > seen any broken devices doing wc on baremetal on ARM64, have we?

I think the SC2A11 SoC used in the Socionext developerbox counts as
"broken":

https://www.96boards.org/documentation/enterprise/developerbox/support/known-issues.html

I'm not sure my understanding of the issue is 100% correct, but I
believe the firmware workaround described there uses the stage 2
translation tables to map "Normal NC" onto "Device nGRE" or something
even more restricted.  Now this hardware may be classified as simply
broken.  However...

On hardware based on the NXP LX2160A SoC we're seeing some weird
behaviour when using "Normal NC" mappings with an AMD GPU that
disappears when using "Device nGnRnE" mappings on OpenBSD.  No such issue
was observed with hardware based on an Ampere eMAG SoC.  I don't fully
understand this issue yet, and it may very well be a bug in OpenBSD
code, but it does show there are potential pitfalls with using "Normal
NC" for mapping prefetchable BARs of PCIe devices.

> The lack of evidence does not equate to a proof, and your devices not
> misbehaving doesn't mean it is the right thing, especially when we have
> such a wide range of CPU and interconnect implementation. Which is why
> I really want an answer at the architecture level. Not a "it works for
> me" type of answer.
> 
> Furthermore, as I replied to Shanker in a separate email, what
> Linux/arm64 does is pretty much irrelevant. KVM/arm64 implements the
> ARMv8 architecture, and it is at that level that we need to solve the
> problem.
> 
> If, by enumerating the properties of Prefetchable, you can show that
> they are a strict superset of Normal_NC, I'm on board. I haven't seen
> such an enumeration so far.
> 
> > I know we have tested NICs write combining on arm64 in baremetal, as
> > well as GPU and NVMe CMB without issues.
> > 
> > Further, I don't see why speculation to non-cacheable would be an
> > issue if prefetch without side effects is allowed by the device,
> > which is what a prefetchable BAR is.
> > If it is an issue for a device I would consider that a bug already needing a quirk in
> > Baremetal/host kernel already. 
> > From PCI spec " A prefetchable address range may have write side effects, 
> > but it may not have read side effects."
> 
> Right, so we have made a small step in the direction of mapping
> "prefetchable" onto "Normal_NC", thanks for that. What about all the
> other properties (unaligned accesses, ordering, gathering)?

On x86 WC:

1. Is not cached (but stores are buffered).

2. Allows unaligned access just like normal memory.

3. Allows speculative reads.

4. Has weaker ordering than normal memory; [lsm]fence instructions are
   needed to guarantee a particular ordering of writes with respect to
   other writes and reads.

5. Stores are buffered.  This buffer isn't snooped so it has to be
   flushed before changes are globally visible.  The [sm]fence
   instructions flush the store buffer.

6. The store buffer may combine multiple writes into a single write.
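As a rough illustration of points 4-6, here is a toy C model of a single
write-combining buffer.  It is not how any real CPU implements WC:
struct wc_model, wc_store() and wc_fence() are names made up for this
sketch, and real hardware may also flush the buffer on its own at any
time, which the model ignores.  It only shows that buffered stores are
invisible to the device until a fence, and that several stores can
leave the CPU as a single write.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define WC_LINE 64	/* assumed buffer size for this sketch */

/* Hypothetical model of one WC buffer in front of a device. */
struct wc_model {
	uint8_t device[WC_LINE];	/* what the device has seen so far */
	uint8_t pending[WC_LINE];	/* buffered store data */
	bool valid[WC_LINE];		/* which bytes are pending */
	unsigned bus_writes;		/* write transactions issued */
};

/* Buffer a store; nothing reaches the device yet. */
void wc_store(struct wc_model *m, size_t off, const void *src, size_t len)
{
	/* Out-of-range stores are ignored in this toy model. */
	if (off >= WC_LINE || len > WC_LINE - off)
		return;
	memcpy(&m->pending[off], src, len);
	for (size_t i = 0; i < len; i++)
		m->valid[off + i] = true;
}

/* Model of sfence/mfence: drain all pending bytes as one combined write. */
void wc_fence(struct wc_model *m)
{
	bool any = false;

	for (size_t i = 0; i < WC_LINE; i++) {
		if (m->valid[i]) {
			m->device[i] = m->pending[i];
			m->valid[i] = false;
			any = true;
		}
	}
	if (any)
		m->bus_writes++;
}
```

Two 4-byte calls to wc_store() followed by one wc_fence() leave
bus_writes at 1 in this model, and the device bytes stay unchanged until
the fence runs.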

Now whether the fact that unaligned access is allowed is really part of
the semantics of WC mappings is debatable as x86 always allows
unaligned access, even for areas mapped with ioremap().

However, this is where userland comes in.  The userland graphics stack
does assume that graphics memory mapped through a prefetchable PCIe BAR
allows unaligned access if the architecture allows unaligned access
for cacheable memory.  On arm64 this means that such memory needs to
be "Normal NC".  And since kernel drivers tend to map such memory
using ioremap_wc(), that pretty much implies ioremap_wc() should use
"Normal NC" as well, doesn't it?
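To make the alignment point concrete: on arm64 an unaligned access to
memory mapped with a Device attribute takes an alignment fault, so code
writing to such a mapping would have to restrict itself to naturally
aligned accesses, roughly as sketched below.  This is only an
illustration; is_naturally_aligned() and copy_to_device_aligned() are
hypothetical helpers, not existing kernel or userland APIs, and the
sketch says nothing about ordering or barriers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Nonzero if an access of `size` bytes (a power of two) at `p` is
 * naturally aligned. */
int is_naturally_aligned(const volatile void *p, size_t size)
{
	assert(size != 0 && (size & (size - 1)) == 0);
	return ((uintptr_t)p & (size - 1)) == 0;
}

/* Copy using only byte stores and aligned 32-bit stores to `dst`. */
void copy_to_device_aligned(volatile void *dst, const void *src, size_t len)
{
	volatile uint8_t *d = dst;
	const uint8_t *s = src;

	/* Head: byte stores until the destination is 4-byte aligned. */
	while (len > 0 && !is_naturally_aligned(d, 4)) {
		*d++ = *s++;
		len--;
	}

	/* Body: aligned 32-bit stores; the source may be unaligned,
	 * so it is read through memcpy(). */
	while (len >= 4) {
		uint32_t w;

		memcpy(&w, s, sizeof(w));
		*(volatile uint32_t *)d = w;
		d += 4;
		s += 4;
		len -= 4;
	}

	/* Tail: remaining bytes. */
	while (len > 0) {
		*d++ = *s++;
		len--;
	}
}
```

Userland graphics code generally does not do this; it uses plain
memcpy() and ordinary pointer accesses on the mapping, which is why it
depends on the mapping tolerating unaligned access.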

> > > How do we translate this into something consistent? I'd like to
> > > see an actual description of what we *really* expect from WC on
> > > prefetchable PCI regions, turn that into a documented definition
> > > agreed across architectures, and then we can look at
> > > implementing it with one memory type or another on arm64.
> > > 
> > > Because once we expose that memory type at S2 for KVM guests, it
> > > becomes ABI and there is no turning back. So I want to get it
> > > right once and for all.
> > > 
> > I agree that we need a precise definition for the Linux ioremap_wc
> > API wrt what drivers (kernel and userspace) can expect and whether
> > memset/memcpy is expected to work or not and whether aligned
> > accesses are a requirement.
> > To the extent ABI is set, I would think that the ABI is also already
> > set in the host kernel for arm64 WC = Normal NC, so why should that
> > not also be the ABI for same driver in VMs.
> 
> KVM is an implementation of the ARM architecture, and doesn't really
> care about what WC is. If we come to the conclusion that Normal_NC is
> the natural match for Prefetchable attributes, then we're good and we
> can have Normal_NC being set by userspace, or even VFIO. But I don't
> want to set it only because "it works when bare-metal Linux uses it".
> Remember KVM doesn't only run Linux as guests.
> 
> 	M.
> 
> -- 
> Without deviation from the norm, progress is not possible.
> 
