LKML Archive on lore.kernel.org
 help / color / Atom feed
From: Vikram Sethi <vsethi@nvidia.com>
To: Mark Kettenis <mark.kettenis@xs4all.nl>, Marc Zyngier <maz@kernel.org>
Cc: Shanker Donthineni <sdonthineni@nvidia.com>,
	"alex.williamson@redhat.com" <alex.williamson@redhat.com>,
	"will@kernel.org" <will@kernel.org>,
	"catalin.marinas@arm.com" <catalin.marinas@arm.com>,
	"christoffer.dall@arm.com" <christoffer.dall@arm.com>,
	"linux-arm-kernel@lists.infradead.org" 
	<linux-arm-kernel@lists.infradead.org>,
	"kvmarm@lists.cs.columbia.edu" <kvmarm@lists.cs.columbia.edu>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
	Jason Sequeira <jsequeira@nvidia.com>
Subject: RE: [RFC 1/2] vfio/pci: keep the prefetchable attribute of a BAR region in VMA
Date: Mon, 3 May 2021 13:59:43 +0000
Message-ID: <BL0PR12MB253296086906C4A850EC68E6BD5B9@BL0PR12MB2532.namprd12.prod.outlook.com> (raw)
In-Reply-To: <c1bd514a531988c9@bloch.sibelius.xs4all.nl>



> From: Mark Kettenis <mark.kettenis@xs4all.nl>
> > From: Marc Zyngier <maz@kernel.org>

snip
> > If, by enumerating the properties of Prefetchable, you can show that
> > they are a strict superset of Normal_NC, I'm on board. I haven't seen
> > such an enumeration so far.
> >
snip
> > Right, so we have made a small step in the direction of mapping
> > "prefetchable" onto "Normal_NC", thanks for that. What about all the
> > other properties (unaligned accesses, ordering, gathering)?
> 
Regarding gathering/write combining, that is also allowed to prefetchable per PCI spec
From 1.3.2.2 of 5/0 base spec:
A PCI Express Endpoint requesting memory resources through a BAR must set the BAR's Prefetchable bit unless
the range contains locations with read side-effects or locations in which the Function does not tolerate write
merging.
Further 7.5.1.2.1 says " A Function is permitted
to mark a range as prefetchable if there are no side effects on reads, the Function returns all bytes on reads regardless of
the byte enables, and host bridges can merge processor writes into this range139 without causing errors"

The "regardless of byte enables" suggests to me that unaligned is OK, as only 
certain byte enables may be set, what do you think?

So to me prefetchable in PCIe spec allows for write combining, read without
sideeffect (prefetch/speculative as long as uncached), and unaligned. Regarding
ordering I didn't find a statement one way or other in PCIe prefetchable definition, but
I think that goes beyond what PCIe says or doesn't say anyway since reordering can 
also happen in the CPU, and since driver must be aware of correctness issues in its 
producer/consumer models it will need the right barriers where they are required 
for correctness anyway (required for the driver/userspace to work on host w/ ioremap_wc).

But perhaps the bigger question is since WC doesn't exist as a Memory type
on armv8, yet we are trying to fit something onto ioremap_wc which came from
x86 world, shouldn't the arm64 MT we use for WC match the semantics of 
whatever drivers and userspace expected from ioremap_wc as defined on 
x86, which as Mark notes below includes unaligned? If we agree to that, 
we can codify it in the documentation of ioremap_wc and allow for
Normal NC on arm64 for ioremap_wc in host or guest. 
Beyond that, if we don't want to do it automatically based on prefetchable
but from explicit call from userspace is fine too. 

> On x86 WC:
> 
> 1. Is not cached (but stores are buffered).
> 
> 2. Allows unaligned access just like normal memory.
> 
> 3. Allows speculative reads.
> 
> 4. Has weaker ordering than normal memory; [lsm]fence instructions are
>    needed to guarantee a particular ordering of writes with respect to
>    other writes and reads.
> 
> 5. Stores are buffered.  This buffer isn't snooped so it has to be
>    flushed before changes are globally visible.  The [sm]fence
>    instructions flush the store buffer.
> 
> 6. The store buffer may combine multiple writes into a single write.
> 
> Now whether the fact the unaligned access is allowed is really part of the
> semantics of WC mappings is debatable as x86 always allows unaligned
> access, even for areas mapped with ioremap().
> 
> However, this is where userland comes in.  The userland graphics stack does
> assume that graphics memory mapped throug a prefetchable PCIe BAR
> allows unaligned access if the architecture allows unaligned access for
> cacheable memory.  On arm64 this means that such memory needs to be
> "Normal NC".  And since kernel drivers tend to map such memory using
> ioremap_wc() that pretty much implies ioremap_wc() shoul use "Normal NC"
> as well isn't it?
> 
> > > > How do we translate this into something consistent? I'd like to
> > > > see an actual description of what we *really* expect from WC on
> > > > prefetchable PCI regions, turn that into a documented definition
> > > > agreed across architectures, and then we can look at implementing
> > > > it with one memory type or another on arm64.
> > > >
> > > > Because once we expose that memory type at S2 for KVM guests, it
> > > > becomes ABI and there is no turning back. So I want to get it
> > > > right once and for all.
> > > >
> > > I agree that we need a precise definition for the Linux ioremap_wc
> > > API wrt what drivers (kernel and userspace) can expect and whether
> > > memset/memcpy is expected to work or not and whether aligned
> > > accesses are a requirement.
> > > To the extent ABI is set, I would think that the ABI is also already
> > > set in the host kernel for arm64 WC = Normal NC, so why should that
> > > not also be the ABI for same driver in VMs.
> >
> > KVM is an implementation of the ARM architecture, and doesn't really
> > care about what WC is. If we come to the conclusion that Normal_NC is
> > the natural match for Prefetchable attributes, than we're good and we
> > can have Normal_NC being set by userspace, or even VFIO. But I don't
> > want to set it only because "it works when bare-metal Linux uses it".
> > Remember KVM doesn't only run Linux as guests.
> >
> >       M.
> >
> > --
> > Without deviation from the norm, progress is not possible.
> >
> > _______________________________________________
> > linux-arm-kernel mailing list
> > linux-arm-kernel@lists.infradead.org
> > http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
> >

  reply index

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-04-29 16:29 [RFC 0/2] [RFC] Honor PCI prefetchable attributes for a virtual machine on ARM64 Shanker Donthineni
2021-04-29 16:29 ` [RFC 1/2] vfio/pci: keep the prefetchable attribute of a BAR region in VMA Shanker Donthineni
2021-04-29 18:28   ` Alex Williamson
2021-04-29 19:14     ` Shanker R Donthineni
2021-04-29 19:46       ` Alex Williamson
2021-04-29 22:08         ` Vikram Sethi
2021-04-30 11:25         ` Shanker R Donthineni
     [not found]           ` <87czucngdc.wl-maz@kernel.org>
2021-04-30 13:07             ` Shanker R Donthineni
2021-04-30 14:58             ` Shanker R Donthineni
     [not found]               ` <878s4zokll.wl-maz@kernel.org>
2021-04-30 16:57                 ` Vikram Sethi
2021-05-01  9:30                   ` Marc Zyngier
2021-05-01 11:36                     ` Shanker R Donthineni
     [not found]                       ` <87czu8uowe.wl-maz@kernel.org>
2021-05-03 12:08                         ` Shanker R Donthineni
2021-05-02 17:56                     ` Vikram Sethi
2021-05-03 10:17                       ` Marc Zyngier
2021-05-03 13:35                         ` Mark Kettenis
2021-05-03 13:59                           ` Vikram Sethi [this message]
2021-05-03 14:44                             ` Alex Williamson
2021-05-03 22:03                               ` Vikram Sethi
2021-05-04  8:30                                 ` Will Deacon
2021-05-05 18:02                                   ` Catalin Marinas
2021-05-06  7:22                                     ` Christoph Hellwig
2021-05-08 16:33                                     ` Shanker R Donthineni
2021-06-02  9:37                                       ` Marc Zyngier
2021-05-04 18:03                                 ` Alex Williamson
2021-06-02  9:11                                   ` Marc Zyngier
2021-04-30  9:54   ` Lorenzo Pieralisi
2021-04-30 12:38     ` Jason Gunthorpe
2021-04-29 16:29 ` [RFC 2/2] KVM: arm64: Add write-combine support for stage-2 entries Shanker Donthineni
2021-05-03  7:01 ` [RFC 0/2] [RFC] Honor PCI prefetchable attributes for a virtual machine on ARM64 Christoph Hellwig

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=BL0PR12MB253296086906C4A850EC68E6BD5B9@BL0PR12MB2532.namprd12.prod.outlook.com \
    --to=vsethi@nvidia.com \
    --cc=alex.williamson@redhat.com \
    --cc=catalin.marinas@arm.com \
    --cc=christoffer.dall@arm.com \
    --cc=jsequeira@nvidia.com \
    --cc=kvm@vger.kernel.org \
    --cc=kvmarm@lists.cs.columbia.edu \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mark.kettenis@xs4all.nl \
    --cc=maz@kernel.org \
    --cc=sdonthineni@nvidia.com \
    --cc=will@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

LKML Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/lkml/0 lkml/git/0.git
	git clone --mirror https://lore.kernel.org/lkml/1 lkml/git/1.git
	git clone --mirror https://lore.kernel.org/lkml/2 lkml/git/2.git
	git clone --mirror https://lore.kernel.org/lkml/3 lkml/git/3.git
	git clone --mirror https://lore.kernel.org/lkml/4 lkml/git/4.git
	git clone --mirror https://lore.kernel.org/lkml/5 lkml/git/5.git
	git clone --mirror https://lore.kernel.org/lkml/6 lkml/git/6.git
	git clone --mirror https://lore.kernel.org/lkml/7 lkml/git/7.git
	git clone --mirror https://lore.kernel.org/lkml/8 lkml/git/8.git
	git clone --mirror https://lore.kernel.org/lkml/9 lkml/git/9.git
	git clone --mirror https://lore.kernel.org/lkml/10 lkml/git/10.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 lkml lkml/ https://lore.kernel.org/lkml \
		linux-kernel@vger.kernel.org
	public-inbox-index lkml

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-kernel


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git