All of lore.kernel.org
 help / color / mirror / Atom feed
From: Lorenzo Pieralisi <lpieralisi@kernel.org>
To: Catalin Marinas <catalin.marinas@arm.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>, Will Deacon <will@kernel.org>,
	ankita@nvidia.com, maz@kernel.org, oliver.upton@linux.dev,
	aniketa@nvidia.com, cjia@nvidia.com, kwankhede@nvidia.com,
	targupta@nvidia.com, vsethi@nvidia.com, acurrid@nvidia.com,
	apopple@nvidia.com, jhubbard@nvidia.com, danw@nvidia.com,
	linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH v1 2/2] KVM: arm64: allow the VM to select DEVICE_* and NORMAL_NC for IO memory
Date: Thu, 19 Oct 2023 15:35:24 +0200	[thread overview]
Message-ID: <ZTEwnBoPLL3PE3Xo@lpieralisi> (raw)
In-Reply-To: <ZTEN_oe97VRWbnHb@arm.com>

On Thu, Oct 19, 2023 at 12:07:42PM +0100, Catalin Marinas wrote:

[...]

> > VFIO issues a config space write to reset the PCI function.  Config
> > space writes MUST NOT write combine with anything. This is already
> > impossible for PCIe since they are different TLP types at the PCIe
> > level.
> 
> Yes, config space writes are fine, vfio-pci even maps them as
> Device_nGnRnE. But AFAIK a guest won't have direct access to the config
> space.
> 
> > By the PCIe rules, config space write must order strictly after all
> > other CPU's accesses. Once the reset non-posted write returns back to
> > VFIO we know that:
> > 
> >  1) There is no reference in any CPU page table to the MMIO PFN
> >  2) No CPU has pending data in any write buffer
> >  3) The interconnect and PCIe fabric have no inflight operations
> >  4) The device is in a clean post-reset state
> 
> I think from the CPU perspective, we can guarantee that a Normal_NC
> write on CPU0 for example reaches a serialisation point before a config
> space (Device_nGnRnE) write on CPU1 by the host as long as CPU1 issued a
> TLBI+DSB. Now, what I'm not sure is what this serialisation point is. If
> it is the PCIe root complex, we are probably fine, we hope it deals with
> any ordering between the Normal_NC write and the config space one.

If it is the PCI host bridge (and for PCI it should be since it is the
logic between the ARM world - where ARM ordering rules and barriers
apply - and PCI protocol), either it enforces PCI ordering rules or it
is broken by design; if it is the latter, at that stage device
assignment would be the least of our problems.

For non-PCI device assignment, I am not sure at all we can rely on
anything other than what Jason mentioned, eg resets (and the components
that through eg MMIO are carrying them out) are not architected, the
device MMIO space and the MMIO space used to trigger the reset (again,
it is an example) may well be placed on different interconnect paths,
it is device specific.

Lorenzo

> Talking to Will earlier, I think we can deem the PCIe scenario
> (somewhat) safe but not as a generic mechanism for other non-PCIe
> devices (e.g. platform). With this concern, can we make this Stage 2
> relaxation in KVM only for vfio-pci mappings? I don't have an example of
> non-PCIe device assignment to figure out how this should work though.
> 
> > > knows all the details. The safest is for the VMM to keep it as Device (I
> > > think vfio-pci goes for the strongest nGnRnE).
> > 
> > We are probably going to allow VFIO to let userspace pick if it should
> > be pgprot_device or pgprot_writecombine.
> 
> I guess that's for the direct use by an application rather than VMM+VM.
> IIUC people work around this currently by mapping PCIe BARs as
> pgprot_writecombine() via sysfs. Getting vfio-pci to allow different
> mappings is probably a good idea, though it doesn't currently help with
> the KVM case as we can't force the VMM to know the specifics of the
> device it is giving to a guest.
> 
> > The alias issue could be resolved by teaching KVM how to insert a
> > physical PFN based on some VFIO FD/dmabuf rather than a VMA so that
> > the PFNs are never mapped in the hypervisor side.
> 
> This should work as well and solves the aliasing problem, though it
> requires changes to the VMM as well, not just KVM, which currently
> relies on vfio-pci mmap().
> 
> -- 
> Catalin

WARNING: multiple messages have this Message-ID (diff)
From: Lorenzo Pieralisi <lpieralisi@kernel.org>
To: Catalin Marinas <catalin.marinas@arm.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>, Will Deacon <will@kernel.org>,
	ankita@nvidia.com, maz@kernel.org, oliver.upton@linux.dev,
	aniketa@nvidia.com, cjia@nvidia.com, kwankhede@nvidia.com,
	targupta@nvidia.com, vsethi@nvidia.com, acurrid@nvidia.com,
	apopple@nvidia.com, jhubbard@nvidia.com, danw@nvidia.com,
	linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH v1 2/2] KVM: arm64: allow the VM to select DEVICE_* and NORMAL_NC for IO memory
Date: Thu, 19 Oct 2023 15:35:24 +0200	[thread overview]
Message-ID: <ZTEwnBoPLL3PE3Xo@lpieralisi> (raw)
In-Reply-To: <ZTEN_oe97VRWbnHb@arm.com>

On Thu, Oct 19, 2023 at 12:07:42PM +0100, Catalin Marinas wrote:

[...]

> > VFIO issues a config space write to reset the PCI function.  Config
> > space writes MUST NOT write combine with anything. This is already
> > impossible for PCIe since they are different TLP types at the PCIe
> > level.
> 
> Yes, config space writes are fine, vfio-pci even maps them as
> Device_nGnRnE. But AFAIK a guest won't have direct access to the config
> space.
> 
> > By the PCIe rules, config space write must order strictly after all
> > other CPU's accesses. Once the reset non-posted write returns back to
> > VFIO we know that:
> > 
> >  1) There is no reference in any CPU page table to the MMIO PFN
> >  2) No CPU has pending data in any write buffer
> >  3) The interconnect and PCIe fabric have no inflight operations
> >  4) The device is in a clean post-reset state
> 
> I think from the CPU perspective, we can guarantee that a Normal_NC
> write on CPU0 for example reaches a serialisation point before a config
> space (Device_nGnRnE) write on CPU1 by the host as long as CPU1 issued a
> TLBI+DSB. Now, what I'm not sure is what this serialisation point is. If
> it is the PCIe root complex, we are probably fine, we hope it deals with
> any ordering between the Normal_NC write and the config space one.

If it is the PCI host bridge (and for PCI it should be since it is the
logic between the ARM world - where ARM ordering rules and barriers
apply - and PCI protocol), either it enforces PCI ordering rules or it
is broken by design; if it is the latter, at that stage device
assignment would be the least of our problems.

For non-PCI device assignment, I am not sure at all we can rely on
anything other than what Jason mentioned, eg resets (and the components
that through eg MMIO are carrying them out) are not architected, the
device MMIO space and the MMIO space used to trigger the reset (again,
it is an example) may well be placed on different interconnect paths,
it is device specific.

Lorenzo

> Talking to Will earlier, I think we can deem the PCIe scenario
> (somewhat) safe but not as a generic mechanism for other non-PCIe
> devices (e.g. platform). With this concern, can we make this Stage 2
> relaxation in KVM only for vfio-pci mappings? I don't have an example of
> non-PCIe device assignment to figure out how this should work though.
> 
> > > knows all the details. The safest is for the VMM to keep it as Device (I
> > > think vfio-pci goes for the strongest nGnRnE).
> > 
> > We are probably going to allow VFIO to let userspace pick if it should
> > be pgprot_device or pgprot_writecombine.
> 
> I guess that's for the direct use by an application rather than VMM+VM.
> IIUC people work around this currently by mapping PCIe BARs as
> pgprot_writecombine() via sysfs. Getting vfio-pci to allow different
> mappings is probably a good idea, though it doesn't currently help with
> the KVM case as we can't force the VMM to know the specifics of the
> device it is giving to a guest.
> 
> > The alias issue could be resolved by teaching KVM how to insert a
> > physical PFN based on some VFIO FD/dmabuf rather than a VMA so that
> > the PFNs are never mapped in the hypervisor side.
> 
> This should work as well and solves the aliasing problem, though it
> requires changes to the VMM as well, not just KVM, which currently
> relies on vfio-pci mmap().
> 
> -- 
> Catalin

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

  parent reply	other threads:[~2023-10-19 13:35 UTC|newest]

Thread overview: 110+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-09-07 18:14 [PATCH v1 0/2] KVM: arm64: support write combining and cachable IO memory in VMs ankita
2023-09-07 18:14 ` ankita
2023-09-07 18:14 ` [PATCH v1 1/2] KVM: arm64: determine memory type from VMA ankita
2023-09-07 18:14   ` ankita
2023-09-07 19:12   ` Jason Gunthorpe
2023-09-07 19:12     ` Jason Gunthorpe
2023-10-05 16:15   ` Catalin Marinas
2023-10-05 16:15     ` Catalin Marinas
2023-10-05 16:54     ` Jason Gunthorpe
2023-10-05 16:54       ` Jason Gunthorpe
2023-10-10 14:25       ` Catalin Marinas
2023-10-10 14:25         ` Catalin Marinas
2023-10-10 15:05         ` Jason Gunthorpe
2023-10-10 15:05           ` Jason Gunthorpe
2023-10-10 17:19           ` Catalin Marinas
2023-10-10 17:19             ` Catalin Marinas
2023-10-10 18:23             ` Jason Gunthorpe
2023-10-10 18:23               ` Jason Gunthorpe
2023-10-11 17:45               ` Catalin Marinas
2023-10-11 17:45                 ` Catalin Marinas
2023-10-11 18:38                 ` Jason Gunthorpe
2023-10-11 18:38                   ` Jason Gunthorpe
2023-10-12 16:16                   ` Catalin Marinas
2023-10-12 16:16                     ` Catalin Marinas
2024-03-10  3:49                     ` Ankit Agrawal
2024-03-10  3:49                       ` Ankit Agrawal
2024-03-19 13:38                       ` Jason Gunthorpe
2024-03-19 13:38                         ` Jason Gunthorpe
2023-10-23 13:20   ` Shameerali Kolothum Thodi
2023-10-23 13:20     ` Shameerali Kolothum Thodi
2023-09-07 18:14 ` [PATCH v1 2/2] KVM: arm64: allow the VM to select DEVICE_* and NORMAL_NC for IO memory ankita
2023-09-07 18:14   ` ankita
2023-09-08 16:40   ` Catalin Marinas
2023-09-08 16:40     ` Catalin Marinas
2023-09-11 14:57   ` Lorenzo Pieralisi
2023-09-11 14:57     ` Lorenzo Pieralisi
2023-09-11 17:20     ` Jason Gunthorpe
2023-09-11 17:20       ` Jason Gunthorpe
2023-09-13 15:26       ` Lorenzo Pieralisi
2023-09-13 15:26         ` Lorenzo Pieralisi
2023-09-13 18:54         ` Jason Gunthorpe
2023-09-13 18:54           ` Jason Gunthorpe
2023-09-26  8:31           ` Lorenzo Pieralisi
2023-09-26  8:31             ` Lorenzo Pieralisi
2023-09-26 12:25             ` Jason Gunthorpe
2023-09-26 12:25               ` Jason Gunthorpe
2023-09-26 13:52             ` Catalin Marinas
2023-09-26 13:52               ` Catalin Marinas
2023-09-26 16:12               ` Lorenzo Pieralisi
2023-09-26 16:12                 ` Lorenzo Pieralisi
2023-10-05  9:56               ` Lorenzo Pieralisi
2023-10-05  9:56                 ` Lorenzo Pieralisi
2023-10-05 11:56                 ` Jason Gunthorpe
2023-10-05 11:56                   ` Jason Gunthorpe
2023-10-05 14:08                   ` Lorenzo Pieralisi
2023-10-05 14:08                     ` Lorenzo Pieralisi
2023-10-12 12:35                 ` Will Deacon
2023-10-12 12:35                   ` Will Deacon
2023-10-12 13:20                   ` Jason Gunthorpe
2023-10-12 13:20                     ` Jason Gunthorpe
2023-10-12 14:29                     ` Lorenzo Pieralisi
2023-10-12 14:29                       ` Lorenzo Pieralisi
2023-10-12 13:53                   ` Catalin Marinas
2023-10-12 13:53                     ` Catalin Marinas
2023-10-12 14:48                     ` Will Deacon
2023-10-12 14:48                       ` Will Deacon
2023-10-12 15:44                       ` Jason Gunthorpe
2023-10-12 15:44                         ` Jason Gunthorpe
2023-10-12 16:39                         ` Will Deacon
2023-10-12 16:39                           ` Will Deacon
2023-10-12 18:36                           ` Jason Gunthorpe
2023-10-12 18:36                             ` Jason Gunthorpe
2023-10-13  9:29                             ` Will Deacon
2023-10-13  9:29                               ` Will Deacon
2023-10-12 17:26                       ` Catalin Marinas
2023-10-12 17:26                         ` Catalin Marinas
2023-10-13  9:29                         ` Will Deacon
2023-10-13  9:29                           ` Will Deacon
2023-10-13 13:08                           ` Catalin Marinas
2023-10-13 13:08                             ` Catalin Marinas
2023-10-13 13:45                             ` Jason Gunthorpe
2023-10-13 13:45                               ` Jason Gunthorpe
2023-10-19 11:07                               ` Catalin Marinas
2023-10-19 11:07                                 ` Catalin Marinas
2023-10-19 11:51                                 ` Jason Gunthorpe
2023-10-19 11:51                                   ` Jason Gunthorpe
2023-10-20 11:21                                   ` Catalin Marinas
2023-10-20 11:21                                     ` Catalin Marinas
2023-10-20 11:47                                     ` Jason Gunthorpe
2023-10-20 11:47                                       ` Jason Gunthorpe
2023-10-20 14:03                                       ` Lorenzo Pieralisi
2023-10-20 14:03                                         ` Lorenzo Pieralisi
2023-10-20 14:28                                         ` Jason Gunthorpe
2023-10-20 14:28                                           ` Jason Gunthorpe
2023-10-19 13:35                                 ` Lorenzo Pieralisi [this message]
2023-10-19 13:35                                   ` Lorenzo Pieralisi
2023-10-13 15:28                             ` Lorenzo Pieralisi
2023-10-13 15:28                               ` Lorenzo Pieralisi
2023-10-19 11:12                               ` Catalin Marinas
2023-10-19 11:12                                 ` Catalin Marinas
2023-11-09 15:34                             ` Lorenzo Pieralisi
2023-11-09 15:34                               ` Lorenzo Pieralisi
2023-11-10 14:26                               ` Jason Gunthorpe
2023-11-10 14:26                                 ` Jason Gunthorpe
2023-11-13  0:42                                 ` Lorenzo Pieralisi
2023-11-13  0:42                                   ` Lorenzo Pieralisi
2023-11-13 17:41                               ` Catalin Marinas
2023-11-13 17:41                                 ` Catalin Marinas
2023-10-12 12:27   ` Will Deacon
2023-10-12 12:27     ` Will Deacon

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZTEwnBoPLL3PE3Xo@lpieralisi \
    --to=lpieralisi@kernel.org \
    --cc=acurrid@nvidia.com \
    --cc=aniketa@nvidia.com \
    --cc=ankita@nvidia.com \
    --cc=apopple@nvidia.com \
    --cc=catalin.marinas@arm.com \
    --cc=cjia@nvidia.com \
    --cc=danw@nvidia.com \
    --cc=jgg@nvidia.com \
    --cc=jhubbard@nvidia.com \
    --cc=kvmarm@lists.linux.dev \
    --cc=kwankhede@nvidia.com \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=maz@kernel.org \
    --cc=oliver.upton@linux.dev \
    --cc=targupta@nvidia.com \
    --cc=vsethi@nvidia.com \
    --cc=will@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.