From: Jason Gunthorpe <jgg@nvidia.com>
To: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>,
	Lorenzo Pieralisi <lpieralisi@kernel.org>,
	ankita@nvidia.com, maz@kernel.org, oliver.upton@linux.dev,
	aniketa@nvidia.com, cjia@nvidia.com, kwankhede@nvidia.com,
	targupta@nvidia.com, vsethi@nvidia.com, acurrid@nvidia.com,
	apopple@nvidia.com, jhubbard@nvidia.com, danw@nvidia.com,
	linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH v1 2/2] KVM: arm64: allow the VM to select DEVICE_* and NORMAL_NC for IO memory
Date: Thu, 19 Oct 2023 08:51:42 -0300	[thread overview]
Message-ID: <20231019115142.GQ3952@nvidia.com> (raw)
In-Reply-To: <ZTEN_oe97VRWbnHb@arm.com>

On Thu, Oct 19, 2023 at 12:07:42PM +0100, Catalin Marinas wrote:

> The issue is when the device is reclaimed to be given to another
> user-space process, do we know the previous transaction reached the
> device? With the TLBI + DSB in unmap, we can only tell that a subsequent
> map + write (in a new process) is ordered after the write in the old
> process. Maybe that's sufficient in most cases.

I can't think of a case where ordering is insufficient. As long as the
device perceives a strict separation between the writes preceding the
'reset' writel() and the writes following it, the absolute time of
completion cannot matter.

At least for VFIO PCI, the reset sequence uses non-posted config TLPs
followed by config read TLPs.
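
To make that concrete, here is a minimal sketch of why such a sequence
acts as a barrier. This is not VFIO's actual reset path (which goes
through helpers like pcie_flr()), just the shape of it:

/* Sketch only; needs <linux/pci.h> and <linux/delay.h> */
static void example_flr(struct pci_dev *pdev)
{
	u16 vendor;

	/* Non-posted config write asserting Function Level Reset */
	pcie_capability_set_word(pdev, PCI_EXP_DEVCTL,
				 PCI_EXP_DEVCTL_BCR_FLR);
	msleep(100);	/* wait required by the PCIe spec after FLR */

	/*
	 * Config read TLP: it only completes once the device answers,
	 * so everything issued before the reset is strictly separated
	 * from everything issued after it.
	 */
	pci_read_config_word(pdev, PCI_VENDOR_ID, &vendor);
}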

For something like mlx5, when it reclaims a WC VMA it does an RPC
consisting of a dma_wmb(), a writel(), a device DMA read, a device DMA
write, and a CPU read of the DMA'd data.
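
Roughly, the shape of that sequence (hypothetical command layout and
names, not mlx5's real code):

/* Sketch only; needs <linux/io.h> and <linux/types.h> */
struct example_cmd {
	__be32 opcode;
	u32 done;	/* device DMA-writes this on completion */
};

static void example_rpc(struct example_cmd *cmd, void __iomem *db)
{
	cmd->opcode = cpu_to_be32(0x1);	/* build descriptor in DMA memory */
	cmd->done = 0;
	dma_wmb();	/* descriptor visible before the doorbell */
	writel(1, db);	/* doorbell: device DMA-reads the descriptor */

	while (!READ_ONCE(cmd->done))	/* CPU read of the DMA'd status */
		cpu_relax();
}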

Both of these are pretty "firm" ordering barriers.

> > Unmapping the VMAs must already have some NormalNC friendly ordering
> > barrier across all CPUs or we have a bigger problem. This barrier
> > definitely must close write combining.
> 
> I think the TLBI + DSB generating the DVM/DVMSync messages should
> ensure that the Normal NC writes on other CPUs reach some
> serialisation point (not necessarily the device; AFAICT we can't
> guarantee end-point completion here).

This is what our internal architects thought as well.

> Talking to Will earlier, I think we can deem the PCIe scenario
> (somewhat) safe but not as a generic mechanism for other non-PCIe
> devices (e.g. platform). With this concern, can we make this Stage 2
> relaxation in KVM only for vfio-pci mappings? I don't have an example of
> non-PCIe device assignment to figure out how this should work though.

It is not a KVM problem. As I implied above, it is VFIO's
responsibility to reliably reset the device, not KVM's. If for some
reason VFIO cannot do that on some SoC then VFIO devices should not
exist there.

It is not KVM's job to second-guess VFIO's own security properties.

Specifically regarding platform devices, the generic VFIO platform
driver is the ACPI-based one. If the FW provides an ACPI method for
device reset that is not properly serializing, that is a FW bug. We can
quirk it in VFIO and block using those devices if they actually exist.

I expect the non-generic VFIO platform drivers to take care of this
issue internally with barriers, reads from devices, or whatever is
required to make their SoCs order properly, just as I would expect a
normal Linux platform driver to directly manage whatever
implementation-specific ordering quirks the SoC may have.
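
A classic example of such an internal barrier is a read-back after the
reset write; the register offsets here are made up:

/* Sketch only; hypothetical SoC reset register */
static void example_soc_reset(void __iomem *base)
{
	writel(0x1, base + 0x10);	/* posted write asserting reset */
	(void)readl(base + 0x10);	/* non-posted read-back forces the
					 * write, and anything ordered
					 * before it, to reach the device */
}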

Generic PCI drivers are the ones that rely on the implicit ordering at
the PCIe TLP level from TLBI + DSB. I would say this is part of the
undocumented arch API around pgprot_writecombine().

> > > knows all the details. The safest is for the VMM to keep it as Device (I
> > > think vfio-pci goes for the strongest nGnRnE).
> > 
> > We are probably going to allow VFIO to let userspace pick if it should
> > be pgprot_device or pgprot_writecombine.
> 
> I guess that's for the direct use by an application rather than
> VMM+VM.

Yes, there have been requests for this.
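
In mmap terms the choice would look roughly like this sketch, with
want_wc standing in for whatever uAPI ends up selecting the mapping
type:

/* Sketch only; needs <linux/mm.h> */
static int example_mmap_bar(struct vm_area_struct *vma,
			    unsigned long pfn, bool want_wc)
{
	vma->vm_page_prot = want_wc ?
		pgprot_writecombine(vma->vm_page_prot) :
		pgprot_device(vma->vm_page_prot);

	return io_remap_pfn_range(vma, vma->vm_start, pfn,
				  vma->vm_end - vma->vm_start,
				  vma->vm_page_prot);
}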

> IIUC people work around this currently by mapping PCIe BARs as
> pgprot_writecombine() via sysfs. Getting vfio-pci to allow different
> mappings is probably a good idea, though it doesn't currently help with
> the KVM case as we can't force the VMM to know the specifics of the
> device it is giving to a guest.

Yes to all points

Thanks,
Jason
