linux-pci.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] PCI/P2PDMA: Root complex whitelist should not apply when an IOMMU is present
@ 2019-05-22 20:12 Logan Gunthorpe
  2019-06-18 20:40 ` Bjorn Helgaas
  0 siblings, 1 reply; 23+ messages in thread
From: Logan Gunthorpe @ 2019-05-22 20:12 UTC (permalink / raw)
  To: linux-pci; +Cc: Logan Gunthorpe, Christian König, Bjorn Helgaas

Presently, there is no path to DMA map P2PDMA memory, so if a TLP
targeting this memory hits the root complex and an IOMMU is present,
the IOMMU will reject the transaction, even if the RC would support
P2PDMA.

So until the kernel knows to map these DMA addresses in the IOMMU,
we should not enable the whitelist when an IOMMU is present.

While we are at it, remove the comment mentioning future work
to add a white list.

Fixes: 0f97da831026 ("PCI/P2PDMA: Allow P2P DMA between any devices under AMD ZEN Root Complex")
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
---
 drivers/pci/p2pdma.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

Hey,

I realized recently that I missed this issue between the IOMMU and
the whitelist when reviewing Christian's patch.

Unless there are any objections, I think this should be squashed
with the commit marked in the Fixes tag (from pci-v5.2-changes).

Thanks,

Logan

diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c
index 742928d0053e..4d2f6a44cba3 100644
--- a/drivers/pci/p2pdma.c
+++ b/drivers/pci/p2pdma.c
@@ -18,6 +18,7 @@
 #include <linux/percpu-refcount.h>
 #include <linux/random.h>
 #include <linux/seq_buf.h>
+#include <linux/iommu.h>

 struct pci_p2pdma {
 	struct percpu_ref devmap_ref;
@@ -284,6 +285,9 @@ static bool root_complex_whitelist(struct pci_dev *dev)
 	struct pci_dev *root = pci_get_slot(host->bus, PCI_DEVFN(0, 0));
 	unsigned short vendor, device;

+	if (iommu_present(dev->dev.bus))
+		return false;
+
 	if (!root)
 		return false;

@@ -453,8 +457,7 @@ static int upstream_bridge_distance_warn(struct pci_dev *provider,
  *
  * For now, "compatible" means the provider and the clients are all behind
  * the same PCI root port. This cuts out cases that may work but is safest
- * for the user. Future work can expand this to white-list root complexes that
- * can safely forward between each ports.
+ * for the user.
  */
 int pci_p2pdma_distance_many(struct pci_dev *provider, struct device **clients,
 			     int num_clients, bool verbose)
--
2.20.1

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH] PCI/P2PDMA: Root complex whitelist should not apply when an IOMMU is present
  2019-05-22 20:12 [PATCH] PCI/P2PDMA: Root complex whitelist should not apply when an IOMMU is present Logan Gunthorpe
@ 2019-06-18 20:40 ` Bjorn Helgaas
  2019-06-18 20:51   ` Logan Gunthorpe
  0 siblings, 1 reply; 23+ messages in thread
From: Bjorn Helgaas @ 2019-06-18 20:40 UTC (permalink / raw)
  To: Logan Gunthorpe; +Cc: linux-pci, Christian König

On Wed, May 22, 2019 at 02:12:52PM -0600, Logan Gunthorpe wrote:
> Presently, there is no path to DMA map P2PDMA memory, so if a TLP
> targeting this memory hits the root complex and an IOMMU is present,
> the IOMMU will reject the transaction, even if the RC would support
> P2PDMA.
> 
> So until the kernel knows to map these DMA addresses in the IOMMU,
> we should not enable the whitelist when an IOMMU is present.
> 
> While we are at it, remove the comment mentioning future work
> to add a white list.

There was a lot of discussion about this.  Did everybody come to a
consensus about what should be done?  Can you post a patch with
reviewed-by if appropriate?

> Fixes: 0f97da831026 ("PCI/P2PDMA: Allow P2P DMA between any devices under AMD ZEN Root Complex")
> Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
> Cc: Christian König <christian.koenig@amd.com>
> Cc: Bjorn Helgaas <bhelgaas@google.com>
> ---
>  drivers/pci/p2pdma.c | 7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)
> 
> Hey,
> 
> I realized recently that I missed this issue between the IOMMU and
> the whitelist when reviewing Christian's patch.
> 
> Unless there are any objections, I think this should be squashed
> with the commit marked in the Fixes tag (from pci-v5.2-changes).
> 
> Thanks,
> 
> Logan
> 
> diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c
> index 742928d0053e..4d2f6a44cba3 100644
> --- a/drivers/pci/p2pdma.c
> +++ b/drivers/pci/p2pdma.c
> @@ -18,6 +18,7 @@
>  #include <linux/percpu-refcount.h>
>  #include <linux/random.h>
>  #include <linux/seq_buf.h>
> +#include <linux/iommu.h>
> 
>  struct pci_p2pdma {
>  	struct percpu_ref devmap_ref;
> @@ -284,6 +285,9 @@ static bool root_complex_whitelist(struct pci_dev *dev)
>  	struct pci_dev *root = pci_get_slot(host->bus, PCI_DEVFN(0, 0));
>  	unsigned short vendor, device;
> 
> +	if (iommu_present(dev->dev.bus))
> +		return false;
> +
>  	if (!root)
>  		return false;
> 
> @@ -453,8 +457,7 @@ static int upstream_bridge_distance_warn(struct pci_dev *provider,
>   *
>   * For now, "compatible" means the provider and the clients are all behind
>   * the same PCI root port. This cuts out cases that may work but is safest
> - * for the user. Future work can expand this to white-list root complexes that
> - * can safely forward between each ports.
> + * for the user.
>   */
>  int pci_p2pdma_distance_many(struct pci_dev *provider, struct device **clients,
>  			     int num_clients, bool verbose)
> --
> 2.20.1

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] PCI/P2PDMA: Root complex whitelist should not apply when an IOMMU is present
  2019-06-18 20:40 ` Bjorn Helgaas
@ 2019-06-18 20:51   ` Logan Gunthorpe
  2019-06-18 23:50     ` Logan Gunthorpe
  0 siblings, 1 reply; 23+ messages in thread
From: Logan Gunthorpe @ 2019-06-18 20:51 UTC (permalink / raw)
  To: Bjorn Helgaas; +Cc: linux-pci, Christian König



On 2019-06-18 2:40 p.m., Bjorn Helgaas wrote:
> On Wed, May 22, 2019 at 02:12:52PM -0600, Logan Gunthorpe wrote:
>> Presently, there is no path to DMA map P2PDMA memory, so if a TLP
>> targeting this memory hits the root complex and an IOMMU is present,
>> the IOMMU will reject the transaction, even if the RC would support
>> P2PDMA.
>>
>> So until the kernel knows to map these DMA addresses in the IOMMU,
>> we should not enable the whitelist when an IOMMU is present.
>>
>> While we are at it, remove the comment mentioning future work
>> to add a white list.
> 
> There was a lot of discussion about this.  Did everybody come to a
> consensus about what should be done?  Can you post a patch with
> reviewed-by if appropriate?

I think we have consensus that it's broken and needs to be fixed for the
short term. Preferably before 5.3. I'm not sure we have consensus on the
proper fix.

The two easy things I can see to do is to either revert it or add the
iommu_is_present() check that I did in the above patch.

@Christian, which do you prefer? I think I'd prefer the
iommu_is_present() route as it maintains the information about
white-listed devices and is easier to change once we have the correct
solution.

I can send a patch tomorrow one way or another.

Thanks,

Logan

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] PCI/P2PDMA: Root complex whitelist should not apply when an IOMMU is present
  2019-06-18 20:51   ` Logan Gunthorpe
@ 2019-06-18 23:50     ` Logan Gunthorpe
  2019-06-19  9:26       ` Koenig, Christian
  0 siblings, 1 reply; 23+ messages in thread
From: Logan Gunthorpe @ 2019-06-18 23:50 UTC (permalink / raw)
  To: Bjorn Helgaas; +Cc: linux-pci, Christian König



On 2019-06-18 2:51 p.m., Logan Gunthorpe wrote:
> On 2019-06-18 2:40 p.m., Bjorn Helgaas wrote:
>> On Wed, May 22, 2019 at 02:12:52PM -0600, Logan Gunthorpe wrote:
>>> Presently, there is no path to DMA map P2PDMA memory, so if a TLP
>>> targeting this memory hits the root complex and an IOMMU is present,
>>> the IOMMU will reject the transaction, even if the RC would support
>>> P2PDMA.
>>>
>>> So until the kernel knows to map these DMA addresses in the IOMMU,
>>> we should not enable the whitelist when an IOMMU is present.
>>>
>>> While we are at it, remove the comment mentioning future work
>>> to add a white list.
>>
>> There was a lot of discussion about this.  Did everybody come to a
>> consensus about what should be done?  Can you post a patch with
>> reviewed-by if appropriate?
> 
> I think we have consensus that it's broken and needs to be fixed for the
> short term. Preferably before 5.3. I'm not sure we have consensus on the
> proper fix.
> 
> The two easy things I can see to do is to either revert it or add the
> iommu_is_present() check that I did in the above patch.
> 
> @Christian, which do you prefer? I think I'd prefer the
> iommu_is_present() route as it maintains the information about
> white-listed devices and is easier to change once we have the correct
> solution.
> 
> I can send a patch tomorrow one way or another.

Also, looks like one of my clients has an interest in seeing work like
this happen. So I'll be writing some patches in the next couple weeks to
do this properly. I'll try to send them to the lists early next cycle.

Logan

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] PCI/P2PDMA: Root complex whitelist should not apply when an IOMMU is present
  2019-06-18 23:50     ` Logan Gunthorpe
@ 2019-06-19  9:26       ` Koenig, Christian
  2019-06-19  9:29         ` Christoph Hellwig
  0 siblings, 1 reply; 23+ messages in thread
From: Koenig, Christian @ 2019-06-19  9:26 UTC (permalink / raw)
  To: Logan Gunthorpe, Bjorn Helgaas; +Cc: linux-pci

Am 19.06.19 um 01:50 schrieb Logan Gunthorpe:
>
> On 2019-06-18 2:51 p.m., Logan Gunthorpe wrote:
>> On 2019-06-18 2:40 p.m., Bjorn Helgaas wrote:
>>> On Wed, May 22, 2019 at 02:12:52PM -0600, Logan Gunthorpe wrote:
>>>> Presently, there is no path to DMA map P2PDMA memory, so if a TLP
>>>> targeting this memory hits the root complex and an IOMMU is present,
>>>> the IOMMU will reject the transaction, even if the RC would support
>>>> P2PDMA.
>>>>
>>>> So until the kernel knows to map these DMA addresses in the IOMMU,
>>>> we should not enable the whitelist when an IOMMU is present.
>>>>
>>>> While we are at it, remove the comment mentioning future work
>>>> to add a white list.
>>> There was a lot of discussion about this.  Did everybody come to a
>>> consensus about what should be done?  Can you post a patch with
>>> reviewed-by if appropriate?
>> I think we have consensus that it's broken and needs to be fixed for the
>> short term. Preferably before 5.3.

Yeah, completely agree.

>> I'm not sure we have consensus on the
>> proper fix.
>>
>> The two easy things I can see to do is to either revert it or add the
>> iommu_is_present() check that I did in the above patch.
>>
>> @Christian, which do you prefer? I think I'd prefer the
>> iommu_is_present() route as it maintains the information about
>> white-listed devices and is easier to change once we have the correct
>> solution.

Your original iommu_is_prevent() patch sound like the best option so far.

If that hasn't changed feel free to add a Reviewed-by: Christian König 
<christian.koenig@amd.com> to that one.

>> I can send a patch tomorrow one way or another.
> Also, looks like one of my clients has an interest in seeing work like
> this happen. So I'll be writing some patches in the next couple weeks to
> do this properly. I'll try to send them to the lists early next cycle.

Oh, nice.

I was hoping to get my use case into 5.4 or 5.5, but we are still stuck 
with some of the DMA-buf related pieces.

Regards,
Christian.

>
> Logan


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] PCI/P2PDMA: Root complex whitelist should not apply when an IOMMU is present
  2019-06-19  9:26       ` Koenig, Christian
@ 2019-06-19  9:29         ` Christoph Hellwig
  2019-06-19  9:39           ` Koenig, Christian
  0 siblings, 1 reply; 23+ messages in thread
From: Christoph Hellwig @ 2019-06-19  9:29 UTC (permalink / raw)
  To: Koenig, Christian; +Cc: Logan Gunthorpe, Bjorn Helgaas, linux-pci

On Wed, Jun 19, 2019 at 09:26:42AM +0000, Koenig, Christian wrote:
> I was hoping to get my use case into 5.4 or 5.5, but we are still stuck 
> with some of the DMA-buf related pieces.

Can you make sure you have Logand and me, and maybe the iommu list
Cced when you posted it?  I'd like to make sure we all have a common
understanding where we are moving with the APIs and use cases.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] PCI/P2PDMA: Root complex whitelist should not apply when an IOMMU is present
  2019-06-19  9:29         ` Christoph Hellwig
@ 2019-06-19  9:39           ` Koenig, Christian
  0 siblings, 0 replies; 23+ messages in thread
From: Koenig, Christian @ 2019-06-19  9:39 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Logan Gunthorpe, Bjorn Helgaas, linux-pci

Am 19.06.19 um 11:29 schrieb Christoph Hellwig:
> On Wed, Jun 19, 2019 at 09:26:42AM +0000, Koenig, Christian wrote:
>> I was hoping to get my use case into 5.4 or 5.5, but we are still stuck
>> with some of the DMA-buf related pieces.
> Can you make sure you have Logand and me, and maybe the iommu list
> Cced when you posted it?  I'd like to make sure we all have a common
> understanding where we are moving with the APIs and use cases.

Yeah, sure. I'm still not settled on APIs anyway, so feedback on this is 
highly welcome.

Regards,
Christian.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] PCI/P2PDMA: Root complex whitelist should not apply when an IOMMU is present
  2019-05-24 14:12                 ` Christoph Hellwig
@ 2019-05-24 16:18                   ` Logan Gunthorpe
  0 siblings, 0 replies; 23+ messages in thread
From: Logan Gunthorpe @ 2019-05-24 16:18 UTC (permalink / raw)
  To: Christoph Hellwig, Koenig, Christian; +Cc: linux-pci, Bjorn Helgaas



On 2019-05-24 8:12 a.m., Christoph Hellwig wrote:
> I think we've got two escalation levels here:
> 
>  (1) fix the NVMe/RDMA p2p breakage for 5.2-rc - for that we just need
>      the above fix in the p2p map_sg routine (and add an unmap_sg
>      routine there)

Yes, fixing the p2p_map_sg routine or add the check if the iommu is
present (as I did in the patch that started this discussion) until we
can get the map stuff sorted out.

Logan

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] PCI/P2PDMA: Root complex whitelist should not apply when an IOMMU is present
  2019-05-24 12:40               ` Koenig, Christian
  2019-05-24 14:12                 ` Christoph Hellwig
@ 2019-05-24 16:06                 ` Logan Gunthorpe
  1 sibling, 0 replies; 23+ messages in thread
From: Logan Gunthorpe @ 2019-05-24 16:06 UTC (permalink / raw)
  To: Koenig, Christian, Christoph Hellwig; +Cc: linux-pci, Bjorn Helgaas



On 2019-05-24 6:40 a.m., Koenig, Christian wrote:
> Am 23.05.19 um 17:59 schrieb Christoph Hellwig:
>> [CAUTION: External Email]
>>
>> On Thu, May 23, 2019 at 09:53:53AM -0600, Logan Gunthorpe wrote:
>>>> The problem shows up if pci_bus_address() returns a different address
>>>> than pci_resource_start(), should be easy to check if that happens.
>>>> IIRC it is something mostly seen on embedded SOCs.
>>>>
>>> I think it's a bit more complicated then that: If you're calling
>>> dma_map_resource() to program the IOMMU then I'm pretty sure you'd want
>>> to use the pci_resource_start() address as the phys_addr_t. If you're
>>> bypassing the root complex (like the current p2pdma code enforces), then
>>> you'd simply use a pci_bus_address() directly as the dma_addr and would
>>> not program the IOMMU at all seeing it's not involved (which is what is
>>> currently done).
>> True.  What we need is:
>>
>>   if (both device are behind the same root port (using a switch)) {
>>          use the current direct map + offset code
>>   } else {
>>          call ->map_resource()
>>   }
> 
> Sounds sane to me as well.
> 
> But since I don't have a struct pages backing my PCI BAR I won't be able 
> to use pci_p2pdma_map_sg.
> 
> How should we work around that?

I'd add something like:

int pci_p2pdma_map_resource(struct pci_dev *provider, struct device
*mapper, phys_addr_t addr, size_t size,  enum dma_data_direction dir,
unsigned long attrs);

But keep in mind we have to update pci_p2pdma_map_sg() as well before we
can enable the whitelist as there are existing users. The map_sg()
command should be able to get the provider device through the pgmap. And
that will be easier to do once Dan's patch[1] lands because he creates a
private pgmap struct.

Logan

[1]
https://lore.kernel.org/lkml/155727338646.292046.9922678317501435597.stgit@dwillia2-desk3.amr.corp.intel.com/T/#u

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] PCI/P2PDMA: Root complex whitelist should not apply when an IOMMU is present
  2019-05-24 12:40               ` Koenig, Christian
@ 2019-05-24 14:12                 ` Christoph Hellwig
  2019-05-24 16:18                   ` Logan Gunthorpe
  2019-05-24 16:06                 ` Logan Gunthorpe
  1 sibling, 1 reply; 23+ messages in thread
From: Christoph Hellwig @ 2019-05-24 14:12 UTC (permalink / raw)
  To: Koenig, Christian
  Cc: Christoph Hellwig, Logan Gunthorpe, linux-pci, Bjorn Helgaas

On Fri, May 24, 2019 at 12:40:57PM +0000, Koenig, Christian wrote:
> Sounds sane to me as well.
> 
> But since I don't have a struct pages backing my PCI BAR I won't be able 
> to use pci_p2pdma_map_sg.
> 
> How should we work around that?

I think we've got two escalation levels here:

 (1) fix the NVMe/RDMA p2p breakage for 5.2-rc - for that we just need
     the above fix in the p2p map_sg routine (and add an unmap_sg
     routine there)
 (2) figure out how to handle the switched case for your code.  I'm not
     sure about that yet, and without a struct page we'd have to somehow
     track the relationship explicitly.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] PCI/P2PDMA: Root complex whitelist should not apply when an IOMMU is present
  2019-05-23 15:59             ` Christoph Hellwig
@ 2019-05-24 12:40               ` Koenig, Christian
  2019-05-24 14:12                 ` Christoph Hellwig
  2019-05-24 16:06                 ` Logan Gunthorpe
  0 siblings, 2 replies; 23+ messages in thread
From: Koenig, Christian @ 2019-05-24 12:40 UTC (permalink / raw)
  To: Christoph Hellwig, Logan Gunthorpe; +Cc: linux-pci, Bjorn Helgaas

Am 23.05.19 um 17:59 schrieb Christoph Hellwig:
> [CAUTION: External Email]
>
> On Thu, May 23, 2019 at 09:53:53AM -0600, Logan Gunthorpe wrote:
>>> The problem shows up if pci_bus_address() returns a different address
>>> than pci_resource_start(), should be easy to check if that happens.
>>> IIRC it is something mostly seen on embedded SOCs.
>>>
>> I think it's a bit more complicated then that: If you're calling
>> dma_map_resource() to program the IOMMU then I'm pretty sure you'd want
>> to use the pci_resource_start() address as the phys_addr_t. If you're
>> bypassing the root complex (like the current p2pdma code enforces), then
>> you'd simply use a pci_bus_address() directly as the dma_addr and would
>> not program the IOMMU at all seeing it's not involved (which is what is
>> currently done).
> True.  What we need is:
>
>   if (both device are behind the same root port (using a switch)) {
>          use the current direct map + offset code
>   } else {
>          call ->map_resource()
>   }

Sounds sane to me as well.

But since I don't have a struct pages backing my PCI BAR I won't be able 
to use pci_p2pdma_map_sg.

How should we work around that?

Regards,
Christian.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] PCI/P2PDMA: Root complex whitelist should not apply when an IOMMU is present
  2019-05-23  9:48       ` Koenig, Christian
  2019-05-23  9:50         ` Christoph Hellwig
@ 2019-05-23 16:14         ` Logan Gunthorpe
  1 sibling, 0 replies; 23+ messages in thread
From: Logan Gunthorpe @ 2019-05-23 16:14 UTC (permalink / raw)
  To: Koenig, Christian, Christoph Hellwig; +Cc: linux-pci, Bjorn Helgaas



On 2019-05-23 3:48 a.m., Koenig, Christian wrote:
> Am 23.05.19 um 11:43 schrieb Christoph Hellwig:
>> [CAUTION: External Email]
>>
>> On Thu, May 23, 2019 at 08:12:18AM +0000, Koenig, Christian wrote:
>>>> Are you DMA-mapping the addresses outside the P2PDMA code? If so there's
>>>> a huge mismatch with the existing users of P2PDMA (nvme-fabrics). If
>>>> you're not dma-mapping then I can't see how it could work because the
>>>> IOMMU should reject any requests to access those addresses.
>>> Well, we are using the DMA API (dma_map_resource) for this. If the P2P
>>> code is not using this then I would rather say that the P2P code is
>>> actually broken.
>>>
>>> Adding Christoph as well, cause he is usually the one discussion stuff
>>> like that with me.
>> Heh.  Actually dma_map_resource-ish APIs are the right thing to do,
>> but I'm not sure how you managed to be able to use it for PCIe P2P
>> yet, as it fails to account for any difference in the PCIe level
>> "physical" address with the hosts view of "physical" addresses.
>>
>> Do these offsets now how up on AMD platforms?  Do you adjust for them
>> elsewhere?
> 
> I don't adjust the address manually anywhere. I just call 
> dma_map_resource() and use the resulting DMA address to access the other 
> devices PCI BAR.
> 
> At least on my test system (AMD CPU + AMD GPUs) this seems to work 
> totally fine. Currently trying to find time and an Intel box to test it 
> there as well.

I'm sure this will work fine in all cases (assuming the RC/IOMMU
supports P2P). It's just you're breaking the existing p2pdma users which
currently work by avoiding the IOMMU and use the pci_bus_address() instead.

Logan

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] PCI/P2PDMA: Root complex whitelist should not apply when an IOMMU is present
  2019-05-23 15:53           ` Logan Gunthorpe
@ 2019-05-23 15:59             ` Christoph Hellwig
  2019-05-24 12:40               ` Koenig, Christian
  0 siblings, 1 reply; 23+ messages in thread
From: Christoph Hellwig @ 2019-05-23 15:59 UTC (permalink / raw)
  To: Logan Gunthorpe
  Cc: Christoph Hellwig, Koenig, Christian, linux-pci, Bjorn Helgaas

On Thu, May 23, 2019 at 09:53:53AM -0600, Logan Gunthorpe wrote:
> > The problem shows up if pci_bus_address() returns a different address
> > than pci_resource_start(), should be easy to check if that happens.
> > IIRC it is something mostly seen on embedded SOCs.
> > 
> 
> I think it's a bit more complicated then that: If you're calling
> dma_map_resource() to program the IOMMU then I'm pretty sure you'd want
> to use the pci_resource_start() address as the phys_addr_t. If you're
> bypassing the root complex (like the current p2pdma code enforces), then
> you'd simply use a pci_bus_address() directly as the dma_addr and would
> not program the IOMMU at all seeing it's not involved (which is what is
> currently done).

True.  What we need is:

 if (both device are behind the same root port (using a switch)) {
 	use the current direct map + offset code
 } else {
 	call ->map_resource()
 }

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] PCI/P2PDMA: Root complex whitelist should not apply when an IOMMU is present
  2019-05-23 10:26             ` Christoph Hellwig
@ 2019-05-23 15:59               ` Logan Gunthorpe
  0 siblings, 0 replies; 23+ messages in thread
From: Logan Gunthorpe @ 2019-05-23 15:59 UTC (permalink / raw)
  To: Christoph Hellwig, Koenig, Christian; +Cc: linux-pci, Bjorn Helgaas



On 2019-05-23 4:26 a.m., Christoph Hellwig wrote:
> On Thu, May 23, 2019 at 10:06:28AM +0000, Koenig, Christian wrote:
>> Ok, we certainly don't have a system which exercise this user case. 
>> Could ask around if we have an ARM SOC with that properties somewhere.
>>
>> But asking the other way around: Where is the right place to start 
>> fixing all this? dma_map_resource()?
> 
> That is the the big gorrilla in the room.  The offset applies to the
> device whos BARs/resources we map.  The current dma_map_resource API
> does not even have the right information.  So I think we need to
> enhance the API to pass a second struct device and we could fix it
> there and then in the next steps add a map_sg version of
> dma_map_resource and eventually also convert the PCIe P2P map_sg
> over to that.

IMO, this logic belongs in pci_p2pdma_map_* helpers that call dma_map_*
helpers when appropriate. Changing dma_map_resource() to take two
devices won't always make sense. For example, there are existing use
cases of dma_map_resource() that work with the Intel IOAT DMA engine and
thus it's known that those transactions will always go through the IOMMU
if it's enabled; and therefore the existing dma_map_resource() is
appropriate.

Logan


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] PCI/P2PDMA: Root complex whitelist should not apply when an IOMMU is present
  2019-05-23  9:50         ` Christoph Hellwig
  2019-05-23 10:06           ` Koenig, Christian
@ 2019-05-23 15:53           ` Logan Gunthorpe
  2019-05-23 15:59             ` Christoph Hellwig
  1 sibling, 1 reply; 23+ messages in thread
From: Logan Gunthorpe @ 2019-05-23 15:53 UTC (permalink / raw)
  To: Christoph Hellwig, Koenig, Christian; +Cc: linux-pci, Bjorn Helgaas



On 2019-05-23 3:50 a.m., Christoph Hellwig wrote:
> On Thu, May 23, 2019 at 09:48:40AM +0000, Koenig, Christian wrote:
>> I don't adjust the address manually anywhere. I just call 
>> dma_map_resource() and use the resulting DMA address to access the other 
>> devices PCI BAR.
>>
>> At least on my test system (AMD CPU + AMD GPUs) this seems to work 
>> totally fine. Currently trying to find time and an Intel box to test it 
>> there as well.
> 
> The problem shows up if pci_bus_address() returns a different address
> than pci_resource_start(), should be easy to check if that happens.
> IIRC it is something mostly seen on embedded SOCs.
> 

I think it's a bit more complicated then that: If you're calling
dma_map_resource() to program the IOMMU then I'm pretty sure you'd want
to use the pci_resource_start() address as the phys_addr_t. If you're
bypassing the root complex (like the current p2pdma code enforces), then
you'd simply use a pci_bus_address() directly as the dma_addr and would
not program the IOMMU at all seeing it's not involved (which is what is
currently done).

Logan

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] PCI/P2PDMA: Root complex whitelist should not apply when an IOMMU is present
  2019-05-23  8:12   ` Koenig, Christian
  2019-05-23  9:43     ` Christoph Hellwig
@ 2019-05-23 15:47     ` Logan Gunthorpe
  1 sibling, 0 replies; 23+ messages in thread
From: Logan Gunthorpe @ 2019-05-23 15:47 UTC (permalink / raw)
  To: Koenig, Christian, Christoph Hellwig; +Cc: linux-pci, Bjorn Helgaas



On 2019-05-23 2:12 a.m., Koenig, Christian wrote:
>> Are you DMA-mapping the addresses outside the P2PDMA code? If so there's
>> a huge mismatch with the existing users of P2PDMA (nvme-fabrics). If
>> you're not dma-mapping then I can't see how it could work because the
>> IOMMU should reject any requests to access those addresses.
> 
> Well, we are using the DMA API (dma_map_resource) for this. If the P2P 
> code is not using this then I would rather say that the P2P code is 
> actually broken.

No, the P2P code is doing what it was designed to do: peer-to-peer
without going through the root complex. If you DMA map in the presence
of an IOMMU, then you force the TLPs to go through the root complex.
This is not what we want and will not be supported by some RCs so you
effectively drop support for everything but the one entry you've added
in the white list. If we're going to start allowing transactions to go
through an IOMMU then the existing users of P2PDMA needs to be updated
to support this without dropping support for existing use cases.

>> By adding the whitelist in this way you will break any user that
>> attempts to use P2P in nvme-fabrics on whitelisted RCs with an IOMMU
>> enabled.
>>
>> Currently, the users of P2PDMA use pci_p2pdma_map_sg() which only
>> returns the PCI bus address. If P2PDMA transactions can now go through
>> an IOMMU, then this is wrong and broken.
>>
>> We need to ensure that all users of P2PDMA map this memory in the same
>> way. Which means, if the TLPs will go through an IOMMU they get
>> dma-map'd and, if they don't, they use the PCI Bus address (as the
>> current code does).
> 
> Well that is exactly what dma_map_resource() already does, so we should 
> probably just make using the DMA API mandatory.

No, it's absolutely not what dma_map_resource() does. DMA map resource
only adds IOVA addresses in the IOMMU for a physical address (if an
IOMMU is present). It does not determine whether that's appropriate to
do for a given transaction. Such logic would belong in pci_p2pdma_map_*
helpers.

>> Without the change proposed in this patch, I have to retract my review
>> and NAK your patch until we can sort out the mapping issues.
> 
> Yeah, completely agree we can't do that right now.

Why not? It's not that complicated. We just need a way for
pci_p2pdma_map_* to determine if the transfer path involves the IOMMU.
If it does, call an existing dma_map_* function, if it does not then do
what it currently does and use the PCI bus address. This probably means
creating some kind of cache with flags for the distance between devices.

Logan

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] PCI/P2PDMA: Root complex whitelist should not apply when an IOMMU is present
  2019-05-23 10:06           ` Koenig, Christian
@ 2019-05-23 10:26             ` Christoph Hellwig
  2019-05-23 15:59               ` Logan Gunthorpe
  0 siblings, 1 reply; 23+ messages in thread
From: Christoph Hellwig @ 2019-05-23 10:26 UTC (permalink / raw)
  To: Koenig, Christian
  Cc: Christoph Hellwig, Logan Gunthorpe, linux-pci, Bjorn Helgaas

On Thu, May 23, 2019 at 10:06:28AM +0000, Koenig, Christian wrote:
> Ok, we certainly don't have a system which exercise this user case. 
> Could ask around if we have an ARM SOC with that properties somewhere.
> 
> But asking the other way around: Where is the right place to start 
> fixing all this? dma_map_resource()?

That is the the big gorrilla in the room.  The offset applies to the
device whos BARs/resources we map.  The current dma_map_resource API
does not even have the right information.  So I think we need to
enhance the API to pass a second struct device and we could fix it
there and then in the next steps add a map_sg version of
dma_map_resource and eventually also convert the PCIe P2P map_sg
over to that.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] PCI/P2PDMA: Root complex whitelist should not apply when an IOMMU is present
  2019-05-23  9:50         ` Christoph Hellwig
@ 2019-05-23 10:06           ` Koenig, Christian
  2019-05-23 10:26             ` Christoph Hellwig
  2019-05-23 15:53           ` Logan Gunthorpe
  1 sibling, 1 reply; 23+ messages in thread
From: Koenig, Christian @ 2019-05-23 10:06 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Logan Gunthorpe, linux-pci, Bjorn Helgaas

Am 23.05.19 um 11:50 schrieb Christoph Hellwig:
> [CAUTION: External Email]
>
> On Thu, May 23, 2019 at 09:48:40AM +0000, Koenig, Christian wrote:
>> I don't adjust the address manually anywhere. I just call
>> dma_map_resource() and use the resulting DMA address to access the other
>> devices PCI BAR.
>>
>> At least on my test system (AMD CPU + AMD GPUs) this seems to work
>> totally fine. Currently trying to find time and an Intel box to test it
>> there as well.
> The problem shows up if pci_bus_address() returns a different address
> than pci_resource_start(), should be easy to check if that happens.
> IIRC it is something mostly seen on embedded SOCs.

Ok, we certainly don't have a system which exercise this user case. 
Could ask around if we have an ARM SOC with that properties somewhere.

But asking the other way around: Where is the right place to start 
fixing all this? dma_map_resource()?

Christian.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] PCI/P2PDMA: Root complex whitelist should not apply when an IOMMU is present
  2019-05-23  9:48       ` Koenig, Christian
@ 2019-05-23  9:50         ` Christoph Hellwig
  2019-05-23 10:06           ` Koenig, Christian
  2019-05-23 15:53           ` Logan Gunthorpe
  2019-05-23 16:14         ` Logan Gunthorpe
  1 sibling, 2 replies; 23+ messages in thread
From: Christoph Hellwig @ 2019-05-23  9:50 UTC (permalink / raw)
  To: Koenig, Christian
  Cc: Christoph Hellwig, Logan Gunthorpe, linux-pci, Bjorn Helgaas

On Thu, May 23, 2019 at 09:48:40AM +0000, Koenig, Christian wrote:
> I don't adjust the address manually anywhere. I just call 
> dma_map_resource() and use the resulting DMA address to access the other 
> devices PCI BAR.
> 
> At least on my test system (AMD CPU + AMD GPUs) this seems to work 
> totally fine. Currently trying to find time and an Intel box to test it 
> there as well.

The problem shows up if pci_bus_address() returns a different address
than pci_resource_start(), should be easy to check if that happens.
IIRC it is something mostly seen on embedded SOCs.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] PCI/P2PDMA: Root complex whitelist should not apply when an IOMMU is present
  2019-05-23  9:43     ` Christoph Hellwig
@ 2019-05-23  9:48       ` Koenig, Christian
  2019-05-23  9:50         ` Christoph Hellwig
  2019-05-23 16:14         ` Logan Gunthorpe
  0 siblings, 2 replies; 23+ messages in thread
From: Koenig, Christian @ 2019-05-23  9:48 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Logan Gunthorpe, linux-pci, Bjorn Helgaas

Am 23.05.19 um 11:43 schrieb Christoph Hellwig:
> [CAUTION: External Email]
>
> On Thu, May 23, 2019 at 08:12:18AM +0000, Koenig, Christian wrote:
>>> Are you DMA-mapping the addresses outside the P2PDMA code? If so there's
>>> a huge mismatch with the existing users of P2PDMA (nvme-fabrics). If
>>> you're not dma-mapping then I can't see how it could work because the
>>> IOMMU should reject any requests to access those addresses.
>> Well, we are using the DMA API (dma_map_resource) for this. If the P2P
>> code is not using this then I would rather say that the P2P code is
>> actually broken.
>>
>> Adding Christoph as well, cause he is usually the one discussion stuff
>> like that with me.
> Heh.  Actually dma_map_resource-ish APIs are the right thing to do,
> but I'm not sure how you managed to be able to use it for PCIe P2P
> yet, as it fails to account for any difference in the PCIe level
> "physical" address with the hosts view of "physical" addresses.
>
> Do these offsets now how up on AMD platforms?  Do you adjust for them
> elsewhere?

I don't adjust the address manually anywhere. I just call 
dma_map_resource() and use the resulting DMA address to access the other 
devices PCI BAR.

At least on my test system (AMD CPU + AMD GPUs) this seems to work 
totally fine. Currently trying to find time and an Intel box to test it 
there as well.

Regards,
Christian.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] PCI/P2PDMA: Root complex whitelist should not apply when an IOMMU is present
  2019-05-23  8:12   ` Koenig, Christian
@ 2019-05-23  9:43     ` Christoph Hellwig
  2019-05-23  9:48       ` Koenig, Christian
  2019-05-23 15:47     ` Logan Gunthorpe
  1 sibling, 1 reply; 23+ messages in thread
From: Christoph Hellwig @ 2019-05-23  9:43 UTC (permalink / raw)
  To: Koenig, Christian
  Cc: Logan Gunthorpe, Christoph Hellwig, linux-pci, Bjorn Helgaas

On Thu, May 23, 2019 at 08:12:18AM +0000, Koenig, Christian wrote:
> > Are you DMA-mapping the addresses outside the P2PDMA code? If so there's
> > a huge mismatch with the existing users of P2PDMA (nvme-fabrics). If
> > you're not dma-mapping then I can't see how it could work because the
> > IOMMU should reject any requests to access those addresses.
> 
> Well, we are using the DMA API (dma_map_resource) for this. If the P2P 
> code is not using this then I would rather say that the P2P code is 
> actually broken.
> 
> Adding Christoph as well, cause he is usually the one discussion stuff 
> like that with me.

Heh.  Actually dma_map_resource-ish APIs are the right thing to do,
but I'm not sure how you managed to be able to use it for PCIe P2P
yet, as it fails to account for any difference in the PCIe level
"physical" address with the hosts view of "physical" addresses.

Do these offsets now how up on AMD platforms?  Do you adjust for them
elsewhere?

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] PCI/P2PDMA: Root complex whitelist should not apply when an IOMMU is present
  2019-05-22 20:41 ` Logan Gunthorpe
@ 2019-05-23  8:12   ` Koenig, Christian
  2019-05-23  9:43     ` Christoph Hellwig
  2019-05-23 15:47     ` Logan Gunthorpe
  0 siblings, 2 replies; 23+ messages in thread
From: Koenig, Christian @ 2019-05-23  8:12 UTC (permalink / raw)
  To: Logan Gunthorpe, Christoph Hellwig; +Cc: linux-pci, Bjorn Helgaas

Am 22.05.19 um 22:41 schrieb Logan Gunthorpe:
> [CAUTION: External Email]
>
> On 2019-05-22 2:30 p.m., Koenig, Christian wrote:
>> Hi Logan,
>>
>> Am 22.05.2019 22:12 schrieb Logan Gunthorpe <logang@deltatee.com>:
>>
>>     [CAUTION: External Email]
>>
>>     Presently, there is no path to DMA map P2PDMA memory, so if a TLP
>>     targeting this memory hits the root complex and an IOMMU is present,
>>     the IOMMU will reject the transaction, even if the RC would support
>>     P2PDMA.
>>
>>     So until the kernel knows to map these DMA addresses in the IOMMU,
>>     we should not enable the whitelist when an IOMMU is present.
>>
>>
>> Well NAK, cause that is exactly what we are doing.
>
> Are you DMA-mapping the addresses outside the P2PDMA code? If so there's
> a huge mismatch with the existing users of P2PDMA (nvme-fabrics). If
> you're not dma-mapping then I can't see how it could work because the
> IOMMU should reject any requests to access those addresses.

Well, we are using the DMA API (dma_map_resource) for this. If the P2P 
code is not using this then I would rather say that the P2P code is 
actually broken.

Adding Christoph as well, cause he is usually the one discussion stuff 
like that with me.

> By adding the whitelist in this way you will break any user that
> attempts to use P2P in nvme-fabrics on whitelisted RCs with an IOMMU
> enabled.
>
> Currently, the users of P2PDMA use pci_p2pdma_map_sg() which only
> returns the PCI bus address. If P2PDMA transactions can now go through
> an IOMMU, then this is wrong and broken.
>
> We need to ensure that all users of P2PDMA map this memory in the same
> way. Which means, if the TLPs will go through an IOMMU they get
> dma-map'd and, if they don't, they use the PCI Bus address (as the
> current code does).

Well that is exactly what dma_map_resource() already does, so we should 
probably just make using the DMA API mandatory.

> Without the change proposed in this patch, I have to retract my review
> and NAK your patch until we can sort out the mapping issues.

Yeah, completely agree we can't do that right now.

Regards,
Christian.

>
>
> Logan


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] PCI/P2PDMA: Root complex whitelist should not apply when an IOMMU is present
       [not found] <a98bff67-a76e-4ddc-a317-96f2bdc9af72@email.android.com>
@ 2019-05-22 20:41 ` Logan Gunthorpe
  2019-05-23  8:12   ` Koenig, Christian
  0 siblings, 1 reply; 23+ messages in thread
From: Logan Gunthorpe @ 2019-05-22 20:41 UTC (permalink / raw)
  To: Koenig, Christian; +Cc: linux-pci, Bjorn Helgaas



On 2019-05-22 2:30 p.m., Koenig, Christian wrote:
> Hi Logan,
> 
> Am 22.05.2019 22:12 schrieb Logan Gunthorpe <logang@deltatee.com>:
> 
>     [CAUTION: External Email]
> 
>     Presently, there is no path to DMA map P2PDMA memory, so if a TLP
>     targeting this memory hits the root complex and an IOMMU is present,
>     the IOMMU will reject the transaction, even if the RC would support
>     P2PDMA.
> 
>     So until the kernel knows to map these DMA addresses in the IOMMU,
>     we should not enable the whitelist when an IOMMU is present.
> 
> 
> Well NAK, cause that is exactly what we are doing.

Are you DMA-mapping the addresses outside the P2PDMA code? If so there's 
a huge mismatch with the existing users of P2PDMA (nvme-fabrics). If 
you're not dma-mapping then I can't see how it could work because the 
IOMMU should reject any requests to access those addresses.

By adding the whitelist in this way you will break any user that 
attempts to use P2P in nvme-fabrics on whitelisted RCs with an IOMMU 
enabled.

Currently, the users of P2PDMA use pci_p2pdma_map_sg() which only 
returns the PCI bus address. If P2PDMA transactions can now go through 
an IOMMU, then this is wrong and broken.

We need to ensure that all users of P2PDMA map this memory in the same 
way. Which means, if the TLPs will go through an IOMMU they get 
dma-map'd and, if they don't, they use the PCI Bus address (as the 
current code does).

Without the change proposed in this patch, I have to retract my review 
and NAK your patch until we can sort out the mapping issues.

Logan

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2019-06-19  9:39 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-05-22 20:12 [PATCH] PCI/P2PDMA: Root complex whitelist should not apply when an IOMMU is present Logan Gunthorpe
2019-06-18 20:40 ` Bjorn Helgaas
2019-06-18 20:51   ` Logan Gunthorpe
2019-06-18 23:50     ` Logan Gunthorpe
2019-06-19  9:26       ` Koenig, Christian
2019-06-19  9:29         ` Christoph Hellwig
2019-06-19  9:39           ` Koenig, Christian
     [not found] <a98bff67-a76e-4ddc-a317-96f2bdc9af72@email.android.com>
2019-05-22 20:41 ` Logan Gunthorpe
2019-05-23  8:12   ` Koenig, Christian
2019-05-23  9:43     ` Christoph Hellwig
2019-05-23  9:48       ` Koenig, Christian
2019-05-23  9:50         ` Christoph Hellwig
2019-05-23 10:06           ` Koenig, Christian
2019-05-23 10:26             ` Christoph Hellwig
2019-05-23 15:59               ` Logan Gunthorpe
2019-05-23 15:53           ` Logan Gunthorpe
2019-05-23 15:59             ` Christoph Hellwig
2019-05-24 12:40               ` Koenig, Christian
2019-05-24 14:12                 ` Christoph Hellwig
2019-05-24 16:18                   ` Logan Gunthorpe
2019-05-24 16:06                 ` Logan Gunthorpe
2019-05-23 16:14         ` Logan Gunthorpe
2019-05-23 15:47     ` Logan Gunthorpe

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).