linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Logan Gunthorpe <logang@deltatee.com>
To: "Stephen Bates" <sbates@raithlin.com>,
	"Christian König" <christian.koenig@amd.com>,
	"Jerome Glisse" <jglisse@redhat.com>
Cc: Alex Williamson <alex.williamson@redhat.com>,
	Bjorn Helgaas <helgaas@kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"linux-pci@vger.kernel.org" <linux-pci@vger.kernel.org>,
	"linux-nvme@lists.infradead.org" <linux-nvme@lists.infradead.org>,
	"linux-rdma@vger.kernel.org" <linux-rdma@vger.kernel.org>,
	"linux-nvdimm@lists.01.org" <linux-nvdimm@lists.01.org>,
	"linux-block@vger.kernel.org" <linux-block@vger.kernel.org>,
	Christoph Hellwig <hch@lst.de>, Jens Axboe <axboe@kernel.dk>,
	Keith Busch <keith.busch@intel.com>,
	Sagi Grimberg <sagi@grimberg.me>,
	Bjorn Helgaas <bhelgaas@google.com>,
	Jason Gunthorpe <jgg@mellanox.com>,
	Max Gurtovoy <maxg@mellanox.com>,
	Dan Williams <dan.j.williams@intel.com>,
	Benjamin Herrenschmidt <benh@kernel.crashing.org>
Subject: Re: [PATCH v4 04/14] PCI/P2PDMA: Clear ACS P2P flags for all devices behind switches
Date: Thu, 10 May 2018 10:32:22 -0600	[thread overview]
Message-ID: <f0101988-cf82-7ffb-2aca-d3b8d2f4b504@deltatee.com> (raw)
In-Reply-To: <94C8FE12-7FC3-48BD-9DCA-E6A427E71810@raithlin.com>



On 10/05/18 08:16 AM, Stephen  Bates wrote:
> Hi Christian
> 
>> Why would a switch not identify that as a peer address? We use the PASID 
>>    together with ATS to identify the address space which a transaction 
>>    should use.
> 
> I think you are conflating two types of TLPs here. If the device supports ATS then it will issue a TR TLP to obtain a translated address from the IOMMU. This TR TLP will be addressed to the RP and so regardless of ACS it is going up to the Root Port. When it gets the response it gets the physical address and can use that with the TA bit set for the p2pdma. In the case of ATS support we also have more control over ACS as we can disable it just for TA addresses (as per 7.7.7.7.2 of the spec).

Yes. Remember if we are using the IOMMU the EP is being programmed
(regardless of whether it's a DMA engine, NTB window or GPUVA) with an
IOVA address which is separate from the device's PCI bus address. Any
packet addressed to an IOVA address is going to go back to the root
complex no matter what the ACS bits say. Only once ATS translates the
addres back into the PCI bus address will the EP send packets to the
peer and the switch will attempt to root them to the peer and only then
do the ACS bits apply. And the direct translated ACS bit allows packets
that have purportedly been translated through.

>  >   If I'm not completely mistaken when you disable ACS it is perfectly 
>  >   possible that a bridge identifies a transaction as belonging to a peer 
>  >   address, which isn't what we want here.
>    
> You are right here and I think this illustrates a problem for using the IOMMU at all when P2PDMA devices do not support ATS. Let me explain:
> 
> If we want to do a P2PDMA and the DMA device does not support ATS then I think we have to disable the IOMMU (something Mike suggested earlier). The reason is that since ATS is not an option the EP must initiate the DMA using the addresses passed down to it. If the IOMMU is on then this is an IOVA that could (with some non-zero probability) point to an IO Memory address in the same PCI domain. So if we disable ACS we are in trouble as we might MemWr to the wrong place but if we enable ACS we lose much of the benefit of P2PDMA. Disabling the IOMMU removes the IOVA risk and ironically also resolves the IOMMU grouping issues.
> So I think if we want to support performant P2PDMA for devices that don't have ATS (and no NVMe SSDs today support ATS) then we have to disable the IOMMU. I know this is problematic for AMDs use case so perhaps we also need to consider a mode for P2PDMA for devices that DO support ATS where we can enable the IOMMU (but in this case EPs without ATS cannot participate as P2PDMA DMA iniators).
> 
> Make sense?

Not to me. In the p2pdma code we specifically program DMA engines with
the PCI bus address. So regardless of whether we are using the IOMMU or
not, the packets will be forwarded directly to the peer. If the ACS
Redir bits are on they will be forced back to the RC by the switch and
the transaction will fail. If we clear the ACS bits, the TLPs will go
where we want and everything will work (but we lose the isolation of ACS).

For EPs that support ATS, we should (but don't necessarily have to)
program them with the IOVA address so they can go through the
translation process which will allow P2P without disabling the ACS Redir
bits -- provided the ACS direct translation bit is set. (And btw, if it
is, then we lose the benefit of ACS protecting against malicious EPs).
But, per above, the ATS transaction should involve only the IOVA address
so the ACS bits not being set should not break ATS.

Logan

  parent reply	other threads:[~2018-05-10 16:33 UTC|newest]

Thread overview: 103+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-04-23 23:30 [PATCH v4 00/14] Copy Offload in NVMe Fabrics with P2P PCI Memory Logan Gunthorpe
2018-04-23 23:30 ` [PATCH v4 01/14] PCI/P2PDMA: Support peer-to-peer memory Logan Gunthorpe
2018-05-07 23:00   ` Bjorn Helgaas
2018-05-07 23:09     ` Logan Gunthorpe
2018-04-23 23:30 ` [PATCH v4 02/14] PCI/P2PDMA: Add sysfs group to display p2pmem stats Logan Gunthorpe
2018-04-23 23:30 ` [PATCH v4 03/14] PCI/P2PDMA: Add PCI p2pmem dma mappings to adjust the bus offset Logan Gunthorpe
2018-05-07 23:02   ` Bjorn Helgaas
2018-04-23 23:30 ` [PATCH v4 04/14] PCI/P2PDMA: Clear ACS P2P flags for all devices behind switches Logan Gunthorpe
2018-04-24  3:33   ` Randy Dunlap
2018-05-07 23:13   ` Bjorn Helgaas
2018-05-08  7:17     ` Christian König
2018-05-08 14:25       ` Stephen  Bates
2018-05-08 16:37         ` Christian König
2018-05-08 16:27       ` Logan Gunthorpe
2018-05-08 16:50         ` Christian König
2018-05-08 19:13           ` Logan Gunthorpe
2018-05-08 19:34             ` Alex Williamson
2018-05-08 19:45               ` Logan Gunthorpe
2018-05-08 20:13                 ` Alex Williamson
2018-05-08 20:19                   ` Logan Gunthorpe
2018-05-08 20:43                     ` Alex Williamson
2018-05-08 20:49                       ` Logan Gunthorpe
2018-05-08 21:26                         ` Alex Williamson
2018-05-08 21:42                           ` Stephen  Bates
2018-05-08 22:03                             ` Alex Williamson
2018-05-08 22:10                               ` Logan Gunthorpe
2018-05-08 22:25                                 ` Stephen  Bates
2018-05-08 23:11                                   ` Alex Williamson
2018-05-08 23:31                                     ` Logan Gunthorpe
2018-05-09  0:17                                       ` Alex Williamson
2018-05-08 22:32                                 ` Alex Williamson
2018-05-08 23:00                                   ` Dan Williams
2018-05-08 23:15                                     ` Logan Gunthorpe
2018-05-09 12:38                                       ` Stephen  Bates
2018-05-08 22:21                               ` Don Dutile
2018-05-09 12:44                                 ` Stephen  Bates
2018-05-09 15:58                                   ` Don Dutile
2018-05-08 20:50                     ` Jerome Glisse
2018-05-08 21:35                       ` Stephen  Bates
2018-05-09 13:12                       ` Stephen  Bates
2018-05-09 13:40                         ` Christian König
2018-05-09 15:41                           ` Stephen  Bates
2018-05-09 16:07                             ` Jerome Glisse
2018-05-09 16:30                               ` Stephen  Bates
2018-05-09 17:49                                 ` Jerome Glisse
2018-05-10 14:20                                   ` Stephen  Bates
2018-05-10 14:29                                     ` Christian König
2018-05-10 14:59                                       ` Jerome Glisse
2018-05-10 18:44                                         ` Stephen  Bates
2018-05-09 16:45                           ` Logan Gunthorpe
2018-05-10 12:52                             ` Christian König
2018-05-10 14:16                               ` Stephen  Bates
2018-05-10 14:41                                 ` Jerome Glisse
2018-05-10 18:41                                   ` Stephen  Bates
2018-05-10 18:59                                     ` Logan Gunthorpe
2018-05-10 19:10                                     ` Alex Williamson
2018-05-10 19:24                                       ` Jerome Glisse
2018-05-10 16:32                                 ` Logan Gunthorpe [this message]
2018-05-10 17:11                                   ` Stephen  Bates
2018-05-10 17:15                                     ` Logan Gunthorpe
2018-05-11  8:52                                       ` Christian König
2018-05-11 15:48                                         ` Logan Gunthorpe
2018-05-11 21:50                                           ` Stephen  Bates
2018-05-11 22:24                                             ` Stephen  Bates
2018-05-11 22:55                                               ` Logan Gunthorpe
2018-05-08 14:31   ` Dan Williams
2018-05-08 14:44     ` Stephen  Bates
2018-05-08 21:04       ` Don Dutile
2018-05-08 21:27         ` Stephen  Bates
2018-05-08 23:06           ` Don Dutile
2018-05-09  0:01             ` Alex Williamson
2018-05-09 12:35               ` Stephen  Bates
2018-05-09 14:44                 ` Alex Williamson
2018-05-09 15:52                   ` Don Dutile
2018-05-09 15:47               ` Don Dutile
2018-05-09 15:53           ` Don Dutile
2018-04-23 23:30 ` [PATCH v4 05/14] docs-rst: Add a new directory for PCI documentation Logan Gunthorpe
2018-04-23 23:30 ` [PATCH v4 06/14] PCI/P2PDMA: Add P2P DMA driver writer's documentation Logan Gunthorpe
2018-05-07 23:20   ` Bjorn Helgaas
2018-05-22 21:24   ` Randy Dunlap
2018-05-22 21:28     ` Logan Gunthorpe
2018-04-23 23:30 ` [PATCH v4 07/14] block: Introduce PCI P2P flags for request and request queue Logan Gunthorpe
2018-04-23 23:30 ` [PATCH v4 08/14] IB/core: Ensure we map P2P memory correctly in rdma_rw_ctx_[init|destroy]() Logan Gunthorpe
2018-04-23 23:30 ` [PATCH v4 09/14] nvme-pci: Use PCI p2pmem subsystem to manage the CMB Logan Gunthorpe
2018-04-23 23:30 ` [PATCH v4 10/14] nvme-pci: Add support for P2P memory in requests Logan Gunthorpe
2018-04-23 23:30 ` [PATCH v4 11/14] nvme-pci: Add a quirk for a pseudo CMB Logan Gunthorpe
2018-04-23 23:30 ` [PATCH v4 12/14] nvmet: Introduce helper functions to allocate and free request SGLs Logan Gunthorpe
2018-04-23 23:30 ` [PATCH v4 13/14] nvmet-rdma: Use new SGL alloc/free helper for requests Logan Gunthorpe
2018-04-23 23:30 ` [PATCH v4 14/14] nvmet: Optionally use PCI P2P memory Logan Gunthorpe
2018-05-02 11:51 ` [PATCH v4 00/14] Copy Offload in NVMe Fabrics with P2P PCI Memory Christian König
2018-05-02 15:56   ` Logan Gunthorpe
2018-05-03  9:05     ` Christian König
2018-05-03 15:59       ` Logan Gunthorpe
2018-05-03 17:29         ` Christian König
2018-05-03 18:43           ` Logan Gunthorpe
2018-05-04 14:27             ` Christian König
2018-05-04 15:52               ` Logan Gunthorpe
2018-05-07 23:23 ` Bjorn Helgaas
2018-05-07 23:34   ` Logan Gunthorpe
2018-05-08 16:57   ` Alex Williamson
2018-05-08 19:14     ` Logan Gunthorpe
2018-05-08 21:25     ` Don Dutile
2018-05-08 21:40       ` Alex Williamson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=f0101988-cf82-7ffb-2aca-d3b8d2f4b504@deltatee.com \
    --to=logang@deltatee.com \
    --cc=alex.williamson@redhat.com \
    --cc=axboe@kernel.dk \
    --cc=benh@kernel.crashing.org \
    --cc=bhelgaas@google.com \
    --cc=christian.koenig@amd.com \
    --cc=dan.j.williams@intel.com \
    --cc=hch@lst.de \
    --cc=helgaas@kernel.org \
    --cc=jgg@mellanox.com \
    --cc=jglisse@redhat.com \
    --cc=keith.busch@intel.com \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-nvdimm@lists.01.org \
    --cc=linux-nvme@lists.infradead.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=linux-rdma@vger.kernel.org \
    --cc=maxg@mellanox.com \
    --cc=sagi@grimberg.me \
    --cc=sbates@raithlin.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).