From: Jonathan Cameron <jonathan.cameron@huawei.com>
To: Logan Gunthorpe <logang@deltatee.com>
Cc: "Alex Williamson" <alex.williamson@redhat.com>,
	linux-nvdimm@lists.01.org, linux-rdma@vger.kernel.org,
	linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-nvme@lists.infradead.org, linux-block@vger.kernel.org,
	"Jérôme Glisse" <jglisse@redhat.com>,
	"Jason Gunthorpe" <jgg@mellanox.com>,
	"Christian König" <christian.koenig@amd.com>,
	"Benjamin Herrenschmidt" <benh@kernel.crashing.org>,
	"Bjorn Helgaas" <bhelgaas@google.com>,
	"Max Gurtovoy" <maxg@mellanox.com>,
	"Christoph Hellwig" <hch@lst.de>
Subject: Re: [PATCH v5 01/13] PCI/P2PDMA: Support peer-to-peer memory
Date: Fri, 31 Aug 2018 17:19:06 +0100	[thread overview]
Message-ID: <20180831171906.00002751@huawei.com> (raw)
In-Reply-To: <20180830185352.3369-2-logang@deltatee.com>

On Thu, 30 Aug 2018 12:53:40 -0600
Logan Gunthorpe <logang@deltatee.com> wrote:

> Some PCI devices may have memory mapped in a BAR space that's
> intended for use in peer-to-peer transactions. In order to enable
> such transactions the memory must be registered with ZONE_DEVICE pages
> so it can be used by DMA interfaces in existing drivers.
> 
> Add an interface for other subsystems to find and allocate chunks of P2P
> memory as necessary to facilitate transfers between two PCI peers:
> 
> int pci_p2pdma_add_client();
> struct pci_dev *pci_p2pmem_find();
> void *pci_alloc_p2pmem();
> 
> The new interface requires a driver to collect a list of client devices
> involved in the transaction with the pci_p2pdma_add_client*() functions
> then call pci_p2pmem_find() to obtain any suitable P2P memory. Once
> this is done the list is bound to the memory and the calling driver is
> free to add and remove clients as necessary (adding incompatible clients
> will fail). With a suitable p2pmem device, memory can then be
> allocated with pci_alloc_p2pmem() for use in DMA transactions.
> 
> Depending on hardware, using peer-to-peer memory may reduce the bandwidth
> of the transfer but can significantly reduce pressure on system memory.
> This may be desirable in many cases: for example a system could be designed
> with a small CPU connected to a PCIe switch by a small number of lanes
> which would maximize the number of lanes available to connect to NVMe
> devices.
> 
> The code is designed to only utilize the p2pmem device if all the devices
> involved in a transfer are behind the same PCI bridge. This is because we
> have no way of knowing whether peer-to-peer routing between PCIe Root Ports
> is supported (PCIe r4.0, sec 1.3.1). Additionally, the benefits of P2P
> transfers that go through the RC are limited to only reducing DRAM usage
> and, in some cases, coding convenience. The PCI-SIG may be exploring
> adding a new capability bit to advertise whether this is possible for
> future hardware.
> 
> This commit includes significant rework and feedback from Christoph
> Hellwig.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> Signed-off-by: Logan Gunthorpe <logang@deltatee.com>

Apologies for being a late entrant to this conversation, so I may be asking
about a topic that has already been covered in detail in earlier patches!
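
To make sure I follow the intended flow, here is a minimal sketch of how I
read the client API described above. The prototypes are my assumptions (a
list-based client registry plus a size-based allocator), not copied from the
patch, and dma_dev_a, dma_dev_b and length are placeholders; error handling
and client teardown are omitted.

	/*
	 * Sketch only -- assumed prototypes, not verbatim from the patch:
	 *
	 *   int pci_p2pdma_add_client(struct list_head *head, struct device *dev);
	 *   struct pci_dev *pci_p2pmem_find(struct list_head *clients);
	 *   void *pci_alloc_p2pmem(struct pci_dev *pdev, size_t size);
	 */
	LIST_HEAD(clients);
	struct pci_dev *p2p_dev;
	void *buf;

	/* 1) Register every device that will read or write the buffer. */
	if (pci_p2pdma_add_client(&clients, dma_dev_a) ||
	    pci_p2pdma_add_client(&clients, dma_dev_b))
		goto use_system_memory;

	/* 2) Pick a provider whose BAR memory is reachable by all clients. */
	p2p_dev = pci_p2pmem_find(&clients);
	if (!p2p_dev)
		goto use_system_memory;

	/* 3) Carve the DMA buffer out of that provider's BAR memory. */
	buf = pci_alloc_p2pmem(p2p_dev, length);

If I've misread the intended sequence, please correct me.
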
> ---
...

> +/*
> + * Find the distance through the nearest common upstream bridge between
> + * two PCI devices.
> + *
> + * If the two devices are the same device then 0 will be returned.
> + *
> + * If there are two virtual functions of the same device behind the same
> + * bridge port then 2 will be returned (one step down to the PCIe switch,
> + * then one step back to the same device).
> + *
> + * In the case where two devices are connected to the same PCIe switch, the
> + * value 4 will be returned. This corresponds to the following PCI tree:
> + *
> + *     -+  Root Port
> + *      \+ Switch Upstream Port
> + *       +-+ Switch Downstream Port
> + *       + \- Device A
> + *       \-+ Switch Downstream Port
> + *         \- Device B
> + *
> + * The distance is 4 because we traverse from Device A through the downstream
> + * port of the switch, to the common upstream port, back up to the second
> + * downstream port and then to Device B.
> + *
> + * Any two devices that don't have a common upstream bridge will return -1.
> + * In this way devices on separate PCIe root ports will be rejected, which
> + * is what we want for peer-to-peer, since each PCIe root port defines a
> + * separate hierarchy domain and there's no way to determine whether the root
> + * complex supports forwarding between them.
> + *
> + * In the case where two devices are connected to different PCIe switches,
> + * this function will still return a positive distance as long as both
> + * switches eventually have a common upstream bridge. Note this covers
> + * the case of using multiple PCIe switches to achieve a desired level of
> + * fan-out from a root port. The exact distance will be a function of the
> + * number of switches between Device A and Device B.

This feels like a somewhat simplistic starting point rather than a
generally correct estimate. Should we be taking the bandwidth of
those links into account, for example, or any discoverable latencies?
Not all PCIe switches are alike - particularly when it comes to P2P.

I guess that can be a topic for future development if it turns out people
have horrible mixed systems.
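
To make the 0/2/4/-1 semantics above concrete, here is a stand-alone toy
model of the nearest-common-upstream walk. This is purely my illustration in
user-space C, not the patch's implementation: struct node stands in for
pci_dev/pci_upstream_bridge() and the ACS handling is left out entirely.

	#include <stdio.h>

	struct node { const char *name; struct node *parent; };

	/* Hops walking upward from a to ancestor; -1 if ancestor is not above a. */
	static int hops_to(struct node *a, struct node *ancestor)
	{
		int n;

		for (n = 0; a; a = a->parent, n++)
			if (a == ancestor)
				return n;
		return -1;
	}

	/* Distance through the nearest common upstream node (no ACS checks). */
	static int p2p_distance(struct node *a, struct node *b)
	{
		struct node *up;

		for (up = a; up; up = up->parent) {
			int db = hops_to(b, up);

			if (db >= 0)
				return hops_to(a, up) + db;
		}
		return -1;	/* no common upstream node: separate hierarchies */
	}

	int main(void)
	{
		struct node root  = { "Root Port",              NULL   };
		struct node usp   = { "Switch Upstream Port",   &root  };
		struct node dsp_a = { "Switch Downstream Port", &usp   };
		struct node dsp_b = { "Switch Downstream Port", &usp   };
		struct node dev_a = { "Device A",               &dsp_a };
		struct node dev_b = { "Device B",               &dsp_b };
		struct node other = { "Device on another root", NULL   };

		printf("same device:    %d\n", p2p_distance(&dev_a, &dev_a)); /* 0 */
		printf("same switch:    %d\n", p2p_distance(&dev_a, &dev_b)); /* 4 */
		printf("no common root: %d\n", p2p_distance(&dev_a, &other)); /* -1 */
		return 0;
	}

Any bandwidth or latency weighting would presumably replace the "one per hop"
cost here with a per-link cost, which is the sort of refinement I had in mind.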

> + *
> + * If a bridge which has any ACS redirection bits set is in the path
> + * then this function will return -2. This is so we reject any
> + * cases where the TLPs are forwarded up into the root complex.
> + * In this case, a list of all infringing bridge addresses will be
> + * populated in acs_list (assuming it's non-null) for printk purposes.
> + */
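
For what it's worth, my reading of the -2 case is a per-bridge check along
these lines -- my paraphrase using the standard config-space helpers, not
necessarily the patch's exact code:

	static bool bridge_has_acs_redir(struct pci_dev *bridge)
	{
		int pos = pci_find_ext_capability(bridge, PCI_EXT_CAP_ID_ACS);
		u16 ctrl;

		if (!pos)
			return false;

		pci_read_config_word(bridge, pos + PCI_ACS_CTRL, &ctrl);

		/* Any redirect/egress-control bit may force TLPs up to the RC. */
		return ctrl & (PCI_ACS_RR | PCI_ACS_CR | PCI_ACS_EC);
	}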
