* [PATCH v5 00/13] Copy Offload in NVMe Fabrics with P2P PCI Memory
@ 2018-08-30 18:53 ` Logan Gunthorpe
  0 siblings, 0 replies; 265+ messages in thread
From: Logan Gunthorpe @ 2018-08-30 18:53 UTC (permalink / raw)
  To: linux-kernel, linux-pci, linux-nvme, linux-rdma, linux-nvdimm,
	linux-block
  Cc: Christian König, Benjamin Herrenschmidt, Alex Williamson,
	Jérôme Glisse, Jason Gunthorpe, Bjorn Helgaas,
	Max Gurtovoy, Christoph Hellwig

Hi Everyone,

Now that the patchset which creates a command line option to disable
ACS redirection has landed, it's time to revisit the P2P patchset for
copy offload in NVMe fabrics.

I present version 5, which no longer does any magic with the ACS bits and
instead will reject P2P transactions between devices that would be affected
by them. A few other cleanups were done which are described in the
changelog below.

This version is based on v4.19-rc1 and a git repo is here:

https://github.com/sbates130272/linux-p2pmem pci-p2p-v5

Thanks,

Logan

--

Changes in v5:

* Rebased on v4.19-rc1

* Drop changing ACS settings in this patchset. Now, the code
  will only allow P2P transactions between devices whose
  downstream ports do not restrict P2P TLPs.

* Drop the REQ_PCI_P2PDMA block flag and instead use
  is_pci_p2pdma_page() to tell if a request is P2P or not. In that
  case we check for queue support and enforce using REQ_NOMERGE.
  Per feedback from Christoph.

* Drop the pci_p2pdma_unmap_sg() function as it was empty and only
  there for symmetry and compatibility with dma_unmap_sg. Per feedback
  from Christoph.

* Split off the logic to handle enabling P2P in NVMe fabrics' configfs
  into specific helpers in the p2pdma code. Per feedback from Christoph.

* A number of other minor cleanups and fixes as pointed out by
  Christoph and others.

Changes in v4:

* Change the original upstream_bridges_match() function to
  upstream_bridge_distance() which calculates the distance between two
  devices as long as they are behind the same root port. This should
  address Bjorn's concerns that the code was too focused on
  being behind a single switch.

* The ACS-disabling function now disables ACS for all bridge ports instead
  of just switch ports (i.e. those that have two upstream_bridge ports).

* Change the pci_p2pmem_alloc_sgl() and pci_p2pmem_free_sgl()
  API to be more like sgl_alloc() in that the alloc function returns
  the allocated scatterlist and nents is not required by the free
  function.

* Moved the new documentation into the driver-api tree, as requested
  by Jonathan.

* Add SGL alloc and free helpers in the nvmet code so that the
  individual drivers can share the code that allocates P2P memory.
  As requested by Christoph.

* Cleanup the nvmet_p2pmem_store() function as Christoph
  thought my first attempt was ugly.

* Numerous commit message and comment fix-ups

Changes in v3:

* Many more fixes and minor cleanups that were spotted by Bjorn

* Additional explanation of the ACS change in both the commit message
  and Kconfig doc. Also, the code that disables the ACS bits is surrounded
  explicitly by an #ifdef

* Removed the flag we added to rdma_rw_ctx() in favour of using
  is_pci_p2pdma_page(), as suggested by Sagi.

* Adjust pci_p2pmem_find() so that it prefers P2P providers that
  are closest to (or the same as) the clients using them. In cases
  of ties, the provider is randomly chosen.

* Modify the NVMe Target code so that the PCI device name of the provider
  may be explicitly specified, bypassing the logic in pci_p2pmem_find().
  (Note: it's still enforced that the provider must be behind the
   same switch as the clients).

* As requested by Bjorn, added documentation for driver writers.


Changes in v2:

* Renamed everything to 'p2pdma' per the suggestion from Bjorn as well
  as a bunch of cleanup and spelling fixes he pointed out in the last
  series.

* To address Alex's ACS concerns, we change to a simpler method of
  just disabling ACS behind switches for any kernel that has
  CONFIG_PCI_P2PDMA.

* We also reject using devices that employ 'dma_virt_ops', which should
  fairly simply address Jason's concerns that this work might break with
  the HFI, QIB and rxe drivers that use the virtual ops to implement
  their own special DMA operations.

--

This is a continuation of our work to enable using Peer-to-Peer PCI
memory in the kernel with initial support for the NVMe fabrics target
subsystem. Many thanks go to Christoph Hellwig who provided valuable
feedback to get these patches to where they are today.

The concept here is to use memory that's exposed on a PCI BAR as
data buffers in the NVMe target code, such that data can be transferred
from an RDMA NIC to the special memory and then directly to an NVMe
device, avoiding system memory entirely. The upside of this is better
QoS for applications running on the CPU that are utilizing memory, and
lower PCI bandwidth required to the CPU (such that systems could be
designed with fewer lanes connected to the CPU).

Due to these trade-offs we've designed the system to only enable using
the PCI memory in cases where the NIC, NVMe devices and memory are all
behind the same PCI switch hierarchy. This means many setups that would
likely work well will not be supported; in exchange, we can be more
confident the feature works, and we place no responsibility on the user
to understand their topology. (We chose to go this route based on
feedback we received at the last LSF.) Future work may enable these
transfers using a whitelist of known-good root complexes; however, at
this time, there is no reliable way to ensure that Peer-to-Peer
transactions are permitted between PCI Root Ports.

In order to enable this functionality, we introduce a few new PCI
functions such that a driver can register P2P memory with the system.
Struct pages are created for this memory using devm_memremap_pages()
and the PCI bus offset is stored in the corresponding pagemap structure.

As of v5, this patchset no longer changes the ACS bits itself. Instead,
P2P transactions are only allowed between devices whose downstream ports
do not restrict P2P TLPs; on systems where ACS redirection would get in
the way, it can be turned off at boot with the recently merged command
line option so the IOMMU subsystem can correctly create the groups.
There is still no way to dynamically change the bits and alter the
groups, though this could be addressed in the future.

Another set of functions allows a client driver to create a list of
client devices that will be used in a given P2P transaction and then
use that list to find any P2P memory that is supported by all the
client devices.

In the block layer, we introduce a flag for a request queue to indicate
that a given queue supports targeting P2P memory. Whether a request
targets P2P memory is determined with is_pci_p2pdma_page(); such
requests will only be accepted by queues that support them and are
marked REQ_NOMERGE, seeing as a non-homogeneous request would
complicate the DMA mapping requirements.

In the PCI NVMe driver, we modify the existing CMB support to utilize
the new PCI P2P memory infrastructure and also add support for P2P
memory in its request queue. When a P2P request is received, it uses the
pci_p2pmem_map_sg() function, which applies the necessary transformation
to get the correct pci_bus_addr_t for the DMA transactions.

In the RDMA core, we also adjust rdma_rw_ctx_init() and
rdma_rw_ctx_destroy() to use is_pci_p2pdma_page() to decide whether the
PCI P2P mapping functions are needed. To avoid odd RDMA devices that
don't use the proper DMA infrastructure, this code rejects using any
device that employs the dma_virt_ops implementation.

Finally, in the NVMe fabrics target port we introduce a new
configuration boolean: 'allow_p2pmem'. When set, the port will attempt
to find P2P memory supported by the RDMA NIC and all namespaces. If
supported memory is found, it will be used in all I/O transfers. If a
port is using P2P memory, adding new namespaces that are not supported
by that memory will fail.
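Assuming the usual nvmet configfs layout, enabling it would look
something like the following; the port number and paths are
illustrative, and 'allow_p2pmem' is the attribute this series adds:

```shell
# Opt a fabrics port in to P2P memory (port "1" is an example)
echo 1 > /sys/kernel/config/nvmet/ports/1/allow_p2pmem
cat /sys/kernel/config/nvmet/ports/1/allow_p2pmem
```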

These patches have been tested on a number of Intel-based systems with
a variety of RDMA NICs (Mellanox, Broadcom, Chelsio), NVMe SSDs (Intel,
Seagate, Samsung) and p2pdma devices (Eideticom, Microsemi, Chelsio and
Everspin), using switches from both Microsemi and Broadcom.

Logan Gunthorpe (13):
  PCI/P2PDMA: Support peer-to-peer memory
  PCI/P2PDMA: Add sysfs group to display p2pmem stats
  PCI/P2PDMA: Add PCI p2pmem DMA mappings to adjust the bus offset
  PCI/P2PDMA: Introduce configfs/sysfs enable attribute helpers
  docs-rst: Add a new directory for PCI documentation
  PCI/P2PDMA: Add P2P DMA driver writer's documentation
  block: Add PCI P2P flag for request queue and check support for
    requests
  IB/core: Ensure we map P2P memory correctly in
    rdma_rw_ctx_[init|destroy]()
  nvme-pci: Use PCI p2pmem subsystem to manage the CMB
  nvme-pci: Add support for P2P memory in requests
  nvme-pci: Add a quirk for a pseudo CMB
  nvmet: Introduce helper functions to allocate and free request SGLs
  nvmet: Optionally use PCI P2P memory

 Documentation/ABI/testing/sysfs-bus-pci    |  25 +
 Documentation/driver-api/index.rst         |   2 +-
 Documentation/driver-api/pci/index.rst     |  21 +
 Documentation/driver-api/pci/p2pdma.rst    | 170 ++++++
 Documentation/driver-api/{ => pci}/pci.rst |   0
 block/blk-core.c                           |  14 +
 drivers/infiniband/core/rw.c               |  11 +-
 drivers/nvme/host/core.c                   |   4 +
 drivers/nvme/host/nvme.h                   |   8 +
 drivers/nvme/host/pci.c                    | 121 ++--
 drivers/nvme/target/configfs.c             |  36 ++
 drivers/nvme/target/core.c                 | 149 +++++
 drivers/nvme/target/nvmet.h                |  15 +
 drivers/nvme/target/rdma.c                 |  22 +-
 drivers/pci/Kconfig                        |  17 +
 drivers/pci/Makefile                       |   1 +
 drivers/pci/p2pdma.c                       | 941 +++++++++++++++++++++++++++++
 include/linux/blkdev.h                     |   3 +
 include/linux/memremap.h                   |   6 +
 include/linux/mm.h                         |  18 +
 include/linux/pci-p2pdma.h                 | 124 ++++
 include/linux/pci.h                        |   4 +
 22 files changed, 1658 insertions(+), 54 deletions(-)
 create mode 100644 Documentation/driver-api/pci/index.rst
 create mode 100644 Documentation/driver-api/pci/p2pdma.rst
 rename Documentation/driver-api/{ => pci}/pci.rst (100%)
 create mode 100644 drivers/pci/p2pdma.c
 create mode 100644 include/linux/pci-p2pdma.h

--
2.11.0
_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

^ permalink raw reply	[flat|nested] 265+ messages in thread

* [PATCH v5 00/13] Copy Offload in NVMe Fabrics with P2P PCI Memory
@ 2018-08-30 18:53 ` Logan Gunthorpe
  0 siblings, 0 replies; 265+ messages in thread
From: Logan Gunthorpe @ 2018-08-30 18:53 UTC (permalink / raw)
  To: linux-kernel, linux-pci, linux-nvme, linux-rdma, linux-nvdimm,
	linux-block
  Cc: Stephen Bates, Christoph Hellwig, Keith Busch, Sagi Grimberg,
	Bjorn Helgaas, Jason Gunthorpe, Max Gurtovoy, Dan Williams,
	Jérôme Glisse, Benjamin Herrenschmidt, Alex Williamson,
	Christian König, Logan Gunthorpe

Hi Everyone,

Now that the patchset which creates a command line option to disable
ACS redirection has landed it's time to revisit the P2P patchset for
copy offoad in NVMe fabrics.

I present version 5 wihch no longer does any magic with the ACS bits and
instead will reject P2P transactions between devices that would be affected
by them. A few other cleanups were done which are described in the
changelog below.

This version is based on v4.19-rc1 and a git repo is here:

https://github.com/sbates130272/linux-p2pmem pci-p2p-v5

Thanks,

Logan

--

Changes in v5:

* Rebased on v4.19-rc1

* Drop changing ACS settings in this patchset. Now, the code
  will only allow P2P transactions between devices whos
  downstream ports do not restrict P2P TLPs.

* Drop the REQ_PCI_P2PDMA block flag and instead use
  is_pci_p2pdma_page() to tell if a request is P2P or not. In that
  case we check for queue support and enforce using REQ_NOMERGE.
  Per feedback from Christoph.

* Drop the pci_p2pdma_unmap_sg() function as it was empty and only
  there for symmetry and compatibility with dma_unmap_sg. Per feedback
  from Christoph.

* Split off the logic to handle enabling P2P in NVMe fabrics' configfs
  into specific helpers in the p2pdma code. Per feedback from Christoph.

* A number of other minor cleanups and fixes as pointed out by
  Christoph and others.

Changes in v4:

* Change the original upstream_bridges_match() function to
  upstream_bridge_distance() which calculates the distance between two
  devices as long as they are behind the same root port. This should
  address Bjorn's concerns that the code was to focused on
  being behind a single switch.

* The disable ACS function now disables ACS for all bridge ports instead
  of switch ports (ie. those that had two upstream_bridge ports).

* Change the pci_p2pmem_alloc_sgl() and pci_p2pmem_free_sgl()
  API to be more like sgl_alloc() in that the alloc function returns
  the allocated scatterlist and nents is not required bythe free
  function.

* Moved the new documentation into the driver-api tree as requested
  by Jonathan

* Add SGL alloc and free helpers in the nvmet code so that the
  individual drivers can share the code that allocates P2P memory.
  As requested by Christoph.

* Cleanup the nvmet_p2pmem_store() function as Christoph
  thought my first attempt was ugly.

* Numerous commit message and comment fix-ups

Changes in v3:

* Many more fixes and minor cleanups that were spotted by Bjorn

* Additional explanation of the ACS change in both the commit message
  and Kconfig doc. Also, the code that disables the ACS bits is surrounded
  explicitly by an #ifdef

* Removed the flag we added to rdma_rw_ctx() in favour of using
  is_pci_p2pdma_page(), as suggested by Sagi.

* Adjust pci_p2pmem_find() so that it prefers P2P providers that
  are closest to (or the same as) the clients using them. In cases
  of ties, the provider is randomly chosen.

* Modify the NVMe Target code so that the PCI device name of the provider
  may be explicitly specified, bypassing the logic in pci_p2pmem_find().
  (Note: it's still enforced that the provider must be behind the
   same switch as the clients).

* As requested by Bjorn, added documentation for driver writers.


Changes in v2:

* Renamed everything to 'p2pdma' per the suggestion from Bjorn as well
  as a bunch of cleanup and spelling fixes he pointed out in the last
  series.

* To address Alex's ACS concerns, we change to a simpler method of
  just disabling ACS behind switches for any kernel that has
  CONFIG_PCI_P2PDMA.

* We also reject using devices that employ 'dma_virt_ops' which should
  fairly simply handle Jason's concerns that this work might break with
  the HFI, QIB and rxe drivers that use the virtual ops to implement
  their own special DMA operations.

--

This is a continuation of our work to enable using Peer-to-Peer PCI
memory in the kernel with initial support for the NVMe fabrics target
subsystem. Many thanks go to Christoph Hellwig who provided valuable
feedback to get these patches to where they are today.

The concept here is to use memory that's exposed on a PCI BAR as
data buffers in the NVMe target code such that data can be transferred
from an RDMA NIC to the special memory and then directly to an NVMe
device avoiding system memory entirely. The upside of this is better
QoS for applications running on the CPU utilizing memory and lower
PCI bandwidth required to the CPU (such that systems could be designed
with fewer lanes connected to the CPU).

Due to these trade-offs we've designed the system to only enable using
the PCI memory in cases where the NIC, NVMe devices and memory are all
behind the same PCI switch hierarchy. This will mean many setups that
could likely work well will not be supported so that we can be more
confident it will work and not place any responsibility on the user to
understand their topology. (We chose to go this route based on feedback
we received at the last LSF). Future work may enable these transfers
using a white list of known good root complexes. However, at this time,
there is no reliable way to ensure that Peer-to-Peer transactions are
permitted between PCI Root Ports.

In order to enable this functionality, we introduce a few new PCI
functions such that a driver can register P2P memory with the system.
Struct pages are created for this memory using devm_memremap_pages()
and the PCI bus offset is stored in the corresponding pagemap structure.

When the PCI P2PDMA config option is selected the ACS bits in every
bridge port in the system are turned off to allow traffic to
pass freely behind the root port. At this time, the bit must be disabled
at boot so the IOMMU subsystem can correctly create the groups, though
this could be addressed in the future. There is no way to dynamically
disable the bit and alter the groups.

Another set of functions allow a client driver to create a list of
client devices that will be used in a given P2P transactions and then
use that list to find any P2P memory that is supported by all the
client devices.

In the block layer, we also introduce a P2P request flag to indicate a
given request targets P2P memory as well as a flag for a request queue
to indicate a given queue supports targeting P2P memory. P2P requests
will only be accepted by queues that support it. Also, P2P requests
are marked to not be merged seeing a non-homogenous request would
complicate the DMA mapping requirements.

In the PCI NVMe driver, we modify the existing CMB support to utilize
the new PCI P2P memory infrastructure and also add support for P2P
memory in its request queue. When a P2P request is received it uses the
pci_p2pmem_map_sg() function which applies the necessary transformation
to get the corrent pci_bus_addr_t for the DMA transactions.

In the RDMA core, we also adjust rdma_rw_ctx_init() and
rdma_rw_ctx_destroy() to take a flags argument which indicates whether
to use the PCI P2P mapping functions or not. To avoid odd RDMA devices
that don't use the proper DMA infrastructure this code rejects using
any device that employs the virt_dma_ops implementation.

Finally, in the NVMe fabrics target port we introduce a new
configuration boolean: 'allow_p2pmem'. When set, the port will attempt
to find P2P memory supported by the RDMA NIC and all namespaces. If
supported memory is found, it will be used in all IO transfers. And if
a port is using P2P memory, adding new namespaces that are not supported
by that memory will fail.

These patches have been tested on a number of Intel based systems and
for a variety of RDMA NICs (Mellanox, Broadcomm, Chelsio) and NVMe
SSDs (Intel, Seagate, Samsung) and p2pdma devices (Eideticom,
Microsemi, Chelsio and Everspin) using switches from both Microsemi
and Broadcomm.

Logan Gunthorpe (13):
  PCI/P2PDMA: Support peer-to-peer memory
  PCI/P2PDMA: Add sysfs group to display p2pmem stats
  PCI/P2PDMA: Add PCI p2pmem DMA mappings to adjust the bus offset
  PCI/P2PDMA: Introduce configfs/sysfs enable attribute helpers
  docs-rst: Add a new directory for PCI documentation
  PCI/P2PDMA: Add P2P DMA driver writer's documentation
  block: Add PCI P2P flag for request queue and check support for
    requests
  IB/core: Ensure we map P2P memory correctly in
    rdma_rw_ctx_[init|destroy]()
  nvme-pci: Use PCI p2pmem subsystem to manage the CMB
  nvme-pci: Add support for P2P memory in requests
  nvme-pci: Add a quirk for a pseudo CMB
  nvmet: Introduce helper functions to allocate and free request SGLs
  nvmet: Optionally use PCI P2P memory

 Documentation/ABI/testing/sysfs-bus-pci    |  25 +
 Documentation/driver-api/index.rst         |   2 +-
 Documentation/driver-api/pci/index.rst     |  21 +
 Documentation/driver-api/pci/p2pdma.rst    | 170 ++++++
 Documentation/driver-api/{ => pci}/pci.rst |   0
 block/blk-core.c                           |  14 +
 drivers/infiniband/core/rw.c               |  11 +-
 drivers/nvme/host/core.c                   |   4 +
 drivers/nvme/host/nvme.h                   |   8 +
 drivers/nvme/host/pci.c                    | 121 ++--
 drivers/nvme/target/configfs.c             |  36 ++
 drivers/nvme/target/core.c                 | 149 +++++
 drivers/nvme/target/nvmet.h                |  15 +
 drivers/nvme/target/rdma.c                 |  22 +-
 drivers/pci/Kconfig                        |  17 +
 drivers/pci/Makefile                       |   1 +
 drivers/pci/p2pdma.c                       | 941 +++++++++++++++++++++++++++++
 include/linux/blkdev.h                     |   3 +
 include/linux/memremap.h                   |   6 +
 include/linux/mm.h                         |  18 +
 include/linux/pci-p2pdma.h                 | 124 ++++
 include/linux/pci.h                        |   4 +
 22 files changed, 1658 insertions(+), 54 deletions(-)
 create mode 100644 Documentation/driver-api/pci/index.rst
 create mode 100644 Documentation/driver-api/pci/p2pdma.rst
 rename Documentation/driver-api/{ => pci}/pci.rst (100%)
 create mode 100644 drivers/pci/p2pdma.c
 create mode 100644 include/linux/pci-p2pdma.h

--
2.11.0

^ permalink raw reply	[flat|nested] 265+ messages in thread

* [PATCH v5 00/13] Copy Offload in NVMe Fabrics with P2P PCI Memory
@ 2018-08-30 18:53 ` Logan Gunthorpe
  0 siblings, 0 replies; 265+ messages in thread
From: Logan Gunthorpe @ 2018-08-30 18:53 UTC (permalink / raw)
  To: linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-pci-u79uwXL29TY76Z2rM5mHXA,
	linux-nvme-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	linux-nvdimm-hn68Rpc1hR1g9hUCZPvPmw,
	linux-block-u79uwXL29TY76Z2rM5mHXA
  Cc: Christian König, Benjamin Herrenschmidt, Alex Williamson,
	Jérôme Glisse, Jason Gunthorpe, Bjorn Helgaas,
	Max Gurtovoy, Christoph Hellwig

Hi Everyone,

Now that the patchset which creates a command line option to disable
ACS redirection has landed it's time to revisit the P2P patchset for
copy offoad in NVMe fabrics.

I present version 5 wihch no longer does any magic with the ACS bits and
instead will reject P2P transactions between devices that would be affected
by them. A few other cleanups were done which are described in the
changelog below.

This version is based on v4.19-rc1 and a git repo is here:

https://github.com/sbates130272/linux-p2pmem pci-p2p-v5

Thanks,

Logan

--

Changes in v5:

* Rebased on v4.19-rc1

* Drop changing ACS settings in this patchset. Now, the code
  will only allow P2P transactions between devices whos
  downstream ports do not restrict P2P TLPs.

* Drop the REQ_PCI_P2PDMA block flag and instead use
  is_pci_p2pdma_page() to tell if a request is P2P or not. In that
  case we check for queue support and enforce using REQ_NOMERGE.
  Per feedback from Christoph.

* Drop the pci_p2pdma_unmap_sg() function as it was empty and only
  there for symmetry and compatibility with dma_unmap_sg. Per feedback
  from Christoph.

* Split off the logic to handle enabling P2P in NVMe fabrics' configfs
  into specific helpers in the p2pdma code. Per feedback from Christoph.

* A number of other minor cleanups and fixes as pointed out by
  Christoph and others.

Changes in v4:

* Change the original upstream_bridges_match() function to
  upstream_bridge_distance() which calculates the distance between two
  devices as long as they are behind the same root port. This should
  address Bjorn's concerns that the code was to focused on
  being behind a single switch.

* The disable ACS function now disables ACS for all bridge ports instead
  of switch ports (ie. those that had two upstream_bridge ports).

* Change the pci_p2pmem_alloc_sgl() and pci_p2pmem_free_sgl()
  API to be more like sgl_alloc() in that the alloc function returns
  the allocated scatterlist and nents is not required bythe free
  function.

* Moved the new documentation into the driver-api tree as requested
  by Jonathan

* Add SGL alloc and free helpers in the nvmet code so that the
  individual drivers can share the code that allocates P2P memory.
  As requested by Christoph.

* Cleanup the nvmet_p2pmem_store() function as Christoph
  thought my first attempt was ugly.

* Numerous commit message and comment fix-ups

Changes in v3:

* Many more fixes and minor cleanups that were spotted by Bjorn

* Additional explanation of the ACS change in both the commit message
  and Kconfig doc. Also, the code that disables the ACS bits is surrounded
  explicitly by an #ifdef

* Removed the flag we added to rdma_rw_ctx() in favour of using
  is_pci_p2pdma_page(), as suggested by Sagi.

* Adjust pci_p2pmem_find() so that it prefers P2P providers that
  are closest to (or the same as) the clients using them. In cases
  of ties, the provider is randomly chosen.

* Modify the NVMe Target code so that the PCI device name of the provider
  may be explicitly specified, bypassing the logic in pci_p2pmem_find().
  (Note: it's still enforced that the provider must be behind the
   same switch as the clients).

* As requested by Bjorn, added documentation for driver writers.


Changes in v2:

* Renamed everything to 'p2pdma' per the suggestion from Bjorn as well
  as a bunch of cleanup and spelling fixes he pointed out in the last
  series.

* To address Alex's ACS concerns, we change to a simpler method of
  just disabling ACS behind switches for any kernel that has
  CONFIG_PCI_P2PDMA.

* We also reject using devices that employ 'dma_virt_ops' which should
  fairly simply handle Jason's concerns that this work might break with
  the HFI, QIB and rxe drivers that use the virtual ops to implement
  their own special DMA operations.

--

This is a continuation of our work to enable using Peer-to-Peer PCI
memory in the kernel with initial support for the NVMe fabrics target
subsystem. Many thanks go to Christoph Hellwig who provided valuable
feedback to get these patches to where they are today.

The concept here is to use memory that's exposed on a PCI BAR as
data buffers in the NVMe target code such that data can be transferred
from an RDMA NIC to the special memory and then directly to an NVMe
device avoiding system memory entirely. The upside of this is better
QoS for applications running on the CPU utilizing memory and lower
PCI bandwidth required to the CPU (such that systems could be designed
with fewer lanes connected to the CPU).

Due to these trade-offs we've designed the system to only enable using
the PCI memory in cases where the NIC, NVMe devices and memory are all
behind the same PCI switch hierarchy. This will mean many setups that
could likely work well will not be supported so that we can be more
confident it will work and not place any responsibility on the user to
understand their topology. (We chose to go this route based on feedback
we received at the last LSF). Future work may enable these transfers
using a white list of known good root complexes. However, at this time,
there is no reliable way to ensure that Peer-to-Peer transactions are
permitted between PCI Root Ports.

In order to enable this functionality, we introduce a few new PCI
functions such that a driver can register P2P memory with the system.
Struct pages are created for this memory using devm_memremap_pages()
and the PCI bus offset is stored in the corresponding pagemap structure.

When the PCI P2PDMA config option is selected the ACS bits in every
bridge port in the system are turned off to allow traffic to
pass freely behind the root port. At this time, the bit must be disabled
at boot so the IOMMU subsystem can correctly create the groups, though
this could be addressed in the future. There is no way to dynamically
disable the bit and alter the groups.

Another set of functions allow a client driver to create a list of
client devices that will be used in a given P2P transactions and then
use that list to find any P2P memory that is supported by all the
client devices.

In the block layer, we also introduce a P2P request flag to indicate a
given request targets P2P memory as well as a flag for a request queue
to indicate a given queue supports targeting P2P memory. P2P requests
will only be accepted by queues that support it. Also, P2P requests
are marked to not be merged seeing a non-homogenous request would
complicate the DMA mapping requirements.

In the PCI NVMe driver, we modify the existing CMB support to utilize
the new PCI P2P memory infrastructure and also add support for P2P
memory in its request queue. When a P2P request is received it uses the
pci_p2pmem_map_sg() function which applies the necessary transformation
to get the corrent pci_bus_addr_t for the DMA transactions.

In the RDMA core, we also adjust rdma_rw_ctx_init() and
rdma_rw_ctx_destroy() to take a flags argument which indicates whether
to use the PCI P2P mapping functions or not. To avoid odd RDMA devices
that don't use the proper DMA infrastructure this code rejects using
any device that employs the virt_dma_ops implementation.

Finally, in the NVMe fabrics target port we introduce a new
configuration boolean: 'allow_p2pmem'. When set, the port will attempt
to find P2P memory supported by the RDMA NIC and all namespaces. If
supported memory is found, it will be used in all IO transfers. And if
a port is using P2P memory, adding new namespaces that are not supported
by that memory will fail.

These patches have been tested on a number of Intel based systems and
for a variety of RDMA NICs (Mellanox, Broadcomm, Chelsio) and NVMe
SSDs (Intel, Seagate, Samsung) and p2pdma devices (Eideticom,
Microsemi, Chelsio and Everspin) using switches from both Microsemi
and Broadcomm.

Logan Gunthorpe (13):
  PCI/P2PDMA: Support peer-to-peer memory
  PCI/P2PDMA: Add sysfs group to display p2pmem stats
  PCI/P2PDMA: Add PCI p2pmem DMA mappings to adjust the bus offset
  PCI/P2PDMA: Introduce configfs/sysfs enable attribute helpers
  docs-rst: Add a new directory for PCI documentation
  PCI/P2PDMA: Add P2P DMA driver writer's documentation
  block: Add PCI P2P flag for request queue and check support for
    requests
  IB/core: Ensure we map P2P memory correctly in
    rdma_rw_ctx_[init|destroy]()
  nvme-pci: Use PCI p2pmem subsystem to manage the CMB
  nvme-pci: Add support for P2P memory in requests
  nvme-pci: Add a quirk for a pseudo CMB
  nvmet: Introduce helper functions to allocate and free request SGLs
  nvmet: Optionally use PCI P2P memory

 Documentation/ABI/testing/sysfs-bus-pci    |  25 +
 Documentation/driver-api/index.rst         |   2 +-
 Documentation/driver-api/pci/index.rst     |  21 +
 Documentation/driver-api/pci/p2pdma.rst    | 170 ++++++
 Documentation/driver-api/{ => pci}/pci.rst |   0
 block/blk-core.c                           |  14 +
 drivers/infiniband/core/rw.c               |  11 +-
 drivers/nvme/host/core.c                   |   4 +
 drivers/nvme/host/nvme.h                   |   8 +
 drivers/nvme/host/pci.c                    | 121 ++--
 drivers/nvme/target/configfs.c             |  36 ++
 drivers/nvme/target/core.c                 | 149 +++++
 drivers/nvme/target/nvmet.h                |  15 +
 drivers/nvme/target/rdma.c                 |  22 +-
 drivers/pci/Kconfig                        |  17 +
 drivers/pci/Makefile                       |   1 +
 drivers/pci/p2pdma.c                       | 941 +++++++++++++++++++++++++++++
 include/linux/blkdev.h                     |   3 +
 include/linux/memremap.h                   |   6 +
 include/linux/mm.h                         |  18 +
 include/linux/pci-p2pdma.h                 | 124 ++++
 include/linux/pci.h                        |   4 +
 22 files changed, 1658 insertions(+), 54 deletions(-)
 create mode 100644 Documentation/driver-api/pci/index.rst
 create mode 100644 Documentation/driver-api/pci/p2pdma.rst
 rename Documentation/driver-api/{ => pci}/pci.rst (100%)
 create mode 100644 drivers/pci/p2pdma.c
 create mode 100644 include/linux/pci-p2pdma.h

--
2.11.0

^ permalink raw reply	[flat|nested] 265+ messages in thread

* [PATCH v5 01/13] PCI/P2PDMA: Support peer-to-peer memory
  2018-08-30 18:53 ` Logan Gunthorpe
@ 2018-08-30 18:53   ` Logan Gunthorpe
  -1 siblings, 0 replies; 265+ messages in thread
From: Logan Gunthorpe @ 2018-08-30 18:53 UTC (permalink / raw)
  To: linux-kernel, linux-pci, linux-nvme, linux-rdma, linux-nvdimm,
	linux-block
  Cc: Christian König, Benjamin Herrenschmidt, Alex Williamson,
	Jérôme Glisse, Jason Gunthorpe, Bjorn Helgaas,
	Max Gurtovoy, Christoph Hellwig

Some PCI devices may have memory mapped in a BAR space that's
intended for use in peer-to-peer transactions. In order to enable
such transactions the memory must be registered with ZONE_DEVICE pages
so it can be used by DMA interfaces in existing drivers.

Add an interface for other subsystems to find and allocate chunks of P2P
memory as necessary to facilitate transfers between two PCI peers:

int pci_p2pdma_add_client();
struct pci_dev *pci_p2pmem_find();
void *pci_alloc_p2pmem();

The new interface requires a driver to collect a list of client devices
involved in the transaction with the pci_p2pdma_add_client*() functions
then call pci_p2pmem_find() to obtain any suitable P2P memory. Once
this is done the list is bound to the memory and the calling driver is
free to add and remove clients as necessary (adding incompatible clients
will fail). With a suitable p2pmem device, memory can then be
allocated with pci_alloc_p2pmem() for use in DMA transactions.

Depending on hardware, using peer-to-peer memory may reduce the bandwidth
of the transfer but can significantly reduce pressure on system memory.
This may be desirable in many cases: for example a system could be designed
with a small CPU connected to a PCIe switch by a small number of lanes
which would maximize the number of lanes available to connect to NVMe
devices.

The code is designed to only utilize the p2pmem device if all the devices
involved in a transfer are behind the same PCI bridge. This is because we
have no way of knowing whether peer-to-peer routing between PCIe Root Ports
is supported (PCIe r4.0, sec 1.3.1). Additionally, the benefits of P2P
transfers that go through the RC are limited to reducing DRAM usage
and, in some cases, coding convenience. The PCI-SIG may be exploring
adding a new capability bit to advertise whether this is possible for
future hardware.

This commit includes significant rework and feedback from Christoph
Hellwig.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
---
 drivers/pci/Kconfig        |  17 +
 drivers/pci/Makefile       |   1 +
 drivers/pci/p2pdma.c       | 761 +++++++++++++++++++++++++++++++++++++++++++++
 include/linux/memremap.h   |   5 +
 include/linux/mm.h         |  18 ++
 include/linux/pci-p2pdma.h | 102 ++++++
 include/linux/pci.h        |   4 +
 7 files changed, 908 insertions(+)
 create mode 100644 drivers/pci/p2pdma.c
 create mode 100644 include/linux/pci-p2pdma.h

diff --git a/drivers/pci/Kconfig b/drivers/pci/Kconfig
index 56ff8f6d31fc..deb68be4fdac 100644
--- a/drivers/pci/Kconfig
+++ b/drivers/pci/Kconfig
@@ -132,6 +132,23 @@ config PCI_PASID
 
 	  If unsure, say N.
 
+config PCI_P2PDMA
+	bool "PCI peer-to-peer transfer support"
+	depends on PCI && ZONE_DEVICE
+	select GENERIC_ALLOCATOR
+	help
+	  Enables drivers to do PCI peer-to-peer transactions to and from
+	  BARs that are exposed in other devices that are part of
+	  the hierarchy where peer-to-peer DMA is guaranteed by the PCI
+	  specification to work (ie. anything below a single PCI bridge).
+
+	  Many PCIe root complexes do not support P2P transactions and
+	  it's hard to tell which ones support it, so at this time,
+	  P2P DMA transactions must be between devices behind the same root
+	  port.
+
+	  If unsure, say N.
+
 config PCI_LABEL
 	def_bool y if (DMI || ACPI)
 	depends on PCI
diff --git a/drivers/pci/Makefile b/drivers/pci/Makefile
index 1b2cfe51e8d7..85f4a703b2be 100644
--- a/drivers/pci/Makefile
+++ b/drivers/pci/Makefile
@@ -26,6 +26,7 @@ obj-$(CONFIG_PCI_SYSCALL)	+= syscall.o
 obj-$(CONFIG_PCI_STUB)		+= pci-stub.o
 obj-$(CONFIG_PCI_PF_STUB)	+= pci-pf-stub.o
 obj-$(CONFIG_PCI_ECAM)		+= ecam.o
+obj-$(CONFIG_PCI_P2PDMA)	+= p2pdma.o
 obj-$(CONFIG_XEN_PCIDEV_FRONTEND) += xen-pcifront.o
 
 # Endpoint library must be initialized before its users
diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c
new file mode 100644
index 000000000000..88aaec5351cd
--- /dev/null
+++ b/drivers/pci/p2pdma.c
@@ -0,0 +1,761 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * PCI Peer 2 Peer DMA support.
+ *
+ * Copyright (c) 2016-2018, Logan Gunthorpe
+ * Copyright (c) 2016-2017, Microsemi Corporation
+ * Copyright (c) 2017, Christoph Hellwig
+ * Copyright (c) 2018, Eideticom Inc.
+ */
+
+#define pr_fmt(fmt) "pci-p2pdma: " fmt
+#include <linux/pci-p2pdma.h>
+#include <linux/module.h>
+#include <linux/slab.h>
+#include <linux/genalloc.h>
+#include <linux/memremap.h>
+#include <linux/percpu-refcount.h>
+#include <linux/random.h>
+#include <linux/seq_buf.h>
+
+struct pci_p2pdma {
+	struct percpu_ref devmap_ref;
+	struct completion devmap_ref_done;
+	struct gen_pool *pool;
+	bool p2pmem_published;
+};
+
+static void pci_p2pdma_percpu_release(struct percpu_ref *ref)
+{
+	struct pci_p2pdma *p2p =
+		container_of(ref, struct pci_p2pdma, devmap_ref);
+
+	complete_all(&p2p->devmap_ref_done);
+}
+
+static void pci_p2pdma_percpu_kill(void *data)
+{
+	struct percpu_ref *ref = data;
+
+	if (percpu_ref_is_dying(ref))
+		return;
+
+	percpu_ref_kill(ref);
+}
+
+static void pci_p2pdma_release(void *data)
+{
+	struct pci_dev *pdev = data;
+
+	if (!pdev->p2pdma)
+		return;
+
+	wait_for_completion(&pdev->p2pdma->devmap_ref_done);
+	percpu_ref_exit(&pdev->p2pdma->devmap_ref);
+
+	gen_pool_destroy(pdev->p2pdma->pool);
+	pdev->p2pdma = NULL;
+}
+
+static int pci_p2pdma_setup(struct pci_dev *pdev)
+{
+	int error = -ENOMEM;
+	struct pci_p2pdma *p2p;
+
+	p2p = devm_kzalloc(&pdev->dev, sizeof(*p2p), GFP_KERNEL);
+	if (!p2p)
+		return -ENOMEM;
+
+	p2p->pool = gen_pool_create(PAGE_SHIFT, dev_to_node(&pdev->dev));
+	if (!p2p->pool)
+		goto out;
+
+	init_completion(&p2p->devmap_ref_done);
+	error = percpu_ref_init(&p2p->devmap_ref,
+			pci_p2pdma_percpu_release, 0, GFP_KERNEL);
+	if (error)
+		goto out_pool_destroy;
+
+	percpu_ref_switch_to_atomic_sync(&p2p->devmap_ref);
+
+	error = devm_add_action_or_reset(&pdev->dev, pci_p2pdma_release, pdev);
+	if (error)
+		goto out_pool_destroy;
+
+	pdev->p2pdma = p2p;
+
+	return 0;
+
+out_pool_destroy:
+	gen_pool_destroy(p2p->pool);
+out:
+	devm_kfree(&pdev->dev, p2p);
+	return error;
+}
+
+/**
+ * pci_p2pdma_add_resource - add memory for use as p2p memory
+ * @pdev: the device to add the memory to
+ * @bar: PCI BAR to add
+ * @size: size of the memory to add, may be zero to use the whole BAR
+ * @offset: offset into the PCI BAR
+ *
+ * The memory will be given ZONE_DEVICE struct pages so that it may
+ * be used with any DMA request.
+ */
+int pci_p2pdma_add_resource(struct pci_dev *pdev, int bar, size_t size,
+			    u64 offset)
+{
+	struct dev_pagemap *pgmap;
+	void *addr;
+	int error;
+
+	if (!(pci_resource_flags(pdev, bar) & IORESOURCE_MEM))
+		return -EINVAL;
+
+	if (offset >= pci_resource_len(pdev, bar))
+		return -EINVAL;
+
+	if (!size)
+		size = pci_resource_len(pdev, bar) - offset;
+
+	if (size + offset > pci_resource_len(pdev, bar))
+		return -EINVAL;
+
+	if (!pdev->p2pdma) {
+		error = pci_p2pdma_setup(pdev);
+		if (error)
+			return error;
+	}
+
+	pgmap = devm_kzalloc(&pdev->dev, sizeof(*pgmap), GFP_KERNEL);
+	if (!pgmap)
+		return -ENOMEM;
+
+	pgmap->res.start = pci_resource_start(pdev, bar) + offset;
+	pgmap->res.end = pgmap->res.start + size - 1;
+	pgmap->res.flags = pci_resource_flags(pdev, bar);
+	pgmap->ref = &pdev->p2pdma->devmap_ref;
+	pgmap->type = MEMORY_DEVICE_PCI_P2PDMA;
+
+	addr = devm_memremap_pages(&pdev->dev, pgmap);
+	if (IS_ERR(addr)) {
+		error = PTR_ERR(addr);
+		goto pgmap_free;
+	}
+
+	error = gen_pool_add_virt(pdev->p2pdma->pool, (unsigned long)addr,
+			pci_bus_address(pdev, bar) + offset,
+			resource_size(&pgmap->res), dev_to_node(&pdev->dev));
+	if (error)
+		goto pgmap_free;
+
+	error = devm_add_action_or_reset(&pdev->dev, pci_p2pdma_percpu_kill,
+					  &pdev->p2pdma->devmap_ref);
+	if (error)
+		goto pgmap_free;
+
+	pci_info(pdev, "added peer-to-peer DMA memory %pR\n",
+		 &pgmap->res);
+
+	return 0;
+
+pgmap_free:
+	devres_free(pgmap);
+	return error;
+}
+EXPORT_SYMBOL_GPL(pci_p2pdma_add_resource);
+
+static struct pci_dev *find_parent_pci_dev(struct device *dev)
+{
+	struct device *parent;
+
+	dev = get_device(dev);
+
+	while (dev) {
+		if (dev_is_pci(dev))
+			return to_pci_dev(dev);
+
+		parent = get_device(dev->parent);
+		put_device(dev);
+		dev = parent;
+	}
+
+	return NULL;
+}
+
+/*
+ * Check if a PCI bridge has its ACS redirection bits set to redirect P2P
+ * TLPs upstream via ACS. Returns 1 if the packets will be redirected
+ * upstream, 0 otherwise.
+ */
+static int pci_bridge_has_acs_redir(struct pci_dev *dev)
+{
+	int pos;
+	u16 ctrl;
+
+	pos = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_ACS);
+	if (!pos)
+		return 0;
+
+	pci_read_config_word(dev, pos + PCI_ACS_CTRL, &ctrl);
+
+	if (ctrl & (PCI_ACS_RR | PCI_ACS_CR | PCI_ACS_EC))
+		return 1;
+
+	return 0;
+}
+
+static void seq_buf_print_bus_devfn(struct seq_buf *buf, struct pci_dev *dev)
+{
+	if (!buf)
+		return;
+
+	seq_buf_printf(buf, "%04x:%02x:%02x.%x;", pci_domain_nr(dev->bus),
+		       dev->bus->number, PCI_SLOT(dev->devfn),
+		       PCI_FUNC(dev->devfn));
+}
+
+/*
+ * Find the distance through the nearest common upstream bridge between
+ * two PCI devices.
+ *
+ * If the two devices are the same device then 0 will be returned.
+ *
+ * If there are two virtual functions of the same device behind the same
+ * bridge port then 2 will be returned (one step down to the PCIe switch,
+ * then one step back to the same device).
+ *
+ * In the case where two devices are connected to the same PCIe switch, the
+ * value 4 will be returned. This corresponds to the following PCI tree:
+ *
+ *     -+  Root Port
+ *      \+ Switch Upstream Port
+ *       +-+ Switch Downstream Port
+ *       + \- Device A
+ *       \-+ Switch Downstream Port
+ *         \- Device B
+ *
+ * The distance is 4 because we traverse from Device A through the downstream
+ * port of the switch, to the common upstream port, back up to the second
+ * downstream port and then to Device B.
+ *
+ * Any two devices that don't have a common upstream bridge will return -1.
+ * In this way devices on separate PCIe root ports will be rejected, which
+ * is what we want for peer-to-peer, since each PCIe root port defines a
+ * separate hierarchy domain and there's no way to determine whether the root
+ * complex supports forwarding between them.
+ *
+ * In the case where two devices are connected to different PCIe switches,
+ * this function will still return a positive distance as long as both
+ * switches eventually have a common upstream bridge. Note this covers
+ * the case of using multiple PCIe switches to achieve a desired level of
+ * fan-out from a root port. The exact distance will be a function of the
+ * number of switches between Device A and Device B.
+ *
+ * If a bridge with any ACS redirection bits set is in the path,
+ * then this function will return -2. This is so we reject any
+ * cases where the TLPs are forwarded up into the root complex.
+ * In this case, a list of all infringing bridge addresses will be
+ * populated in acs_list (assuming it's non-null) for printk purposes.
+ */
+static int upstream_bridge_distance(struct pci_dev *a,
+				    struct pci_dev *b,
+				    struct seq_buf *acs_list)
+{
+	int dist_a = 0;
+	int dist_b = 0;
+	struct pci_dev *bb = NULL;
+	int acs_cnt = 0;
+
+	/*
+	 * Note, we don't need to take references to devices returned by
+	 * pci_upstream_bridge() because we hold a reference to a child
+	 * device which will already hold a reference to the upstream bridge.
+	 */
+
+	while (a) {
+		dist_b = 0;
+
+		if (pci_bridge_has_acs_redir(a)) {
+			seq_buf_print_bus_devfn(acs_list, a);
+			acs_cnt++;
+		}
+
+		bb = b;
+
+		while (bb) {
+			if (a == bb)
+				goto check_b_path_acs;
+
+			bb = pci_upstream_bridge(bb);
+			dist_b++;
+		}
+
+		a = pci_upstream_bridge(a);
+		dist_a++;
+	}
+
+	return -1;
+
+check_b_path_acs:
+	bb = b;
+
+	while (bb) {
+		if (a == bb)
+			break;
+
+		if (pci_bridge_has_acs_redir(bb)) {
+			seq_buf_print_bus_devfn(acs_list, bb);
+			acs_cnt++;
+		}
+
+		bb = pci_upstream_bridge(bb);
+	}
+
+	if (acs_cnt)
+		return -2;
+
+	return dist_a + dist_b;
+}
+
+static int upstream_bridge_distance_warn(struct pci_dev *provider,
+					 struct pci_dev *client)
+{
+	struct seq_buf acs_list;
+	int ret;
+
+	seq_buf_init(&acs_list, kmalloc(PAGE_SIZE, GFP_KERNEL), PAGE_SIZE);
+
+	ret = upstream_bridge_distance(provider, client, &acs_list);
+	if (ret == -2) {
+		pci_warn(client, "cannot be used for peer-to-peer DMA as ACS redirect is set between the client and provider\n");
+		/* Drop final semicolon */
+		acs_list.buffer[acs_list.len-1] = 0;
+		pci_warn(client, "to disable ACS redirect for this path, add the kernel parameter: pci=disable_acs_redir=%s\n",
+			 acs_list.buffer);
+
+	} else if (ret < 0) {
+		pci_warn(client, "cannot be used for peer-to-peer DMA as the client and provider do not share an upstream bridge\n");
+	}
+
+	kfree(acs_list.buffer);
+
+	return ret;
+}
+
+struct pci_p2pdma_client {
+	struct list_head list;
+	struct pci_dev *client;
+	struct pci_dev *provider;
+};
+
+/**
+ * pci_p2pdma_add_client - allocate a new element in a client device list
+ * @head: list head of p2pdma clients
+ * @dev: device to add to the list
+ *
+ * This adds @dev to a list of clients used by a p2pdma device.
+ * This list should be passed to pci_p2pmem_find(). Once pci_p2pmem_find() has
+ * been called successfully, the list will be bound to a specific p2pdma
+ * device and new clients can only be added to the list if they are
+ * supported by that p2pdma device.
+ *
+ * The caller is expected to have a lock which protects @head as necessary
+ * so that none of the pci_p2p functions can be called concurrently
+ * on that list.
+ *
+ * Returns 0 if the client was successfully added.
+ */
+int pci_p2pdma_add_client(struct list_head *head, struct device *dev)
+{
+	struct pci_p2pdma_client *item, *new_item;
+	struct pci_dev *provider = NULL;
+	struct pci_dev *client;
+	int ret;
+
+	if (IS_ENABLED(CONFIG_DMA_VIRT_OPS) && dev->dma_ops == &dma_virt_ops) {
+		dev_warn(dev, "cannot be used for peer-to-peer DMA because the driver makes use of dma_virt_ops\n");
+		return -ENODEV;
+	}
+
+	client = find_parent_pci_dev(dev);
+	if (!client) {
+		dev_warn(dev, "cannot be used for peer-to-peer DMA as it is not a PCI device\n");
+		return -ENODEV;
+	}
+
+	item = list_first_entry_or_null(head, struct pci_p2pdma_client, list);
+	if (item && item->provider) {
+		provider = item->provider;
+
+		ret = upstream_bridge_distance_warn(provider, client);
+		if (ret < 0) {
+			ret = -EXDEV;
+			goto put_client;
+		}
+	}
+
+	new_item = kzalloc(sizeof(*new_item), GFP_KERNEL);
+	if (!new_item) {
+		ret = -ENOMEM;
+		goto put_client;
+	}
+
+	new_item->client = client;
+	new_item->provider = pci_dev_get(provider);
+
+	list_add_tail(&new_item->list, head);
+
+	return 0;
+
+put_client:
+	pci_dev_put(client);
+	return ret;
+}
+EXPORT_SYMBOL_GPL(pci_p2pdma_add_client);
+
+static void pci_p2pdma_client_free(struct pci_p2pdma_client *item)
+{
+	list_del(&item->list);
+	pci_dev_put(item->client);
+	pci_dev_put(item->provider);
+	kfree(item);
+}
+
+/**
+ * pci_p2pdma_remove_client - remove and free a p2pdma client
+ * @head: list head of p2pdma clients
+ * @dev: device to remove from the list
+ *
+ * This removes @dev from a list of clients used by a p2pdma device.
+ * The caller is expected to have a lock which protects @head as necessary
+ * so that none of the pci_p2p functions can be called concurrently
+ * on that list.
+ */
+void pci_p2pdma_remove_client(struct list_head *head, struct device *dev)
+{
+	struct pci_p2pdma_client *pos, *tmp;
+	struct pci_dev *pdev;
+
+	pdev = find_parent_pci_dev(dev);
+	if (!pdev)
+		return;
+
+	list_for_each_entry_safe(pos, tmp, head, list) {
+		if (pos->client != pdev)
+			continue;
+
+		pci_p2pdma_client_free(pos);
+	}
+
+	pci_dev_put(pdev);
+}
+EXPORT_SYMBOL_GPL(pci_p2pdma_remove_client);
+
+/**
+ * pci_p2pdma_client_list_free - free an entire list of p2pdma clients
+ * @head: list head of p2pdma clients
+ *
+ * This removes all devices in a list of clients used by a p2pdma device.
+ * The caller is expected to have a lock which protects @head as necessary
+ * so that none of the pci_p2pdma functions can be called concurrently
+ * on that list.
+ */
+void pci_p2pdma_client_list_free(struct list_head *head)
+{
+	struct pci_p2pdma_client *pos, *tmp;
+
+	list_for_each_entry_safe(pos, tmp, head, list)
+		pci_p2pdma_client_free(pos);
+}
+EXPORT_SYMBOL_GPL(pci_p2pdma_client_list_free);
+
+/**
+ * pci_p2pdma_distance - Determine the cumulative distance between
+ *	a p2pdma provider and the clients in use.
+ * @provider: p2pdma provider to check against the client list
+ * @clients: list of devices to check (NULL-terminated)
+ * @verbose: if true, print warnings for devices when we return -1
+ *
+ * Returns -1 if any of the clients are not compatible (i.e. not behind
+ * the same root port as the provider); otherwise returns a positive
+ * number where a lower number is the preferable choice. (If one client
+ * is the same device as the provider, it will return 0, which is the
+ * best choice.)
+ *
+ * For now, "compatible" means the provider and the clients are all behind
+ * the same PCI root port. This cuts out cases that may work but is safest
+ * for the user. Future work can expand this to white-list root complexes that
+ * can safely forward between their ports.
+ */
+int pci_p2pdma_distance(struct pci_dev *provider, struct list_head *clients,
+			bool verbose)
+{
+	struct pci_p2pdma_client *pos;
+	int ret;
+	int distance = 0;
+	bool not_supported = false;
+
+	if (list_empty(clients))
+		return -1;
+
+	list_for_each_entry(pos, clients, list) {
+		if (verbose)
+			ret = upstream_bridge_distance_warn(provider,
+							    pos->client);
+		else
+			ret = upstream_bridge_distance(provider, pos->client,
+						       NULL);
+
+		if (ret < 0)
+			not_supported = true;
+
+		if (not_supported && !verbose)
+			break;
+
+		distance += ret;
+	}
+
+	if (not_supported)
+		return -1;
+
+	return distance;
+}
+EXPORT_SYMBOL_GPL(pci_p2pdma_distance);
+
+/**
+ * pci_p2pdma_assign_provider - Check compatibility (as per pci_p2pdma_distance)
+ *	and assign a provider to a list of clients
+ * @provider: p2pdma provider to assign to the client list
+ * @clients: list of devices to check (NULL-terminated)
+ *
+ * Returns false if any of the clients are not compatible, true if the
+ * provider was successfully assigned to the clients.
+ */
+bool pci_p2pdma_assign_provider(struct pci_dev *provider,
+				struct list_head *clients)
+{
+	struct pci_p2pdma_client *pos;
+
+	if (pci_p2pdma_distance(provider, clients, true) < 0)
+		return false;
+
+	list_for_each_entry(pos, clients, list)
+		pos->provider = provider;
+
+	return true;
+}
+EXPORT_SYMBOL_GPL(pci_p2pdma_assign_provider);
+
+/**
+ * pci_has_p2pmem - check if a given PCI device has published any p2pmem
+ * @pdev: PCI device to check
+ */
+bool pci_has_p2pmem(struct pci_dev *pdev)
+{
+	return pdev->p2pdma && pdev->p2pdma->p2pmem_published;
+}
+EXPORT_SYMBOL_GPL(pci_has_p2pmem);
+
+/**
+ * pci_p2pmem_find - find a peer-to-peer DMA memory device compatible with
+ *	the specified list of clients and shortest distance (as determined
+ *	by pci_p2pdma_distance())
+ * @clients: list of devices to check (NULL-terminated)
+ *
+ * If multiple devices are behind the same switch, the one "closest" to the
+ * client devices in use will be chosen first. (So if one of the providers
+ * is the same as one of the clients, that provider will be used ahead of any
+ * other providers that are unrelated). If multiple providers are an equal
+ * distance away, one will be chosen at random.
+ *
+ * Returns a pointer to the PCI device with a reference taken (use pci_dev_put
+ * to return the reference) or NULL if no compatible device is found. The
+ * found provider will also be assigned to the client list.
+ */
+struct pci_dev *pci_p2pmem_find(struct list_head *clients)
+{
+	struct pci_dev *pdev = NULL;
+	struct pci_p2pdma_client *pos;
+	int distance;
+	int closest_distance = INT_MAX;
+	struct pci_dev **closest_pdevs;
+	int dev_cnt = 0;
+	const int max_devs = PAGE_SIZE / sizeof(*closest_pdevs);
+	int i;
+
+	closest_pdevs = kmalloc(PAGE_SIZE, GFP_KERNEL);
+	if (!closest_pdevs)
+		return NULL;
+
+	while ((pdev = pci_get_device(PCI_ANY_ID, PCI_ANY_ID, pdev))) {
+		if (!pci_has_p2pmem(pdev))
+			continue;
+
+		distance = pci_p2pdma_distance(pdev, clients, false);
+		if (distance < 0 || distance > closest_distance)
+			continue;
+
+		if (distance == closest_distance && dev_cnt >= max_devs)
+			continue;
+
+		if (distance < closest_distance) {
+			for (i = 0; i < dev_cnt; i++)
+				pci_dev_put(closest_pdevs[i]);
+
+			dev_cnt = 0;
+			closest_distance = distance;
+		}
+
+		closest_pdevs[dev_cnt++] = pci_dev_get(pdev);
+	}
+
+	if (dev_cnt)
+		pdev = pci_dev_get(closest_pdevs[prandom_u32_max(dev_cnt)]);
+
+	for (i = 0; i < dev_cnt; i++)
+		pci_dev_put(closest_pdevs[i]);
+
+	if (pdev)
+		list_for_each_entry(pos, clients, list)
+			pos->provider = pdev;
+
+	kfree(closest_pdevs);
+	return pdev;
+}
+EXPORT_SYMBOL_GPL(pci_p2pmem_find);
+
+/**
+ * pci_alloc_p2pmem - allocate peer-to-peer DMA memory
+ * @pdev: the device to allocate memory from
+ * @size: number of bytes to allocate
+ *
+ * Returns the allocated memory or NULL on error.
+ */
+void *pci_alloc_p2pmem(struct pci_dev *pdev, size_t size)
+{
+	void *ret;
+
+	if (unlikely(!pdev->p2pdma))
+		return NULL;
+
+	if (unlikely(!percpu_ref_tryget_live(&pdev->p2pdma->devmap_ref)))
+		return NULL;
+
+	ret = (void *)gen_pool_alloc(pdev->p2pdma->pool, size);
+
+	if (unlikely(!ret))
+		percpu_ref_put(&pdev->p2pdma->devmap_ref);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(pci_alloc_p2pmem);
+
+/**
+ * pci_free_p2pmem - free peer-to-peer DMA memory
+ * @pdev: the device the memory was allocated from
+ * @addr: address of the memory that was allocated
+ * @size: number of bytes that was allocated
+ */
+void pci_free_p2pmem(struct pci_dev *pdev, void *addr, size_t size)
+{
+	gen_pool_free(pdev->p2pdma->pool, (uintptr_t)addr, size);
+	percpu_ref_put(&pdev->p2pdma->devmap_ref);
+}
+EXPORT_SYMBOL_GPL(pci_free_p2pmem);
+
+/**
+ * pci_p2pmem_virt_to_bus - return the PCI bus address for a given virtual
+ *	address obtained with pci_alloc_p2pmem()
+ * @pdev: the device the memory was allocated from
+ * @addr: address of the memory that was allocated
+ */
+pci_bus_addr_t pci_p2pmem_virt_to_bus(struct pci_dev *pdev, void *addr)
+{
+	if (!addr)
+		return 0;
+	if (!pdev->p2pdma)
+		return 0;
+
+	/*
+	 * Note: when we added the memory to the pool we used the PCI
+	 * bus address as the physical address. So gen_pool_virt_to_phys()
+	 * actually returns the bus address despite the misleading name.
+	 */
+	return gen_pool_virt_to_phys(pdev->p2pdma->pool, (unsigned long)addr);
+}
+EXPORT_SYMBOL_GPL(pci_p2pmem_virt_to_bus);
+
+/**
+ * pci_p2pmem_alloc_sgl - allocate peer-to-peer DMA memory in a scatterlist
+ * @pdev: the device to allocate memory from
+ * @nents: the number of SG entries in the list
+ * @length: number of bytes to allocate
+ *
+ * Returns the allocated scatterlist or NULL on error
+ */
+struct scatterlist *pci_p2pmem_alloc_sgl(struct pci_dev *pdev,
+					 unsigned int *nents, u32 length)
+{
+	struct scatterlist *sg;
+	void *addr;
+
+	sg = kzalloc(sizeof(*sg), GFP_KERNEL);
+	if (!sg)
+		return NULL;
+
+	sg_init_table(sg, 1);
+
+	addr = pci_alloc_p2pmem(pdev, length);
+	if (!addr)
+		goto out_free_sg;
+
+	sg_set_buf(sg, addr, length);
+	*nents = 1;
+	return sg;
+
+out_free_sg:
+	kfree(sg);
+	return NULL;
+}
+EXPORT_SYMBOL_GPL(pci_p2pmem_alloc_sgl);
+
+/**
+ * pci_p2pmem_free_sgl - free a scatterlist allocated by pci_p2pmem_alloc_sgl()
+ * @pdev: the device the memory was allocated from
+ * @sgl: the allocated scatterlist
+ */
+void pci_p2pmem_free_sgl(struct pci_dev *pdev, struct scatterlist *sgl)
+{
+	struct scatterlist *sg;
+	int count;
+
+	for_each_sg(sgl, sg, INT_MAX, count) {
+		if (!sg)
+			break;
+
+		pci_free_p2pmem(pdev, sg_virt(sg), sg->length);
+	}
+	kfree(sgl);
+}
+EXPORT_SYMBOL_GPL(pci_p2pmem_free_sgl);
+
+/**
+ * pci_p2pmem_publish - publish the peer-to-peer DMA memory for use by
+ *	other devices with pci_p2pmem_find()
+ * @pdev: the device with peer-to-peer DMA memory to publish
+ * @publish: set to true to publish the memory, false to unpublish it
+ *
+ * Published memory can be used by other PCI device drivers for
+ * peer-to-peer DMA operations. Non-published memory is reserved for
+ * exclusive use of the device driver that registers the peer-to-peer
+ * memory.
+ */
+void pci_p2pmem_publish(struct pci_dev *pdev, bool publish)
+{
+	if (!pdev->p2pdma)
+		return;
+
+	pdev->p2pdma->p2pmem_published = publish;
+}
+EXPORT_SYMBOL_GPL(pci_p2pmem_publish);
diff --git a/include/linux/memremap.h b/include/linux/memremap.h
index f91f9e763557..9553370ebdad 100644
--- a/include/linux/memremap.h
+++ b/include/linux/memremap.h
@@ -53,11 +53,16 @@ struct vmem_altmap {
  * wakeup event whenever a page is unpinned and becomes idle. This
  * wakeup is used to coordinate physical address space management (ex:
  * fs truncate/hole punch) vs pinned pages (ex: device dma).
+ *
+ * MEMORY_DEVICE_PCI_P2PDMA:
+ * Device memory residing in a PCI BAR intended for use with Peer-to-Peer
+ * transactions.
  */
 enum memory_type {
 	MEMORY_DEVICE_PRIVATE = 1,
 	MEMORY_DEVICE_PUBLIC,
 	MEMORY_DEVICE_FS_DAX,
+	MEMORY_DEVICE_PCI_P2PDMA,
 };
 
 /*
diff --git a/include/linux/mm.h b/include/linux/mm.h
index a61ebe8ad4ca..2055df412a77 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -890,6 +890,19 @@ static inline bool is_device_public_page(const struct page *page)
 		page->pgmap->type == MEMORY_DEVICE_PUBLIC;
 }
 
+#ifdef CONFIG_PCI_P2PDMA
+static inline bool is_pci_p2pdma_page(const struct page *page)
+{
+	return is_zone_device_page(page) &&
+		page->pgmap->type == MEMORY_DEVICE_PCI_P2PDMA;
+}
+#else /* CONFIG_PCI_P2PDMA */
+static inline bool is_pci_p2pdma_page(const struct page *page)
+{
+	return false;
+}
+#endif /* CONFIG_PCI_P2PDMA */
+
 #else /* CONFIG_DEV_PAGEMAP_OPS */
 static inline void dev_pagemap_get_ops(void)
 {
@@ -913,6 +926,11 @@ static inline bool is_device_public_page(const struct page *page)
 {
 	return false;
 }
+
+static inline bool is_pci_p2pdma_page(const struct page *page)
+{
+	return false;
+}
 #endif /* CONFIG_DEV_PAGEMAP_OPS */
 
 static inline void get_page(struct page *page)
diff --git a/include/linux/pci-p2pdma.h b/include/linux/pci-p2pdma.h
new file mode 100644
index 000000000000..7b2b0f547528
--- /dev/null
+++ b/include/linux/pci-p2pdma.h
@@ -0,0 +1,102 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * PCI Peer 2 Peer DMA support.
+ *
+ * Copyright (c) 2016-2018, Logan Gunthorpe
+ * Copyright (c) 2016-2017, Microsemi Corporation
+ * Copyright (c) 2017, Christoph Hellwig
+ * Copyright (c) 2018, Eideticom Inc.
+ *
+ */
+
+#ifndef _LINUX_PCI_P2PDMA_H
+#define _LINUX_PCI_P2PDMA_H
+
+#include <linux/pci.h>
+
+struct block_device;
+struct scatterlist;
+
+#ifdef CONFIG_PCI_P2PDMA
+int pci_p2pdma_add_resource(struct pci_dev *pdev, int bar, size_t size,
+		u64 offset);
+int pci_p2pdma_add_client(struct list_head *head, struct device *dev);
+void pci_p2pdma_remove_client(struct list_head *head, struct device *dev);
+void pci_p2pdma_client_list_free(struct list_head *head);
+int pci_p2pdma_distance(struct pci_dev *provider, struct list_head *clients,
+			bool verbose);
+bool pci_p2pdma_assign_provider(struct pci_dev *provider,
+				struct list_head *clients);
+bool pci_has_p2pmem(struct pci_dev *pdev);
+struct pci_dev *pci_p2pmem_find(struct list_head *clients);
+void *pci_alloc_p2pmem(struct pci_dev *pdev, size_t size);
+void pci_free_p2pmem(struct pci_dev *pdev, void *addr, size_t size);
+pci_bus_addr_t pci_p2pmem_virt_to_bus(struct pci_dev *pdev, void *addr);
+struct scatterlist *pci_p2pmem_alloc_sgl(struct pci_dev *pdev,
+					 unsigned int *nents, u32 length);
+void pci_p2pmem_free_sgl(struct pci_dev *pdev, struct scatterlist *sgl);
+void pci_p2pmem_publish(struct pci_dev *pdev, bool publish);
+#else /* CONFIG_PCI_P2PDMA */
+static inline int pci_p2pdma_add_resource(struct pci_dev *pdev, int bar,
+		size_t size, u64 offset)
+{
+	return -EOPNOTSUPP;
+}
+static inline int pci_p2pdma_add_client(struct list_head *head,
+		struct device *dev)
+{
+	return 0;
+}
+static inline void pci_p2pdma_remove_client(struct list_head *head,
+		struct device *dev)
+{
+}
+static inline void pci_p2pdma_client_list_free(struct list_head *head)
+{
+}
+static inline int pci_p2pdma_distance(struct pci_dev *provider,
+				      struct list_head *clients,
+				      bool verbose)
+{
+	return -1;
+}
+static inline bool pci_p2pdma_assign_provider(struct pci_dev *provider,
+					      struct list_head *clients)
+{
+	return false;
+}
+static inline bool pci_has_p2pmem(struct pci_dev *pdev)
+{
+	return false;
+}
+static inline struct pci_dev *pci_p2pmem_find(struct list_head *clients)
+{
+	return NULL;
+}
+static inline void *pci_alloc_p2pmem(struct pci_dev *pdev, size_t size)
+{
+	return NULL;
+}
+static inline void pci_free_p2pmem(struct pci_dev *pdev, void *addr,
+		size_t size)
+{
+}
+static inline pci_bus_addr_t pci_p2pmem_virt_to_bus(struct pci_dev *pdev,
+						    void *addr)
+{
+	return 0;
+}
+static inline struct scatterlist *pci_p2pmem_alloc_sgl(struct pci_dev *pdev,
+		unsigned int *nents, u32 length)
+{
+	return NULL;
+}
+static inline void pci_p2pmem_free_sgl(struct pci_dev *pdev,
+		struct scatterlist *sgl)
+{
+}
+static inline void pci_p2pmem_publish(struct pci_dev *pdev, bool publish)
+{
+}
+#endif /* CONFIG_PCI_P2PDMA */
+#endif /* _LINUX_PCI_P2PDMA_H */
diff --git a/include/linux/pci.h b/include/linux/pci.h
index e72ca8dd6241..5d95dbf21f4a 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -281,6 +281,7 @@ struct pcie_link_state;
 struct pci_vpd;
 struct pci_sriov;
 struct pci_ats;
+struct pci_p2pdma;
 
 /* The pci_dev structure describes PCI devices */
 struct pci_dev {
@@ -439,6 +440,9 @@ struct pci_dev {
 #ifdef CONFIG_PCI_PASID
 	u16		pasid_features;
 #endif
+#ifdef CONFIG_PCI_P2PDMA
+	struct pci_p2pdma *p2pdma;
+#endif
 	phys_addr_t	rom;		/* Physical address if not from BAR */
 	size_t		romlen;		/* Length if not from BAR */
 	char		*driver_override; /* Driver name to force a match */
-- 
2.11.0

_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

^ permalink raw reply related	[flat|nested] 265+ messages in thread

* [PATCH v5 01/13] PCI/P2PDMA: Support peer-to-peer memory
@ 2018-08-30 18:53   ` Logan Gunthorpe
  0 siblings, 0 replies; 265+ messages in thread
From: Logan Gunthorpe @ 2018-08-30 18:53 UTC (permalink / raw)
  To: linux-kernel, linux-pci, linux-nvme, linux-rdma, linux-nvdimm,
	linux-block
  Cc: Stephen Bates, Christoph Hellwig, Keith Busch, Sagi Grimberg,
	Bjorn Helgaas, Jason Gunthorpe, Max Gurtovoy, Dan Williams,
	Jérôme Glisse, Benjamin Herrenschmidt, Alex Williamson,
	Christian König, Logan Gunthorpe

Some PCI devices may have memory mapped in a BAR space that's
intended for use in peer-to-peer transactions. In order to enable
such transactions the memory must be registered with ZONE_DEVICE pages
so it can be used by DMA interfaces in existing drivers.

Add an interface for other subsystems to find and allocate chunks of P2P
memory as necessary to facilitate transfers between two PCI peers:

int pci_p2pdma_add_client();
struct pci_dev *pci_p2pmem_find();
void *pci_alloc_p2pmem();

The new interface requires a driver to collect a list of client devices
involved in the transaction with the pci_p2pdma_add_client*() functions
then call pci_p2pmem_find() to obtain any suitable P2P memory. Once
this is done the list is bound to the memory and the calling driver is
free to add and remove clients as necessary (adding incompatible clients
will fail). With a suitable p2pmem device, memory can then be
allocated with pci_alloc_p2pmem() for use in DMA transactions.

Depending on hardware, using peer-to-peer memory may reduce the bandwidth
of the transfer but can significantly reduce pressure on system memory.
This may be desirable in many cases: for example a system could be designed
with a small CPU connected to a PCIe switch by a small number of lanes
which would maximize the number of lanes available to connect to NVMe
devices.

The code is designed to only utilize the p2pmem device if all the devices
involved in a transfer are behind the same PCI bridge. This is because we
have no way of knowing whether peer-to-peer routing between PCIe Root Ports
is supported (PCIe r4.0, sec 1.3.1). Additionally, the benefits of P2P
transfers that go through the RC are limited to reducing DRAM usage
and, in some cases, coding convenience. The PCI-SIG may be exploring
adding a new capability bit to advertise whether this is possible for
future hardware.

This commit includes significant rework and feedback from Christoph
Hellwig.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
---
 drivers/pci/Kconfig        |  17 +
 drivers/pci/Makefile       |   1 +
 drivers/pci/p2pdma.c       | 761 +++++++++++++++++++++++++++++++++++++++++++++
 include/linux/memremap.h   |   5 +
 include/linux/mm.h         |  18 ++
 include/linux/pci-p2pdma.h | 102 ++++++
 include/linux/pci.h        |   4 +
 7 files changed, 908 insertions(+)
 create mode 100644 drivers/pci/p2pdma.c
 create mode 100644 include/linux/pci-p2pdma.h

diff --git a/drivers/pci/Kconfig b/drivers/pci/Kconfig
index 56ff8f6d31fc..deb68be4fdac 100644
--- a/drivers/pci/Kconfig
+++ b/drivers/pci/Kconfig
@@ -132,6 +132,23 @@ config PCI_PASID
 
 	  If unsure, say N.
 
+config PCI_P2PDMA
+	bool "PCI peer-to-peer transfer support"
+	depends on PCI && ZONE_DEVICE
+	select GENERIC_ALLOCATOR
+	help
	  Enables drivers to do PCI peer-to-peer transactions to and from
	  BARs that are exposed in other devices that are part of
	  the hierarchy where peer-to-peer DMA is guaranteed by the PCI
	  specification to work (i.e. anything below a single PCI bridge).

	  Many PCIe root complexes do not support P2P transactions and
	  it's hard to tell which support it at all, so at this time,
	  P2P DMA transactions must be between devices behind the same root
	  port.
+
+	  If unsure, say N.
+
 config PCI_LABEL
 	def_bool y if (DMI || ACPI)
 	depends on PCI
diff --git a/drivers/pci/Makefile b/drivers/pci/Makefile
index 1b2cfe51e8d7..85f4a703b2be 100644
--- a/drivers/pci/Makefile
+++ b/drivers/pci/Makefile
@@ -26,6 +26,7 @@ obj-$(CONFIG_PCI_SYSCALL)	+= syscall.o
 obj-$(CONFIG_PCI_STUB)		+= pci-stub.o
 obj-$(CONFIG_PCI_PF_STUB)	+= pci-pf-stub.o
 obj-$(CONFIG_PCI_ECAM)		+= ecam.o
+obj-$(CONFIG_PCI_P2PDMA)	+= p2pdma.o
 obj-$(CONFIG_XEN_PCIDEV_FRONTEND) += xen-pcifront.o
 
 # Endpoint library must be initialized before its users
diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c
new file mode 100644
index 000000000000..88aaec5351cd
--- /dev/null
+++ b/drivers/pci/p2pdma.c
@@ -0,0 +1,761 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * PCI Peer 2 Peer DMA support.
+ *
+ * Copyright (c) 2016-2018, Logan Gunthorpe
+ * Copyright (c) 2016-2017, Microsemi Corporation
+ * Copyright (c) 2017, Christoph Hellwig
+ * Copyright (c) 2018, Eideticom Inc.
+ */
+
+#define pr_fmt(fmt) "pci-p2pdma: " fmt
+#include <linux/pci-p2pdma.h>
+#include <linux/module.h>
+#include <linux/slab.h>
+#include <linux/genalloc.h>
+#include <linux/memremap.h>
+#include <linux/percpu-refcount.h>
+#include <linux/random.h>
+#include <linux/seq_buf.h>
+
+struct pci_p2pdma {
+	struct percpu_ref devmap_ref;
+	struct completion devmap_ref_done;
+	struct gen_pool *pool;
+	bool p2pmem_published;
+};
+
+static void pci_p2pdma_percpu_release(struct percpu_ref *ref)
+{
+	struct pci_p2pdma *p2p =
+		container_of(ref, struct pci_p2pdma, devmap_ref);
+
+	complete_all(&p2p->devmap_ref_done);
+}
+
+static void pci_p2pdma_percpu_kill(void *data)
+{
+	struct percpu_ref *ref = data;
+
+	if (percpu_ref_is_dying(ref))
+		return;
+
+	percpu_ref_kill(ref);
+}
+
+static void pci_p2pdma_release(void *data)
+{
+	struct pci_dev *pdev = data;
+
+	if (!pdev->p2pdma)
+		return;
+
+	wait_for_completion(&pdev->p2pdma->devmap_ref_done);
+	percpu_ref_exit(&pdev->p2pdma->devmap_ref);
+
+	gen_pool_destroy(pdev->p2pdma->pool);
+	pdev->p2pdma = NULL;
+}
+
+static int pci_p2pdma_setup(struct pci_dev *pdev)
+{
+	int error = -ENOMEM;
+	struct pci_p2pdma *p2p;
+
+	p2p = devm_kzalloc(&pdev->dev, sizeof(*p2p), GFP_KERNEL);
+	if (!p2p)
+		return -ENOMEM;
+
+	p2p->pool = gen_pool_create(PAGE_SHIFT, dev_to_node(&pdev->dev));
+	if (!p2p->pool)
+		goto out;
+
+	init_completion(&p2p->devmap_ref_done);
+	error = percpu_ref_init(&p2p->devmap_ref,
+			pci_p2pdma_percpu_release, 0, GFP_KERNEL);
+	if (error)
+		goto out_pool_destroy;
+
+	percpu_ref_switch_to_atomic_sync(&p2p->devmap_ref);
+
+	error = devm_add_action_or_reset(&pdev->dev, pci_p2pdma_release, pdev);
+	if (error)
+		goto out_pool_destroy;
+
+	pdev->p2pdma = p2p;
+
+	return 0;
+
+out_pool_destroy:
+	gen_pool_destroy(p2p->pool);
+out:
+	devm_kfree(&pdev->dev, p2p);
+	return error;
+}
+
+/**
+ * pci_p2pdma_add_resource - add memory for use as p2p memory
+ * @pdev: the device to add the memory to
+ * @bar: PCI BAR to add
+ * @size: size of the memory to add, may be zero to use the whole BAR
+ * @offset: offset into the PCI BAR
+ *
+ * The memory will be given ZONE_DEVICE struct pages so that it may
+ * be used with any DMA request.
+ */
+int pci_p2pdma_add_resource(struct pci_dev *pdev, int bar, size_t size,
+			    u64 offset)
+{
+	struct dev_pagemap *pgmap;
+	void *addr;
+	int error;
+
+	if (!(pci_resource_flags(pdev, bar) & IORESOURCE_MEM))
+		return -EINVAL;
+
+	if (offset >= pci_resource_len(pdev, bar))
+		return -EINVAL;
+
+	if (!size)
+		size = pci_resource_len(pdev, bar) - offset;
+
+	if (size + offset > pci_resource_len(pdev, bar))
+		return -EINVAL;
+
+	if (!pdev->p2pdma) {
+		error = pci_p2pdma_setup(pdev);
+		if (error)
+			return error;
+	}
+
+	pgmap = devm_kzalloc(&pdev->dev, sizeof(*pgmap), GFP_KERNEL);
+	if (!pgmap)
+		return -ENOMEM;
+
+	pgmap->res.start = pci_resource_start(pdev, bar) + offset;
+	pgmap->res.end = pgmap->res.start + size - 1;
+	pgmap->res.flags = pci_resource_flags(pdev, bar);
+	pgmap->ref = &pdev->p2pdma->devmap_ref;
+	pgmap->type = MEMORY_DEVICE_PCI_P2PDMA;
+
+	addr = devm_memremap_pages(&pdev->dev, pgmap);
+	if (IS_ERR(addr)) {
+		error = PTR_ERR(addr);
+		goto pgmap_free;
+	}
+
+	error = gen_pool_add_virt(pdev->p2pdma->pool, (unsigned long)addr,
+			pci_bus_address(pdev, bar) + offset,
+			resource_size(&pgmap->res), dev_to_node(&pdev->dev));
+	if (error)
+		goto pgmap_free;
+
+	error = devm_add_action_or_reset(&pdev->dev, pci_p2pdma_percpu_kill,
+					  &pdev->p2pdma->devmap_ref);
+	if (error)
+		goto pgmap_free;
+
+	pci_info(pdev, "added peer-to-peer DMA memory %pR\n",
+		 &pgmap->res);
+
+	return 0;
+
+pgmap_free:
+	devres_free(pgmap);
+	return error;
+}
+EXPORT_SYMBOL_GPL(pci_p2pdma_add_resource);
+
+static struct pci_dev *find_parent_pci_dev(struct device *dev)
+{
+	struct device *parent;
+
+	dev = get_device(dev);
+
+	while (dev) {
+		if (dev_is_pci(dev))
+			return to_pci_dev(dev);
+
+		parent = get_device(dev->parent);
+		put_device(dev);
+		dev = parent;
+	}
+
+	return NULL;
+}
+
+/*
+ * Check if a PCI bridge has its ACS redirection bits set to redirect P2P
+ * TLPs upstream via ACS. Returns 1 if the packets will be redirected
+ * upstream, 0 otherwise.
+ */
+static int pci_bridge_has_acs_redir(struct pci_dev *dev)
+{
+	int pos;
+	u16 ctrl;
+
+	pos = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_ACS);
+	if (!pos)
+		return 0;
+
+	pci_read_config_word(dev, pos + PCI_ACS_CTRL, &ctrl);
+
+	if (ctrl & (PCI_ACS_RR | PCI_ACS_CR | PCI_ACS_EC))
+		return 1;
+
+	return 0;
+}
+
+static void seq_buf_print_bus_devfn(struct seq_buf *buf, struct pci_dev *dev)
+{
+	if (!buf)
+		return;
+
+	seq_buf_printf(buf, "%04x:%02x:%02x.%x;", pci_domain_nr(dev->bus),
+		       dev->bus->number, PCI_SLOT(dev->devfn),
+		       PCI_FUNC(dev->devfn));
+}
+
+/*
+ * Find the distance through the nearest common upstream bridge between
+ * two PCI devices.
+ *
+ * If the two devices are the same device then 0 will be returned.
+ *
+ * If there are two virtual functions of the same device behind the same
+ * bridge port then 2 will be returned (one step down to the PCIe switch,
+ * then one step back to the same device).
+ *
+ * In the case where two devices are connected to the same PCIe switch, the
+ * value 4 will be returned. This corresponds to the following PCI tree:
+ *
+ *     -+  Root Port
+ *      \+ Switch Upstream Port
+ *       +-+ Switch Downstream Port
+ *       + \- Device A
+ *       \-+ Switch Downstream Port
+ *         \- Device B
+ *
+ * The distance is 4 because we traverse from Device A through the downstream
+ * port of the switch, to the common upstream port, back up to the second
+ * downstream port and then to Device B.
+ *
+ * Any two devices that don't have a common upstream bridge will return -1.
+ * In this way devices on separate PCIe root ports will be rejected, which
+ * is what we want for peer-to-peer since each PCIe root port defines a
+ * separate hierarchy domain and there's no way to determine whether the root
+ * complex supports forwarding between them.
+ *
+ * In the case where two devices are connected to different PCIe switches,
+ * this function will still return a positive distance as long as both
+ * switches eventually have a common upstream bridge. Note this covers
+ * the case of using multiple PCIe switches to achieve a desired level of
+ * fan-out from a root port. The exact distance will be a function of the
+ * number of switches between Device A and Device B.
+ *
+ * If a bridge which has any ACS redirection bits set is in the path
+ * then this function will return -2. This is so we reject any
+ * cases where the TLPs are forwarded up into the root complex.
+ * In this case, a list of all infringing bridge addresses will be
+ * populated in acs_list (assuming it's non-null) for printk purposes.
+ */
+static int upstream_bridge_distance(struct pci_dev *a,
+				    struct pci_dev *b,
+				    struct seq_buf *acs_list)
+{
+	int dist_a = 0;
+	int dist_b = 0;
+	struct pci_dev *bb = NULL;
+	int acs_cnt = 0;
+
+	/*
+	 * Note, we don't need to take references to devices returned by
+ * pci_upstream_bridge() since we hold a reference to a child
+	 * device which will already hold a reference to the upstream bridge.
+	 */
+
+	while (a) {
+		dist_b = 0;
+
+		if (pci_bridge_has_acs_redir(a)) {
+			seq_buf_print_bus_devfn(acs_list, a);
+			acs_cnt++;
+		}
+
+		bb = b;
+
+		while (bb) {
+			if (a == bb)
+				goto check_b_path_acs;
+
+			bb = pci_upstream_bridge(bb);
+			dist_b++;
+		}
+
+		a = pci_upstream_bridge(a);
+		dist_a++;
+	}
+
+	return -1;
+
+check_b_path_acs:
+	bb = b;
+
+	while (bb) {
+		if (a == bb)
+			break;
+
+		if (pci_bridge_has_acs_redir(bb)) {
+			seq_buf_print_bus_devfn(acs_list, bb);
+			acs_cnt++;
+		}
+
+		bb = pci_upstream_bridge(bb);
+	}
+
+	if (acs_cnt)
+		return -2;
+
+	return dist_a + dist_b;
+}
+
+static int upstream_bridge_distance_warn(struct pci_dev *provider,
+					 struct pci_dev *client)
+{
+	struct seq_buf acs_list;
+	int ret;
+
+	seq_buf_init(&acs_list, kmalloc(PAGE_SIZE, GFP_KERNEL), PAGE_SIZE);
+	if (!acs_list.buffer)
+		return -ENOMEM;
+
+	ret = upstream_bridge_distance(provider, client, &acs_list);
+	if (ret == -2) {
+		pci_warn(client, "cannot be used for peer-to-peer DMA as ACS redirect is set between the client and provider\n");
+		/* Drop final semicolon */
+		acs_list.buffer[acs_list.len-1] = 0;
+		pci_warn(client, "to disable ACS redirect for this path, add the kernel parameter: pci=disable_acs_redir=%s\n",
+			 acs_list.buffer);
+
+	} else if (ret < 0) {
+		pci_warn(client, "cannot be used for peer-to-peer DMA as the client and provider do not share an upstream bridge\n");
+	}
+
+	kfree(acs_list.buffer);
+
+	return ret;
+}
+
+struct pci_p2pdma_client {
+	struct list_head list;
+	struct pci_dev *client;
+	struct pci_dev *provider;
+};
+
+/**
+ * pci_p2pdma_add_client - allocate a new element in a client device list
+ * @head: list head of p2pdma clients
+ * @dev: device to add to the list
+ *
+ * This adds @dev to a list of clients used by a p2pdma device.
+ * This list should be passed to pci_p2pmem_find(). Once pci_p2pmem_find() has
+ * been called successfully, the list will be bound to a specific p2pdma
+ * device and new clients can only be added to the list if they are
+ * supported by that p2pdma device.
+ *
+ * The caller is expected to have a lock which protects @head as necessary
+ * so that none of the pci_p2p functions can be called concurrently
+ * on that list.
+ *
+ * Returns 0 if the client was successfully added.
+ */
+int pci_p2pdma_add_client(struct list_head *head, struct device *dev)
+{
+	struct pci_p2pdma_client *item, *new_item;
+	struct pci_dev *provider = NULL;
+	struct pci_dev *client;
+	int ret;
+
+	if (IS_ENABLED(CONFIG_DMA_VIRT_OPS) && dev->dma_ops == &dma_virt_ops) {
+		dev_warn(dev, "cannot be used for peer-to-peer DMA because the driver makes use of dma_virt_ops\n");
+		return -ENODEV;
+	}
+
+	client = find_parent_pci_dev(dev);
+	if (!client) {
+		dev_warn(dev, "cannot be used for peer-to-peer DMA as it is not a PCI device\n");
+		return -ENODEV;
+	}
+
+	item = list_first_entry_or_null(head, struct pci_p2pdma_client, list);
+	if (item && item->provider) {
+		provider = item->provider;
+
+		ret = upstream_bridge_distance_warn(provider, client);
+		if (ret < 0) {
+			ret = -EXDEV;
+			goto put_client;
+		}
+	}
+
+	new_item = kzalloc(sizeof(*new_item), GFP_KERNEL);
+	if (!new_item) {
+		ret = -ENOMEM;
+		goto put_client;
+	}
+
+	new_item->client = client;
+	new_item->provider = pci_dev_get(provider);
+
+	list_add_tail(&new_item->list, head);
+
+	return 0;
+
+put_client:
+	pci_dev_put(client);
+	return ret;
+}
+EXPORT_SYMBOL_GPL(pci_p2pdma_add_client);
+
+static void pci_p2pdma_client_free(struct pci_p2pdma_client *item)
+{
+	list_del(&item->list);
+	pci_dev_put(item->client);
+	pci_dev_put(item->provider);
+	kfree(item);
+}
+
+/**
+ * pci_p2pdma_remove_client - remove and free a p2pdma client
+ * @head: list head of p2pdma clients
+ * @dev: device to remove from the list
+ *
+ * This removes @dev from a list of clients used by a p2pdma device.
+ * The caller is expected to have a lock which protects @head as necessary
+ * so that none of the pci_p2p functions can be called concurrently
+ * on that list.
+ */
+void pci_p2pdma_remove_client(struct list_head *head, struct device *dev)
+{
+	struct pci_p2pdma_client *pos, *tmp;
+	struct pci_dev *pdev;
+
+	pdev = find_parent_pci_dev(dev);
+	if (!pdev)
+		return;
+
+	list_for_each_entry_safe(pos, tmp, head, list) {
+		if (pos->client != pdev)
+			continue;
+
+		pci_p2pdma_client_free(pos);
+	}
+
+	pci_dev_put(pdev);
+}
+EXPORT_SYMBOL_GPL(pci_p2pdma_remove_client);
+
+/**
+ * pci_p2pdma_client_list_free - free an entire list of p2pdma clients
+ * @head: list head of p2pdma clients
+ *
+ * This removes all devices in a list of clients used by a p2pdma device.
+ * The caller is expected to have a lock which protects @head as necessary
+ * so that none of the pci_p2pdma functions can be called concurrently
+ * on that list.
+ */
+void pci_p2pdma_client_list_free(struct list_head *head)
+{
+	struct pci_p2pdma_client *pos, *tmp;
+
+	list_for_each_entry_safe(pos, tmp, head, list)
+		pci_p2pdma_client_free(pos);
+}
+EXPORT_SYMBOL_GPL(pci_p2pdma_client_list_free);
+
+/**
+ * pci_p2pdma_distance - Determine the cumulative distance between
+ *	a p2pdma provider and the clients in use.
+ * @provider: p2pdma provider to check against the client list
+ * @clients: list of devices to check (NULL-terminated)
+ * @verbose: if true, print warnings for devices when we return -1
+ *
+ * Returns -1 if any of the clients are not compatible (i.e. not behind
+ * the same root port as the provider), otherwise returns a positive number
+ * where a lower number is the preferable choice. (If one of the clients
+ * is the same as the provider it will return 0, which is the best choice).
+ *
+ * For now, "compatible" means the provider and the clients are all behind
+ * the same PCI root port. This cuts out cases that may work but is the
+ * safest option for the user. Future work can expand this to whitelist
+ * root complexes that can safely forward between their ports.
+ */
+int pci_p2pdma_distance(struct pci_dev *provider, struct list_head *clients,
+			bool verbose)
+{
+	struct pci_p2pdma_client *pos;
+	int ret;
+	int distance = 0;
+	bool not_supported = false;
+
+	if (list_empty(clients))
+		return -1;
+
+	list_for_each_entry(pos, clients, list) {
+		if (verbose)
+			ret = upstream_bridge_distance_warn(provider,
+							    pos->client);
+		else
+			ret = upstream_bridge_distance(provider, pos->client,
+						       NULL);
+
+		if (ret < 0)
+			not_supported = true;
+
+		if (not_supported && !verbose)
+			break;
+
+		distance += ret;
+	}
+
+	if (not_supported)
+		return -1;
+
+	return distance;
+}
+EXPORT_SYMBOL_GPL(pci_p2pdma_distance);
+
+/**
+ * pci_p2pdma_assign_provider - Check compatibility (as per pci_p2pdma_distance)
+ *	and assign a provider to a list of clients
+ * @provider: p2pdma provider to assign to the client list
+ * @clients: list of devices to check (NULL-terminated)
+ *
+ * Returns false if any of the clients are not compatible, true if the
+ * provider was successfully assigned to the clients.
+ */
+bool pci_p2pdma_assign_provider(struct pci_dev *provider,
+				struct list_head *clients)
+{
+	struct pci_p2pdma_client *pos;
+
+	if (pci_p2pdma_distance(provider, clients, true) < 0)
+		return false;
+
+	list_for_each_entry(pos, clients, list)
+		pos->provider = provider;
+
+	return true;
+}
+EXPORT_SYMBOL_GPL(pci_p2pdma_assign_provider);
+
+/**
+ * pci_has_p2pmem - check if a given PCI device has published any p2pmem
+ * @pdev: PCI device to check
+ */
+bool pci_has_p2pmem(struct pci_dev *pdev)
+{
+	return pdev->p2pdma && pdev->p2pdma->p2pmem_published;
+}
+EXPORT_SYMBOL_GPL(pci_has_p2pmem);
+
+/**
+ * pci_p2pmem_find - find a peer-to-peer DMA memory device compatible with
+ *	the specified list of clients and shortest distance (as determined
+ *	by pci_p2pdma_distance())
+ * @clients: list of devices to check (NULL-terminated)
+ *
+ * If multiple devices are behind the same switch, the one "closest" to the
+ * client devices in use will be chosen first. (So if one of the providers is
+ * the same as one of the clients, that provider will be used ahead of any
+ * other providers that are unrelated). If multiple providers are an equal
+ * distance away, one will be chosen at random.
+ *
+ * Returns a pointer to the PCI device with a reference taken (use pci_dev_put
+ * to return the reference) or NULL if no compatible device is found. The
+ * found provider will also be assigned to the client list.
+ */
+struct pci_dev *pci_p2pmem_find(struct list_head *clients)
+{
+	struct pci_dev *pdev = NULL;
+	struct pci_p2pdma_client *pos;
+	int distance;
+	int closest_distance = INT_MAX;
+	struct pci_dev **closest_pdevs;
+	int dev_cnt = 0;
+	const int max_devs = PAGE_SIZE / sizeof(*closest_pdevs);
+	int i;
+
+	closest_pdevs = kmalloc(PAGE_SIZE, GFP_KERNEL);
+	if (!closest_pdevs)
+		return NULL;
+
+	while ((pdev = pci_get_device(PCI_ANY_ID, PCI_ANY_ID, pdev))) {
+		if (!pci_has_p2pmem(pdev))
+			continue;
+
+		distance = pci_p2pdma_distance(pdev, clients, false);
+		if (distance < 0 || distance > closest_distance)
+			continue;
+
+		if (distance == closest_distance && dev_cnt >= max_devs)
+			continue;
+
+		if (distance < closest_distance) {
+			for (i = 0; i < dev_cnt; i++)
+				pci_dev_put(closest_pdevs[i]);
+
+			dev_cnt = 0;
+			closest_distance = distance;
+		}
+
+		closest_pdevs[dev_cnt++] = pci_dev_get(pdev);
+	}
+
+	if (dev_cnt)
+		pdev = pci_dev_get(closest_pdevs[prandom_u32_max(dev_cnt)]);
+
+	for (i = 0; i < dev_cnt; i++)
+		pci_dev_put(closest_pdevs[i]);
+
+	if (pdev)
+		list_for_each_entry(pos, clients, list)
+			pos->provider = pdev;
+
+	kfree(closest_pdevs);
+	return pdev;
+}
+EXPORT_SYMBOL_GPL(pci_p2pmem_find);
+
+/**
+ * pci_alloc_p2pmem - allocate peer-to-peer DMA memory
+ * @pdev: the device to allocate memory from
+ * @size: number of bytes to allocate
+ *
+ * Returns the allocated memory or NULL on error.
+ */
+void *pci_alloc_p2pmem(struct pci_dev *pdev, size_t size)
+{
+	void *ret;
+
+	if (unlikely(!pdev->p2pdma))
+		return NULL;
+
+	if (unlikely(!percpu_ref_tryget_live(&pdev->p2pdma->devmap_ref)))
+		return NULL;
+
+	ret = (void *)gen_pool_alloc(pdev->p2pdma->pool, size);
+
+	if (unlikely(!ret))
+		percpu_ref_put(&pdev->p2pdma->devmap_ref);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(pci_alloc_p2pmem);
+
+/**
+ * pci_free_p2pmem - free peer-to-peer DMA memory
+ * @pdev: the device the memory was allocated from
+ * @addr: address of the memory that was allocated
+ * @size: number of bytes that was allocated
+ */
+void pci_free_p2pmem(struct pci_dev *pdev, void *addr, size_t size)
+{
+	gen_pool_free(pdev->p2pdma->pool, (uintptr_t)addr, size);
+	percpu_ref_put(&pdev->p2pdma->devmap_ref);
+}
+EXPORT_SYMBOL_GPL(pci_free_p2pmem);
+
+/**
+ * pci_p2pmem_virt_to_bus - return the PCI bus address for a given virtual
+ *	address obtained with pci_alloc_p2pmem()
+ * @pdev: the device the memory was allocated from
+ * @addr: address of the memory that was allocated
+ */
+pci_bus_addr_t pci_p2pmem_virt_to_bus(struct pci_dev *pdev, void *addr)
+{
+	if (!addr)
+		return 0;
+	if (!pdev->p2pdma)
+		return 0;
+
+	/*
+	 * Note: when we added the memory to the pool we used the PCI
+	 * bus address as the physical address. So gen_pool_virt_to_phys()
+	 * actually returns the bus address despite the misleading name.
+	 */
+	return gen_pool_virt_to_phys(pdev->p2pdma->pool, (unsigned long)addr);
+}
+EXPORT_SYMBOL_GPL(pci_p2pmem_virt_to_bus);
+
+/**
+ * pci_p2pmem_alloc_sgl - allocate peer-to-peer DMA memory in a scatterlist
+ * @pdev: the device to allocate memory from
+ * @nents: returns the number of SG entries in the allocated scatterlist
+ * @length: number of bytes to allocate
+ *
+ * Returns the allocated scatterlist or NULL on error.
+ */
+struct scatterlist *pci_p2pmem_alloc_sgl(struct pci_dev *pdev,
+					 unsigned int *nents, u32 length)
+{
+	struct scatterlist *sg;
+	void *addr;
+
+	sg = kzalloc(sizeof(*sg), GFP_KERNEL);
+	if (!sg)
+		return NULL;
+
+	sg_init_table(sg, 1);
+
+	addr = pci_alloc_p2pmem(pdev, length);
+	if (!addr)
+		goto out_free_sg;
+
+	sg_set_buf(sg, addr, length);
+	*nents = 1;
+	return sg;
+
+out_free_sg:
+	kfree(sg);
+	return NULL;
+}
+EXPORT_SYMBOL_GPL(pci_p2pmem_alloc_sgl);
+
+/**
+ * pci_p2pmem_free_sgl - free a scatterlist allocated by pci_p2pmem_alloc_sgl()
+ * @pdev: the device the memory was allocated from
+ * @sgl: the allocated scatterlist
+ */
+void pci_p2pmem_free_sgl(struct pci_dev *pdev, struct scatterlist *sgl)
+{
+	struct scatterlist *sg;
+	int count;
+
+	for_each_sg(sgl, sg, INT_MAX, count) {
+		if (!sg)
+			break;
+
+		pci_free_p2pmem(pdev, sg_virt(sg), sg->length);
+	}
+	kfree(sgl);
+}
+EXPORT_SYMBOL_GPL(pci_p2pmem_free_sgl);
+
+/**
+ * pci_p2pmem_publish - publish the peer-to-peer DMA memory for use by
+ *	other devices with pci_p2pmem_find()
+ * @pdev: the device with peer-to-peer DMA memory to publish
+ * @publish: set to true to publish the memory, false to unpublish it
+ *
+ * Published memory can be used by other PCI device drivers for
+ * peer-to-peer DMA operations. Non-published memory is reserved for
+ * the exclusive use of the device driver that registers the peer-to-peer
+ * memory.
+ */
+void pci_p2pmem_publish(struct pci_dev *pdev, bool publish)
+{
+	if (publish && !pdev->p2pdma)
+		return;
+
+	pdev->p2pdma->p2pmem_published = publish;
+}
+EXPORT_SYMBOL_GPL(pci_p2pmem_publish);
diff --git a/include/linux/memremap.h b/include/linux/memremap.h
index f91f9e763557..9553370ebdad 100644
--- a/include/linux/memremap.h
+++ b/include/linux/memremap.h
@@ -53,11 +53,16 @@ struct vmem_altmap {
  * wakeup event whenever a page is unpinned and becomes idle. This
  * wakeup is used to coordinate physical address space management (ex:
  * fs truncate/hole punch) vs pinned pages (ex: device dma).
+ *
+ * MEMORY_DEVICE_PCI_P2PDMA:
+ * Device memory residing in a PCI BAR intended for use with Peer-to-Peer
+ * transactions.
  */
 enum memory_type {
 	MEMORY_DEVICE_PRIVATE = 1,
 	MEMORY_DEVICE_PUBLIC,
 	MEMORY_DEVICE_FS_DAX,
+	MEMORY_DEVICE_PCI_P2PDMA,
 };
 
 /*
diff --git a/include/linux/mm.h b/include/linux/mm.h
index a61ebe8ad4ca..2055df412a77 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -890,6 +890,19 @@ static inline bool is_device_public_page(const struct page *page)
 		page->pgmap->type == MEMORY_DEVICE_PUBLIC;
 }
 
+#ifdef CONFIG_PCI_P2PDMA
+static inline bool is_pci_p2pdma_page(const struct page *page)
+{
+	return is_zone_device_page(page) &&
+		page->pgmap->type == MEMORY_DEVICE_PCI_P2PDMA;
+}
+#else /* CONFIG_PCI_P2PDMA */
+static inline bool is_pci_p2pdma_page(const struct page *page)
+{
+	return false;
+}
+#endif /* CONFIG_PCI_P2PDMA */
+
 #else /* CONFIG_DEV_PAGEMAP_OPS */
 static inline void dev_pagemap_get_ops(void)
 {
@@ -913,6 +926,11 @@ static inline bool is_device_public_page(const struct page *page)
 {
 	return false;
 }
+
+static inline bool is_pci_p2pdma_page(const struct page *page)
+{
+	return false;
+}
 #endif /* CONFIG_DEV_PAGEMAP_OPS */
 
 static inline void get_page(struct page *page)
diff --git a/include/linux/pci-p2pdma.h b/include/linux/pci-p2pdma.h
new file mode 100644
index 000000000000..7b2b0f547528
--- /dev/null
+++ b/include/linux/pci-p2pdma.h
@@ -0,0 +1,102 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * PCI Peer 2 Peer DMA support.
+ *
+ * Copyright (c) 2016-2018, Logan Gunthorpe
+ * Copyright (c) 2016-2017, Microsemi Corporation
+ * Copyright (c) 2017, Christoph Hellwig
+ * Copyright (c) 2018, Eideticom Inc.
+ *
+ */
+
+#ifndef _LINUX_PCI_P2PDMA_H
+#define _LINUX_PCI_P2PDMA_H
+
+#include <linux/pci.h>
+
+struct block_device;
+struct scatterlist;
+
+#ifdef CONFIG_PCI_P2PDMA
+int pci_p2pdma_add_resource(struct pci_dev *pdev, int bar, size_t size,
+		u64 offset);
+int pci_p2pdma_add_client(struct list_head *head, struct device *dev);
+void pci_p2pdma_remove_client(struct list_head *head, struct device *dev);
+void pci_p2pdma_client_list_free(struct list_head *head);
+int pci_p2pdma_distance(struct pci_dev *provider, struct list_head *clients,
+			bool verbose);
+bool pci_p2pdma_assign_provider(struct pci_dev *provider,
+				struct list_head *clients);
+bool pci_has_p2pmem(struct pci_dev *pdev);
+struct pci_dev *pci_p2pmem_find(struct list_head *clients);
+void *pci_alloc_p2pmem(struct pci_dev *pdev, size_t size);
+void pci_free_p2pmem(struct pci_dev *pdev, void *addr, size_t size);
+pci_bus_addr_t pci_p2pmem_virt_to_bus(struct pci_dev *pdev, void *addr);
+struct scatterlist *pci_p2pmem_alloc_sgl(struct pci_dev *pdev,
+					 unsigned int *nents, u32 length);
+void pci_p2pmem_free_sgl(struct pci_dev *pdev, struct scatterlist *sgl);
+void pci_p2pmem_publish(struct pci_dev *pdev, bool publish);
+#else /* CONFIG_PCI_P2PDMA */
+static inline int pci_p2pdma_add_resource(struct pci_dev *pdev, int bar,
+		size_t size, u64 offset)
+{
+	return -EOPNOTSUPP;
+}
+static inline int pci_p2pdma_add_client(struct list_head *head,
+		struct device *dev)
+{
+	return 0;
+}
+static inline void pci_p2pdma_remove_client(struct list_head *head,
+		struct device *dev)
+{
+}
+static inline void pci_p2pdma_client_list_free(struct list_head *head)
+{
+}
+static inline int pci_p2pdma_distance(struct pci_dev *provider,
+				      struct list_head *clients,
+				      bool verbose)
+{
+	return -1;
+}
+static inline bool pci_p2pdma_assign_provider(struct pci_dev *provider,
+					      struct list_head *clients)
+{
+	return false;
+}
+static inline bool pci_has_p2pmem(struct pci_dev *pdev)
+{
+	return false;
+}
+static inline struct pci_dev *pci_p2pmem_find(struct list_head *clients)
+{
+	return NULL;
+}
+static inline void *pci_alloc_p2pmem(struct pci_dev *pdev, size_t size)
+{
+	return NULL;
+}
+static inline void pci_free_p2pmem(struct pci_dev *pdev, void *addr,
+		size_t size)
+{
+}
+static inline pci_bus_addr_t pci_p2pmem_virt_to_bus(struct pci_dev *pdev,
+						    void *addr)
+{
+	return 0;
+}
+static inline struct scatterlist *pci_p2pmem_alloc_sgl(struct pci_dev *pdev,
+		unsigned int *nents, u32 length)
+{
+	return NULL;
+}
+static inline void pci_p2pmem_free_sgl(struct pci_dev *pdev,
+		struct scatterlist *sgl)
+{
+}
+static inline void pci_p2pmem_publish(struct pci_dev *pdev, bool publish)
+{
+}
+#endif /* CONFIG_PCI_P2PDMA */
+#endif /* _LINUX_PCI_P2PDMA_H */
diff --git a/include/linux/pci.h b/include/linux/pci.h
index e72ca8dd6241..5d95dbf21f4a 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -281,6 +281,7 @@ struct pcie_link_state;
 struct pci_vpd;
 struct pci_sriov;
 struct pci_ats;
+struct pci_p2pdma;
 
 /* The pci_dev structure describes PCI devices */
 struct pci_dev {
@@ -439,6 +440,9 @@ struct pci_dev {
 #ifdef CONFIG_PCI_PASID
 	u16		pasid_features;
 #endif
+#ifdef CONFIG_PCI_P2PDMA
+	struct pci_p2pdma *p2pdma;
+#endif
 	phys_addr_t	rom;		/* Physical address if not from BAR */
 	size_t		romlen;		/* Length if not from BAR */
 	char		*driver_override; /* Driver name to force a match */
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 265+ messages in thread

* [PATCH v5 01/13] PCI/P2PDMA: Support peer-to-peer memory
@ 2018-08-30 18:53   ` Logan Gunthorpe
  0 siblings, 0 replies; 265+ messages in thread
From: Logan Gunthorpe @ 2018-08-30 18:53 UTC (permalink / raw)
  To: linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-pci-u79uwXL29TY76Z2rM5mHXA,
	linux-nvme-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	linux-nvdimm-hn68Rpc1hR1g9hUCZPvPmw,
	linux-block-u79uwXL29TY76Z2rM5mHXA
  Cc: Christian König, Benjamin Herrenschmidt, Alex Williamson,
	Jérôme Glisse, Jason Gunthorpe, Bjorn Helgaas,
	Max Gurtovoy, Christoph Hellwig

Some PCI devices may have memory mapped in a BAR space that's
intended for use in peer-to-peer transactions. In order to enable
such transactions the memory must be registered with ZONE_DEVICE pages
so it can be used by DMA interfaces in existing drivers.

Add an interface for other subsystems to find and allocate chunks of P2P
memory as necessary to facilitate transfers between two PCI peers:

int pci_p2pdma_add_client();
struct pci_dev *pci_p2pmem_find();
void *pci_alloc_p2pmem();

The new interface requires a driver to collect a list of client devices
involved in the transaction with the pci_p2pdma_add_client() function,
then call pci_p2pmem_find() to obtain any suitable P2P memory. Once
this is done the list is bound to the memory and the calling driver is
free to add and remove clients as necessary (adding incompatible clients
will fail). With a suitable p2pmem device, memory can then be
allocated with pci_alloc_p2pmem() for use in DMA transactions.

Depending on hardware, using peer-to-peer memory may reduce the bandwidth
of the transfer but can significantly reduce pressure on system memory.
This may be desirable in many cases: for example a system could be designed
with a small CPU connected to a PCIe switch by a small number of lanes
which would maximize the number of lanes available to connect to NVMe
devices.

The code is designed to only utilize the p2pmem device if all the devices
involved in a transfer are behind the same PCI bridge. This is because we
have no way of knowing whether peer-to-peer routing between PCIe Root Ports
is supported (PCIe r4.0, sec 1.3.1). Additionally, the benefits of P2P
transfers that go through the RC are limited to reducing DRAM usage
and, in some cases, coding convenience. The PCI-SIG may be exploring
adding a new capability bit to advertise whether this is possible for
future hardware.

This commit includes significant rework and feedback from Christoph
Hellwig.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
---
 drivers/pci/Kconfig        |  17 +
 drivers/pci/Makefile       |   1 +
 drivers/pci/p2pdma.c       | 761 +++++++++++++++++++++++++++++++++++++++++++++
 include/linux/memremap.h   |   5 +
 include/linux/mm.h         |  18 ++
 include/linux/pci-p2pdma.h | 102 ++++++
 include/linux/pci.h        |   4 +
 7 files changed, 908 insertions(+)
 create mode 100644 drivers/pci/p2pdma.c
 create mode 100644 include/linux/pci-p2pdma.h

diff --git a/drivers/pci/Kconfig b/drivers/pci/Kconfig
index 56ff8f6d31fc..deb68be4fdac 100644
--- a/drivers/pci/Kconfig
+++ b/drivers/pci/Kconfig
@@ -132,6 +132,23 @@ config PCI_PASID
 
 	  If unsure, say N.
 
+config PCI_P2PDMA
+	bool "PCI peer-to-peer transfer support"
+	depends on PCI && ZONE_DEVICE
+	select GENERIC_ALLOCATOR
+	help
+	  Enables drivers to do PCI peer-to-peer transactions to and from
+	  BARs that are exposed in other devices that are part of
+	  the hierarchy where peer-to-peer DMA is guaranteed by the PCI
+	  specification to work (ie. anything below a single PCI bridge).
+
+	  Many PCIe root complexes do not support P2P transactions and
+	  it's hard to tell which ones support it at all, so at this time,
+	  P2P DMA transactions must be between devices behind the same root
+	  port.
+
+	  If unsure, say N.
+
 config PCI_LABEL
 	def_bool y if (DMI || ACPI)
 	depends on PCI
diff --git a/drivers/pci/Makefile b/drivers/pci/Makefile
index 1b2cfe51e8d7..85f4a703b2be 100644
--- a/drivers/pci/Makefile
+++ b/drivers/pci/Makefile
@@ -26,6 +26,7 @@ obj-$(CONFIG_PCI_SYSCALL)	+= syscall.o
 obj-$(CONFIG_PCI_STUB)		+= pci-stub.o
 obj-$(CONFIG_PCI_PF_STUB)	+= pci-pf-stub.o
 obj-$(CONFIG_PCI_ECAM)		+= ecam.o
+obj-$(CONFIG_PCI_P2PDMA)	+= p2pdma.o
 obj-$(CONFIG_XEN_PCIDEV_FRONTEND) += xen-pcifront.o
 
 # Endpoint library must be initialized before its users
diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c
new file mode 100644
index 000000000000..88aaec5351cd
--- /dev/null
+++ b/drivers/pci/p2pdma.c
@@ -0,0 +1,761 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * PCI Peer 2 Peer DMA support.
+ *
+ * Copyright (c) 2016-2018, Logan Gunthorpe
+ * Copyright (c) 2016-2017, Microsemi Corporation
+ * Copyright (c) 2017, Christoph Hellwig
+ * Copyright (c) 2018, Eideticom Inc.
+ */
+
+#define pr_fmt(fmt) "pci-p2pdma: " fmt
+#include <linux/pci-p2pdma.h>
+#include <linux/module.h>
+#include <linux/slab.h>
+#include <linux/genalloc.h>
+#include <linux/memremap.h>
+#include <linux/percpu-refcount.h>
+#include <linux/random.h>
+#include <linux/seq_buf.h>
+
+struct pci_p2pdma {
+	struct percpu_ref devmap_ref;
+	struct completion devmap_ref_done;
+	struct gen_pool *pool;
+	bool p2pmem_published;
+};
+
+static void pci_p2pdma_percpu_release(struct percpu_ref *ref)
+{
+	struct pci_p2pdma *p2p =
+		container_of(ref, struct pci_p2pdma, devmap_ref);
+
+	complete_all(&p2p->devmap_ref_done);
+}
+
+static void pci_p2pdma_percpu_kill(void *data)
+{
+	struct percpu_ref *ref = data;
+
+	if (percpu_ref_is_dying(ref))
+		return;
+
+	percpu_ref_kill(ref);
+}
+
+static void pci_p2pdma_release(void *data)
+{
+	struct pci_dev *pdev = data;
+
+	if (!pdev->p2pdma)
+		return;
+
+	wait_for_completion(&pdev->p2pdma->devmap_ref_done);
+	percpu_ref_exit(&pdev->p2pdma->devmap_ref);
+
+	gen_pool_destroy(pdev->p2pdma->pool);
+	pdev->p2pdma = NULL;
+}
+
+static int pci_p2pdma_setup(struct pci_dev *pdev)
+{
+	int error = -ENOMEM;
+	struct pci_p2pdma *p2p;
+
+	p2p = devm_kzalloc(&pdev->dev, sizeof(*p2p), GFP_KERNEL);
+	if (!p2p)
+		return -ENOMEM;
+
+	p2p->pool = gen_pool_create(PAGE_SHIFT, dev_to_node(&pdev->dev));
+	if (!p2p->pool)
+		goto out;
+
+	init_completion(&p2p->devmap_ref_done);
+	error = percpu_ref_init(&p2p->devmap_ref,
+			pci_p2pdma_percpu_release, 0, GFP_KERNEL);
+	if (error)
+		goto out_pool_destroy;
+
+	percpu_ref_switch_to_atomic_sync(&p2p->devmap_ref);
+
+	error = devm_add_action_or_reset(&pdev->dev, pci_p2pdma_release, pdev);
+	if (error)
+		goto out_pool_destroy;
+
+	pdev->p2pdma = p2p;
+
+	return 0;
+
+out_pool_destroy:
+	gen_pool_destroy(p2p->pool);
+out:
+	devm_kfree(&pdev->dev, p2p);
+	return error;
+}
+
+/**
+ * pci_p2pdma_add_resource - add memory for use as p2p memory
+ * @pdev: the device to add the memory to
+ * @bar: PCI BAR to add
+ * @size: size of the memory to add, may be zero to use the whole BAR
+ * @offset: offset into the PCI BAR
+ *
+ * The memory will be given ZONE_DEVICE struct pages so that it may
+ * be used with any DMA request.
+ */
+int pci_p2pdma_add_resource(struct pci_dev *pdev, int bar, size_t size,
+			    u64 offset)
+{
+	struct dev_pagemap *pgmap;
+	void *addr;
+	int error;
+
+	if (!(pci_resource_flags(pdev, bar) & IORESOURCE_MEM))
+		return -EINVAL;
+
+	if (offset >= pci_resource_len(pdev, bar))
+		return -EINVAL;
+
+	if (!size)
+		size = pci_resource_len(pdev, bar) - offset;
+
+	if (size + offset > pci_resource_len(pdev, bar))
+		return -EINVAL;
+
+	if (!pdev->p2pdma) {
+		error = pci_p2pdma_setup(pdev);
+		if (error)
+			return error;
+	}
+
+	pgmap = devm_kzalloc(&pdev->dev, sizeof(*pgmap), GFP_KERNEL);
+	if (!pgmap)
+		return -ENOMEM;
+
+	pgmap->res.start = pci_resource_start(pdev, bar) + offset;
+	pgmap->res.end = pgmap->res.start + size - 1;
+	pgmap->res.flags = pci_resource_flags(pdev, bar);
+	pgmap->ref = &pdev->p2pdma->devmap_ref;
+	pgmap->type = MEMORY_DEVICE_PCI_P2PDMA;
+
+	addr = devm_memremap_pages(&pdev->dev, pgmap);
+	if (IS_ERR(addr)) {
+		error = PTR_ERR(addr);
+		goto pgmap_free;
+	}
+
+	error = gen_pool_add_virt(pdev->p2pdma->pool, (unsigned long)addr,
+			pci_bus_address(pdev, bar) + offset,
+			resource_size(&pgmap->res), dev_to_node(&pdev->dev));
+	if (error)
+		goto pgmap_free;
+
+	error = devm_add_action_or_reset(&pdev->dev, pci_p2pdma_percpu_kill,
+					  &pdev->p2pdma->devmap_ref);
+	if (error)
+		goto pgmap_free;
+
+	pci_info(pdev, "added peer-to-peer DMA memory %pR\n",
+		 &pgmap->res);
+
+	return 0;
+
+pgmap_free:
+	devres_free(pgmap);
+	return error;
+}
+EXPORT_SYMBOL_GPL(pci_p2pdma_add_resource);
+
+static struct pci_dev *find_parent_pci_dev(struct device *dev)
+{
+	struct device *parent;
+
+	dev = get_device(dev);
+
+	while (dev) {
+		if (dev_is_pci(dev))
+			return to_pci_dev(dev);
+
+		parent = get_device(dev->parent);
+		put_device(dev);
+		dev = parent;
+	}
+
+	return NULL;
+}
+
+/*
+ * Check if a PCI bridge has its ACS redirection bits set to redirect P2P
+ * TLPs upstream via ACS. Returns 1 if the packets will be redirected
+ * upstream, 0 otherwise.
+ */
+static int pci_bridge_has_acs_redir(struct pci_dev *dev)
+{
+	int pos;
+	u16 ctrl;
+
+	pos = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_ACS);
+	if (!pos)
+		return 0;
+
+	pci_read_config_word(dev, pos + PCI_ACS_CTRL, &ctrl);
+
+	if (ctrl & (PCI_ACS_RR | PCI_ACS_CR | PCI_ACS_EC))
+		return 1;
+
+	return 0;
+}
+
+static void seq_buf_print_bus_devfn(struct seq_buf *buf, struct pci_dev *dev)
+{
+	if (!buf)
+		return;
+
+	seq_buf_printf(buf, "%04x:%02x:%02x.%x;", pci_domain_nr(dev->bus),
+		       dev->bus->number, PCI_SLOT(dev->devfn),
+		       PCI_FUNC(dev->devfn));
+}
+
+/*
+ * Find the distance through the nearest common upstream bridge between
+ * two PCI devices.
+ *
+ * If the two devices are the same device then 0 will be returned.
+ *
+ * If there are two virtual functions of the same device behind the same
+ * bridge port then 2 will be returned (one step down to the PCIe switch,
+ * then one step back to the same device).
+ *
+ * In the case where two devices are connected to the same PCIe switch, the
+ * value 4 will be returned. This corresponds to the following PCI tree:
+ *
+ *     -+  Root Port
+ *      \+ Switch Upstream Port
+ *       +-+ Switch Downstream Port
+ *       + \- Device A
+ *       \-+ Switch Downstream Port
+ *         \- Device B
+ *
+ * The distance is 4 because we traverse from Device A through the downstream
+ * port of the switch, to the common upstream port, back up to the second
+ * downstream port and then to Device B.
+ *
+ * Any two devices that don't have a common upstream bridge will return -1.
+ * In this way devices on separate PCIe root ports will be rejected, which
+ * is what we want for peer-to-peer seeing each PCIe root port defines a
+ * separate hierarchy domain and there's no way to determine whether the root
+ * complex supports forwarding between them.
+ *
+ * In the case where two devices are connected to different PCIe switches,
+ * this function will still return a positive distance as long as both
+ * switches eventually have a common upstream bridge. Note this covers
+ * the case of using multiple PCIe switches to achieve a desired level of
+ * fan-out from a root port. The exact distance will be a function of the
+ * number of switches between Device A and Device B.
+ *
+ * If a bridge which has any ACS redirection bits set is in the path
+ * then this function will return -2. This is so we reject any
+ * cases where the TLPs are forwarded up into the root complex.
+ * In this case, a list of all infringing bridge addresses will be
+ * populated in acs_list (assuming it's non-null) for printk purposes.
+ */
+static int upstream_bridge_distance(struct pci_dev *a,
+				    struct pci_dev *b,
+				    struct seq_buf *acs_list)
+{
+	int dist_a = 0;
+	int dist_b = 0;
+	struct pci_dev *bb = NULL;
+	int acs_cnt = 0;
+
+	/*
+	 * Note, we don't need to take references to devices returned by
+	 * pci_upstream_bridge() seeing we hold a reference to a child
+	 * device which will already hold a reference to the upstream bridge.
+	 */
+
+	while (a) {
+		dist_b = 0;
+
+		if (pci_bridge_has_acs_redir(a)) {
+			seq_buf_print_bus_devfn(acs_list, a);
+			acs_cnt++;
+		}
+
+		bb = b;
+
+		while (bb) {
+			if (a == bb)
+				goto check_b_path_acs;
+
+			bb = pci_upstream_bridge(bb);
+			dist_b++;
+		}
+
+		a = pci_upstream_bridge(a);
+		dist_a++;
+	}
+
+	return -1;
+
+check_b_path_acs:
+	bb = b;
+
+	while (bb) {
+		if (a == bb)
+			break;
+
+		if (pci_bridge_has_acs_redir(bb)) {
+			seq_buf_print_bus_devfn(acs_list, bb);
+			acs_cnt++;
+		}
+
+		bb = pci_upstream_bridge(bb);
+	}
+
+	if (acs_cnt)
+		return -2;
+
+	return dist_a + dist_b;
+}
+
+static int upstream_bridge_distance_warn(struct pci_dev *provider,
+					 struct pci_dev *client)
+{
+	struct seq_buf acs_list;
+	int ret;
+
+	seq_buf_init(&acs_list, kmalloc(PAGE_SIZE, GFP_KERNEL), PAGE_SIZE);
+
+	ret = upstream_bridge_distance(provider, client, &acs_list);
+	if (ret == -2) {
+		pci_warn(client, "cannot be used for peer-to-peer DMA as ACS redirect is set between the client and provider\n");
+		/* Drop final semicolon */
+		acs_list.buffer[acs_list.len-1] = 0;
+		pci_warn(client, "to disable ACS redirect for this path, add the kernel parameter: pci=disable_acs_redir=%s\n",
+			 acs_list.buffer);
+
+	} else if (ret < 0) {
+		pci_warn(client, "cannot be used for peer-to-peer DMA as the client and provider do not share an upstream bridge\n");
+	}
+
+	kfree(acs_list.buffer);
+
+	return ret;
+}
+
+struct pci_p2pdma_client {
+	struct list_head list;
+	struct pci_dev *client;
+	struct pci_dev *provider;
+};
+
+/**
+ * pci_p2pdma_add_client - allocate a new element in a client device list
+ * @head: list head of p2pdma clients
+ * @dev: device to add to the list
+ *
+ * This adds @dev to a list of clients used by a p2pdma device.
+ * This list should be passed to pci_p2pmem_find(). Once pci_p2pmem_find() has
+ * been called successfully, the list will be bound to a specific p2pdma
+ * device and new clients can only be added to the list if they are
+ * supported by that p2pdma device.
+ *
+ * The caller is expected to have a lock which protects @head as necessary
+ * so that none of the pci_p2p functions can be called concurrently
+ * on that list.
+ *
+ * Returns 0 if the client was successfully added.
+ */
+int pci_p2pdma_add_client(struct list_head *head, struct device *dev)
+{
+	struct pci_p2pdma_client *item, *new_item;
+	struct pci_dev *provider = NULL;
+	struct pci_dev *client;
+	int ret;
+
+	if (IS_ENABLED(CONFIG_DMA_VIRT_OPS) && dev->dma_ops == &dma_virt_ops) {
+		dev_warn(dev, "cannot be used for peer-to-peer DMA because the driver makes use of dma_virt_ops\n");
+		return -ENODEV;
+	}
+
+	client = find_parent_pci_dev(dev);
+	if (!client) {
+		dev_warn(dev, "cannot be used for peer-to-peer DMA as it is not a PCI device\n");
+		return -ENODEV;
+	}
+
+	item = list_first_entry_or_null(head, struct pci_p2pdma_client, list);
+	if (item && item->provider) {
+		provider = item->provider;
+
+		ret = upstream_bridge_distance_warn(provider, client);
+		if (ret < 0) {
+			ret = -EXDEV;
+			goto put_client;
+		}
+	}
+
+	new_item = kzalloc(sizeof(*new_item), GFP_KERNEL);
+	if (!new_item) {
+		ret = -ENOMEM;
+		goto put_client;
+	}
+
+	new_item->client = client;
+	new_item->provider = pci_dev_get(provider);
+
+	list_add_tail(&new_item->list, head);
+
+	return 0;
+
+put_client:
+	pci_dev_put(client);
+	return ret;
+}
+EXPORT_SYMBOL_GPL(pci_p2pdma_add_client);
+
+static void pci_p2pdma_client_free(struct pci_p2pdma_client *item)
+{
+	list_del(&item->list);
+	pci_dev_put(item->client);
+	pci_dev_put(item->provider);
+	kfree(item);
+}
+
+/**
+ * pci_p2pdma_remove_client - remove and free a p2pdma client
+ * @head: list head of p2pdma clients
+ * @dev: device to remove from the list
+ *
+ * This removes @dev from a list of clients used by a p2pdma device.
+ * The caller is expected to have a lock which protects @head as necessary
+ * so that none of the pci_p2p functions can be called concurrently
+ * on that list.
+ */
+void pci_p2pdma_remove_client(struct list_head *head, struct device *dev)
+{
+	struct pci_p2pdma_client *pos, *tmp;
+	struct pci_dev *pdev;
+
+	pdev = find_parent_pci_dev(dev);
+	if (!pdev)
+		return;
+
+	list_for_each_entry_safe(pos, tmp, head, list) {
+		if (pos->client != pdev)
+			continue;
+
+		pci_p2pdma_client_free(pos);
+	}
+
+	pci_dev_put(pdev);
+}
+EXPORT_SYMBOL_GPL(pci_p2pdma_remove_client);
+
+/**
+ * pci_p2pdma_client_list_free - free an entire list of p2pdma clients
+ * @head: list head of p2pdma clients
+ *
+ * This removes all devices in a list of clients used by a p2pdma device.
+ * The caller is expected to have a lock which protects @head as necessary
+ * so that none of the pci_p2pdma functions can be called concurrently
+ * on that list.
+ */
+void pci_p2pdma_client_list_free(struct list_head *head)
+{
+	struct pci_p2pdma_client *pos, *tmp;
+
+	list_for_each_entry_safe(pos, tmp, head, list)
+		pci_p2pdma_client_free(pos);
+}
+EXPORT_SYMBOL_GPL(pci_p2pdma_client_list_free);
+
+/**
+ * pci_p2pdma_distance - Determine the cumulative distance between
+ *	a p2pdma provider and the clients in use.
+ * @provider: p2pdma provider to check against the client list
+ * @clients: list of devices to check (NULL-terminated)
+ * @verbose: if true, print warnings for devices when we return -1
+ *
+ * Returns -1 if any of the clients are not compatible (i.e. not behind
+ * the same root port as the provider), otherwise returns a positive
+ * number where a lower number is the preferable choice. (If there's one
+ * client that's the same as the provider it will return 0, which is the
+ * best choice).
+ *
+ * For now, "compatible" means the provider and the clients are all behind
+ * the same PCI root port. This cuts out cases that may work but is the
+ * safest option for the user. Future work can expand this to white-list
+ * root complexes that can safely forward between their ports.
+ */
+int pci_p2pdma_distance(struct pci_dev *provider, struct list_head *clients,
+			bool verbose)
+{
+	struct pci_p2pdma_client *pos;
+	int ret;
+	int distance = 0;
+	bool not_supported = false;
+
+	if (list_empty(clients))
+		return -1;
+
+	list_for_each_entry(pos, clients, list) {
+		if (verbose)
+			ret = upstream_bridge_distance_warn(provider,
+							    pos->client);
+		else
+			ret = upstream_bridge_distance(provider, pos->client,
+						       NULL);
+
+		if (ret < 0)
+			not_supported = true;
+
+		if (not_supported && !verbose)
+			break;
+
+		distance += ret;
+	}
+
+	if (not_supported)
+		return -1;
+
+	return distance;
+}
+EXPORT_SYMBOL_GPL(pci_p2pdma_distance);
+
+/**
+ * pci_p2pdma_assign_provider - Check compatibility (as per pci_p2pdma_distance)
+ *	and assign a provider to a list of clients
+ * @provider: p2pdma provider to assign to the client list
+ * @clients: list of devices to check (NULL-terminated)
+ *
+ * Returns false if any of the clients are not compatible, true if the
+ * provider was successfully assigned to the clients.
+ */
+bool pci_p2pdma_assign_provider(struct pci_dev *provider,
+				struct list_head *clients)
+{
+	struct pci_p2pdma_client *pos;
+
+	if (pci_p2pdma_distance(provider, clients, true) < 0)
+		return false;
+
+	list_for_each_entry(pos, clients, list)
+		pos->provider = provider;
+
+	return true;
+}
+EXPORT_SYMBOL_GPL(pci_p2pdma_assign_provider);
+
+/**
+ * pci_has_p2pmem - check if a given PCI device has published any p2pmem
+ * @pdev: PCI device to check
+ */
+bool pci_has_p2pmem(struct pci_dev *pdev)
+{
+	return pdev->p2pdma && pdev->p2pdma->p2pmem_published;
+}
+EXPORT_SYMBOL_GPL(pci_has_p2pmem);
+
+/**
+ * pci_p2pmem_find - find a peer-to-peer DMA memory device compatible with
+ *	the specified list of clients and shortest distance (as determined
+ *	by pci_p2pdma_distance())
+ * @clients: list of devices to check (NULL-terminated)
+ *
+ * If multiple devices are behind the same switch, the one "closest" to the
+ * client devices in use will be chosen first. (So if one of the providers
+ * is the same as one of the clients, that provider will be used ahead of any
+ * other providers that are unrelated). If multiple providers are an equal
+ * distance away, one will be chosen at random.
+ *
+ * Returns a pointer to the PCI device with a reference taken (use pci_dev_put
+ * to return the reference) or NULL if no compatible device is found. The
+ * found provider will also be assigned to the client list.
+ */
+struct pci_dev *pci_p2pmem_find(struct list_head *clients)
+{
+	struct pci_dev *pdev = NULL;
+	struct pci_p2pdma_client *pos;
+	int distance;
+	int closest_distance = INT_MAX;
+	struct pci_dev **closest_pdevs;
+	int dev_cnt = 0;
+	const int max_devs = PAGE_SIZE / sizeof(*closest_pdevs);
+	int i;
+
+	closest_pdevs = kmalloc(PAGE_SIZE, GFP_KERNEL);
+	if (!closest_pdevs)
+		return NULL;
+
+	while ((pdev = pci_get_device(PCI_ANY_ID, PCI_ANY_ID, pdev))) {
+		if (!pci_has_p2pmem(pdev))
+			continue;
+
+		distance = pci_p2pdma_distance(pdev, clients, false);
+		if (distance < 0 || distance > closest_distance)
+			continue;
+
+		if (distance == closest_distance && dev_cnt >= max_devs)
+			continue;
+
+		if (distance < closest_distance) {
+			for (i = 0; i < dev_cnt; i++)
+				pci_dev_put(closest_pdevs[i]);
+
+			dev_cnt = 0;
+			closest_distance = distance;
+		}
+
+		closest_pdevs[dev_cnt++] = pci_dev_get(pdev);
+	}
+
+	if (dev_cnt)
+		pdev = pci_dev_get(closest_pdevs[prandom_u32_max(dev_cnt)]);
+
+	for (i = 0; i < dev_cnt; i++)
+		pci_dev_put(closest_pdevs[i]);
+
+	if (pdev)
+		list_for_each_entry(pos, clients, list)
+			pos->provider = pdev;
+
+	kfree(closest_pdevs);
+	return pdev;
+}
+EXPORT_SYMBOL_GPL(pci_p2pmem_find);
+
+/**
+ * pci_alloc_p2pmem - allocate peer-to-peer DMA memory
+ * @pdev: the device to allocate memory from
+ * @size: number of bytes to allocate
+ *
+ * Returns the allocated memory or NULL on error.
+ */
+void *pci_alloc_p2pmem(struct pci_dev *pdev, size_t size)
+{
+	void *ret;
+
+	if (unlikely(!pdev->p2pdma))
+		return NULL;
+
+	if (unlikely(!percpu_ref_tryget_live(&pdev->p2pdma->devmap_ref)))
+		return NULL;
+
+	ret = (void *)gen_pool_alloc(pdev->p2pdma->pool, size);
+
+	if (unlikely(!ret))
+		percpu_ref_put(&pdev->p2pdma->devmap_ref);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(pci_alloc_p2pmem);
+
+/**
+ * pci_free_p2pmem - free peer-to-peer DMA memory
+ * @pdev: the device the memory was allocated from
+ * @addr: address of the memory that was allocated
+ * @size: number of bytes that was allocated
+ */
+void pci_free_p2pmem(struct pci_dev *pdev, void *addr, size_t size)
+{
+	gen_pool_free(pdev->p2pdma->pool, (uintptr_t)addr, size);
+	percpu_ref_put(&pdev->p2pdma->devmap_ref);
+}
+EXPORT_SYMBOL_GPL(pci_free_p2pmem);
+
+/**
+ * pci_p2pmem_virt_to_bus - return the PCI bus address for a given virtual
+ *	address obtained with pci_alloc_p2pmem()
+ * @pdev: the device the memory was allocated from
+ * @addr: address of the memory that was allocated
+ */
+pci_bus_addr_t pci_p2pmem_virt_to_bus(struct pci_dev *pdev, void *addr)
+{
+	if (!addr)
+		return 0;
+	if (!pdev->p2pdma)
+		return 0;
+
+	/*
+	 * Note: when we added the memory to the pool we used the PCI
+	 * bus address as the physical address. So gen_pool_virt_to_phys()
+	 * actually returns the bus address despite the misleading name.
+	 */
+	return gen_pool_virt_to_phys(pdev->p2pdma->pool, (unsigned long)addr);
+}
+EXPORT_SYMBOL_GPL(pci_p2pmem_virt_to_bus);
+
+/**
+ * pci_p2pmem_alloc_sgl - allocate peer-to-peer DMA memory in a scatterlist
+ * @pdev: the device to allocate memory from
+ * @nents: returns the number of SG entries in the list
+ * @length: number of bytes to allocate
+ *
+ * Returns the allocated scatterlist or NULL on error.
+ */
+struct scatterlist *pci_p2pmem_alloc_sgl(struct pci_dev *pdev,
+					 unsigned int *nents, u32 length)
+{
+	struct scatterlist *sg;
+	void *addr;
+
+	sg = kzalloc(sizeof(*sg), GFP_KERNEL);
+	if (!sg)
+		return NULL;
+
+	sg_init_table(sg, 1);
+
+	addr = pci_alloc_p2pmem(pdev, length);
+	if (!addr)
+		goto out_free_sg;
+
+	sg_set_buf(sg, addr, length);
+	*nents = 1;
+	return sg;
+
+out_free_sg:
+	kfree(sg);
+	return NULL;
+}
+EXPORT_SYMBOL_GPL(pci_p2pmem_alloc_sgl);
+
+/**
+ * pci_p2pmem_free_sgl - free a scatterlist allocated by pci_p2pmem_alloc_sgl()
+ * @pdev: the device the memory was allocated from
+ * @sgl: the allocated scatterlist
+ */
+void pci_p2pmem_free_sgl(struct pci_dev *pdev, struct scatterlist *sgl)
+{
+	struct scatterlist *sg;
+	int count;
+
+	for_each_sg(sgl, sg, INT_MAX, count) {
+		if (!sg)
+			break;
+
+		pci_free_p2pmem(pdev, sg_virt(sg), sg->length);
+	}
+	kfree(sgl);
+}
+EXPORT_SYMBOL_GPL(pci_p2pmem_free_sgl);
+
+/**
+ * pci_p2pmem_publish - publish the peer-to-peer DMA memory for use by
+ *	other devices with pci_p2pmem_find()
+ * @pdev: the device with peer-to-peer DMA memory to publish
+ * @publish: set to true to publish the memory, false to unpublish it
+ *
+ * Published memory can be used by other PCI device drivers for
+ * peer-to-peer DMA operations. Non-published memory is reserved for the
+ * exclusive use of the device driver that registers the peer-to-peer
+ * memory.
+ */
+void pci_p2pmem_publish(struct pci_dev *pdev, bool publish)
+{
+	if (!pdev->p2pdma)
+		return;
+
+	pdev->p2pdma->p2pmem_published = publish;
+}
+EXPORT_SYMBOL_GPL(pci_p2pmem_publish);
diff --git a/include/linux/memremap.h b/include/linux/memremap.h
index f91f9e763557..9553370ebdad 100644
--- a/include/linux/memremap.h
+++ b/include/linux/memremap.h
@@ -53,11 +53,16 @@ struct vmem_altmap {
  * wakeup event whenever a page is unpinned and becomes idle. This
  * wakeup is used to coordinate physical address space management (ex:
  * fs truncate/hole punch) vs pinned pages (ex: device dma).
+ *
+ * MEMORY_DEVICE_PCI_P2PDMA:
+ * Device memory residing in a PCI BAR intended for use with Peer-to-Peer
+ * transactions.
  */
 enum memory_type {
 	MEMORY_DEVICE_PRIVATE = 1,
 	MEMORY_DEVICE_PUBLIC,
 	MEMORY_DEVICE_FS_DAX,
+	MEMORY_DEVICE_PCI_P2PDMA,
 };
 
 /*
diff --git a/include/linux/mm.h b/include/linux/mm.h
index a61ebe8ad4ca..2055df412a77 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -890,6 +890,19 @@ static inline bool is_device_public_page(const struct page *page)
 		page->pgmap->type == MEMORY_DEVICE_PUBLIC;
 }
 
+#ifdef CONFIG_PCI_P2PDMA
+static inline bool is_pci_p2pdma_page(const struct page *page)
+{
+	return is_zone_device_page(page) &&
+		page->pgmap->type == MEMORY_DEVICE_PCI_P2PDMA;
+}
+#else /* CONFIG_PCI_P2PDMA */
+static inline bool is_pci_p2pdma_page(const struct page *page)
+{
+	return false;
+}
+#endif /* CONFIG_PCI_P2PDMA */
+
 #else /* CONFIG_DEV_PAGEMAP_OPS */
 static inline void dev_pagemap_get_ops(void)
 {
@@ -913,6 +926,11 @@ static inline bool is_device_public_page(const struct page *page)
 {
 	return false;
 }
+
+static inline bool is_pci_p2pdma_page(const struct page *page)
+{
+	return false;
+}
 #endif /* CONFIG_DEV_PAGEMAP_OPS */
 
 static inline void get_page(struct page *page)
diff --git a/include/linux/pci-p2pdma.h b/include/linux/pci-p2pdma.h
new file mode 100644
index 000000000000..7b2b0f547528
--- /dev/null
+++ b/include/linux/pci-p2pdma.h
@@ -0,0 +1,102 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * PCI Peer 2 Peer DMA support.
+ *
+ * Copyright (c) 2016-2018, Logan Gunthorpe
+ * Copyright (c) 2016-2017, Microsemi Corporation
+ * Copyright (c) 2017, Christoph Hellwig
+ * Copyright (c) 2018, Eideticom Inc.
+ *
+ */
+
+#ifndef _LINUX_PCI_P2PDMA_H
+#define _LINUX_PCI_P2PDMA_H
+
+#include <linux/pci.h>
+
+struct block_device;
+struct scatterlist;
+
+#ifdef CONFIG_PCI_P2PDMA
+int pci_p2pdma_add_resource(struct pci_dev *pdev, int bar, size_t size,
+		u64 offset);
+int pci_p2pdma_add_client(struct list_head *head, struct device *dev);
+void pci_p2pdma_remove_client(struct list_head *head, struct device *dev);
+void pci_p2pdma_client_list_free(struct list_head *head);
+int pci_p2pdma_distance(struct pci_dev *provider, struct list_head *clients,
+			bool verbose);
+bool pci_p2pdma_assign_provider(struct pci_dev *provider,
+				struct list_head *clients);
+bool pci_has_p2pmem(struct pci_dev *pdev);
+struct pci_dev *pci_p2pmem_find(struct list_head *clients);
+void *pci_alloc_p2pmem(struct pci_dev *pdev, size_t size);
+void pci_free_p2pmem(struct pci_dev *pdev, void *addr, size_t size);
+pci_bus_addr_t pci_p2pmem_virt_to_bus(struct pci_dev *pdev, void *addr);
+struct scatterlist *pci_p2pmem_alloc_sgl(struct pci_dev *pdev,
+					 unsigned int *nents, u32 length);
+void pci_p2pmem_free_sgl(struct pci_dev *pdev, struct scatterlist *sgl);
+void pci_p2pmem_publish(struct pci_dev *pdev, bool publish);
+#else /* CONFIG_PCI_P2PDMA */
+static inline int pci_p2pdma_add_resource(struct pci_dev *pdev, int bar,
+		size_t size, u64 offset)
+{
+	return -EOPNOTSUPP;
+}
+static inline int pci_p2pdma_add_client(struct list_head *head,
+		struct device *dev)
+{
+	return 0;
+}
+static inline void pci_p2pdma_remove_client(struct list_head *head,
+		struct device *dev)
+{
+}
+static inline void pci_p2pdma_client_list_free(struct list_head *head)
+{
+}
+static inline int pci_p2pdma_distance(struct pci_dev *provider,
+				      struct list_head *clients,
+				      bool verbose)
+{
+	return -1;
+}
+static inline bool pci_p2pdma_assign_provider(struct pci_dev *provider,
+					      struct list_head *clients)
+{
+	return false;
+}
+static inline bool pci_has_p2pmem(struct pci_dev *pdev)
+{
+	return false;
+}
+static inline struct pci_dev *pci_p2pmem_find(struct list_head *clients)
+{
+	return NULL;
+}
+static inline void *pci_alloc_p2pmem(struct pci_dev *pdev, size_t size)
+{
+	return NULL;
+}
+static inline void pci_free_p2pmem(struct pci_dev *pdev, void *addr,
+		size_t size)
+{
+}
+static inline pci_bus_addr_t pci_p2pmem_virt_to_bus(struct pci_dev *pdev,
+						    void *addr)
+{
+	return 0;
+}
+static inline struct scatterlist *pci_p2pmem_alloc_sgl(struct pci_dev *pdev,
+		unsigned int *nents, u32 length)
+{
+	return NULL;
+}
+static inline void pci_p2pmem_free_sgl(struct pci_dev *pdev,
+		struct scatterlist *sgl)
+{
+}
+static inline void pci_p2pmem_publish(struct pci_dev *pdev, bool publish)
+{
+}
+#endif /* CONFIG_PCI_P2PDMA */
+#endif /* _LINUX_PCI_P2PDMA_H */
diff --git a/include/linux/pci.h b/include/linux/pci.h
index e72ca8dd6241..5d95dbf21f4a 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -281,6 +281,7 @@ struct pcie_link_state;
 struct pci_vpd;
 struct pci_sriov;
 struct pci_ats;
+struct pci_p2pdma;
 
 /* The pci_dev structure describes PCI devices */
 struct pci_dev {
@@ -439,6 +440,9 @@ struct pci_dev {
 #ifdef CONFIG_PCI_PASID
 	u16		pasid_features;
 #endif
+#ifdef CONFIG_PCI_P2PDMA
+	struct pci_p2pdma *p2pdma;
+#endif
 	phys_addr_t	rom;		/* Physical address if not from BAR */
 	size_t		romlen;		/* Length if not from BAR */
 	char		*driver_override; /* Driver name to force a match */
-- 
2.11.0

_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

* [PATCH v5 01/13] PCI/P2PDMA: Support peer-to-peer memory
@ 2018-08-30 18:53   ` Logan Gunthorpe
  0 siblings, 0 replies; 265+ messages in thread
From: Logan Gunthorpe @ 2018-08-30 18:53 UTC (permalink / raw)


Some PCI devices may have memory mapped in a BAR space that's
intended for use in peer-to-peer transactions. In order to enable
such transactions the memory must be registered with ZONE_DEVICE pages
so it can be used by DMA interfaces in existing drivers.

Add an interface for other subsystems to find and allocate chunks of P2P
memory as necessary to facilitate transfers between two PCI peers:

int pci_p2pdma_add_client();
struct pci_dev *pci_p2pmem_find();
void *pci_alloc_p2pmem();

The new interface requires a driver to collect a list of client devices
involved in the transaction with the pci_p2pmem_add_client*() functions
then call pci_p2pmem_find() to obtain any suitable P2P memory. Once
this is done the list is bound to the memory and the calling driver is
free to add and remove clients as necessary (adding incompatible clients
will fail). With a suitable p2pmem device, memory can then be
allocated with pci_alloc_p2pmem() for use in DMA transactions.
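
To make the flow concrete, a consuming driver might use the interface
roughly like this (an illustrative sketch only, not part of the patch:
error handling is trimmed, and dma_dev_a, dma_dev_b and len are
placeholder names):

	struct list_head clients;
	struct pci_dev *p2p_dev;
	void *buf;

	INIT_LIST_HEAD(&clients);

	/* register every device that will DMA to/from the buffer */
	if (pci_p2pdma_add_client(&clients, dma_dev_a) ||
	    pci_p2pdma_add_client(&clients, dma_dev_b))
		goto out_free_list;

	/* find a compatible publishing provider; binds it to the list */
	p2p_dev = pci_p2pmem_find(&clients);
	if (!p2p_dev)
		goto out_free_list;

	buf = pci_alloc_p2pmem(p2p_dev, len);

	/* ... perform the transfer ... */

	pci_free_p2pmem(p2p_dev, buf, len);
	pci_dev_put(p2p_dev);
out_free_list:
	pci_p2pdma_client_list_free(&clients);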

Depending on hardware, using peer-to-peer memory may reduce the bandwidth
of the transfer but can significantly reduce pressure on system memory.
This may be desirable in many cases: for example a system could be designed
with a small CPU connected to a PCIe switch by a small number of lanes
which would maximize the number of lanes available to connect to NVMe
devices.

The code is designed to only utilize the p2pmem device if all the devices
involved in a transfer are behind the same PCI bridge. This is because we
have no way of knowing whether peer-to-peer routing between PCIe Root Ports
is supported (PCIe r4.0, sec 1.3.1). Additionally, the benefits of P2P
transfers that go through the RC is limited to only reducing DRAM usage
and, in some cases, coding convenience. The PCI-SIG may be exploring
adding a new capability bit to advertise whether this is possible for
future hardware.

This commit includes significant rework and feedback from Christoph
Hellwig.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
---
 drivers/pci/Kconfig        |  17 +
 drivers/pci/Makefile       |   1 +
 drivers/pci/p2pdma.c       | 761 +++++++++++++++++++++++++++++++++++++++++++++
 include/linux/memremap.h   |   5 +
 include/linux/mm.h         |  18 ++
 include/linux/pci-p2pdma.h | 102 ++++++
 include/linux/pci.h        |   4 +
 7 files changed, 908 insertions(+)
 create mode 100644 drivers/pci/p2pdma.c
 create mode 100644 include/linux/pci-p2pdma.h

diff --git a/drivers/pci/Kconfig b/drivers/pci/Kconfig
index 56ff8f6d31fc..deb68be4fdac 100644
--- a/drivers/pci/Kconfig
+++ b/drivers/pci/Kconfig
@@ -132,6 +132,23 @@ config PCI_PASID
 
 	  If unsure, say N.
 
+config PCI_P2PDMA
+	bool "PCI peer-to-peer transfer support"
+	depends on PCI && ZONE_DEVICE
+	select GENERIC_ALLOCATOR
+	help
+	  Enables drivers to do PCI peer-to-peer transactions to and from
+	  BARs that are exposed in other devices that are part of
+	  the hierarchy where peer-to-peer DMA is guaranteed by the PCI
+	  specification to work (ie. anything below a single PCI bridge).
+
+	  Many PCIe root complexes do not support P2P transactions and
+	  it's hard to tell which support it at all, so at this time,
+	  P2P DMA transactions must be between devices behind the same root
+	  port.
+
+	  If unsure, say N.
+
 config PCI_LABEL
 	def_bool y if (DMI || ACPI)
 	depends on PCI
diff --git a/drivers/pci/Makefile b/drivers/pci/Makefile
index 1b2cfe51e8d7..85f4a703b2be 100644
--- a/drivers/pci/Makefile
+++ b/drivers/pci/Makefile
@@ -26,6 +26,7 @@ obj-$(CONFIG_PCI_SYSCALL)	+= syscall.o
 obj-$(CONFIG_PCI_STUB)		+= pci-stub.o
 obj-$(CONFIG_PCI_PF_STUB)	+= pci-pf-stub.o
 obj-$(CONFIG_PCI_ECAM)		+= ecam.o
+obj-$(CONFIG_PCI_P2PDMA)	+= p2pdma.o
 obj-$(CONFIG_XEN_PCIDEV_FRONTEND) += xen-pcifront.o
 
 # Endpoint library must be initialized before its users
diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c
new file mode 100644
index 000000000000..88aaec5351cd
--- /dev/null
+++ b/drivers/pci/p2pdma.c
@@ -0,0 +1,761 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * PCI Peer 2 Peer DMA support.
+ *
+ * Copyright (c) 2016-2018, Logan Gunthorpe
+ * Copyright (c) 2016-2017, Microsemi Corporation
+ * Copyright (c) 2017, Christoph Hellwig
+ * Copyright (c) 2018, Eideticom Inc.
+ */
+
+#define pr_fmt(fmt) "pci-p2pdma: " fmt
+#include <linux/pci-p2pdma.h>
+#include <linux/module.h>
+#include <linux/slab.h>
+#include <linux/genalloc.h>
+#include <linux/memremap.h>
+#include <linux/percpu-refcount.h>
+#include <linux/random.h>
+#include <linux/seq_buf.h>
+
+struct pci_p2pdma {
+	struct percpu_ref devmap_ref;
+	struct completion devmap_ref_done;
+	struct gen_pool *pool;
+	bool p2pmem_published;
+};
+
+static void pci_p2pdma_percpu_release(struct percpu_ref *ref)
+{
+	struct pci_p2pdma *p2p =
+		container_of(ref, struct pci_p2pdma, devmap_ref);
+
+	complete_all(&p2p->devmap_ref_done);
+}
+
+static void pci_p2pdma_percpu_kill(void *data)
+{
+	struct percpu_ref *ref = data;
+
+	if (percpu_ref_is_dying(ref))
+		return;
+
+	percpu_ref_kill(ref);
+}
+
+static void pci_p2pdma_release(void *data)
+{
+	struct pci_dev *pdev = data;
+
+	if (!pdev->p2pdma)
+		return;
+
+	wait_for_completion(&pdev->p2pdma->devmap_ref_done);
+	percpu_ref_exit(&pdev->p2pdma->devmap_ref);
+
+	gen_pool_destroy(pdev->p2pdma->pool);
+	pdev->p2pdma = NULL;
+}
+
+static int pci_p2pdma_setup(struct pci_dev *pdev)
+{
+	int error = -ENOMEM;
+	struct pci_p2pdma *p2p;
+
+	p2p = devm_kzalloc(&pdev->dev, sizeof(*p2p), GFP_KERNEL);
+	if (!p2p)
+		return -ENOMEM;
+
+	p2p->pool = gen_pool_create(PAGE_SHIFT, dev_to_node(&pdev->dev));
+	if (!p2p->pool)
+		goto out;
+
+	init_completion(&p2p->devmap_ref_done);
+	error = percpu_ref_init(&p2p->devmap_ref,
+			pci_p2pdma_percpu_release, 0, GFP_KERNEL);
+	if (error)
+		goto out_pool_destroy;
+
+	percpu_ref_switch_to_atomic_sync(&p2p->devmap_ref);
+
+	error = devm_add_action_or_reset(&pdev->dev, pci_p2pdma_release, pdev);
+	if (error)
+		goto out_pool_destroy;
+
+	pdev->p2pdma = p2p;
+
+	return 0;
+
+out_pool_destroy:
+	gen_pool_destroy(p2p->pool);
+out:
+	devm_kfree(&pdev->dev, p2p);
+	return error;
+}
+
+/**
+ * pci_p2pdma_add_resource - add memory for use as p2p memory
+ * @pdev: the device to add the memory to
+ * @bar: PCI BAR to add
+ * @size: size of the memory to add, may be zero to use the whole BAR
+ * @offset: offset into the PCI BAR
+ *
+ * The memory will be given ZONE_DEVICE struct pages so that it may
+ * be used with any DMA request.
+ */
+int pci_p2pdma_add_resource(struct pci_dev *pdev, int bar, size_t size,
+			    u64 offset)
+{
+	struct dev_pagemap *pgmap;
+	void *addr;
+	int error;
+
+	if (!(pci_resource_flags(pdev, bar) & IORESOURCE_MEM))
+		return -EINVAL;
+
+	if (offset >= pci_resource_len(pdev, bar))
+		return -EINVAL;
+
+	if (!size)
+		size = pci_resource_len(pdev, bar) - offset;
+
+	if (size + offset > pci_resource_len(pdev, bar))
+		return -EINVAL;
+
+	if (!pdev->p2pdma) {
+		error = pci_p2pdma_setup(pdev);
+		if (error)
+			return error;
+	}
+
+	pgmap = devm_kzalloc(&pdev->dev, sizeof(*pgmap), GFP_KERNEL);
+	if (!pgmap)
+		return -ENOMEM;
+
+	pgmap->res.start = pci_resource_start(pdev, bar) + offset;
+	pgmap->res.end = pgmap->res.start + size - 1;
+	pgmap->res.flags = pci_resource_flags(pdev, bar);
+	pgmap->ref = &pdev->p2pdma->devmap_ref;
+	pgmap->type = MEMORY_DEVICE_PCI_P2PDMA;
+
+	addr = devm_memremap_pages(&pdev->dev, pgmap);
+	if (IS_ERR(addr)) {
+		error = PTR_ERR(addr);
+		goto pgmap_free;
+	}
+
+	error = gen_pool_add_virt(pdev->p2pdma->pool, (unsigned long)addr,
+			pci_bus_address(pdev, bar) + offset,
+			resource_size(&pgmap->res), dev_to_node(&pdev->dev));
+	if (error)
+		goto pgmap_free;
+
+	error = devm_add_action_or_reset(&pdev->dev, pci_p2pdma_percpu_kill,
+					  &pdev->p2pdma->devmap_ref);
+	if (error)
+		goto pgmap_free;
+
+	pci_info(pdev, "added peer-to-peer DMA memory %pR\n",
+		 &pgmap->res);
+
+	return 0;
+
+pgmap_free:
+	devres_free(pgmap);
+	return error;
+}
+EXPORT_SYMBOL_GPL(pci_p2pdma_add_resource);
+
+static struct pci_dev *find_parent_pci_dev(struct device *dev)
+{
+	struct device *parent;
+
+	dev = get_device(dev);
+
+	while (dev) {
+		if (dev_is_pci(dev))
+			return to_pci_dev(dev);
+
+		parent = get_device(dev->parent);
+		put_device(dev);
+		dev = parent;
+	}
+
+	return NULL;
+}
+
+/*
+ * Check if a PCI bridge has its ACS redirection bits set to redirect P2P
+ * TLPs upstream via ACS. Returns 1 if the packets will be redirected
+ * upstream, 0 otherwise.
+ */
+static int pci_bridge_has_acs_redir(struct pci_dev *dev)
+{
+	int pos;
+	u16 ctrl;
+
+	pos = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_ACS);
+	if (!pos)
+		return 0;
+
+	pci_read_config_word(dev, pos + PCI_ACS_CTRL, &ctrl);
+
+	if (ctrl & (PCI_ACS_RR | PCI_ACS_CR | PCI_ACS_EC))
+		return 1;
+
+	return 0;
+}
+
+static void seq_buf_print_bus_devfn(struct seq_buf *buf, struct pci_dev *dev)
+{
+	if (!buf)
+		return;
+
+	seq_buf_printf(buf, "%04x:%02x:%02x.%x;", pci_domain_nr(dev->bus),
+		       dev->bus->number, PCI_SLOT(dev->devfn),
+		       PCI_FUNC(dev->devfn));
+}
+
+/*
+ * Find the distance through the nearest common upstream bridge between
+ * two PCI devices.
+ *
+ * If the two devices are the same device then 0 will be returned.
+ *
+ * If there are two virtual functions of the same device behind the same
+ * bridge port then 2 will be returned (one step down to the PCIe switch,
+ * then one step back to the same device).
+ *
+ * In the case where two devices are connected to the same PCIe switch, the
+ * value 4 will be returned. This corresponds to the following PCI tree:
+ *
+ *     -+  Root Port
+ *      \+ Switch Upstream Port
+ *       +-+ Switch Downstream Port
+ *       + \- Device A
+ *       \-+ Switch Downstream Port
+ *         \- Device B
+ *
+ * The distance is 4 because we traverse from Device A through the downstream
+ * port of the switch, to the common upstream port, back up to the second
+ * downstream port and then to Device B.
+ *
+ * Any two devices that don't have a common upstream bridge will return -1.
+ * In this way devices on separate PCIe root ports will be rejected, which
+ * is what we want for peer-to-peer, since each PCIe root port defines a
+ * separate hierarchy domain and there's no way to determine whether the root
+ * complex supports forwarding between them.
+ *
+ * In the case where two devices are connected to different PCIe switches,
+ * this function will still return a positive distance as long as both
+ * switches eventually have a common upstream bridge. Note this covers
+ * the case of using multiple PCIe switches to achieve a desired level of
+ * fan-out from a root port. The exact distance will be a function of the
+ * number of switches between Device A and Device B.
+ *
+ * If a bridge which has any ACS redirection bits set is in the path
+ * then this function will return -2. This is so we reject any
+ * cases where the TLPs are forwarded up into the root complex.
+ * In this case, a list of all infringing bridge addresses will be
+ * populated in acs_list (assuming it's non-null) for printk purposes.
+ */
+static int upstream_bridge_distance(struct pci_dev *a,
+				    struct pci_dev *b,
+				    struct seq_buf *acs_list)
+{
+	int dist_a = 0;
+	int dist_b = 0;
+	struct pci_dev *bb = NULL;
+	int acs_cnt = 0;
+
+	/*
+	 * Note, we don't need to take references to devices returned by
+	 * pci_upstream_bridge() since we hold a reference to a child
+	 * device which will already hold a reference to the upstream bridge.
+	 */
+
+	while (a) {
+		dist_b = 0;
+
+		if (pci_bridge_has_acs_redir(a)) {
+			seq_buf_print_bus_devfn(acs_list, a);
+			acs_cnt++;
+		}
+
+		bb = b;
+
+		while (bb) {
+			if (a == bb)
+				goto check_b_path_acs;
+
+			bb = pci_upstream_bridge(bb);
+			dist_b++;
+		}
+
+		a = pci_upstream_bridge(a);
+		dist_a++;
+	}
+
+	return -1;
+
+check_b_path_acs:
+	bb = b;
+
+	while (bb) {
+		if (a == bb)
+			break;
+
+		if (pci_bridge_has_acs_redir(bb)) {
+			seq_buf_print_bus_devfn(acs_list, bb);
+			acs_cnt++;
+		}
+
+		bb = pci_upstream_bridge(bb);
+	}
+
+	if (acs_cnt)
+		return -2;
+
+	return dist_a + dist_b;
+}
+
+static int upstream_bridge_distance_warn(struct pci_dev *provider,
+					 struct pci_dev *client)
+{
+	struct seq_buf acs_list;
+	int ret;
+
+	seq_buf_init(&acs_list, kmalloc(PAGE_SIZE, GFP_KERNEL), PAGE_SIZE);
+
+	ret = upstream_bridge_distance(provider, client, &acs_list);
+	if (ret == -2) {
+		pci_warn(client, "cannot be used for peer-to-peer DMA as ACS redirect is set between the client and provider\n");
+		/* Drop final semicolon */
+		acs_list.buffer[acs_list.len-1] = 0;
+		pci_warn(client, "to disable ACS redirect for this path, add the kernel parameter: pci=disable_acs_redir=%s\n",
+			 acs_list.buffer);
+
+	} else if (ret < 0) {
+		pci_warn(client, "cannot be used for peer-to-peer DMA as the client and provider do not share an upstream bridge\n");
+	}
+
+	kfree(acs_list.buffer);
+
+	return ret;
+}
+
+struct pci_p2pdma_client {
+	struct list_head list;
+	struct pci_dev *client;
+	struct pci_dev *provider;
+};
+
+/**
+ * pci_p2pdma_add_client - allocate a new element in a client device list
+ * @head: list head of p2pdma clients
+ * @dev: device to add to the list
+ *
+ * This adds @dev to a list of clients used by a p2pdma device.
+ * This list should be passed to pci_p2pmem_find(). Once pci_p2pmem_find() has
+ * been called successfully, the list will be bound to a specific p2pdma
+ * device and new clients can only be added to the list if they are
+ * supported by that p2pdma device.
+ *
+ * The caller is expected to have a lock which protects @head as necessary
+ * so that none of the pci_p2p functions can be called concurrently
+ * on that list.
+ *
+ * Returns 0 if the client was successfully added.
+ */
+int pci_p2pdma_add_client(struct list_head *head, struct device *dev)
+{
+	struct pci_p2pdma_client *item, *new_item;
+	struct pci_dev *provider = NULL;
+	struct pci_dev *client;
+	int ret;
+
+	if (IS_ENABLED(CONFIG_DMA_VIRT_OPS) && dev->dma_ops == &dma_virt_ops) {
+		dev_warn(dev, "cannot be used for peer-to-peer DMA because the driver makes use of dma_virt_ops\n");
+		return -ENODEV;
+	}
+
+	client = find_parent_pci_dev(dev);
+	if (!client) {
+		dev_warn(dev, "cannot be used for peer-to-peer DMA as it is not a PCI device\n");
+		return -ENODEV;
+	}
+
+	item = list_first_entry_or_null(head, struct pci_p2pdma_client, list);
+	if (item && item->provider) {
+		provider = item->provider;
+
+		ret = upstream_bridge_distance_warn(provider, client);
+		if (ret < 0) {
+			ret = -EXDEV;
+			goto put_client;
+		}
+	}
+
+	new_item = kzalloc(sizeof(*new_item), GFP_KERNEL);
+	if (!new_item) {
+		ret = -ENOMEM;
+		goto put_client;
+	}
+
+	new_item->client = client;
+	new_item->provider = pci_dev_get(provider);
+
+	list_add_tail(&new_item->list, head);
+
+	return 0;
+
+put_client:
+	pci_dev_put(client);
+	return ret;
+}
+EXPORT_SYMBOL_GPL(pci_p2pdma_add_client);
+
+static void pci_p2pdma_client_free(struct pci_p2pdma_client *item)
+{
+	list_del(&item->list);
+	pci_dev_put(item->client);
+	pci_dev_put(item->provider);
+	kfree(item);
+}
+
+/**
+ * pci_p2pdma_remove_client - remove and free a p2pdma client
+ * @head: list head of p2pdma clients
+ * @dev: device to remove from the list
+ *
+ * This removes @dev from a list of clients used by a p2pdma device.
+ * The caller is expected to have a lock which protects @head as necessary
+ * so that none of the pci_p2p functions can be called concurrently
+ * on that list.
+ */
+void pci_p2pdma_remove_client(struct list_head *head, struct device *dev)
+{
+	struct pci_p2pdma_client *pos, *tmp;
+	struct pci_dev *pdev;
+
+	pdev = find_parent_pci_dev(dev);
+	if (!pdev)
+		return;
+
+	list_for_each_entry_safe(pos, tmp, head, list) {
+		if (pos->client != pdev)
+			continue;
+
+		pci_p2pdma_client_free(pos);
+	}
+
+	pci_dev_put(pdev);
+}
+EXPORT_SYMBOL_GPL(pci_p2pdma_remove_client);
+
+/**
+ * pci_p2pdma_client_list_free - free an entire list of p2pdma clients
+ * @head: list head of p2pdma clients
+ *
+ * This removes all devices in a list of clients used by a p2pdma device.
+ * The caller is expected to have a lock which protects @head as necessary
+ * so that none of the pci_p2pdma functions can be called concurrently
+ * on that list.
+ */
+void pci_p2pdma_client_list_free(struct list_head *head)
+{
+	struct pci_p2pdma_client *pos, *tmp;
+
+	list_for_each_entry_safe(pos, tmp, head, list)
+		pci_p2pdma_client_free(pos);
+}
+EXPORT_SYMBOL_GPL(pci_p2pdma_client_list_free);
+
+/**
+ * pci_p2pdma_distance - Determine the cumulative distance between
+ *	a p2pdma provider and the clients in use.
+ * @provider: p2pdma provider to check against the client list
+ * @clients: list of devices to check (NULL-terminated)
+ * @verbose: if true, print warnings for devices when we return -1
+ *
+ * Returns -1 if any of the clients are not compatible (ie. not behind the
+ * same root port as the provider), otherwise returns a positive number where
+ * a lower number is the preferable choice. (If there's a client that's the
+ * same as the provider it will return 0, which is the best choice).
+ *
+ * For now, "compatible" means the provider and the clients are all behind
+ * the same PCI root port. This cuts out cases that may work but is safest
+ * for the user. Future work can expand this to white-list root complexes that
+ * can safely forward between their ports.
+ */
+int pci_p2pdma_distance(struct pci_dev *provider, struct list_head *clients,
+			bool verbose)
+{
+	struct pci_p2pdma_client *pos;
+	int ret;
+	int distance = 0;
+	bool not_supported = false;
+
+	if (list_empty(clients))
+		return -1;
+
+	list_for_each_entry(pos, clients, list) {
+		if (verbose)
+			ret = upstream_bridge_distance_warn(provider,
+							    pos->client);
+		else
+			ret = upstream_bridge_distance(provider, pos->client,
+						       NULL);
+
+		if (ret < 0)
+			not_supported = true;
+
+		if (not_supported && !verbose)
+			break;
+
+		distance += ret;
+	}
+
+	if (not_supported)
+		return -1;
+
+	return distance;
+}
+EXPORT_SYMBOL_GPL(pci_p2pdma_distance);
+
+/**
+ * pci_p2pdma_assign_provider - Check compatibily (as per pci_p2pdma_distance)
+ *	and assign a provider to a list of clients
+ * @provider: p2pdma provider to assign to the client list
+ * @clients: list of devices to check (NULL-terminated)
+ *
+ * Returns false if any of the clients are not compatible, true if the
+ * provider was successfully assigned to the clients.
+ */
+bool pci_p2pdma_assign_provider(struct pci_dev *provider,
+				struct list_head *clients)
+{
+	struct pci_p2pdma_client *pos;
+
+	if (pci_p2pdma_distance(provider, clients, true) < 0)
+		return false;
+
+	list_for_each_entry(pos, clients, list)
+		pos->provider = provider;
+
+	return true;
+}
+EXPORT_SYMBOL_GPL(pci_p2pdma_assign_provider);
+
+/**
+ * pci_has_p2pmem - check if a given PCI device has published any p2pmem
+ * @pdev: PCI device to check
+ */
+bool pci_has_p2pmem(struct pci_dev *pdev)
+{
+	return pdev->p2pdma && pdev->p2pdma->p2pmem_published;
+}
+EXPORT_SYMBOL_GPL(pci_has_p2pmem);
+
+/**
+ * pci_p2pmem_find - find a peer-to-peer DMA memory device compatible with
+ *	the specified list of clients and shortest distance (as determined
+ *	by pci_p2pdma_distance())
+ * @clients: list of devices to check (NULL-terminated)
+ *
+ * If multiple devices are behind the same switch, the one "closest" to the
+ * client devices in use will be chosen first. (So if one of the providers is
+ * the same as one of the clients, that provider will be used ahead of any
+ * other providers that are unrelated). If multiple providers are an equal
+ * distance away, one will be chosen at random.
+ *
+ * Returns a pointer to the PCI device with a reference taken (use pci_dev_put
+ * to return the reference) or NULL if no compatible device is found. The
+ * found provider will also be assigned to the client list.
+ */
+struct pci_dev *pci_p2pmem_find(struct list_head *clients)
+{
+	struct pci_dev *pdev = NULL;
+	struct pci_p2pdma_client *pos;
+	int distance;
+	int closest_distance = INT_MAX;
+	struct pci_dev **closest_pdevs;
+	int dev_cnt = 0;
+	const int max_devs = PAGE_SIZE / sizeof(*closest_pdevs);
+	int i;
+
+	closest_pdevs = kmalloc(PAGE_SIZE, GFP_KERNEL);
+	if (!closest_pdevs)
+		return NULL;
+
+	while ((pdev = pci_get_device(PCI_ANY_ID, PCI_ANY_ID, pdev))) {
+		if (!pci_has_p2pmem(pdev))
+			continue;
+
+		distance = pci_p2pdma_distance(pdev, clients, false);
+		if (distance < 0 || distance > closest_distance)
+			continue;
+
+		if (distance == closest_distance && dev_cnt >= max_devs)
+			continue;
+
+		if (distance < closest_distance) {
+			for (i = 0; i < dev_cnt; i++)
+				pci_dev_put(closest_pdevs[i]);
+
+			dev_cnt = 0;
+			closest_distance = distance;
+		}
+
+		closest_pdevs[dev_cnt++] = pci_dev_get(pdev);
+	}
+
+	if (dev_cnt)
+		pdev = pci_dev_get(closest_pdevs[prandom_u32_max(dev_cnt)]);
+
+	for (i = 0; i < dev_cnt; i++)
+		pci_dev_put(closest_pdevs[i]);
+
+	if (pdev)
+		list_for_each_entry(pos, clients, list)
+			pos->provider = pdev;
+
+	kfree(closest_pdevs);
+	return pdev;
+}
+EXPORT_SYMBOL_GPL(pci_p2pmem_find);
+
+/**
+ * pci_alloc_p2pmem - allocate peer-to-peer DMA memory
+ * @pdev: the device to allocate memory from
+ * @size: number of bytes to allocate
+ *
+ * Returns the allocated memory or NULL on error.
+ */
+void *pci_alloc_p2pmem(struct pci_dev *pdev, size_t size)
+{
+	void *ret;
+
+	if (unlikely(!pdev->p2pdma))
+		return NULL;
+
+	if (unlikely(!percpu_ref_tryget_live(&pdev->p2pdma->devmap_ref)))
+		return NULL;
+
+	ret = (void *)gen_pool_alloc(pdev->p2pdma->pool, size);
+
+	if (unlikely(!ret))
+		percpu_ref_put(&pdev->p2pdma->devmap_ref);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(pci_alloc_p2pmem);
+
+/**
+ * pci_free_p2pmem - free peer-to-peer DMA memory
+ * @pdev: the device the memory was allocated from
+ * @addr: address of the memory that was allocated
+ * @size: number of bytes that was allocated
+ */
+void pci_free_p2pmem(struct pci_dev *pdev, void *addr, size_t size)
+{
+	gen_pool_free(pdev->p2pdma->pool, (uintptr_t)addr, size);
+	percpu_ref_put(&pdev->p2pdma->devmap_ref);
+}
+EXPORT_SYMBOL_GPL(pci_free_p2pmem);
+
+/**
+ * pci_p2pmem_virt_to_bus - return the PCI bus address for a given virtual
+ *	address obtained with pci_alloc_p2pmem()
+ * @pdev: the device the memory was allocated from
+ * @addr: address of the memory that was allocated
+ */
+pci_bus_addr_t pci_p2pmem_virt_to_bus(struct pci_dev *pdev, void *addr)
+{
+	if (!addr)
+		return 0;
+	if (!pdev->p2pdma)
+		return 0;
+
+	/*
+	 * Note: when we added the memory to the pool we used the PCI
+	 * bus address as the physical address. So gen_pool_virt_to_phys()
+	 * actually returns the bus address despite the misleading name.
+	 */
+	return gen_pool_virt_to_phys(pdev->p2pdma->pool, (unsigned long)addr);
+}
+EXPORT_SYMBOL_GPL(pci_p2pmem_virt_to_bus);
+
+/**
+ * pci_p2pmem_alloc_sgl - allocate peer-to-peer DMA memory in a scatterlist
+ * @pdev: the device to allocate memory from
+ * @nents: returns the number of SG entries in the list
+ * @length: number of bytes to allocate
+ *
+ * Returns the allocated scatterlist or NULL on error.
+ */
+struct scatterlist *pci_p2pmem_alloc_sgl(struct pci_dev *pdev,
+					 unsigned int *nents, u32 length)
+{
+	struct scatterlist *sg;
+	void *addr;
+
+	sg = kzalloc(sizeof(*sg), GFP_KERNEL);
+	if (!sg)
+		return NULL;
+
+	sg_init_table(sg, 1);
+
+	addr = pci_alloc_p2pmem(pdev, length);
+	if (!addr)
+		goto out_free_sg;
+
+	sg_set_buf(sg, addr, length);
+	*nents = 1;
+	return sg;
+
+out_free_sg:
+	kfree(sg);
+	return NULL;
+}
+EXPORT_SYMBOL_GPL(pci_p2pmem_alloc_sgl);
+
+/**
+ * pci_p2pmem_free_sgl - free a scatterlist allocated by pci_p2pmem_alloc_sgl()
+ * @pdev: the device the memory was allocated from
+ * @sgl: the allocated scatterlist
+ */
+void pci_p2pmem_free_sgl(struct pci_dev *pdev, struct scatterlist *sgl)
+{
+	struct scatterlist *sg;
+	int count;
+
+	for_each_sg(sgl, sg, INT_MAX, count) {
+		if (!sg)
+			break;
+
+		pci_free_p2pmem(pdev, sg_virt(sg), sg->length);
+	}
+	kfree(sgl);
+}
+EXPORT_SYMBOL_GPL(pci_p2pmem_free_sgl);
+
+/**
+ * pci_p2pmem_publish - publish the peer-to-peer DMA memory for use by
+ *	other devices with pci_p2pmem_find()
+ * @pdev: the device with peer-to-peer DMA memory to publish
+ * @publish: set to true to publish the memory, false to unpublish it
+ *
+ * Published memory can be used by other PCI device drivers for
+ * peer-to-peer DMA operations. Non-published memory is reserved for
+ * exclusive use of the device driver that registers the peer-to-peer
+ * memory.
+ */
+void pci_p2pmem_publish(struct pci_dev *pdev, bool publish)
+{
+	if (!pdev->p2pdma)
+		return;
+
+	pdev->p2pdma->p2pmem_published = publish;
+}
+EXPORT_SYMBOL_GPL(pci_p2pmem_publish);
diff --git a/include/linux/memremap.h b/include/linux/memremap.h
index f91f9e763557..9553370ebdad 100644
--- a/include/linux/memremap.h
+++ b/include/linux/memremap.h
@@ -53,11 +53,16 @@ struct vmem_altmap {
  * wakeup event whenever a page is unpinned and becomes idle. This
  * wakeup is used to coordinate physical address space management (ex:
  * fs truncate/hole punch) vs pinned pages (ex: device dma).
+ *
+ * MEMORY_DEVICE_PCI_P2PDMA:
+ * Device memory residing in a PCI BAR intended for use with Peer-to-Peer
+ * transactions.
  */
 enum memory_type {
 	MEMORY_DEVICE_PRIVATE = 1,
 	MEMORY_DEVICE_PUBLIC,
 	MEMORY_DEVICE_FS_DAX,
+	MEMORY_DEVICE_PCI_P2PDMA,
 };
 
 /*
diff --git a/include/linux/mm.h b/include/linux/mm.h
index a61ebe8ad4ca..2055df412a77 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -890,6 +890,19 @@ static inline bool is_device_public_page(const struct page *page)
 		page->pgmap->type == MEMORY_DEVICE_PUBLIC;
 }
 
+#ifdef CONFIG_PCI_P2PDMA
+static inline bool is_pci_p2pdma_page(const struct page *page)
+{
+	return is_zone_device_page(page) &&
+		page->pgmap->type == MEMORY_DEVICE_PCI_P2PDMA;
+}
+#else /* CONFIG_PCI_P2PDMA */
+static inline bool is_pci_p2pdma_page(const struct page *page)
+{
+	return false;
+}
+#endif /* CONFIG_PCI_P2PDMA */
+
 #else /* CONFIG_DEV_PAGEMAP_OPS */
 static inline void dev_pagemap_get_ops(void)
 {
@@ -913,6 +926,11 @@ static inline bool is_device_public_page(const struct page *page)
 {
 	return false;
 }
+
+static inline bool is_pci_p2pdma_page(const struct page *page)
+{
+	return false;
+}
 #endif /* CONFIG_DEV_PAGEMAP_OPS */
 
 static inline void get_page(struct page *page)
diff --git a/include/linux/pci-p2pdma.h b/include/linux/pci-p2pdma.h
new file mode 100644
index 000000000000..7b2b0f547528
--- /dev/null
+++ b/include/linux/pci-p2pdma.h
@@ -0,0 +1,102 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * PCI Peer 2 Peer DMA support.
+ *
+ * Copyright (c) 2016-2018, Logan Gunthorpe
+ * Copyright (c) 2016-2017, Microsemi Corporation
+ * Copyright (c) 2017, Christoph Hellwig
+ * Copyright (c) 2018, Eideticom Inc.
+ *
+ */
+
+#ifndef _LINUX_PCI_P2PDMA_H
+#define _LINUX_PCI_P2PDMA_H
+
+#include <linux/pci.h>
+
+struct block_device;
+struct scatterlist;
+
+#ifdef CONFIG_PCI_P2PDMA
+int pci_p2pdma_add_resource(struct pci_dev *pdev, int bar, size_t size,
+		u64 offset);
+int pci_p2pdma_add_client(struct list_head *head, struct device *dev);
+void pci_p2pdma_remove_client(struct list_head *head, struct device *dev);
+void pci_p2pdma_client_list_free(struct list_head *head);
+int pci_p2pdma_distance(struct pci_dev *provider, struct list_head *clients,
+			bool verbose);
+bool pci_p2pdma_assign_provider(struct pci_dev *provider,
+				struct list_head *clients);
+bool pci_has_p2pmem(struct pci_dev *pdev);
+struct pci_dev *pci_p2pmem_find(struct list_head *clients);
+void *pci_alloc_p2pmem(struct pci_dev *pdev, size_t size);
+void pci_free_p2pmem(struct pci_dev *pdev, void *addr, size_t size);
+pci_bus_addr_t pci_p2pmem_virt_to_bus(struct pci_dev *pdev, void *addr);
+struct scatterlist *pci_p2pmem_alloc_sgl(struct pci_dev *pdev,
+					 unsigned int *nents, u32 length);
+void pci_p2pmem_free_sgl(struct pci_dev *pdev, struct scatterlist *sgl);
+void pci_p2pmem_publish(struct pci_dev *pdev, bool publish);
+#else /* CONFIG_PCI_P2PDMA */
+static inline int pci_p2pdma_add_resource(struct pci_dev *pdev, int bar,
+		size_t size, u64 offset)
+{
+	return -EOPNOTSUPP;
+}
+static inline int pci_p2pdma_add_client(struct list_head *head,
+		struct device *dev)
+{
+	return 0;
+}
+static inline void pci_p2pdma_remove_client(struct list_head *head,
+		struct device *dev)
+{
+}
+static inline void pci_p2pdma_client_list_free(struct list_head *head)
+{
+}
+static inline int pci_p2pdma_distance(struct pci_dev *provider,
+				      struct list_head *clients,
+				      bool verbose)
+{
+	return -1;
+}
+static inline bool pci_p2pdma_assign_provider(struct pci_dev *provider,
+					      struct list_head *clients)
+{
+	return false;
+}
+static inline bool pci_has_p2pmem(struct pci_dev *pdev)
+{
+	return false;
+}
+static inline struct pci_dev *pci_p2pmem_find(struct list_head *clients)
+{
+	return NULL;
+}
+static inline void *pci_alloc_p2pmem(struct pci_dev *pdev, size_t size)
+{
+	return NULL;
+}
+static inline void pci_free_p2pmem(struct pci_dev *pdev, void *addr,
+		size_t size)
+{
+}
+static inline pci_bus_addr_t pci_p2pmem_virt_to_bus(struct pci_dev *pdev,
+						    void *addr)
+{
+	return 0;
+}
+static inline struct scatterlist *pci_p2pmem_alloc_sgl(struct pci_dev *pdev,
+		unsigned int *nents, u32 length)
+{
+	return NULL;
+}
+static inline void pci_p2pmem_free_sgl(struct pci_dev *pdev,
+		struct scatterlist *sgl)
+{
+}
+static inline void pci_p2pmem_publish(struct pci_dev *pdev, bool publish)
+{
+}
+#endif /* CONFIG_PCI_P2PDMA */
+#endif /* _LINUX_PCI_P2PDMA_H */
diff --git a/include/linux/pci.h b/include/linux/pci.h
index e72ca8dd6241..5d95dbf21f4a 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -281,6 +281,7 @@ struct pcie_link_state;
 struct pci_vpd;
 struct pci_sriov;
 struct pci_ats;
+struct pci_p2pdma;
 
 /* The pci_dev structure describes PCI devices */
 struct pci_dev {
@@ -439,6 +440,9 @@ struct pci_dev {
 #ifdef CONFIG_PCI_PASID
 	u16		pasid_features;
 #endif
+#ifdef CONFIG_PCI_P2PDMA
+	struct pci_p2pdma *p2pdma;
+#endif
 	phys_addr_t	rom;		/* Physical address if not from BAR */
 	size_t		romlen;		/* Length if not from BAR */
 	char		*driver_override; /* Driver name to force a match */
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 265+ messages in thread

* [PATCH v5 02/13] PCI/P2PDMA: Add sysfs group to display p2pmem stats
  2018-08-30 18:53 ` Logan Gunthorpe
  (?)
  (?)
@ 2018-08-30 18:53   ` Logan Gunthorpe
  -1 siblings, 0 replies; 265+ messages in thread
From: Logan Gunthorpe @ 2018-08-30 18:53 UTC (permalink / raw)
  To: linux-kernel, linux-pci, linux-nvme, linux-rdma, linux-nvdimm,
	linux-block
  Cc: Christian König, Benjamin Herrenschmidt, Alex Williamson,
	Jérôme Glisse, Jason Gunthorpe, Bjorn Helgaas,
	Max Gurtovoy, Christoph Hellwig

Add a sysfs group to display statistics about P2P memory that is
registered in each PCI device.

Attributes in the group display the total amount of P2P memory, the
amount available and whether it is published or not.

Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
---
 Documentation/ABI/testing/sysfs-bus-pci | 25 +++++++++++++++
 drivers/pci/p2pdma.c                    | 54 +++++++++++++++++++++++++++++++++
 2 files changed, 79 insertions(+)

diff --git a/Documentation/ABI/testing/sysfs-bus-pci b/Documentation/ABI/testing/sysfs-bus-pci
index 44d4b2be92fd..044812c816d0 100644
--- a/Documentation/ABI/testing/sysfs-bus-pci
+++ b/Documentation/ABI/testing/sysfs-bus-pci
@@ -323,3 +323,28 @@ Description:
 
 		This is similar to /sys/bus/pci/drivers_autoprobe, but
 		affects only the VFs associated with a specific PF.
+
+What:		/sys/bus/pci/devices/.../p2pmem/available
+Date:		November 2017
+Contact:	Logan Gunthorpe <logang@deltatee.com>
+Description:
+		If the device has any Peer-to-Peer memory registered, this
+		file contains the amount of memory that has not been
+		allocated (in decimal).
+
+What:		/sys/bus/pci/devices/.../p2pmem/size
+Date:		November 2017
+Contact:	Logan Gunthorpe <logang@deltatee.com>
+Description:
+		If the device has any Peer-to-Peer memory registered, this
+		file contains the total amount of memory that the device
+		provides (in decimal).
+
+What:		/sys/bus/pci/devices/.../p2pmem/published
+Date:		November 2017
+Contact:	Logan Gunthorpe <logang@deltatee.com>
+Description:
+		If the device has any Peer-to-Peer memory registered, this
+		file contains a '1' if the memory has been published for
+		use inside the kernel or a '0' if it is only intended
+		for use within the driver that published it.
diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c
index 88aaec5351cd..67c1daf1189e 100644
--- a/drivers/pci/p2pdma.c
+++ b/drivers/pci/p2pdma.c
@@ -25,6 +25,54 @@ struct pci_p2pdma {
 	bool p2pmem_published;
 };
 
+static ssize_t size_show(struct device *dev, struct device_attribute *attr,
+			 char *buf)
+{
+	struct pci_dev *pdev = to_pci_dev(dev);
+	size_t size = 0;
+
+	if (pdev->p2pdma->pool)
+		size = gen_pool_size(pdev->p2pdma->pool);
+
+	return snprintf(buf, PAGE_SIZE, "%zd\n", size);
+}
+static DEVICE_ATTR_RO(size);
+
+static ssize_t available_show(struct device *dev, struct device_attribute *attr,
+			      char *buf)
+{
+	struct pci_dev *pdev = to_pci_dev(dev);
+	size_t avail = 0;
+
+	if (pdev->p2pdma->pool)
+		avail = gen_pool_avail(pdev->p2pdma->pool);
+
+	return snprintf(buf, PAGE_SIZE, "%zd\n", avail);
+}
+static DEVICE_ATTR_RO(available);
+
+static ssize_t published_show(struct device *dev, struct device_attribute *attr,
+			      char *buf)
+{
+	struct pci_dev *pdev = to_pci_dev(dev);
+
+	return snprintf(buf, PAGE_SIZE, "%d\n",
+			pdev->p2pdma->p2pmem_published);
+}
+static DEVICE_ATTR_RO(published);
+
+static struct attribute *p2pmem_attrs[] = {
+	&dev_attr_size.attr,
+	&dev_attr_available.attr,
+	&dev_attr_published.attr,
+	NULL,
+};
+
+static const struct attribute_group p2pmem_group = {
+	.attrs = p2pmem_attrs,
+	.name = "p2pmem",
+};
+
 static void pci_p2pdma_percpu_release(struct percpu_ref *ref)
 {
 	struct pci_p2pdma *p2p =
@@ -54,6 +102,7 @@ static void pci_p2pdma_release(void *data)
 	percpu_ref_exit(&pdev->p2pdma->devmap_ref);
 
 	gen_pool_destroy(pdev->p2pdma->pool);
+	sysfs_remove_group(&pdev->dev.kobj, &p2pmem_group);
 	pdev->p2pdma = NULL;
 }
 
@@ -84,9 +133,14 @@ static int pci_p2pdma_setup(struct pci_dev *pdev)
 
 	pdev->p2pdma = p2p;
 
+	error = sysfs_create_group(&pdev->dev.kobj, &p2pmem_group);
+	if (error)
+		goto out_pool_destroy;
+
 	return 0;
 
 out_pool_destroy:
+	pdev->p2pdma = NULL;
 	gen_pool_destroy(p2p->pool);
 out:
 	devm_kfree(&pdev->dev, p2p);
-- 
2.11.0

_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

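As a quick illustration of the interface this patch adds: the three attributes read as plain decimal values. The device path and numbers below are invented, and the sysfs tree is mocked in a temporary directory so the commands run anywhere:

```shell
# Mock of /sys/bus/pci/devices/<BDF>/p2pmem/ as created by this patch;
# the values are invented (a 16 MiB BAR, nothing allocated, unpublished).
d="$(mktemp -d)/p2pmem"
mkdir -p "$d"
echo 16777216 > "$d/size"
echo 16777216 > "$d/available"
echo 0        > "$d/published"

# A user would read the real attributes the same way:
cat "$d/size" "$d/available" "$d/published"
```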

* [PATCH v5 03/13] PCI/P2PDMA: Add PCI p2pmem DMA mappings to adjust the bus offset
  2018-08-30 18:53 ` Logan Gunthorpe
  (?)
@ 2018-08-30 18:53   ` Logan Gunthorpe
  -1 siblings, 0 replies; 265+ messages in thread
From: Logan Gunthorpe @ 2018-08-30 18:53 UTC (permalink / raw)
  To: linux-kernel, linux-pci, linux-nvme, linux-rdma, linux-nvdimm,
	linux-block
  Cc: Christian König, Benjamin Herrenschmidt, Alex Williamson,
	Jérôme Glisse, Jason Gunthorpe, Bjorn Helgaas,
	Max Gurtovoy, Christoph Hellwig

The DMA address used when mapping PCI P2P memory must be the PCI bus
address. Thus, introduce pci_p2pmem_map_sg() to map the correct
addresses when using P2P memory. Memory mapped in this way does not
need to be unmapped.

For this, we assume that an SGL passed to these functions contains
either all P2P memory or no P2P memory.

Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
---
 drivers/pci/p2pdma.c       | 43 +++++++++++++++++++++++++++++++++++++++++++
 include/linux/memremap.h   |  1 +
 include/linux/pci-p2pdma.h |  7 +++++++
 3 files changed, 51 insertions(+)

diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c
index 67c1daf1189e..29bd40a87768 100644
--- a/drivers/pci/p2pdma.c
+++ b/drivers/pci/p2pdma.c
@@ -191,6 +191,8 @@ int pci_p2pdma_add_resource(struct pci_dev *pdev, int bar, size_t size,
 	pgmap->res.flags = pci_resource_flags(pdev, bar);
 	pgmap->ref = &pdev->p2pdma->devmap_ref;
 	pgmap->type = MEMORY_DEVICE_PCI_P2PDMA;
+	pgmap->pci_p2pdma_bus_offset = pci_bus_address(pdev, bar) -
+		pci_resource_start(pdev, bar);
 
 	addr = devm_memremap_pages(&pdev->dev, pgmap);
 	if (IS_ERR(addr)) {
@@ -813,3 +815,44 @@ void pci_p2pmem_publish(struct pci_dev *pdev, bool publish)
 	pdev->p2pdma->p2pmem_published = publish;
 }
 EXPORT_SYMBOL_GPL(pci_p2pmem_publish);
+
+/**
+ * pci_p2pdma_map_sg - map a PCI peer-to-peer scatterlist for DMA
+ * @dev: device doing the DMA request
+ * @sg: scatter list to map
+ * @nents: elements in the scatterlist
+ * @dir: DMA direction
+ *
+ * Scatterlists mapped with this function should not be unmapped in any way.
+ *
+ * Returns the number of SG entries mapped or 0 on error.
+ */
+int pci_p2pdma_map_sg(struct device *dev, struct scatterlist *sg, int nents,
+		      enum dma_data_direction dir)
+{
+	struct dev_pagemap *pgmap;
+	struct scatterlist *s;
+	phys_addr_t paddr;
+	int i;
+
+	/*
+	 * p2pdma mappings are not compatible with devices that use
+	 * dma_virt_ops. If the upper layers do the right thing
+	 * this should never happen because it will be prevented
+	 * by the check in pci_p2pdma_add_client()
+	 */
+	if (WARN_ON_ONCE(IS_ENABLED(CONFIG_DMA_VIRT_OPS) &&
+			 dev->dma_ops == &dma_virt_ops))
+		return 0;
+
+	for_each_sg(sg, s, nents, i) {
+		pgmap = sg_page(s)->pgmap;
+		paddr = sg_phys(s);
+
+		s->dma_address = paddr + pgmap->pci_p2pdma_bus_offset;
+		sg_dma_len(s) = s->length;
+	}
+
+	return nents;
+}
+EXPORT_SYMBOL_GPL(pci_p2pdma_map_sg);
diff --git a/include/linux/memremap.h b/include/linux/memremap.h
index 9553370ebdad..0ac69ddf5fc4 100644
--- a/include/linux/memremap.h
+++ b/include/linux/memremap.h
@@ -125,6 +125,7 @@ struct dev_pagemap {
 	struct device *dev;
 	void *data;
 	enum memory_type type;
+	u64 pci_p2pdma_bus_offset;
 };
 
 #ifdef CONFIG_ZONE_DEVICE
diff --git a/include/linux/pci-p2pdma.h b/include/linux/pci-p2pdma.h
index 7b2b0f547528..2f03dbbf5af6 100644
--- a/include/linux/pci-p2pdma.h
+++ b/include/linux/pci-p2pdma.h
@@ -36,6 +36,8 @@ struct scatterlist *pci_p2pmem_alloc_sgl(struct pci_dev *pdev,
 					 unsigned int *nents, u32 length);
 void pci_p2pmem_free_sgl(struct pci_dev *pdev, struct scatterlist *sgl);
 void pci_p2pmem_publish(struct pci_dev *pdev, bool publish);
+int pci_p2pdma_map_sg(struct device *dev, struct scatterlist *sg, int nents,
+		      enum dma_data_direction dir);
 #else /* CONFIG_PCI_P2PDMA */
 static inline int pci_p2pdma_add_resource(struct pci_dev *pdev, int bar,
 		size_t size, u64 offset)
@@ -98,5 +100,10 @@ static inline void pci_p2pmem_free_sgl(struct pci_dev *pdev,
 static inline void pci_p2pmem_publish(struct pci_dev *pdev, bool publish)
 {
 }
+static inline int pci_p2pdma_map_sg(struct device *dev,
+		struct scatterlist *sg, int nents, enum dma_data_direction dir)
+{
+	return 0;
+}
 #endif /* CONFIG_PCI_P2PDMA */
 #endif /* _LINUX_PCI_P2PDMA_H */
-- 
2.11.0



* [PATCH v5 03/13] PCI/P2PDMA: Add PCI p2pmem DMA mappings to adjust the bus offset
@ 2018-08-30 18:53   ` Logan Gunthorpe
  0 siblings, 0 replies; 265+ messages in thread
From: Logan Gunthorpe @ 2018-08-30 18:53 UTC (permalink / raw)


The DMA address used when mapping PCI P2P memory must be the PCI bus
address. Thus, introduce pci_p2pdma_map_sg() to map the correct
addresses when using P2P memory. Memory mapped in this way does not
need to be unmapped.

For this, we assume that an SGL passed to these functions contains
either all P2P memory or no P2P memory.

Signed-off-by: Logan Gunthorpe <logang at deltatee.com>
---
 drivers/pci/p2pdma.c       | 43 +++++++++++++++++++++++++++++++++++++++++++
 include/linux/memremap.h   |  1 +
 include/linux/pci-p2pdma.h |  7 +++++++
 3 files changed, 51 insertions(+)

diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c
index 67c1daf1189e..29bd40a87768 100644
--- a/drivers/pci/p2pdma.c
+++ b/drivers/pci/p2pdma.c
@@ -191,6 +191,8 @@ int pci_p2pdma_add_resource(struct pci_dev *pdev, int bar, size_t size,
 	pgmap->res.flags = pci_resource_flags(pdev, bar);
 	pgmap->ref = &pdev->p2pdma->devmap_ref;
 	pgmap->type = MEMORY_DEVICE_PCI_P2PDMA;
+	pgmap->pci_p2pdma_bus_offset = pci_bus_address(pdev, bar) -
+		pci_resource_start(pdev, bar);
 
 	addr = devm_memremap_pages(&pdev->dev, pgmap);
 	if (IS_ERR(addr)) {
@@ -813,3 +815,44 @@ void pci_p2pmem_publish(struct pci_dev *pdev, bool publish)
 	pdev->p2pdma->p2pmem_published = publish;
 }
 EXPORT_SYMBOL_GPL(pci_p2pmem_publish);
+
+/**
+ * pci_p2pdma_map_sg - map a PCI peer-to-peer scatterlist for DMA
+ * @dev: device doing the DMA request
+ * @sg: scatter list to map
+ * @nents: elements in the scatterlist
+ * @dir: DMA direction
+ *
+ * Scatterlists mapped with this function should not be unmapped in any way.
+ *
+ * Returns the number of SG entries mapped or 0 on error.
+ */
+int pci_p2pdma_map_sg(struct device *dev, struct scatterlist *sg, int nents,
+		      enum dma_data_direction dir)
+{
+	struct dev_pagemap *pgmap;
+	struct scatterlist *s;
+	phys_addr_t paddr;
+	int i;
+
+	/*
+	 * p2pdma mappings are not compatible with devices that use
+	 * dma_virt_ops. If the upper layers do the right thing
+	 * this should never happen because it will be prevented
+	 * by the check in pci_p2pdma_add_client()
+	 */
+	if (WARN_ON_ONCE(IS_ENABLED(CONFIG_DMA_VIRT_OPS) &&
+			 dev->dma_ops == &dma_virt_ops))
+		return 0;
+
+	for_each_sg(sg, s, nents, i) {
+		pgmap = sg_page(s)->pgmap;
+		paddr = sg_phys(s);
+
+		s->dma_address = paddr - pgmap->pci_p2pdma_bus_offset;
+		sg_dma_len(s) = s->length;
+	}
+
+	return nents;
+}
+EXPORT_SYMBOL_GPL(pci_p2pdma_map_sg);
diff --git a/include/linux/memremap.h b/include/linux/memremap.h
index 9553370ebdad..0ac69ddf5fc4 100644
--- a/include/linux/memremap.h
+++ b/include/linux/memremap.h
@@ -125,6 +125,7 @@ struct dev_pagemap {
 	struct device *dev;
 	void *data;
 	enum memory_type type;
+	u64 pci_p2pdma_bus_offset;
 };
 
 #ifdef CONFIG_ZONE_DEVICE
diff --git a/include/linux/pci-p2pdma.h b/include/linux/pci-p2pdma.h
index 7b2b0f547528..2f03dbbf5af6 100644
--- a/include/linux/pci-p2pdma.h
+++ b/include/linux/pci-p2pdma.h
@@ -36,6 +36,8 @@ struct scatterlist *pci_p2pmem_alloc_sgl(struct pci_dev *pdev,
 					 unsigned int *nents, u32 length);
 void pci_p2pmem_free_sgl(struct pci_dev *pdev, struct scatterlist *sgl);
 void pci_p2pmem_publish(struct pci_dev *pdev, bool publish);
+int pci_p2pdma_map_sg(struct device *dev, struct scatterlist *sg, int nents,
+		      enum dma_data_direction dir);
 #else /* CONFIG_PCI_P2PDMA */
 static inline int pci_p2pdma_add_resource(struct pci_dev *pdev, int bar,
 		size_t size, u64 offset)
@@ -98,5 +100,10 @@ static inline void pci_p2pmem_free_sgl(struct pci_dev *pdev,
 static inline void pci_p2pmem_publish(struct pci_dev *pdev, bool publish)
 {
 }
+static inline int pci_p2pdma_map_sg(struct device *dev,
+		struct scatterlist *sg, int nents, enum dma_data_direction dir)
+{
+	return 0;
+}
 #endif /* CONFIG_PCI_P2PDMA */
 #endif /* _LINUX_PCI_P2P_H */
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 265+ messages in thread

* [PATCH v5 04/13] PCI/P2PDMA: Introduce configfs/sysfs enable attribute helpers
  2018-08-30 18:53 ` Logan Gunthorpe
  (?)
  (?)
@ 2018-08-30 18:53   ` Logan Gunthorpe
  -1 siblings, 0 replies; 265+ messages in thread
From: Logan Gunthorpe @ 2018-08-30 18:53 UTC (permalink / raw)
  To: linux-kernel, linux-pci, linux-nvme, linux-rdma, linux-nvdimm,
	linux-block
  Cc: Christian König, Benjamin Herrenschmidt, Alex Williamson,
	Jérôme Glisse, Jason Gunthorpe, Bjorn Helgaas,
	Max Gurtovoy, Christoph Hellwig

Users of the P2PDMA infrastructure will typically need a way for
the user to tell the kernel to use P2P resources. Usually this
will be a simple on/off boolean setting, but sometimes it may be
desirable to let the user specify the exact device to use for the
P2P operation.

Add new helpers for attributes which take a boolean or a PCI device.
Any boolean or the word 'auto' turns P2P on or off. Specifying a full
PCI device name/BDF selects that specific device.

Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
---
 drivers/pci/p2pdma.c       | 83 ++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/pci-p2pdma.h | 15 +++++++++
 2 files changed, 98 insertions(+)

diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c
index 29bd40a87768..3da848f3fe72 100644
--- a/drivers/pci/p2pdma.c
+++ b/drivers/pci/p2pdma.c
@@ -9,6 +9,7 @@
  */
 
 #define pr_fmt(fmt) "pci-p2pdma: " fmt
+#include <linux/ctype.h>
 #include <linux/pci-p2pdma.h>
 #include <linux/module.h>
 #include <linux/slab.h>
@@ -856,3 +857,85 @@ int pci_p2pdma_map_sg(struct device *dev, struct scatterlist *sg, int nents,
 	return nents;
 }
 EXPORT_SYMBOL_GPL(pci_p2pdma_map_sg);
+
+/**
+ * pci_p2pdma_enable_store - parse a configfs/sysfs attribute store
+ *		to enable p2pdma
+ * @page: contents of the value to be stored
+ * @p2p_dev: returns the PCI device that was selected to be used
+ *		(only set when the stored value names a PCI device)
+ * @use_p2pdma: returns whether to enable p2pdma or not
+ *
+ * Parses an attribute value to decide whether to enable p2pdma.
+ * The value can select a PCI device (using its full BDF device
+ * name), a boolean, or 'auto'. 'auto' and a true boolean value
+ * have the same meaning. A false value disables p2pdma and a
+ * PCI device name enables it, selecting that device as the
+ * backing provider.
+ *
+ * pci_p2pdma_enable_show() should be used as the show operation for
+ * the attribute.
+ *
+ * Returns 0 on success
+ */
+int pci_p2pdma_enable_store(const char *page, struct pci_dev **p2p_dev,
+			    bool *use_p2pdma)
+{
+	struct device *dev;
+
+	dev = bus_find_device_by_name(&pci_bus_type, NULL, page);
+	if (dev) {
+		*use_p2pdma = true;
+		*p2p_dev = to_pci_dev(dev);
+
+		if (!pci_has_p2pmem(*p2p_dev)) {
+			pr_err("PCI device has no peer-to-peer memory: %s\n",
+			       page);
+			pci_dev_put(*p2p_dev);
+			return -ENODEV;
+		}
+
+		return 0;
+	} else if (sysfs_streq(page, "auto")) {
+		*use_p2pdma = true;
+		return 0;
+	} else if ((page[0] == '0' || page[0] == '1') && !iscntrl(page[1])) {
+		/*
+		 * If the user enters a PCI device that doesn't exist
+		 * like "0000:01:00.1", we don't want strtobool to think
+		 * it's a '0' when it's clearly not what the user wanted.
+		 * So we require 0's and 1's to be exactly one character.
+		 */
+	} else if (!strtobool(page, use_p2pdma)) {
+		return 0;
+	}
+
+	pr_err("No such PCI device: %.*s\n", (int)strcspn(page, "\n"), page);
+	return -ENODEV;
+}
+EXPORT_SYMBOL_GPL(pci_p2pdma_enable_store);
+
+/**
+ * pci_p2pdma_enable_show - show a configfs/sysfs attribute indicating
+ *		whether p2pdma is enabled
+ * @page: contents of the stored value
+ * @p2p_dev: the selected p2p device (NULL if no device is selected)
+ * @use_p2pdma: whether p2pdma has been enabled
+ *
+ * Attributes that use pci_p2pdma_enable_store() should use this function
+ * to show the value of the attribute.
+ *
+ * Returns the number of bytes printed to @page
+ */
+ssize_t pci_p2pdma_enable_show(char *page, struct pci_dev *p2p_dev,
+			       bool use_p2pdma)
+{
+	if (!use_p2pdma)
+		return sprintf(page, "none\n");
+
+	if (!p2p_dev)
+		return sprintf(page, "auto\n");
+
+	return sprintf(page, "%s\n", pci_name(p2p_dev));
+}
+EXPORT_SYMBOL_GPL(pci_p2pdma_enable_show);
diff --git a/include/linux/pci-p2pdma.h b/include/linux/pci-p2pdma.h
index 2f03dbbf5af6..377de4d73767 100644
--- a/include/linux/pci-p2pdma.h
+++ b/include/linux/pci-p2pdma.h
@@ -38,6 +38,10 @@ void pci_p2pmem_free_sgl(struct pci_dev *pdev, struct scatterlist *sgl);
 void pci_p2pmem_publish(struct pci_dev *pdev, bool publish);
 int pci_p2pdma_map_sg(struct device *dev, struct scatterlist *sg, int nents,
 		      enum dma_data_direction dir);
+int pci_p2pdma_enable_store(const char *page, struct pci_dev **p2p_dev,
+			    bool *use_p2pdma);
+ssize_t pci_p2pdma_enable_show(char *page, struct pci_dev *p2p_dev,
+			       bool use_p2pdma);
 #else /* CONFIG_PCI_P2PDMA */
 static inline int pci_p2pdma_add_resource(struct pci_dev *pdev, int bar,
 		size_t size, u64 offset)
@@ -105,5 +109,16 @@ static inline int pci_p2pdma_map_sg(struct device *dev,
 {
 	return 0;
 }
+static inline int pci_p2pdma_enable_store(const char *page,
+		struct pci_dev **p2p_dev, bool *use_p2pdma)
+{
+	*use_p2pdma = false;
+	return 0;
+}
+static inline ssize_t pci_p2pdma_enable_show(char *page,
+		struct pci_dev *p2p_dev, bool use_p2pdma)
+{
+	return sprintf(page, "none\n");
+}
 #endif /* CONFIG_PCI_P2PDMA */
 #endif /* _LINUX_PCI_P2P_H */
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 265+ messages in thread

* [PATCH v5 05/13] docs-rst: Add a new directory for PCI documentation
  2018-08-30 18:53 ` Logan Gunthorpe
  (?)
  (?)
@ 2018-08-30 18:53   ` Logan Gunthorpe
  -1 siblings, 0 replies; 265+ messages in thread
From: Logan Gunthorpe @ 2018-08-30 18:53 UTC (permalink / raw)
  To: linux-kernel, linux-pci, linux-nvme, linux-rdma, linux-nvdimm,
	linux-block
  Cc: Sanyog Kale, Thierry Reding, Christian König, Vinod Koul,
	Benjamin Herrenschmidt, Linus Walleij, Jonathan Corbet,
	Alex Williamson, Jérôme Glisse, Jason Gunthorpe,
	Sagar Dharia, Greg Kroah-Hartman, Bjorn Helgaas, Max Gurtovoy,
	Mauro Carvalho Chehab, Christoph Hellwig

Add a new directory in the driver API guide for PCI-specific
documentation.

This is in preparation for adding a new PCI P2P DMA driver writer's
guide which will go in this directory.

Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Vinod Koul <vinod.koul@intel.com>
Cc: Linus Walleij <linus.walleij@linaro.org>
Cc: Logan Gunthorpe <logang@deltatee.com>
Cc: Thierry Reding <treding@nvidia.com>
Cc: Sanyog Kale <sanyog.r.kale@intel.com>
Cc: Sagar Dharia <sdharia@codeaurora.org>
---
 Documentation/driver-api/index.rst         |  2 +-
 Documentation/driver-api/pci/index.rst     | 20 ++++++++++++++++++++
 Documentation/driver-api/{ => pci}/pci.rst |  0
 3 files changed, 21 insertions(+), 1 deletion(-)
 create mode 100644 Documentation/driver-api/pci/index.rst
 rename Documentation/driver-api/{ => pci}/pci.rst (100%)

diff --git a/Documentation/driver-api/index.rst b/Documentation/driver-api/index.rst
index 6d9f2f9fe20e..e9e7d24169cf 100644
--- a/Documentation/driver-api/index.rst
+++ b/Documentation/driver-api/index.rst
@@ -29,7 +29,7 @@ available subsections can be seen below.
    iio/index
    input
    usb/index
-   pci
+   pci/index
    spi
    i2c
    hsi
diff --git a/Documentation/driver-api/pci/index.rst b/Documentation/driver-api/pci/index.rst
new file mode 100644
index 000000000000..eaf20b24bf7d
--- /dev/null
+++ b/Documentation/driver-api/pci/index.rst
@@ -0,0 +1,20 @@
+.. SPDX-License-Identifier: GPL-2.0
+============================================
+The Linux PCI driver implementer's API guide
+============================================
+
+.. class:: toc-title
+
+	   Table of contents
+
+.. toctree::
+   :maxdepth: 2
+
+   pci
+
+.. only::  subproject and html
+
+   Indices
+   =======
+
+   * :ref:`genindex`
diff --git a/Documentation/driver-api/pci.rst b/Documentation/driver-api/pci/pci.rst
similarity index 100%
rename from Documentation/driver-api/pci.rst
rename to Documentation/driver-api/pci/pci.rst
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 265+ messages in thread

* [PATCH v5 06/13] PCI/P2PDMA: Add P2P DMA driver writer's documentation
  2018-08-30 18:53 ` Logan Gunthorpe
                     ` (2 preceding siblings ...)
  (?)
@ 2018-08-30 18:53   ` Logan Gunthorpe
  -1 siblings, 0 replies; 265+ messages in thread
From: Logan Gunthorpe @ 2018-08-30 18:53 UTC (permalink / raw)
  To: linux-kernel, linux-pci, linux-nvme, linux-rdma, linux-nvdimm,
	linux-block
  Cc: Christian König, Benjamin Herrenschmidt, Jonathan Corbet,
	Alex Williamson, Jérôme Glisse, Jason Gunthorpe,
	Bjorn Helgaas, Max Gurtovoy, Christoph Hellwig

Add a reStructuredText file describing how to write drivers
with support for P2P DMA transactions. The document describes
how to use the APIs that were added in the previous few
commits.

Also add an index for the PCI documentation tree, even though this
is the only PCI document that has been converted to reStructuredText
at this time.

Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Cc: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/driver-api/pci/index.rst  |   1 +
 Documentation/driver-api/pci/p2pdma.rst | 170 ++++++++++++++++++++++++++++++++
 2 files changed, 171 insertions(+)
 create mode 100644 Documentation/driver-api/pci/p2pdma.rst

diff --git a/Documentation/driver-api/pci/index.rst b/Documentation/driver-api/pci/index.rst
index eaf20b24bf7d..ecc7416c523b 100644
--- a/Documentation/driver-api/pci/index.rst
+++ b/Documentation/driver-api/pci/index.rst
@@ -11,6 +11,7 @@ The Linux PCI driver implementer's API guide
    :maxdepth: 2
 
    pci
+   p2pdma
 
 .. only::  subproject and html
 
diff --git a/Documentation/driver-api/pci/p2pdma.rst b/Documentation/driver-api/pci/p2pdma.rst
new file mode 100644
index 000000000000..ac857450d53f
--- /dev/null
+++ b/Documentation/driver-api/pci/p2pdma.rst
@@ -0,0 +1,170 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+============================
+PCI Peer-to-Peer DMA Support
+============================
+
+The PCI bus has pretty decent support for performing DMA transfers
+between two devices on the bus. This type of transaction is henceforth
+called Peer-to-Peer (or P2P). However, there are a number of issues that
+make P2P transactions tricky to do in a perfectly safe way.
+
+One of the biggest issues is that PCI doesn't require forwarding
+transactions between hierarchy domains, and in PCIe, each Root Port
+defines a separate hierarchy domain. To make things worse, there is no
+simple way to determine if a given Root Complex supports this or not.
+(See PCIe r4.0, sec 1.3.1.) Therefore, as of this writing, the kernel
+only supports doing P2P when the endpoints involved are all behind the
+same PCI bridge. Such devices are all in the same PCI hierarchy
+domain, and the spec guarantees that all transactions within a
+hierarchy will be routable; it does not require routing
+between hierarchies.
+
+The second issue is that to make use of existing interfaces in Linux,
+memory that is used for P2P transactions needs to be backed by struct
+pages. However, PCI BARs are not typically cache coherent, so there
+are a few corner-case gotchas with these pages and developers need to
+be careful about what they do with them.
+
+
+Driver Writer's Guide
+=====================
+
+In a given P2P implementation there may be three or more different
+types of kernel drivers in play:
+
+* Provider - A driver which provides or publishes P2P resources like
+  memory or doorbell registers to other drivers.
+* Client - A driver which makes use of a resource by setting up a
+  DMA transaction to or from it.
+* Orchestrator - A driver which orchestrates the flow of data between
+  clients and providers.
+
+In many cases there could be overlap between these three types (i.e.,
+it may be typical for a driver to be both a provider and a client).
+
+For example, in the NVMe Target Copy Offload implementation:
+
+* The NVMe PCI driver is a client, a provider and an orchestrator
+  in that it exposes any CMB (Controller Memory Buffer) as a P2P memory
+  resource (provider), accepts P2P memory pages as buffers in requests
+  to be used directly (client), and can also make use of the CMB for
+  submission queue entries.
+* The RDMA driver is a client in this arrangement so that an RNIC
+  can DMA directly to the memory exposed by the NVMe device.
+* The NVMe Target driver (nvmet) can orchestrate the data from the RNIC
+  to the P2P memory (CMB) and then to the NVMe device (and vice versa).
+
+This is currently the only arrangement supported by the kernel but
+one could imagine slight tweaks to this that would allow for the same
+functionality. For example, if a specific RNIC added a BAR with some
+memory behind it, its driver could add support as a P2P provider and
+then the NVMe Target could use the RNIC's memory instead of the CMB
+in cases where the NVMe cards in use do not have CMB support.
+
+
+Provider Drivers
+----------------
+
+A provider simply needs to register a BAR (or a portion of a BAR)
+as a P2P DMA resource using :c:func:`pci_p2pdma_add_resource()`.
+This will register struct pages for all the specified memory.
+
+After that it may optionally publish all of its resources as
+P2P memory using :c:func:`pci_p2pmem_publish()`. This will allow
+any orchestrator drivers to find and use the memory. When marked in
+this way, the resource must be regular memory with no side effects.
+
+For the time being this is fairly rudimentary in that all resources
+are typically going to be P2P memory. Future work will likely expand
+this to include other types of resources like doorbells.
+
+
+Client Drivers
+--------------
+
+A client driver typically only has to conditionally change its DMA map
+routine to use the mapping function :c:func:`pci_p2pdma_map_sg()` instead
+of the usual :c:func:`dma_map_sg()` function. Memory mapped in this
+way does not need to be unmapped.
+
+The client may also, optionally, make use of
+:c:func:`is_pci_p2pdma_page()` to determine when to use the P2P mapping
+functions and when to use the regular mapping functions. In some
+situations, it may be more appropriate to use a flag to indicate a
+given request is P2P memory and map appropriately (for example the
+block layer uses a flag to keep P2P memory out of queues that do not
+have P2P client support). It is important to ensure that struct pages that
+back P2P memory stay out of code that does not have support for them.
+
+
+Orchestrator Drivers
+--------------------
+
+The first task an orchestrator driver must do is compile a list of
+all client devices that will be involved in a given transaction. For
+example, the NVMe Target driver creates a list including all NVMe
+devices and the RNIC in use. The list is stored as an anonymous struct
+list_head which must be initialized with the usual INIT_LIST_HEAD.
+Clients may then be added to or removed from the list, and the list
+freed, with :c:func:`pci_p2pdma_add_client()`,
+:c:func:`pci_p2pdma_remove_client()` and
+:c:func:`pci_p2pdma_client_list_free()` respectively.
+
+With the client list in hand, the orchestrator may then call
+:c:func:`pci_p2pmem_find()` to obtain a published P2P memory provider
+that is supported by (i.e. behind the same root port as) all the
+clients. If more than one provider is supported, the one nearest to
+all the clients will be chosen first. If more than one provider is an
+equal distance away, the one returned will be chosen at random. This
+function returns the provider's PCI device with a reference taken, so
+when it is no longer needed it should be released with pci_dev_put().
+
+Alternatively, if the orchestrator knows (via some other means)
+which provider it wants to use it may use :c:func:`pci_has_p2pmem()`
+to determine if it has P2P memory and :c:func:`pci_p2pdma_distance()`
+to determine the cumulative distance between it and a potential
+list of clients.
+
+With a supported provider in hand, the driver can then call
+:c:func:`pci_p2pdma_assign_provider()` to assign the provider
+to the client list. This function returns false if any of the
+clients are unsupported by the provider.
+
+Once a provider is assigned to a client list via either
+:c:func:`pci_p2pmem_find()` or :c:func:`pci_p2pdma_assign_provider()`,
+the list is permanently bound to the provider such that any new clients
+added to the list must be supported by the already selected provider.
+If they are not supported, :c:func:`pci_p2pdma_add_client()` will return
+an error. In this way, orchestrators are free to add and remove devices
+without having to recheck support or tear down existing transfers to
+change P2P providers.
+
+Once a provider is selected, the orchestrator can then use
+:c:func:`pci_alloc_p2pmem()` and :c:func:`pci_free_p2pmem()` to
+allocate P2P memory from the provider. :c:func:`pci_p2pmem_alloc_sgl()`
+and :c:func:`pci_p2pmem_free_sgl()` are convenience functions for
+allocating scatter-gather lists with P2P memory.
+
+Struct Page Caveats
+-------------------
+
+Driver writers should be very careful not to pass these special
+struct pages to code that isn't prepared for them. At this time, the
+kernel interfaces do not have any checks to ensure this. This obviously
+precludes passing these pages to userspace.
+
+P2P memory is also technically IO memory but should never have any side
+effects behind it. Thus, the order of loads and stores should not be important
+and ioreadX(), iowriteX() and friends should not be necessary.
+However, as the memory is not cache coherent, if access ever needs to
+be protected by a spinlock then :c:func:`mmiowb()` must be used before
+unlocking the lock. (See ACQUIRES VS I/O ACCESSES in
+Documentation/memory-barriers.txt)
+
+
+P2P DMA Support Library
+=======================
+
+.. kernel-doc:: drivers/pci/p2pdma.c
+   :export:
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 265+ messages in thread

* [PATCH v5 06/13] PCI/P2PDMA: Add P2P DMA driver writer's documentation
@ 2018-08-30 18:53   ` Logan Gunthorpe
  0 siblings, 0 replies; 265+ messages in thread
From: Logan Gunthorpe @ 2018-08-30 18:53 UTC (permalink / raw)
  To: linux-kernel, linux-pci, linux-nvme, linux-rdma, linux-nvdimm,
	linux-block
  Cc: Stephen Bates, Christoph Hellwig, Keith Busch, Sagi Grimberg,
	Bjorn Helgaas, Jason Gunthorpe, Max Gurtovoy, Dan Williams,
	Jérôme Glisse, Benjamin Herrenschmidt, Alex Williamson,
	Christian König, Logan Gunthorpe, Jonathan Corbet

Add a restructured text file describing how to write drivers
with support for P2P DMA transactions. The document describes
how to use the APIs that were added in the previous few
commits.

Also adds an index for the PCI documentation tree even though this
is the only PCI document that has been converted to restructured text
at this time.

Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Cc: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/driver-api/pci/index.rst  |   1 +
 Documentation/driver-api/pci/p2pdma.rst | 170 ++++++++++++++++++++++++++++++++
 2 files changed, 171 insertions(+)
 create mode 100644 Documentation/driver-api/pci/p2pdma.rst

diff --git a/Documentation/driver-api/pci/index.rst b/Documentation/driver-api/pci/index.rst
index eaf20b24bf7d..ecc7416c523b 100644
--- a/Documentation/driver-api/pci/index.rst
+++ b/Documentation/driver-api/pci/index.rst
@@ -11,6 +11,7 @@ The Linux PCI driver implementer's API guide
    :maxdepth: 2
 
    pci
+   p2pdma
 
 .. only::  subproject and html
 
diff --git a/Documentation/driver-api/pci/p2pdma.rst b/Documentation/driver-api/pci/p2pdma.rst
new file mode 100644
index 000000000000..ac857450d53f
--- /dev/null
+++ b/Documentation/driver-api/pci/p2pdma.rst
@@ -0,0 +1,170 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+============================
+PCI Peer-to-Peer DMA Support
+============================
+
+The PCI bus has pretty decent support for performing DMA transfers
+between two devices on the bus. This type of transaction is henceforth
+called Peer-to-Peer (or P2P). However, there are a number of issues that
+make P2P transactions tricky to do in a perfectly safe way.
+
+One of the biggest issues is that PCI doesn't require forwarding
+transactions between hierarchy domains, and in PCIe, each Root Port
+defines a separate hierarchy domain. To make things worse, there is no
+simple way to determine if a given Root Complex supports this or not.
+(See PCIe r4.0, sec 1.3.1). Therefore, as of this writing, the kernel
+only supports doing P2P when the endpoints involved are all behind the
+same PCI bridge, as such devices are all in the same PCI hierarchy
+domain, and the spec guarantees that all transacations within the
+hierarchy will be routable, but it does not require routing
+between hierarchies.
+
+The second issue is that to make use of existing interfaces in Linux,
+memory that is used for P2P transactions needs to be backed by struct
+pages. However, PCI BARs are not typically cache coherent so there are
+a few corner case gotchas with these pages so developers need to
+be careful about what they do with them.
+
+
+Driver Writer's Guide
+=====================
+
+In a given P2P implementation there may be three or more different
+types of kernel drivers in play:
+
+* Provider - A driver which provides or publishes P2P resources like
+  memory or doorbell registers to other drivers.
+* Client - A driver which makes use of a resource by setting up a
+  DMA transaction to or from it.
+* Orchestrator - A driver which orchestrates the flow of data between
+  clients and providers
+
+In many cases there could be overlap between these three types (i.e.,
+it may be typical for a driver to be both a provider and a client).
+
+For example, in the NVMe Target Copy Offload implementation:
+
+* The NVMe PCI driver is both a client, provider and orchestrator
+  in that it exposes any CMB (Controller Memory Buffer) as a P2P memory
+  resource (provider), it accepts P2P memory pages as buffers in requests
+  to be used directly (client) and it can also make use the CMB as
+  submission queue entries.
+* The RDMA driver is a client in this arrangement so that an RNIC
+  can DMA directly to the memory exposed by the NVMe device.
+* The NVMe Target driver (nvmet) can orchestrate the data from the RNIC
+  to the P2P memory (CMB) and then to the NVMe device (and vice versa).
+
+This is currently the only arrangement supported by the kernel but
+one could imagine slight tweaks to this that would allow for the same
+functionality. For example, if a specific RNIC added a BAR with some
+memory behind it, its driver could add support as a P2P provider and
+then the NVMe Target could use the RNIC's memory instead of the CMB
+in cases where the NVMe cards in use do not have CMB support.
+
+
+Provider Drivers
+----------------
+
+A provider simply needs to register a BAR (or a portion of a BAR)
+as a P2P DMA resource using :c:func:`pci_p2pdma_add_resource()`.
+This will register struct pages for all the specified memory.
+
+After that it may optionally publish all of its resources as
+P2P memory using :c:func:`pci_p2pmem_publish()`. This will allow
+any orchestrator drivers to find and use the memory. When marked in
+this way, the resource must be regular memory with no side effects.
+
+For the time being this is fairly rudimentary in that all resources
+are typically going to be P2P memory. Future work will likely expand
+this to include other types of resources like doorbells.
+
+
+Client Drivers
+--------------
+
+A client driver typically only has to conditionally change its DMA map
+routine to use the mapping function :c:func:`pci_p2pdma_map_sg()` instead
+of the usual :c:func:`dma_map_sg()` function. Memory mapped in this
+way does not need to be unmapped.
+
+The client may also, optionally, make use of
+:c:func:`is_pci_p2pdma_page()` to determine when to use the P2P mapping
+functions and when to use the regular mapping functions. In some
+situations, it may be more appropriate to use a flag to indicate a
+given request is P2P memory and map appropriately (for example the
+block layer uses a flag to keep P2P memory out of queues that do not
+have P2P client support). It is important to ensure that struct pages that
+back P2P memory stay out of code that does not have support for them.
+
+
+Orchestrator Drivers
+--------------------
+
+The first task an orchestrator driver must do is compile a list of
+all client devices that will be involved in a given transaction. For
+example, the NVMe Target driver creates a list including all NVMe
+devices and the RNIC in use. The list is stored as an anonymous struct
+list_head which must be initialized with the usual INIT_LIST_HEAD.
+The following functions may then be used to add to, remove from and free
+the list of clients with the functions :c:func:`pci_p2pdma_add_client()`,
+:c:func:`pci_p2pdma_remove_client()` and
+:c:func:`pci_p2pdma_client_list_free()`.
+
+With the client list in hand, the orchestrator may then call
+:c:func:`pci_p2pmem_find()` to obtain a published P2P memory provider
+that is supported (behind the same root port) as all the clients. If more
+than one provider is supported, the one nearest to all the clients will
+be chosen first. If there are more than one provider is an equal distance
+away, the one returned will be chosen at random. This function returns the PCI
+device to use for the provider with a reference taken and therefore
+when it's no longer needed it should be returned with pci_dev_put().
+
+Alternatively, if the orchestrator knows (via some other means)
+which provider it wants to use it may use :c:func:`pci_has_p2pmem()`
+to determine if it has P2P memory and :c:func:`pci_p2pdma_distance()`
+to determine the cumulative distance between it and a potential
+list of clients.
+
+With a supported provider in hand, the driver can then call
+:c:func:`pci_p2pdma_assign_provider()` to assign the provider
+to the client list. This function returns false if any of the
+clients are unsupported by the provider.
+
+Once a provider is assigned to a client list via either
+:c:func:`pci_p2pmem_find()` or :c:func:`pci_p2pdma_assign_provider()`,
+the list is permanently bound to the provider such that any new clients
+added to the list must be supported by the already selected provider.
+If they are not supported, :c:func:`pci_p2pdma_add_client()` will return
+an error. In this way, orchestrators are free to add and remove devices
+without having to recheck support or tear down existing transfers to
+change P2P providers.
+
+Once a provider is selected, the orchestrator can then use
+:c:func:`pci_alloc_p2pmem()` and :c:func:`pci_free_p2pmem()` to
+allocate P2P memory from the provider. :c:func:`pci_p2pmem_alloc_sgl()`
+and :c:func:`pci_p2pmem_free_sgl()` are convenience functions for
+allocating scatter-gather lists with P2P memory.
+
+Struct Page Caveats
+-------------------
+
+Driver writers should be very careful about not passing these special
+struct pages to code that isn't prepared for it. At this time, the kernel
+interfaces do not have any checks for ensuring this. This obviously
+precludes passing these pages to userspace.
+
+P2P memory is also technically IO memory but should never have any side
+effects behind it. Thus, the order of loads and stores should not be important
+and ioreadX(), iowriteX() and friends should not be necessary.
+However, as the memory is not cache coherent, if access ever needs to
+be protected by a spinlock then :c:func:`mmiowb()` must be used before
+unlocking the lock. (See ACQUIRES VS I/O ACCESSES in
+Documentation/memory-barriers.txt)
+
+
+P2P DMA Support Library
+=====================
+
+.. kernel-doc:: drivers/pci/p2pdma.c
+   :export:
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 265+ messages in thread

* [PATCH v5 06/13] PCI/P2PDMA: Add P2P DMA driver writer's documentation
@ 2018-08-30 18:53   ` Logan Gunthorpe
  0 siblings, 0 replies; 265+ messages in thread
From: Logan Gunthorpe @ 2018-08-30 18:53 UTC (permalink / raw)
  To: linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-pci-u79uwXL29TY76Z2rM5mHXA,
	linux-nvme-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	linux-nvdimm-hn68Rpc1hR1g9hUCZPvPmw,
	linux-block-u79uwXL29TY76Z2rM5mHXA
  Cc: Christian König, Benjamin Herrenschmidt, Jonathan Corbet,
	Alex Williamson, Jérôme Glisse, Jason Gunthorpe,
	Bjorn Helgaas, Max Gurtovoy, Christoph Hellwig

Add a restructured text file describing how to write drivers
with support for P2P DMA transactions. The document describes
how to use the APIs that were added in the previous few
commits.

Also adds an index for the PCI documentation tree even though this
is the only PCI document that has been converted to restructured text
at this time.

Signed-off-by: Logan Gunthorpe <logang-OTvnGxWRz7hWk0Htik3J/w@public.gmane.org>
Cc: Jonathan Corbet <corbet-T1hC0tSOHrs@public.gmane.org>
---
 Documentation/driver-api/pci/index.rst  |   1 +
 Documentation/driver-api/pci/p2pdma.rst | 170 ++++++++++++++++++++++++++++++++
 2 files changed, 171 insertions(+)
 create mode 100644 Documentation/driver-api/pci/p2pdma.rst

diff --git a/Documentation/driver-api/pci/index.rst b/Documentation/driver-api/pci/index.rst
index eaf20b24bf7d..ecc7416c523b 100644
--- a/Documentation/driver-api/pci/index.rst
+++ b/Documentation/driver-api/pci/index.rst
@@ -11,6 +11,7 @@ The Linux PCI driver implementer's API guide
    :maxdepth: 2
 
    pci
+   p2pdma
 
 .. only::  subproject and html
 
diff --git a/Documentation/driver-api/pci/p2pdma.rst b/Documentation/driver-api/pci/p2pdma.rst
new file mode 100644
index 000000000000..ac857450d53f
--- /dev/null
+++ b/Documentation/driver-api/pci/p2pdma.rst
@@ -0,0 +1,170 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+============================
+PCI Peer-to-Peer DMA Support
+============================
+
+The PCI bus has pretty decent support for performing DMA transfers
+between two devices on the bus. This type of transaction is henceforth
+called Peer-to-Peer (or P2P). However, there are a number of issues that
+make P2P transactions tricky to do in a perfectly safe way.
+
+One of the biggest issues is that PCI doesn't require forwarding
+transactions between hierarchy domains, and in PCIe, each Root Port
+defines a separate hierarchy domain. To make things worse, there is no
+simple way to determine if a given Root Complex supports this or not.
+(See PCIe r4.0, sec 1.3.1). Therefore, as of this writing, the kernel
+only supports doing P2P when the endpoints involved are all behind the
+same PCI bridge, as such devices are all in the same PCI hierarchy
+domain, and the spec guarantees that all transacations within the
+hierarchy will be routable, but it does not require routing
+between hierarchies.
+
+The second issue is that to make use of existing interfaces in Linux,
+memory that is used for P2P transactions needs to be backed by struct
+pages. However, PCI BARs are not typically cache coherent so there are
+a few corner case gotchas with these pages so developers need to
+be careful about what they do with them.
+
+
+Driver Writer's Guide
+=====================
+
+In a given P2P implementation there may be three or more different
+types of kernel drivers in play:
+
+* Provider - A driver which provides or publishes P2P resources like
+  memory or doorbell registers to other drivers.
+* Client - A driver which makes use of a resource by setting up a
+  DMA transaction to or from it.
+* Orchestrator - A driver which orchestrates the flow of data between
+  clients and providers
+
+In many cases there could be overlap between these three types (i.e.,
+it may be typical for a driver to be both a provider and a client).
+
+For example, in the NVMe Target Copy Offload implementation:
+
+* The NVMe PCI driver is both a client, provider and orchestrator
+  in that it exposes any CMB (Controller Memory Buffer) as a P2P memory
+  resource (provider), it accepts P2P memory pages as buffers in requests
+  to be used directly (client) and it can also make use the CMB as
+  submission queue entries.
+* The RDMA driver is a client in this arrangement so that an RNIC
+  can DMA directly to the memory exposed by the NVMe device.
+* The NVMe Target driver (nvmet) can orchestrate the data from the RNIC
+  to the P2P memory (CMB) and then to the NVMe device (and vice versa).
+
+This is currently the only arrangement supported by the kernel but
+one could imagine slight tweaks to this that would allow for the same
+functionality. For example, if a specific RNIC added a BAR with some
+memory behind it, its driver could add support as a P2P provider and
+then the NVMe Target could use the RNIC's memory instead of the CMB
+in cases where the NVMe cards in use do not have CMB support.
+
+
+Provider Drivers
+----------------
+
+A provider simply needs to register a BAR (or a portion of a BAR)
+as a P2P DMA resource using :c:func:`pci_p2pdma_add_resource()`.
+This will register struct pages for all the specified memory.
+
+After that it may optionally publish all of its resources as
+P2P memory using :c:func:`pci_p2pmem_publish()`. This will allow
+any orchestrator drivers to find and use the memory. When marked in
+this way, the resource must be regular memory with no side effects.
+
+For the time being this is fairly rudimentary in that all resources
+are typically going to be P2P memory. Future work will likely expand
+this to include other types of resources like doorbells.
+
+
+Client Drivers
+--------------
+
+A client driver typically only has to conditionally change its DMA map
+routine to use the mapping function :c:func:`pci_p2pdma_map_sg()` instead
+of the usual :c:func:`dma_map_sg()` function. Memory mapped in this
+way does not need to be unmapped.
+
+The client may also, optionally, make use of
+:c:func:`is_pci_p2pdma_page()` to determine when to use the P2P mapping
+functions and when to use the regular mapping functions. In some
+situations, it may be more appropriate to use a flag to indicate a
+given request is P2P memory and map appropriately (for example the
+block layer uses a flag to keep P2P memory out of queues that do not
+have P2P client support). It is important to ensure that struct pages that
+back P2P memory stay out of code that does not have support for them.
+
+
+Orchestrator Drivers
+--------------------
+
+The first task an orchestrator driver must do is compile a list of
+all client devices that will be involved in a given transaction. For
+example, the NVMe Target driver creates a list including all NVMe
+devices and the RNIC in use. The list is stored as an anonymous struct
+list_head which must be initialized with the usual INIT_LIST_HEAD.
+Clients may then be added to, removed from, and the whole list freed
+using :c:func:`pci_p2pdma_add_client()`,
+:c:func:`pci_p2pdma_remove_client()` and
+:c:func:`pci_p2pdma_client_list_free()` respectively.
+
+With the client list in hand, the orchestrator may then call
+:c:func:`pci_p2pmem_find()` to obtain a published P2P memory provider
+that is supported by (behind the same root port as) all the clients. If
+more than one provider is supported, the one nearest to all the clients
+will be chosen first. If more than one provider is an equal distance
+away, the one returned will be chosen at random. This function returns
+the PCI device of the provider with a reference taken, so when it is no
+longer needed it should be released with :c:func:`pci_dev_put()`.
+
+Alternatively, if the orchestrator knows (via some other means)
+which provider it wants to use it may use :c:func:`pci_has_p2pmem()`
+to determine if it has P2P memory and :c:func:`pci_p2pdma_distance()`
+to determine the cumulative distance between it and a potential
+list of clients.
+
+With a supported provider in hand, the driver can then call
+:c:func:`pci_p2pdma_assign_provider()` to assign the provider
+to the client list. This function returns false if any of the
+clients are unsupported by the provider.
+
+Once a provider is assigned to a client list via either
+:c:func:`pci_p2pmem_find()` or :c:func:`pci_p2pdma_assign_provider()`,
+the list is permanently bound to the provider such that any new clients
+added to the list must be supported by the already selected provider.
+If they are not supported, :c:func:`pci_p2pdma_add_client()` will return
+an error. In this way, orchestrators are free to add and remove devices
+without having to recheck support or tear down existing transfers to
+change P2P providers.
+
+Once a provider is selected, the orchestrator can then use
+:c:func:`pci_alloc_p2pmem()` and :c:func:`pci_free_p2pmem()` to
+allocate P2P memory from the provider. :c:func:`pci_p2pmem_alloc_sgl()`
+and :c:func:`pci_p2pmem_free_sgl()` are convenience functions for
+allocating scatter-gather lists with P2P memory.
+
+Struct Page Caveats
+-------------------
+
+Driver writers should be very careful about not passing these special
+struct pages to code that isn't prepared for it. At this time, the kernel
+interfaces do not have any checks for ensuring this. This obviously
+precludes passing these pages to userspace.
+
+P2P memory is also technically IO memory but should never have any side
+effects behind it. Thus, the order of loads and stores should not be important
+and ioreadX(), iowriteX() and friends should not be necessary.
+However, as the memory is not cache coherent, if access ever needs to
+be protected by a spinlock then :c:func:`mmiowb()` must be used before
+unlocking the lock. (See ACQUIRES VS I/O ACCESSES in
+Documentation/memory-barriers.txt)
+
+
+P2P DMA Support Library
+=======================
+
+.. kernel-doc:: drivers/pci/p2pdma.c
+   :export:
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 265+ messages in thread

* [PATCH v5 06/13] PCI/P2PDMA: Add P2P DMA driver writer's documentation
@ 2018-08-30 18:53   ` Logan Gunthorpe
  0 siblings, 0 replies; 265+ messages in thread
From: Logan Gunthorpe @ 2018-08-30 18:53 UTC (permalink / raw)
  To: linux-kernel, linux-pci, linux-nvme, linux-rdma, linux-nvdimm,
	linux-block
  Cc: Sagi Grimberg, Christian König, Benjamin Herrenschmidt,
	Jonathan Corbet, Alex Williamson, Stephen Bates, Keith Busch,
	Jérôme Glisse, Jason Gunthorpe, Bjorn Helgaas,
	Max Gurtovoy, Dan Williams, Logan Gunthorpe, Christoph Hellwig

Add a reStructuredText file describing how to write drivers
with support for P2P DMA transactions. The document describes
how to use the APIs that were added in the previous few
commits.

Also adds an index for the PCI documentation tree even though this
is the only PCI document that has been converted to reStructuredText
at this time.

Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Cc: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/driver-api/pci/index.rst  |   1 +
 Documentation/driver-api/pci/p2pdma.rst | 170 ++++++++++++++++++++++++++++++++
 2 files changed, 171 insertions(+)
 create mode 100644 Documentation/driver-api/pci/p2pdma.rst

diff --git a/Documentation/driver-api/pci/index.rst b/Documentation/driver-api/pci/index.rst
index eaf20b24bf7d..ecc7416c523b 100644
--- a/Documentation/driver-api/pci/index.rst
+++ b/Documentation/driver-api/pci/index.rst
@@ -11,6 +11,7 @@ The Linux PCI driver implementer's API guide
    :maxdepth: 2
 
    pci
+   p2pdma
 
 .. only::  subproject and html
 
diff --git a/Documentation/driver-api/pci/p2pdma.rst b/Documentation/driver-api/pci/p2pdma.rst
new file mode 100644
index 000000000000..ac857450d53f
--- /dev/null
+++ b/Documentation/driver-api/pci/p2pdma.rst
@@ -0,0 +1,170 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+============================
+PCI Peer-to-Peer DMA Support
+============================
+
+The PCI bus has pretty decent support for performing DMA transfers
+between two devices on the bus. This type of transaction is henceforth
+called Peer-to-Peer (or P2P). However, there are a number of issues that
+make P2P transactions tricky to do in a perfectly safe way.
+
+One of the biggest issues is that PCI doesn't require forwarding
+transactions between hierarchy domains, and in PCIe, each Root Port
+defines a separate hierarchy domain. To make things worse, there is no
+simple way to determine if a given Root Complex supports this or not.
+(See PCIe r4.0, sec 1.3.1). Therefore, as of this writing, the kernel
+only supports doing P2P when the endpoints involved are all behind the
+same PCI bridge, as such devices are all in the same PCI hierarchy
+domain, and the spec guarantees that all transactions within the
+hierarchy will be routable, but it does not require routing
+between hierarchies.
+
+The second issue is that to make use of existing interfaces in Linux,
+memory that is used for P2P transactions needs to be backed by struct
+pages. However, PCI BARs are not typically cache coherent so there are
+a few corner case gotchas with these pages so developers need to
+be careful about what they do with them.
+
+
+Driver Writer's Guide
+=====================
+
+In a given P2P implementation there may be three or more different
+types of kernel drivers in play:
+
+* Provider - A driver which provides or publishes P2P resources like
+  memory or doorbell registers to other drivers.
+* Client - A driver which makes use of a resource by setting up a
+  DMA transaction to or from it.
+* Orchestrator - A driver which orchestrates the flow of data between
+  clients and providers.
+
+In many cases there could be overlap between these three types (i.e.,
+it may be typical for a driver to be both a provider and a client).
+
+For example, in the NVMe Target Copy Offload implementation:
+
+* The NVMe PCI driver is a client, provider and orchestrator in that it
+  exposes any CMB (Controller Memory Buffer) as a P2P memory resource
+  (provider), accepts P2P memory pages as buffers in requests to be
+  used directly (client), and can also make use of the CMB as
+  submission queue entries.
+* The RDMA driver is a client in this arrangement so that an RNIC
+  can DMA directly to the memory exposed by the NVMe device.
+* The NVMe Target driver (nvmet) can orchestrate the data from the RNIC
+  to the P2P memory (CMB) and then to the NVMe device (and vice versa).
+
+This is currently the only arrangement supported by the kernel but
+one could imagine slight tweaks to this that would allow for the same
+functionality. For example, if a specific RNIC added a BAR with some
+memory behind it, its driver could add support as a P2P provider and
+then the NVMe Target could use the RNIC's memory instead of the CMB
+in cases where the NVMe cards in use do not have CMB support.
+
+
+Provider Drivers
+----------------
+
+A provider simply needs to register a BAR (or a portion of a BAR)
+as a P2P DMA resource using :c:func:`pci_p2pdma_add_resource()`.
+This will register struct pages for all the specified memory.
+
+After that it may optionally publish all of its resources as
+P2P memory using :c:func:`pci_p2pmem_publish()`. This will allow
+any orchestrator drivers to find and use the memory. When marked in
+this way, the resource must be regular memory with no side effects.
+
+For the time being this is fairly rudimentary in that all resources
+are typically going to be P2P memory. Future work will likely expand
+this to include other types of resources like doorbells.
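+
+As a hedged sketch (the BAR number and offset here are illustrative, not
+taken from any real device; ``pdev`` is the provider's ``struct
+pci_dev``), a provider's probe routine might do::
+
+	int rc;
+
+	/* Register all of BAR 4 as P2P DMA memory backed by struct pages */
+	rc = pci_p2pdma_add_resource(pdev, 4, pci_resource_len(pdev, 4), 0);
+	if (rc)
+		return rc;
+
+	/* Optionally publish the memory so orchestrators can find it */
+	pci_p2pmem_publish(pdev, true);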
+
+
+Client Drivers
+--------------
+
+A client driver typically only has to conditionally change its DMA map
+routine to use the mapping function :c:func:`pci_p2pdma_map_sg()` instead
+of the usual :c:func:`dma_map_sg()` function. Memory mapped in this
+way does not need to be unmapped.
+
+The client may also, optionally, make use of
+:c:func:`is_pci_p2pdma_page()` to determine when to use the P2P mapping
+functions and when to use the regular mapping functions. In some
+situations, it may be more appropriate to use a flag to indicate a
+given request is P2P memory and map appropriately (for example the
+block layer uses a flag to keep P2P memory out of queues that do not
+have P2P client support). It is important to ensure that struct pages that
+back P2P memory stay out of code that does not have support for them.
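+
+As a hedged sketch (``dev``, ``sg`` and ``nents`` stand in for a real
+driver's mapping context), the conditional mapping might look like::
+
+	/* P2P pages use the P2P mapping routine and need no unmap */
+	if (is_pci_p2pdma_page(sg_page(sg)))
+		nents = pci_p2pdma_map_sg(dev, sg, nents, DMA_BIDIRECTIONAL);
+	else
+		nents = dma_map_sg(dev, sg, nents, DMA_BIDIRECTIONAL);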
+
+
+Orchestrator Drivers
+--------------------
+
+The first task an orchestrator driver must do is compile a list of
+all client devices that will be involved in a given transaction. For
+example, the NVMe Target driver creates a list including all NVMe
+devices and the RNIC in use. The list is stored as an anonymous struct
+list_head which must be initialized with the usual INIT_LIST_HEAD.
+Clients may then be added to, removed from, and the whole list freed
+using :c:func:`pci_p2pdma_add_client()`,
+:c:func:`pci_p2pdma_remove_client()` and
+:c:func:`pci_p2pdma_client_list_free()` respectively.
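+
+A sketch of building such a list (``ctrl_dev`` and ``rnic_dev`` are
+hypothetical ``struct device`` pointers for the clients)::
+
+	struct list_head clients;
+
+	INIT_LIST_HEAD(&clients);
+
+	/* Every device that will DMA to/from the P2P memory is a client */
+	if (pci_p2pdma_add_client(&clients, ctrl_dev) ||
+	    pci_p2pdma_add_client(&clients, rnic_dev)) {
+		pci_p2pdma_client_list_free(&clients);
+		return -EINVAL;
+	}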
+
+With the client list in hand, the orchestrator may then call
+:c:func:`pci_p2pmem_find()` to obtain a published P2P memory provider
+that is supported by (behind the same root port as) all the clients. If
+more than one provider is supported, the one nearest to all the clients
+will be chosen first. If more than one provider is an equal distance
+away, the one returned will be chosen at random. This function returns
+the PCI device of the provider with a reference taken, so when it is no
+longer needed it should be released with :c:func:`pci_dev_put()`.
+
+Alternatively, if the orchestrator knows (via some other means)
+which provider it wants to use it may use :c:func:`pci_has_p2pmem()`
+to determine if it has P2P memory and :c:func:`pci_p2pdma_distance()`
+to determine the cumulative distance between it and a potential
+list of clients.
+
+With a supported provider in hand, the driver can then call
+:c:func:`pci_p2pdma_assign_provider()` to assign the provider
+to the client list. This function returns false if any of the
+clients are unsupported by the provider.
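+
+Both paths can be sketched as follows (error handling abbreviated;
+``chosen`` is a hypothetical provider the driver already knows about)::
+
+	struct pci_dev *provider;
+
+	/* Path 1: let the core pick the closest published provider */
+	provider = pci_p2pmem_find(&clients);
+	if (provider) {
+		/* ... use it, then drop the reference that was taken */
+		pci_dev_put(provider);
+	}
+
+	/* Path 2: verify and bind a provider chosen by other means */
+	if (!pci_has_p2pmem(chosen) ||
+	    !pci_p2pdma_assign_provider(chosen, &clients))
+		return -EINVAL;	/* some client is not supported */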
+
+Once a provider is assigned to a client list via either
+:c:func:`pci_p2pmem_find()` or :c:func:`pci_p2pdma_assign_provider()`,
+the list is permanently bound to the provider such that any new clients
+added to the list must be supported by the already selected provider.
+If they are not supported, :c:func:`pci_p2pdma_add_client()` will return
+an error. In this way, orchestrators are free to add and remove devices
+without having to recheck support or tear down existing transfers to
+change P2P providers.
+
+Once a provider is selected, the orchestrator can then use
+:c:func:`pci_alloc_p2pmem()` and :c:func:`pci_free_p2pmem()` to
+allocate P2P memory from the provider. :c:func:`pci_p2pmem_alloc_sgl()`
+and :c:func:`pci_p2pmem_free_sgl()` are convenience functions for
+allocating scatter-gather lists with P2P memory.
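+
+Allocation then follows the usual alloc/free pattern (the size here is
+illustrative)::
+
+	void *buf;
+
+	buf = pci_alloc_p2pmem(provider, SZ_4K);
+	if (!buf)
+		return -ENOMEM;
+
+	/* ... set up DMA to or from buf via the clients ... */
+
+	pci_free_p2pmem(provider, buf, SZ_4K);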
+
+Struct Page Caveats
+-------------------
+
+Driver writers should be very careful about not passing these special
+struct pages to code that isn't prepared for it. At this time, the kernel
+interfaces do not have any checks for ensuring this. This obviously
+precludes passing these pages to userspace.
+
+P2P memory is also technically IO memory but should never have any side
+effects behind it. Thus, the order of loads and stores should not be important
+and ioreadX(), iowriteX() and friends should not be necessary.
+However, as the memory is not cache coherent, if access ever needs to
+be protected by a spinlock then :c:func:`mmiowb()` must be used before
+unlocking the lock. (See ACQUIRES VS I/O ACCESSES in
+Documentation/memory-barriers.txt)
+
+
+P2P DMA Support Library
+=======================
+
+.. kernel-doc:: drivers/pci/p2pdma.c
+   :export:
-- 
2.11.0


_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply related	[flat|nested] 265+ messages in thread

* [PATCH v5 07/13] block: Add PCI P2P flag for request queue and check support for requests
  2018-08-30 18:53 ` Logan Gunthorpe
                     ` (2 preceding siblings ...)
  (?)
@ 2018-08-30 18:53   ` Logan Gunthorpe
  -1 siblings, 0 replies; 265+ messages in thread
From: Logan Gunthorpe @ 2018-08-30 18:53 UTC (permalink / raw)
  To: linux-kernel, linux-pci, linux-nvme, linux-rdma, linux-nvdimm,
	linux-block
  Cc: Christian König, Benjamin Herrenschmidt, Alex Williamson,
	Jérôme Glisse, Jason Gunthorpe, Bjorn Helgaas,
	Max Gurtovoy, Christoph Hellwig

QUEUE_FLAG_PCI_P2PDMA is introduced, meaning a driver's request queue
supports targeting P2P memory.

When a request is submitted we check if PCI P2PDMA memory is assigned
to the first page in the bio. If it is, we ensure the queue it's
submitted to supports it, and enforce REQ_NOMERGE.

Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
---
 block/blk-core.c       | 14 ++++++++++++++
 include/linux/blkdev.h |  3 +++
 2 files changed, 17 insertions(+)

diff --git a/block/blk-core.c b/block/blk-core.c
index dee56c282efb..cc0289c7b983 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -2264,6 +2264,20 @@ generic_make_request_checks(struct bio *bio)
 	if ((bio->bi_opf & REQ_NOWAIT) && !queue_is_rq_based(q))
 		goto not_supported;
 
+	/*
+	 * Ensure PCI P2PDMA memory is not used in requests to queues that
+	 * have no support. This should never happen if the higher layers using
+	 * P2PDMA do the right thing and use the proper P2PDMA client
+	 * infrastructure. Also, ensure such requests use REQ_NOMERGE
+	 * seeing requests can not mix P2PDMA and non-P2PDMA memory at
+	 * this time.
+	 */
+	if (bio->bi_vcnt && is_pci_p2pdma_page(bio->bi_io_vec->bv_page)) {
+		if (WARN_ON_ONCE(!blk_queue_pci_p2pdma(q)))
+			goto not_supported;
+		bio->bi_opf |= REQ_NOMERGE;
+	}
+
 	if (should_fail_bio(bio))
 		goto end_io;
 
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index d6869e0e2b64..7bf80ca802e1 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -699,6 +699,7 @@ struct request_queue {
 #define QUEUE_FLAG_SCSI_PASSTHROUGH 27	/* queue supports SCSI commands */
 #define QUEUE_FLAG_QUIESCED    28	/* queue has been quiesced */
 #define QUEUE_FLAG_PREEMPT_ONLY	29	/* only process REQ_PREEMPT requests */
+#define QUEUE_FLAG_PCI_P2PDMA  30	/* device supports pci p2p requests */
 
 #define QUEUE_FLAG_DEFAULT	((1 << QUEUE_FLAG_IO_STAT) |		\
 				 (1 << QUEUE_FLAG_SAME_COMP)	|	\
@@ -731,6 +732,8 @@ bool blk_queue_flag_test_and_clear(unsigned int flag, struct request_queue *q);
 #define blk_queue_dax(q)	test_bit(QUEUE_FLAG_DAX, &(q)->queue_flags)
 #define blk_queue_scsi_passthrough(q)	\
 	test_bit(QUEUE_FLAG_SCSI_PASSTHROUGH, &(q)->queue_flags)
+#define blk_queue_pci_p2pdma(q)	\
+	test_bit(QUEUE_FLAG_PCI_P2PDMA, &(q)->queue_flags)
 
 #define blk_noretry_request(rq) \
 	((rq)->cmd_flags & (REQ_FAILFAST_DEV|REQ_FAILFAST_TRANSPORT| \
-- 
2.11.0

_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

^ permalink raw reply related	[flat|nested] 265+ messages in thread

* [PATCH v5 07/13] block: Add PCI P2P flag for request queue and check support for requests
@ 2018-08-30 18:53   ` Logan Gunthorpe
  0 siblings, 0 replies; 265+ messages in thread
From: Logan Gunthorpe @ 2018-08-30 18:53 UTC (permalink / raw)
  To: linux-kernel, linux-pci, linux-nvme, linux-rdma, linux-nvdimm,
	linux-block
  Cc: Sagi Grimberg, Christian König, Benjamin Herrenschmidt,
	Alex Williamson, Stephen Bates, Keith Busch,
	Jérôme Glisse, Jason Gunthorpe, Bjorn Helgaas,
	Max Gurtovoy, Dan Williams, Logan Gunthorpe, Christoph Hellwig

QUEUE_FLAG_PCI_P2PDMA is introduced to indicate that a driver's request
queue supports targeting P2P memory.

When a request is submitted, we check whether PCI P2PDMA memory is
assigned to the first page in the bio. If it is, we ensure the queue it
is submitted to supports it, and enforce REQ_NOMERGE.

Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
---
 block/blk-core.c       | 14 ++++++++++++++
 include/linux/blkdev.h |  3 +++
 2 files changed, 17 insertions(+)

diff --git a/block/blk-core.c b/block/blk-core.c
index dee56c282efb..cc0289c7b983 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -2264,6 +2264,20 @@ generic_make_request_checks(struct bio *bio)
 	if ((bio->bi_opf & REQ_NOWAIT) && !queue_is_rq_based(q))
 		goto not_supported;
 
+	/*
+	 * Ensure PCI P2PDMA memory is not used in requests to queues that
+	 * have no support. This should never happen if the higher layers using
+	 * P2PDMA do the right thing and use the proper P2PDMA client
+	 * infrastructure. Also, ensure such requests use REQ_NOMERGE,
+	 * since requests cannot mix P2PDMA and non-P2PDMA memory at
+	 * this time.
+	 */
+	if (bio->bi_vcnt && is_pci_p2pdma_page(bio->bi_io_vec->bv_page)) {
+		if (WARN_ON_ONCE(!blk_queue_pci_p2pdma(q)))
+			goto not_supported;
+		bio->bi_opf |= REQ_NOMERGE;
+	}
+
 	if (should_fail_bio(bio))
 		goto end_io;
 
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index d6869e0e2b64..7bf80ca802e1 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -699,6 +699,7 @@ struct request_queue {
 #define QUEUE_FLAG_SCSI_PASSTHROUGH 27	/* queue supports SCSI commands */
 #define QUEUE_FLAG_QUIESCED    28	/* queue has been quiesced */
 #define QUEUE_FLAG_PREEMPT_ONLY	29	/* only process REQ_PREEMPT requests */
+#define QUEUE_FLAG_PCI_P2PDMA  30	/* device supports pci p2p requests */
 
 #define QUEUE_FLAG_DEFAULT	((1 << QUEUE_FLAG_IO_STAT) |		\
 				 (1 << QUEUE_FLAG_SAME_COMP)	|	\
@@ -731,6 +732,8 @@ bool blk_queue_flag_test_and_clear(unsigned int flag, struct request_queue *q);
 #define blk_queue_dax(q)	test_bit(QUEUE_FLAG_DAX, &(q)->queue_flags)
 #define blk_queue_scsi_passthrough(q)	\
 	test_bit(QUEUE_FLAG_SCSI_PASSTHROUGH, &(q)->queue_flags)
+#define blk_queue_pci_p2pdma(q)	\
+	test_bit(QUEUE_FLAG_PCI_P2PDMA, &(q)->queue_flags)
 
 #define blk_noretry_request(rq) \
 	((rq)->cmd_flags & (REQ_FAILFAST_DEV|REQ_FAILFAST_TRANSPORT| \
-- 
2.11.0



^ permalink raw reply related	[flat|nested] 265+ messages in thread
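The gating logic this patch adds is small enough to model outside the kernel. Below is an illustrative userspace sketch — every struct and helper here is a simplified stand-in, not a kernel API — of how the queue-flag check and the REQ_NOMERGE enforcement interact:

```c
#include <assert.h>

/* Illustrative userspace model of the generic_make_request_checks()
 * addition above. All types and constants are simplified stand-ins. */
enum { QUEUE_FLAG_PCI_P2PDMA = 30 };
#define REQ_NOMERGE (1u << 0)		/* placeholder bit for illustration */

struct queue { unsigned long flags; };
struct bio { unsigned int opf; int first_page_is_p2p; };

static int blk_queue_pci_p2pdma(const struct queue *q)
{
	return (q->flags >> QUEUE_FLAG_PCI_P2PDMA) & 1;
}

/* Returns 0 if the bio may proceed, -1 for the not_supported path. */
static int check_p2p(struct queue *q, struct bio *bio)
{
	if (bio->first_page_is_p2p) {
		if (!blk_queue_pci_p2pdma(q))
			return -1;
		/* P2P and non-P2P memory must not be merged together. */
		bio->opf |= REQ_NOMERGE;
	}
	return 0;
}
```

In the real kernel, a driver that can target P2P memory advertises it by setting the queue flag (e.g. via blk_queue_flag_set()); the model above just tests the bit directly.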


* [PATCH v5 08/13] IB/core: Ensure we map P2P memory correctly in rdma_rw_ctx_[init|destroy]()
  2018-08-30 18:53 ` Logan Gunthorpe
  (?)
  (?)
@ 2018-08-30 18:53   ` Logan Gunthorpe
  -1 siblings, 0 replies; 265+ messages in thread
From: Logan Gunthorpe @ 2018-08-30 18:53 UTC (permalink / raw)
  To: linux-kernel, linux-pci, linux-nvme, linux-rdma, linux-nvdimm,
	linux-block
  Cc: Christian König, Benjamin Herrenschmidt, Alex Williamson,
	Jérôme Glisse, Jason Gunthorpe, Bjorn Helgaas,
	Max Gurtovoy, Christoph Hellwig

In order to use PCI P2P memory, the pci_p2pdma_map_sg() function must be
called to map to the correct PCI bus addresses.

To do this, check the first page in the scatter list to see whether it is
P2P memory. At the moment, scatter lists that contain P2P memory must be
homogeneous, so if the first page is P2P, the entire SGL should be P2P.

Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 drivers/infiniband/core/rw.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/core/rw.c b/drivers/infiniband/core/rw.c
index 683e6d11a564..d22c4a2ebac6 100644
--- a/drivers/infiniband/core/rw.c
+++ b/drivers/infiniband/core/rw.c
@@ -12,6 +12,7 @@
  */
 #include <linux/moduleparam.h>
 #include <linux/slab.h>
+#include <linux/pci-p2pdma.h>
 #include <rdma/mr_pool.h>
 #include <rdma/rw.h>
 
@@ -280,7 +281,11 @@ int rdma_rw_ctx_init(struct rdma_rw_ctx *ctx, struct ib_qp *qp, u8 port_num,
 	struct ib_device *dev = qp->pd->device;
 	int ret;
 
-	ret = ib_dma_map_sg(dev, sg, sg_cnt, dir);
+	if (is_pci_p2pdma_page(sg_page(sg)))
+		ret = pci_p2pdma_map_sg(dev->dma_device, sg, sg_cnt, dir);
+	else
+		ret = ib_dma_map_sg(dev, sg, sg_cnt, dir);
+
 	if (!ret)
 		return -ENOMEM;
 	sg_cnt = ret;
@@ -602,7 +607,9 @@ void rdma_rw_ctx_destroy(struct rdma_rw_ctx *ctx, struct ib_qp *qp, u8 port_num,
 		break;
 	}
 
-	ib_dma_unmap_sg(qp->pd->device, sg, sg_cnt, dir);
+	/* P2PDMA contexts do not need to be unmapped */
+	if (!is_pci_p2pdma_page(sg_page(sg)))
+		ib_dma_unmap_sg(qp->pd->device, sg, sg_cnt, dir);
 }
 EXPORT_SYMBOL(rdma_rw_ctx_destroy);
 
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 265+ messages in thread
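The homogeneous-SGL rule means a single test on the first page picks the whole mapping path. As an illustrative userspace sketch (types here are stand-ins, not the RDMA core's), the init/destroy dispatch reduces to:

```c
#include <assert.h>

/* Illustrative userspace model of the mapping choice added to
 * rdma_rw_ctx_init()/rdma_rw_ctx_destroy(); types are stand-ins. */
struct page { int is_p2p; };
struct scatterlist { struct page *page; };

enum map_kind { MAP_P2PDMA = 1, MAP_IB_DMA };

/* SGLs must be homogeneous, so the first page decides the path. */
static enum map_kind map_path(const struct scatterlist *sg)
{
	return sg->page->is_p2p ? MAP_P2PDMA : MAP_IB_DMA;
}

/* On destroy, P2PDMA mappings need no unmap; ib_dma mappings do. */
static int needs_unmap(const struct scatterlist *sg)
{
	return !sg->page->is_p2p;
}
```

This mirrors why the patch can drop a pci_p2pdma_unmap_sg() call entirely on the destroy side: the P2P path has nothing to tear down.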


* [PATCH v5 09/13] nvme-pci: Use PCI p2pmem subsystem to manage the CMB
  2018-08-30 18:53 ` Logan Gunthorpe
  (?)
  (?)
@ 2018-08-30 18:53   ` Logan Gunthorpe
  -1 siblings, 0 replies; 265+ messages in thread
From: Logan Gunthorpe @ 2018-08-30 18:53 UTC (permalink / raw)
  To: linux-kernel, linux-pci, linux-nvme, linux-rdma, linux-nvdimm,
	linux-block
  Cc: Christian König, Benjamin Herrenschmidt, Alex Williamson,
	Jérôme Glisse, Jason Gunthorpe, Bjorn Helgaas,
	Max Gurtovoy, Christoph Hellwig

Register the CMB buffer as p2pmem and use the appropriate allocation
functions to create and destroy the IO submission queues.

If the CMB supports WDS and RDS, publish it for use as P2P memory
by other devices.

Kernels without CONFIG_PCI_P2PDMA will also no longer support NVMe CMB.
However, since the main use case for the CMB is P2P operations, this
seems like a reasonable dependency.

We drop the __iomem safety on the buffer because, by convention, it is
safe to directly access memory mapped by memremap()/devm_memremap_pages().
Architectures where this is not safe will not be supported by memremap()
and therefore will not support PCI P2P and will have no CMB support.

Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
---
 drivers/nvme/host/pci.c | 80 +++++++++++++++++++++++++++----------------------
 1 file changed, 45 insertions(+), 35 deletions(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 1b9951d2067e..2902585c6ddf 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -30,6 +30,7 @@
 #include <linux/types.h>
 #include <linux/io-64-nonatomic-lo-hi.h>
 #include <linux/sed-opal.h>
+#include <linux/pci-p2pdma.h>
 
 #include "nvme.h"
 
@@ -99,9 +100,8 @@ struct nvme_dev {
 	struct work_struct remove_work;
 	struct mutex shutdown_lock;
 	bool subsystem;
-	void __iomem *cmb;
-	pci_bus_addr_t cmb_bus_addr;
 	u64 cmb_size;
+	bool cmb_use_sqes;
 	u32 cmbsz;
 	u32 cmbloc;
 	struct nvme_ctrl ctrl;
@@ -158,7 +158,7 @@ struct nvme_queue {
 	struct nvme_dev *dev;
 	spinlock_t sq_lock;
 	struct nvme_command *sq_cmds;
-	struct nvme_command __iomem *sq_cmds_io;
+	bool sq_cmds_is_io;
 	spinlock_t cq_lock ____cacheline_aligned_in_smp;
 	volatile struct nvme_completion *cqes;
 	struct blk_mq_tags **tags;
@@ -439,11 +439,8 @@ static int nvme_pci_map_queues(struct blk_mq_tag_set *set)
 static void nvme_submit_cmd(struct nvme_queue *nvmeq, struct nvme_command *cmd)
 {
 	spin_lock(&nvmeq->sq_lock);
-	if (nvmeq->sq_cmds_io)
-		memcpy_toio(&nvmeq->sq_cmds_io[nvmeq->sq_tail], cmd,
-				sizeof(*cmd));
-	else
-		memcpy(&nvmeq->sq_cmds[nvmeq->sq_tail], cmd, sizeof(*cmd));
+
+	memcpy(&nvmeq->sq_cmds[nvmeq->sq_tail], cmd, sizeof(*cmd));
 
 	if (++nvmeq->sq_tail == nvmeq->q_depth)
 		nvmeq->sq_tail = 0;
@@ -1224,9 +1221,18 @@ static void nvme_free_queue(struct nvme_queue *nvmeq)
 {
 	dma_free_coherent(nvmeq->q_dmadev, CQ_SIZE(nvmeq->q_depth),
 				(void *)nvmeq->cqes, nvmeq->cq_dma_addr);
-	if (nvmeq->sq_cmds)
-		dma_free_coherent(nvmeq->q_dmadev, SQ_SIZE(nvmeq->q_depth),
-					nvmeq->sq_cmds, nvmeq->sq_dma_addr);
+
+	if (nvmeq->sq_cmds) {
+		if (nvmeq->sq_cmds_is_io)
+			pci_free_p2pmem(to_pci_dev(nvmeq->q_dmadev),
+					nvmeq->sq_cmds,
+					SQ_SIZE(nvmeq->q_depth));
+		else
+			dma_free_coherent(nvmeq->q_dmadev,
+					  SQ_SIZE(nvmeq->q_depth),
+					  nvmeq->sq_cmds,
+					  nvmeq->sq_dma_addr);
+	}
 }
 
 static void nvme_free_queues(struct nvme_dev *dev, int lowest)
@@ -1315,12 +1321,21 @@ static int nvme_cmb_qdepth(struct nvme_dev *dev, int nr_io_queues,
 static int nvme_alloc_sq_cmds(struct nvme_dev *dev, struct nvme_queue *nvmeq,
 				int qid, int depth)
 {
-	/* CMB SQEs will be mapped before creation */
-	if (qid && dev->cmb && use_cmb_sqes && (dev->cmbsz & NVME_CMBSZ_SQS))
-		return 0;
+	struct pci_dev *pdev = to_pci_dev(dev->dev);
+
+	if (qid && dev->cmb_use_sqes && (dev->cmbsz & NVME_CMBSZ_SQS)) {
+		nvmeq->sq_cmds = pci_alloc_p2pmem(pdev, SQ_SIZE(depth));
+		nvmeq->sq_dma_addr = pci_p2pmem_virt_to_bus(pdev,
+						nvmeq->sq_cmds);
+		nvmeq->sq_cmds_is_io = true;
+	}
+
+	if (!nvmeq->sq_cmds) {
+		nvmeq->sq_cmds = dma_alloc_coherent(dev->dev, SQ_SIZE(depth),
+					&nvmeq->sq_dma_addr, GFP_KERNEL);
+		nvmeq->sq_cmds_is_io = false;
+	}
 
-	nvmeq->sq_cmds = dma_alloc_coherent(dev->dev, SQ_SIZE(depth),
-					    &nvmeq->sq_dma_addr, GFP_KERNEL);
 	if (!nvmeq->sq_cmds)
 		return -ENOMEM;
 	return 0;
@@ -1397,13 +1412,6 @@ static int nvme_create_queue(struct nvme_queue *nvmeq, int qid)
 	int result;
 	s16 vector;
 
-	if (dev->cmb && use_cmb_sqes && (dev->cmbsz & NVME_CMBSZ_SQS)) {
-		unsigned offset = (qid - 1) * roundup(SQ_SIZE(nvmeq->q_depth),
-						      dev->ctrl.page_size);
-		nvmeq->sq_dma_addr = dev->cmb_bus_addr + offset;
-		nvmeq->sq_cmds_io = dev->cmb + offset;
-	}
-
 	/*
 	 * A queue's vector matches the queue identifier unless the controller
 	 * has only one vector available.
@@ -1644,9 +1652,6 @@ static void nvme_map_cmb(struct nvme_dev *dev)
 		return;
 	dev->cmbloc = readl(dev->bar + NVME_REG_CMBLOC);
 
-	if (!use_cmb_sqes)
-		return;
-
 	size = nvme_cmb_size_unit(dev) * nvme_cmb_size(dev);
 	offset = nvme_cmb_size_unit(dev) * NVME_CMB_OFST(dev->cmbloc);
 	bar = NVME_CMB_BIR(dev->cmbloc);
@@ -1663,11 +1668,18 @@ static void nvme_map_cmb(struct nvme_dev *dev)
 	if (size > bar_size - offset)
 		size = bar_size - offset;
 
-	dev->cmb = ioremap_wc(pci_resource_start(pdev, bar) + offset, size);
-	if (!dev->cmb)
+	if (pci_p2pdma_add_resource(pdev, bar, size, offset)) {
+		dev_warn(dev->ctrl.device,
+			 "failed to register the CMB\n");
 		return;
-	dev->cmb_bus_addr = pci_bus_address(pdev, bar) + offset;
+	}
+
 	dev->cmb_size = size;
+	dev->cmb_use_sqes = use_cmb_sqes && (dev->cmbsz & NVME_CMBSZ_SQS);
+
+	if ((dev->cmbsz & (NVME_CMBSZ_WDS | NVME_CMBSZ_RDS)) ==
+			(NVME_CMBSZ_WDS | NVME_CMBSZ_RDS))
+		pci_p2pmem_publish(pdev, true);
 
 	if (sysfs_add_file_to_group(&dev->ctrl.device->kobj,
 				    &dev_attr_cmb.attr, NULL))
@@ -1677,12 +1689,10 @@ static void nvme_map_cmb(struct nvme_dev *dev)
 
 static inline void nvme_release_cmb(struct nvme_dev *dev)
 {
-	if (dev->cmb) {
-		iounmap(dev->cmb);
-		dev->cmb = NULL;
+	if (dev->cmb_size) {
 		sysfs_remove_file_from_group(&dev->ctrl.device->kobj,
 					     &dev_attr_cmb.attr, NULL);
-		dev->cmbsz = 0;
+		dev->cmb_size = 0;
 	}
 }
 
@@ -1881,13 +1891,13 @@ static int nvme_setup_io_queues(struct nvme_dev *dev)
 	if (nr_io_queues == 0)
 		return 0;
 
-	if (dev->cmb && (dev->cmbsz & NVME_CMBSZ_SQS)) {
+	if (dev->cmb_use_sqes) {
 		result = nvme_cmb_qdepth(dev, nr_io_queues,
 				sizeof(struct nvme_command));
 		if (result > 0)
 			dev->q_depth = result;
 		else
-			nvme_release_cmb(dev);
+			dev->cmb_use_sqes = false;
 	}
 
 	do {
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 265+ messages in thread
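The allocation change in nvme_alloc_sq_cmds() is a try-then-fall-back pattern: attempt the CMB (p2pmem) allocation first, and if that fails or CMB SQEs are disabled, use ordinary coherent DMA memory, recording which one was used in sq_cmds_is_io. A rough userspace model — malloc() stands in for both allocators, and nothing here is a kernel API — looks like this:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Illustrative userspace model of the nvme_alloc_sq_cmds() fallback:
 * try CMB (p2pmem) first, then fall back to coherent DMA memory. */
struct nvme_queue { void *sq_cmds; bool sq_cmds_is_io; };

/* p2p_ok simulates whether the p2pmem allocator can satisfy the request. */
static void *alloc_p2pmem(size_t size, bool p2p_ok)
{
	return p2p_ok ? malloc(size) : NULL;
}

static int alloc_sq_cmds(struct nvme_queue *q, size_t size,
			 bool use_cmb_sqes, bool p2p_ok)
{
	q->sq_cmds = NULL;
	if (use_cmb_sqes) {
		q->sq_cmds = alloc_p2pmem(size, p2p_ok);
		q->sq_cmds_is_io = true;
	}
	if (!q->sq_cmds) {		/* fall back, as in the patch */
		q->sq_cmds = malloc(size);
		q->sq_cmds_is_io = false;
	}
	return q->sq_cmds ? 0 : -1;	/* -ENOMEM in the real code */
}
```

The sq_cmds_is_io flag is what lets nvme_free_queue() later pick the matching free routine (pci_free_p2pmem() vs dma_free_coherent()).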


* [PATCH v5 10/13] nvme-pci: Add support for P2P memory in requests
  2018-08-30 18:53 ` Logan Gunthorpe
                     ` (2 preceding siblings ...)
  (?)
@ 2018-08-30 18:53   ` Logan Gunthorpe
  -1 siblings, 0 replies; 265+ messages in thread
From: Logan Gunthorpe @ 2018-08-30 18:53 UTC (permalink / raw)
  To: linux-kernel, linux-pci, linux-nvme, linux-rdma, linux-nvdimm,
	linux-block
  Cc: Christian König, Benjamin Herrenschmidt, Alex Williamson,
	Jérôme Glisse, Jason Gunthorpe, Bjorn Helgaas,
	Max Gurtovoy, Christoph Hellwig

For P2P requests, we must use the pci_p2pdma_map_sg() function
instead of the usual dma_map_sg() functions.

With that, we can then indicate PCI P2PDMA support in the request
queue. For this, we create an NVME_F_PCI_P2PDMA flag which tells the
core to set QUEUE_FLAG_PCI_P2PDMA in the request queue.

Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 drivers/nvme/host/core.c |  4 ++++
 drivers/nvme/host/nvme.h |  1 +
 drivers/nvme/host/pci.c  | 17 +++++++++++++----
 3 files changed, 18 insertions(+), 4 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index dd8ec1dd9219..6033ce2fd3e9 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -3051,7 +3051,11 @@ static void nvme_alloc_ns(struct nvme_ctrl *ctrl, unsigned nsid)
 	ns->queue = blk_mq_init_queue(ctrl->tagset);
 	if (IS_ERR(ns->queue))
 		goto out_free_ns;
+
 	blk_queue_flag_set(QUEUE_FLAG_NONROT, ns->queue);
+	if (ctrl->ops->flags & NVME_F_PCI_P2PDMA)
+		blk_queue_flag_set(QUEUE_FLAG_PCI_P2PDMA, ns->queue);
+
 	ns->queue->queuedata = ns;
 	ns->ctrl = ctrl;
 
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index bb4a2003c097..4030743c90aa 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -343,6 +343,7 @@ struct nvme_ctrl_ops {
 	unsigned int flags;
 #define NVME_F_FABRICS			(1 << 0)
 #define NVME_F_METADATA_SUPPORTED	(1 << 1)
+#define NVME_F_PCI_P2PDMA		(1 << 2)
 	int (*reg_read32)(struct nvme_ctrl *ctrl, u32 off, u32 *val);
 	int (*reg_write32)(struct nvme_ctrl *ctrl, u32 off, u32 val);
 	int (*reg_read64)(struct nvme_ctrl *ctrl, u32 off, u64 *val);
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 2902585c6ddf..bb2120d30e39 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -737,8 +737,13 @@ static blk_status_t nvme_map_data(struct nvme_dev *dev, struct request *req,
 		goto out;
 
 	ret = BLK_STS_RESOURCE;
-	nr_mapped = dma_map_sg_attrs(dev->dev, iod->sg, iod->nents, dma_dir,
-			DMA_ATTR_NO_WARN);
+
+	if (is_pci_p2pdma_page(sg_page(iod->sg)))
+		nr_mapped = pci_p2pdma_map_sg(dev->dev, iod->sg, iod->nents,
+					  dma_dir);
+	else
+		nr_mapped = dma_map_sg_attrs(dev->dev, iod->sg, iod->nents,
+					     dma_dir,  DMA_ATTR_NO_WARN);
 	if (!nr_mapped)
 		goto out;
 
@@ -780,7 +785,10 @@ static void nvme_unmap_data(struct nvme_dev *dev, struct request *req)
 			DMA_TO_DEVICE : DMA_FROM_DEVICE;
 
 	if (iod->nents) {
-		dma_unmap_sg(dev->dev, iod->sg, iod->nents, dma_dir);
+		/* P2PDMA requests do not need to be unmapped */
+		if (!is_pci_p2pdma_page(sg_page(iod->sg)))
+			dma_unmap_sg(dev->dev, iod->sg, iod->nents, dma_dir);
+
 		if (blk_integrity_rq(req))
 			dma_unmap_sg(dev->dev, &iod->meta_sg, 1, dma_dir);
 	}
@@ -2392,7 +2400,8 @@ static int nvme_pci_get_address(struct nvme_ctrl *ctrl, char *buf, int size)
 static const struct nvme_ctrl_ops nvme_pci_ctrl_ops = {
 	.name			= "pcie",
 	.module			= THIS_MODULE,
-	.flags			= NVME_F_METADATA_SUPPORTED,
+	.flags			= NVME_F_METADATA_SUPPORTED |
+				  NVME_F_PCI_P2PDMA,
 	.reg_read32		= nvme_pci_reg_read32,
 	.reg_write32		= nvme_pci_reg_write32,
 	.reg_read64		= nvme_pci_reg_read64,
-- 
2.11.0

_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

^ permalink raw reply related	[flat|nested] 265+ messages in thread
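[Editorial note] The asymmetry in the patch above — P2P scatterlists are mapped with pci_p2pdma_map_sg() and skipped in nvme_unmap_data(), while host pages use dma_map_sg_attrs()/dma_unmap_sg() — can be sketched as a small dispatch model. Everything below is a stand-in for illustration, not kernel API.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy model of the mapping decision in nvme_map_data() and
 * nvme_unmap_data(): pages backed by PCI P2PDMA memory are translated
 * directly to PCI bus addresses and need no unmap on completion;
 * ordinary host pages go through the DMA API and must be unmapped.
 */
enum map_kind { MAP_P2P_BUS_ADDR, MAP_DMA };

struct fake_page {
	bool is_p2p;	/* models is_pci_p2pdma_page() */
};

static enum map_kind map_request(const struct fake_page *first)
{
	return first->is_p2p ? MAP_P2P_BUS_ADDR : MAP_DMA;
}

/* Only regular DMA mappings need dma_unmap_sg() on completion. */
static bool needs_unmap(enum map_kind k)
{
	return k == MAP_DMA;
}
```

This mirrors why the patch can simply test is_pci_p2pdma_page() on the first scatterlist page in the unmap path: a request is either entirely P2P or entirely host memory.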

* [PATCH v5 11/13] nvme-pci: Add a quirk for a pseudo CMB
  2018-08-30 18:53 ` Logan Gunthorpe
  (?)
@ 2018-08-30 18:53   ` Logan Gunthorpe
  -1 siblings, 0 replies; 265+ messages in thread
From: Logan Gunthorpe @ 2018-08-30 18:53 UTC (permalink / raw)
  To: linux-kernel, linux-pci, linux-nvme, linux-rdma, linux-nvdimm,
	linux-block
  Cc: Christian König, Benjamin Herrenschmidt, Alex Williamson,
	Jérôme Glisse, Jason Gunthorpe, Bjorn Helgaas,
	Max Gurtovoy, Christoph Hellwig

Introduce a quirk to use CMB-like memory on older devices that have
an exposed BAR but do not advertise support for it via the CMBLOC and
CMBSZ registers.

We'd like to use some of these older cards to test P2P memory.
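[Editorial note] The nvme_pseudo_cmbsz() helper this patch adds fakes a CMBSZ register value using a 16MB size unit (SZU = 3, since 4KB * 16^3 = 16MB). The bit math can be checked in user space; the field layout below mirrors the kernel's NVME_CMBSZ_* constants (SZU in bits 11:8, SZ in bits 31:12) and should be treated as an assumption of this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed mirror of the kernel's NVME_CMBSZ_* field layout. */
#define CMBSZ_RDS       (1u << 3)
#define CMBSZ_WDS       (1u << 4)
#define CMBSZ_SZU_SHIFT 8
#define CMBSZ_SZU_MASK  0xfu
#define CMBSZ_SZ_SHIFT  12
#define CMBSZ_SZ_MASK   0xfffffu

#define SZ_16M          (16ull * 1024 * 1024)

/*
 * Encode a pseudo CMBSZ for a BAR of bar_len bytes, as the quirk does:
 * SZU = (log2(16M) - 12) / 4 = 3, SZ = bar_len / 16M.
 */
static uint32_t pseudo_cmbsz(uint64_t bar_len)
{
	return CMBSZ_WDS | CMBSZ_RDS |
		(((24 - 12) / 4) << CMBSZ_SZU_SHIFT) |
		((uint32_t)(bar_len / SZ_16M) << CMBSZ_SZ_SHIFT);
}

/* Decode the size back, as nvme_cmb_size_unit()/nvme_cmb_size() would. */
static uint64_t cmb_size_bytes(uint32_t cmbsz)
{
	uint32_t szu = (cmbsz >> CMBSZ_SZU_SHIFT) & CMBSZ_SZU_MASK;
	uint64_t unit = 1ull << (12 + 4 * szu);

	return unit * ((cmbsz >> CMBSZ_SZ_SHIFT) & CMBSZ_SZ_MASK);
}
```

The encode/decode round-trips for any BAR length that is a multiple of 16MB, which is why the quirk can reuse the existing CMB size parsing unchanged.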

Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
---
 drivers/nvme/host/nvme.h |  7 +++++++
 drivers/nvme/host/pci.c  | 24 ++++++++++++++++++++----
 2 files changed, 27 insertions(+), 4 deletions(-)

diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 4030743c90aa..8e6f3bcfe956 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -90,6 +90,13 @@ enum nvme_quirks {
 	 * Set MEDIUM priority on SQ creation
 	 */
 	NVME_QUIRK_MEDIUM_PRIO_SQ		= (1 << 7),
+
+	/*
+	 * Pseudo CMB Support on BAR 4. For adapters like the Microsemi
+	 * NVRAM that have CMB-like memory on a BAR but do not set
+	 * CMBLOC or CMBSZ.
+	 */
+	NVME_QUIRK_PSEUDO_CMB_BAR4		= (1 << 8),
 };
 
 /*
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index bb2120d30e39..f898f2ab1420 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -1636,6 +1636,13 @@ static ssize_t nvme_cmb_show(struct device *dev,
 }
 static DEVICE_ATTR(cmb, S_IRUGO, nvme_cmb_show, NULL);
 
+static u32 nvme_pseudo_cmbsz(struct pci_dev *pdev, int bar)
+{
+	return NVME_CMBSZ_WDS | NVME_CMBSZ_RDS |
+		(((ilog2(SZ_16M) - 12) / 4) << NVME_CMBSZ_SZU_SHIFT) |
+		((pci_resource_len(pdev, bar) / SZ_16M) << NVME_CMBSZ_SZ_SHIFT);
+}
+
 static u64 nvme_cmb_size_unit(struct nvme_dev *dev)
 {
 	u8 szu = (dev->cmbsz >> NVME_CMBSZ_SZU_SHIFT) & NVME_CMBSZ_SZU_MASK;
@@ -1655,10 +1662,15 @@ static void nvme_map_cmb(struct nvme_dev *dev)
 	struct pci_dev *pdev = to_pci_dev(dev->dev);
 	int bar;
 
-	dev->cmbsz = readl(dev->bar + NVME_REG_CMBSZ);
-	if (!dev->cmbsz)
-		return;
-	dev->cmbloc = readl(dev->bar + NVME_REG_CMBLOC);
+	if (dev->ctrl.quirks & NVME_QUIRK_PSEUDO_CMB_BAR4) {
+		dev->cmbsz = nvme_pseudo_cmbsz(pdev, 4);
+		dev->cmbloc = 4;
+	} else {
+		dev->cmbsz = readl(dev->bar + NVME_REG_CMBSZ);
+		if (!dev->cmbsz)
+			return;
+		dev->cmbloc = readl(dev->bar + NVME_REG_CMBLOC);
+	}
 
 	size = nvme_cmb_size_unit(dev) * nvme_cmb_size(dev);
 	offset = nvme_cmb_size_unit(dev) * NVME_CMB_OFST(dev->cmbloc);
@@ -2707,6 +2719,10 @@ static const struct pci_device_id nvme_id_table[] = {
 		.driver_data = NVME_QUIRK_LIGHTNVM, },
 	{ PCI_DEVICE(0x1d1d, 0x2601),	/* CNEX Granby */
 		.driver_data = NVME_QUIRK_LIGHTNVM, },
+	{ PCI_DEVICE(0x11f8, 0xf117),	/* Microsemi NVRAM adaptor */
+		.driver_data = NVME_QUIRK_PSEUDO_CMB_BAR4, },
+	{ PCI_DEVICE(0x1db1, 0x0002),	/* Everspin nvNitro adaptor */
+		.driver_data = NVME_QUIRK_PSEUDO_CMB_BAR4,  },
 	{ PCI_DEVICE_CLASS(PCI_CLASS_STORAGE_EXPRESS, 0xffffff) },
 	{ PCI_DEVICE(PCI_VENDOR_ID_APPLE, 0x2001) },
 	{ PCI_DEVICE(PCI_VENDOR_ID_APPLE, 0x2003) },
-- 
2.11.0

_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

^ permalink raw reply related	[flat|nested] 265+ messages in thread

* [PATCH v5 12/13] nvmet: Introduce helper functions to allocate and free request SGLs
  2018-08-30 18:53 ` Logan Gunthorpe
  (?)
@ 2018-08-30 18:53   ` Logan Gunthorpe
  -1 siblings, 0 replies; 265+ messages in thread
From: Logan Gunthorpe @ 2018-08-30 18:53 UTC (permalink / raw)
  To: linux-kernel, linux-pci, linux-nvme, linux-rdma, linux-nvdimm,
	linux-block
  Cc: Christian König, Benjamin Herrenschmidt, Alex Williamson,
	Jérôme Glisse, Jason Gunthorpe, Bjorn Helgaas,
	Max Gurtovoy, Christoph Hellwig

Add helpers to allocate and free the SGL in a struct nvmet_req:

int nvmet_req_alloc_sgl(struct nvmet_req *req, struct nvmet_sq *sq)
void nvmet_req_free_sgl(struct nvmet_req *req)

This will be expanded in a future patch to implement peer-to-peer
memory DMAs and should be common to all target drivers. The presently
unused 'sq' argument in the alloc function will be necessary to
decide whether to use peer-to-peer memory and to obtain the correct
provider from which to allocate the memory.

The new helpers are used in nvmet-rdma. Since req.transfer_len is used
as the length of the SGL, it is now set earlier and cleared on any
error. Accumulating the length is also unnecessary, as the map_sgl
functions should only ever be called once per request.

Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Sagi Grimberg <sagi@grimberg.me>
---
 drivers/nvme/target/core.c  | 18 ++++++++++++++++++
 drivers/nvme/target/nvmet.h |  2 ++
 drivers/nvme/target/rdma.c  | 20 ++++++++++++--------
 3 files changed, 32 insertions(+), 8 deletions(-)

diff --git a/drivers/nvme/target/core.c b/drivers/nvme/target/core.c
index ebf3e7a6c49e..6a1c8d5f552b 100644
--- a/drivers/nvme/target/core.c
+++ b/drivers/nvme/target/core.c
@@ -725,6 +725,24 @@ void nvmet_req_execute(struct nvmet_req *req)
 }
 EXPORT_SYMBOL_GPL(nvmet_req_execute);
 
+int nvmet_req_alloc_sgl(struct nvmet_req *req, struct nvmet_sq *sq)
+{
+	req->sg = sgl_alloc(req->transfer_len, GFP_KERNEL, &req->sg_cnt);
+	if (!req->sg)
+		return -ENOMEM;
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(nvmet_req_alloc_sgl);
+
+void nvmet_req_free_sgl(struct nvmet_req *req)
+{
+	sgl_free(req->sg);
+	req->sg = NULL;
+	req->sg_cnt = 0;
+}
+EXPORT_SYMBOL_GPL(nvmet_req_free_sgl);
+
 static inline bool nvmet_cc_en(u32 cc)
 {
 	return (cc >> NVME_CC_EN_SHIFT) & 0x1;
diff --git a/drivers/nvme/target/nvmet.h b/drivers/nvme/target/nvmet.h
index ec9af4ee03b6..7d6cb61021e4 100644
--- a/drivers/nvme/target/nvmet.h
+++ b/drivers/nvme/target/nvmet.h
@@ -336,6 +336,8 @@ bool nvmet_req_init(struct nvmet_req *req, struct nvmet_cq *cq,
 void nvmet_req_uninit(struct nvmet_req *req);
 void nvmet_req_execute(struct nvmet_req *req);
 void nvmet_req_complete(struct nvmet_req *req, u16 status);
+int nvmet_req_alloc_sgl(struct nvmet_req *req, struct nvmet_sq *sq);
+void nvmet_req_free_sgl(struct nvmet_req *req);
 
 void nvmet_cq_setup(struct nvmet_ctrl *ctrl, struct nvmet_cq *cq, u16 qid,
 		u16 size);
diff --git a/drivers/nvme/target/rdma.c b/drivers/nvme/target/rdma.c
index 3533e918ea37..e148dee72ba5 100644
--- a/drivers/nvme/target/rdma.c
+++ b/drivers/nvme/target/rdma.c
@@ -489,7 +489,7 @@ static void nvmet_rdma_release_rsp(struct nvmet_rdma_rsp *rsp)
 	}
 
 	if (rsp->req.sg != rsp->cmd->inline_sg)
-		sgl_free(rsp->req.sg);
+		nvmet_req_free_sgl(&rsp->req);
 
 	if (unlikely(!list_empty_careful(&queue->rsp_wr_wait_list)))
 		nvmet_rdma_process_wr_wait_list(queue);
@@ -638,24 +638,24 @@ static u16 nvmet_rdma_map_sgl_keyed(struct nvmet_rdma_rsp *rsp,
 {
 	struct rdma_cm_id *cm_id = rsp->queue->cm_id;
 	u64 addr = le64_to_cpu(sgl->addr);
-	u32 len = get_unaligned_le24(sgl->length);
 	u32 key = get_unaligned_le32(sgl->key);
 	int ret;
 
+	rsp->req.transfer_len = get_unaligned_le24(sgl->length);
+
 	/* no data command? */
-	if (!len)
+	if (!rsp->req.transfer_len)
 		return 0;
 
-	rsp->req.sg = sgl_alloc(len, GFP_KERNEL, &rsp->req.sg_cnt);
-	if (!rsp->req.sg)
-		return NVME_SC_INTERNAL;
+	ret = nvmet_req_alloc_sgl(&rsp->req, &rsp->queue->nvme_sq);
+	if (ret < 0)
+		goto error_out;
 
 	ret = rdma_rw_ctx_init(&rsp->rw, cm_id->qp, cm_id->port_num,
 			rsp->req.sg, rsp->req.sg_cnt, 0, addr, key,
 			nvmet_data_dir(&rsp->req));
 	if (ret < 0)
-		return NVME_SC_INTERNAL;
-	rsp->req.transfer_len += len;
+		goto error_out;
 	rsp->n_rdma += ret;
 
 	if (invalidate) {
@@ -664,6 +664,10 @@ static u16 nvmet_rdma_map_sgl_keyed(struct nvmet_rdma_rsp *rsp,
 	}
 
 	return 0;
+
+error_out:
+	rsp->req.transfer_len = 0;
+	return NVME_SC_INTERNAL;
 }
 
 static u16 nvmet_rdma_map_sgl(struct nvmet_rdma_rsp *rsp)
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 265+ messages in thread

* [PATCH v5 13/13] nvmet: Optionally use PCI P2P memory
  2018-08-30 18:53 ` Logan Gunthorpe
  (?)
  (?)
@ 2018-08-30 18:53   ` Logan Gunthorpe
  -1 siblings, 0 replies; 265+ messages in thread
From: Logan Gunthorpe @ 2018-08-30 18:53 UTC (permalink / raw)
  To: linux-kernel, linux-pci, linux-nvme, linux-rdma, linux-nvdimm,
	linux-block
  Cc: Christian König, Benjamin Herrenschmidt, Steve Wise,
	Alex Williamson, Jérôme Glisse, Jason Gunthorpe,
	Bjorn Helgaas, Max Gurtovoy, Christoph Hellwig

We create a configfs attribute in each nvme-fabrics target port to
enable p2p memory use. When enabled, the port will use p2p memory
only if a p2p memory device can be found behind the same switch
hierarchy as the RDMA port and all the block devices in use. If the
user enables it and no such device is found, the system silently
falls back to regular memory.

If appropriate, that port will allocate the RDMA queue buffers from
the p2pmem device, falling back to system memory should anything
fail.

Ideally, we'd want to use an NVMe CMB buffer as p2p memory. This would
save an extra PCI transfer, as the NVMe card could take the data
directly out of its own memory. However, at this time, only a limited
number of cards with CMB buffers seem to be available.

Signed-off-by: Stephen Bates <sbates@raithlin.com>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
[hch: partial rewrite of the initial code]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
---
 drivers/nvme/target/configfs.c |  36 +++++++++++
 drivers/nvme/target/core.c     | 133 ++++++++++++++++++++++++++++++++++++++++-
 drivers/nvme/target/nvmet.h    |  13 ++++
 drivers/nvme/target/rdma.c     |   2 +
 4 files changed, 183 insertions(+), 1 deletion(-)

diff --git a/drivers/nvme/target/configfs.c b/drivers/nvme/target/configfs.c
index b37a8e3e3f80..0dfb0e0c3d21 100644
--- a/drivers/nvme/target/configfs.c
+++ b/drivers/nvme/target/configfs.c
@@ -17,6 +17,8 @@
 #include <linux/slab.h>
 #include <linux/stat.h>
 #include <linux/ctype.h>
+#include <linux/pci.h>
+#include <linux/pci-p2pdma.h>
 
 #include "nvmet.h"
 
@@ -1094,6 +1096,37 @@ static void nvmet_port_release(struct config_item *item)
 	kfree(port);
 }
 
+#ifdef CONFIG_PCI_P2PDMA
+static ssize_t nvmet_p2pmem_show(struct config_item *item, char *page)
+{
+	struct nvmet_port *port = to_nvmet_port(item);
+
+	return pci_p2pdma_enable_show(page, port->p2p_dev, port->use_p2pmem);
+}
+
+static ssize_t nvmet_p2pmem_store(struct config_item *item,
+		const char *page, size_t count)
+{
+	struct nvmet_port *port = to_nvmet_port(item);
+	struct pci_dev *p2p_dev = NULL;
+	bool use_p2pmem;
+	int error;
+
+	error = pci_p2pdma_enable_store(page, &p2p_dev, &use_p2pmem);
+	if (error)
+		return error;
+
+	down_write(&nvmet_config_sem);
+	port->use_p2pmem = use_p2pmem;
+	pci_dev_put(port->p2p_dev);
+	port->p2p_dev = p2p_dev;
+	up_write(&nvmet_config_sem);
+
+	return count;
+}
+CONFIGFS_ATTR(nvmet_, p2pmem);
+#endif /* CONFIG_PCI_P2PDMA */
+
 static struct configfs_attribute *nvmet_port_attrs[] = {
 	&nvmet_attr_addr_adrfam,
 	&nvmet_attr_addr_treq,
@@ -1101,6 +1134,9 @@ static struct configfs_attribute *nvmet_port_attrs[] = {
 	&nvmet_attr_addr_trsvcid,
 	&nvmet_attr_addr_trtype,
 	&nvmet_attr_param_inline_data_size,
+#ifdef CONFIG_PCI_P2PDMA
+	&nvmet_attr_p2pmem,
+#endif
 	NULL,
 };
 
diff --git a/drivers/nvme/target/core.c b/drivers/nvme/target/core.c
index 6a1c8d5f552b..8f20b1e26c69 100644
--- a/drivers/nvme/target/core.c
+++ b/drivers/nvme/target/core.c
@@ -15,6 +15,7 @@
 #include <linux/module.h>
 #include <linux/random.h>
 #include <linux/rculist.h>
+#include <linux/pci-p2pdma.h>
 
 #include "nvmet.h"
 
@@ -365,9 +366,29 @@ static void nvmet_ns_dev_disable(struct nvmet_ns *ns)
 	nvmet_file_ns_disable(ns);
 }
 
+static int nvmet_p2pdma_add_client(struct nvmet_ctrl *ctrl,
+		struct nvmet_ns *ns)
+{
+	int ret;
+
+	if (!blk_queue_pci_p2pdma(ns->bdev->bd_queue)) {
+		pr_err("peer-to-peer DMA is not supported by %s\n",
+		       ns->device_path);
+		return -EINVAL;
+	}
+
+	ret = pci_p2pdma_add_client(&ctrl->p2p_clients, nvmet_ns_dev(ns));
+	if (ret)
+		pr_err("failed to add peer-to-peer DMA client %s: %d\n",
+		       ns->device_path, ret);
+
+	return ret;
+}
+
 int nvmet_ns_enable(struct nvmet_ns *ns)
 {
 	struct nvmet_subsys *subsys = ns->subsys;
+	struct nvmet_ctrl *ctrl;
 	int ret;
 
 	mutex_lock(&subsys->lock);
@@ -389,6 +410,14 @@ int nvmet_ns_enable(struct nvmet_ns *ns)
 	if (ret)
 		goto out_dev_put;
 
+	list_for_each_entry(ctrl, &subsys->ctrls, subsys_entry) {
+		if (ctrl->p2p_dev) {
+			ret = nvmet_p2pdma_add_client(ctrl, ns);
+			if (ret)
+				goto out_remove_clients;
+		}
+	}
+
 	if (ns->nsid > subsys->max_nsid)
 		subsys->max_nsid = ns->nsid;
 
@@ -417,6 +446,9 @@ int nvmet_ns_enable(struct nvmet_ns *ns)
 out_unlock:
 	mutex_unlock(&subsys->lock);
 	return ret;
+out_remove_clients:
+	list_for_each_entry(ctrl, &subsys->ctrls, subsys_entry)
+		pci_p2pdma_remove_client(&ctrl->p2p_clients, nvmet_ns_dev(ns));
 out_dev_put:
 	nvmet_ns_dev_disable(ns);
 	goto out_unlock;
@@ -425,6 +457,7 @@ int nvmet_ns_enable(struct nvmet_ns *ns)
 void nvmet_ns_disable(struct nvmet_ns *ns)
 {
 	struct nvmet_subsys *subsys = ns->subsys;
+	struct nvmet_ctrl *ctrl;
 
 	mutex_lock(&subsys->lock);
 	if (!ns->enabled)
@@ -450,6 +483,12 @@ void nvmet_ns_disable(struct nvmet_ns *ns)
 	percpu_ref_exit(&ns->ref);
 
 	mutex_lock(&subsys->lock);
+
+	list_for_each_entry(ctrl, &subsys->ctrls, subsys_entry) {
+		pci_p2pdma_remove_client(&ctrl->p2p_clients, nvmet_ns_dev(ns));
+		nvmet_add_async_event(ctrl, NVME_AER_TYPE_NOTICE, 0, 0);
+	}
+
 	subsys->nr_namespaces--;
 	nvmet_ns_changed(subsys, ns->nsid);
 	nvmet_ns_dev_disable(ns);
@@ -727,6 +766,23 @@ EXPORT_SYMBOL_GPL(nvmet_req_execute);
 
 int nvmet_req_alloc_sgl(struct nvmet_req *req, struct nvmet_sq *sq)
 {
+	struct pci_dev *p2p_dev = NULL;
+
+	if (IS_ENABLED(CONFIG_PCI_P2PDMA)) {
+		if (sq->ctrl)
+			p2p_dev = sq->ctrl->p2p_dev;
+
+		req->p2p_dev = NULL;
+		if (sq->qid && p2p_dev) {
+			req->sg = pci_p2pmem_alloc_sgl(p2p_dev, &req->sg_cnt,
+						       req->transfer_len);
+			if (req->sg) {
+				req->p2p_dev = p2p_dev;
+				return 0;
+			}
+		}
+	}
+
 	req->sg = sgl_alloc(req->transfer_len, GFP_KERNEL, &req->sg_cnt);
 	if (!req->sg)
 		return -ENOMEM;
@@ -737,7 +793,11 @@ EXPORT_SYMBOL_GPL(nvmet_req_alloc_sgl);
 
 void nvmet_req_free_sgl(struct nvmet_req *req)
 {
-	sgl_free(req->sg);
+	if (req->p2p_dev)
+		pci_p2pmem_free_sgl(req->p2p_dev, req->sg);
+	else
+		sgl_free(req->sg);
+
 	req->sg = NULL;
 	req->sg_cnt = 0;
 }
@@ -939,6 +999,74 @@ bool nvmet_host_allowed(struct nvmet_req *req, struct nvmet_subsys *subsys,
 		return __nvmet_host_allowed(subsys, hostnqn);
 }
 
+/*
+ * If allow_p2pmem is set, we will try to use P2P memory for the SGL lists for
+ * I/O commands. This requires the PCI p2p device to be compatible with the
+ * backing device for every namespace on this controller.
+ */
+static void nvmet_setup_p2pmem(struct nvmet_ctrl *ctrl, struct nvmet_req *req)
+{
+	struct nvmet_ns *ns;
+	int ret;
+
+	if (!req->port->use_p2pmem || !req->p2p_client)
+		return;
+
+	mutex_lock(&ctrl->subsys->lock);
+
+	ret = pci_p2pdma_add_client(&ctrl->p2p_clients, req->p2p_client);
+	if (ret) {
+		pr_err("failed adding peer-to-peer DMA client %s: %d\n",
+		       dev_name(req->p2p_client), ret);
+		goto free_devices;
+	}
+
+	list_for_each_entry_rcu(ns, &ctrl->subsys->namespaces, dev_link) {
+		ret = nvmet_p2pdma_add_client(ctrl, ns);
+		if (ret)
+			goto free_devices;
+	}
+
+	if (req->port->p2p_dev) {
+		if (!pci_p2pdma_assign_provider(req->port->p2p_dev,
+						&ctrl->p2p_clients)) {
+			pr_info("peer-to-peer memory on %s is not supported\n",
+				pci_name(req->port->p2p_dev));
+			goto free_devices;
+		}
+		ctrl->p2p_dev = pci_dev_get(req->port->p2p_dev);
+	} else {
+		ctrl->p2p_dev = pci_p2pmem_find(&ctrl->p2p_clients);
+		if (!ctrl->p2p_dev) {
+			pr_info("no supported peer-to-peer memory devices found\n");
+			goto free_devices;
+		}
+	}
+
+	mutex_unlock(&ctrl->subsys->lock);
+
+	pr_info("using peer-to-peer memory on %s\n", pci_name(ctrl->p2p_dev));
+	return;
+
+free_devices:
+	pci_p2pdma_client_list_free(&ctrl->p2p_clients);
+	mutex_unlock(&ctrl->subsys->lock);
+}
+
+static void nvmet_release_p2pmem(struct nvmet_ctrl *ctrl)
+{
+	if (!ctrl->p2p_dev)
+		return;
+
+	mutex_lock(&ctrl->subsys->lock);
+
+	pci_p2pdma_client_list_free(&ctrl->p2p_clients);
+	pci_dev_put(ctrl->p2p_dev);
+	ctrl->p2p_dev = NULL;
+
+	mutex_unlock(&ctrl->subsys->lock);
+}
+
 u16 nvmet_alloc_ctrl(const char *subsysnqn, const char *hostnqn,
 		struct nvmet_req *req, u32 kato, struct nvmet_ctrl **ctrlp)
 {
@@ -980,6 +1108,7 @@ u16 nvmet_alloc_ctrl(const char *subsysnqn, const char *hostnqn,
 
 	INIT_WORK(&ctrl->async_event_work, nvmet_async_event_work);
 	INIT_LIST_HEAD(&ctrl->async_events);
+	INIT_LIST_HEAD(&ctrl->p2p_clients);
 
 	memcpy(ctrl->subsysnqn, subsysnqn, NVMF_NQN_SIZE);
 	memcpy(ctrl->hostnqn, hostnqn, NVMF_NQN_SIZE);
@@ -1041,6 +1170,7 @@ u16 nvmet_alloc_ctrl(const char *subsysnqn, const char *hostnqn,
 		ctrl->kato = DIV_ROUND_UP(kato, 1000);
 	}
 	nvmet_start_keep_alive_timer(ctrl);
+	nvmet_setup_p2pmem(ctrl, req);
 
 	mutex_lock(&subsys->lock);
 	list_add_tail(&ctrl->subsys_entry, &subsys->ctrls);
@@ -1079,6 +1209,7 @@ static void nvmet_ctrl_free(struct kref *ref)
 	flush_work(&ctrl->async_event_work);
 	cancel_work_sync(&ctrl->fatal_err_work);
 
+	nvmet_release_p2pmem(ctrl);
 	ida_simple_remove(&cntlid_ida, ctrl->cntlid);
 
 	kfree(ctrl->sqs);
diff --git a/drivers/nvme/target/nvmet.h b/drivers/nvme/target/nvmet.h
index 7d6cb61021e4..297861064dd8 100644
--- a/drivers/nvme/target/nvmet.h
+++ b/drivers/nvme/target/nvmet.h
@@ -84,6 +84,11 @@ static inline struct nvmet_ns *to_nvmet_ns(struct config_item *item)
 	return container_of(to_config_group(item), struct nvmet_ns, group);
 }
 
+static inline struct device *nvmet_ns_dev(struct nvmet_ns *ns)
+{
+	return disk_to_dev(ns->bdev->bd_disk);
+}
+
 struct nvmet_cq {
 	u16			qid;
 	u16			size;
@@ -134,6 +139,8 @@ struct nvmet_port {
 	void				*priv;
 	bool				enabled;
 	int				inline_data_size;
+	bool				use_p2pmem;
+	struct pci_dev			*p2p_dev;
 };
 
 static inline struct nvmet_port *to_nvmet_port(struct config_item *item)
@@ -182,6 +189,9 @@ struct nvmet_ctrl {
 	__le32			*changed_ns_list;
 	u32			nr_changed_ns;
 
+	struct pci_dev		*p2p_dev;
+	struct list_head	p2p_clients;
+
 	char			subsysnqn[NVMF_NQN_FIELD_LEN];
 	char			hostnqn[NVMF_NQN_FIELD_LEN];
 };
@@ -294,6 +304,9 @@ struct nvmet_req {
 
 	void (*execute)(struct nvmet_req *req);
 	const struct nvmet_fabrics_ops *ops;
+
+	struct pci_dev *p2p_dev;
+	struct device *p2p_client;
 };
 
 extern struct workqueue_struct *buffered_io_wq;
diff --git a/drivers/nvme/target/rdma.c b/drivers/nvme/target/rdma.c
index e148dee72ba5..5c9cb752e2ed 100644
--- a/drivers/nvme/target/rdma.c
+++ b/drivers/nvme/target/rdma.c
@@ -735,6 +735,8 @@ static void nvmet_rdma_handle_command(struct nvmet_rdma_queue *queue,
 		cmd->send_sge.addr, cmd->send_sge.length,
 		DMA_TO_DEVICE);
 
+	cmd->req.p2p_client = &queue->dev->device->dev;
+
 	if (!nvmet_req_init(&cmd->req, &queue->nvme_cq,
 			&queue->nvme_sq, &nvmet_rdma_ops))
 		return;
-- 
2.11.0

_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

* [PATCH v5 13/13] nvmet: Optionally use PCI P2P memory
@ 2018-08-30 18:53   ` Logan Gunthorpe
  0 siblings, 0 replies; 265+ messages in thread
From: Logan Gunthorpe @ 2018-08-30 18:53 UTC (permalink / raw)
  To: linux-kernel, linux-pci, linux-nvme, linux-rdma, linux-nvdimm,
	linux-block
  Cc: Stephen Bates, Christoph Hellwig, Keith Busch, Sagi Grimberg,
	Bjorn Helgaas, Jason Gunthorpe, Max Gurtovoy, Dan Williams,
	Jérôme Glisse, Benjamin Herrenschmidt, Alex Williamson,
	Christian König, Logan Gunthorpe, Steve Wise

We create a configfs attribute in each nvme-fabrics target port to
enable the use of p2p memory. When enabled, the port will use p2p
memory only if a p2p memory device can be found behind the same switch
hierarchy as the RDMA port and all the block devices in use. If the
user enables the option and no suitable devices are found, the system
will silently fall back to using regular memory.
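
For illustration, toggling the new attribute from userspace might look
like this (a sketch assuming the standard nvmet configfs layout; the
port number and PCI address below are hypothetical):

```shell
# Enable p2p memory on port 1 and let the kernel pick a compatible device
echo 1 > /sys/kernel/config/nvmet/ports/1/p2pmem

# Or request a specific p2pmem provider by PCI device name (hypothetical)
echo 0000:84:00.0 > /sys/kernel/config/nvmet/ports/1/p2pmem

# Read back the current setting
cat /sys/kernel/config/nvmet/ports/1/p2pmem
```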

If appropriate, that port will allocate the RDMA buffers for queues
from the p2pmem device, falling back to system memory should anything
fail.
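
The allocation policy above can be sketched in plain C: try the p2p
provider first and quietly fall back to ordinary memory, which is the
shape of nvmet_req_alloc_sgl() in this patch. The helpers below are
stand-ins for illustration only, not kernel APIs:

```c
#include <assert.h>
#include <stdlib.h>

struct buf {
	void *mem;
	int from_p2p;	/* 1 if the p2p pool satisfied the request */
};

/* Stand-in for pci_p2pmem_alloc_sgl(): may fail, e.g. pool exhausted. */
static void *p2p_alloc(size_t len, int pool_ok)
{
	return pool_ok ? malloc(len) : NULL;
}

/* Try p2p memory when a provider exists; otherwise (or on allocation
 * failure) silently fall back to regular system memory. */
static int alloc_with_fallback(struct buf *b, size_t len,
			       int have_p2p, int pool_ok)
{
	b->from_p2p = 0;
	if (have_p2p) {
		b->mem = p2p_alloc(len, pool_ok);
		if (b->mem) {
			b->from_p2p = 1;
			return 0;
		}
	}
	b->mem = malloc(len);	/* fallback path */
	return b->mem ? 0 : -1;
}
```

As in the patch, the caller records which pool backed the request so the
free path can release it to the right allocator.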

Ideally, we'd want to use an NVMe CMB buffer as p2p memory. This would
save an extra PCI transfer, as the NVMe card could take the data
directly out of its own memory. However, at this time, only a limited
number of cards with CMB buffers seem to be available.

Signed-off-by: Stephen Bates <sbates@raithlin.com>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
[hch: partial rewrite of the initial code]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
---
 drivers/nvme/target/configfs.c |  36 +++++++++++
 drivers/nvme/target/core.c     | 133 ++++++++++++++++++++++++++++++++++++++++-
 drivers/nvme/target/nvmet.h    |  13 ++++
 drivers/nvme/target/rdma.c     |   2 +
 4 files changed, 183 insertions(+), 1 deletion(-)

diff --git a/drivers/nvme/target/configfs.c b/drivers/nvme/target/configfs.c
index b37a8e3e3f80..0dfb0e0c3d21 100644
--- a/drivers/nvme/target/configfs.c
+++ b/drivers/nvme/target/configfs.c
@@ -17,6 +17,8 @@
 #include <linux/slab.h>
 #include <linux/stat.h>
 #include <linux/ctype.h>
+#include <linux/pci.h>
+#include <linux/pci-p2pdma.h>
 
 #include "nvmet.h"
 
@@ -1094,6 +1096,37 @@ static void nvmet_port_release(struct config_item *item)
 	kfree(port);
 }
 
+#ifdef CONFIG_PCI_P2PDMA
+static ssize_t nvmet_p2pmem_show(struct config_item *item, char *page)
+{
+	struct nvmet_port *port = to_nvmet_port(item);
+
+	return pci_p2pdma_enable_show(page, port->p2p_dev, port->use_p2pmem);
+}
+
+static ssize_t nvmet_p2pmem_store(struct config_item *item,
+		const char *page, size_t count)
+{
+	struct nvmet_port *port = to_nvmet_port(item);
+	struct pci_dev *p2p_dev = NULL;
+	bool use_p2pmem;
+	int error;
+
+	error = pci_p2pdma_enable_store(page, &p2p_dev, &use_p2pmem);
+	if (error)
+		return error;
+
+	down_write(&nvmet_config_sem);
+	port->use_p2pmem = use_p2pmem;
+	pci_dev_put(port->p2p_dev);
+	port->p2p_dev = p2p_dev;
+	up_write(&nvmet_config_sem);
+
+	return count;
+}
+CONFIGFS_ATTR(nvmet_, p2pmem);
+#endif /* CONFIG_PCI_P2PDMA */
+
 static struct configfs_attribute *nvmet_port_attrs[] = {
 	&nvmet_attr_addr_adrfam,
 	&nvmet_attr_addr_treq,
@@ -1101,6 +1134,9 @@ static struct configfs_attribute *nvmet_port_attrs[] = {
 	&nvmet_attr_addr_trsvcid,
 	&nvmet_attr_addr_trtype,
 	&nvmet_attr_param_inline_data_size,
+#ifdef CONFIG_PCI_P2PDMA
+	&nvmet_attr_p2pmem,
+#endif
 	NULL,
 };
 
diff --git a/drivers/nvme/target/core.c b/drivers/nvme/target/core.c
index 6a1c8d5f552b..8f20b1e26c69 100644
--- a/drivers/nvme/target/core.c
+++ b/drivers/nvme/target/core.c
@@ -15,6 +15,7 @@
 #include <linux/module.h>
 #include <linux/random.h>
 #include <linux/rculist.h>
+#include <linux/pci-p2pdma.h>
 
 #include "nvmet.h"
 
@@ -365,9 +366,29 @@ static void nvmet_ns_dev_disable(struct nvmet_ns *ns)
 	nvmet_file_ns_disable(ns);
 }
 
+static int nvmet_p2pdma_add_client(struct nvmet_ctrl *ctrl,
+		struct nvmet_ns *ns)
+{
+	int ret;
+
+	if (!blk_queue_pci_p2pdma(ns->bdev->bd_queue)) {
+		pr_err("peer-to-peer DMA is not supported by %s\n",
+		       ns->device_path);
+		return -EINVAL;
+	}
+
+	ret = pci_p2pdma_add_client(&ctrl->p2p_clients, nvmet_ns_dev(ns));
+	if (ret)
+		pr_err("failed to add peer-to-peer DMA client %s: %d\n",
+		       ns->device_path, ret);
+
+	return ret;
+}
+
 int nvmet_ns_enable(struct nvmet_ns *ns)
 {
 	struct nvmet_subsys *subsys = ns->subsys;
+	struct nvmet_ctrl *ctrl;
 	int ret;
 
 	mutex_lock(&subsys->lock);
@@ -389,6 +410,14 @@ int nvmet_ns_enable(struct nvmet_ns *ns)
 	if (ret)
 		goto out_dev_put;
 
+	list_for_each_entry(ctrl, &subsys->ctrls, subsys_entry) {
+		if (ctrl->p2p_dev) {
+			ret = nvmet_p2pdma_add_client(ctrl, ns);
+			if (ret)
+				goto out_remove_clients;
+		}
+	}
+
 	if (ns->nsid > subsys->max_nsid)
 		subsys->max_nsid = ns->nsid;
 
@@ -417,6 +446,9 @@ int nvmet_ns_enable(struct nvmet_ns *ns)
 out_unlock:
 	mutex_unlock(&subsys->lock);
 	return ret;
+out_remove_clients:
+	list_for_each_entry(ctrl, &subsys->ctrls, subsys_entry)
+		pci_p2pdma_remove_client(&ctrl->p2p_clients, nvmet_ns_dev(ns));
 out_dev_put:
 	nvmet_ns_dev_disable(ns);
 	goto out_unlock;
@@ -425,6 +457,7 @@ int nvmet_ns_enable(struct nvmet_ns *ns)
 void nvmet_ns_disable(struct nvmet_ns *ns)
 {
 	struct nvmet_subsys *subsys = ns->subsys;
+	struct nvmet_ctrl *ctrl;
 
 	mutex_lock(&subsys->lock);
 	if (!ns->enabled)
@@ -450,6 +483,12 @@ void nvmet_ns_disable(struct nvmet_ns *ns)
 	percpu_ref_exit(&ns->ref);
 
 	mutex_lock(&subsys->lock);
+
+	list_for_each_entry(ctrl, &subsys->ctrls, subsys_entry) {
+		pci_p2pdma_remove_client(&ctrl->p2p_clients, nvmet_ns_dev(ns));
+		nvmet_add_async_event(ctrl, NVME_AER_TYPE_NOTICE, 0, 0);
+	}
+
 	subsys->nr_namespaces--;
 	nvmet_ns_changed(subsys, ns->nsid);
 	nvmet_ns_dev_disable(ns);
@@ -727,6 +766,23 @@ EXPORT_SYMBOL_GPL(nvmet_req_execute);
 
 int nvmet_req_alloc_sgl(struct nvmet_req *req, struct nvmet_sq *sq)
 {
+	struct pci_dev *p2p_dev = NULL;
+
+	if (IS_ENABLED(CONFIG_PCI_P2PDMA)) {
+		if (sq->ctrl)
+			p2p_dev = sq->ctrl->p2p_dev;
+
+		req->p2p_dev = NULL;
+		if (sq->qid && p2p_dev) {
+			req->sg = pci_p2pmem_alloc_sgl(p2p_dev, &req->sg_cnt,
+						       req->transfer_len);
+			if (req->sg) {
+				req->p2p_dev = p2p_dev;
+				return 0;
+			}
+		}
+	}
+
 	req->sg = sgl_alloc(req->transfer_len, GFP_KERNEL, &req->sg_cnt);
 	if (!req->sg)
 		return -ENOMEM;
@@ -737,7 +793,11 @@ EXPORT_SYMBOL_GPL(nvmet_req_alloc_sgl);
 
 void nvmet_req_free_sgl(struct nvmet_req *req)
 {
-	sgl_free(req->sg);
+	if (req->p2p_dev)
+		pci_p2pmem_free_sgl(req->p2p_dev, req->sg);
+	else
+		sgl_free(req->sg);
+
 	req->sg = NULL;
 	req->sg_cnt = 0;
 }
@@ -939,6 +999,74 @@ bool nvmet_host_allowed(struct nvmet_req *req, struct nvmet_subsys *subsys,
 		return __nvmet_host_allowed(subsys, hostnqn);
 }
 
+/*
+ * If allow_p2pmem is set, we will try to use P2P memory for the SGL lists for
+ * I/O commands. This requires the PCI p2p device to be compatible with the
+ * backing device for every namespace on this controller.
+ */
+static void nvmet_setup_p2pmem(struct nvmet_ctrl *ctrl, struct nvmet_req *req)
+{
+	struct nvmet_ns *ns;
+	int ret;
+
+	if (!req->port->use_p2pmem || !req->p2p_client)
+		return;
+
+	mutex_lock(&ctrl->subsys->lock);
+
+	ret = pci_p2pdma_add_client(&ctrl->p2p_clients, req->p2p_client);
+	if (ret) {
+		pr_err("failed adding peer-to-peer DMA client %s: %d\n",
+		       dev_name(req->p2p_client), ret);
+		goto free_devices;
+	}
+
+	list_for_each_entry_rcu(ns, &ctrl->subsys->namespaces, dev_link) {
+		ret = nvmet_p2pdma_add_client(ctrl, ns);
+		if (ret)
+			goto free_devices;
+	}
+
+	if (req->port->p2p_dev) {
+		if (!pci_p2pdma_assign_provider(req->port->p2p_dev,
+						&ctrl->p2p_clients)) {
+			pr_info("peer-to-peer memory on %s is not supported\n",
+				pci_name(req->port->p2p_dev));
+			goto free_devices;
+		}
+		ctrl->p2p_dev = pci_dev_get(req->port->p2p_dev);
+	} else {
+		ctrl->p2p_dev = pci_p2pmem_find(&ctrl->p2p_clients);
+		if (!ctrl->p2p_dev) {
+			pr_info("no supported peer-to-peer memory devices found\n");
+			goto free_devices;
+		}
+	}
+
+	mutex_unlock(&ctrl->subsys->lock);
+
+	pr_info("using peer-to-peer memory on %s\n", pci_name(ctrl->p2p_dev));
+	return;
+
+free_devices:
+	pci_p2pdma_client_list_free(&ctrl->p2p_clients);
+	mutex_unlock(&ctrl->subsys->lock);
+}
+
+static void nvmet_release_p2pmem(struct nvmet_ctrl *ctrl)
+{
+	if (!ctrl->p2p_dev)
+		return;
+
+	mutex_lock(&ctrl->subsys->lock);
+
+	pci_p2pdma_client_list_free(&ctrl->p2p_clients);
+	pci_dev_put(ctrl->p2p_dev);
+	ctrl->p2p_dev = NULL;
+
+	mutex_unlock(&ctrl->subsys->lock);
+}
+
 u16 nvmet_alloc_ctrl(const char *subsysnqn, const char *hostnqn,
 		struct nvmet_req *req, u32 kato, struct nvmet_ctrl **ctrlp)
 {
@@ -980,6 +1108,7 @@ u16 nvmet_alloc_ctrl(const char *subsysnqn, const char *hostnqn,
 
 	INIT_WORK(&ctrl->async_event_work, nvmet_async_event_work);
 	INIT_LIST_HEAD(&ctrl->async_events);
+	INIT_LIST_HEAD(&ctrl->p2p_clients);
 
 	memcpy(ctrl->subsysnqn, subsysnqn, NVMF_NQN_SIZE);
 	memcpy(ctrl->hostnqn, hostnqn, NVMF_NQN_SIZE);
@@ -1041,6 +1170,7 @@ u16 nvmet_alloc_ctrl(const char *subsysnqn, const char *hostnqn,
 		ctrl->kato = DIV_ROUND_UP(kato, 1000);
 	}
 	nvmet_start_keep_alive_timer(ctrl);
+	nvmet_setup_p2pmem(ctrl, req);
 
 	mutex_lock(&subsys->lock);
 	list_add_tail(&ctrl->subsys_entry, &subsys->ctrls);
@@ -1079,6 +1209,7 @@ static void nvmet_ctrl_free(struct kref *ref)
 	flush_work(&ctrl->async_event_work);
 	cancel_work_sync(&ctrl->fatal_err_work);
 
+	nvmet_release_p2pmem(ctrl);
 	ida_simple_remove(&cntlid_ida, ctrl->cntlid);
 
 	kfree(ctrl->sqs);
diff --git a/drivers/nvme/target/nvmet.h b/drivers/nvme/target/nvmet.h
index 7d6cb61021e4..297861064dd8 100644
--- a/drivers/nvme/target/nvmet.h
+++ b/drivers/nvme/target/nvmet.h
@@ -84,6 +84,11 @@ static inline struct nvmet_ns *to_nvmet_ns(struct config_item *item)
 	return container_of(to_config_group(item), struct nvmet_ns, group);
 }
 
+static inline struct device *nvmet_ns_dev(struct nvmet_ns *ns)
+{
+	return disk_to_dev(ns->bdev->bd_disk);
+}
+
 struct nvmet_cq {
 	u16			qid;
 	u16			size;
@@ -134,6 +139,8 @@ struct nvmet_port {
 	void				*priv;
 	bool				enabled;
 	int				inline_data_size;
+	bool				use_p2pmem;
+	struct pci_dev			*p2p_dev;
 };
 
 static inline struct nvmet_port *to_nvmet_port(struct config_item *item)
@@ -182,6 +189,9 @@ struct nvmet_ctrl {
 	__le32			*changed_ns_list;
 	u32			nr_changed_ns;
 
+	struct pci_dev		*p2p_dev;
+	struct list_head	p2p_clients;
+
 	char			subsysnqn[NVMF_NQN_FIELD_LEN];
 	char			hostnqn[NVMF_NQN_FIELD_LEN];
 };
@@ -294,6 +304,9 @@ struct nvmet_req {
 
 	void (*execute)(struct nvmet_req *req);
 	const struct nvmet_fabrics_ops *ops;
+
+	struct pci_dev *p2p_dev;
+	struct device *p2p_client;
 };
 
 extern struct workqueue_struct *buffered_io_wq;
diff --git a/drivers/nvme/target/rdma.c b/drivers/nvme/target/rdma.c
index e148dee72ba5..5c9cb752e2ed 100644
--- a/drivers/nvme/target/rdma.c
+++ b/drivers/nvme/target/rdma.c
@@ -735,6 +735,8 @@ static void nvmet_rdma_handle_command(struct nvmet_rdma_queue *queue,
 		cmd->send_sge.addr, cmd->send_sge.length,
 		DMA_TO_DEVICE);
 
+	cmd->req.p2p_client = &queue->dev->device->dev;
+
 	if (!nvmet_req_init(&cmd->req, &queue->nvme_cq,
 			&queue->nvme_sq, &nvmet_rdma_ops))
 		return;
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 265+ messages in thread

* [PATCH v5 13/13] nvmet: Optionally use PCI P2P memory
@ 2018-08-30 18:53   ` Logan Gunthorpe
  0 siblings, 0 replies; 265+ messages in thread
From: Logan Gunthorpe @ 2018-08-30 18:53 UTC (permalink / raw)
  To: linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-pci-u79uwXL29TY76Z2rM5mHXA,
	linux-nvme-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	linux-nvdimm-hn68Rpc1hR1g9hUCZPvPmw,
	linux-block-u79uwXL29TY76Z2rM5mHXA
  Cc: Christian König, Benjamin Herrenschmidt, Steve Wise,
	Alex Williamson, Jérôme Glisse, Jason Gunthorpe,
	Bjorn Helgaas, Max Gurtovoy, Christoph Hellwig

We create a configfs attribute in each nvme-fabrics target port to
enable p2p memory use. When enabled, the port will only then use the
p2p memory if a p2p memory device can be found which is behind the
same switch hierarchy as the RDMA port and all the block devices in
use. If the user enabled it and no devices are found, then the system
will silently fall back on using regular memory.

If appropriate, that port will allocate memory for the RDMA buffers
for queues from the p2pmem device falling back to system memory should
anything fail.

Ideally, we'd want to use an NVME CMB buffer as p2p memory. This would
save an extra PCI transfer as the NVME card could just take the data
out of it's own memory. However, at this time, only a limited number
of cards with CMB buffers seem to be available.

Signed-off-by: Stephen Bates <sbates@raithlin.com>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
[hch: partial rewrite of the initial code]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
---
 drivers/nvme/target/configfs.c |  36 +++++++++++
 drivers/nvme/target/core.c     | 133 ++++++++++++++++++++++++++++++++++++++++-
 drivers/nvme/target/nvmet.h    |  13 ++++
 drivers/nvme/target/rdma.c     |   2 +
 4 files changed, 183 insertions(+), 1 deletion(-)

diff --git a/drivers/nvme/target/configfs.c b/drivers/nvme/target/configfs.c
index b37a8e3e3f80..0dfb0e0c3d21 100644
--- a/drivers/nvme/target/configfs.c
+++ b/drivers/nvme/target/configfs.c
@@ -17,6 +17,8 @@
 #include <linux/slab.h>
 #include <linux/stat.h>
 #include <linux/ctype.h>
+#include <linux/pci.h>
+#include <linux/pci-p2pdma.h>
 
 #include "nvmet.h"
 
@@ -1094,6 +1096,37 @@ static void nvmet_port_release(struct config_item *item)
 	kfree(port);
 }
 
+#ifdef CONFIG_PCI_P2PDMA
+static ssize_t nvmet_p2pmem_show(struct config_item *item, char *page)
+{
+	struct nvmet_port *port = to_nvmet_port(item);
+
+	return pci_p2pdma_enable_show(page, port->p2p_dev, port->use_p2pmem);
+}
+
+static ssize_t nvmet_p2pmem_store(struct config_item *item,
+		const char *page, size_t count)
+{
+	struct nvmet_port *port = to_nvmet_port(item);
+	struct pci_dev *p2p_dev = NULL;
+	bool use_p2pmem;
+	int error;
+
+	error = pci_p2pdma_enable_store(page, &p2p_dev, &use_p2pmem);
+	if (error)
+		return error;
+
+	down_write(&nvmet_config_sem);
+	port->use_p2pmem = use_p2pmem;
+	pci_dev_put(port->p2p_dev);
+	port->p2p_dev = p2p_dev;
+	up_write(&nvmet_config_sem);
+
+	return count;
+}
+CONFIGFS_ATTR(nvmet_, p2pmem);
+#endif /* CONFIG_PCI_P2PDMA */
+
 static struct configfs_attribute *nvmet_port_attrs[] = {
 	&nvmet_attr_addr_adrfam,
 	&nvmet_attr_addr_treq,
@@ -1101,6 +1134,9 @@ static struct configfs_attribute *nvmet_port_attrs[] = {
 	&nvmet_attr_addr_trsvcid,
 	&nvmet_attr_addr_trtype,
 	&nvmet_attr_param_inline_data_size,
+#ifdef CONFIG_PCI_P2PDMA
+	&nvmet_attr_p2pmem,
+#endif
 	NULL,
 };
 
diff --git a/drivers/nvme/target/core.c b/drivers/nvme/target/core.c
index 6a1c8d5f552b..8f20b1e26c69 100644
--- a/drivers/nvme/target/core.c
+++ b/drivers/nvme/target/core.c
@@ -15,6 +15,7 @@
 #include <linux/module.h>
 #include <linux/random.h>
 #include <linux/rculist.h>
+#include <linux/pci-p2pdma.h>
 
 #include "nvmet.h"
 
@@ -365,9 +366,29 @@ static void nvmet_ns_dev_disable(struct nvmet_ns *ns)
 	nvmet_file_ns_disable(ns);
 }
 
+static int nvmet_p2pdma_add_client(struct nvmet_ctrl *ctrl,
+		struct nvmet_ns *ns)
+{
+	int ret;
+
+	if (!blk_queue_pci_p2pdma(ns->bdev->bd_queue)) {
+		pr_err("peer-to-peer DMA is not supported by %s\n",
+		       ns->device_path);
+		return -EINVAL;
+	}
+
+	ret = pci_p2pdma_add_client(&ctrl->p2p_clients, nvmet_ns_dev(ns));
+	if (ret)
+		pr_err("failed to add peer-to-peer DMA client %s: %d\n",
+		       ns->device_path, ret);
+
+	return ret;
+}
+
 int nvmet_ns_enable(struct nvmet_ns *ns)
 {
 	struct nvmet_subsys *subsys = ns->subsys;
+	struct nvmet_ctrl *ctrl;
 	int ret;
 
 	mutex_lock(&subsys->lock);
@@ -389,6 +410,14 @@ int nvmet_ns_enable(struct nvmet_ns *ns)
 	if (ret)
 		goto out_dev_put;
 
+	list_for_each_entry(ctrl, &subsys->ctrls, subsys_entry) {
+		if (ctrl->p2p_dev) {
+			ret = nvmet_p2pdma_add_client(ctrl, ns);
+			if (ret)
+				goto out_remove_clients;
+		}
+	}
+
 	if (ns->nsid > subsys->max_nsid)
 		subsys->max_nsid = ns->nsid;
 
@@ -417,6 +446,9 @@ int nvmet_ns_enable(struct nvmet_ns *ns)
 out_unlock:
 	mutex_unlock(&subsys->lock);
 	return ret;
+out_remove_clients:
+	list_for_each_entry(ctrl, &subsys->ctrls, subsys_entry)
+		pci_p2pdma_remove_client(&ctrl->p2p_clients, nvmet_ns_dev(ns));
 out_dev_put:
 	nvmet_ns_dev_disable(ns);
 	goto out_unlock;
@@ -425,6 +457,7 @@ int nvmet_ns_enable(struct nvmet_ns *ns)
 void nvmet_ns_disable(struct nvmet_ns *ns)
 {
 	struct nvmet_subsys *subsys = ns->subsys;
+	struct nvmet_ctrl *ctrl;
 
 	mutex_lock(&subsys->lock);
 	if (!ns->enabled)
@@ -450,6 +483,12 @@ void nvmet_ns_disable(struct nvmet_ns *ns)
 	percpu_ref_exit(&ns->ref);
 
 	mutex_lock(&subsys->lock);
+
+	list_for_each_entry(ctrl, &subsys->ctrls, subsys_entry) {
+		pci_p2pdma_remove_client(&ctrl->p2p_clients, nvmet_ns_dev(ns));
+		nvmet_add_async_event(ctrl, NVME_AER_TYPE_NOTICE, 0, 0);
+	}
+
 	subsys->nr_namespaces--;
 	nvmet_ns_changed(subsys, ns->nsid);
 	nvmet_ns_dev_disable(ns);
@@ -727,6 +766,23 @@ EXPORT_SYMBOL_GPL(nvmet_req_execute);
 
 int nvmet_req_alloc_sgl(struct nvmet_req *req, struct nvmet_sq *sq)
 {
+	struct pci_dev *p2p_dev = NULL;
+
+	if (IS_ENABLED(CONFIG_PCI_P2PDMA)) {
+		if (sq->ctrl)
+			p2p_dev = sq->ctrl->p2p_dev;
+
+		req->p2p_dev = NULL;
+		if (sq->qid && p2p_dev) {
+			req->sg = pci_p2pmem_alloc_sgl(p2p_dev, &req->sg_cnt,
+						       req->transfer_len);
+			if (req->sg) {
+				req->p2p_dev = p2p_dev;
+				return 0;
+			}
+		}
+	}
+
 	req->sg = sgl_alloc(req->transfer_len, GFP_KERNEL, &req->sg_cnt);
 	if (!req->sg)
 		return -ENOMEM;
@@ -737,7 +793,11 @@ EXPORT_SYMBOL_GPL(nvmet_req_alloc_sgl);
 
 void nvmet_req_free_sgl(struct nvmet_req *req)
 {
-	sgl_free(req->sg);
+	if (req->p2p_dev)
+		pci_p2pmem_free_sgl(req->p2p_dev, req->sg);
+	else
+		sgl_free(req->sg);
+
 	req->sg = NULL;
 	req->sg_cnt = 0;
 }
@@ -939,6 +999,74 @@ bool nvmet_host_allowed(struct nvmet_req *req, struct nvmet_subsys *subsys,
 		return __nvmet_host_allowed(subsys, hostnqn);
 }
 
+/*
+ * If allow_p2pmem is set, we will try to use P2P memory for the SGL lists for
+ * Ι/O commands. This requires the PCI p2p device to be compatible with the
+ * backing device for every namespace on this controller.
+ */
+static void nvmet_setup_p2pmem(struct nvmet_ctrl *ctrl, struct nvmet_req *req)
+{
+	struct nvmet_ns *ns;
+	int ret;
+
+	if (!req->port->use_p2pmem || !req->p2p_client)
+		return;
+
+	mutex_lock(&ctrl->subsys->lock);
+
+	ret = pci_p2pdma_add_client(&ctrl->p2p_clients, req->p2p_client);
+	if (ret) {
+		pr_err("failed adding peer-to-peer DMA client %s: %d\n",
+		       dev_name(req->p2p_client), ret);
+		goto free_devices;
+	}
+
+	list_for_each_entry_rcu(ns, &ctrl->subsys->namespaces, dev_link) {
+		ret = nvmet_p2pdma_add_client(ctrl, ns);
+		if (ret)
+			goto free_devices;
+	}
+
+	if (req->port->p2p_dev) {
+		if (!pci_p2pdma_assign_provider(req->port->p2p_dev,
+						&ctrl->p2p_clients)) {
+			pr_info("peer-to-peer memory on %s is not supported\n",
+				pci_name(req->port->p2p_dev));
+			goto free_devices;
+		}
+		ctrl->p2p_dev = pci_dev_get(req->port->p2p_dev);
+	} else {
+		ctrl->p2p_dev = pci_p2pmem_find(&ctrl->p2p_clients);
+		if (!ctrl->p2p_dev) {
+			pr_info("no supported peer-to-peer memory devices found\n");
+			goto free_devices;
+		}
+	}
+
+	mutex_unlock(&ctrl->subsys->lock);
+
+	pr_info("using peer-to-peer memory on %s\n", pci_name(ctrl->p2p_dev));
+	return;
+
+free_devices:
+	pci_p2pdma_client_list_free(&ctrl->p2p_clients);
+	mutex_unlock(&ctrl->subsys->lock);
+}
+
+static void nvmet_release_p2pmem(struct nvmet_ctrl *ctrl)
+{
+	if (!ctrl->p2p_dev)
+		return;
+
+	mutex_lock(&ctrl->subsys->lock);
+
+	pci_p2pdma_client_list_free(&ctrl->p2p_clients);
+	pci_dev_put(ctrl->p2p_dev);
+	ctrl->p2p_dev = NULL;
+
+	mutex_unlock(&ctrl->subsys->lock);
+}
+
 u16 nvmet_alloc_ctrl(const char *subsysnqn, const char *hostnqn,
 		struct nvmet_req *req, u32 kato, struct nvmet_ctrl **ctrlp)
 {
@@ -980,6 +1108,7 @@ u16 nvmet_alloc_ctrl(const char *subsysnqn, const char *hostnqn,
 
 	INIT_WORK(&ctrl->async_event_work, nvmet_async_event_work);
 	INIT_LIST_HEAD(&ctrl->async_events);
+	INIT_LIST_HEAD(&ctrl->p2p_clients);
 
 	memcpy(ctrl->subsysnqn, subsysnqn, NVMF_NQN_SIZE);
 	memcpy(ctrl->hostnqn, hostnqn, NVMF_NQN_SIZE);
@@ -1041,6 +1170,7 @@ u16 nvmet_alloc_ctrl(const char *subsysnqn, const char *hostnqn,
 		ctrl->kato = DIV_ROUND_UP(kato, 1000);
 	}
 	nvmet_start_keep_alive_timer(ctrl);
+	nvmet_setup_p2pmem(ctrl, req);
 
 	mutex_lock(&subsys->lock);
 	list_add_tail(&ctrl->subsys_entry, &subsys->ctrls);
@@ -1079,6 +1209,7 @@ static void nvmet_ctrl_free(struct kref *ref)
 	flush_work(&ctrl->async_event_work);
 	cancel_work_sync(&ctrl->fatal_err_work);
 
+	nvmet_release_p2pmem(ctrl);
 	ida_simple_remove(&cntlid_ida, ctrl->cntlid);
 
 	kfree(ctrl->sqs);
diff --git a/drivers/nvme/target/nvmet.h b/drivers/nvme/target/nvmet.h
index 7d6cb61021e4..297861064dd8 100644
--- a/drivers/nvme/target/nvmet.h
+++ b/drivers/nvme/target/nvmet.h
@@ -84,6 +84,11 @@ static inline struct nvmet_ns *to_nvmet_ns(struct config_item *item)
 	return container_of(to_config_group(item), struct nvmet_ns, group);
 }
 
+static inline struct device *nvmet_ns_dev(struct nvmet_ns *ns)
+{
+	return disk_to_dev(ns->bdev->bd_disk);
+}
+
 struct nvmet_cq {
 	u16			qid;
 	u16			size;
@@ -134,6 +139,8 @@ struct nvmet_port {
 	void				*priv;
 	bool				enabled;
 	int				inline_data_size;
+	bool				use_p2pmem;
+	struct pci_dev			*p2p_dev;
 };
 
 static inline struct nvmet_port *to_nvmet_port(struct config_item *item)
@@ -182,6 +189,9 @@ struct nvmet_ctrl {
 	__le32			*changed_ns_list;
 	u32			nr_changed_ns;
 
+	struct pci_dev		*p2p_dev;
+	struct list_head	p2p_clients;
+
 	char			subsysnqn[NVMF_NQN_FIELD_LEN];
 	char			hostnqn[NVMF_NQN_FIELD_LEN];
 };
@@ -294,6 +304,9 @@ struct nvmet_req {
 
 	void (*execute)(struct nvmet_req *req);
 	const struct nvmet_fabrics_ops *ops;
+
+	struct pci_dev *p2p_dev;
+	struct device *p2p_client;
 };
 
 extern struct workqueue_struct *buffered_io_wq;
diff --git a/drivers/nvme/target/rdma.c b/drivers/nvme/target/rdma.c
index e148dee72ba5..5c9cb752e2ed 100644
--- a/drivers/nvme/target/rdma.c
+++ b/drivers/nvme/target/rdma.c
@@ -735,6 +735,8 @@ static void nvmet_rdma_handle_command(struct nvmet_rdma_queue *queue,
 		cmd->send_sge.addr, cmd->send_sge.length,
 		DMA_TO_DEVICE);
 
+	cmd->req.p2p_client = &queue->dev->device->dev;
+
 	if (!nvmet_req_init(&cmd->req, &queue->nvme_cq,
 			&queue->nvme_sq, &nvmet_rdma_ops))
 		return;
-- 
2.11.0

_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

^ permalink raw reply related	[flat|nested] 265+ messages in thread

* [PATCH v5 13/13] nvmet: Optionally use PCI P2P memory
@ 2018-08-30 18:53   ` Logan Gunthorpe
  0 siblings, 0 replies; 265+ messages in thread
From: Logan Gunthorpe @ 2018-08-30 18:53 UTC (permalink / raw)


We create a configfs attribute in each nvme-fabrics target port to
enable p2p memory use. When enabled, the port will only then use the
p2p memory if a p2p memory device can be found which is behind the
same switch hierarchy as the RDMA port and all the block devices in
use. If the user enabled it and no devices are found, then the system
will silently fall back on using regular memory.

If appropriate, that port will allocate memory for the RDMA buffers
for queues from the p2pmem device falling back to system memory should
anything fail.

Ideally, we'd want to use an NVME CMB buffer as p2p memory. This would
save an extra PCI transfer as the NVME card could just take the data
out of it's own memory. However, at this time, only a limited number
of cards with CMB buffers seem to be available.

Signed-off-by: Stephen Bates <sbates at raithlin.com>
Signed-off-by: Steve Wise <swise at opengridcomputing.com>
[hch: partial rewrite of the initial code]
Signed-off-by: Christoph Hellwig <hch at lst.de>
Signed-off-by: Logan Gunthorpe <logang at deltatee.com>
---
 drivers/nvme/target/configfs.c |  36 +++++++++++
 drivers/nvme/target/core.c     | 133 ++++++++++++++++++++++++++++++++++++++++-
 drivers/nvme/target/nvmet.h    |  13 ++++
 drivers/nvme/target/rdma.c     |   2 +
 4 files changed, 183 insertions(+), 1 deletion(-)

diff --git a/drivers/nvme/target/configfs.c b/drivers/nvme/target/configfs.c
index b37a8e3e3f80..0dfb0e0c3d21 100644
--- a/drivers/nvme/target/configfs.c
+++ b/drivers/nvme/target/configfs.c
@@ -17,6 +17,8 @@
 #include <linux/slab.h>
 #include <linux/stat.h>
 #include <linux/ctype.h>
+#include <linux/pci.h>
+#include <linux/pci-p2pdma.h>
 
 #include "nvmet.h"
 
@@ -1094,6 +1096,37 @@ static void nvmet_port_release(struct config_item *item)
 	kfree(port);
 }
 
+#ifdef CONFIG_PCI_P2PDMA
+static ssize_t nvmet_p2pmem_show(struct config_item *item, char *page)
+{
+	struct nvmet_port *port = to_nvmet_port(item);
+
+	return pci_p2pdma_enable_show(page, port->p2p_dev, port->use_p2pmem);
+}
+
+static ssize_t nvmet_p2pmem_store(struct config_item *item,
+		const char *page, size_t count)
+{
+	struct nvmet_port *port = to_nvmet_port(item);
+	struct pci_dev *p2p_dev = NULL;
+	bool use_p2pmem;
+	int error;
+
+	error = pci_p2pdma_enable_store(page, &p2p_dev, &use_p2pmem);
+	if (error)
+		return error;
+
+	down_write(&nvmet_config_sem);
+	port->use_p2pmem = use_p2pmem;
+	pci_dev_put(port->p2p_dev);
+	port->p2p_dev = p2p_dev;
+	up_write(&nvmet_config_sem);
+
+	return count;
+}
+CONFIGFS_ATTR(nvmet_, p2pmem);
+#endif /* CONFIG_PCI_P2PDMA */
+
 static struct configfs_attribute *nvmet_port_attrs[] = {
 	&nvmet_attr_addr_adrfam,
 	&nvmet_attr_addr_treq,
@@ -1101,6 +1134,9 @@ static struct configfs_attribute *nvmet_port_attrs[] = {
 	&nvmet_attr_addr_trsvcid,
 	&nvmet_attr_addr_trtype,
 	&nvmet_attr_param_inline_data_size,
+#ifdef CONFIG_PCI_P2PDMA
+	&nvmet_attr_p2pmem,
+#endif
 	NULL,
 };
 
diff --git a/drivers/nvme/target/core.c b/drivers/nvme/target/core.c
index 6a1c8d5f552b..8f20b1e26c69 100644
--- a/drivers/nvme/target/core.c
+++ b/drivers/nvme/target/core.c
@@ -15,6 +15,7 @@
 #include <linux/module.h>
 #include <linux/random.h>
 #include <linux/rculist.h>
+#include <linux/pci-p2pdma.h>
 
 #include "nvmet.h"
 
@@ -365,9 +366,29 @@ static void nvmet_ns_dev_disable(struct nvmet_ns *ns)
 	nvmet_file_ns_disable(ns);
 }
 
+static int nvmet_p2pdma_add_client(struct nvmet_ctrl *ctrl,
+		struct nvmet_ns *ns)
+{
+	int ret;
+
+	if (!blk_queue_pci_p2pdma(ns->bdev->bd_queue)) {
+		pr_err("peer-to-peer DMA is not supported by %s\n",
+		       ns->device_path);
+		return -EINVAL;
+	}
+
+	ret = pci_p2pdma_add_client(&ctrl->p2p_clients, nvmet_ns_dev(ns));
+	if (ret)
+		pr_err("failed to add peer-to-peer DMA client %s: %d\n",
+		       ns->device_path, ret);
+
+	return ret;
+}
+
 int nvmet_ns_enable(struct nvmet_ns *ns)
 {
 	struct nvmet_subsys *subsys = ns->subsys;
+	struct nvmet_ctrl *ctrl;
 	int ret;
 
 	mutex_lock(&subsys->lock);
@@ -389,6 +410,14 @@ int nvmet_ns_enable(struct nvmet_ns *ns)
 	if (ret)
 		goto out_dev_put;
 
+	list_for_each_entry(ctrl, &subsys->ctrls, subsys_entry) {
+		if (ctrl->p2p_dev) {
+			ret = nvmet_p2pdma_add_client(ctrl, ns);
+			if (ret)
+				goto out_remove_clients;
+		}
+	}
+
 	if (ns->nsid > subsys->max_nsid)
 		subsys->max_nsid = ns->nsid;
 
@@ -417,6 +446,9 @@ int nvmet_ns_enable(struct nvmet_ns *ns)
 out_unlock:
 	mutex_unlock(&subsys->lock);
 	return ret;
+out_remove_clients:
+	list_for_each_entry(ctrl, &subsys->ctrls, subsys_entry)
+		pci_p2pdma_remove_client(&ctrl->p2p_clients, nvmet_ns_dev(ns));
 out_dev_put:
 	nvmet_ns_dev_disable(ns);
 	goto out_unlock;
@@ -425,6 +457,7 @@ int nvmet_ns_enable(struct nvmet_ns *ns)
 void nvmet_ns_disable(struct nvmet_ns *ns)
 {
 	struct nvmet_subsys *subsys = ns->subsys;
+	struct nvmet_ctrl *ctrl;
 
 	mutex_lock(&subsys->lock);
 	if (!ns->enabled)
@@ -450,6 +483,12 @@ void nvmet_ns_disable(struct nvmet_ns *ns)
 	percpu_ref_exit(&ns->ref);
 
 	mutex_lock(&subsys->lock);
+
+	list_for_each_entry(ctrl, &subsys->ctrls, subsys_entry) {
+		pci_p2pdma_remove_client(&ctrl->p2p_clients, nvmet_ns_dev(ns));
+		nvmet_add_async_event(ctrl, NVME_AER_TYPE_NOTICE, 0, 0);
+	}
+
 	subsys->nr_namespaces--;
 	nvmet_ns_changed(subsys, ns->nsid);
 	nvmet_ns_dev_disable(ns);
@@ -727,6 +766,23 @@ EXPORT_SYMBOL_GPL(nvmet_req_execute);
 
 int nvmet_req_alloc_sgl(struct nvmet_req *req, struct nvmet_sq *sq)
 {
+	struct pci_dev *p2p_dev = NULL;
+
+	if (IS_ENABLED(CONFIG_PCI_P2PDMA)) {
+		if (sq->ctrl)
+			p2p_dev = sq->ctrl->p2p_dev;
+
+		req->p2p_dev = NULL;
+		if (sq->qid && p2p_dev) {
+			req->sg = pci_p2pmem_alloc_sgl(p2p_dev, &req->sg_cnt,
+						       req->transfer_len);
+			if (req->sg) {
+				req->p2p_dev = p2p_dev;
+				return 0;
+			}
+		}
+	}
+
 	req->sg = sgl_alloc(req->transfer_len, GFP_KERNEL, &req->sg_cnt);
 	if (!req->sg)
 		return -ENOMEM;
@@ -737,7 +793,11 @@ EXPORT_SYMBOL_GPL(nvmet_req_alloc_sgl);
 
 void nvmet_req_free_sgl(struct nvmet_req *req)
 {
-	sgl_free(req->sg);
+	if (req->p2p_dev)
+		pci_p2pmem_free_sgl(req->p2p_dev, req->sg);
+	else
+		sgl_free(req->sg);
+
 	req->sg = NULL;
 	req->sg_cnt = 0;
 }
@@ -939,6 +999,74 @@ bool nvmet_host_allowed(struct nvmet_req *req, struct nvmet_subsys *subsys,
 		return __nvmet_host_allowed(subsys, hostnqn);
 }
 
+/*
+ * If allow_p2pmem is set, we will try to use P2P memory for the SGL lists for
+ * I/O commands. This requires the PCI p2p device to be compatible with the
+ * backing device for every namespace on this controller.
+ */
+static void nvmet_setup_p2pmem(struct nvmet_ctrl *ctrl, struct nvmet_req *req)
+{
+	struct nvmet_ns *ns;
+	int ret;
+
+	if (!req->port->use_p2pmem || !req->p2p_client)
+		return;
+
+	mutex_lock(&ctrl->subsys->lock);
+
+	ret = pci_p2pdma_add_client(&ctrl->p2p_clients, req->p2p_client);
+	if (ret) {
+		pr_err("failed adding peer-to-peer DMA client %s: %d\n",
+		       dev_name(req->p2p_client), ret);
+		goto free_devices;
+	}
+
+	list_for_each_entry_rcu(ns, &ctrl->subsys->namespaces, dev_link) {
+		ret = nvmet_p2pdma_add_client(ctrl, ns);
+		if (ret)
+			goto free_devices;
+	}
+
+	if (req->port->p2p_dev) {
+		if (!pci_p2pdma_assign_provider(req->port->p2p_dev,
+						&ctrl->p2p_clients)) {
+			pr_info("peer-to-peer memory on %s is not supported\n",
+				pci_name(req->port->p2p_dev));
+			goto free_devices;
+		}
+		ctrl->p2p_dev = pci_dev_get(req->port->p2p_dev);
+	} else {
+		ctrl->p2p_dev = pci_p2pmem_find(&ctrl->p2p_clients);
+		if (!ctrl->p2p_dev) {
+			pr_info("no supported peer-to-peer memory devices found\n");
+			goto free_devices;
+		}
+	}
+
+	mutex_unlock(&ctrl->subsys->lock);
+
+	pr_info("using peer-to-peer memory on %s\n", pci_name(ctrl->p2p_dev));
+	return;
+
+free_devices:
+	pci_p2pdma_client_list_free(&ctrl->p2p_clients);
+	mutex_unlock(&ctrl->subsys->lock);
+}
+
+static void nvmet_release_p2pmem(struct nvmet_ctrl *ctrl)
+{
+	if (!ctrl->p2p_dev)
+		return;
+
+	mutex_lock(&ctrl->subsys->lock);
+
+	pci_p2pdma_client_list_free(&ctrl->p2p_clients);
+	pci_dev_put(ctrl->p2p_dev);
+	ctrl->p2p_dev = NULL;
+
+	mutex_unlock(&ctrl->subsys->lock);
+}
+
 u16 nvmet_alloc_ctrl(const char *subsysnqn, const char *hostnqn,
 		struct nvmet_req *req, u32 kato, struct nvmet_ctrl **ctrlp)
 {
@@ -980,6 +1108,7 @@ u16 nvmet_alloc_ctrl(const char *subsysnqn, const char *hostnqn,
 
 	INIT_WORK(&ctrl->async_event_work, nvmet_async_event_work);
 	INIT_LIST_HEAD(&ctrl->async_events);
+	INIT_LIST_HEAD(&ctrl->p2p_clients);
 
 	memcpy(ctrl->subsysnqn, subsysnqn, NVMF_NQN_SIZE);
 	memcpy(ctrl->hostnqn, hostnqn, NVMF_NQN_SIZE);
@@ -1041,6 +1170,7 @@ u16 nvmet_alloc_ctrl(const char *subsysnqn, const char *hostnqn,
 		ctrl->kato = DIV_ROUND_UP(kato, 1000);
 	}
 	nvmet_start_keep_alive_timer(ctrl);
+	nvmet_setup_p2pmem(ctrl, req);
 
 	mutex_lock(&subsys->lock);
 	list_add_tail(&ctrl->subsys_entry, &subsys->ctrls);
@@ -1079,6 +1209,7 @@ static void nvmet_ctrl_free(struct kref *ref)
 	flush_work(&ctrl->async_event_work);
 	cancel_work_sync(&ctrl->fatal_err_work);
 
+	nvmet_release_p2pmem(ctrl);
 	ida_simple_remove(&cntlid_ida, ctrl->cntlid);
 
 	kfree(ctrl->sqs);
diff --git a/drivers/nvme/target/nvmet.h b/drivers/nvme/target/nvmet.h
index 7d6cb61021e4..297861064dd8 100644
--- a/drivers/nvme/target/nvmet.h
+++ b/drivers/nvme/target/nvmet.h
@@ -84,6 +84,11 @@ static inline struct nvmet_ns *to_nvmet_ns(struct config_item *item)
 	return container_of(to_config_group(item), struct nvmet_ns, group);
 }
 
+static inline struct device *nvmet_ns_dev(struct nvmet_ns *ns)
+{
+	return disk_to_dev(ns->bdev->bd_disk);
+}
+
 struct nvmet_cq {
 	u16			qid;
 	u16			size;
@@ -134,6 +139,8 @@ struct nvmet_port {
 	void				*priv;
 	bool				enabled;
 	int				inline_data_size;
+	bool				use_p2pmem;
+	struct pci_dev			*p2p_dev;
 };
 
 static inline struct nvmet_port *to_nvmet_port(struct config_item *item)
@@ -182,6 +189,9 @@ struct nvmet_ctrl {
 	__le32			*changed_ns_list;
 	u32			nr_changed_ns;
 
+	struct pci_dev		*p2p_dev;
+	struct list_head	p2p_clients;
+
 	char			subsysnqn[NVMF_NQN_FIELD_LEN];
 	char			hostnqn[NVMF_NQN_FIELD_LEN];
 };
@@ -294,6 +304,9 @@ struct nvmet_req {
 
 	void (*execute)(struct nvmet_req *req);
 	const struct nvmet_fabrics_ops *ops;
+
+	struct pci_dev *p2p_dev;
+	struct device *p2p_client;
 };
 
 extern struct workqueue_struct *buffered_io_wq;
diff --git a/drivers/nvme/target/rdma.c b/drivers/nvme/target/rdma.c
index e148dee72ba5..5c9cb752e2ed 100644
--- a/drivers/nvme/target/rdma.c
+++ b/drivers/nvme/target/rdma.c
@@ -735,6 +735,8 @@ static void nvmet_rdma_handle_command(struct nvmet_rdma_queue *queue,
 		cmd->send_sge.addr, cmd->send_sge.length,
 		DMA_TO_DEVICE);
 
+	cmd->req.p2p_client = &queue->dev->device->dev;
+
 	if (!nvmet_req_init(&cmd->req, &queue->nvme_cq,
 			&queue->nvme_sq, &nvmet_rdma_ops))
 		return;
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 265+ messages in thread
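The nvmet_req_alloc_sgl()/nvmet_req_free_sgl() hunks above follow a try-then-fall-back allocation pattern: attempt the P2P memory pool first, and if that fails, fall back to the ordinary allocator, remembering which path was taken so the free side can match it. A small userspace sketch of that pattern (the pool, names, and sizes are invented for illustration, not taken from the patch):

```c
#include <assert.h>
#include <stdlib.h>

/* Toy one-slot "P2P" pool standing in for pci_p2pmem_alloc_sgl(). */
static char pool[64];
static int pool_used;

static void *pool_alloc(size_t len)
{
	if (pool_used || len > sizeof(pool))
		return NULL;          /* pool exhausted or request too big */
	pool_used = 1;
	return pool;
}

struct buf {
	void *mem;
	int from_pool;                /* mirrors req->p2p_dev being set */
};

static int buf_alloc(struct buf *b, size_t len, int want_pool)
{
	b->from_pool = 0;
	if (want_pool) {
		b->mem = pool_alloc(len);
		if (b->mem) {
			b->from_pool = 1;
			return 0;
		}
	}
	b->mem = malloc(len);         /* fallback, like sgl_alloc() */
	return b->mem ? 0 : -1;
}

static void buf_free(struct buf *b)
{
	if (b->from_pool)
		pool_used = 0;        /* like pci_p2pmem_free_sgl() */
	else
		free(b->mem);         /* like sgl_free() */
	b->mem = NULL;
}
```

The key point, as in the patch, is that the allocation records which allocator succeeded so the matching free path can be chosen later.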

* Re: [PATCH v5 07/13] block: Add PCI P2P flag for request queue and check support for requests
  2018-08-30 18:53   ` Logan Gunthorpe
  (?)
  (?)
@ 2018-08-30 19:11     ` Jens Axboe
  -1 siblings, 0 replies; 265+ messages in thread
From: Jens Axboe @ 2018-08-30 19:11 UTC (permalink / raw)
  To: Logan Gunthorpe, linux-kernel, linux-pci, linux-nvme, linux-rdma,
	linux-nvdimm, linux-block
  Cc: Christian König, Benjamin Herrenschmidt, Alex Williamson,
	Jérôme Glisse, Jason Gunthorpe, Bjorn Helgaas,
	Max Gurtovoy, Christoph Hellwig

On 8/30/18 12:53 PM, Logan Gunthorpe wrote:
> QUEUE_FLAG_PCI_P2P is introduced meaning a driver's request queue
> supports targeting P2P memory.
> 
> When a request is submitted we check if PCI P2PDMA memory is assigned
> to the first page in the bio. If it is, we ensure the queue it's
> submitted to supports it, and enforce REQ_NOMERGE.

I think this belongs in the caller - both the validity check, and
passing in NOMERGE for this type of request. I don't want to impose
this overhead on everything, for a pretty niche case.

-- 
Jens Axboe

_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

^ permalink raw reply	[flat|nested] 265+ messages in thread
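The caller-side approach Jens suggests, where the submitter sets the no-merge flag itself when it knows the buffer is P2P memory, can be sketched roughly as follows (the flag value below is illustrative only, not the kernel's actual REQ_NOMERGE bit):

```c
#include <assert.h>

/* Rough sketch of the caller-side policy: the code that builds the
 * request sets the no-merge flag itself when it knows the pages are
 * P2P memory, so the generic submission path pays no extra cost.
 * DEMO_REQ_NOMERGE is a made-up value, not the kernel's REQ_NOMERGE. */
#define DEMO_REQ_NOMERGE (1u << 3)

static unsigned int build_op_flags(int pages_are_p2p)
{
	unsigned int op_flags = 0;

	if (pages_are_p2p)
		op_flags |= DEMO_REQ_NOMERGE; /* P2P sg lists must not be
						 merged with ordinary memory */
	return op_flags;
}
```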

* Re: [PATCH v5 07/13] block: Add PCI P2P flag for request queue and check support for requests
  2018-08-30 19:11     ` Jens Axboe
                         ` (2 preceding siblings ...)
  (?)
@ 2018-08-30 19:17       ` Logan Gunthorpe
  -1 siblings, 0 replies; 265+ messages in thread
From: Logan Gunthorpe @ 2018-08-30 19:17 UTC (permalink / raw)
  To: Jens Axboe, linux-kernel, linux-pci, linux-nvme, linux-rdma,
	linux-nvdimm, linux-block
  Cc: Christian König, Benjamin Herrenschmidt, Alex Williamson,
	Jérôme Glisse, Jason Gunthorpe, Bjorn Helgaas,
	Max Gurtovoy, Christoph Hellwig



On 30/08/18 01:11 PM, Jens Axboe wrote:
> On 8/30/18 12:53 PM, Logan Gunthorpe wrote:
>> QUEUE_FLAG_PCI_P2P is introduced meaning a driver's request queue
>> supports targeting P2P memory.
>>
>> When a request is submitted we check if PCI P2PDMA memory is assigned
>> to the first page in the bio. If it is, we ensure the queue it's
>> submitted to supports it, and enforce REQ_NOMERGE.
> 
> I think this belongs in the caller - both the validity check, and
> passing in NOMERGE for this type of request. I don't want to impose
> this overhead on everything, for a pretty niche case.

Well, the point was to prevent driver writers from doing the wrong
thing. The WARN_ON would be a bit pointless in the driver if we rely on
the driver to either do the right thing or add the WARN_ON themselves.

If I'm going to change anything I'd drop the warning entirely and move
the NO_MERGE back into the caller...

Note: the check will be compiled out if the kernel does not support PCI P2P.

Logan
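
The compile-out behaviour mentioned above comes from the kernel's IS_ENABLED() macro: when a config option is disabled the whole condition becomes a constant zero and the compiler drops the branch. A minimal userspace sketch of the idea (the real macro uses extra preprocessor indirection so it also works with undefined options; CONFIG_DEMO_P2P is a hypothetical name, not a real config symbol):

```c
#include <assert.h>

/* Simplified stand-in for the kernel's IS_ENABLED(): here the config
 * macro must expand to 0 or 1. With 0, the condition below becomes
 * `if (0 && ...)`, which the compiler eliminates as dead code. */
#define CONFIG_DEMO_P2P 0
#define IS_ENABLED(option) (option)

static int page_is_p2p(void)
{
	return 1; /* pretend the first page of the bio is P2P memory */
}

/* Returns nonzero when a P2P request hits a queue without support;
 * with CONFIG_DEMO_P2P disabled the check compiles away entirely. */
static int check_p2p_support(int queue_supports_p2p)
{
	if (IS_ENABLED(CONFIG_DEMO_P2P) &&
	    page_is_p2p() && !queue_supports_p2p)
		return -1;
	return 0;
}
```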



^ permalink raw reply	[flat|nested] 265+ messages in thread

* Re: [PATCH v5 07/13] block: Add PCI P2P flag for request queue and check support for requests
  2018-08-30 19:17       ` Logan Gunthorpe
                           ` (2 preceding siblings ...)
  (?)
@ 2018-08-30 19:19         ` Jens Axboe
  -1 siblings, 0 replies; 265+ messages in thread
From: Jens Axboe @ 2018-08-30 19:19 UTC (permalink / raw)
  To: Logan Gunthorpe, linux-kernel, linux-pci, linux-nvme, linux-rdma,
	linux-nvdimm, linux-block
  Cc: Christian König, Benjamin Herrenschmidt, Alex Williamson,
	Jérôme Glisse, Jason Gunthorpe, Bjorn Helgaas,
	Max Gurtovoy, Christoph Hellwig

On 8/30/18 1:17 PM, Logan Gunthorpe wrote:
> 
> 
> On 30/08/18 01:11 PM, Jens Axboe wrote:
>> On 8/30/18 12:53 PM, Logan Gunthorpe wrote:
>>> QUEUE_FLAG_PCI_P2P is introduced meaning a driver's request queue
>>> supports targeting P2P memory.
>>>
>>> When a request is submitted we check if PCI P2PDMA memory is assigned
>>> to the first page in the bio. If it is, we ensure the queue it's
>>> submitted to supports it, and enforce REQ_NOMERGE.
>>
>> I think this belongs in the caller - both the validity check, and
>> passing in NOMERGE for this type of request. I don't want to impose
>> this overhead on everything, for a pretty niche case.
> 
> Well, the point was to prevent driver writers from doing the wrong
> thing. The WARN_ON would be a bit pointless in the driver if we rely on
> the driver to either do the right thing or add the WARN_ON themselves.
> 
> If I'm going to change anything I'd drop the warning entirely and move
> the NO_MERGE back into the caller...

Of course, if you move it into the caller, the warning makes no sense.

> Note: the check will be compiled out if the kernel does not support PCI P2P.

Sure, but then distros tend to enable everything...

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 265+ messages in thread

* Re: [PATCH v5 00/13] Copy Offload in NVMe Fabrics with P2P PCI Memory
  2018-08-30 18:53 ` Logan Gunthorpe
                     ` (3 preceding siblings ...)
  (?)
@ 2018-08-30 19:20   ` Jerome Glisse
  -1 siblings, 0 replies; 265+ messages in thread
From: Jerome Glisse @ 2018-08-30 19:20 UTC (permalink / raw)
  To: Logan Gunthorpe
  Cc: linux-nvdimm, linux-rdma, linux-pci, linux-kernel, linux-nvme,
	linux-block, Alex Williamson, Jason Gunthorpe,
	Christian König, Benjamin Herrenschmidt, Bjorn Helgaas,
	Max Gurtovoy, Christoph Hellwig

On Thu, Aug 30, 2018 at 12:53:39PM -0600, Logan Gunthorpe wrote:

[...]

> 
> When the PCI P2PDMA config option is selected the ACS bits in every
> bridge port in the system are turned off to allow traffic to
> pass freely behind the root port. At this time, the bit must be disabled
> at boot so the IOMMU subsystem can correctly create the groups, though
> this could be addressed in the future. There is no way to dynamically
> disable the bit and alter the groups.

Can you provide an example of how to test this? For instance a kernel
command line option; the doc patch does not have any such example. It
would be nice to add one.

Maybe I have missed it in some of the patches. I just skimmed over
for now.

Cheers,
Jérôme

^ permalink raw reply	[flat|nested] 265+ messages in thread

* Re: [PATCH v5 00/13] Copy Offload in NVMe Fabrics with P2P PCI Memory
@ 2018-08-30 19:20   ` Jerome Glisse
  0 siblings, 0 replies; 265+ messages in thread
From: Jerome Glisse @ 2018-08-30 19:20 UTC (permalink / raw)
  To: Logan Gunthorpe
  Cc: linux-kernel, linux-pci, linux-nvme, linux-rdma, linux-nvdimm,
	linux-block, Stephen Bates, Christoph Hellwig, Keith Busch,
	Sagi Grimberg, Bjorn Helgaas, Jason Gunthorpe, Max Gurtovoy,
	Dan Williams, Benjamin Herrenschmidt, Alex Williamson,
	Christian König

On Thu, Aug 30, 2018 at 12:53:39PM -0600, Logan Gunthorpe wrote:

[...]

> 
> When the PCI P2PDMA config option is selected the ACS bits in every
> bridge port in the system are turned off to allow traffic to
> pass freely behind the root port. At this time, the bit must be disabled
> at boot so the IOMMU subsystem can correctly create the groups, though
> this could be addressed in the future. There is no way to dynamically
> disable the bit and alter the groups.

Can you provide an example on how to test this ? Like kernel command
line option, the doc patch does not have any such example. It would be
nice to add.

Maybe i have miss it in some of the patch. I just skimmed over for
now.

Cheers,
J�r�me

^ permalink raw reply	[flat|nested] 265+ messages in thread

* Re: [PATCH v5 00/13] Copy Offload in NVMe Fabrics with P2P PCI Memory
@ 2018-08-30 19:20   ` Jerome Glisse
  0 siblings, 0 replies; 265+ messages in thread
From: Jerome Glisse @ 2018-08-30 19:20 UTC (permalink / raw)
  To: Logan Gunthorpe
  Cc: linux-nvdimm-hn68Rpc1hR1g9hUCZPvPmw,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	linux-pci-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-nvme-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-block-u79uwXL29TY76Z2rM5mHXA, Alex Williamson,
	Jason Gunthorpe, Christian König, Benjamin Herrenschmidt,
	Bjorn Helgaas, Max Gurtovoy, Christoph Hellwig

On Thu, Aug 30, 2018 at 12:53:39PM -0600, Logan Gunthorpe wrote:

[...]

> 
> When the PCI P2PDMA config option is selected the ACS bits in every
> bridge port in the system are turned off to allow traffic to
> pass freely behind the root port. At this time, the bit must be disabled
> at boot so the IOMMU subsystem can correctly create the groups, though
> this could be addressed in the future. There is no way to dynamically
> disable the bit and alter the groups.

Can you provide an example on how to test this ? Like kernel command
line option, the doc patch does not have any such example. It would be
nice to add.

Maybe i have miss it in some of the patch. I just skimmed over for
now.

Cheers,
Jérôme

^ permalink raw reply	[flat|nested] 265+ messages in thread

* Re: [PATCH v5 00/13] Copy Offload in NVMe Fabrics with P2P PCI Memory
  2018-08-30 19:20   ` Jerome Glisse
@ 2018-08-30 19:30     ` Logan Gunthorpe
  -1 siblings, 0 replies; 265+ messages in thread
From: Logan Gunthorpe @ 2018-08-30 19:30 UTC (permalink / raw)
  To: Jerome Glisse
  Cc: linux-nvdimm, linux-rdma, linux-pci, linux-kernel, linux-nvme,
	linux-block, Alex Williamson, Jason Gunthorpe,
	Christian König, Benjamin Herrenschmidt, Bjorn Helgaas,
	Max Gurtovoy, Christoph Hellwig



On 30/08/18 01:20 PM, Jerome Glisse wrote:
> On Thu, Aug 30, 2018 at 12:53:39PM -0600, Logan Gunthorpe wrote:
> 
> [...]
> 
>>
>> When the PCI P2PDMA config option is selected the ACS bits in every
>> bridge port in the system are turned off to allow traffic to
>> pass freely behind the root port. At this time, the bit must be disabled
>> at boot so the IOMMU subsystem can correctly create the groups, though
>> this could be addressed in the future. There is no way to dynamically
>> disable the bit and alter the groups.

Oh, sorry this paragraph in the cover letter is wrong now. We now rely
on the disable_acs_redir command line option introduced in

aaca43fda742 ("PCI: Add "pci=disable_acs_redir=" parameter for
peer-to-peer support")


> Can you provide an example on how to test this ? Like kernel command
> line option, the doc patch does not have any such example. It would be
> nice to add.

Do you mean to test the patchset or the ACS bits you quoted?

Testing the patchset is a matter of having the right hardware (i.e. an
RDMA NIC and a CMB-enabled NVMe device behind a PCIe switch, with the
ACS bits set correctly by the above command line option) and setting the
p2pmem configfs attribute in an nvme-of port to 'yes'.
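For concreteness, the steps described here might look something like the
following sketch. The device addresses, the port number, and the exact
configfs path are hypothetical placeholders, not taken from the patches:

```shell
# All BDFs, port numbers, and paths below are hypothetical placeholders.

# 1) Boot with ACS P2P redirect disabled on the downstream ports between
#    the RDMA NIC and the CMB-enabled NVMe device, e.g. on the kernel
#    command line:
#       pci=disable_acs_redir=0000:02:00.0;0000:03:00.0

# 2) With an nvme-of port already configured under configfs, enable P2P
#    memory on it ("p2pmem" is the attribute name referenced in this
#    thread; the exact path is an assumption):
echo yes > /sys/kernel/config/nvmet/ports/1/p2pmem
```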


Logan
_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

^ permalink raw reply	[flat|nested] 265+ messages in thread

* Re: [PATCH v5 12/13] nvmet: Introduce helper functions to allocate and free request SGLs
  2018-08-30 18:53   ` Logan Gunthorpe
@ 2018-08-31  0:14     ` Sagi Grimberg
  -1 siblings, 0 replies; 265+ messages in thread
From: Sagi Grimberg @ 2018-08-31  0:14 UTC (permalink / raw)
  To: Logan Gunthorpe, linux-kernel, linux-pci, linux-nvme, linux-rdma,
	linux-nvdimm, linux-block
  Cc: Christian König, Benjamin Herrenschmidt, Alex Williamson,
	Jérôme Glisse, Jason Gunthorpe, Bjorn Helgaas,
	Max Gurtovoy, Christoph Hellwig

Acked-by: Sagi Grimberg <sagi@grimberg.me>

^ permalink raw reply	[flat|nested] 265+ messages in thread

* Re: [PATCH v5 08/13] IB/core: Ensure we map P2P memory correctly in rdma_rw_ctx_[init|destroy]()
  2018-08-30 18:53   ` Logan Gunthorpe
@ 2018-08-31  0:18     ` Sagi Grimberg
  -1 siblings, 0 replies; 265+ messages in thread
From: Sagi Grimberg @ 2018-08-31  0:18 UTC (permalink / raw)
  To: Logan Gunthorpe, linux-kernel, linux-pci, linux-nvme, linux-rdma,
	linux-nvdimm, linux-block
  Cc: Christian König, Benjamin Herrenschmidt, Alex Williamson,
	Jérôme Glisse, Jason Gunthorpe, Bjorn Helgaas,
	Max Gurtovoy, Christoph Hellwig

Reviewed-by: Sagi Grimberg <sagi@grimberg.me>

^ permalink raw reply	[flat|nested] 265+ messages in thread

* Re: [PATCH v5 13/13] nvmet: Optionally use PCI P2P memory
  2018-08-30 18:53   ` Logan Gunthorpe
@ 2018-08-31  0:25     ` Sagi Grimberg
  -1 siblings, 0 replies; 265+ messages in thread
From: Sagi Grimberg @ 2018-08-31  0:25 UTC (permalink / raw)
  To: Logan Gunthorpe, linux-kernel, linux-pci, linux-nvme, linux-rdma,
	linux-nvdimm, linux-block
  Cc: Christian König, Benjamin Herrenschmidt, Steve Wise,
	Alex Williamson, Jérôme Glisse, Jason Gunthorpe,
	Bjorn Helgaas, Max Gurtovoy, Christoph Hellwig


> +/*
> + * If allow_p2pmem is set, we will try to use P2P memory for the SGL lists for
> + * I/O commands. This requires the PCI p2p device to be compatible with the
> + * backing device for every namespace on this controller.
> + */
> +static void nvmet_setup_p2pmem(struct nvmet_ctrl *ctrl, struct nvmet_req *req)
> +{
> +	struct nvmet_ns *ns;
> +	int ret;
> +
> +	if (!req->port->use_p2pmem || !req->p2p_client)
> +		return;
> +
> +	mutex_lock(&ctrl->subsys->lock);
> +
> +	ret = pci_p2pdma_add_client(&ctrl->p2p_clients, req->p2p_client);
> +	if (ret) {
> +		pr_err("failed adding peer-to-peer DMA client %s: %d\n",
> +		       dev_name(req->p2p_client), ret);
> +		goto free_devices;
> +	}
> +
> +	list_for_each_entry_rcu(ns, &ctrl->subsys->namespaces, dev_link) {
> +		ret = nvmet_p2pdma_add_client(ctrl, ns);
> +		if (ret)
> +			goto free_devices;
> +	}
> +
> +	if (req->port->p2p_dev) {
> +		if (!pci_p2pdma_assign_provider(req->port->p2p_dev,
> +						&ctrl->p2p_clients)) {
> +			pr_info("peer-to-peer memory on %s is not supported\n",
> +				pci_name(req->port->p2p_dev));
> +			goto free_devices;
> +		}
> +		ctrl->p2p_dev = pci_dev_get(req->port->p2p_dev);
> +	} else {

When is port->p2p_dev == NULL? A little more documentation would help
here...

^ permalink raw reply	[flat|nested] 265+ messages in thread

* Re: [PATCH v5 06/13] PCI/P2PDMA: Add P2P DMA driver writer's documentation
  2018-08-30 18:53   ` Logan Gunthorpe
@ 2018-08-31  0:34     ` Randy Dunlap
  -1 siblings, 0 replies; 265+ messages in thread
From: Randy Dunlap @ 2018-08-31  0:34 UTC (permalink / raw)
  To: Logan Gunthorpe, linux-kernel, linux-pci, linux-nvme, linux-rdma,
	linux-nvdimm, linux-block
  Cc: Christian König, Benjamin Herrenschmidt, Jonathan Corbet,
	Alex Williamson, Jérôme Glisse, Jason Gunthorpe,
	Bjorn Helgaas, Max Gurtovoy, Christoph Hellwig

Hi,

I have a few comments below...

On 08/30/2018 11:53 AM, Logan Gunthorpe wrote:
> 
> Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
> Cc: Jonathan Corbet <corbet@lwn.net>
> ---
>  Documentation/driver-api/pci/index.rst  |   1 +
>  Documentation/driver-api/pci/p2pdma.rst | 170 ++++++++++++++++++++++++++++++++
>  2 files changed, 171 insertions(+)
>  create mode 100644 Documentation/driver-api/pci/p2pdma.rst

> diff --git a/Documentation/driver-api/pci/p2pdma.rst b/Documentation/driver-api/pci/p2pdma.rst
> new file mode 100644
> index 000000000000..ac857450d53f
> --- /dev/null
> +++ b/Documentation/driver-api/pci/p2pdma.rst
> @@ -0,0 +1,170 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +
> +============================
> +PCI Peer-to-Peer DMA Support
> +============================
> +
> +The PCI bus has pretty decent support for performing DMA transfers
> +between two devices on the bus. This type of transaction is henceforth
> +called Peer-to-Peer (or P2P). However, there are a number of issues that
* Re: [PATCH v5 06/13] PCI/P2PDMA: Add P2P DMA driver writer's documentation
@ 2018-08-31  0:34     ` Randy Dunlap
  0 siblings, 0 replies; 265+ messages in thread
From: Randy Dunlap @ 2018-08-31  0:34 UTC (permalink / raw)
  To: Logan Gunthorpe, linux-kernel, linux-pci, linux-nvme, linux-rdma,
	linux-nvdimm, linux-block
  Cc: Stephen Bates, Christoph Hellwig, Keith Busch, Sagi Grimberg,
	Bjorn Helgaas, Jason Gunthorpe, Max Gurtovoy, Dan Williams,
	Jérôme Glisse, Benjamin Herrenschmidt, Alex Williamson,
	Christian König, Jonathan Corbet

Hi,

I have a few comments below...

On 08/30/2018 11:53 AM, Logan Gunthorpe wrote:
> 
> Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
> Cc: Jonathan Corbet <corbet@lwn.net>
> ---
>  Documentation/driver-api/pci/index.rst  |   1 +
>  Documentation/driver-api/pci/p2pdma.rst | 170 ++++++++++++++++++++++++++++++++
>  2 files changed, 171 insertions(+)
>  create mode 100644 Documentation/driver-api/pci/p2pdma.rst

> diff --git a/Documentation/driver-api/pci/p2pdma.rst b/Documentation/driver-api/pci/p2pdma.rst
> new file mode 100644
> index 000000000000..ac857450d53f
> --- /dev/null
> +++ b/Documentation/driver-api/pci/p2pdma.rst
> @@ -0,0 +1,170 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +
> +============================
> +PCI Peer-to-Peer DMA Support
> +============================
> +
> +The PCI bus has pretty decent support for performing DMA transfers
> +between two devices on the bus. This type of transaction is henceforth
> +called Peer-to-Peer (or P2P). However, there are a number of issues that
> +make P2P transactions tricky to do in a perfectly safe way.
> +
> +One of the biggest issues is that PCI doesn't require forwarding
> +transactions between hierarchy domains, and in PCIe, each Root Port
> +defines a separate hierarchy domain. To make things worse, there is no
> +simple way to determine if a given Root Complex supports this or not.
> +(See PCIe r4.0, sec 1.3.1). Therefore, as of this writing, the kernel
> +only supports doing P2P when the endpoints involved are all behind the
> +same PCI bridge, as such devices are all in the same PCI hierarchy
> +domain, and the spec guarantees that all transacations within the

                                            transactions

> +hierarchy will be routable, but it does not require routing
> +between hierarchies.
> +
> +The second issue is that to make use of existing interfaces in Linux,
> +memory that is used for P2P transactions needs to be backed by struct
> +pages. However, PCI BARs are not typically cache coherent, so there are
> +a few corner-case gotchas with these pages and developers need to
> +be careful about what they do with them.
> +
> +
> +Driver Writer's Guide
> +=====================
> +
> +In a given P2P implementation there may be three or more different
> +types of kernel drivers in play:
> +
> +* Provider - A driver which provides or publishes P2P resources like
> +  memory or doorbell registers to other drivers.
> +* Client - A driver which makes use of a resource by setting up a
> +  DMA transaction to or from it.
> +* Orchestrator - A driver which orchestrates the flow of data between
> +  clients and providers

Might as well end that last one with a period since the other 2 are.

> +
> +In many cases there could be overlap between these three types (i.e.,
> +it may be typical for a driver to be both a provider and a client).
> +

[snip]

> +
> +Orchestrator Drivers
> +--------------------
> +
> +The first task an orchestrator driver must do is compile a list of
> +all client devices that will be involved in a given transaction. For
> +example, the NVMe Target driver creates a list including all NVMe
> +devices and the RNIC in use. The list is stored as an anonymous struct
> +list_head which must be initialized with the usual INIT_LIST_HEAD.
> +The following functions may then be used to add to, remove from and
> +free the list of clients: :c:func:`pci_p2pdma_add_client()`,
> +:c:func:`pci_p2pdma_remove_client()` and
> +:c:func:`pci_p2pdma_client_list_free()`.
> +
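[Editor's note: the add/remove/free lifecycle described above can be sketched in user space. This is an illustrative miniature, not the kernel code — an integer id stands in for the struct pci_dev pointer, and a plain singly-linked list stands in for the kernel's list_head.]

```c
#include <stdlib.h>

/* Miniature client-list entry; id stands in for struct pci_dev *. */
struct client {
	struct client *next;
	int id;
};

/* Mirrors pci_p2pdma_add_client(): append a new entry to the list. */
static int client_add(struct client **head, int id)
{
	struct client **p = head;
	struct client *c = malloc(sizeof(*c));

	if (!c)
		return -1;
	c->next = NULL;
	c->id = id;
	while (*p)			/* walk to the tail */
		p = &(*p)->next;
	*p = c;
	return 0;
}

/* Mirrors pci_p2pdma_remove_client(): drop every entry matching @id. */
static void client_remove(struct client **head, int id)
{
	struct client **p = head;

	while (*p) {
		if ((*p)->id == id) {
			struct client *dead = *p;
			*p = dead->next;
			free(dead);
		} else {
			p = &(*p)->next;
		}
	}
}

static int client_count(struct client *head)
{
	int n = 0;

	for (; head; head = head->next)
		n++;
	return n;
}
```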
> +With the client list in hand, the orchestrator may then call
> +:c:func:`pci_p2pmem_find()` to obtain a published P2P memory provider
> +that is supported by (i.e., behind the same root port as) all the
> +clients. If more than one provider is supported, the one nearest to
> +all the clients will be chosen first. If more than one provider is an
> +equal distance away, the one returned will be chosen at random. This
> +function returns the PCI

random or just arbitrarily?

> +device to use for the provider with a reference taken; when it is no
> +longer needed, the reference should be dropped with pci_dev_put().
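[Editor's note: the selection rule described above — nearest provider wins, with equals broken at random — reduces to a minimum search. A hedged sketch, breaking ties by first-seen rather than randomly:]

```c
/*
 * Illustration of the provider-selection rule: among candidate
 * providers, pick the one with the smallest distance to the clients.
 * Negative distances mean the provider was rejected (no common
 * upstream bridge, or ACS redirect in the path). Ties go to the
 * first candidate seen here; the real code randomizes among equals.
 */
static int pick_provider(const int *dist, int n)
{
	int best = -1;
	int i;

	for (i = 0; i < n; i++) {
		if (dist[i] < 0)	/* unreachable/rejected */
			continue;
		if (best < 0 || dist[i] < dist[best])
			best = i;
	}
	return best;			/* index of winner, or -1 if none */
}
```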


thanks,
-- 
~Randy

^ permalink raw reply	[flat|nested] 265+ messages in thread

* Re: [PATCH v5 01/13] PCI/P2PDMA: Support peer-to-peer memory
  2018-08-30 18:53   ` Logan Gunthorpe
  (?)
  (?)
@ 2018-08-31  8:04     ` Christian König
  -1 siblings, 0 replies; 265+ messages in thread
From: Christian König @ 2018-08-31  8:04 UTC (permalink / raw)
  To: Logan Gunthorpe, linux-kernel, linux-pci, linux-nvme, linux-rdma,
	linux-nvdimm, linux-block
  Cc: Benjamin Herrenschmidt, Alex Williamson, Jérôme Glisse,
	Jason Gunthorpe, Bjorn Helgaas, Max Gurtovoy, Christoph Hellwig

Am 30.08.2018 um 20:53 schrieb Logan Gunthorpe:
> Some PCI devices may have memory mapped in a BAR space that's
> intended for use in peer-to-peer transactions. In order to enable
> such transactions the memory must be registered with ZONE_DEVICE pages
> so it can be used by DMA interfaces in existing drivers.

We want to use that feature without ZONE_DEVICE pages for DMA-buf as well.

How hard would it be to separate enabling P2P detection (e.g. distance 
between two devices) from this?

Regards,
Christian.

>
> Add an interface for other subsystems to find and allocate chunks of P2P
> memory as necessary to facilitate transfers between two PCI peers:
>
> int pci_p2pdma_add_client();
> struct pci_dev *pci_p2pmem_find();
> void *pci_alloc_p2pmem();
>
> The new interface requires a driver to collect a list of client devices
> involved in the transaction with the pci_p2pmem_add_client*() functions
> then call pci_p2pmem_find() to obtain any suitable P2P memory. Once
> this is done the list is bound to the memory and the calling driver is
> free to add and remove clients as necessary (adding incompatible clients
> will fail). With a suitable p2pmem device, memory can then be
> allocated with pci_alloc_p2pmem() for use in DMA transactions.
>
> Depending on hardware, using peer-to-peer memory may reduce the bandwidth
> of the transfer but can significantly reduce pressure on system memory.
> This may be desirable in many cases: for example a system could be designed
> with a small CPU connected to a PCIe switch by a small number of lanes
> which would maximize the number of lanes available to connect to NVMe
> devices.
>
> The code is designed to only utilize the p2pmem device if all the devices
> involved in a transfer are behind the same PCI bridge. This is because we
> have no way of knowing whether peer-to-peer routing between PCIe Root Ports
> is supported (PCIe r4.0, sec 1.3.1). Additionally, the benefits of P2P
> transfers that go through the RC are limited to only reducing DRAM usage
> and, in some cases, coding convenience. The PCI-SIG may be exploring
> adding a new capability bit to advertise whether this is possible for
> future hardware.
>
> This commit includes significant rework and feedback from Christoph
> Hellwig.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
> ---
>   drivers/pci/Kconfig        |  17 +
>   drivers/pci/Makefile       |   1 +
>   drivers/pci/p2pdma.c       | 761 +++++++++++++++++++++++++++++++++++++++++++++
>   include/linux/memremap.h   |   5 +
>   include/linux/mm.h         |  18 ++
>   include/linux/pci-p2pdma.h | 102 ++++++
>   include/linux/pci.h        |   4 +
>   7 files changed, 908 insertions(+)
>   create mode 100644 drivers/pci/p2pdma.c
>   create mode 100644 include/linux/pci-p2pdma.h
>
> diff --git a/drivers/pci/Kconfig b/drivers/pci/Kconfig
> index 56ff8f6d31fc..deb68be4fdac 100644
> --- a/drivers/pci/Kconfig
> +++ b/drivers/pci/Kconfig
> @@ -132,6 +132,23 @@ config PCI_PASID
>   
>   	  If unsure, say N.
>   
> +config PCI_P2PDMA
> +	bool "PCI peer-to-peer transfer support"
> +	depends on PCI && ZONE_DEVICE
> +	select GENERIC_ALLOCATOR
> +	help
> +	  Enables drivers to do PCI peer-to-peer transactions to and from
> +	  BARs that are exposed in other devices that are part of
> +	  the hierarchy where peer-to-peer DMA is guaranteed by the PCI
> +	  specification to work (i.e., anything below a single PCI bridge).
> +
> +	  Many PCIe root complexes do not support P2P transactions and
> +	  it's hard to tell which do support them at all, so at this time,
> +	  P2P DMA transactions must be between devices behind the same root
> +	  port.
> +
> +	  If unsure, say N.
> +
>   config PCI_LABEL
>   	def_bool y if (DMI || ACPI)
>   	depends on PCI
> diff --git a/drivers/pci/Makefile b/drivers/pci/Makefile
> index 1b2cfe51e8d7..85f4a703b2be 100644
> --- a/drivers/pci/Makefile
> +++ b/drivers/pci/Makefile
> @@ -26,6 +26,7 @@ obj-$(CONFIG_PCI_SYSCALL)	+= syscall.o
>   obj-$(CONFIG_PCI_STUB)		+= pci-stub.o
>   obj-$(CONFIG_PCI_PF_STUB)	+= pci-pf-stub.o
>   obj-$(CONFIG_PCI_ECAM)		+= ecam.o
> +obj-$(CONFIG_PCI_P2PDMA)	+= p2pdma.o
>   obj-$(CONFIG_XEN_PCIDEV_FRONTEND) += xen-pcifront.o
>   
>   # Endpoint library must be initialized before its users
> diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c
> new file mode 100644
> index 000000000000..88aaec5351cd
> --- /dev/null
> +++ b/drivers/pci/p2pdma.c
> @@ -0,0 +1,761 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * PCI Peer 2 Peer DMA support.
> + *
> + * Copyright (c) 2016-2018, Logan Gunthorpe
> + * Copyright (c) 2016-2017, Microsemi Corporation
> + * Copyright (c) 2017, Christoph Hellwig
> + * Copyright (c) 2018, Eideticom Inc.
> + */
> +
> +#define pr_fmt(fmt) "pci-p2pdma: " fmt
> +#include <linux/pci-p2pdma.h>
> +#include <linux/module.h>
> +#include <linux/slab.h>
> +#include <linux/genalloc.h>
> +#include <linux/memremap.h>
> +#include <linux/percpu-refcount.h>
> +#include <linux/random.h>
> +#include <linux/seq_buf.h>
> +
> +struct pci_p2pdma {
> +	struct percpu_ref devmap_ref;
> +	struct completion devmap_ref_done;
> +	struct gen_pool *pool;
> +	bool p2pmem_published;
> +};
> +
> +static void pci_p2pdma_percpu_release(struct percpu_ref *ref)
> +{
> +	struct pci_p2pdma *p2p =
> +		container_of(ref, struct pci_p2pdma, devmap_ref);
> +
> +	complete_all(&p2p->devmap_ref_done);
> +}
> +
> +static void pci_p2pdma_percpu_kill(void *data)
> +{
> +	struct percpu_ref *ref = data;
> +
> +	if (percpu_ref_is_dying(ref))
> +		return;
> +
> +	percpu_ref_kill(ref);
> +}
> +
> +static void pci_p2pdma_release(void *data)
> +{
> +	struct pci_dev *pdev = data;
> +
> +	if (!pdev->p2pdma)
> +		return;
> +
> +	wait_for_completion(&pdev->p2pdma->devmap_ref_done);
> +	percpu_ref_exit(&pdev->p2pdma->devmap_ref);
> +
> +	gen_pool_destroy(pdev->p2pdma->pool);
> +	pdev->p2pdma = NULL;
> +}
> +
> +static int pci_p2pdma_setup(struct pci_dev *pdev)
> +{
> +	int error = -ENOMEM;
> +	struct pci_p2pdma *p2p;
> +
> +	p2p = devm_kzalloc(&pdev->dev, sizeof(*p2p), GFP_KERNEL);
> +	if (!p2p)
> +		return -ENOMEM;
> +
> +	p2p->pool = gen_pool_create(PAGE_SHIFT, dev_to_node(&pdev->dev));
> +	if (!p2p->pool)
> +		goto out;
> +
> +	init_completion(&p2p->devmap_ref_done);
> +	error = percpu_ref_init(&p2p->devmap_ref,
> +			pci_p2pdma_percpu_release, 0, GFP_KERNEL);
> +	if (error)
> +		goto out_pool_destroy;
> +
> +	percpu_ref_switch_to_atomic_sync(&p2p->devmap_ref);
> +
> +	error = devm_add_action_or_reset(&pdev->dev, pci_p2pdma_release, pdev);
> +	if (error)
> +		goto out_pool_destroy;
> +
> +	pdev->p2pdma = p2p;
> +
> +	return 0;
> +
> +out_pool_destroy:
> +	gen_pool_destroy(p2p->pool);
> +out:
> +	devm_kfree(&pdev->dev, p2p);
> +	return error;
> +}
> +
> +/**
> + * pci_p2pdma_add_resource - add memory for use as p2p memory
> + * @pdev: the device to add the memory to
> + * @bar: PCI BAR to add
> + * @size: size of the memory to add, may be zero to use the whole BAR
> + * @offset: offset into the PCI BAR
> + *
> + * The memory will be given ZONE_DEVICE struct pages so that it may
> + * be used with any DMA request.
> + */
> +int pci_p2pdma_add_resource(struct pci_dev *pdev, int bar, size_t size,
> +			    u64 offset)
> +{
> +	struct dev_pagemap *pgmap;
> +	void *addr;
> +	int error;
> +
> +	if (!(pci_resource_flags(pdev, bar) & IORESOURCE_MEM))
> +		return -EINVAL;
> +
> +	if (offset >= pci_resource_len(pdev, bar))
> +		return -EINVAL;
> +
> +	if (!size)
> +		size = pci_resource_len(pdev, bar) - offset;
> +
> +	if (size + offset > pci_resource_len(pdev, bar))
> +		return -EINVAL;
> +
> +	if (!pdev->p2pdma) {
> +		error = pci_p2pdma_setup(pdev);
> +		if (error)
> +			return error;
> +	}
> +
> +	pgmap = devm_kzalloc(&pdev->dev, sizeof(*pgmap), GFP_KERNEL);
> +	if (!pgmap)
> +		return -ENOMEM;
> +
> +	pgmap->res.start = pci_resource_start(pdev, bar) + offset;
> +	pgmap->res.end = pgmap->res.start + size - 1;
> +	pgmap->res.flags = pci_resource_flags(pdev, bar);
> +	pgmap->ref = &pdev->p2pdma->devmap_ref;
> +	pgmap->type = MEMORY_DEVICE_PCI_P2PDMA;
> +
> +	addr = devm_memremap_pages(&pdev->dev, pgmap);
> +	if (IS_ERR(addr)) {
> +		error = PTR_ERR(addr);
> +		goto pgmap_free;
> +	}
> +
> +	error = gen_pool_add_virt(pdev->p2pdma->pool, (unsigned long)addr,
> +			pci_bus_address(pdev, bar) + offset,
> +			resource_size(&pgmap->res), dev_to_node(&pdev->dev));
> +	if (error)
> +		goto pgmap_free;
> +
> +	error = devm_add_action_or_reset(&pdev->dev, pci_p2pdma_percpu_kill,
> +					  &pdev->p2pdma->devmap_ref);
> +	if (error)
> +		goto pgmap_free;
> +
> +	pci_info(pdev, "added peer-to-peer DMA memory %pR\n",
> +		 &pgmap->res);
> +
> +	return 0;
> +
> +pgmap_free:
> +	devres_free(pgmap);
> +	return error;
> +}
> +EXPORT_SYMBOL_GPL(pci_p2pdma_add_resource);
> +
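[Editor's note: the argument checks at the top of pci_p2pdma_add_resource() above boil down to the following predicate. This is a user-space restatement for illustration only; 0 here plays the role of -EINVAL.]

```c
#include <stdint.h>

/*
 * Resolve the (offset, size) pair against a BAR of length @bar_len,
 * as pci_p2pdma_add_resource() does: size 0 means "the rest of the
 * BAR". Returns the resolved size, or 0 when the requested region
 * does not fit inside the BAR.
 */
static uint64_t p2p_region_size(uint64_t bar_len, uint64_t offset,
				uint64_t size)
{
	if (offset >= bar_len)
		return 0;
	if (!size)
		size = bar_len - offset;
	if (size + offset > bar_len)
		return 0;
	return size;
}
```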
> +static struct pci_dev *find_parent_pci_dev(struct device *dev)
> +{
> +	struct device *parent;
> +
> +	dev = get_device(dev);
> +
> +	while (dev) {
> +		if (dev_is_pci(dev))
> +			return to_pci_dev(dev);
> +
> +		parent = get_device(dev->parent);
> +		put_device(dev);
> +		dev = parent;
> +	}
> +
> +	return NULL;
> +}
> +
> +/*
> + * Check if a PCI bridge has its ACS redirection bits set to redirect P2P
> + * TLPs upstream via ACS. Returns 1 if the packets will be redirected
> + * upstream, 0 otherwise.
> + */
> +static int pci_bridge_has_acs_redir(struct pci_dev *dev)
> +{
> +	int pos;
> +	u16 ctrl;
> +
> +	pos = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_ACS);
> +	if (!pos)
> +		return 0;
> +
> +	pci_read_config_word(dev, pos + PCI_ACS_CTRL, &ctrl);
> +
> +	if (ctrl & (PCI_ACS_RR | PCI_ACS_CR | PCI_ACS_EC))
> +		return 1;
> +
> +	return 0;
> +}
> +
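[Editor's note: the redirect test above is a plain mask check on the ACS control register. Sketched standalone, with the bit values as defined in include/uapi/linux/pci_regs.h:]

```c
#include <stdbool.h>

/* ACS control bits, as defined in include/uapi/linux/pci_regs.h. */
#define PCI_ACS_RR 0x04	/* P2P Request Redirect */
#define PCI_ACS_CR 0x08	/* P2P Completion Redirect */
#define PCI_ACS_EC 0x20	/* P2P Egress Control */

/*
 * Mirrors the body of pci_bridge_has_acs_redir(): a bridge redirects
 * P2P TLPs upstream when any of the three redirect-related bits is
 * set in its ACS control register.
 */
static bool acs_redirects(unsigned short ctrl)
{
	return ctrl & (PCI_ACS_RR | PCI_ACS_CR | PCI_ACS_EC);
}
```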
> +static void seq_buf_print_bus_devfn(struct seq_buf *buf, struct pci_dev *dev)
> +{
> +	if (!buf)
> +		return;
> +
> +	seq_buf_printf(buf, "%04x:%02x:%02x.%x;", pci_domain_nr(dev->bus),
> +		       dev->bus->number, PCI_SLOT(dev->devfn),
> +		       PCI_FUNC(dev->devfn));
> +}
> +
> +/*
> + * Find the distance through the nearest common upstream bridge between
> + * two PCI devices.
> + *
> + * If the two devices are the same device then 0 will be returned.
> + *
> + * If there are two virtual functions of the same device behind the same
> + * bridge port then 2 will be returned (one step down to the PCIe switch,
> + * then one step back to the same device).
> + *
> + * In the case where two devices are connected to the same PCIe switch, the
> + * value 4 will be returned. This corresponds to the following PCI tree:
> + *
> + *     -+  Root Port
> + *      \+ Switch Upstream Port
> + *       +-+ Switch Downstream Port
> + *       + \- Device A
> + *       \-+ Switch Downstream Port
> + *         \- Device B
> + *
> + * The distance is 4 because we traverse from Device A through the downstream
> + * port of the switch, to the common upstream port, back up to the second
> + * downstream port and then to Device B.
> + *
> + * Any two devices that don't have a common upstream bridge will return -1.
> + * In this way devices on separate PCIe root ports will be rejected, which
> + * is what we want for peer-to-peer, since each PCIe root port defines a
> + * separate hierarchy domain and there's no way to determine whether the root
> + * complex supports forwarding between them.
> + *
> + * In the case where two devices are connected to different PCIe switches,
> + * this function will still return a positive distance as long as both
> + * switches eventually have a common upstream bridge. Note this covers
> + * the case of using multiple PCIe switches to achieve a desired level of
> + * fan-out from a root port. The exact distance will be a function of the
> + * number of switches between Device A and Device B.
> + *
> + * If a bridge which has any ACS redirection bits set is in the path
> + * then this function will return -2. This is so we reject any
> + * cases where the TLPs are forwarded up into the root complex.
> + * In this case, a list of all infringing bridge addresses will be
> + * populated in acs_list (assuming it's non-null) for printk purposes.
> + */
> +static int upstream_bridge_distance(struct pci_dev *a,
> +				    struct pci_dev *b,
> +				    struct seq_buf *acs_list)
> +{
> +	int dist_a = 0;
> +	int dist_b = 0;
> +	struct pci_dev *bb = NULL;
> +	int acs_cnt = 0;
> +
> +	/*
> +	 * Note, we don't need to take references to devices returned by
> +	 * pci_upstream_bridge() seeing we hold a reference to a child
> +	 * device which will already hold a reference to the upstream bridge.
> +	 */
> +
> +	while (a) {
> +		dist_b = 0;
> +
> +		if (pci_bridge_has_acs_redir(a)) {
> +			seq_buf_print_bus_devfn(acs_list, a);
> +			acs_cnt++;
> +		}
> +
> +		bb = b;
> +
> +		while (bb) {
> +			if (a == bb)
> +				goto check_b_path_acs;
> +
> +			bb = pci_upstream_bridge(bb);
> +			dist_b++;
> +		}
> +
> +		a = pci_upstream_bridge(a);
> +		dist_a++;
> +	}
> +
> +	return -1;
> +
> +check_b_path_acs:
> +	bb = b;
> +
> +	while (bb) {
> +		if (a == bb)
> +			break;
> +
> +		if (pci_bridge_has_acs_redir(bb)) {
> +			seq_buf_print_bus_devfn(acs_list, bb);
> +			acs_cnt++;
> +		}
> +
> +		bb = pci_upstream_bridge(bb);
> +	}
> +
> +	if (acs_cnt)
> +		return -2;
> +
> +	return dist_a + dist_b;
> +}
> +
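[Editor's note: the walk implemented above can be exercised in user space with a toy tree. An illustrative restatement with the ACS bookkeeping omitted — only the upstream (parent) pointers matter for the distance itself:]

```c
#include <stddef.h>

/* Toy device node: only the upstream (parent) pointer matters here. */
struct dev {
	struct dev *parent;
};

/*
 * User-space restatement of upstream_bridge_distance(): walk up from
 * @a one level at a time and, for each ancestor, scan up from @b
 * looking for a match; the distance is the sum of hops on both sides.
 * Returns -1 when the two devices share no upstream bridge.
 */
static int distance(struct dev *a, struct dev *b)
{
	int dist_a = 0;

	for (; a; a = a->parent, dist_a++) {
		int dist_b = 0;
		struct dev *bb;

		for (bb = b; bb; bb = bb->parent, dist_b++)
			if (a == bb)
				return dist_a + dist_b;
	}
	return -1;
}
```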
> +static int upstream_bridge_distance_warn(struct pci_dev *provider,
> +					 struct pci_dev *client)
> +{
> +	struct seq_buf acs_list;
> +	int ret;
> +
> +	seq_buf_init(&acs_list, kmalloc(PAGE_SIZE, GFP_KERNEL), PAGE_SIZE);
> +
> +	ret = upstream_bridge_distance(provider, client, &acs_list);
> +	if (ret == -2) {
> +		pci_warn(client, "cannot be used for peer-to-peer DMA as ACS redirect is set between the client and provider\n");
> +		/* Drop final semicolon */
> +		acs_list.buffer[acs_list.len-1] = 0;
> +		pci_warn(client, "to disable ACS redirect for this path, add the kernel parameter: pci=disable_acs_redir=%s\n",
> +			 acs_list.buffer);
> +
> +	} else if (ret < 0) {
> +		pci_warn(client, "cannot be used for peer-to-peer DMA as the client and provider do not share an upstream bridge\n");
> +	}
> +
> +	kfree(acs_list.buffer);
> +
> +	return ret;
> +}
> +
> +struct pci_p2pdma_client {
> +	struct list_head list;
> +	struct pci_dev *client;
> +	struct pci_dev *provider;
> +};
> +
> +/**
> + * pci_p2pdma_add_client - allocate a new element in a client device list
> + * @head: list head of p2pdma clients
> + * @dev: device to add to the list
> + *
> + * This adds @dev to a list of clients used by a p2pdma device.
> + * This list should be passed to pci_p2pmem_find(). Once pci_p2pmem_find() has
> + * been called successfully, the list will be bound to a specific p2pdma
> + * device and new clients can only be added to the list if they are
> + * supported by that p2pdma device.
> + *
> + * The caller is expected to have a lock which protects @head as necessary
> + * so that none of the pci_p2p functions can be called concurrently
> + * on that list.
> + *
> + * Returns 0 if the client was successfully added.
> + */
> +int pci_p2pdma_add_client(struct list_head *head, struct device *dev)
> +{
> +	struct pci_p2pdma_client *item, *new_item;
> +	struct pci_dev *provider = NULL;
> +	struct pci_dev *client;
> +	int ret;
> +
> +	if (IS_ENABLED(CONFIG_DMA_VIRT_OPS) && dev->dma_ops == &dma_virt_ops) {
> +		dev_warn(dev, "cannot be used for peer-to-peer DMA because the driver makes use of dma_virt_ops\n");
> +		return -ENODEV;
> +	}
> +
> +	client = find_parent_pci_dev(dev);
> +	if (!client) {
> +		dev_warn(dev, "cannot be used for peer-to-peer DMA as it is not a PCI device\n");
> +		return -ENODEV;
> +	}
> +
> +	item = list_first_entry_or_null(head, struct pci_p2pdma_client, list);
> +	if (item && item->provider) {
> +		provider = item->provider;
> +
> +		ret = upstream_bridge_distance_warn(provider, client);
> +		if (ret < 0) {
> +			ret = -EXDEV;
> +			goto put_client;
> +		}
> +	}
> +
> +	new_item = kzalloc(sizeof(*new_item), GFP_KERNEL);
> +	if (!new_item) {
> +		ret = -ENOMEM;
> +		goto put_client;
> +	}
> +
> +	new_item->client = client;
> +	new_item->provider = pci_dev_get(provider);
> +
> +	list_add_tail(&new_item->list, head);
> +
> +	return 0;
> +
> +put_client:
> +	pci_dev_put(client);
> +	return ret;
> +}
> +EXPORT_SYMBOL_GPL(pci_p2pdma_add_client);
> +
> +static void pci_p2pdma_client_free(struct pci_p2pdma_client *item)
> +{
> +	list_del(&item->list);
> +	pci_dev_put(item->client);
> +	pci_dev_put(item->provider);
> +	kfree(item);
> +}
> +
> +/**
> + * pci_p2pdma_remove_client - remove and free a p2pdma client
> + * @head: list head of p2pdma clients
> + * @dev: device to remove from the list
> + *
> + * This removes @dev from a list of clients used by a p2pdma device.
> + * The caller is expected to have a lock which protects @head as necessary
> + * so that none of the pci_p2p functions can be called concurrently
> + * on that list.
> + */
> +void pci_p2pdma_remove_client(struct list_head *head, struct device *dev)
> +{
> +	struct pci_p2pdma_client *pos, *tmp;
> +	struct pci_dev *pdev;
> +
> +	pdev = find_parent_pci_dev(dev);
> +	if (!pdev)
> +		return;
> +
> +	list_for_each_entry_safe(pos, tmp, head, list) {
> +		if (pos->client != pdev)
> +			continue;
> +
> +		pci_p2pdma_client_free(pos);
> +	}
> +
> +	pci_dev_put(pdev);
> +}
> +EXPORT_SYMBOL_GPL(pci_p2pdma_remove_client);
> +
> +/**
> + * pci_p2pdma_client_list_free - free an entire list of p2pdma clients
> + * @head: list head of p2pdma clients
> + *
> + * This removes all devices in a list of clients used by a p2pdma device.
> + * The caller is expected to have a lock which protects @head as necessary
> + * so that none of the pci_p2pdma functions can be called concurrently
> + * on that list.
> + */
> +void pci_p2pdma_client_list_free(struct list_head *head)
> +{
> +	struct pci_p2pdma_client *pos, *tmp;
> +
> +	list_for_each_entry_safe(pos, tmp, head, list)
> +		pci_p2pdma_client_free(pos);
> +}
> +EXPORT_SYMBOL_GPL(pci_p2pdma_client_list_free);
> +
> +/**
> + * pci_p2pdma_distance - Determine the cumulative distance between
> + *	a p2pdma provider and the clients in use.
> + * @provider: p2pdma provider to check against the client list
> + * @clients: list of devices to check (NULL-terminated)
> + * @verbose: if true, print warnings for devices when we return -1
> + *
> + * Returns -1 if any of the clients are not compatible (i.e. not behind
> + * the same root port as the provider), otherwise returns a positive number
> + * where the lower number is the preferable choice. (If one client is the
> + * same device as the provider, it will return 0, which is the best choice.)
> + *
> + * For now, "compatible" means the provider and the clients are all behind
> + * the same PCI root port. This cuts out cases that may work but is the
> + * safest policy for the user. Future work can expand this to a white-list
> + * of root complexes that can safely forward between their ports.
> + */
> +int pci_p2pdma_distance(struct pci_dev *provider, struct list_head *clients,
> +			bool verbose)
> +{
> +	struct pci_p2pdma_client *pos;
> +	int ret;
> +	int distance = 0;
> +	bool not_supported = false;
> +
> +	if (list_empty(clients))
> +		return -1;
> +
> +	list_for_each_entry(pos, clients, list) {
> +		if (verbose)
> +			ret = upstream_bridge_distance_warn(provider,
> +							    pos->client);
> +		else
> +			ret = upstream_bridge_distance(provider, pos->client,
> +						       NULL);
> +
> +		if (ret < 0)
> +			not_supported = true;
> +
> +		if (not_supported && !verbose)
> +			break;
> +
> +		distance += ret;
> +	}
> +
> +	if (not_supported)
> +		return -1;
> +
> +	return distance;
> +}
> +EXPORT_SYMBOL_GPL(pci_p2pdma_distance);
> +
> +/**
> + * pci_p2pdma_assign_provider - Check compatibility (as per pci_p2pdma_distance)
> + *	and assign a provider to a list of clients
> + * @provider: p2pdma provider to assign to the client list
> + * @clients: list of devices to check (NULL-terminated)
> + *
> + * Returns false if any of the clients are not compatible, true if the
> + * provider was successfully assigned to the clients.
> + */
> +bool pci_p2pdma_assign_provider(struct pci_dev *provider,
> +				struct list_head *clients)
> +{
> +	struct pci_p2pdma_client *pos;
> +
> +	if (pci_p2pdma_distance(provider, clients, true) < 0)
> +		return false;
> +
> +	list_for_each_entry(pos, clients, list)
> +		pos->provider = provider;
> +
> +	return true;
> +}
> +EXPORT_SYMBOL_GPL(pci_p2pdma_assign_provider);
> +
> +/**
> + * pci_has_p2pmem - check if a given PCI device has published any p2pmem
> + * @pdev: PCI device to check
> + */
> +bool pci_has_p2pmem(struct pci_dev *pdev)
> +{
> +	return pdev->p2pdma && pdev->p2pdma->p2pmem_published;
> +}
> +EXPORT_SYMBOL_GPL(pci_has_p2pmem);
> +
> +/**
> + * pci_p2pmem_find - find a peer-to-peer DMA memory device compatible with
> + *	the specified list of clients and shortest distance (as determined
> + *	by pci_p2pdma_distance())
> + * @clients: list of devices to check (NULL-terminated)
> + *
> + * If multiple devices are behind the same switch, the one "closest" to the
> + * client devices in use will be chosen first. (So if one of the providers is
> + * the same as one of the clients, that provider will be used ahead of any
> + * other providers that are unrelated.) If multiple providers are an equal
> + * distance away, one will be chosen at random.
> + *
> + * Returns a pointer to the PCI device with a reference taken (use pci_dev_put
> + * to return the reference) or NULL if no compatible device is found. The
> + * found provider will also be assigned to the client list.
> + */
> +struct pci_dev *pci_p2pmem_find(struct list_head *clients)
> +{
> +	struct pci_dev *pdev = NULL;
> +	struct pci_p2pdma_client *pos;
> +	int distance;
> +	int closest_distance = INT_MAX;
> +	struct pci_dev **closest_pdevs;
> +	int dev_cnt = 0;
> +	const int max_devs = PAGE_SIZE / sizeof(*closest_pdevs);
> +	int i;
> +
> +	closest_pdevs = kmalloc(PAGE_SIZE, GFP_KERNEL);
> +	if (!closest_pdevs)
> +		return NULL;
> +
> +	while ((pdev = pci_get_device(PCI_ANY_ID, PCI_ANY_ID, pdev))) {
> +		if (!pci_has_p2pmem(pdev))
> +			continue;
> +
> +		distance = pci_p2pdma_distance(pdev, clients, false);
> +		if (distance < 0 || distance > closest_distance)
> +			continue;
> +
> +		if (distance == closest_distance && dev_cnt >= max_devs)
> +			continue;
> +
> +		if (distance < closest_distance) {
> +			for (i = 0; i < dev_cnt; i++)
> +				pci_dev_put(closest_pdevs[i]);
> +
> +			dev_cnt = 0;
> +			closest_distance = distance;
> +		}
> +
> +		closest_pdevs[dev_cnt++] = pci_dev_get(pdev);
> +	}
> +
> +	if (dev_cnt)
> +		pdev = pci_dev_get(closest_pdevs[prandom_u32_max(dev_cnt)]);
> +
> +	for (i = 0; i < dev_cnt; i++)
> +		pci_dev_put(closest_pdevs[i]);
> +
> +	if (pdev)
> +		list_for_each_entry(pos, clients, list)
> +			pos->provider = pdev;
> +
> +	kfree(closest_pdevs);
> +	return pdev;
> +}
> +EXPORT_SYMBOL_GPL(pci_p2pmem_find);
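The selection loop above keeps only the pool of best-distance candidates seen so far and then picks one of them at random. A standalone model of that policy, in plain userspace C with illustrative names (not the kernel code itself), looks roughly like this:

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Model of pci_p2pmem_find()'s selection policy: scan the candidate
 * distances, reset the pool whenever a strictly closer distance
 * appears, and finally pick one of the closest at random. Negative
 * distances mean "incompatible" and are skipped. The pool is capped
 * (the real code caps at PAGE_SIZE / sizeof(struct pci_dev *)).
 */
static int pick_closest(const int *dist, int n)
{
	int pool[16];
	int cnt = 0, closest = INT_MAX, i;

	for (i = 0; i < n; i++) {
		if (dist[i] < 0 || dist[i] > closest)
			continue;	/* incompatible or farther away */

		if (dist[i] < closest) {
			cnt = 0;	/* strictly closer: drop old pool */
			closest = dist[i];
		}

		if (cnt < 16)
			pool[cnt++] = i;
	}

	if (!cnt)
		return -1;		/* no compatible candidate */

	return pool[rand() % cnt];	/* random tie-break among equals */
}
```

With distances { -1, 4, 2, 2, 8 } this returns either index 2 or index 3, mirroring how two equally close p2pmem providers are chosen between at random.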
> +
> +/**
> + * pci_alloc_p2pmem - allocate peer-to-peer DMA memory
> + * @pdev: the device to allocate memory from
> + * @size: number of bytes to allocate
> + *
> + * Returns the allocated memory or NULL on error.
> + */
> +void *pci_alloc_p2pmem(struct pci_dev *pdev, size_t size)
> +{
> +	void *ret;
> +
> +	if (unlikely(!pdev->p2pdma))
> +		return NULL;
> +
> +	if (unlikely(!percpu_ref_tryget_live(&pdev->p2pdma->devmap_ref)))
> +		return NULL;
> +
> +	ret = (void *)gen_pool_alloc(pdev->p2pdma->pool, size);
> +
> +	if (unlikely(!ret))
> +		percpu_ref_put(&pdev->p2pdma->devmap_ref);
> +
> +	return ret;
> +}
> +EXPORT_SYMBOL_GPL(pci_alloc_p2pmem);
> +
> +/**
> + * pci_free_p2pmem - free peer-to-peer DMA memory
> + * @pdev: the device the memory was allocated from
> + * @addr: address of the memory that was allocated
> + * @size: number of bytes that was allocated
> + */
> +void pci_free_p2pmem(struct pci_dev *pdev, void *addr, size_t size)
> +{
> +	gen_pool_free(pdev->p2pdma->pool, (uintptr_t)addr, size);
> +	percpu_ref_put(&pdev->p2pdma->devmap_ref);
> +}
> +EXPORT_SYMBOL_GPL(pci_free_p2pmem);
> +
> +/**
> + * pci_p2pmem_virt_to_bus - return the PCI bus address for a given virtual
> + *	address obtained with pci_alloc_p2pmem()
> + * @pdev: the device the memory was allocated from
> + * @addr: address of the memory that was allocated
> + */
> +pci_bus_addr_t pci_p2pmem_virt_to_bus(struct pci_dev *pdev, void *addr)
> +{
> +	if (!addr)
> +		return 0;
> +	if (!pdev->p2pdma)
> +		return 0;
> +
> +	/*
> +	 * Note: when we added the memory to the pool we used the PCI
> +	 * bus address as the physical address. So gen_pool_virt_to_phys()
> +	 * actually returns the bus address despite the misleading name.
> +	 */
> +	return gen_pool_virt_to_phys(pdev->p2pdma->pool, (unsigned long)addr);
> +}
> +EXPORT_SYMBOL_GPL(pci_p2pmem_virt_to_bus);
> +
> +/**
> + * pci_p2pmem_alloc_sgl - allocate peer-to-peer DMA memory in a scatterlist
> + * @pdev: the device to allocate memory from
> + * @nents: set to the number of SG entries in the allocated list
> + * @length: number of bytes to allocate
> + *
> + * Returns the allocated scatterlist or NULL on error.
> + */
> +struct scatterlist *pci_p2pmem_alloc_sgl(struct pci_dev *pdev,
> +					 unsigned int *nents, u32 length)
> +{
> +	struct scatterlist *sg;
> +	void *addr;
> +
> +	sg = kzalloc(sizeof(*sg), GFP_KERNEL);
> +	if (!sg)
> +		return NULL;
> +
> +	sg_init_table(sg, 1);
> +
> +	addr = pci_alloc_p2pmem(pdev, length);
> +	if (!addr)
> +		goto out_free_sg;
> +
> +	sg_set_buf(sg, addr, length);
> +	*nents = 1;
> +	return sg;
> +
> +out_free_sg:
> +	kfree(sg);
> +	return NULL;
> +}
> +EXPORT_SYMBOL_GPL(pci_p2pmem_alloc_sgl);
> +
> +/**
> + * pci_p2pmem_free_sgl - free a scatterlist allocated by pci_p2pmem_alloc_sgl()
> + * @pdev: the device the memory was allocated from
> + * @sgl: the allocated scatterlist
> + */
> +void pci_p2pmem_free_sgl(struct pci_dev *pdev, struct scatterlist *sgl)
> +{
> +	struct scatterlist *sg;
> +	int count;
> +
> +	for_each_sg(sgl, sg, INT_MAX, count) {
> +		if (!sg)
> +			break;
> +
> +		pci_free_p2pmem(pdev, sg_virt(sg), sg->length);
> +	}
> +	kfree(sgl);
> +}
> +EXPORT_SYMBOL_GPL(pci_p2pmem_free_sgl);
> +
> +/**
> + * pci_p2pmem_publish - publish the peer-to-peer DMA memory for use by
> + *	other devices with pci_p2pmem_find()
> + * @pdev: the device with peer-to-peer DMA memory to publish
> + * @publish: set to true to publish the memory, false to unpublish it
> + *
> + * Published memory can be used by other PCI device drivers for
> + * peer-to-peer DMA operations. Non-published memory is reserved for
> + * exclusive use of the device driver that registers the peer-to-peer
> + * memory.
> + */
> +void pci_p2pmem_publish(struct pci_dev *pdev, bool publish)
> +{
> +	if (!pdev->p2pdma)
> +		return;
> +
> +	pdev->p2pdma->p2pmem_published = publish;
> +}
> +EXPORT_SYMBOL_GPL(pci_p2pmem_publish);
> diff --git a/include/linux/memremap.h b/include/linux/memremap.h
> index f91f9e763557..9553370ebdad 100644
> --- a/include/linux/memremap.h
> +++ b/include/linux/memremap.h
> @@ -53,11 +53,16 @@ struct vmem_altmap {
>    * wakeup event whenever a page is unpinned and becomes idle. This
>    * wakeup is used to coordinate physical address space management (ex:
>    * fs truncate/hole punch) vs pinned pages (ex: device dma).
> + *
> + * MEMORY_DEVICE_PCI_P2PDMA:
> + * Device memory residing in a PCI BAR intended for use with Peer-to-Peer
> + * transactions.
>    */
>   enum memory_type {
>   	MEMORY_DEVICE_PRIVATE = 1,
>   	MEMORY_DEVICE_PUBLIC,
>   	MEMORY_DEVICE_FS_DAX,
> +	MEMORY_DEVICE_PCI_P2PDMA,
>   };
>   
>   /*
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index a61ebe8ad4ca..2055df412a77 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -890,6 +890,19 @@ static inline bool is_device_public_page(const struct page *page)
>   		page->pgmap->type == MEMORY_DEVICE_PUBLIC;
>   }
>   
> +#ifdef CONFIG_PCI_P2PDMA
> +static inline bool is_pci_p2pdma_page(const struct page *page)
> +{
> +	return is_zone_device_page(page) &&
> +		page->pgmap->type == MEMORY_DEVICE_PCI_P2PDMA;
> +}
> +#else /* CONFIG_PCI_P2PDMA */
> +static inline bool is_pci_p2pdma_page(const struct page *page)
> +{
> +	return false;
> +}
> +#endif /* CONFIG_PCI_P2PDMA */
> +
>   #else /* CONFIG_DEV_PAGEMAP_OPS */
>   static inline void dev_pagemap_get_ops(void)
>   {
> @@ -913,6 +926,11 @@ static inline bool is_device_public_page(const struct page *page)
>   {
>   	return false;
>   }
> +
> +static inline bool is_pci_p2pdma_page(const struct page *page)
> +{
> +	return false;
> +}
>   #endif /* CONFIG_DEV_PAGEMAP_OPS */
>   
>   static inline void get_page(struct page *page)
> diff --git a/include/linux/pci-p2pdma.h b/include/linux/pci-p2pdma.h
> new file mode 100644
> index 000000000000..7b2b0f547528
> --- /dev/null
> +++ b/include/linux/pci-p2pdma.h
> @@ -0,0 +1,102 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/*
> + * PCI Peer 2 Peer DMA support.
> + *
> + * Copyright (c) 2016-2018, Logan Gunthorpe
> + * Copyright (c) 2016-2017, Microsemi Corporation
> + * Copyright (c) 2017, Christoph Hellwig
> + * Copyright (c) 2018, Eideticom Inc.
> + *
> + */
> +
> +#ifndef _LINUX_PCI_P2PDMA_H
> +#define _LINUX_PCI_P2PDMA_H
> +
> +#include <linux/pci.h>
> +
> +struct block_device;
> +struct scatterlist;
> +
> +#ifdef CONFIG_PCI_P2PDMA
> +int pci_p2pdma_add_resource(struct pci_dev *pdev, int bar, size_t size,
> +		u64 offset);
> +int pci_p2pdma_add_client(struct list_head *head, struct device *dev);
> +void pci_p2pdma_remove_client(struct list_head *head, struct device *dev);
> +void pci_p2pdma_client_list_free(struct list_head *head);
> +int pci_p2pdma_distance(struct pci_dev *provider, struct list_head *clients,
> +			bool verbose);
> +bool pci_p2pdma_assign_provider(struct pci_dev *provider,
> +				struct list_head *clients);
> +bool pci_has_p2pmem(struct pci_dev *pdev);
> +struct pci_dev *pci_p2pmem_find(struct list_head *clients);
> +void *pci_alloc_p2pmem(struct pci_dev *pdev, size_t size);
> +void pci_free_p2pmem(struct pci_dev *pdev, void *addr, size_t size);
> +pci_bus_addr_t pci_p2pmem_virt_to_bus(struct pci_dev *pdev, void *addr);
> +struct scatterlist *pci_p2pmem_alloc_sgl(struct pci_dev *pdev,
> +					 unsigned int *nents, u32 length);
> +void pci_p2pmem_free_sgl(struct pci_dev *pdev, struct scatterlist *sgl);
> +void pci_p2pmem_publish(struct pci_dev *pdev, bool publish);
> +#else /* CONFIG_PCI_P2PDMA */
> +static inline int pci_p2pdma_add_resource(struct pci_dev *pdev, int bar,
> +		size_t size, u64 offset)
> +{
> +	return -EOPNOTSUPP;
> +}
> +static inline int pci_p2pdma_add_client(struct list_head *head,
> +		struct device *dev)
> +{
> +	return 0;
> +}
> +static inline void pci_p2pdma_remove_client(struct list_head *head,
> +		struct device *dev)
> +{
> +}
> +static inline void pci_p2pdma_client_list_free(struct list_head *head)
> +{
> +}
> +static inline int pci_p2pdma_distance(struct pci_dev *provider,
> +				      struct list_head *clients,
> +				      bool verbose)
> +{
> +	return -1;
> +}
> +static inline bool pci_p2pdma_assign_provider(struct pci_dev *provider,
> +					      struct list_head *clients)
> +{
> +	return false;
> +}
> +static inline bool pci_has_p2pmem(struct pci_dev *pdev)
> +{
> +	return false;
> +}
> +static inline struct pci_dev *pci_p2pmem_find(struct list_head *clients)
> +{
> +	return NULL;
> +}
> +static inline void *pci_alloc_p2pmem(struct pci_dev *pdev, size_t size)
> +{
> +	return NULL;
> +}
> +static inline void pci_free_p2pmem(struct pci_dev *pdev, void *addr,
> +		size_t size)
> +{
> +}
> +static inline pci_bus_addr_t pci_p2pmem_virt_to_bus(struct pci_dev *pdev,
> +						    void *addr)
> +{
> +	return 0;
> +}
> +static inline struct scatterlist *pci_p2pmem_alloc_sgl(struct pci_dev *pdev,
> +		unsigned int *nents, u32 length)
> +{
> +	return NULL;
> +}
> +static inline void pci_p2pmem_free_sgl(struct pci_dev *pdev,
> +		struct scatterlist *sgl)
> +{
> +}
> +static inline void pci_p2pmem_publish(struct pci_dev *pdev, bool publish)
> +{
> +}
> +#endif /* CONFIG_PCI_P2PDMA */
> +#endif /* _LINUX_PCI_P2PDMA_H */
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index e72ca8dd6241..5d95dbf21f4a 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -281,6 +281,7 @@ struct pcie_link_state;
>   struct pci_vpd;
>   struct pci_sriov;
>   struct pci_ats;
> +struct pci_p2pdma;
>   
>   /* The pci_dev structure describes PCI devices */
>   struct pci_dev {
> @@ -439,6 +440,9 @@ struct pci_dev {
>   #ifdef CONFIG_PCI_PASID
>   	u16		pasid_features;
>   #endif
> +#ifdef CONFIG_PCI_P2PDMA
> +	struct pci_p2pdma *p2pdma;
> +#endif
>   	phys_addr_t	rom;		/* Physical address if not from BAR */
>   	size_t		romlen;		/* Length if not from BAR */
>   	char		*driver_override; /* Driver name to force a match */

_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm


* Re: [PATCH v5 01/13] PCI/P2PDMA: Support peer-to-peer memory
@ 2018-08-31  8:04     ` Christian König
  0 siblings, 0 replies; 265+ messages in thread
From: Christian König @ 2018-08-31  8:04 UTC (permalink / raw)
  To: Logan Gunthorpe, linux-kernel, linux-pci, linux-nvme, linux-rdma,
	linux-nvdimm, linux-block
  Cc: Stephen Bates, Christoph Hellwig, Keith Busch, Sagi Grimberg,
	Bjorn Helgaas, Jason Gunthorpe, Max Gurtovoy, Dan Williams,
	Jérôme Glisse, Benjamin Herrenschmidt, Alex Williamson

Am 30.08.2018 um 20:53 schrieb Logan Gunthorpe:
> Some PCI devices may have memory mapped in a BAR space that's
> intended for use in peer-to-peer transactions. In order to enable
> such transactions the memory must be registered with ZONE_DEVICE pages
> so it can be used by DMA interfaces in existing drivers.

We want to use that feature without ZONE_DEVICE pages for DMA-buf as well.

How hard would it be to separate enabling P2P detection (e.g. distance 
between two devices) from this?

Regards,
Christian.

>
> Add an interface for other subsystems to find and allocate chunks of P2P
> memory as necessary to facilitate transfers between two PCI peers:
>
> int pci_p2pdma_add_client();
> struct pci_dev *pci_p2pmem_find();
> void *pci_alloc_p2pmem();
>
> The new interface requires a driver to collect a list of client devices
> involved in the transaction with the pci_p2pmem_add_client*() functions
> then call pci_p2pmem_find() to obtain any suitable P2P memory. Once
> this is done the list is bound to the memory and the calling driver is
> free to add and remove clients as necessary (adding incompatible clients
> will fail). With a suitable p2pmem device, memory can then be
> allocated with pci_alloc_p2pmem() for use in DMA transactions.
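The list-then-bind flow described above can be sketched as a toy userspace model. Everything below is illustrative (the struct, add_client() and pick_provider() are hypothetical stand-ins for pci_p2pdma_add_client() and pci_p2pmem_find(), not the kernel API):

```c
#include <assert.h>
#include <stdlib.h>

/* One entry per client device; the provider field stays NULL until a
 * p2pmem provider is bound to the whole list. */
struct client {
	const char *name;
	const char *provider;
	struct client *next;
};

/* Models pci_p2pdma_add_client(): prepend a new client to the list.
 * Once the list is bound, new members inherit its provider (the real
 * code also verifies compatibility against that provider first). */
static struct client *add_client(struct client *head, const char *name)
{
	struct client *c = calloc(1, sizeof(*c));

	if (!c)
		return head;
	c->name = name;
	c->provider = head ? head->provider : NULL;
	c->next = head;
	return c;
}

/* Models pci_p2pmem_find() binding: assign a provider to every client. */
static void pick_provider(struct client *head, const char *provider)
{
	for (; head; head = head->next)
		head->provider = provider;
}
```

After pick_provider() runs, the list is bound: clients added later inherit the same provider, matching the "adding incompatible clients will fail" constraint in spirit (the compatibility check itself is omitted here).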
>
> Depending on hardware, using peer-to-peer memory may reduce the bandwidth
> of the transfer but can significantly reduce pressure on system memory.
> This may be desirable in many cases: for example a system could be designed
> with a small CPU connected to a PCIe switch by a small number of lanes
> which would maximize the number of lanes available to connect to NVMe
> devices.
>
> The code is designed to only utilize the p2pmem device if all the devices
> involved in a transfer are behind the same PCI bridge. This is because we
> have no way of knowing whether peer-to-peer routing between PCIe Root Ports
> is supported (PCIe r4.0, sec 1.3.1). Additionally, the benefits of P2P
> transfers that go through the RC is limited to only reducing DRAM usage
> and, in some cases, coding convenience. The PCI-SIG may be exploring
> adding a new capability bit to advertise whether this is possible for
> future hardware.
>
> This commit includes significant rework and feedback from Christoph
> Hellwig.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
> ---
>   drivers/pci/Kconfig        |  17 +
>   drivers/pci/Makefile       |   1 +
>   drivers/pci/p2pdma.c       | 761 +++++++++++++++++++++++++++++++++++++++++++++
>   include/linux/memremap.h   |   5 +
>   include/linux/mm.h         |  18 ++
>   include/linux/pci-p2pdma.h | 102 ++++++
>   include/linux/pci.h        |   4 +
>   7 files changed, 908 insertions(+)
>   create mode 100644 drivers/pci/p2pdma.c
>   create mode 100644 include/linux/pci-p2pdma.h
>
> diff --git a/drivers/pci/Kconfig b/drivers/pci/Kconfig
> index 56ff8f6d31fc..deb68be4fdac 100644
> --- a/drivers/pci/Kconfig
> +++ b/drivers/pci/Kconfig
> @@ -132,6 +132,23 @@ config PCI_PASID
>   
>   	  If unsure, say N.
>   
> +config PCI_P2PDMA
> +	bool "PCI peer-to-peer transfer support"
> +	depends on PCI && ZONE_DEVICE
> +	select GENERIC_ALLOCATOR
> +	help
> +	  Enables drivers to do PCI peer-to-peer transactions to and from
> +	  BARs that are exposed in other devices that are part of
> +	  the hierarchy where peer-to-peer DMA is guaranteed by the PCI
> +	  specification to work (i.e. anything below a single PCI bridge).
> +
> +	  Many PCIe root complexes do not support P2P transactions and
> +	  it's hard to tell which ones do, so at this time P2P DMA
> +	  transactions must be between devices behind the same root
> +	  port.
> +
> +	  If unsure, say N.
> +
>   config PCI_LABEL
>   	def_bool y if (DMI || ACPI)
>   	depends on PCI
> diff --git a/drivers/pci/Makefile b/drivers/pci/Makefile
> index 1b2cfe51e8d7..85f4a703b2be 100644
> --- a/drivers/pci/Makefile
> +++ b/drivers/pci/Makefile
> @@ -26,6 +26,7 @@ obj-$(CONFIG_PCI_SYSCALL)	+= syscall.o
>   obj-$(CONFIG_PCI_STUB)		+= pci-stub.o
>   obj-$(CONFIG_PCI_PF_STUB)	+= pci-pf-stub.o
>   obj-$(CONFIG_PCI_ECAM)		+= ecam.o
> +obj-$(CONFIG_PCI_P2PDMA)	+= p2pdma.o
>   obj-$(CONFIG_XEN_PCIDEV_FRONTEND) += xen-pcifront.o
>   
>   # Endpoint library must be initialized before its users
> diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c
> new file mode 100644
> index 000000000000..88aaec5351cd
> --- /dev/null
> +++ b/drivers/pci/p2pdma.c
> @@ -0,0 +1,761 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * PCI Peer 2 Peer DMA support.
> + *
> + * Copyright (c) 2016-2018, Logan Gunthorpe
> + * Copyright (c) 2016-2017, Microsemi Corporation
> + * Copyright (c) 2017, Christoph Hellwig
> + * Copyright (c) 2018, Eideticom Inc.
> + */
> +
> +#define pr_fmt(fmt) "pci-p2pdma: " fmt
> +#include <linux/pci-p2pdma.h>
> +#include <linux/module.h>
> +#include <linux/slab.h>
> +#include <linux/genalloc.h>
> +#include <linux/memremap.h>
> +#include <linux/percpu-refcount.h>
> +#include <linux/random.h>
> +#include <linux/seq_buf.h>
> +
> +struct pci_p2pdma {
> +	struct percpu_ref devmap_ref;
> +	struct completion devmap_ref_done;
> +	struct gen_pool *pool;
> +	bool p2pmem_published;
> +};
> +
> +static void pci_p2pdma_percpu_release(struct percpu_ref *ref)
> +{
> +	struct pci_p2pdma *p2p =
> +		container_of(ref, struct pci_p2pdma, devmap_ref);
> +
> +	complete_all(&p2p->devmap_ref_done);
> +}
> +
> +static void pci_p2pdma_percpu_kill(void *data)
> +{
> +	struct percpu_ref *ref = data;
> +
> +	if (percpu_ref_is_dying(ref))
> +		return;
> +
> +	percpu_ref_kill(ref);
> +}
> +
> +static void pci_p2pdma_release(void *data)
> +{
> +	struct pci_dev *pdev = data;
> +
> +	if (!pdev->p2pdma)
> +		return;
> +
> +	wait_for_completion(&pdev->p2pdma->devmap_ref_done);
> +	percpu_ref_exit(&pdev->p2pdma->devmap_ref);
> +
> +	gen_pool_destroy(pdev->p2pdma->pool);
> +	pdev->p2pdma = NULL;
> +}
> +
> +static int pci_p2pdma_setup(struct pci_dev *pdev)
> +{
> +	int error = -ENOMEM;
> +	struct pci_p2pdma *p2p;
> +
> +	p2p = devm_kzalloc(&pdev->dev, sizeof(*p2p), GFP_KERNEL);
> +	if (!p2p)
> +		return -ENOMEM;
> +
> +	p2p->pool = gen_pool_create(PAGE_SHIFT, dev_to_node(&pdev->dev));
> +	if (!p2p->pool)
> +		goto out;
> +
> +	init_completion(&p2p->devmap_ref_done);
> +	error = percpu_ref_init(&p2p->devmap_ref,
> +			pci_p2pdma_percpu_release, 0, GFP_KERNEL);
> +	if (error)
> +		goto out_pool_destroy;
> +
> +	percpu_ref_switch_to_atomic_sync(&p2p->devmap_ref);
> +
> +	error = devm_add_action_or_reset(&pdev->dev, pci_p2pdma_release, pdev);
> +	if (error)
> +		goto out_pool_destroy;
> +
> +	pdev->p2pdma = p2p;
> +
> +	return 0;
> +
> +out_pool_destroy:
> +	gen_pool_destroy(p2p->pool);
> +out:
> +	devm_kfree(&pdev->dev, p2p);
> +	return error;
> +}
> +
> +/**
> + * pci_p2pdma_add_resource - add memory for use as p2p memory
> + * @pdev: the device to add the memory to
> + * @bar: PCI BAR to add
> + * @size: size of the memory to add, may be zero to use the whole BAR
> + * @offset: offset into the PCI BAR
> + *
> + * The memory will be given ZONE_DEVICE struct pages so that it may
> + * be used with any DMA request.
> + */
> +int pci_p2pdma_add_resource(struct pci_dev *pdev, int bar, size_t size,
> +			    u64 offset)
> +{
> +	struct dev_pagemap *pgmap;
> +	void *addr;
> +	int error;
> +
> +	if (!(pci_resource_flags(pdev, bar) & IORESOURCE_MEM))
> +		return -EINVAL;
> +
> +	if (offset >= pci_resource_len(pdev, bar))
> +		return -EINVAL;
> +
> +	if (!size)
> +		size = pci_resource_len(pdev, bar) - offset;
> +
> +	if (size + offset > pci_resource_len(pdev, bar))
> +		return -EINVAL;
> +
> +	if (!pdev->p2pdma) {
> +		error = pci_p2pdma_setup(pdev);
> +		if (error)
> +			return error;
> +	}
> +
> +	pgmap = devm_kzalloc(&pdev->dev, sizeof(*pgmap), GFP_KERNEL);
> +	if (!pgmap)
> +		return -ENOMEM;
> +
> +	pgmap->res.start = pci_resource_start(pdev, bar) + offset;
> +	pgmap->res.end = pgmap->res.start + size - 1;
> +	pgmap->res.flags = pci_resource_flags(pdev, bar);
> +	pgmap->ref = &pdev->p2pdma->devmap_ref;
> +	pgmap->type = MEMORY_DEVICE_PCI_P2PDMA;
> +
> +	addr = devm_memremap_pages(&pdev->dev, pgmap);
> +	if (IS_ERR(addr)) {
> +		error = PTR_ERR(addr);
> +		goto pgmap_free;
> +	}
> +
> +	error = gen_pool_add_virt(pdev->p2pdma->pool, (unsigned long)addr,
> +			pci_bus_address(pdev, bar) + offset,
> +			resource_size(&pgmap->res), dev_to_node(&pdev->dev));
> +	if (error)
> +		goto pgmap_free;
> +
> +	error = devm_add_action_or_reset(&pdev->dev, pci_p2pdma_percpu_kill,
> +					  &pdev->p2pdma->devmap_ref);
> +	if (error)
> +		goto pgmap_free;
> +
> +	pci_info(pdev, "added peer-to-peer DMA memory %pR\n",
> +		 &pgmap->res);
> +
> +	return 0;
> +
> +pgmap_free:
> +	devres_free(pgmap);
> +	return error;
> +}
> +EXPORT_SYMBOL_GPL(pci_p2pdma_add_resource);
> +
> +static struct pci_dev *find_parent_pci_dev(struct device *dev)
> +{
> +	struct device *parent;
> +
> +	dev = get_device(dev);
> +
> +	while (dev) {
> +		if (dev_is_pci(dev))
> +			return to_pci_dev(dev);
> +
> +		parent = get_device(dev->parent);
> +		put_device(dev);
> +		dev = parent;
> +	}
> +
> +	return NULL;
> +}
> +
> +/*
> + * Check if a PCI bridge has its ACS redirection bits set to redirect P2P
> + * TLPs upstream via ACS. Returns 1 if the packets will be redirected
> + * upstream, 0 otherwise.
> + */
> +static int pci_bridge_has_acs_redir(struct pci_dev *dev)
> +{
> +	int pos;
> +	u16 ctrl;
> +
> +	pos = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_ACS);
> +	if (!pos)
> +		return 0;
> +
> +	pci_read_config_word(dev, pos + PCI_ACS_CTRL, &ctrl);
> +
> +	if (ctrl & (PCI_ACS_RR | PCI_ACS_CR | PCI_ACS_EC))
> +		return 1;
> +
> +	return 0;
> +}
> +
> +static void seq_buf_print_bus_devfn(struct seq_buf *buf, struct pci_dev *dev)
> +{
> +	if (!buf)
> +		return;
> +
> +	seq_buf_printf(buf, "%04x:%02x:%02x.%x;", pci_domain_nr(dev->bus),
> +		       dev->bus->number, PCI_SLOT(dev->devfn),
> +		       PCI_FUNC(dev->devfn));
> +}
> +
> +/*
> + * Find the distance through the nearest common upstream bridge between
> + * two PCI devices.
> + *
> + * If the two devices are the same device then 0 will be returned.
> + *
> + * If there are two virtual functions of the same device behind the same
> + * bridge port then 2 will be returned (one step down to the PCIe switch,
> + * then one step back to the same device).
> + *
> + * In the case where two devices are connected to the same PCIe switch, the
> + * value 4 will be returned. This corresponds to the following PCI tree:
> + *
> + *     -+  Root Port
> + *      \+ Switch Upstream Port
> + *       +-+ Switch Downstream Port
> + *       + \- Device A
> + *       \-+ Switch Downstream Port
> + *         \- Device B
> + *
> + * The distance is 4 because we traverse from Device A through the downstream
> + * port of the switch, to the common upstream port, back up to the second
> + * downstream port and then to Device B.
> + *
> + * Any two devices that don't have a common upstream bridge will return -1.
> + * In this way devices on separate PCIe root ports will be rejected, which
> + * is what we want for peer-to-peer since each PCIe root port defines a
> + * separate hierarchy domain and there's no way to determine whether the root
> + * complex supports forwarding between them.
> + *
> + * In the case where two devices are connected to different PCIe switches,
> + * this function will still return a positive distance as long as both
> + * switches eventually have a common upstream bridge. Note this covers
> + * the case of using multiple PCIe switches to achieve a desired level of
> + * fan-out from a root port. The exact distance will be a function of the
> + * number of switches between Device A and Device B.
> + *
> + * If a bridge which has any ACS redirection bits set is in the path
> + * then this function will return -2. This is so we reject any
> + * cases where the TLPs are forwarded up into the root complex.
> + * In this case, a list of all infringing bridge addresses will be
> + * populated in acs_list (assuming it's non-null) for printk purposes.
> + */
> +static int upstream_bridge_distance(struct pci_dev *a,
> +				    struct pci_dev *b,
> +				    struct seq_buf *acs_list)
> +{
> +	int dist_a = 0;
> +	int dist_b = 0;
> +	struct pci_dev *bb = NULL;
> +	int acs_cnt = 0;
> +
> +	/*
> +	 * Note, we don't need to take references to devices returned by
> +	 * pci_upstream_bridge() since we hold a reference to a child
> +	 * device which will already hold a reference to the upstream bridge.
> +	 */
> +
> +	while (a) {
> +		dist_b = 0;
> +
> +		if (pci_bridge_has_acs_redir(a)) {
> +			seq_buf_print_bus_devfn(acs_list, a);
> +			acs_cnt++;
> +		}
> +
> +		bb = b;
> +
> +		while (bb) {
> +			if (a == bb)
> +				goto check_b_path_acs;
> +
> +			bb = pci_upstream_bridge(bb);
> +			dist_b++;
> +		}
> +
> +		a = pci_upstream_bridge(a);
> +		dist_a++;
> +	}
> +
> +	return -1;
> +
> +check_b_path_acs:
> +	bb = b;
> +
> +	while (bb) {
> +		if (a == bb)
> +			break;
> +
> +		if (pci_bridge_has_acs_redir(bb)) {
> +			seq_buf_print_bus_devfn(acs_list, bb);
> +			acs_cnt++;
> +		}
> +
> +		bb = pci_upstream_bridge(bb);
> +	}
> +
> +	if (acs_cnt)
> +		return -2;
> +
> +	return dist_a + dist_b;
> +}
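As a sanity check of the walk above, the same two-pointer ancestor search can be modeled in a few lines of standalone C. The ACS checks are omitted and struct dev is an illustrative stand-in for struct pci_dev:

```c
#include <assert.h>
#include <stddef.h>

/* Each device records only its upstream bridge, so walk A's ancestor
 * chain and, for each ancestor, rescan B's chain for a match. */
struct dev {
	struct dev *up;		/* upstream bridge, NULL at a root */
};

static int bridge_distance(struct dev *a, struct dev *b)
{
	int dist_a = 0;

	for (; a; a = a->up, dist_a++) {
		struct dev *bb;
		int dist_b = 0;

		for (bb = b; bb; bb = bb->up, dist_b++)
			if (bb == a)	/* common upstream bridge */
				return dist_a + dist_b;
	}

	return -1;		/* no common upstream bridge */
}
```

Building the example tree from the comment (root port, switch upstream port, two downstream ports, devices A and B) reproduces the distances described there: 0 for the same device, 2 for two functions behind the same downstream port, 4 across the switch, and -1 for devices in separate hierarchies.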
> +
> +static int upstream_bridge_distance_warn(struct pci_dev *provider,
> +					 struct pci_dev *client)
> +{
> +	struct seq_buf acs_list;
> +	int ret;
> +
> +	seq_buf_init(&acs_list, kmalloc(PAGE_SIZE, GFP_KERNEL), PAGE_SIZE);
> +
> +	ret = upstream_bridge_distance(provider, client, &acs_list);
> +	if (ret == -2) {
> +		pci_warn(client, "cannot be used for peer-to-peer DMA as ACS redirect is set between the client and provider\n");
> +		/* Drop final semicolon */
> +		acs_list.buffer[acs_list.len-1] = 0;
> +		pci_warn(client, "to disable ACS redirect for this path, add the kernel parameter: pci=disable_acs_redir=%s\n",
> +			 acs_list.buffer);
> +
> +	} else if (ret < 0) {
> +		pci_warn(client, "cannot be used for peer-to-peer DMA as the client and provider do not share an upstream bridge\n");
> +	}
> +
> +	kfree(acs_list.buffer);
> +
> +	return ret;
> +}
> +
> +struct pci_p2pdma_client {
> +	struct list_head list;
> +	struct pci_dev *client;
> +	struct pci_dev *provider;
> +};
> +
> +/**
> + * pci_p2pdma_add_client - allocate a new element in a client device list
> + * @head: list head of p2pdma clients
> + * @dev: device to add to the list
> + *
> + * This adds @dev to a list of clients used by a p2pdma device.
> + * This list should be passed to pci_p2pmem_find(). Once pci_p2pmem_find() has
> + * been called successfully, the list will be bound to a specific p2pdma
> + * device and new clients can only be added to the list if they are
> + * supported by that p2pdma device.
> + *
> + * The caller is expected to have a lock which protects @head as necessary
> + * so that none of the pci_p2p functions can be called concurrently
> + * on that list.
> + *
> + * Returns 0 if the client was successfully added.
> + */
> +int pci_p2pdma_add_client(struct list_head *head, struct device *dev)
> +{
> +	struct pci_p2pdma_client *item, *new_item;
> +	struct pci_dev *provider = NULL;
> +	struct pci_dev *client;
> +	int ret;
> +
> +	if (IS_ENABLED(CONFIG_DMA_VIRT_OPS) && dev->dma_ops == &dma_virt_ops) {
> +		dev_warn(dev, "cannot be used for peer-to-peer DMA because the driver makes use of dma_virt_ops\n");
> +		return -ENODEV;
> +	}
> +
> +	client = find_parent_pci_dev(dev);
> +	if (!client) {
> +		dev_warn(dev, "cannot be used for peer-to-peer DMA as it is not a PCI device\n");
> +		return -ENODEV;
> +	}
> +
> +	item = list_first_entry_or_null(head, struct pci_p2pdma_client, list);
> +	if (item && item->provider) {
> +		provider = item->provider;
> +
> +		ret = upstream_bridge_distance_warn(provider, client);
> +		if (ret < 0) {
> +			ret = -EXDEV;
> +			goto put_client;
> +		}
> +	}
> +
> +	new_item = kzalloc(sizeof(*new_item), GFP_KERNEL);
> +	if (!new_item) {
> +		ret = -ENOMEM;
> +		goto put_client;
> +	}
> +
> +	new_item->client = client;
> +	new_item->provider = pci_dev_get(provider);
> +
> +	list_add_tail(&new_item->list, head);
> +
> +	return 0;
> +
> +put_client:
> +	pci_dev_put(client);
> +	return ret;
> +}
> +EXPORT_SYMBOL_GPL(pci_p2pdma_add_client);
> +
> +static void pci_p2pdma_client_free(struct pci_p2pdma_client *item)
> +{
> +	list_del(&item->list);
> +	pci_dev_put(item->client);
> +	pci_dev_put(item->provider);
> +	kfree(item);
> +}
> +
> +/**
> + * pci_p2pdma_remove_client - remove and free a p2pdma client
> + * @head: list head of p2pdma clients
> + * @dev: device to remove from the list
> + *
> + * This removes @dev from a list of clients used by a p2pdma device.
> + * The caller is expected to have a lock which protects @head as necessary
> + * so that none of the pci_p2p functions can be called concurrently
> + * on that list.
> + */
> +void pci_p2pdma_remove_client(struct list_head *head, struct device *dev)
> +{
> +	struct pci_p2pdma_client *pos, *tmp;
> +	struct pci_dev *pdev;
> +
> +	pdev = find_parent_pci_dev(dev);
> +	if (!pdev)
> +		return;
> +
> +	list_for_each_entry_safe(pos, tmp, head, list) {
> +		if (pos->client != pdev)
> +			continue;
> +
> +		pci_p2pdma_client_free(pos);
> +	}
> +
> +	pci_dev_put(pdev);
> +}
> +EXPORT_SYMBOL_GPL(pci_p2pdma_remove_client);
> +
> +/**
> + * pci_p2pdma_client_list_free - free an entire list of p2pdma clients
> + * @head: list head of p2pdma clients
> + *
> + * This removes all devices in a list of clients used by a p2pdma device.
> + * The caller is expected to have a lock which protects @head as necessary
> + * so that none of the pci_p2pdma functions can be called concurrently
> + * on that list.
> + */
> +void pci_p2pdma_client_list_free(struct list_head *head)
> +{
> +	struct pci_p2pdma_client *pos, *tmp;
> +
> +	list_for_each_entry_safe(pos, tmp, head, list)
> +		pci_p2pdma_client_free(pos);
> +}
> +EXPORT_SYMBOL_GPL(pci_p2pdma_client_list_free);
> +
> +/**
> + * pci_p2pdma_distance - Determine the cumulative distance between
> + *	a p2pdma provider and the clients in use.
> + * @provider: p2pdma provider to check against the client list
> + * @clients: list of devices to check (NULL-terminated)
> + * @verbose: if true, print warnings for devices when we return -1
> + *
> + * Returns -1 if any of the clients are not compatible (i.e. not behind
> + * the same root port as the provider), otherwise returns a non-negative
> + * number where the lower number is the preferable choice. (If one of the
> + * clients is the same device as the provider it will return 0, which is
> + * the best choice.)
> + *
> + * For now, "compatible" means the provider and the clients are all behind
> + * the same PCI root port. This rules out configurations that might work but
> + * is the safest choice for the user. Future work can expand this to
> + * whitelist root complexes that can safely forward between their ports.
> + */
> +int pci_p2pdma_distance(struct pci_dev *provider, struct list_head *clients,
> +			bool verbose)
> +{
> +	struct pci_p2pdma_client *pos;
> +	int ret;
> +	int distance = 0;
> +	bool not_supported = false;
> +
> +	if (list_empty(clients))
> +		return -1;
> +
> +	list_for_each_entry(pos, clients, list) {
> +		if (verbose)
> +			ret = upstream_bridge_distance_warn(provider,
> +							    pos->client);
> +		else
> +			ret = upstream_bridge_distance(provider, pos->client,
> +						       NULL);
> +
> +		if (ret < 0)
> +			not_supported = true;
> +
> +		if (not_supported && !verbose)
> +			break;
> +
> +		distance += ret;
> +	}
> +
> +	if (not_supported)
> +		return -1;
> +
> +	return distance;
> +}
> +EXPORT_SYMBOL_GPL(pci_p2pdma_distance);
> +
> +/**
> + * pci_p2pdma_assign_provider - Check compatibility (as per pci_p2pdma_distance)
> + *	and assign a provider to a list of clients
> + * @provider: p2pdma provider to assign to the client list
> + * @clients: list of devices to check (NULL-terminated)
> + *
> + * Returns false if any of the clients are not compatible, true if the
> + * provider was successfully assigned to the clients.
> + */
> +bool pci_p2pdma_assign_provider(struct pci_dev *provider,
> +				struct list_head *clients)
> +{
> +	struct pci_p2pdma_client *pos;
> +
> +	if (pci_p2pdma_distance(provider, clients, true) < 0)
> +		return false;
> +
> +	list_for_each_entry(pos, clients, list)
> +		pos->provider = provider;
> +
> +	return true;
> +}
> +EXPORT_SYMBOL_GPL(pci_p2pdma_assign_provider);
> +
> +/**
> + * pci_has_p2pmem - check if a given PCI device has published any p2pmem
> + * @pdev: PCI device to check
> + */
> +bool pci_has_p2pmem(struct pci_dev *pdev)
> +{
> +	return pdev->p2pdma && pdev->p2pdma->p2pmem_published;
> +}
> +EXPORT_SYMBOL_GPL(pci_has_p2pmem);
> +
> +/**
> + * pci_p2pmem_find - find a peer-to-peer DMA memory device compatible with
> + *	the specified list of clients and shortest distance (as determined
> + *	by pci_p2pdma_distance())
> + * @clients: list of devices to check (NULL-terminated)
> + *
> + * If multiple devices are behind the same switch, the one "closest" to the
> + * client devices in use will be chosen first. (So if one of the providers is
> + * the same as one of the clients, that provider will be used ahead of any
> + * other providers that are unrelated). If multiple providers are an equal
> + * distance away, one will be chosen at random.
> + *
> + * Returns a pointer to the PCI device with a reference taken (use pci_dev_put
> + * to return the reference) or NULL if no compatible device is found. The
> + * found provider will also be assigned to the client list.
> + */
> +struct pci_dev *pci_p2pmem_find(struct list_head *clients)
> +{
> +	struct pci_dev *pdev = NULL;
> +	struct pci_p2pdma_client *pos;
> +	int distance;
> +	int closest_distance = INT_MAX;
> +	struct pci_dev **closest_pdevs;
> +	int dev_cnt = 0;
> +	const int max_devs = PAGE_SIZE / sizeof(*closest_pdevs);
> +	int i;
> +
> +	closest_pdevs = kmalloc(PAGE_SIZE, GFP_KERNEL);
> +	if (!closest_pdevs)
> +		return NULL;
> +
> +	while ((pdev = pci_get_device(PCI_ANY_ID, PCI_ANY_ID, pdev))) {
> +		if (!pci_has_p2pmem(pdev))
> +			continue;
> +
> +		distance = pci_p2pdma_distance(pdev, clients, false);
> +		if (distance < 0 || distance > closest_distance)
> +			continue;
> +
> +		if (distance == closest_distance && dev_cnt >= max_devs)
> +			continue;
> +
> +		if (distance < closest_distance) {
> +			for (i = 0; i < dev_cnt; i++)
> +				pci_dev_put(closest_pdevs[i]);
> +
> +			dev_cnt = 0;
> +			closest_distance = distance;
> +		}
> +
> +		closest_pdevs[dev_cnt++] = pci_dev_get(pdev);
> +	}
> +
> +	if (dev_cnt)
> +		pdev = pci_dev_get(closest_pdevs[prandom_u32_max(dev_cnt)]);
> +
> +	for (i = 0; i < dev_cnt; i++)
> +		pci_dev_put(closest_pdevs[i]);
> +
> +	if (pdev)
> +		list_for_each_entry(pos, clients, list)
> +			pos->provider = pdev;
> +
> +	kfree(closest_pdevs);
> +	return pdev;
> +}
> +EXPORT_SYMBOL_GPL(pci_p2pmem_find);
> +
> +/**
> + * pci_alloc_p2pmem - allocate peer-to-peer DMA memory
> + * @pdev: the device to allocate memory from
> + * @size: number of bytes to allocate
> + *
> + * Returns the allocated memory or NULL on error.
> + */
> +void *pci_alloc_p2pmem(struct pci_dev *pdev, size_t size)
> +{
> +	void *ret;
> +
> +	if (unlikely(!pdev->p2pdma))
> +		return NULL;
> +
> +	if (unlikely(!percpu_ref_tryget_live(&pdev->p2pdma->devmap_ref)))
> +		return NULL;
> +
> +	ret = (void *)gen_pool_alloc(pdev->p2pdma->pool, size);
> +
> +	if (unlikely(!ret))
> +		percpu_ref_put(&pdev->p2pdma->devmap_ref);
> +
> +	return ret;
> +}
> +EXPORT_SYMBOL_GPL(pci_alloc_p2pmem);
> +
> +/**
> + * pci_free_p2pmem - free peer-to-peer DMA memory
> + * @pdev: the device the memory was allocated from
> + * @addr: address of the memory that was allocated
> + * @size: number of bytes that was allocated
> + */
> +void pci_free_p2pmem(struct pci_dev *pdev, void *addr, size_t size)
> +{
> +	gen_pool_free(pdev->p2pdma->pool, (uintptr_t)addr, size);
> +	percpu_ref_put(&pdev->p2pdma->devmap_ref);
> +}
> +EXPORT_SYMBOL_GPL(pci_free_p2pmem);
> +
> +/**
> + * pci_p2pmem_virt_to_bus - return the PCI bus address for a given virtual
> + *	address obtained with pci_alloc_p2pmem()
> + * @pdev: the device the memory was allocated from
> + * @addr: address of the memory that was allocated
> + */
> +pci_bus_addr_t pci_p2pmem_virt_to_bus(struct pci_dev *pdev, void *addr)
> +{
> +	if (!addr)
> +		return 0;
> +	if (!pdev->p2pdma)
> +		return 0;
> +
> +	/*
> +	 * Note: when we added the memory to the pool we used the PCI
> +	 * bus address as the physical address. So gen_pool_virt_to_phys()
> +	 * actually returns the bus address despite the misleading name.
> +	 */
> +	return gen_pool_virt_to_phys(pdev->p2pdma->pool, (unsigned long)addr);
> +}
> +EXPORT_SYMBOL_GPL(pci_p2pmem_virt_to_bus);
> +
> +/**
> + * pci_p2pmem_alloc_sgl - allocate peer-to-peer DMA memory in a scatterlist
> + * @pdev: the device to allocate memory from
> + * @nents: the number of SG entries in the list
> + * @length: number of bytes to allocate
> + *
> + * Returns the allocated scatterlist or NULL on error
> + */
> +struct scatterlist *pci_p2pmem_alloc_sgl(struct pci_dev *pdev,
> +					 unsigned int *nents, u32 length)
> +{
> +	struct scatterlist *sg;
> +	void *addr;
> +
> +	sg = kzalloc(sizeof(*sg), GFP_KERNEL);
> +	if (!sg)
> +		return NULL;
> +
> +	sg_init_table(sg, 1);
> +
> +	addr = pci_alloc_p2pmem(pdev, length);
> +	if (!addr)
> +		goto out_free_sg;
> +
> +	sg_set_buf(sg, addr, length);
> +	*nents = 1;
> +	return sg;
> +
> +out_free_sg:
> +	kfree(sg);
> +	return NULL;
> +}
> +EXPORT_SYMBOL_GPL(pci_p2pmem_alloc_sgl);
> +
> +/**
> + * pci_p2pmem_free_sgl - free a scatterlist allocated by pci_p2pmem_alloc_sgl()
> + * @pdev: the device the memory was allocated from
> + * @sgl: the allocated scatterlist
> + */
> +void pci_p2pmem_free_sgl(struct pci_dev *pdev, struct scatterlist *sgl)
> +{
> +	struct scatterlist *sg;
> +	int count;
> +
> +	for_each_sg(sgl, sg, INT_MAX, count) {
> +		if (!sg)
> +			break;
> +
> +		pci_free_p2pmem(pdev, sg_virt(sg), sg->length);
> +	}
> +	kfree(sgl);
> +}
> +EXPORT_SYMBOL_GPL(pci_p2pmem_free_sgl);
> +
> +/**
> + * pci_p2pmem_publish - publish the peer-to-peer DMA memory for use by
> + *	other devices with pci_p2pmem_find()
> + * @pdev: the device with peer-to-peer DMA memory to publish
> + * @publish: set to true to publish the memory, false to unpublish it
> + *
> + * Published memory can be used by other PCI device drivers for
> + * peer-to-peer DMA operations. Non-published memory is reserved for
> + * exclusive use of the device driver that registers the peer-to-peer
> + * memory.
> + */
> +void pci_p2pmem_publish(struct pci_dev *pdev, bool publish)
> +{
> +	if (!pdev->p2pdma)
> +		return;
> +
> +	pdev->p2pdma->p2pmem_published = publish;
> +}
> +EXPORT_SYMBOL_GPL(pci_p2pmem_publish);
> diff --git a/include/linux/memremap.h b/include/linux/memremap.h
> index f91f9e763557..9553370ebdad 100644
> --- a/include/linux/memremap.h
> +++ b/include/linux/memremap.h
> @@ -53,11 +53,16 @@ struct vmem_altmap {
>    * wakeup event whenever a page is unpinned and becomes idle. This
>    * wakeup is used to coordinate physical address space management (ex:
>    * fs truncate/hole punch) vs pinned pages (ex: device dma).
> + *
> + * MEMORY_DEVICE_PCI_P2PDMA:
> + * Device memory residing in a PCI BAR intended for use with Peer-to-Peer
> + * transactions.
>    */
>   enum memory_type {
>   	MEMORY_DEVICE_PRIVATE = 1,
>   	MEMORY_DEVICE_PUBLIC,
>   	MEMORY_DEVICE_FS_DAX,
> +	MEMORY_DEVICE_PCI_P2PDMA,
>   };
>   
>   /*
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index a61ebe8ad4ca..2055df412a77 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -890,6 +890,19 @@ static inline bool is_device_public_page(const struct page *page)
>   		page->pgmap->type == MEMORY_DEVICE_PUBLIC;
>   }
>   
> +#ifdef CONFIG_PCI_P2PDMA
> +static inline bool is_pci_p2pdma_page(const struct page *page)
> +{
> +	return is_zone_device_page(page) &&
> +		page->pgmap->type == MEMORY_DEVICE_PCI_P2PDMA;
> +}
> +#else /* CONFIG_PCI_P2PDMA */
> +static inline bool is_pci_p2pdma_page(const struct page *page)
> +{
> +	return false;
> +}
> +#endif /* CONFIG_PCI_P2PDMA */
> +
>   #else /* CONFIG_DEV_PAGEMAP_OPS */
>   static inline void dev_pagemap_get_ops(void)
>   {
> @@ -913,6 +926,11 @@ static inline bool is_device_public_page(const struct page *page)
>   {
>   	return false;
>   }
> +
> +static inline bool is_pci_p2pdma_page(const struct page *page)
> +{
> +	return false;
> +}
>   #endif /* CONFIG_DEV_PAGEMAP_OPS */
>   
>   static inline void get_page(struct page *page)
> diff --git a/include/linux/pci-p2pdma.h b/include/linux/pci-p2pdma.h
> new file mode 100644
> index 000000000000..7b2b0f547528
> --- /dev/null
> +++ b/include/linux/pci-p2pdma.h
> @@ -0,0 +1,102 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/*
> + * PCI Peer 2 Peer DMA support.
> + *
> + * Copyright (c) 2016-2018, Logan Gunthorpe
> + * Copyright (c) 2016-2017, Microsemi Corporation
> + * Copyright (c) 2017, Christoph Hellwig
> + * Copyright (c) 2018, Eideticom Inc.
> + *
> + */
> +
> +#ifndef _LINUX_PCI_P2PDMA_H
> +#define _LINUX_PCI_P2PDMA_H
> +
> +#include <linux/pci.h>
> +
> +struct block_device;
> +struct scatterlist;
> +
> +#ifdef CONFIG_PCI_P2PDMA
> +int pci_p2pdma_add_resource(struct pci_dev *pdev, int bar, size_t size,
> +		u64 offset);
> +int pci_p2pdma_add_client(struct list_head *head, struct device *dev);
> +void pci_p2pdma_remove_client(struct list_head *head, struct device *dev);
> +void pci_p2pdma_client_list_free(struct list_head *head);
> +int pci_p2pdma_distance(struct pci_dev *provider, struct list_head *clients,
> +			bool verbose);
> +bool pci_p2pdma_assign_provider(struct pci_dev *provider,
> +				struct list_head *clients);
> +bool pci_has_p2pmem(struct pci_dev *pdev);
> +struct pci_dev *pci_p2pmem_find(struct list_head *clients);
> +void *pci_alloc_p2pmem(struct pci_dev *pdev, size_t size);
> +void pci_free_p2pmem(struct pci_dev *pdev, void *addr, size_t size);
> +pci_bus_addr_t pci_p2pmem_virt_to_bus(struct pci_dev *pdev, void *addr);
> +struct scatterlist *pci_p2pmem_alloc_sgl(struct pci_dev *pdev,
> +					 unsigned int *nents, u32 length);
> +void pci_p2pmem_free_sgl(struct pci_dev *pdev, struct scatterlist *sgl);
> +void pci_p2pmem_publish(struct pci_dev *pdev, bool publish);
> +#else /* CONFIG_PCI_P2PDMA */
> +static inline int pci_p2pdma_add_resource(struct pci_dev *pdev, int bar,
> +		size_t size, u64 offset)
> +{
> +	return -EOPNOTSUPP;
> +}
> +static inline int pci_p2pdma_add_client(struct list_head *head,
> +		struct device *dev)
> +{
> +	return 0;
> +}
> +static inline void pci_p2pdma_remove_client(struct list_head *head,
> +		struct device *dev)
> +{
> +}
> +static inline void pci_p2pdma_client_list_free(struct list_head *head)
> +{
> +}
> +static inline int pci_p2pdma_distance(struct pci_dev *provider,
> +				      struct list_head *clients,
> +				      bool verbose)
> +{
> +	return -1;
> +}
> +static inline bool pci_p2pdma_assign_provider(struct pci_dev *provider,
> +					      struct list_head *clients)
> +{
> +	return false;
> +}
> +static inline bool pci_has_p2pmem(struct pci_dev *pdev)
> +{
> +	return false;
> +}
> +static inline struct pci_dev *pci_p2pmem_find(struct list_head *clients)
> +{
> +	return NULL;
> +}
> +static inline void *pci_alloc_p2pmem(struct pci_dev *pdev, size_t size)
> +{
> +	return NULL;
> +}
> +static inline void pci_free_p2pmem(struct pci_dev *pdev, void *addr,
> +		size_t size)
> +{
> +}
> +static inline pci_bus_addr_t pci_p2pmem_virt_to_bus(struct pci_dev *pdev,
> +						    void *addr)
> +{
> +	return 0;
> +}
> +static inline struct scatterlist *pci_p2pmem_alloc_sgl(struct pci_dev *pdev,
> +		unsigned int *nents, u32 length)
> +{
> +	return NULL;
> +}
> +static inline void pci_p2pmem_free_sgl(struct pci_dev *pdev,
> +		struct scatterlist *sgl)
> +{
> +}
> +static inline void pci_p2pmem_publish(struct pci_dev *pdev, bool publish)
> +{
> +}
> +#endif /* CONFIG_PCI_P2PDMA */
> +#endif /* _LINUX_PCI_P2PDMA_H */
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index e72ca8dd6241..5d95dbf21f4a 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -281,6 +281,7 @@ struct pcie_link_state;
>   struct pci_vpd;
>   struct pci_sriov;
>   struct pci_ats;
> +struct pci_p2pdma;
>   
>   /* The pci_dev structure describes PCI devices */
>   struct pci_dev {
> @@ -439,6 +440,9 @@ struct pci_dev {
>   #ifdef CONFIG_PCI_PASID
>   	u16		pasid_features;
>   #endif
> +#ifdef CONFIG_PCI_P2PDMA
> +	struct pci_p2pdma *p2pdma;
> +#endif
>   	phys_addr_t	rom;		/* Physical address if not from BAR */
>   	size_t		romlen;		/* Length if not from BAR */
>   	char		*driver_override; /* Driver name to force a match */


* Re: [PATCH v5 01/13] PCI/P2PDMA: Support peer-to-peer memory
@ 2018-08-31  8:04     ` Christian König
  0 siblings, 0 replies; 265+ messages in thread
From: Christian König @ 2018-08-31  8:04 UTC (permalink / raw)
  To: Logan Gunthorpe, linux-kernel, linux-pci, linux-nvme, linux-rdma,
	linux-nvdimm, linux-block
  Cc: Benjamin Herrenschmidt, Alex Williamson, Jérôme Glisse,
	Jason Gunthorpe, Bjorn Helgaas, Max Gurtovoy, Christoph Hellwig

On 30.08.2018 at 20:53, Logan Gunthorpe wrote:
> Some PCI devices may have memory mapped in a BAR space that's
> intended for use in peer-to-peer transactions. In order to enable
> such transactions the memory must be registered with ZONE_DEVICE pages
> so it can be used by DMA interfaces in existing drivers.

We want to use that feature without ZONE_DEVICE pages for DMA-buf as well.

How hard would it be to separate enabling P2P detection (e.g. distance 
between two devices) from this?

Regards,
Christian.

>
> Add an interface for other subsystems to find and allocate chunks of P2P
> memory as necessary to facilitate transfers between two PCI peers:
>
> int pci_p2pdma_add_client();
> struct pci_dev *pci_p2pmem_find();
> void *pci_alloc_p2pmem();
>
> The new interface requires a driver to collect a list of client devices
> involved in the transaction with the pci_p2pmem_add_client*() functions
> then call pci_p2pmem_find() to obtain any suitable P2P memory. Once
> this is done the list is bound to the memory and the calling driver is
> free to add and remove clients as necessary (adding incompatible clients
> will fail). With a suitable p2pmem device, memory can then be
> allocated with pci_alloc_p2pmem() for use in DMA transactions.
>
> Depending on hardware, using peer-to-peer memory may reduce the bandwidth
> of the transfer but can significantly reduce pressure on system memory.
> This may be desirable in many cases: for example a system could be designed
> with a small CPU connected to a PCIe switch by a small number of lanes
> which would maximize the number of lanes available to connect to NVMe
> devices.
>
> The code is designed to only utilize the p2pmem device if all the devices
> involved in a transfer are behind the same PCI bridge. This is because we
> have no way of knowing whether peer-to-peer routing between PCIe Root Ports
> is supported (PCIe r4.0, sec 1.3.1). Additionally, the benefits of P2P
> transfers that go through the RC is limited to only reducing DRAM usage
> and, in some cases, coding convenience. The PCI-SIG may be exploring
> adding a new capability bit to advertise whether this is possible for
> future hardware.
>
> This commit includes significant rework and feedback from Christoph
> Hellwig.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
> ---
>   drivers/pci/Kconfig        |  17 +
>   drivers/pci/Makefile       |   1 +
>   drivers/pci/p2pdma.c       | 761 +++++++++++++++++++++++++++++++++++++++++++++
>   include/linux/memremap.h   |   5 +
>   include/linux/mm.h         |  18 ++
>   include/linux/pci-p2pdma.h | 102 ++++++
>   include/linux/pci.h        |   4 +
>   7 files changed, 908 insertions(+)
>   create mode 100644 drivers/pci/p2pdma.c
>   create mode 100644 include/linux/pci-p2pdma.h
>
> diff --git a/drivers/pci/Kconfig b/drivers/pci/Kconfig
> index 56ff8f6d31fc..deb68be4fdac 100644
> --- a/drivers/pci/Kconfig
> +++ b/drivers/pci/Kconfig
> @@ -132,6 +132,23 @@ config PCI_PASID
>   
>   	  If unsure, say N.
>   
> +config PCI_P2PDMA
> +	bool "PCI peer-to-peer transfer support"
> +	depends on PCI && ZONE_DEVICE
> +	select GENERIC_ALLOCATOR
> +	help
> +	  Enables drivers to do PCI peer-to-peer transactions to and from
> +	  BARs that are exposed in other devices that are part of
> +	  the hierarchy where peer-to-peer DMA is guaranteed by the PCI
> +	  specification to work (i.e. anything below a single PCI bridge).
> +
> +	  Many PCIe root complexes do not support P2P transactions and
> +	  it's hard to tell which support it at all, so at this time,
> +	  P2P DMA transactions must be between devices behind the same root
> +	  port.
> +
> +	  If unsure, say N.
> +
>   config PCI_LABEL
>   	def_bool y if (DMI || ACPI)
>   	depends on PCI
> diff --git a/drivers/pci/Makefile b/drivers/pci/Makefile
> index 1b2cfe51e8d7..85f4a703b2be 100644
> --- a/drivers/pci/Makefile
> +++ b/drivers/pci/Makefile
> @@ -26,6 +26,7 @@ obj-$(CONFIG_PCI_SYSCALL)	+= syscall.o
>   obj-$(CONFIG_PCI_STUB)		+= pci-stub.o
>   obj-$(CONFIG_PCI_PF_STUB)	+= pci-pf-stub.o
>   obj-$(CONFIG_PCI_ECAM)		+= ecam.o
> +obj-$(CONFIG_PCI_P2PDMA)	+= p2pdma.o
>   obj-$(CONFIG_XEN_PCIDEV_FRONTEND) += xen-pcifront.o
>   
>   # Endpoint library must be initialized before its users
> diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c
> new file mode 100644
> index 000000000000..88aaec5351cd
> --- /dev/null
> +++ b/drivers/pci/p2pdma.c
> @@ -0,0 +1,761 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * PCI Peer 2 Peer DMA support.
> + *
> + * Copyright (c) 2016-2018, Logan Gunthorpe
> + * Copyright (c) 2016-2017, Microsemi Corporation
> + * Copyright (c) 2017, Christoph Hellwig
> + * Copyright (c) 2018, Eideticom Inc.
> + */
> +
> +#define pr_fmt(fmt) "pci-p2pdma: " fmt
> +#include <linux/pci-p2pdma.h>
> +#include <linux/module.h>
> +#include <linux/slab.h>
> +#include <linux/genalloc.h>
> +#include <linux/memremap.h>
> +#include <linux/percpu-refcount.h>
> +#include <linux/random.h>
> +#include <linux/seq_buf.h>
> +
> +struct pci_p2pdma {
> +	struct percpu_ref devmap_ref;
> +	struct completion devmap_ref_done;
> +	struct gen_pool *pool;
> +	bool p2pmem_published;
> +};
> +
> +static void pci_p2pdma_percpu_release(struct percpu_ref *ref)
> +{
> +	struct pci_p2pdma *p2p =
> +		container_of(ref, struct pci_p2pdma, devmap_ref);
> +
> +	complete_all(&p2p->devmap_ref_done);
> +}
> +
> +static void pci_p2pdma_percpu_kill(void *data)
> +{
> +	struct percpu_ref *ref = data;
> +
> +	if (percpu_ref_is_dying(ref))
> +		return;
> +
> +	percpu_ref_kill(ref);
> +}
> +
> +static void pci_p2pdma_release(void *data)
> +{
> +	struct pci_dev *pdev = data;
> +
> +	if (!pdev->p2pdma)
> +		return;
> +
> +	wait_for_completion(&pdev->p2pdma->devmap_ref_done);
> +	percpu_ref_exit(&pdev->p2pdma->devmap_ref);
> +
> +	gen_pool_destroy(pdev->p2pdma->pool);
> +	pdev->p2pdma = NULL;
> +}
> +
> +static int pci_p2pdma_setup(struct pci_dev *pdev)
> +{
> +	int error = -ENOMEM;
> +	struct pci_p2pdma *p2p;
> +
> +	p2p = devm_kzalloc(&pdev->dev, sizeof(*p2p), GFP_KERNEL);
> +	if (!p2p)
> +		return -ENOMEM;
> +
> +	p2p->pool = gen_pool_create(PAGE_SHIFT, dev_to_node(&pdev->dev));
> +	if (!p2p->pool)
> +		goto out;
> +
> +	init_completion(&p2p->devmap_ref_done);
> +	error = percpu_ref_init(&p2p->devmap_ref,
> +			pci_p2pdma_percpu_release, 0, GFP_KERNEL);
> +	if (error)
> +		goto out_pool_destroy;
> +
> +	percpu_ref_switch_to_atomic_sync(&p2p->devmap_ref);
> +
> +	error = devm_add_action_or_reset(&pdev->dev, pci_p2pdma_release, pdev);
> +	if (error)
> +		goto out_pool_destroy;
> +
> +	pdev->p2pdma = p2p;
> +
> +	return 0;
> +
> +out_pool_destroy:
> +	gen_pool_destroy(p2p->pool);
> +out:
> +	devm_kfree(&pdev->dev, p2p);
> +	return error;
> +}
> +
> +/**
> + * pci_p2pdma_add_resource - add memory for use as p2p memory
> + * @pdev: the device to add the memory to
> + * @bar: PCI BAR to add
> + * @size: size of the memory to add, may be zero to use the whole BAR
> + * @offset: offset into the PCI BAR
> + *
> + * The memory will be given ZONE_DEVICE struct pages so that it may
> + * be used with any DMA request.
> + */
> +int pci_p2pdma_add_resource(struct pci_dev *pdev, int bar, size_t size,
> +			    u64 offset)
> +{
> +	struct dev_pagemap *pgmap;
> +	void *addr;
> +	int error;
> +
> +	if (!(pci_resource_flags(pdev, bar) & IORESOURCE_MEM))
> +		return -EINVAL;
> +
> +	if (offset >= pci_resource_len(pdev, bar))
> +		return -EINVAL;
> +
> +	if (!size)
> +		size = pci_resource_len(pdev, bar) - offset;
> +
> +	if (size + offset > pci_resource_len(pdev, bar))
> +		return -EINVAL;
> +
> +	if (!pdev->p2pdma) {
> +		error = pci_p2pdma_setup(pdev);
> +		if (error)
> +			return error;
> +	}
> +
> +	pgmap = devm_kzalloc(&pdev->dev, sizeof(*pgmap), GFP_KERNEL);
> +	if (!pgmap)
> +		return -ENOMEM;
> +
> +	pgmap->res.start = pci_resource_start(pdev, bar) + offset;
> +	pgmap->res.end = pgmap->res.start + size - 1;
> +	pgmap->res.flags = pci_resource_flags(pdev, bar);
> +	pgmap->ref = &pdev->p2pdma->devmap_ref;
> +	pgmap->type = MEMORY_DEVICE_PCI_P2PDMA;
> +
> +	addr = devm_memremap_pages(&pdev->dev, pgmap);
> +	if (IS_ERR(addr)) {
> +		error = PTR_ERR(addr);
> +		goto pgmap_free;
> +	}
> +
> +	error = gen_pool_add_virt(pdev->p2pdma->pool, (unsigned long)addr,
> +			pci_bus_address(pdev, bar) + offset,
> +			resource_size(&pgmap->res), dev_to_node(&pdev->dev));
> +	if (error)
> +		goto pgmap_free;
> +
> +	error = devm_add_action_or_reset(&pdev->dev, pci_p2pdma_percpu_kill,
> +					  &pdev->p2pdma->devmap_ref);
> +	if (error)
> +		goto pgmap_free;
> +
> +	pci_info(pdev, "added peer-to-peer DMA memory %pR\n",
> +		 &pgmap->res);
> +
> +	return 0;
> +
> +pgmap_free:
> +	devres_free(pgmap);
> +	return error;
> +}
> +EXPORT_SYMBOL_GPL(pci_p2pdma_add_resource);
> +
> +static struct pci_dev *find_parent_pci_dev(struct device *dev)
> +{
> +	struct device *parent;
> +
> +	dev = get_device(dev);
> +
> +	while (dev) {
> +		if (dev_is_pci(dev))
> +			return to_pci_dev(dev);
> +
> +		parent = get_device(dev->parent);
> +		put_device(dev);
> +		dev = parent;
> +	}
> +
> +	return NULL;
> +}
> +
> +/*
> + * Check if a PCI bridge has its ACS redirection bits set to redirect P2P
> + * TLPs upstream via ACS. Returns 1 if the packets will be redirected
> + * upstream, 0 otherwise.
> + */
> +static int pci_bridge_has_acs_redir(struct pci_dev *dev)
> +{
> +	int pos;
> +	u16 ctrl;
> +
> +	pos = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_ACS);
> +	if (!pos)
> +		return 0;
> +
> +	pci_read_config_word(dev, pos + PCI_ACS_CTRL, &ctrl);
> +
> +	if (ctrl & (PCI_ACS_RR | PCI_ACS_CR | PCI_ACS_EC))
> +		return 1;
> +
> +	return 0;
> +}
> +
> +static void seq_buf_print_bus_devfn(struct seq_buf *buf, struct pci_dev *dev)
> +{
> +	if (!buf)
> +		return;
> +
> +	seq_buf_printf(buf, "%04x:%02x:%02x.%x;", pci_domain_nr(dev->bus),
> +		       dev->bus->number, PCI_SLOT(dev->devfn),
> +		       PCI_FUNC(dev->devfn));
> +}
> +
> +/*
> + * Find the distance through the nearest common upstream bridge between
> + * two PCI devices.
> + *
> + * If the two devices are the same device then 0 will be returned.
> + *
> + * If there are two virtual functions of the same device behind the same
> + * bridge port then 2 will be returned (one step down to the PCIe switch,
> + * then one step back to the same device).
> + *
> + * In the case where two devices are connected to the same PCIe switch, the
> + * value 4 will be returned. This corresponds to the following PCI tree:
> + *
> + *     -+  Root Port
> + *      \+ Switch Upstream Port
> + *       +-+ Switch Downstream Port
> + *       + \- Device A
> + *       \-+ Switch Downstream Port
> + *         \- Device B
> + *
> + * The distance is 4 because we traverse from Device A through the downstream
> + * port of the switch, to the common upstream port, back up to the second
> + * downstream port and then to Device B.
> + *
> + * Any two devices that don't have a common upstream bridge will return -1.
> + * In this way devices on separate PCIe root ports will be rejected, which
> + * is what we want for peer-to-peer since each PCIe root port defines a
> + * separate hierarchy domain and there's no way to determine whether the root
> + * complex supports forwarding between them.
> + *
> + * In the case where two devices are connected to different PCIe switches,
> + * this function will still return a positive distance as long as both
> + * switches eventually have a common upstream bridge. Note this covers
> + * the case of using multiple PCIe switches to achieve a desired level of
> + * fan-out from a root port. The exact distance will be a function of the
> + * number of switches between Device A and Device B.
> + *
> + * If a bridge which has any ACS redirection bits set is in the path
> + * then this function will return -2. This is so we reject any
> + * cases where the TLPs are forwarded up into the root complex.
> + * In this case, a list of all infringing bridge addresses will be
> + * populated in acs_list (assuming it's non-null) for printk purposes.
> + */
> +static int upstream_bridge_distance(struct pci_dev *a,
> +				    struct pci_dev *b,
> +				    struct seq_buf *acs_list)
> +{
> +	int dist_a = 0;
> +	int dist_b = 0;
> +	struct pci_dev *bb = NULL;
> +	int acs_cnt = 0;
> +
> +	/*
> +	 * Note, we don't need to take references to devices returned by
> +	 * pci_upstream_bridge() seeing we hold a reference to a child
> +	 * device which will already hold a reference to the upstream bridge.
> +	 */
> +
> +	while (a) {
> +		dist_b = 0;
> +
> +		if (pci_bridge_has_acs_redir(a)) {
> +			seq_buf_print_bus_devfn(acs_list, a);
> +			acs_cnt++;
> +		}
> +
> +		bb = b;
> +
> +		while (bb) {
> +			if (a == bb)
> +				goto check_b_path_acs;
> +
> +			bb = pci_upstream_bridge(bb);
> +			dist_b++;
> +		}
> +
> +		a = pci_upstream_bridge(a);
> +		dist_a++;
> +	}
> +
> +	return -1;
> +
> +check_b_path_acs:
> +	bb = b;
> +
> +	while (bb) {
> +		if (a == bb)
> +			break;
> +
> +		if (pci_bridge_has_acs_redir(bb)) {
> +			seq_buf_print_bus_devfn(acs_list, bb);
> +			acs_cnt++;
> +		}
> +
> +		bb = pci_upstream_bridge(bb);
> +	}
> +
> +	if (acs_cnt)
> +		return -2;
> +
> +	return dist_a + dist_b;
> +}
> +
> +static int upstream_bridge_distance_warn(struct pci_dev *provider,
> +					 struct pci_dev *client)
> +{
> +	struct seq_buf acs_list;
> +	int ret;
> +
> +	seq_buf_init(&acs_list, kmalloc(PAGE_SIZE, GFP_KERNEL), PAGE_SIZE);
> +
> +	ret = upstream_bridge_distance(provider, client, &acs_list);
> +	if (ret == -2) {
> +		pci_warn(client, "cannot be used for peer-to-peer DMA as ACS redirect is set between the client and provider\n");
> +		/* Drop final semicolon */
> +		acs_list.buffer[acs_list.len-1] = 0;
> +		pci_warn(client, "to disable ACS redirect for this path, add the kernel parameter: pci=disable_acs_redir=%s\n",
> +			 acs_list.buffer);
> +
> +	} else if (ret < 0) {
> +		pci_warn(client, "cannot be used for peer-to-peer DMA as the client and provider do not share an upstream bridge\n");
> +	}
> +
> +	kfree(acs_list.buffer);
> +
> +	return ret;
> +}
> +
> +struct pci_p2pdma_client {
> +	struct list_head list;
> +	struct pci_dev *client;
> +	struct pci_dev *provider;
> +};
> +
> +/**
> + * pci_p2pdma_add_client - allocate a new element in a client device list
> + * @head: list head of p2pdma clients
> + * @dev: device to add to the list
> + *
> + * This adds @dev to a list of clients used by a p2pdma device.
> + * This list should be passed to pci_p2pmem_find(). Once pci_p2pmem_find() has
> + * been called successfully, the list will be bound to a specific p2pdma
> + * device and new clients can only be added to the list if they are
> + * supported by that p2pdma device.
> + *
> + * The caller is expected to have a lock which protects @head as necessary
> + * so that none of the pci_p2p functions can be called concurrently
> + * on that list.
> + *
> + * Returns 0 if the client was successfully added.
> + */
> +int pci_p2pdma_add_client(struct list_head *head, struct device *dev)
> +{
> +	struct pci_p2pdma_client *item, *new_item;
> +	struct pci_dev *provider = NULL;
> +	struct pci_dev *client;
> +	int ret;
> +
> +	if (IS_ENABLED(CONFIG_DMA_VIRT_OPS) && dev->dma_ops == &dma_virt_ops) {
> +		dev_warn(dev, "cannot be used for peer-to-peer DMA because the driver makes use of dma_virt_ops\n");
> +		return -ENODEV;
> +	}
> +
> +	client = find_parent_pci_dev(dev);
> +	if (!client) {
> +		dev_warn(dev, "cannot be used for peer-to-peer DMA as it is not a PCI device\n");
> +		return -ENODEV;
> +	}
> +
> +	item = list_first_entry_or_null(head, struct pci_p2pdma_client, list);
> +	if (item && item->provider) {
> +		provider = item->provider;
> +
> +		ret = upstream_bridge_distance_warn(provider, client);
> +		if (ret < 0) {
> +			ret = -EXDEV;
> +			goto put_client;
> +		}
> +	}
> +
> +	new_item = kzalloc(sizeof(*new_item), GFP_KERNEL);
> +	if (!new_item) {
> +		ret = -ENOMEM;
> +		goto put_client;
> +	}
> +
> +	new_item->client = client;
> +	new_item->provider = pci_dev_get(provider);
> +
> +	list_add_tail(&new_item->list, head);
> +
> +	return 0;
> +
> +put_client:
> +	pci_dev_put(client);
> +	return ret;
> +}
> +EXPORT_SYMBOL_GPL(pci_p2pdma_add_client);
> +
> +static void pci_p2pdma_client_free(struct pci_p2pdma_client *item)
> +{
> +	list_del(&item->list);
> +	pci_dev_put(item->client);
> +	pci_dev_put(item->provider);
> +	kfree(item);
> +}
> +
> +/**
> + * pci_p2pdma_remove_client - remove and free a p2pdma client
> + * @head: list head of p2pdma clients
> + * @dev: device to remove from the list
> + *
> + * This removes @dev from a list of clients used by a p2pdma device.
> + * The caller is expected to have a lock which protects @head as necessary
> + * so that none of the pci_p2p functions can be called concurrently
> + * on that list.
> + */
> +void pci_p2pdma_remove_client(struct list_head *head, struct device *dev)
> +{
> +	struct pci_p2pdma_client *pos, *tmp;
> +	struct pci_dev *pdev;
> +
> +	pdev = find_parent_pci_dev(dev);
> +	if (!pdev)
> +		return;
> +
> +	list_for_each_entry_safe(pos, tmp, head, list) {
> +		if (pos->client != pdev)
> +			continue;
> +
> +		pci_p2pdma_client_free(pos);
> +	}
> +
> +	pci_dev_put(pdev);
> +}
> +EXPORT_SYMBOL_GPL(pci_p2pdma_remove_client);
> +
> +/**
> + * pci_p2pdma_client_list_free - free an entire list of p2pdma clients
> + * @head: list head of p2pdma clients
> + *
> + * This removes all devices in a list of clients used by a p2pdma device.
> + * The caller is expected to have a lock which protects @head as necessary
> + * so that none of the pci_p2pdma functions can be called concurrently
> + * on that list.
> + */
> +void pci_p2pdma_client_list_free(struct list_head *head)
> +{
> +	struct pci_p2pdma_client *pos, *tmp;
> +
> +	list_for_each_entry_safe(pos, tmp, head, list)
> +		pci_p2pdma_client_free(pos);
> +}
> +EXPORT_SYMBOL_GPL(pci_p2pdma_client_list_free);
> +
> +/**
> + * pci_p2pdma_distance - Determine the cumulative distance between
> + *	a p2pdma provider and the clients in use.
> + * @provider: p2pdma provider to check against the client list
> + * @clients: list of devices to check (NULL-terminated)
> + * @verbose: if true, print warnings for devices when we return -1
> + *
> + * Returns -1 if any of the clients are not compatible (i.e. not behind
> + * the same root port as the provider), otherwise returns a positive
> + * number where a lower number is the preferable choice. (If one of the
> + * clients is the same device as the provider it will return 0, which is
> + * the best choice.)
> + *
> + * For now, "compatible" means the provider and the clients are all behind
> + * the same PCI root port. This cuts out cases that may work, but it is the
> + * safest choice for the user. Future work can expand this to whitelist
> + * root complexes that can safely forward between their ports.
> + */
> +int pci_p2pdma_distance(struct pci_dev *provider, struct list_head *clients,
> +			bool verbose)
> +{
> +	struct pci_p2pdma_client *pos;
> +	int ret;
> +	int distance = 0;
> +	bool not_supported = false;
> +
> +	if (list_empty(clients))
> +		return -1;
> +
> +	list_for_each_entry(pos, clients, list) {
> +		if (verbose)
> +			ret = upstream_bridge_distance_warn(provider,
> +							    pos->client);
> +		else
> +			ret = upstream_bridge_distance(provider, pos->client,
> +						       NULL);
> +
> +		if (ret < 0)
> +			not_supported = true;
> +
> +		if (not_supported && !verbose)
> +			break;
> +
> +		distance += ret;
> +	}
> +
> +	if (not_supported)
> +		return -1;
> +
> +	return distance;
> +}
> +EXPORT_SYMBOL_GPL(pci_p2pdma_distance);
> +
> +/**
> + * pci_p2pdma_assign_provider - Check compatibility (as per pci_p2pdma_distance)
> + *	and assign a provider to a list of clients
> + * @provider: p2pdma provider to assign to the client list
> + * @clients: list of devices to check (NULL-terminated)
> + *
> + * Returns false if any of the clients are not compatible, true if the
> + * provider was successfully assigned to the clients.
> + */
> +bool pci_p2pdma_assign_provider(struct pci_dev *provider,
> +				struct list_head *clients)
> +{
> +	struct pci_p2pdma_client *pos;
> +
> +	if (pci_p2pdma_distance(provider, clients, true) < 0)
> +		return false;
> +
> +	list_for_each_entry(pos, clients, list)
> +		pos->provider = provider;
> +
> +	return true;
> +}
> +EXPORT_SYMBOL_GPL(pci_p2pdma_assign_provider);
> +
> +/**
> + * pci_has_p2pmem - check if a given PCI device has published any p2pmem
> + * @pdev: PCI device to check
> + */
> +bool pci_has_p2pmem(struct pci_dev *pdev)
> +{
> +	return pdev->p2pdma && pdev->p2pdma->p2pmem_published;
> +}
> +EXPORT_SYMBOL_GPL(pci_has_p2pmem);
> +
> +/**
> + * pci_p2pmem_find - find a peer-to-peer DMA memory device compatible with
> + *	the specified list of clients and shortest distance (as determined
> + *	by pci_p2pdma_distance())
> + * @clients: list of devices to check (NULL-terminated)
> + *
> + * If multiple devices are behind the same switch, the one "closest" to the
> + * client devices in use will be chosen first. (So if one of the providers is
> + * the same as one of the clients, that provider will be used ahead of any
> + * other providers that are unrelated.) If multiple providers are an equal
> + * distance away, one will be chosen at random.
> + *
> + * Returns a pointer to the PCI device with a reference taken (use pci_dev_put
> + * to return the reference) or NULL if no compatible device is found. The
> + * found provider will also be assigned to the client list.
> + */
> +struct pci_dev *pci_p2pmem_find(struct list_head *clients)
> +{
> +	struct pci_dev *pdev = NULL;
> +	struct pci_p2pdma_client *pos;
> +	int distance;
> +	int closest_distance = INT_MAX;
> +	struct pci_dev **closest_pdevs;
> +	int dev_cnt = 0;
> +	const int max_devs = PAGE_SIZE / sizeof(*closest_pdevs);
> +	int i;
> +
> +	closest_pdevs = kmalloc(PAGE_SIZE, GFP_KERNEL);
> +	if (!closest_pdevs)
> +		return NULL;
> +
> +	while ((pdev = pci_get_device(PCI_ANY_ID, PCI_ANY_ID, pdev))) {
> +		if (!pci_has_p2pmem(pdev))
> +			continue;
> +
> +		distance = pci_p2pdma_distance(pdev, clients, false);
> +		if (distance < 0 || distance > closest_distance)
> +			continue;
> +
> +		if (distance == closest_distance && dev_cnt >= max_devs)
> +			continue;
> +
> +		if (distance < closest_distance) {
> +			for (i = 0; i < dev_cnt; i++)
> +				pci_dev_put(closest_pdevs[i]);
> +
> +			dev_cnt = 0;
> +			closest_distance = distance;
> +		}
> +
> +		closest_pdevs[dev_cnt++] = pci_dev_get(pdev);
> +	}
> +
> +	if (dev_cnt)
> +		pdev = pci_dev_get(closest_pdevs[prandom_u32_max(dev_cnt)]);
> +
> +	for (i = 0; i < dev_cnt; i++)
> +		pci_dev_put(closest_pdevs[i]);
> +
> +	if (pdev)
> +		list_for_each_entry(pos, clients, list)
> +			pos->provider = pdev;
> +
> +	kfree(closest_pdevs);
> +	return pdev;
> +}
> +EXPORT_SYMBOL_GPL(pci_p2pmem_find);
> +
> +/**
> + * pci_alloc_p2p_mem - allocate peer-to-peer DMA memory
> + * @pdev: the device to allocate memory from
> + * @size: number of bytes to allocate
> + *
> + * Returns the allocated memory or NULL on error.
> + */
> +void *pci_alloc_p2pmem(struct pci_dev *pdev, size_t size)
> +{
> +	void *ret;
> +
> +	if (unlikely(!pdev->p2pdma))
> +		return NULL;
> +
> +	if (unlikely(!percpu_ref_tryget_live(&pdev->p2pdma->devmap_ref)))
> +		return NULL;
> +
> +	ret = (void *)gen_pool_alloc(pdev->p2pdma->pool, size);
> +
> +	if (unlikely(!ret))
> +		percpu_ref_put(&pdev->p2pdma->devmap_ref);
> +
> +	return ret;
> +}
> +EXPORT_SYMBOL_GPL(pci_alloc_p2pmem);
> +
> +/**
> + * pci_free_p2pmem - free peer-to-peer DMA memory
> + * @pdev: the device the memory was allocated from
> + * @addr: address of the memory that was allocated
> + * @size: number of bytes that was allocated
> + */
> +void pci_free_p2pmem(struct pci_dev *pdev, void *addr, size_t size)
> +{
> +	gen_pool_free(pdev->p2pdma->pool, (uintptr_t)addr, size);
> +	percpu_ref_put(&pdev->p2pdma->devmap_ref);
> +}
> +EXPORT_SYMBOL_GPL(pci_free_p2pmem);
> +
> +/**
> + * pci_p2pmem_virt_to_bus - return the PCI bus address for a given virtual
> + *	address obtained with pci_alloc_p2pmem()
> + * @pdev: the device the memory was allocated from
> + * @addr: address of the memory that was allocated
> + */
> +pci_bus_addr_t pci_p2pmem_virt_to_bus(struct pci_dev *pdev, void *addr)
> +{
> +	if (!addr)
> +		return 0;
> +	if (!pdev->p2pdma)
> +		return 0;
> +
> +	/*
> +	 * Note: when we added the memory to the pool we used the PCI
> +	 * bus address as the physical address. So gen_pool_virt_to_phys()
> +	 * actually returns the bus address despite the misleading name.
> +	 */
> +	return gen_pool_virt_to_phys(pdev->p2pdma->pool, (unsigned long)addr);
> +}
> +EXPORT_SYMBOL_GPL(pci_p2pmem_virt_to_bus);
> +
> +/**
> + * pci_p2pmem_alloc_sgl - allocate peer-to-peer DMA memory in a scatterlist
> + * @pdev: the device to allocate memory from
> + * @nents: the number of SG entries in the allocated list
> + * @length: number of bytes to allocate
> + *
> + * Returns the allocated scatterlist or NULL on error.
> + */
> +struct scatterlist *pci_p2pmem_alloc_sgl(struct pci_dev *pdev,
> +					 unsigned int *nents, u32 length)
> +{
> +	struct scatterlist *sg;
> +	void *addr;
> +
> +	sg = kzalloc(sizeof(*sg), GFP_KERNEL);
> +	if (!sg)
> +		return NULL;
> +
> +	sg_init_table(sg, 1);
> +
> +	addr = pci_alloc_p2pmem(pdev, length);
> +	if (!addr)
> +		goto out_free_sg;
> +
> +	sg_set_buf(sg, addr, length);
> +	*nents = 1;
> +	return sg;
> +
> +out_free_sg:
> +	kfree(sg);
> +	return NULL;
> +}
> +EXPORT_SYMBOL_GPL(pci_p2pmem_alloc_sgl);
> +
> +/**
> + * pci_p2pmem_free_sgl - free a scatterlist allocated by pci_p2pmem_alloc_sgl()
> + * @pdev: the device the memory was allocated from
> + * @sgl: the allocated scatterlist
> + */
> +void pci_p2pmem_free_sgl(struct pci_dev *pdev, struct scatterlist *sgl)
> +{
> +	struct scatterlist *sg;
> +	int count;
> +
> +	for_each_sg(sgl, sg, INT_MAX, count) {
> +		if (!sg)
> +			break;
> +
> +		pci_free_p2pmem(pdev, sg_virt(sg), sg->length);
> +	}
> +	kfree(sgl);
> +}
> +EXPORT_SYMBOL_GPL(pci_p2pmem_free_sgl);
> +
> +/**
> + * pci_p2pmem_publish - publish the peer-to-peer DMA memory for use by
> + *	other devices with pci_p2pmem_find()
> + * @pdev: the device with peer-to-peer DMA memory to publish
> + * @publish: set to true to publish the memory, false to unpublish it
> + *
> + * Published memory can be used by other PCI device drivers for
> + * peer-to-peer DMA operations. Non-published memory is reserved for
> + * exclusive use of the device driver that registers the peer-to-peer
> + * memory.
> + */
> +void pci_p2pmem_publish(struct pci_dev *pdev, bool publish)
> +{
> +	if (!pdev->p2pdma)
> +		return;
> +
> +	pdev->p2pdma->p2pmem_published = publish;
> +}
> +EXPORT_SYMBOL_GPL(pci_p2pmem_publish);
> diff --git a/include/linux/memremap.h b/include/linux/memremap.h
> index f91f9e763557..9553370ebdad 100644
> --- a/include/linux/memremap.h
> +++ b/include/linux/memremap.h
> @@ -53,11 +53,16 @@ struct vmem_altmap {
>    * wakeup event whenever a page is unpinned and becomes idle. This
>    * wakeup is used to coordinate physical address space management (ex:
>    * fs truncate/hole punch) vs pinned pages (ex: device dma).
> + *
> + * MEMORY_DEVICE_PCI_P2PDMA:
> + * Device memory residing in a PCI BAR intended for use with Peer-to-Peer
> + * transactions.
>    */
>   enum memory_type {
>   	MEMORY_DEVICE_PRIVATE = 1,
>   	MEMORY_DEVICE_PUBLIC,
>   	MEMORY_DEVICE_FS_DAX,
> +	MEMORY_DEVICE_PCI_P2PDMA,
>   };
>   
>   /*
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index a61ebe8ad4ca..2055df412a77 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -890,6 +890,19 @@ static inline bool is_device_public_page(const struct page *page)
>   		page->pgmap->type == MEMORY_DEVICE_PUBLIC;
>   }
>   
> +#ifdef CONFIG_PCI_P2PDMA
> +static inline bool is_pci_p2pdma_page(const struct page *page)
> +{
> +	return is_zone_device_page(page) &&
> +		page->pgmap->type == MEMORY_DEVICE_PCI_P2PDMA;
> +}
> +#else /* CONFIG_PCI_P2PDMA */
> +static inline bool is_pci_p2pdma_page(const struct page *page)
> +{
> +	return false;
> +}
> +#endif /* CONFIG_PCI_P2PDMA */
> +
>   #else /* CONFIG_DEV_PAGEMAP_OPS */
>   static inline void dev_pagemap_get_ops(void)
>   {
> @@ -913,6 +926,11 @@ static inline bool is_device_public_page(const struct page *page)
>   {
>   	return false;
>   }
> +
> +static inline bool is_pci_p2pdma_page(const struct page *page)
> +{
> +	return false;
> +}
>   #endif /* CONFIG_DEV_PAGEMAP_OPS */
>   
>   static inline void get_page(struct page *page)
> diff --git a/include/linux/pci-p2pdma.h b/include/linux/pci-p2pdma.h
> new file mode 100644
> index 000000000000..7b2b0f547528
> --- /dev/null
> +++ b/include/linux/pci-p2pdma.h
> @@ -0,0 +1,102 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/*
> + * PCI Peer 2 Peer DMA support.
> + *
> + * Copyright (c) 2016-2018, Logan Gunthorpe
> + * Copyright (c) 2016-2017, Microsemi Corporation
> + * Copyright (c) 2017, Christoph Hellwig
> + * Copyright (c) 2018, Eideticom Inc.
> + *
> + */
> +
> +#ifndef _LINUX_PCI_P2PDMA_H
> +#define _LINUX_PCI_P2PDMA_H
> +
> +#include <linux/pci.h>
> +
> +struct block_device;
> +struct scatterlist;
> +
> +#ifdef CONFIG_PCI_P2PDMA
> +int pci_p2pdma_add_resource(struct pci_dev *pdev, int bar, size_t size,
> +		u64 offset);
> +int pci_p2pdma_add_client(struct list_head *head, struct device *dev);
> +void pci_p2pdma_remove_client(struct list_head *head, struct device *dev);
> +void pci_p2pdma_client_list_free(struct list_head *head);
> +int pci_p2pdma_distance(struct pci_dev *provider, struct list_head *clients,
> +			bool verbose);
> +bool pci_p2pdma_assign_provider(struct pci_dev *provider,
> +				struct list_head *clients);
> +bool pci_has_p2pmem(struct pci_dev *pdev);
> +struct pci_dev *pci_p2pmem_find(struct list_head *clients);
> +void *pci_alloc_p2pmem(struct pci_dev *pdev, size_t size);
> +void pci_free_p2pmem(struct pci_dev *pdev, void *addr, size_t size);
> +pci_bus_addr_t pci_p2pmem_virt_to_bus(struct pci_dev *pdev, void *addr);
> +struct scatterlist *pci_p2pmem_alloc_sgl(struct pci_dev *pdev,
> +					 unsigned int *nents, u32 length);
> +void pci_p2pmem_free_sgl(struct pci_dev *pdev, struct scatterlist *sgl);
> +void pci_p2pmem_publish(struct pci_dev *pdev, bool publish);
> +#else /* CONFIG_PCI_P2PDMA */
> +static inline int pci_p2pdma_add_resource(struct pci_dev *pdev, int bar,
> +		size_t size, u64 offset)
> +{
> +	return -EOPNOTSUPP;
> +}
> +static inline int pci_p2pdma_add_client(struct list_head *head,
> +		struct device *dev)
> +{
> +	return 0;
> +}
> +static inline void pci_p2pdma_remove_client(struct list_head *head,
> +		struct device *dev)
> +{
> +}
> +static inline void pci_p2pdma_client_list_free(struct list_head *head)
> +{
> +}
> +static inline int pci_p2pdma_distance(struct pci_dev *provider,
> +				      struct list_head *clients,
> +				      bool verbose)
> +{
> +	return -1;
> +}
> +static inline bool pci_p2pdma_assign_provider(struct pci_dev *provider,
> +					      struct list_head *clients)
> +{
> +	return false;
> +}
> +static inline bool pci_has_p2pmem(struct pci_dev *pdev)
> +{
> +	return false;
> +}
> +static inline struct pci_dev *pci_p2pmem_find(struct list_head *clients)
> +{
> +	return NULL;
> +}
> +static inline void *pci_alloc_p2pmem(struct pci_dev *pdev, size_t size)
> +{
> +	return NULL;
> +}
> +static inline void pci_free_p2pmem(struct pci_dev *pdev, void *addr,
> +		size_t size)
> +{
> +}
> +static inline pci_bus_addr_t pci_p2pmem_virt_to_bus(struct pci_dev *pdev,
> +						    void *addr)
> +{
> +	return 0;
> +}
> +static inline struct scatterlist *pci_p2pmem_alloc_sgl(struct pci_dev *pdev,
> +		unsigned int *nents, u32 length)
> +{
> +	return NULL;
> +}
> +static inline void pci_p2pmem_free_sgl(struct pci_dev *pdev,
> +		struct scatterlist *sgl)
> +{
> +}
> +static inline void pci_p2pmem_publish(struct pci_dev *pdev, bool publish)
> +{
> +}
> +#endif /* CONFIG_PCI_P2PDMA */
> +#endif /* _LINUX_PCI_P2PDMA_H */
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index e72ca8dd6241..5d95dbf21f4a 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -281,6 +281,7 @@ struct pcie_link_state;
>   struct pci_vpd;
>   struct pci_sriov;
>   struct pci_ats;
> +struct pci_p2pdma;
>   
>   /* The pci_dev structure describes PCI devices */
>   struct pci_dev {
> @@ -439,6 +440,9 @@ struct pci_dev {
>   #ifdef CONFIG_PCI_PASID
>   	u16		pasid_features;
>   #endif
> +#ifdef CONFIG_PCI_P2PDMA
> +	struct pci_p2pdma *p2pdma;
> +#endif
>   	phys_addr_t	rom;		/* Physical address if not from BAR */
>   	size_t		romlen;		/* Length if not from BAR */
>   	char		*driver_override; /* Driver name to force a match */


* [PATCH v5 01/13] PCI/P2PDMA: Support peer-to-peer memory
@ 2018-08-31  8:04     ` Christian König
  0 siblings, 0 replies; 265+ messages in thread
From: Christian König @ 2018-08-31  8:04 UTC (permalink / raw)


Am 30.08.2018 um 20:53 schrieb Logan Gunthorpe:
> Some PCI devices may have memory mapped in a BAR space that's
> intended for use in peer-to-peer transactions. In order to enable
> such transactions the memory must be registered with ZONE_DEVICE pages
> so it can be used by DMA interfaces in existing drivers.

We want to use that feature without ZONE_DEVICE pages for DMA-buf as well.

How hard would it be to separate enabling P2P detection (e.g. distance 
between two devices) from this?

Regards,
Christian.

>
> Add an interface for other subsystems to find and allocate chunks of P2P
> memory as necessary to facilitate transfers between two PCI peers:
>
> int pci_p2pdma_add_client();
> struct pci_dev *pci_p2pmem_find();
> void *pci_alloc_p2pmem();
>
> The new interface requires a driver to collect a list of client devices
> involved in the transaction with the pci_p2pmem_add_client*() functions
> then call pci_p2pmem_find() to obtain any suitable P2P memory. Once
> this is done the list is bound to the memory and the calling driver is
> free to add and remove clients as necessary (adding incompatible clients
> will fail). With a suitable p2pmem device, memory can then be
> allocated with pci_alloc_p2pmem() for use in DMA transactions.
>
> Depending on hardware, using peer-to-peer memory may reduce the bandwidth
> of the transfer but can significantly reduce pressure on system memory.
> This may be desirable in many cases: for example a system could be designed
> with a small CPU connected to a PCIe switch by a small number of lanes
> which would maximize the number of lanes available to connect to NVMe
> devices.
>
> The code is designed to only utilize the p2pmem device if all the devices
> involved in a transfer are behind the same PCI bridge. This is because we
> have no way of knowing whether peer-to-peer routing between PCIe Root Ports
> is supported (PCIe r4.0, sec 1.3.1). Additionally, the benefits of P2P
> transfers that go through the RC is limited to only reducing DRAM usage
> and, in some cases, coding convenience. The PCI-SIG may be exploring
> adding a new capability bit to advertise whether this is possible for
> future hardware.
>
> This commit includes significant rework and feedback from Christoph
> Hellwig.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
> ---
>   drivers/pci/Kconfig        |  17 +
>   drivers/pci/Makefile       |   1 +
>   drivers/pci/p2pdma.c       | 761 +++++++++++++++++++++++++++++++++++++++++++++
>   include/linux/memremap.h   |   5 +
>   include/linux/mm.h         |  18 ++
>   include/linux/pci-p2pdma.h | 102 ++++++
>   include/linux/pci.h        |   4 +
>   7 files changed, 908 insertions(+)
>   create mode 100644 drivers/pci/p2pdma.c
>   create mode 100644 include/linux/pci-p2pdma.h
>
> diff --git a/drivers/pci/Kconfig b/drivers/pci/Kconfig
> index 56ff8f6d31fc..deb68be4fdac 100644
> --- a/drivers/pci/Kconfig
> +++ b/drivers/pci/Kconfig
> @@ -132,6 +132,23 @@ config PCI_PASID
>   
>   	  If unsure, say N.
>   
> +config PCI_P2PDMA
> +	bool "PCI peer-to-peer transfer support"
> +	depends on PCI && ZONE_DEVICE
> +	select GENERIC_ALLOCATOR
> +	help
> +	  Enable drivers to do PCI peer-to-peer transactions to and from
> +	  BARs that are exposed in other devices that are part of
> +	  the hierarchy where peer-to-peer DMA is guaranteed by the PCI
> +	  specification to work (i.e. anything below a single PCI bridge).
> +
> +	  Many PCIe root complexes do not support P2P transactions and
> +	  it's hard to tell which ones support it at all, so at this time,
> +	  P2P DMA transactions must be between devices behind the same root
> +	  port.
> +
> +	  If unsure, say N.
> +
>   config PCI_LABEL
>   	def_bool y if (DMI || ACPI)
>   	depends on PCI
> diff --git a/drivers/pci/Makefile b/drivers/pci/Makefile
> index 1b2cfe51e8d7..85f4a703b2be 100644
> --- a/drivers/pci/Makefile
> +++ b/drivers/pci/Makefile
> @@ -26,6 +26,7 @@ obj-$(CONFIG_PCI_SYSCALL)	+= syscall.o
>   obj-$(CONFIG_PCI_STUB)		+= pci-stub.o
>   obj-$(CONFIG_PCI_PF_STUB)	+= pci-pf-stub.o
>   obj-$(CONFIG_PCI_ECAM)		+= ecam.o
> +obj-$(CONFIG_PCI_P2PDMA)	+= p2pdma.o
>   obj-$(CONFIG_XEN_PCIDEV_FRONTEND) += xen-pcifront.o
>   
>   # Endpoint library must be initialized before its users
> diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c
> new file mode 100644
> index 000000000000..88aaec5351cd
> --- /dev/null
> +++ b/drivers/pci/p2pdma.c
> @@ -0,0 +1,761 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * PCI Peer 2 Peer DMA support.
> + *
> + * Copyright (c) 2016-2018, Logan Gunthorpe
> + * Copyright (c) 2016-2017, Microsemi Corporation
> + * Copyright (c) 2017, Christoph Hellwig
> + * Copyright (c) 2018, Eideticom Inc.
> + */
> +
> +#define pr_fmt(fmt) "pci-p2pdma: " fmt
> +#include <linux/pci-p2pdma.h>
> +#include <linux/module.h>
> +#include <linux/slab.h>
> +#include <linux/genalloc.h>
> +#include <linux/memremap.h>
> +#include <linux/percpu-refcount.h>
> +#include <linux/random.h>
> +#include <linux/seq_buf.h>
> +
> +struct pci_p2pdma {
> +	struct percpu_ref devmap_ref;
> +	struct completion devmap_ref_done;
> +	struct gen_pool *pool;
> +	bool p2pmem_published;
> +};
> +
> +static void pci_p2pdma_percpu_release(struct percpu_ref *ref)
> +{
> +	struct pci_p2pdma *p2p =
> +		container_of(ref, struct pci_p2pdma, devmap_ref);
> +
> +	complete_all(&p2p->devmap_ref_done);
> +}
> +
> +static void pci_p2pdma_percpu_kill(void *data)
> +{
> +	struct percpu_ref *ref = data;
> +
> +	if (percpu_ref_is_dying(ref))
> +		return;
> +
> +	percpu_ref_kill(ref);
> +}
> +
> +static void pci_p2pdma_release(void *data)
> +{
> +	struct pci_dev *pdev = data;
> +
> +	if (!pdev->p2pdma)
> +		return;
> +
> +	wait_for_completion(&pdev->p2pdma->devmap_ref_done);
> +	percpu_ref_exit(&pdev->p2pdma->devmap_ref);
> +
> +	gen_pool_destroy(pdev->p2pdma->pool);
> +	pdev->p2pdma = NULL;
> +}
> +
> +static int pci_p2pdma_setup(struct pci_dev *pdev)
> +{
> +	int error = -ENOMEM;
> +	struct pci_p2pdma *p2p;
> +
> +	p2p = devm_kzalloc(&pdev->dev, sizeof(*p2p), GFP_KERNEL);
> +	if (!p2p)
> +		return -ENOMEM;
> +
> +	p2p->pool = gen_pool_create(PAGE_SHIFT, dev_to_node(&pdev->dev));
> +	if (!p2p->pool)
> +		goto out;
> +
> +	init_completion(&p2p->devmap_ref_done);
> +	error = percpu_ref_init(&p2p->devmap_ref,
> +			pci_p2pdma_percpu_release, 0, GFP_KERNEL);
> +	if (error)
> +		goto out_pool_destroy;
> +
> +	percpu_ref_switch_to_atomic_sync(&p2p->devmap_ref);
> +
> +	error = devm_add_action_or_reset(&pdev->dev, pci_p2pdma_release, pdev);
> +	if (error)
> +		goto out_pool_destroy;
> +
> +	pdev->p2pdma = p2p;
> +
> +	return 0;
> +
> +out_pool_destroy:
> +	gen_pool_destroy(p2p->pool);
> +out:
> +	devm_kfree(&pdev->dev, p2p);
> +	return error;
> +}
> +
> +/**
> + * pci_p2pdma_add_resource - add memory for use as p2p memory
> + * @pdev: the device to add the memory to
> + * @bar: PCI BAR to add
> + * @size: size of the memory to add, may be zero to use the whole BAR
> + * @offset: offset into the PCI BAR
> + *
> + * The memory will be given ZONE_DEVICE struct pages so that it may
> + * be used with any DMA request.
> + */
> +int pci_p2pdma_add_resource(struct pci_dev *pdev, int bar, size_t size,
> +			    u64 offset)
> +{
> +	struct dev_pagemap *pgmap;
> +	void *addr;
> +	int error;
> +
> +	if (!(pci_resource_flags(pdev, bar) & IORESOURCE_MEM))
> +		return -EINVAL;
> +
> +	if (offset >= pci_resource_len(pdev, bar))
> +		return -EINVAL;
> +
> +	if (!size)
> +		size = pci_resource_len(pdev, bar) - offset;
> +
> +	if (size + offset > pci_resource_len(pdev, bar))
> +		return -EINVAL;
> +
> +	if (!pdev->p2pdma) {
> +		error = pci_p2pdma_setup(pdev);
> +		if (error)
> +			return error;
> +	}
> +
> +	pgmap = devm_kzalloc(&pdev->dev, sizeof(*pgmap), GFP_KERNEL);
> +	if (!pgmap)
> +		return -ENOMEM;
> +
> +	pgmap->res.start = pci_resource_start(pdev, bar) + offset;
> +	pgmap->res.end = pgmap->res.start + size - 1;
> +	pgmap->res.flags = pci_resource_flags(pdev, bar);
> +	pgmap->ref = &pdev->p2pdma->devmap_ref;
> +	pgmap->type = MEMORY_DEVICE_PCI_P2PDMA;
> +
> +	addr = devm_memremap_pages(&pdev->dev, pgmap);
> +	if (IS_ERR(addr)) {
> +		error = PTR_ERR(addr);
> +		goto pgmap_free;
> +	}
> +
> +	error = gen_pool_add_virt(pdev->p2pdma->pool, (unsigned long)addr,
> +			pci_bus_address(pdev, bar) + offset,
> +			resource_size(&pgmap->res), dev_to_node(&pdev->dev));
> +	if (error)
> +		goto pgmap_free;
> +
> +	error = devm_add_action_or_reset(&pdev->dev, pci_p2pdma_percpu_kill,
> +					  &pdev->p2pdma->devmap_ref);
> +	if (error)
> +		goto pgmap_free;
> +
> +	pci_info(pdev, "added peer-to-peer DMA memory %pR\n",
> +		 &pgmap->res);
> +
> +	return 0;
> +
> +pgmap_free:
> +	devres_free(pgmap);
> +	return error;
> +}
> +EXPORT_SYMBOL_GPL(pci_p2pdma_add_resource);
> +
> +static struct pci_dev *find_parent_pci_dev(struct device *dev)
> +{
> +	struct device *parent;
> +
> +	dev = get_device(dev);
> +
> +	while (dev) {
> +		if (dev_is_pci(dev))
> +			return to_pci_dev(dev);
> +
> +		parent = get_device(dev->parent);
> +		put_device(dev);
> +		dev = parent;
> +	}
> +
> +	return NULL;
> +}
> +
> +/*
> + * Check if a PCI bridge has its ACS redirection bits set to redirect P2P
> + * TLPs upstream via ACS. Returns 1 if the packets will be redirected
> + * upstream, 0 otherwise.
> + */
> +static int pci_bridge_has_acs_redir(struct pci_dev *dev)
> +{
> +	int pos;
> +	u16 ctrl;
> +
> +	pos = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_ACS);
> +	if (!pos)
> +		return 0;
> +
> +	pci_read_config_word(dev, pos + PCI_ACS_CTRL, &ctrl);
> +
> +	if (ctrl & (PCI_ACS_RR | PCI_ACS_CR | PCI_ACS_EC))
> +		return 1;
> +
> +	return 0;
> +}
> +
> +static void seq_buf_print_bus_devfn(struct seq_buf *buf, struct pci_dev *dev)
> +{
> +	if (!buf)
> +		return;
> +
> +	seq_buf_printf(buf, "%04x:%02x:%02x.%x;", pci_domain_nr(dev->bus),
> +		       dev->bus->number, PCI_SLOT(dev->devfn),
> +		       PCI_FUNC(dev->devfn));
> +}
> +
> +/*
> + * Find the distance through the nearest common upstream bridge between
> + * two PCI devices.
> + *
> + * If the two devices are the same device then 0 will be returned.
> + *
> + * If there are two virtual functions of the same device behind the same
> + * bridge port then 2 will be returned (one step down to the PCIe switch,
> + * then one step back to the same device).
> + *
> + * In the case where two devices are connected to the same PCIe switch, the
> + * value 4 will be returned. This corresponds to the following PCI tree:
> + *
> + *     -+  Root Port
> + *      \+ Switch Upstream Port
> + *       +-+ Switch Downstream Port
> + *       + \- Device A
> + *       \-+ Switch Downstream Port
> + *         \- Device B
> + *
> + * The distance is 4 because we traverse from Device A through the downstream
> + * port of the switch, to the common upstream port, back up to the second
> + * downstream port and then to Device B.
> + *
> + * Any two devices that don't have a common upstream bridge will return -1.
> + * In this way devices on separate PCIe root ports will be rejected, which
> + * is what we want for peer-to-peer seeing each PCIe root port defines a
> + * separate hierarchy domain and there's no way to determine whether the root
> + * complex supports forwarding between them.
> + *
> + * In the case where two devices are connected to different PCIe switches,
> + * this function will still return a positive distance as long as both
> + * switches eventually have a common upstream bridge. Note this covers
> + * the case of using multiple PCIe switches to achieve a desired level of
> + * fan-out from a root port. The exact distance will be a function of the
> + * number of switches between Device A and Device B.
> + *
> + * If a bridge which has any ACS redirection bits set is in the path
> + * then this function will return -2. This is so we reject any
> + * cases where the TLPs are forwarded up into the root complex.
> + * In this case, a list of all infringing bridge addresses will be
> + * populated in acs_list (assuming it's non-null) for printk purposes.
> + */
> +static int upstream_bridge_distance(struct pci_dev *a,
> +				    struct pci_dev *b,
> +				    struct seq_buf *acs_list)
> +{
> +	int dist_a = 0;
> +	int dist_b = 0;
> +	struct pci_dev *bb = NULL;
> +	int acs_cnt = 0;
> +
> +	/*
> +	 * Note, we don't need to take references to devices returned by
> +	 * pci_upstream_bridge() since we hold a reference to a child
> +	 * device which will already hold a reference to the upstream bridge.
> +	 */
> +
> +	while (a) {
> +		dist_b = 0;
> +
> +		if (pci_bridge_has_acs_redir(a)) {
> +			seq_buf_print_bus_devfn(acs_list, a);
> +			acs_cnt++;
> +		}
> +
> +		bb = b;
> +
> +		while (bb) {
> +			if (a == bb)
> +				goto check_b_path_acs;
> +
> +			bb = pci_upstream_bridge(bb);
> +			dist_b++;
> +		}
> +
> +		a = pci_upstream_bridge(a);
> +		dist_a++;
> +	}
> +
> +	return -1;
> +
> +check_b_path_acs:
> +	bb = b;
> +
> +	while (bb) {
> +		if (a == bb)
> +			break;
> +
> +		if (pci_bridge_has_acs_redir(bb)) {
> +			seq_buf_print_bus_devfn(acs_list, bb);
> +			acs_cnt++;
> +		}
> +
> +		bb = pci_upstream_bridge(bb);
> +	}
> +
> +	if (acs_cnt)
> +		return -2;
> +
> +	return dist_a + dist_b;
> +}
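
To make the walk above easier to follow, here is a hypothetical user-space model of the same algorithm in Python. The `Dev` class and all names are illustrative only (not kernel API): devices are nodes with a `parent` pointer and an `acs_redir` flag standing in for pci_upstream_bridge() and pci_bridge_has_acs_redir().

```python
class Dev:
    """Stand-in for struct pci_dev: a node in a parent-pointer tree."""
    def __init__(self, name, parent=None, acs_redir=False):
        self.name = name
        self.parent = parent        # upstream bridge, or None at the top
        self.acs_redir = acs_redir  # models pci_bridge_has_acs_redir()

def upstream_bridge_distance(a, b):
    # Walk up from 'a'; for each ancestor, scan b's ancestor chain for a match.
    dist_a = 0
    node_a = a
    while node_a is not None:
        dist_b = 0
        node_b = b
        while node_b is not None:
            if node_a is node_b:
                # Common upstream bridge found. Collect both paths up to and
                # including the common bridge and check for ACS redirection.
                path = []
                n = a
                while n is not node_a:
                    path.append(n)
                    n = n.parent
                n = b
                while n is not node_a:
                    path.append(n)
                    n = n.parent
                path.append(node_a)
                if any(d.acs_redir for d in path):
                    return -2
                return dist_a + dist_b
            node_b = node_b.parent
            dist_b += 1
        node_a = node_a.parent
        dist_a += 1
    return -1  # no common upstream bridge
```

With the switch topology from the comment above (root port, switch upstream port, two downstream ports), Device A to Device B gives distance 4, two functions behind the same downstream port give 2, and a device behind a different root gives -1.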
> +
> +static int upstream_bridge_distance_warn(struct pci_dev *provider,
> +					 struct pci_dev *client)
> +{
> +	struct seq_buf acs_list;
> +	int ret;
> +
> +	seq_buf_init(&acs_list, kmalloc(PAGE_SIZE, GFP_KERNEL), PAGE_SIZE);
> +
> +	ret = upstream_bridge_distance(provider, client, &acs_list);
> +	if (ret == -2) {
> +		pci_warn(client, "cannot be used for peer-to-peer DMA as ACS redirect is set between the client and provider\n");
> +		/* Drop final semicolon */
> +		acs_list.buffer[acs_list.len-1] = 0;
> +		pci_warn(client, "to disable ACS redirect for this path, add the kernel parameter: pci=disable_acs_redir=%s\n",
> +			 acs_list.buffer);
> +
> +	} else if (ret < 0) {
> +		pci_warn(client, "cannot be used for peer-to-peer DMA as the client and provider do not share an upstream bridge\n");
> +	}
> +
> +	kfree(acs_list.buffer);
> +
> +	return ret;
> +}
> +
> +struct pci_p2pdma_client {
> +	struct list_head list;
> +	struct pci_dev *client;
> +	struct pci_dev *provider;
> +};
> +
> +/**
> + * pci_p2pdma_add_client - allocate a new element in a client device list
> + * @head: list head of p2pdma clients
> + * @dev: device to add to the list
> + *
> + * This adds @dev to a list of clients used by a p2pdma device.
> + * This list should be passed to pci_p2pmem_find(). Once pci_p2pmem_find() has
> + * been called successfully, the list will be bound to a specific p2pdma
> + * device and new clients can only be added to the list if they are
> + * supported by that p2pdma device.
> + *
> + * The caller is expected to have a lock which protects @head as necessary
> + * so that none of the pci_p2p functions can be called concurrently
> + * on that list.
> + *
> + * Returns 0 if the client was successfully added.
> + */
> +int pci_p2pdma_add_client(struct list_head *head, struct device *dev)
> +{
> +	struct pci_p2pdma_client *item, *new_item;
> +	struct pci_dev *provider = NULL;
> +	struct pci_dev *client;
> +	int ret;
> +
> +	if (IS_ENABLED(CONFIG_DMA_VIRT_OPS) && dev->dma_ops == &dma_virt_ops) {
> +		dev_warn(dev, "cannot be used for peer-to-peer DMA because the driver makes use of dma_virt_ops\n");
> +		return -ENODEV;
> +	}
> +
> +	client = find_parent_pci_dev(dev);
> +	if (!client) {
> +		dev_warn(dev, "cannot be used for peer-to-peer DMA as it is not a PCI device\n");
> +		return -ENODEV;
> +	}
> +
> +	item = list_first_entry_or_null(head, struct pci_p2pdma_client, list);
> +	if (item && item->provider) {
> +		provider = item->provider;
> +
> +		ret = upstream_bridge_distance_warn(provider, client);
> +		if (ret < 0) {
> +			ret = -EXDEV;
> +			goto put_client;
> +		}
> +	}
> +
> +	new_item = kzalloc(sizeof(*new_item), GFP_KERNEL);
> +	if (!new_item) {
> +		ret = -ENOMEM;
> +		goto put_client;
> +	}
> +
> +	new_item->client = client;
> +	new_item->provider = pci_dev_get(provider);
> +
> +	list_add_tail(&new_item->list, head);
> +
> +	return 0;
> +
> +put_client:
> +	pci_dev_put(client);
> +	return ret;
> +}
> +EXPORT_SYMBOL_GPL(pci_p2pdma_add_client);
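
The client-list rules above (new clients are only admitted once they are checked against an already-bound provider, and a provider binds all-or-nothing) can be sketched in a small Python model. Everything here is illustrative: `compat` stands in for a non-negative upstream_bridge_distance() result, and the dict entries for struct pci_p2pdma_client.

```python
def add_client(clients, client, compat):
    # pci_p2pdma_add_client() in miniature: if the list is already bound to
    # a provider, a new client must be compatible with that provider.
    provider = clients[0]["provider"] if clients else None
    if provider is not None and not compat(provider, client):
        raise OSError("EXDEV: client incompatible with bound provider")
    clients.append({"client": client, "provider": provider})

def assign_provider(clients, provider, compat):
    # pci_p2pdma_assign_provider(): all-or-nothing binding of a provider.
    if any(not compat(provider, c["client"]) for c in clients):
        return False
    for c in clients:
        c["provider"] = provider
    return True
```

A caller would build the list first, then bind a provider; after binding, further add_client() calls are vetted against that provider.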
> +
> +static void pci_p2pdma_client_free(struct pci_p2pdma_client *item)
> +{
> +	list_del(&item->list);
> +	pci_dev_put(item->client);
> +	pci_dev_put(item->provider);
> +	kfree(item);
> +}
> +
> +/**
> + * pci_p2pdma_remove_client - remove and free a p2pdma client
> + * @head: list head of p2pdma clients
> + * @dev: device to remove from the list
> + *
> + * This removes @dev from a list of clients used by a p2pdma device.
> + * The caller is expected to have a lock which protects @head as necessary
> + * so that none of the pci_p2p functions can be called concurrently
> + * on that list.
> + */
> +void pci_p2pdma_remove_client(struct list_head *head, struct device *dev)
> +{
> +	struct pci_p2pdma_client *pos, *tmp;
> +	struct pci_dev *pdev;
> +
> +	pdev = find_parent_pci_dev(dev);
> +	if (!pdev)
> +		return;
> +
> +	list_for_each_entry_safe(pos, tmp, head, list) {
> +		if (pos->client != pdev)
> +			continue;
> +
> +		pci_p2pdma_client_free(pos);
> +	}
> +
> +	pci_dev_put(pdev);
> +}
> +EXPORT_SYMBOL_GPL(pci_p2pdma_remove_client);
> +
> +/**
> + * pci_p2pdma_client_list_free - free an entire list of p2pdma clients
> + * @head: list head of p2pdma clients
> + *
> + * This removes all devices in a list of clients used by a p2pdma device.
> + * The caller is expected to have a lock which protects @head as necessary
> + * so that none of the pci_p2pdma functions can be called concurrently
> + * on that list.
> + */
> +void pci_p2pdma_client_list_free(struct list_head *head)
> +{
> +	struct pci_p2pdma_client *pos, *tmp;
> +
> +	list_for_each_entry_safe(pos, tmp, head, list)
> +		pci_p2pdma_client_free(pos);
> +}
> +EXPORT_SYMBOL_GPL(pci_p2pdma_client_list_free);
> +
> +/**
> + * pci_p2pdma_distance - Determine the cumulative distance between
> + *	a p2pdma provider and the clients in use.
> + * @provider: p2pdma provider to check against the client list
> + * @clients: list of devices to check (NULL-terminated)
> + * @verbose: if true, print warnings for devices when we return -1
> + *
> + * Returns -1 if any of the clients are not compatible (i.e. not behind
> + * the same root port as the provider); otherwise returns a positive
> + * number where a lower number is the preferable choice. (If one of the
> + * clients is the same device as the provider, 0 is returned: the best choice.)
> + *
> + * For now, "compatible" means the provider and the clients are all behind
> + * the same PCI root port. This cuts out cases that may work, but it is the
> + * safest policy for the user. Future work can expand this to a whitelist of
> + * root complexes that can safely forward between their ports.
> + */
> +int pci_p2pdma_distance(struct pci_dev *provider, struct list_head *clients,
> +			bool verbose)
> +{
> +	struct pci_p2pdma_client *pos;
> +	int ret;
> +	int distance = 0;
> +	bool not_supported = false;
> +
> +	if (list_empty(clients))
> +		return -1;
> +
> +	list_for_each_entry(pos, clients, list) {
> +		if (verbose)
> +			ret = upstream_bridge_distance_warn(provider,
> +							    pos->client);
> +		else
> +			ret = upstream_bridge_distance(provider, pos->client,
> +						       NULL);
> +
> +		if (ret < 0)
> +			not_supported = true;
> +
> +		if (not_supported && !verbose)
> +			break;
> +
> +		distance += ret;
> +	}
> +
> +	if (not_supported)
> +		return -1;
> +
> +	return distance;
> +}
> +EXPORT_SYMBOL_GPL(pci_p2pdma_distance);
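
The aggregation rule in the loop above is simple enough to state as a one-screen Python sketch (illustrative names only): per-client distances accumulate, and an empty list or any unsupported client (a negative per-client distance) fails the whole group.

```python
def p2pdma_distance(per_client_dists):
    # Mirrors pci_p2pdma_distance(): each value stands in for one
    # upstream_bridge_distance(provider, client) result.
    if not per_client_dists:
        return -1
    if any(d < 0 for d in per_client_dists):
        return -1
    return sum(per_client_dists)
```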
> +
> +/**
> + * pci_p2pdma_assign_provider - Check compatibility (as per pci_p2pdma_distance)
> + *	and assign a provider to a list of clients
> + * @provider: p2pdma provider to assign to the client list
> + * @clients: list of devices to check (NULL-terminated)
> + *
> + * Returns false if any of the clients are not compatible, true if the
> + * provider was successfully assigned to the clients.
> + */
> +bool pci_p2pdma_assign_provider(struct pci_dev *provider,
> +				struct list_head *clients)
> +{
> +	struct pci_p2pdma_client *pos;
> +
> +	if (pci_p2pdma_distance(provider, clients, true) < 0)
> +		return false;
> +
> +	list_for_each_entry(pos, clients, list)
> +		pos->provider = provider;
> +
> +	return true;
> +}
> +EXPORT_SYMBOL_GPL(pci_p2pdma_assign_provider);
> +
> +/**
> + * pci_has_p2pmem - check if a given PCI device has published any p2pmem
> + * @pdev: PCI device to check
> + */
> +bool pci_has_p2pmem(struct pci_dev *pdev)
> +{
> +	return pdev->p2pdma && pdev->p2pdma->p2pmem_published;
> +}
> +EXPORT_SYMBOL_GPL(pci_has_p2pmem);
> +
> +/**
> + * pci_p2pmem_find - find a peer-to-peer DMA memory device compatible with
> + *	the specified list of clients and shortest distance (as determined
> + *	by pci_p2pdma_distance())
> + * @clients: list of devices to check (NULL-terminated)
> + *
> + * If multiple devices are behind the same switch, the one "closest" to the
> + * client devices in use will be chosen first. (So if one of the providers is
> + * the same as one of the clients, that provider will be used ahead of any
> + * other providers that are unrelated.) If multiple providers are an equal
> + * distance away, one will be chosen at random.
> + *
> + * Returns a pointer to the PCI device with a reference taken (use pci_dev_put
> + * to return the reference) or NULL if no compatible device is found. The
> + * found provider will also be assigned to the client list.
> + */
> +struct pci_dev *pci_p2pmem_find(struct list_head *clients)
> +{
> +	struct pci_dev *pdev = NULL;
> +	struct pci_p2pdma_client *pos;
> +	int distance;
> +	int closest_distance = INT_MAX;
> +	struct pci_dev **closest_pdevs;
> +	int dev_cnt = 0;
> +	const int max_devs = PAGE_SIZE / sizeof(*closest_pdevs);
> +	int i;
> +
> +	closest_pdevs = kmalloc(PAGE_SIZE, GFP_KERNEL);
> +	if (!closest_pdevs)
> +		return NULL;
> +
> +	while ((pdev = pci_get_device(PCI_ANY_ID, PCI_ANY_ID, pdev))) {
> +		if (!pci_has_p2pmem(pdev))
> +			continue;
> +
> +		distance = pci_p2pdma_distance(pdev, clients, false);
> +		if (distance < 0 || distance > closest_distance)
> +			continue;
> +
> +		if (distance == closest_distance && dev_cnt >= max_devs)
> +			continue;
> +
> +		if (distance < closest_distance) {
> +			for (i = 0; i < dev_cnt; i++)
> +				pci_dev_put(closest_pdevs[i]);
> +
> +			dev_cnt = 0;
> +			closest_distance = distance;
> +		}
> +
> +		closest_pdevs[dev_cnt++] = pci_dev_get(pdev);
> +	}
> +
> +	if (dev_cnt)
> +		pdev = pci_dev_get(closest_pdevs[prandom_u32_max(dev_cnt)]);
> +
> +	for (i = 0; i < dev_cnt; i++)
> +		pci_dev_put(closest_pdevs[i]);
> +
> +	if (pdev)
> +		list_for_each_entry(pos, clients, list)
> +			pos->provider = pdev;
> +
> +	kfree(closest_pdevs);
> +	return pdev;
> +}
> +EXPORT_SYMBOL_GPL(pci_p2pmem_find);
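
The selection loop above keeps every provider tied for the smallest non-negative distance and then picks one at random, so load is spread across equally-good providers. A hypothetical Python rendering of just that policy (names are illustrative, not kernel API):

```python
import random

def find_closest(providers, distance_fn, rng=random):
    # Mirrors the selection in pci_p2pmem_find(): track all providers at
    # the minimum non-negative distance, then choose one at random.
    closest, closest_distance = [], None
    for p in providers:
        d = distance_fn(p)
        if d < 0:
            continue  # incompatible, as with a negative pci_p2pdma_distance()
        if closest_distance is None or d < closest_distance:
            closest, closest_distance = [p], d
        elif d == closest_distance:
            closest.append(p)
    return rng.choice(closest) if closest else None
```

For example, with providers at distances {a: 4, b: 2, c: 2, d: -1}, the result is always one of b or c.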
> +
> +/**
> + * pci_alloc_p2pmem - allocate peer-to-peer DMA memory
> + * @pdev: the device to allocate memory from
> + * @size: number of bytes to allocate
> + *
> + * Returns the allocated memory or NULL on error.
> + */
> +void *pci_alloc_p2pmem(struct pci_dev *pdev, size_t size)
> +{
> +	void *ret;
> +
> +	if (unlikely(!pdev->p2pdma))
> +		return NULL;
> +
> +	if (unlikely(!percpu_ref_tryget_live(&pdev->p2pdma->devmap_ref)))
> +		return NULL;
> +
> +	ret = (void *)gen_pool_alloc(pdev->p2pdma->pool, size);
> +
> +	if (unlikely(!ret))
> +		percpu_ref_put(&pdev->p2pdma->devmap_ref);
> +
> +	return ret;
> +}
> +EXPORT_SYMBOL_GPL(pci_alloc_p2pmem);
> +
> +/**
> + * pci_free_p2pmem - free peer-to-peer DMA memory
> + * @pdev: the device the memory was allocated from
> + * @addr: address of the memory that was allocated
> + * @size: number of bytes that were allocated
> + */
> +void pci_free_p2pmem(struct pci_dev *pdev, void *addr, size_t size)
> +{
> +	gen_pool_free(pdev->p2pdma->pool, (uintptr_t)addr, size);
> +	percpu_ref_put(&pdev->p2pdma->devmap_ref);
> +}
> +EXPORT_SYMBOL_GPL(pci_free_p2pmem);
> +
> +/**
> + * pci_p2pmem_virt_to_bus - return the PCI bus address for a given virtual
> + *	address obtained with pci_alloc_p2pmem()
> + * @pdev: the device the memory was allocated from
> + * @addr: address of the memory that was allocated
> + */
> +pci_bus_addr_t pci_p2pmem_virt_to_bus(struct pci_dev *pdev, void *addr)
> +{
> +	if (!addr)
> +		return 0;
> +	if (!pdev->p2pdma)
> +		return 0;
> +
> +	/*
> +	 * Note: when we added the memory to the pool we used the PCI
> +	 * bus address as the physical address. So gen_pool_virt_to_phys()
> +	 * actually returns the bus address despite the misleading name.
> +	 */
> +	return gen_pool_virt_to_phys(pdev->p2pdma->pool, (unsigned long)addr);
> +}
> +EXPORT_SYMBOL_GPL(pci_p2pmem_virt_to_bus);
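
The gen_pool trick noted in the comment above can be shown with a toy Python model (illustrative only): the pool records the PCI *bus* address where a CPU physical address would normally go, so a virt-to-"phys" lookup directly yields the bus address to program into a peer device.

```python
class P2PPool:
    """Toy stand-in for the p2pdma gen_pool mapping virt -> bus address."""
    def __init__(self, virt_base, bus_base, size):
        self.virt_base = virt_base
        self.bus_base = bus_base  # stored in the "phys" slot of the pool
        self.size = size

    def virt_to_bus(self, vaddr):
        # Equivalent of gen_pool_virt_to_phys(): offset into the mapping
        # plus the recorded base, which here is a bus address.
        off = vaddr - self.virt_base
        if not 0 <= off < self.size:
            raise ValueError("address not in pool")
        return self.bus_base + off
```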
> +
> +/**
> + * pci_p2pmem_alloc_sgl - allocate peer-to-peer DMA memory in a scatterlist
> + * @pdev: the device to allocate memory from
> + * @nents: the number of SG entries in the list
> + * @length: number of bytes to allocate
> + *
> + * Returns the allocated scatterlist or NULL on error.
> + */
> +struct scatterlist *pci_p2pmem_alloc_sgl(struct pci_dev *pdev,
> +					 unsigned int *nents, u32 length)
> +{
> +	struct scatterlist *sg;
> +	void *addr;
> +
> +	sg = kzalloc(sizeof(*sg), GFP_KERNEL);
> +	if (!sg)
> +		return NULL;
> +
> +	sg_init_table(sg, 1);
> +
> +	addr = pci_alloc_p2pmem(pdev, length);
> +	if (!addr)
> +		goto out_free_sg;
> +
> +	sg_set_buf(sg, addr, length);
> +	*nents = 1;
> +	return sg;
> +
> +out_free_sg:
> +	kfree(sg);
> +	return NULL;
> +}
> +EXPORT_SYMBOL_GPL(pci_p2pmem_alloc_sgl);
> +
> +/**
> + * pci_p2pmem_free_sgl - free a scatterlist allocated by pci_p2pmem_alloc_sgl()
> + * @pdev: the device the memory was allocated from
> + * @sgl: the allocated scatterlist
> + */
> +void pci_p2pmem_free_sgl(struct pci_dev *pdev, struct scatterlist *sgl)
> +{
> +	struct scatterlist *sg;
> +	int count;
> +
> +	for_each_sg(sgl, sg, INT_MAX, count) {
> +		if (!sg)
> +			break;
> +
> +		pci_free_p2pmem(pdev, sg_virt(sg), sg->length);
> +	}
> +	kfree(sgl);
> +}
> +EXPORT_SYMBOL_GPL(pci_p2pmem_free_sgl);
> +
> +/**
> + * pci_p2pmem_publish - publish the peer-to-peer DMA memory for use by
> + *	other devices with pci_p2pmem_find()
> + * @pdev: the device with peer-to-peer DMA memory to publish
> + * @publish: set to true to publish the memory, false to unpublish it
> + *
> + * Published memory can be used by other PCI device drivers for
> + * peer-to-peer DMA operations. Non-published memory is reserved for
> + * exclusive use of the device driver that registers the peer-to-peer
> + * memory.
> + */
> +void pci_p2pmem_publish(struct pci_dev *pdev, bool publish)
> +{
> +	if (!pdev->p2pdma)
> +		return;
> +
> +	pdev->p2pdma->p2pmem_published = publish;
> +}
> +EXPORT_SYMBOL_GPL(pci_p2pmem_publish);
> diff --git a/include/linux/memremap.h b/include/linux/memremap.h
> index f91f9e763557..9553370ebdad 100644
> --- a/include/linux/memremap.h
> +++ b/include/linux/memremap.h
> @@ -53,11 +53,16 @@ struct vmem_altmap {
>    * wakeup event whenever a page is unpinned and becomes idle. This
>    * wakeup is used to coordinate physical address space management (ex:
>    * fs truncate/hole punch) vs pinned pages (ex: device dma).
> + *
> + * MEMORY_DEVICE_PCI_P2PDMA:
> + * Device memory residing in a PCI BAR intended for use with Peer-to-Peer
> + * transactions.
>    */
>   enum memory_type {
>   	MEMORY_DEVICE_PRIVATE = 1,
>   	MEMORY_DEVICE_PUBLIC,
>   	MEMORY_DEVICE_FS_DAX,
> +	MEMORY_DEVICE_PCI_P2PDMA,
>   };
>   
>   /*
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index a61ebe8ad4ca..2055df412a77 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -890,6 +890,19 @@ static inline bool is_device_public_page(const struct page *page)
>   		page->pgmap->type == MEMORY_DEVICE_PUBLIC;
>   }
>   
> +#ifdef CONFIG_PCI_P2PDMA
> +static inline bool is_pci_p2pdma_page(const struct page *page)
> +{
> +	return is_zone_device_page(page) &&
> +		page->pgmap->type == MEMORY_DEVICE_PCI_P2PDMA;
> +}
> +#else /* CONFIG_PCI_P2PDMA */
> +static inline bool is_pci_p2pdma_page(const struct page *page)
> +{
> +	return false;
> +}
> +#endif /* CONFIG_PCI_P2PDMA */
> +
>   #else /* CONFIG_DEV_PAGEMAP_OPS */
>   static inline void dev_pagemap_get_ops(void)
>   {
> @@ -913,6 +926,11 @@ static inline bool is_device_public_page(const struct page *page)
>   {
>   	return false;
>   }
> +
> +static inline bool is_pci_p2pdma_page(const struct page *page)
> +{
> +	return false;
> +}
>   #endif /* CONFIG_DEV_PAGEMAP_OPS */
>   
>   static inline void get_page(struct page *page)
> diff --git a/include/linux/pci-p2pdma.h b/include/linux/pci-p2pdma.h
> new file mode 100644
> index 000000000000..7b2b0f547528
> --- /dev/null
> +++ b/include/linux/pci-p2pdma.h
> @@ -0,0 +1,102 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/*
> + * PCI Peer-to-Peer DMA support.
> + *
> + * Copyright (c) 2016-2018, Logan Gunthorpe
> + * Copyright (c) 2016-2017, Microsemi Corporation
> + * Copyright (c) 2017, Christoph Hellwig
> + * Copyright (c) 2018, Eideticom Inc.
> + *
> + */
> +
> +#ifndef _LINUX_PCI_P2PDMA_H
> +#define _LINUX_PCI_P2PDMA_H
> +
> +#include <linux/pci.h>
> +
> +struct block_device;
> +struct scatterlist;
> +
> +#ifdef CONFIG_PCI_P2PDMA
> +int pci_p2pdma_add_resource(struct pci_dev *pdev, int bar, size_t size,
> +		u64 offset);
> +int pci_p2pdma_add_client(struct list_head *head, struct device *dev);
> +void pci_p2pdma_remove_client(struct list_head *head, struct device *dev);
> +void pci_p2pdma_client_list_free(struct list_head *head);
> +int pci_p2pdma_distance(struct pci_dev *provider, struct list_head *clients,
> +			bool verbose);
> +bool pci_p2pdma_assign_provider(struct pci_dev *provider,
> +				struct list_head *clients);
> +bool pci_has_p2pmem(struct pci_dev *pdev);
> +struct pci_dev *pci_p2pmem_find(struct list_head *clients);
> +void *pci_alloc_p2pmem(struct pci_dev *pdev, size_t size);
> +void pci_free_p2pmem(struct pci_dev *pdev, void *addr, size_t size);
> +pci_bus_addr_t pci_p2pmem_virt_to_bus(struct pci_dev *pdev, void *addr);
> +struct scatterlist *pci_p2pmem_alloc_sgl(struct pci_dev *pdev,
> +					 unsigned int *nents, u32 length);
> +void pci_p2pmem_free_sgl(struct pci_dev *pdev, struct scatterlist *sgl);
> +void pci_p2pmem_publish(struct pci_dev *pdev, bool publish);
> +#else /* CONFIG_PCI_P2PDMA */
> +static inline int pci_p2pdma_add_resource(struct pci_dev *pdev, int bar,
> +		size_t size, u64 offset)
> +{
> +	return -EOPNOTSUPP;
> +}
> +static inline int pci_p2pdma_add_client(struct list_head *head,
> +		struct device *dev)
> +{
> +	return 0;
> +}
> +static inline void pci_p2pdma_remove_client(struct list_head *head,
> +		struct device *dev)
> +{
> +}
> +static inline void pci_p2pdma_client_list_free(struct list_head *head)
> +{
> +}
> +static inline int pci_p2pdma_distance(struct pci_dev *provider,
> +				      struct list_head *clients,
> +				      bool verbose)
> +{
> +	return -1;
> +}
> +static inline bool pci_p2pdma_assign_provider(struct pci_dev *provider,
> +					      struct list_head *clients)
> +{
> +	return false;
> +}
> +static inline bool pci_has_p2pmem(struct pci_dev *pdev)
> +{
> +	return false;
> +}
> +static inline struct pci_dev *pci_p2pmem_find(struct list_head *clients)
> +{
> +	return NULL;
> +}
> +static inline void *pci_alloc_p2pmem(struct pci_dev *pdev, size_t size)
> +{
> +	return NULL;
> +}
> +static inline void pci_free_p2pmem(struct pci_dev *pdev, void *addr,
> +		size_t size)
> +{
> +}
> +static inline pci_bus_addr_t pci_p2pmem_virt_to_bus(struct pci_dev *pdev,
> +						    void *addr)
> +{
> +	return 0;
> +}
> +static inline struct scatterlist *pci_p2pmem_alloc_sgl(struct pci_dev *pdev,
> +		unsigned int *nents, u32 length)
> +{
> +	return NULL;
> +}
> +static inline void pci_p2pmem_free_sgl(struct pci_dev *pdev,
> +		struct scatterlist *sgl)
> +{
> +}
> +static inline void pci_p2pmem_publish(struct pci_dev *pdev, bool publish)
> +{
> +}
> +#endif /* CONFIG_PCI_P2PDMA */
> +#endif /* _LINUX_PCI_P2PDMA_H */
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index e72ca8dd6241..5d95dbf21f4a 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -281,6 +281,7 @@ struct pcie_link_state;
>   struct pci_vpd;
>   struct pci_sriov;
>   struct pci_ats;
> +struct pci_p2pdma;
>   
>   /* The pci_dev structure describes PCI devices */
>   struct pci_dev {
> @@ -439,6 +440,9 @@ struct pci_dev {
>   #ifdef CONFIG_PCI_PASID
>   	u16		pasid_features;
>   #endif
> +#ifdef CONFIG_PCI_P2PDMA
> +	struct pci_p2pdma *p2pdma;
> +#endif
>   	phys_addr_t	rom;		/* Physical address if not from BAR */
>   	size_t		romlen;		/* Length if not from BAR */
>   	char		*driver_override; /* Driver name to force a match */

^ permalink raw reply	[flat|nested] 265+ messages in thread

* Re: [PATCH v5 06/13] PCI/P2PDMA: Add P2P DMA driver writer's documentation
  2018-08-30 18:53   ` Logan Gunthorpe
                       ` (2 preceding siblings ...)
  (?)
@ 2018-08-31  8:08     ` Christian König
  -1 siblings, 0 replies; 265+ messages in thread
From: Christian König @ 2018-08-31  8:08 UTC (permalink / raw)
  To: Logan Gunthorpe, linux-kernel, linux-pci, linux-nvme, linux-rdma,
	linux-nvdimm, linux-block
  Cc: Jonathan Corbet, Benjamin Herrenschmidt, Alex Williamson,
	Jérôme Glisse, Jason Gunthorpe, Bjorn Helgaas,
	Max Gurtovoy, Christoph Hellwig

On 30.08.2018 at 20:53, Logan Gunthorpe wrote:
> [SNIP]
> +============================
> +PCI Peer-to-Peer DMA Support
> +============================
> +
> +The PCI bus has pretty decent support for performing DMA transfers
> +between two devices on the bus. This type of transaction is henceforth
> +called Peer-to-Peer (or P2P). However, there are a number of issues that
> +make P2P transactions tricky to do in a perfectly safe way.
> +
> +One of the biggest issues is that PCI doesn't require forwarding
> +transactions between hierarchy domains, and in PCIe, each Root Port
> +defines a separate hierarchy domain. To make things worse, there is no
> +simple way to determine if a given Root Complex supports this or not.
> +(See PCIe r4.0, sec 1.3.1). Therefore, as of this writing, the kernel
> +only supports doing P2P when the endpoints involved are all behind the
> +same PCI bridge, as such devices are all in the same PCI hierarchy
> +domain, and the spec guarantees that all transactions within the
> +hierarchy will be routable, but it does not require routing
> +between hierarchies.

Can we add a kernel command line switch and a whitelist to enable P2P 
between separate hierarchies?

At least all newer AMD chipsets support this and I'm pretty sure that 
Intel has a list with PCI-IDs of the root hubs for this as well.

Regards,
Christian.
_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm



* Re: [PATCH v5 13/13] nvmet: Optionally use PCI P2P memory
  2018-08-31  0:25     ` Sagi Grimberg
                         ` (2 preceding siblings ...)
  (?)
@ 2018-08-31 15:41       ` Logan Gunthorpe
  -1 siblings, 0 replies; 265+ messages in thread
From: Logan Gunthorpe @ 2018-08-31 15:41 UTC (permalink / raw)
  To: Sagi Grimberg, linux-kernel, linux-pci, linux-nvme, linux-rdma,
	linux-nvdimm, linux-block
  Cc: Christian König, Benjamin Herrenschmidt, Steve Wise,
	Alex Williamson, Jérôme Glisse, Jason Gunthorpe,
	Bjorn Helgaas, Max Gurtovoy, Christoph Hellwig

Thanks, for the review.

On 30/08/18 06:25 PM, Sagi Grimberg wrote:
>> +	if (req->port->p2p_dev) {
>> +		if (!pci_p2pdma_assign_provider(req->port->p2p_dev,
>> +						&ctrl->p2p_clients)) {
>> +			pr_info("peer-to-peer memory on %s is not supported\n",
>> +				pci_name(req->port->p2p_dev));
>> +			goto free_devices;
>> +		}
>> +		ctrl->p2p_dev = pci_dev_get(req->port->p2p_dev);
>> +	} else {
> 
> When is port->p2p_dev == NULL? a little more documentation would help 
> here...

In the configfs functions, if the user enables P2P (port->use_p2pmem)
with 'auto' or 'y', port->p2p_dev will be NULL. If the user names a
specific p2p_dev to use, port->p2p_dev will be set to that device. I can
add a couple of comments in the next version.
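
The policy can be sketched in a few lines of plain C (hypothetical
structures and names, not the actual nvmet code; the real configfs
attribute parsing is omitted):

```c
#include <assert.h>
#include <stddef.h>

/* use_p2pmem may be off ('n'), automatic ('auto'/'y'), or a named
 * device; p2p_dev stays NULL unless a specific device was named. */
enum use_p2pmem { P2P_OFF, P2P_AUTO, P2P_NAMED };

struct port_cfg {
	enum use_p2pmem use_p2pmem;
	const char *p2p_dev;	/* NULL unless a device was named */
};

/* Returns the provider to bind: the named device if one was given,
 * otherwise NULL, meaning "pick the closest supported provider". */
static const char *port_p2p_provider(const struct port_cfg *port)
{
	if (port->use_p2pmem == P2P_OFF)
		return NULL;
	return port->p2p_dev;	/* NULL for 'auto'/'y' */
}
```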

Logan

_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

^ permalink raw reply	[flat|nested] 265+ messages in thread

* Re: [PATCH v5 06/13] PCI/P2PDMA: Add P2P DMA driver writer's documentation
  2018-08-31  0:34     ` Randy Dunlap
                         ` (2 preceding siblings ...)
  (?)
@ 2018-08-31 15:44       ` Logan Gunthorpe
  -1 siblings, 0 replies; 265+ messages in thread
From: Logan Gunthorpe @ 2018-08-31 15:44 UTC (permalink / raw)
  To: Randy Dunlap, linux-kernel, linux-pci, linux-nvme, linux-rdma,
	linux-nvdimm, linux-block
  Cc: Christian König, Benjamin Herrenschmidt, Jonathan Corbet,
	Alex Williamson, Jérôme Glisse, Jason Gunthorpe,
	Bjorn Helgaas, Max Gurtovoy, Christoph Hellwig

Hey,

Thanks for the review. I'll make the fixes for the next version.

On 30/08/18 06:34 PM, Randy Dunlap wrote:
>> +With the client list in hand, the orchestrator may then call
>> +:c:func:`pci_p2pmem_find()` to obtain a published P2P memory provider
>> +that is supported (behind the same root port) as all the clients. If more
>> +than one provider is supported, the one nearest to all the clients will
>> +be chosen first. If more than one provider is an equal distance
>> +away, the one returned will be chosen at random. This function returns the PCI
> 
> random or just arbitrarily?

Randomly. See pci_p2pmem_find() in patch 1. We use prandom_u32_max() to
select any of the supported devices.
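
Roughly, the selection works like this toy userspace model (illustrative
names, not the real API; plain rand() stands in for the kernel's
prandom_u32_max()):

```c
#include <assert.h>
#include <stdlib.h>

/* Among candidate providers, keep those at the minimum distance and
 * return the index of a random one of them. */
static int pick_provider(const int *dist, int n)
{
	int min = dist[0], count = 0, target, i;

	for (i = 1; i < n; i++)
		if (dist[i] < min)
			min = dist[i];
	for (i = 0; i < n; i++)
		if (dist[i] == min)
			count++;
	target = rand() % count;	/* prandom_u32_max(count) in-kernel */
	for (i = 0; i < n; i++)
		if (dist[i] == min && target-- == 0)
			return i;
	return -1;
}
```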

Logan

^ permalink raw reply	[flat|nested] 265+ messages in thread

* Re: [PATCH v5 01/13] PCI/P2PDMA: Support peer-to-peer memory
  2018-08-31  8:04     ` Christian König
                         ` (2 preceding siblings ...)
  (?)
@ 2018-08-31 15:48       ` Logan Gunthorpe
  -1 siblings, 0 replies; 265+ messages in thread
From: Logan Gunthorpe @ 2018-08-31 15:48 UTC (permalink / raw)
  To: Christian König, linux-kernel, linux-pci, linux-nvme,
	linux-rdma, linux-nvdimm, linux-block
  Cc: Benjamin Herrenschmidt, Alex Williamson, Jérôme Glisse,
	Jason Gunthorpe, Bjorn Helgaas, Max Gurtovoy, Christoph Hellwig



On 31/08/18 02:04 AM, Christian König wrote:
> Am 30.08.2018 um 20:53 schrieb Logan Gunthorpe:
>> Some PCI devices may have memory mapped in a BAR space that's
>> intended for use in peer-to-peer transactions. In order to enable
>> such transactions the memory must be registered with ZONE_DEVICE pages
>> so it can be used by DMA interfaces in existing drivers.
> 
> We want to use that feature without ZONE_DEVICE pages for DMA-buf as well.
> 
> How hard would it be to separate enabling P2P detection (e.g. distance 
> between two devices) from this?

Pretty easy. P2P detection is pretty much just pci_p2pdma_distance(),
which has nothing to do with the ZONE_DEVICE support.

(And the distance function makes use of a number of static functions
which could be combined into a simpler interface, should we need it.)

Logan

^ permalink raw reply	[flat|nested] 265+ messages in thread

* Re: [PATCH v5 06/13] PCI/P2PDMA: Add P2P DMA driver writer's documentation
  2018-08-31  8:08     ` Christian König
                         ` (2 preceding siblings ...)
  (?)
@ 2018-08-31 15:51       ` Logan Gunthorpe
  -1 siblings, 0 replies; 265+ messages in thread
From: Logan Gunthorpe @ 2018-08-31 15:51 UTC (permalink / raw)
  To: Christian König, linux-kernel, linux-pci, linux-nvme,
	linux-rdma, linux-nvdimm, linux-block
  Cc: Jonathan Corbet, Benjamin Herrenschmidt, Alex Williamson,
	Jérôme Glisse, Jason Gunthorpe, Bjorn Helgaas,
	Max Gurtovoy, Christoph Hellwig



On 31/08/18 02:08 AM, Christian König wrote:
>> +One of the biggest issues is that PCI doesn't require forwarding
>> +transactions between hierarchy domains, and in PCIe, each Root Port
>> +defines a separate hierarchy domain. To make things worse, there is no
>> +simple way to determine if a given Root Complex supports this or not.
>> +(See PCIe r4.0, sec 1.3.1). Therefore, as of this writing, the kernel
>> +only supports doing P2P when the endpoints involved are all behind the
>> +same PCI bridge, as such devices are all in the same PCI hierarchy
>> +domain, and the spec guarantees that all transactions within the
>> +hierarchy will be routable, but it does not require routing
>> +between hierarchies.
> 
> Can we add a kernel command line switch and a whitelist to enable P2P 
> between separate hierarchies?

In future work, yes. But not for this patchset. This is definitely the
way I see things going, but we've chosen to start with what we've presented.

Logan

^ permalink raw reply	[flat|nested] 265+ messages in thread

* Re: [PATCH v5 01/13] PCI/P2PDMA: Support peer-to-peer memory
  2018-08-30 18:53   ` Logan Gunthorpe
                       ` (2 preceding siblings ...)
  (?)
@ 2018-08-31 16:19     ` Jonathan Cameron
  -1 siblings, 0 replies; 265+ messages in thread
From: Jonathan Cameron @ 2018-08-31 16:19 UTC (permalink / raw)
  To: Logan Gunthorpe
  Cc: Alex Williamson, linux-nvdimm, linux-rdma, linux-pci,
	linux-kernel, linux-nvme, linux-block, Jérôme Glisse,
	Jason Gunthorpe, Christian König, Benjamin Herrenschmidt,
	Bjorn Helgaas, Max Gurtovoy, Christoph Hellwig

On Thu, 30 Aug 2018 12:53:40 -0600
Logan Gunthorpe <logang@deltatee.com> wrote:

> Some PCI devices may have memory mapped in a BAR space that's
> intended for use in peer-to-peer transactions. In order to enable
> such transactions the memory must be registered with ZONE_DEVICE pages
> so it can be used by DMA interfaces in existing drivers.
> 
> Add an interface for other subsystems to find and allocate chunks of P2P
> memory as necessary to facilitate transfers between two PCI peers:
> 
> int pci_p2pdma_add_client();
> struct pci_dev *pci_p2pmem_find();
> void *pci_alloc_p2pmem();
> 
> The new interface requires a driver to collect a list of client devices
> involved in the transaction with the pci_p2pmem_add_client*() functions
> then call pci_p2pmem_find() to obtain any suitable P2P memory. Once
> this is done the list is bound to the memory and the calling driver is
> free to add and remove clients as necessary (adding incompatible clients
> will fail). With a suitable p2pmem device, memory can then be
> allocated with pci_alloc_p2pmem() for use in DMA transactions.
> 
> Depending on hardware, using peer-to-peer memory may reduce the bandwidth
> of the transfer but can significantly reduce pressure on system memory.
> This may be desirable in many cases: for example a system could be designed
> with a small CPU connected to a PCIe switch by a small number of lanes
> which would maximize the number of lanes available to connect to NVMe
> devices.
> 
> The code is designed to only utilize the p2pmem device if all the devices
> involved in a transfer are behind the same PCI bridge. This is because we
> have no way of knowing whether peer-to-peer routing between PCIe Root Ports
> is supported (PCIe r4.0, sec 1.3.1). Additionally, the benefits of P2P
> transfers that go through the RC are limited to only reducing DRAM usage
> and, in some cases, coding convenience. The PCI-SIG may be exploring
> adding a new capability bit to advertise whether this is possible for
> future hardware.
> 
> This commit includes significant rework and feedback from Christoph
> Hellwig.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> Signed-off-by: Logan Gunthorpe <logang@deltatee.com>

Apologies for being a late entrant to this conversation so I may be asking
about a topic that has been covered in detail in earlier patches!
> ---
...

> +/*
> + * Find the distance through the nearest common upstream bridge between
> + * two PCI devices.
> + *
> + * If the two devices are the same device then 0 will be returned.
> + *
> + * If there are two virtual functions of the same device behind the same
> + * bridge port then 2 will be returned (one step down to the PCIe switch,
> + * then one step back to the same device).
> + *
> + * In the case where two devices are connected to the same PCIe switch, the
> + * value 4 will be returned. This corresponds to the following PCI tree:
> + *
> + *     -+  Root Port
> + *      \+ Switch Upstream Port
> + *       +-+ Switch Downstream Port
> + *       + \- Device A
> + *       \-+ Switch Downstream Port
> + *         \- Device B
> + *
> + * The distance is 4 because we traverse from Device A through the downstream
> + * port of the switch, to the common upstream port, back up to the second
> + * downstream port and then to Device B.
> + *
> + * Any two devices that don't have a common upstream bridge will return -1.
> + * In this way devices on separate PCIe root ports will be rejected, which
> + * is what we want for peer-to-peer since each PCIe root port defines a
> + * separate hierarchy domain and there's no way to determine whether the root
> + * complex supports forwarding between them.
> + *
> + * In the case where two devices are connected to different PCIe switches,
> + * this function will still return a positive distance as long as both
> + * switches eventually have a common upstream bridge. Note this covers
> + * the case of using multiple PCIe switches to achieve a desired level of
> + * fan-out from a root port. The exact distance will be a function of the
> + * number of switches between Device A and Device B.

This feels like a somewhat simplistic starting point rather than a
generally correct estimate to use.  Should we be taking the bandwidth of
those links into account for example, or any discoverable latencies?
Not all PCIe switches are alike - particularly when it comes to P2P.

I guess that can be a topic for future development if it turns out people
have horrible mixed systems.

> + *
> + * If a bridge which has any ACS redirection bits set is in the path
> + * then this function will return -2. This is so we reject any
> + * cases where the TLPs are forwarded up into the root complex.
> + * In this case, a list of all infringing bridge addresses will be
> + * populated in acs_list (assuming it's non-null) for printk purposes.
> + */

_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

^ permalink raw reply	[flat|nested] 265+ messages in thread

* Re: [PATCH v5 01/13] PCI/P2PDMA: Support peer-to-peer memory
  2018-08-31 16:19     ` Jonathan Cameron
@ 2018-08-31 16:26       ` Logan Gunthorpe
  -1 siblings, 0 replies; 265+ messages in thread
From: Logan Gunthorpe @ 2018-08-31 16:26 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: Alex Williamson, linux-nvdimm, linux-rdma, linux-pci,
	linux-kernel, linux-nvme, linux-block, Jérôme Glisse,
	Jason Gunthorpe, Christian König, Benjamin Herrenschmidt,
	Bjorn Helgaas, Max Gurtovoy, Christoph Hellwig



On 31/08/18 10:19 AM, Jonathan Cameron wrote:
> This feels like a somewhat simplistic starting point rather than a
> generally correct estimate to use.  Should we be taking the bandwidth of
> those links into account for example, or any discoverable latencies?
> Not all PCIe switches are alike - particularly when it comes to P2P.

I don't think this is necessary. There won't typically be a ton of
choice in terms of devices to use and if there is, the hardware will
probably be fairly homogeneous. For example, it would be unusual to have
an NVMe drive on an x4 link and another on an x8, and mixing, say, Gen3
switches with Gen4 would also be very strange. In unusual cases
like this, where the user specifically wants to use a faster device, they
can specify that device in the configfs interface.

I think the latency would probably be proportional to the distance which
is what we are already using.

> I guess that can be a topic for future development if it turns out people
> have horrible mixed systems.

Yup!

Logan

^ permalink raw reply	[flat|nested] 265+ messages in thread


* Re: [PATCH v5 06/13] PCI/P2PDMA: Add P2P DMA driver writer's documentation
  2018-08-31 15:51       ` Logan Gunthorpe
                           ` (2 preceding siblings ...)
  (?)
@ 2018-08-31 17:38         ` Christian König
  -1 siblings, 0 replies; 265+ messages in thread
From: Christian König @ 2018-08-31 17:38 UTC (permalink / raw)
  To: Logan Gunthorpe, linux-kernel, linux-pci, linux-nvme, linux-rdma,
	linux-nvdimm, linux-block
  Cc: Jonathan Corbet, Benjamin Herrenschmidt, Alex Williamson,
	Jérôme Glisse, Jason Gunthorpe, Bjorn Helgaas,
	Max Gurtovoy, Christoph Hellwig

Am 31.08.2018 um 17:51 schrieb Logan Gunthorpe:
>
> On 31/08/18 02:08 AM, Christian König wrote:
>>> +One of the biggest issues is that PCI doesn't require forwarding
>>> +transactions between hierarchy domains, and in PCIe, each Root Port
>>> +defines a separate hierarchy domain. To make things worse, there is no
>>> +simple way to determine if a given Root Complex supports this or not.
>>> +(See PCIe r4.0, sec 1.3.1). Therefore, as of this writing, the kernel
>>> +only supports doing P2P when the endpoints involved are all behind the
>>> +same PCI bridge, as such devices are all in the same PCI hierarchy
>>> +domain, and the spec guarantees that all transactions within the
>>> +hierarchy will be routable, but it does not require routing
>>> +between hierarchies.
>> Can we add a kernel command line switch and a whitelist to enable P2P
>> between separate hierarchies?
> In future work, yes. But not for this patchset. This is definitely the
> way I see things going, but we've chosen to start with what we've presented.

Sounds like a plan to me.

If you can separate out adding the detection, I can take a look at
adding this with my DMA-buf P2P efforts.

Christian.

>
> Logan

_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

^ permalink raw reply	[flat|nested] 265+ messages in thread

* Re: [PATCH v5 06/13] PCI/P2PDMA: Add P2P DMA driver writer's documentation
  2018-08-31 17:38         ` Christian König
                             ` (2 preceding siblings ...)
  (?)
@ 2018-08-31 19:11           ` Logan Gunthorpe
  -1 siblings, 0 replies; 265+ messages in thread
From: Logan Gunthorpe @ 2018-08-31 19:11 UTC (permalink / raw)
  To: Christian König, linux-kernel, linux-pci, linux-nvme,
	linux-rdma, linux-nvdimm, linux-block
  Cc: Jonathan Corbet, Benjamin Herrenschmidt, Alex Williamson,
	Jérôme Glisse, Jason Gunthorpe, Bjorn Helgaas,
	Max Gurtovoy, Christoph Hellwig



On 31/08/18 11:38 AM, Christian König wrote:
> If you can separate out adding the detection I can take a look adding 
> this with my DMA-buf P2P efforts.

Oh, maybe my previous email wasn't clear, but I'd say that detection is
already separate from ZONE_DEVICE. Nothing really needs to be changed.
I just think you'll probably want to write your own function similar
to pci_p2pdma_distance() that perhaps just takes two pci_devs instead of
the list of clients needed by nvme-of-like users.

To enable a whitelist we just have to handle the case where
upstream_bridge_distance() returns -1 and check if the devices are in
the same root complex with supported root ports before deciding the
transaction is not supported.

Logan

^ permalink raw reply	[flat|nested] 265+ messages in thread

* Re: [PATCH v5 01/13] PCI/P2PDMA: Support peer-to-peer memory
  2018-08-31 15:48       ` Logan Gunthorpe
  (?)
  (?)
@ 2018-09-01  8:27         ` Christoph Hellwig
  -1 siblings, 0 replies; 265+ messages in thread
From: Christoph Hellwig @ 2018-09-01  8:27 UTC (permalink / raw)
  To: Logan Gunthorpe
  Cc: Alex Williamson, linux-nvdimm, linux-rdma, linux-pci,
	linux-kernel, linux-nvme, linux-block, Jérôme Glisse,
	Jason Gunthorpe, Benjamin Herrenschmidt, Bjorn Helgaas,
	Max Gurtovoy, Christoph Hellwig, Christian König

On Fri, Aug 31, 2018 at 09:48:40AM -0600, Logan Gunthorpe wrote:
> Pretty easy. P2P detection is pretty much just pci_p2pdma_distance(),
> which has nothing to do with the ZONE_DEVICE support.
> 
> (And the distance function makes use of a number of static functions
> which could be combined into a simpler interface, should we need it.)

I'd say let's get things merged as-is, so that we can review the
non-ZONE_DEVICE users.  I'm a little curious how that is going to work,
so having it as a full series would be useful.

^ permalink raw reply	[flat|nested] 265+ messages in thread

* Re: [PATCH v5 07/13] block: Add PCI P2P flag for request queue and check support for requests
  2018-08-30 19:11     ` Jens Axboe
                         ` (2 preceding siblings ...)
  (?)
@ 2018-09-01  8:28       ` Christoph Hellwig
  -1 siblings, 0 replies; 265+ messages in thread
From: Christoph Hellwig @ 2018-09-01  8:28 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Alex Williamson, linux-nvdimm, linux-rdma, linux-pci,
	linux-kernel, linux-nvme, linux-block, Jérôme Glisse,
	Jason Gunthorpe, Christian König, Benjamin Herrenschmidt,
	Bjorn Helgaas, Max Gurtovoy, Christoph Hellwig

On Thu, Aug 30, 2018 at 01:11:18PM -0600, Jens Axboe wrote:
> I think this belongs in the caller - both the validity check, and
> passing in NOMERGE for this type of request. I don't want to impose
> this overhead on everything, for a pretty niche case.

It is just a single branch, which will be predicted as not taken
for non-P2P users.  The benefit is that we get proper error checking
by doing it in the block code.

^ permalink raw reply	[flat|nested] 265+ messages in thread

* Re: [PATCH v5 07/13] block: Add PCI P2P flag for request queue and check support for requests
  2018-09-01  8:28       ` Christoph Hellwig
                           ` (2 preceding siblings ...)
  (?)
@ 2018-09-03 22:26         ` Logan Gunthorpe
  -1 siblings, 0 replies; 265+ messages in thread
From: Logan Gunthorpe @ 2018-09-03 22:26 UTC (permalink / raw)
  To: Christoph Hellwig, Jens Axboe
  Cc: Alex Williamson, linux-nvdimm, linux-rdma, linux-pci,
	linux-kernel, linux-nvme, linux-block, Jérôme Glisse,
	Jason Gunthorpe, Benjamin Herrenschmidt, Bjorn Helgaas,
	Max Gurtovoy, Christian König



On 01/09/18 02:28 AM, Christoph Hellwig wrote:
> On Thu, Aug 30, 2018 at 01:11:18PM -0600, Jens Axboe wrote:
>> I think this belongs in the caller - both the validity check, and
>> passing in NOMERGE for this type of request. I don't want to impose
>> this overhead on everything, for a pretty niche case.
> 
> It is just a single branch, which will be predicted as not taken
> for non-P2P users.  The benefit is that we get proper error checking
> by doing it in the block code.

I personally agree with Christoph. But if there's consensus in the other
direction, or this is a real blocker to moving this forward, I can remove
it for the next version.

Logan

^ permalink raw reply	[flat|nested] 265+ messages in thread

* Re: [PATCH v5 10/13] nvme-pci: Add support for P2P memory in requests
  2018-08-30 18:53   ` Logan Gunthorpe
  (?)
  (?)
@ 2018-09-04 15:16     ` Jason Gunthorpe
  -1 siblings, 0 replies; 265+ messages in thread
From: Jason Gunthorpe @ 2018-09-04 15:16 UTC (permalink / raw)
  To: Logan Gunthorpe
  Cc: Alex Williamson, linux-nvdimm, linux-rdma, linux-pci,
	linux-kernel, linux-nvme, linux-block, Jérôme Glisse,
	Christian König, Benjamin Herrenschmidt, Bjorn Helgaas,
	Max Gurtovoy, Christoph Hellwig

On Thu, Aug 30, 2018 at 12:53:49PM -0600, Logan Gunthorpe wrote:
> For P2P requests, we must use the pci_p2pmem_map_sg() function
> instead of the dma_map_sg functions.
> 
> With that, we can then indicate PCI_P2P support in the request queue.
> For this, we create an NVME_F_PCI_P2P flag which tells the core to
> set QUEUE_FLAG_PCI_P2P in the request queue.
> 
> Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
> Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
>  drivers/nvme/host/core.c |  4 ++++
>  drivers/nvme/host/nvme.h |  1 +
>  drivers/nvme/host/pci.c  | 17 +++++++++++++----
>  3 files changed, 18 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index dd8ec1dd9219..6033ce2fd3e9 100644
> +++ b/drivers/nvme/host/core.c
> @@ -3051,7 +3051,11 @@ static void nvme_alloc_ns(struct nvme_ctrl *ctrl, unsigned nsid)
>  	ns->queue = blk_mq_init_queue(ctrl->tagset);
>  	if (IS_ERR(ns->queue))
>  		goto out_free_ns;
> +
>  	blk_queue_flag_set(QUEUE_FLAG_NONROT, ns->queue);
> +	if (ctrl->ops->flags & NVME_F_PCI_P2PDMA)
> +		blk_queue_flag_set(QUEUE_FLAG_PCI_P2PDMA, ns->queue);
> +
>  	ns->queue->queuedata = ns;
>  	ns->ctrl = ctrl;
>  
> diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
> index bb4a2003c097..4030743c90aa 100644
> +++ b/drivers/nvme/host/nvme.h
> @@ -343,6 +343,7 @@ struct nvme_ctrl_ops {
>  	unsigned int flags;
>  #define NVME_F_FABRICS			(1 << 0)
>  #define NVME_F_METADATA_SUPPORTED	(1 << 1)
> +#define NVME_F_PCI_P2PDMA		(1 << 2)
>  	int (*reg_read32)(struct nvme_ctrl *ctrl, u32 off, u32 *val);
>  	int (*reg_write32)(struct nvme_ctrl *ctrl, u32 off, u32 val);
>  	int (*reg_read64)(struct nvme_ctrl *ctrl, u32 off, u64 *val);
> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> index 2902585c6ddf..bb2120d30e39 100644
> +++ b/drivers/nvme/host/pci.c
> @@ -737,8 +737,13 @@ static blk_status_t nvme_map_data(struct nvme_dev *dev, struct request *req,
>  		goto out;
>  
>  	ret = BLK_STS_RESOURCE;
> -	nr_mapped = dma_map_sg_attrs(dev->dev, iod->sg, iod->nents, dma_dir,
> -			DMA_ATTR_NO_WARN);
> +
> +	if (is_pci_p2pdma_page(sg_page(iod->sg)))
> +		nr_mapped = pci_p2pdma_map_sg(dev->dev, iod->sg, iod->nents,
> +					  dma_dir);
> +	else
> +		nr_mapped = dma_map_sg_attrs(dev->dev, iod->sg, iod->nents,
> +					     dma_dir,  DMA_ATTR_NO_WARN);
>  	if (!nr_mapped)
>  		goto out;
>  
> @@ -780,7 +785,10 @@ static void nvme_unmap_data(struct nvme_dev *dev, struct request *req)
>  			DMA_TO_DEVICE : DMA_FROM_DEVICE;
>  
>  	if (iod->nents) {
> -		dma_unmap_sg(dev->dev, iod->sg, iod->nents, dma_dir);
> +		/* P2PDMA requests do not need to be unmapped */
> +		if (!is_pci_p2pdma_page(sg_page(iod->sg)))
> +			dma_unmap_sg(dev->dev, iod->sg, iod->nents, dma_dir);

This seems like a poor direction, if we add IOMMU hairpin support we
will need unmapping.

Jason
_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

^ permalink raw reply	[flat|nested] 265+ messages in thread

* Re: [PATCH v5 10/13] nvme-pci: Add support for P2P memory in requests
  2018-09-04 15:16     ` Jason Gunthorpe
                         ` (2 preceding siblings ...)
  (?)
@ 2018-09-04 15:47       ` Logan Gunthorpe
  -1 siblings, 0 replies; 265+ messages in thread
From: Logan Gunthorpe @ 2018-09-04 15:47 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Alex Williamson, linux-nvdimm, linux-rdma, linux-pci,
	linux-kernel, linux-nvme, linux-block, Jérôme Glisse,
	Christian König, Benjamin Herrenschmidt, Bjorn Helgaas,
	Max Gurtovoy, Christoph Hellwig



On 04/09/18 09:16 AM, Jason Gunthorpe wrote:
>>  	if (iod->nents) {
>> -		dma_unmap_sg(dev->dev, iod->sg, iod->nents, dma_dir);
>> +		/* P2PDMA requests do not need to be unmapped */
>> +		if (!is_pci_p2pdma_page(sg_page(iod->sg)))
>> +			dma_unmap_sg(dev->dev, iod->sg, iod->nents, dma_dir);
> 
> This seems like a poor direction, if we add IOMMU hairpin support we
> will need unmapping.

It can always be added later. In any case, you'll have to convince
Christoph who requested the change; I'm not that invested in this decision.

Logan

^ permalink raw reply	[flat|nested] 265+ messages in thread

* Re: [PATCH v5 10/13] nvme-pci: Add support for P2P memory in requests
  2018-09-04 15:47       ` Logan Gunthorpe
  (?)
  (?)
@ 2018-09-05 19:22         ` Christoph Hellwig
  -1 siblings, 0 replies; 265+ messages in thread
From: Christoph Hellwig @ 2018-09-05 19:22 UTC (permalink / raw)
  To: Logan Gunthorpe
  Cc: Alex Williamson, linux-nvdimm, linux-rdma, linux-pci,
	linux-kernel, linux-nvme, linux-block, Jérôme Glisse,
	Jason Gunthorpe, Christian König, Benjamin Herrenschmidt,
	Bjorn Helgaas, Max Gurtovoy, Christoph Hellwig

On Tue, Sep 04, 2018 at 09:47:07AM -0600, Logan Gunthorpe wrote:
> 
> 
> On 04/09/18 09:16 AM, Jason Gunthorpe wrote:
> >>  	if (iod->nents) {
> >> -		dma_unmap_sg(dev->dev, iod->sg, iod->nents, dma_dir);
> >> +		/* P2PDMA requests do not need to be unmapped */
> >> +		if (!is_pci_p2pdma_page(sg_page(iod->sg)))
> >> +			dma_unmap_sg(dev->dev, iod->sg, iod->nents, dma_dir);
> > 
> > This seems like a poor direction, if we add IOMMU hairpin support we
> > will need unmapping.
> 
> It can always be added later. In any case, you'll have to convince
> Christoph who requested the change; I'm not that invested in this decision.

Yes, no point to add dead code here.  In the long run we should
aim for hiding the p2p address translation behind the normal DMA API
anyway, but we're not quite ready for it yet.

^ permalink raw reply	[flat|nested] 265+ messages in thread

* Re: [PATCH v5 07/13] block: Add PCI P2P flag for request queue and check support for requests
  2018-09-03 22:26         ` Logan Gunthorpe
                             ` (2 preceding siblings ...)
  (?)
@ 2018-09-05 19:26           ` Jens Axboe
  -1 siblings, 0 replies; 265+ messages in thread
From: Jens Axboe @ 2018-09-05 19:26 UTC (permalink / raw)
  To: Logan Gunthorpe, Christoph Hellwig
  Cc: Alex Williamson, linux-nvdimm, linux-rdma, linux-pci,
	linux-kernel, linux-nvme, linux-block, Jérôme Glisse,
	Jason Gunthorpe, Benjamin Herrenschmidt, Bjorn Helgaas,
	Max Gurtovoy, Christian König

On 9/3/18 4:26 PM, Logan Gunthorpe wrote:
> 
> 
> On 01/09/18 02:28 AM, Christoph Hellwig wrote:
>> On Thu, Aug 30, 2018 at 01:11:18PM -0600, Jens Axboe wrote:
>>> I think this belongs in the caller - both the validity check, and
>>> passing in NOMERGE for this type of request. I don't want to impose
>>> this overhead on everything, for a pretty niche case.
>>
>> It is just a single branch, which will be predicted as not taken
>> for non-P2P users.  The benefit is that we get proper error checking
>> by doing it in the block code.
> 
> I personally agree with Christoph. But if there's consensus in the other
> direction or this is a real blocker moving this forward, I can remove it
> for the next version.

It's a simple branch because the check isn't exhaustive. It just checks
the first page. At that point you may as well just require the caller to
flag the bio/rq as being P2P, and then do a check for P2P compatibility
with the queue.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 265+ messages in thread

* Re: [PATCH v5 07/13] block: Add PCI P2P flag for request queue and check support for requests
@ 2018-09-05 19:26           ` Jens Axboe
  0 siblings, 0 replies; 265+ messages in thread
From: Jens Axboe @ 2018-09-05 19:26 UTC (permalink / raw)
  To: Logan Gunthorpe, Christoph Hellwig
  Cc: linux-kernel, linux-pci, linux-nvme, linux-rdma, linux-nvdimm,
	linux-block, Stephen Bates, Keith Busch, Sagi Grimberg,
	Bjorn Helgaas, Jason Gunthorpe, Max Gurtovoy, Dan Williams,
	Jérôme Glisse, Benjamin Herrenschmidt, Alex Williamson,
	Christian König

On 9/3/18 4:26 PM, Logan Gunthorpe wrote:
> 
> 
> On 01/09/18 02:28 AM, Christoph Hellwig wrote:
>> On Thu, Aug 30, 2018 at 01:11:18PM -0600, Jens Axboe wrote:
>>> I think this belongs in the caller - both the validity check, and
>>> passing in NOMERGE for this type of request. I don't want to impose
>>> this overhead on everything, for a pretty niche case.
>>
>> It is just a single branch, which will be predicted as not taken
>> for non-P2P users.  The benefit is that we get proper error checking
>> by doing it in the block code.
> 
> I personally agree with Christoph. But if there's consensus in the other
> direction or this is a real blocker moving this forward, I can remove it
> for the next version.

It's a simple branch because the check isn't exhaustive. It just checks
the first page. At that point you may as well just require the caller to
flag the bio/rq as being P2P, and then do a check for P2P compatibility
with the queue.

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 265+ messages in thread

* Re: [PATCH v5 07/13] block: Add PCI P2P flag for request queue and check support for requests
  2018-09-05 19:26           ` Jens Axboe
                               ` (2 preceding siblings ...)
  (?)
@ 2018-09-05 19:33             ` Logan Gunthorpe
  -1 siblings, 0 replies; 265+ messages in thread
From: Logan Gunthorpe @ 2018-09-05 19:33 UTC (permalink / raw)
  To: Jens Axboe, Christoph Hellwig
  Cc: Alex Williamson, linux-nvdimm, linux-rdma, linux-pci,
	linux-kernel, linux-nvme, linux-block, Jérôme Glisse,
	Jason Gunthorpe, Benjamin Herrenschmidt, Bjorn Helgaas,
	Max Gurtovoy, Christian König



On 05/09/18 01:26 PM, Jens Axboe wrote:
> On 9/3/18 4:26 PM, Logan Gunthorpe wrote:
>> I personally agree with Christoph. But if there's consensus in the other
>> direction or this is a real blocker moving this forward, I can remove it
>> for the next version.
> 
> It's a simple branch because the check isn't exhaustive. It just checks
> the first page. At that point you may as well just require the caller to
> flag the bio/rq as being P2P, and then do a check for P2P compatibility
> with the queue.

Hmm, we had something like that in v4[1] but it just seemed redundant to
create a flag when the information was already in the bio and kind of
ugly for the caller to check for, then set, the flag. I'm not _that_
averse to going back to that though...

Logan

[1]
https://lore.kernel.org/lkml/20180423233046.21476-8-logang@deltatee.com/T/#u

* Re: [PATCH v5 07/13] block: Add PCI P2P flag for request queue and check support for requests
  2018-09-05 19:33             ` Logan Gunthorpe
  (?)
  (?)
@ 2018-09-05 19:45               ` Jens Axboe
  -1 siblings, 0 replies; 265+ messages in thread
From: Jens Axboe @ 2018-09-05 19:45 UTC (permalink / raw)
  To: Logan Gunthorpe, Christoph Hellwig
  Cc: Alex Williamson, linux-nvdimm, linux-rdma, linux-pci,
	linux-kernel, linux-nvme, linux-block, Jérôme Glisse,
	Jason Gunthorpe, Benjamin Herrenschmidt, Bjorn Helgaas,
	Max Gurtovoy, Christian König

On 9/5/18 1:33 PM, Logan Gunthorpe wrote:
> 
> 
> On 05/09/18 01:26 PM, Jens Axboe wrote:
>> On 9/3/18 4:26 PM, Logan Gunthorpe wrote:
>>> I personally agree with Christoph. But if there's consensus in the other
>>> direction or this is a real blocker moving this forward, I can remove it
>>> for the next version.
>>
>> It's a simple branch because the check isn't exhaustive. It just checks
>> the first page. At that point you may as well just require the caller to
>> flag the bio/rq as being P2P, and then do a check for P2P compatibility
>> with the queue.
> 
> Hmm, we had something like that in v4[1] but it just seemed redundant to
> create a flag when the information was already in the bio and kind of
> ugly for the caller to check for, then set, the flag. I'm not _that_
> averse to going back to that though...

The point is that the caller doesn't necessarily know where the bio
will end up, hence the caller can't fully check if the whole stack
supports P2P.

What happens if a P2P request ends up with a driver that doesn't
support it?

-- 
Jens Axboe

* Re: [PATCH v5 07/13] block: Add PCI P2P flag for request queue and check support for requests
  2018-09-05 19:45               ` Jens Axboe
  (?)
  (?)
@ 2018-09-05 19:53                 ` Logan Gunthorpe
  -1 siblings, 0 replies; 265+ messages in thread
From: Logan Gunthorpe @ 2018-09-05 19:53 UTC (permalink / raw)
  To: Jens Axboe, Christoph Hellwig
  Cc: Alex Williamson, linux-nvdimm, linux-rdma, linux-pci,
	linux-kernel, linux-nvme, linux-block, Jérôme Glisse,
	Jason Gunthorpe, Benjamin Herrenschmidt, Bjorn Helgaas,
	Max Gurtovoy, Christian König



On 05/09/18 01:45 PM, Jens Axboe wrote:
> The point is that the caller doesn't necessarily know where the bio
> will end up, hence the caller can't fully check if the whole stack
> supports P2P.
> 
> What happens if a P2P request ends up with a driver that doesn't
> support it?

Yes, that's the whole point of this check. Although we expect the caller to
do other checks before submitting a P2P request to a queue, so if a
driver does submit a P2P request to an unsupported queue, it is
definitely a problem in the driver (which is why we want to WARN).

Queues that support P2P (only PCI NVMe at this time, see patch 10) must
set QUEUE_FLAG_PCI_P2PDMA to indicate it. The check we are adding in
blk-core is meant to ensure any broken drivers that submit requests with
P2P memory do not get sent to a queue that doesn't indicate support.

On top of that, the code in NVMe target ensures that all namespaces on a
port are backed by queues that support P2P and, if not, it never
allocates any P2P SGLs.

Logan

* Re: [PATCH v5 07/13] block: Add PCI P2P flag for request queue and check support for requests
  2018-09-05 19:56                 ` Christoph Hellwig
  (?)
  (?)
@ 2018-09-05 19:54                   ` Jens Axboe
  -1 siblings, 0 replies; 265+ messages in thread
From: Jens Axboe @ 2018-09-05 19:54 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Alex Williamson, linux-nvdimm, linux-rdma, linux-pci,
	linux-kernel, linux-nvme, linux-block, Jérôme Glisse,
	Jason Gunthorpe, Benjamin Herrenschmidt, Bjorn Helgaas,
	Max Gurtovoy, Christian König

On 9/5/18 1:56 PM, Christoph Hellwig wrote:
> On Wed, Sep 05, 2018 at 01:45:04PM -0600, Jens Axboe wrote:
>> The point is that the caller doesn't necessarily know where the bio
>> will end up, hence the caller can't fully check if the whole stack
>> supports P2P.
> 
> The caller must necessarily know where the bio will end up, as for P2P
> support we need to query if the bio target is P2P capable vs the
> source of the P2P memory.

Then what's the point of having the check at all?

-- 
Jens Axboe

* Re: [PATCH v5 07/13] block: Add PCI P2P flag for request queue and check support for requests
  2018-09-05 19:45               ` Jens Axboe
                                   ` (2 preceding siblings ...)
  (?)
@ 2018-09-05 19:56                 ` Christoph Hellwig
  -1 siblings, 0 replies; 265+ messages in thread
From: Christoph Hellwig @ 2018-09-05 19:56 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Alex Williamson, linux-nvdimm, linux-rdma, linux-pci,
	linux-kernel, linux-nvme, linux-block, Jérôme Glisse,
	Jason Gunthorpe, Christian König, Benjamin Herrenschmidt,
	Bjorn Helgaas, Max Gurtovoy, Christoph Hellwig

On Wed, Sep 05, 2018 at 01:45:04PM -0600, Jens Axboe wrote:
> The point is that the caller doesn't necessarily know where the bio
> will end up, hence the caller can't fully check if the whole stack
> supports P2P.

The caller must necessarily know where the bio will end up, as for P2P
support we need to query if the bio target is P2P capable vs the
source of the P2P memory.

* Re: [PATCH v5 07/13] block: Add PCI P2P flag for request queue and check support for requests
  2018-09-05 20:11                     ` Christoph Hellwig
                                         ` (2 preceding siblings ...)
  (?)
@ 2018-09-05 20:09                       ` Logan Gunthorpe
  -1 siblings, 0 replies; 265+ messages in thread
From: Logan Gunthorpe @ 2018-09-05 20:09 UTC (permalink / raw)
  To: Christoph Hellwig, Jens Axboe
  Cc: Alex Williamson, linux-nvdimm, linux-rdma, linux-pci,
	linux-kernel, linux-nvme, linux-block, Jérôme Glisse,
	Jason Gunthorpe, Benjamin Herrenschmidt, Bjorn Helgaas,
	Max Gurtovoy, Christian König



On 05/09/18 02:11 PM, Christoph Hellwig wrote:
> On Wed, Sep 05, 2018 at 01:54:31PM -0600, Jens Axboe wrote:
>> On 9/5/18 1:56 PM, Christoph Hellwig wrote:
>>> On Wed, Sep 05, 2018 at 01:45:04PM -0600, Jens Axboe wrote:
>>>> The point is that the caller doesn't necessarily know where the bio
>>>> will end up, hence the caller can't fully check if the whole stack
>>>> supports P2P.
>>>
>>> The caller must necessarily know where the bio will end up, as for P2P
>>> support we need to query if the bio target is P2P capable vs the
>>> source of the P2P memory.
>>
>> Then what's the point of having the check at all?
> 
> Just an additional little safeguard.  If you think it isn't worth
> it, I guess we can just drop it for now.

Yes, the point is to prevent driver writers from doing the wrong thing
by not doing the necessary checks before submitting to the queue.

Logan
_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm


* Re: [PATCH v5 07/13] block: Add PCI P2P flag for request queue and check support for requests
  2018-09-05 19:54                   ` Jens Axboe
                                       ` (2 preceding siblings ...)
  (?)
@ 2018-09-05 20:11                     ` Christoph Hellwig
  -1 siblings, 0 replies; 265+ messages in thread
From: Christoph Hellwig @ 2018-09-05 20:11 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Alex Williamson, linux-nvdimm, linux-rdma, linux-pci,
	linux-kernel, linux-nvme, linux-block, Jérôme Glisse,
	Jason Gunthorpe, Christian König, Benjamin Herrenschmidt,
	Bjorn Helgaas, Max Gurtovoy, Christoph Hellwig

On Wed, Sep 05, 2018 at 01:54:31PM -0600, Jens Axboe wrote:
> On 9/5/18 1:56 PM, Christoph Hellwig wrote:
> > On Wed, Sep 05, 2018 at 01:45:04PM -0600, Jens Axboe wrote:
> >> The point is that the caller doesn't necessarily know where the bio
> >> will end up, hence the caller can't fully check if the whole stack
> >> supports P2P.
> > 
> > The caller must necessarily know where the bio will end up, as for P2P
> > support we need to query if the bio target is P2P capable vs the
> > source of the P2P memory.
> 
> Then what's the point of having the check at all?

Just an additional little safeguard.  If you think it isn't worth
it, I guess we can just drop it for now.

* Re: [PATCH v5 07/13] block: Add PCI P2P flag for request queue and check support for requests
  2018-09-05 20:09                       ` Logan Gunthorpe
  (?)
@ 2018-09-05 20:14                         ` Jens Axboe
  -1 siblings, 0 replies; 265+ messages in thread
From: Jens Axboe @ 2018-09-05 20:14 UTC (permalink / raw)
  To: Logan Gunthorpe, Christoph Hellwig
  Cc: Alex Williamson, linux-nvdimm, linux-rdma, linux-pci,
	linux-kernel, linux-nvme, linux-block, Jérôme Glisse,
	Jason Gunthorpe, Benjamin Herrenschmidt, Bjorn Helgaas,
	Max Gurtovoy, Christian König

On 9/5/18 2:09 PM, Logan Gunthorpe wrote:
> 
> 
> On 05/09/18 02:11 PM, Christoph Hellwig wrote:
>> On Wed, Sep 05, 2018 at 01:54:31PM -0600, Jens Axboe wrote:
>>> On 9/5/18 1:56 PM, Christoph Hellwig wrote:
>>>> On Wed, Sep 05, 2018 at 01:45:04PM -0600, Jens Axboe wrote:
>>>>> The point is that the caller doesn't necessarily know where the bio
>>>>> will end up, hence the caller can't fully check if the whole stack
>>>>> supports P2P.
>>>>
>>>> The caller must necessarily know where the bio will end up, as for P2P
>>>> support we need to query if the bio target is P2P capable vs the
>>>> source of the P2P memory.
>>>
>>> Then what's the point of having the check at all?
>>
>> Just an additional little safeguard.  If you think it isn't worth
>> it, I guess we can just drop it for now.
> 
> Yes, the point is to prevent driver writers from doing the wrong thing
> by not doing the necessary checks before submitting to the queue.

But if the caller must absolutely know where the bio will end up, then
it seems super redundant. So I'd vote for killing this check, it buys
us absolutely nothing and isn't even exhaustive in its current form.

-- 
Jens Axboe


* Re: [PATCH v5 07/13] block: Add PCI P2P flag for request queue and check support for requests
  2018-09-05 20:14                         ` Jens Axboe
                                             ` (2 preceding siblings ...)
  (?)
@ 2018-09-05 20:18                           ` Logan Gunthorpe
  -1 siblings, 0 replies; 265+ messages in thread
From: Logan Gunthorpe @ 2018-09-05 20:18 UTC (permalink / raw)
  To: Jens Axboe, Christoph Hellwig
  Cc: Alex Williamson, linux-nvdimm, linux-rdma, linux-pci,
	linux-kernel, linux-nvme, linux-block, Jérôme Glisse,
	Jason Gunthorpe, Benjamin Herrenschmidt, Bjorn Helgaas,
	Max Gurtovoy, Christian König



On 05/09/18 02:14 PM, Jens Axboe wrote:
> But if the caller must absolutely know where the bio will end up, then
> it seems super redundant. So I'd vote for killing this check, it buys
> us absolutely nothing and isn't even exhaustive in its current form.


Ok, I'll remove it for v6.

Logan

* Re: [PATCH v5 07/13] block: Add PCI P2P flag for request queue and check support for requests
  2018-09-05 20:18                           ` Logan Gunthorpe
                                               ` (2 preceding siblings ...)
  (?)
@ 2018-09-05 20:19                             ` Jens Axboe
  -1 siblings, 0 replies; 265+ messages in thread
From: Jens Axboe @ 2018-09-05 20:19 UTC (permalink / raw)
  To: Logan Gunthorpe, Christoph Hellwig
  Cc: Alex Williamson, linux-nvdimm, linux-rdma, linux-pci,
	linux-kernel, linux-nvme, linux-block, Jérôme Glisse,
	Jason Gunthorpe, Benjamin Herrenschmidt, Bjorn Helgaas,
	Max Gurtovoy, Christian König

On 9/5/18 2:18 PM, Logan Gunthorpe wrote:
> 
> 
> On 05/09/18 02:14 PM, Jens Axboe wrote:
>> But if the caller must absolutely know where the bio will end up, then
>> it seems super redundant. So I'd vote for killing this check, it buys
>> us absolutely nothing and isn't even exhaustive in its current form.
> 
> 
> Ok, I'll remove it for v6.

Since the driver needs to know it's doing it right, it might not
hurt to add a sanity-check helper for that. Just have the driver
call it, and don't add it in the normal I/O submission path.

-- 
Jens Axboe


* Re: [PATCH v5 07/13] block: Add PCI P2P flag for request queue and check support for requests
  2018-09-05 20:19                             ` Jens Axboe
@ 2018-09-05 20:32                               ` Logan Gunthorpe
  -1 siblings, 0 replies; 265+ messages in thread
From: Logan Gunthorpe @ 2018-09-05 20:32 UTC (permalink / raw)
  To: Jens Axboe, Christoph Hellwig
  Cc: Alex Williamson, linux-nvdimm, linux-rdma, linux-pci,
	linux-kernel, linux-nvme, linux-block, Jérôme Glisse,
	Jason Gunthorpe, Benjamin Herrenschmidt, Bjorn Helgaas,
	Max Gurtovoy, Christian König



On 05/09/18 02:19 PM, Jens Axboe wrote:
> On 9/5/18 2:18 PM, Logan Gunthorpe wrote:
>>
>>
>> On 05/09/18 02:14 PM, Jens Axboe wrote:
>>> But if the caller must absolutely know where the bio will end up, then
>>> it seems super redundant. So I'd vote for killing this check, it buys
>>> us absolutely nothing and isn't even exhaustive in its current form.
>>
>>
>> Ok, I'll remove it for v6.
> 
> Since the driver needs to know it's doing it right, it might not
> hurt to add a sanity check helper for that. Just have the driver
> call it, and don't add it in the normal IO submission path.

I'm not sure I really see the value in that. It's the same principle in
asking the driver to do the WARN: if the developer knew enough to use
the special helper, they probably knew well enough to do the rest correctly.

I guess one other thing to point out is that, on x86, if a driver
submits P2P pages to a PCI device that doesn't have kernel support,
everything will likely just work. Even though the driver isn't doing any
of the work correctly and the requests are not being mapped with
pci_p2pdma_map() functions. Such code on other arches would likely
break. So developers may be lulled into thinking they're doing the
correct thing when in fact they are not and the WARN in the common code
would prevent that.

Logan
_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

^ permalink raw reply	[flat|nested] 265+ messages in thread

* Re: [PATCH v5 07/13] block: Add PCI P2P flag for request queue and check support for requests
  2018-09-05 20:32                               ` Logan Gunthorpe
@ 2018-09-05 20:36                                 ` Jens Axboe
  -1 siblings, 0 replies; 265+ messages in thread
From: Jens Axboe @ 2018-09-05 20:36 UTC (permalink / raw)
  To: Logan Gunthorpe, Christoph Hellwig
  Cc: Alex Williamson, linux-nvdimm, linux-rdma, linux-pci,
	linux-kernel, linux-nvme, linux-block, Jérôme Glisse,
	Jason Gunthorpe, Benjamin Herrenschmidt, Bjorn Helgaas,
	Max Gurtovoy, Christian König

On 9/5/18 2:32 PM, Logan Gunthorpe wrote:
> 
> 
> On 05/09/18 02:19 PM, Jens Axboe wrote:
>> On 9/5/18 2:18 PM, Logan Gunthorpe wrote:
>>>
>>>
>>> On 05/09/18 02:14 PM, Jens Axboe wrote:
>>>> But if the caller must absolutely know where the bio will end up, then
>>>> it seems super redundant. So I'd vote for killing this check, it buys
>>>> us absolutely nothing and isn't even exhaustive in its current form.
>>>
>>>
>>> Ok, I'll remove it for v6.
>>
>> Since the driver needs to know it's doing it right, it might not
>> hurt to add a sanity check helper for that. Just have the driver
>> call it, and don't add it in the normal IO submission path.
> 
> I'm not sure I really see the value in that. It's the same principle in
> asking the driver to do the WARN: if the developer knew enough to use
> the special helper, they probably knew well enough to do the rest correctly.

I don't agree with that at all. It's a "is my request valid" helper,
it's not some obscure and rarely used functionality. You're making up
this API right now, if you really want it done for every IO, make it
part of the p2p submission process. You could even hide it behind a
debug thing, if you like.

> I guess one other thing to point out is that, on x86, if a driver
> submits P2P pages to a PCI device that doesn't have kernel support,
> everything will likely just work. Even though the driver isn't doing any
> of the work correctly and the requests are not being mapped with
> pci_p2pdma_map() functions. Such code on other arches would likely
> break. So developers may be lulled into thinking they're doing the
> correct thing when in fact they are not and the WARN in the common code
> would prevent that.

If you're adamant about having it in common code, put it in your
common submission code. Most folks aren't going to care about P2P, let the
ones that do have the checks.

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 265+ messages in thread

* Re: [PATCH v5 07/13] block: Add PCI P2P flag for request queue and check support for requests
  2018-09-05 20:36                                 ` Jens Axboe
@ 2018-09-05 21:03                                   ` Logan Gunthorpe
  -1 siblings, 0 replies; 265+ messages in thread
From: Logan Gunthorpe @ 2018-09-05 21:03 UTC (permalink / raw)
  To: Jens Axboe, Christoph Hellwig
  Cc: Alex Williamson, linux-nvdimm, linux-rdma, linux-pci,
	linux-kernel, linux-nvme, linux-block, Jérôme Glisse,
	Jason Gunthorpe, Benjamin Herrenschmidt, Bjorn Helgaas,
	Max Gurtovoy, Christian König



On 05/09/18 02:36 PM, Jens Axboe wrote:
> On 9/5/18 2:32 PM, Logan Gunthorpe wrote:
>>
>>
>> On 05/09/18 02:19 PM, Jens Axboe wrote:
>>> On 9/5/18 2:18 PM, Logan Gunthorpe wrote:
>>>>
>>>>
>>>> On 05/09/18 02:14 PM, Jens Axboe wrote:
>>>>> But if the caller must absolutely know where the bio will end up, then
>>>>> it seems super redundant. So I'd vote for killing this check, it buys
>>>>> us absolutely nothing and isn't even exhaustive in its current form.
>>>>
>>>>
>>>> Ok, I'll remove it for v6.
>>>
>>> Since the driver needs to know it's doing it right, it might not
>>> hurt to add a sanity check helper for that. Just have the driver
>>> call it, and don't add it in the normal IO submission path.
>>
>> I'm not sure I really see the value in that. It's the same principle in
>> asking the driver to do the WARN: if the developer knew enough to use
>> the special helper, they probably knew well enough to do the rest correctly.
> 
> I don't agree with that at all. It's a "is my request valid" helper,
> it's not some obscure and rarely used functionality. You're making up
> this API right now, if you really want it done for every IO, make it
> part of the p2p submission process. You could even hide it behind a
> debug thing, if you like.

There is no special p2p submission process. In the nvme-of case we are
using the existing process, and with the code in blk-core its process
didn't change at all. Creating a helper will create one, and I can
look at making a pci_p2pdma_submit_bio() for v6; but if the developer
screws up and still calls the regular submit_bio(), things will only be
very subtly broken and that won't be obvious.

Logan

^ permalink raw reply	[flat|nested] 265+ messages in thread

* Re: [PATCH v5 07/13] block: Add PCI P2P flag for request queue and check support for requests
@ 2018-09-05 21:03                                   ` Logan Gunthorpe
  0 siblings, 0 replies; 265+ messages in thread
From: Logan Gunthorpe @ 2018-09-05 21:03 UTC (permalink / raw)
  To: Jens Axboe, Christoph Hellwig
  Cc: linux-kernel, linux-pci, linux-nvme, linux-rdma, linux-nvdimm,
	linux-block, Stephen Bates, Keith Busch, Sagi Grimberg,
	Bjorn Helgaas, Jason Gunthorpe, Max Gurtovoy, Dan Williams,
	Jérôme Glisse, Benjamin Herrenschmidt, Alex Williamson,
	Christian König



On 05/09/18 02:36 PM, Jens Axboe wrote:
> On 9/5/18 2:32 PM, Logan Gunthorpe wrote:
>>
>>
>> On 05/09/18 02:19 PM, Jens Axboe wrote:
>>> On 9/5/18 2:18 PM, Logan Gunthorpe wrote:
>>>>
>>>>
>>>> On 05/09/18 02:14 PM, Jens Axboe wrote:
>>>>> But if the caller must absolutely know where the bio will end up, then
>>>>> it seems super redundant. So I'd vote for killing this check, it buys
>>>>> us absolutely nothing and isn't even exhaustive in its current form.
>>>>
>>>>
>>>> Ok, I'll remove it for v6.
>>>
>>> Since the drivers needs to know it's doing it right, it might not
>>> hurt to add a sanity check helper for that. Just have the driver
>>> call it, and don't add it in the normal IO submission path.
>>
>> I'm not sure I really see the value in that. It's the same principle in
>> asking the driver to do the WARN: if the developer knew enough to use
>> the special helper, they probably knew well enough to do the rest correctly.
> 
> I don't agree with that at all. It's a "is my request valid" helper,
> it's not some obscure and rarely used functionality. You're making up
> this API right now, if you really want it done for every IO, make it
> part of the p2p submission process. You could even hide it behind a
> debug thing, if you like.

There is no special p2p submission process. In the nvme-of case we are
using the existing process and with the code in blk-core it didn't
change it's process at all. Creating a helper will create one and I can
look at making a pci_p2pdma_submit_bio() for v6; but if the developer
screws up and still calls the regular submit_bio() things will only be
very subtly broken and that won't be obvious.

Logan

^ permalink raw reply	[flat|nested] 265+ messages in thread

* Re: [PATCH v5 07/13] block: Add PCI P2P flag for request queue and check support for requests
@ 2018-09-05 21:03                                   ` Logan Gunthorpe
  0 siblings, 0 replies; 265+ messages in thread
From: Logan Gunthorpe @ 2018-09-05 21:03 UTC (permalink / raw)
  To: Jens Axboe, Christoph Hellwig
  Cc: Alex Williamson, linux-nvdimm-hn68Rpc1hR1g9hUCZPvPmw,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	linux-pci-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-nvme-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-block-u79uwXL29TY76Z2rM5mHXA, Jérôme Glisse,
	Jason Gunthorpe, Benjamin Herrenschmidt, Bjorn Helgaas,
	Max Gurtovoy, Christian König



On 05/09/18 02:36 PM, Jens Axboe wrote:
> On 9/5/18 2:32 PM, Logan Gunthorpe wrote:
>>
>>
>> On 05/09/18 02:19 PM, Jens Axboe wrote:
>>> On 9/5/18 2:18 PM, Logan Gunthorpe wrote:
>>>>
>>>>
>>>> On 05/09/18 02:14 PM, Jens Axboe wrote:
>>>>> But if the caller must absolutely know where the bio will end up, then
>>>>> it seems super redundant. So I'd vote for killing this check, it buys
>>>>> us absolutely nothing and isn't even exhaustive in its current form.
>>>>
>>>>
>>>> Ok, I'll remove it for v6.
>>>
>>> Since the drivers needs to know it's doing it right, it might not
>>> hurt to add a sanity check helper for that. Just have the driver
>>> call it, and don't add it in the normal IO submission path.
>>
>> I'm not sure I really see the value in that. It's the same principle in
>> asking the driver to do the WARN: if the developer knew enough to use
>> the special helper, they probably knew well enough to do the rest correctly.
> 
> I don't agree with that at all. It's a "is my request valid" helper,
> it's not some obscure and rarely used functionality. You're making up
> this API right now, if you really want it done for every IO, make it
> part of the p2p submission process. You could even hide it behind a
> debug thing, if you like.

There is no special p2p submission process. In the nvme-of case we are
using the existing process and with the code in blk-core it didn't
change it's process at all. Creating a helper will create one and I can
look at making a pci_p2pdma_submit_bio() for v6; but if the developer
screws up and still calls the regular submit_bio() things will only be
very subtly broken and that won't be obvious.

Logan

^ permalink raw reply	[flat|nested] 265+ messages in thread

* Re: [PATCH v5 07/13] block: Add PCI P2P flag for request queue and check support for requests
@ 2018-09-05 21:03                                   ` Logan Gunthorpe
  0 siblings, 0 replies; 265+ messages in thread
From: Logan Gunthorpe @ 2018-09-05 21:03 UTC (permalink / raw)
  To: Jens Axboe, Christoph Hellwig
  Cc: Keith Busch, Alex Williamson, Sagi Grimberg, linux-nvdimm,
	linux-rdma, linux-pci, linux-kernel, linux-nvme, Stephen Bates,
	linux-block, Jérôme Glisse, Jason Gunthorpe,
	Benjamin Herrenschmidt, Bjorn Helgaas, Max Gurtovoy,
	Dan Williams, Christian König



On 05/09/18 02:36 PM, Jens Axboe wrote:
> On 9/5/18 2:32 PM, Logan Gunthorpe wrote:
>>
>>
>> On 05/09/18 02:19 PM, Jens Axboe wrote:
>>> On 9/5/18 2:18 PM, Logan Gunthorpe wrote:
>>>>
>>>>
>>>> On 05/09/18 02:14 PM, Jens Axboe wrote:
>>>>> But if the caller must absolutely know where the bio will end up, then
>>>>> it seems super redundant. So I'd vote for killing this check, it buys
>>>>> us absolutely nothing and isn't even exhaustive in its current form.
>>>>
>>>>
>>>> Ok, I'll remove it for v6.
>>>
>>> Since the drivers needs to know it's doing it right, it might not
>>> hurt to add a sanity check helper for that. Just have the driver
>>> call it, and don't add it in the normal IO submission path.
>>
>> I'm not sure I really see the value in that. It's the same principle in
>> asking the driver to do the WARN: if the developer knew enough to use
>> the special helper, they probably knew well enough to do the rest correctly.
> 
> I don't agree with that at all. It's a "is my request valid" helper,
> it's not some obscure and rarely used functionality. You're making up
> this API right now, if you really want it done for every IO, make it
> part of the p2p submission process. You could even hide it behind a
> debug thing, if you like.

There is no special p2p submission process. In the nvme-of case we are
using the existing process and with the code in blk-core it didn't
change its process at all. Creating a helper will create one and I can
look at making a pci_p2pdma_submit_bio() for v6; but if the developer
screws up and still calls the regular submit_bio() things will only be
very subtly broken and that won't be obvious.

Logan
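
A rough idea of what such a pci_p2pdma_submit_bio() wrapper could look
like, sketched against the helpers this series introduces
(blk_queue_pci_p2pdma(), is_pci_p2pdma_page()); the wrapper itself is
hypothetical and was never part of v5:

```c
/* Hypothetical sketch only -- discussed in this thread, not posted.
 * Wraps submit_bio() so a driver submitting P2P memory opts in
 * explicitly and gets the sanity check for free.
 */
static inline void pci_p2pdma_submit_bio(struct request_queue *q,
					 struct bio *bio)
{
	/* The queue must advertise P2P support and the bio must
	 * actually carry p2pdma pages; warn loudly if not.
	 */
	WARN_ON_ONCE(!blk_queue_pci_p2pdma(q) ||
		     !is_pci_p2pdma_page(bio_page(bio)));

	/* P2P requests must not merge with normal-memory requests */
	bio->bi_opf |= REQ_NOMERGE;

	submit_bio(bio);
}
```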

_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 265+ messages in thread


* Re: [PATCH v5 07/13] block: Add PCI P2P flag for request queue and check support for requests
  2018-09-05 21:03                                   ` Logan Gunthorpe
                                                       ` (2 preceding siblings ...)
  (?)
@ 2018-09-05 21:13                                     ` Christoph Hellwig
  -1 siblings, 0 replies; 265+ messages in thread
From: Christoph Hellwig @ 2018-09-05 21:13 UTC (permalink / raw)
  To: Logan Gunthorpe
  Cc: Jens Axboe, Alex Williamson, linux-nvdimm, linux-rdma, linux-pci,
	linux-kernel, linux-nvme, linux-block, Jérôme Glisse,
	Jason Gunthorpe, Christian König, Benjamin Herrenschmidt,
	Bjorn Helgaas, Max Gurtovoy, Christoph Hellwig

On Wed, Sep 05, 2018 at 03:03:18PM -0600, Logan Gunthorpe wrote:
> There is no special p2p submission process. In the nvme-of case we are
> using the existing process and with the code in blk-core it didn't
> change its process at all. Creating a helper will create one and I can
> look at making a pci_p2pdma_submit_bio() for v6; but if the developer
> screws up and still calls the regular submit_bio() things will only be
> very subtly broken and that won't be obvious.

I thought about that when reviewing the previous series, and even
started hacking it up.  In the end I decided against it for the above
reason - it just adds code, but doesn't actually help with anything
as it is trivial to forget, and not using it will in fact just work.
_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

^ permalink raw reply	[flat|nested] 265+ messages in thread


* Re: [PATCH v5 07/13] block: Add PCI P2P flag for request queue and check support for requests
  2018-09-05 21:03                                   ` Logan Gunthorpe
                                                       ` (2 preceding siblings ...)
  (?)
@ 2018-09-05 21:18                                     ` Jens Axboe
  -1 siblings, 0 replies; 265+ messages in thread
From: Jens Axboe @ 2018-09-05 21:18 UTC (permalink / raw)
  To: Logan Gunthorpe, Christoph Hellwig
  Cc: Alex Williamson, linux-nvdimm, linux-rdma, linux-pci,
	linux-kernel, linux-nvme, linux-block, Jérôme Glisse,
	Jason Gunthorpe, Benjamin Herrenschmidt, Bjorn Helgaas,
	Max Gurtovoy, Christian König

On 9/5/18 3:03 PM, Logan Gunthorpe wrote:
> 
> 
> On 05/09/18 02:36 PM, Jens Axboe wrote:
>> On 9/5/18 2:32 PM, Logan Gunthorpe wrote:
>>>
>>>
>>> On 05/09/18 02:19 PM, Jens Axboe wrote:
>>>> On 9/5/18 2:18 PM, Logan Gunthorpe wrote:
>>>>>
>>>>>
>>>>> On 05/09/18 02:14 PM, Jens Axboe wrote:
>>>>>> But if the caller must absolutely know where the bio will end up, then
>>>>>> it seems super redundant. So I'd vote for killing this check, it buys
>>>>>> us absolutely nothing and isn't even exhaustive in its current form.
>>>>>
>>>>>
>>>>> Ok, I'll remove it for v6.
>>>>
>>>> Since the driver needs to know it's doing it right, it might not
>>>> hurt to add a sanity check helper for that. Just have the driver
>>>> call it, and don't add it in the normal IO submission path.
>>>
>>> I'm not sure I really see the value in that. It's the same principle in
>>> asking the driver to do the WARN: if the developer knew enough to use
>>> the special helper, they probably knew well enough to do the rest correctly.
>>
>> I don't agree with that at all. It's a "is my request valid" helper,
>> it's not some obscure and rarely used functionality. You're making up
>> this API right now, if you really want it done for every IO, make it
>> part of the p2p submission process. You could even hide it behind a
>> debug thing, if you like.
> 
> There is no special p2p submission process. In the nvme-of case we are
> using the existing process and with the code in blk-core it didn't
> change its process at all. Creating a helper will create one and I can
> look at making a pci_p2pdma_submit_bio() for v6; but if the developer
> screws up and still calls the regular submit_bio() things will only be
> very subtly broken and that won't be obvious.

I'm very sure that something that basic will be caught in review. I
don't care if you wrap the submission or just require the caller to
call some validity helper check first, fwiw.

And I think we're done beating the dead horse at this point.

-- 
Jens Axboe
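
The validity helper Jens alludes to could be a sketch along these lines
(the name is assumed; it is built only from helpers the series already
defines, blk_queue_pci_p2pdma() and is_pci_p2pdma_page()):

```c
/* Hypothetical sanity-check helper a driver could call before
 * submitting a P2P bio; not part of the posted series.  Returns true
 * only if the target queue supports P2P DMA and every page in the bio
 * is P2P memory.
 */
static inline bool bio_p2pdma_valid(struct request_queue *q,
				    struct bio *bio)
{
	struct bio_vec bv;
	struct bvec_iter iter;

	if (!blk_queue_pci_p2pdma(q))
		return false;

	bio_for_each_segment(bv, bio, iter)
		if (!is_pci_p2pdma_page(bv.bv_page))
			return false;

	return true;
}
```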


^ permalink raw reply	[flat|nested] 265+ messages in thread


* Re: [PATCH v5 07/13] block: Add PCI P2P flag for request queue and check support for requests
  2018-09-05 21:03                                   ` Logan Gunthorpe
                                                       ` (2 preceding siblings ...)
  (?)
@ 2018-09-10 16:41                                     ` Christoph Hellwig
  -1 siblings, 0 replies; 265+ messages in thread
From: Christoph Hellwig @ 2018-09-10 16:41 UTC (permalink / raw)
  To: Logan Gunthorpe
  Cc: Jens Axboe, Alex Williamson, linux-nvdimm, linux-rdma, linux-pci,
	linux-kernel, linux-nvme, linux-block, Jérôme Glisse,
	Jason Gunthorpe, Christian König, Benjamin Herrenschmidt,
	Bjorn Helgaas, Max Gurtovoy, Christoph Hellwig

On Wed, Sep 05, 2018 at 03:03:18PM -0600, Logan Gunthorpe wrote:
> There is no special p2p submission process. In the nvme-of case we are
> using the existing process and with the code in blk-core it didn't
> change it's process at all. Creating a helper will create one and I can
> look at making a pci_p2pdma_submit_bio() for v6; but if the developer
> screws up and still calls the regular submit_bio() things will only be
> very subtly broken and that won't be obvious.

I just saw you added that "helper" in your tree.  Please don't, it is
a negative value add as it doesn't help anything with the checking.

^ permalink raw reply	[flat|nested] 265+ messages in thread


* Re: [PATCH v5 07/13] block: Add PCI P2P flag for request queue and check support for requests
  2018-09-10 16:41                                     ` Christoph Hellwig
                                                         ` (2 preceding siblings ...)
  (?)
@ 2018-09-10 18:11                                       ` Logan Gunthorpe
  -1 siblings, 0 replies; 265+ messages in thread
From: Logan Gunthorpe @ 2018-09-10 18:11 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, Alex Williamson, linux-nvdimm, linux-rdma, linux-pci,
	linux-kernel, linux-nvme, linux-block, Jérôme Glisse,
	Jason Gunthorpe, Benjamin Herrenschmidt, Bjorn Helgaas,
	Max Gurtovoy, Christian König



On 10/09/18 10:41 AM, Christoph Hellwig wrote:
> On Wed, Sep 05, 2018 at 03:03:18PM -0600, Logan Gunthorpe wrote:
>> There is no special p2p submission process. In the nvme-of case we are
>> using the existing process and with the code in blk-core it didn't
>> change it's process at all. Creating a helper will create one and I can
>> look at making a pci_p2pdma_submit_bio() for v6; but if the developer
>> screws up and still calls the regular submit_bio() things will only be
>> very subtly broken and that won't be obvious.
> 
> I just saw you added that "helper" in your tree.  Please don't, it is
> a negative value add as it doesn't help anything with the checking.

Alright, so what's the consensus then? Just have a check in
nvmet_bdev_execute_rw() to add REQ_NOMERGE when appropriate? Jens is
pretty dead set against adding to the common path.

Logan


P.S. Here's the commit in question for anyone else on the list:

https://github.com/sbates130272/linux-p2pmem/commit/eeabe0bc94491d3eec4fe872274a9e3b4cdea538
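
For reference, the check proposed above could be as small as a fragment
like this inside nvmet_bdev_execute_rw() (illustrative only;
is_pci_p2pdma_page() and sg_page() are real helpers, but the exact
placement and the op_flags variable are assumptions about the
surrounding function):

```c
	/* Illustrative fragment: if the request's scatterlist points at
	 * P2P memory, keep the resulting bios unmerged so they are never
	 * combined with host-memory I/O.
	 */
	if (req->sg_cnt && is_pci_p2pdma_page(sg_page(req->sg)))
		op_flags |= REQ_NOMERGE;
```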

^ permalink raw reply	[flat|nested] 265+ messages in thread


* Re: [PATCH v5 07/13] block: Add PCI P2P flag for request queue and check support for requests
@ 2018-09-10 18:11                                       ` Logan Gunthorpe
  0 siblings, 0 replies; 265+ messages in thread
From: Logan Gunthorpe @ 2018-09-10 18:11 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, Alex Williamson, linux-nvdimm-hn68Rpc1hR1g9hUCZPvPmw,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	linux-pci-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-nvme-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-block-u79uwXL29TY76Z2rM5mHXA, Jérôme Glisse,
	Jason Gunthorpe, Benjamin Herrenschmidt, Bjorn Helgaas,
	Max Gurtovoy, Christian König



On 10/09/18 10:41 AM, Christoph Hellwig wrote:
> On Wed, Sep 05, 2018 at 03:03:18PM -0600, Logan Gunthorpe wrote:
>> There is no special p2p submission process. In the nvme-of case we are
>> using the existing process and with the code in blk-core it didn't
>> change it's process at all. Creating a helper will create one and I can
>> look at making a pci_p2pdma_submit_bio() for v6; but if the developer
>> screws up and still calls the regular submit_bio() things will only be
>> very subtly broken and that won't be obvious.
> 
> I just saw you added that "helper" in your tree.  Please don't, it is
> a negative value add as it doesn't help anything with the checking.

Alright, so what's the consensus then? Just have a check in
nvmet_bdev_execute_rw() to add REQ_NOMERGE when appropriate? Jens is
pretty dead set against adding to the common path.

Logan


P.S. Here's the commit in question for anyone else on the list:

https://github.com/sbates130272/linux-p2pmem/commit/eeabe0bc94491d3eec4fe872274a9e3b4cdea538

^ permalink raw reply	[flat|nested] 265+ messages in thread

* Re: [PATCH v5 07/13] block: Add PCI P2P flag for request queue and check support for requests
@ 2018-09-10 18:11                                       ` Logan Gunthorpe
  0 siblings, 0 replies; 265+ messages in thread
From: Logan Gunthorpe @ 2018-09-10 18:11 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, Keith Busch, Alex Williamson, Sagi Grimberg,
	linux-nvdimm, linux-rdma, linux-pci, linux-kernel, linux-nvme,
	Stephen Bates, linux-block, Jérôme Glisse,
	Jason Gunthorpe, Benjamin Herrenschmidt, Bjorn Helgaas,
	Max Gurtovoy, Dan Williams, Christian König



On 10/09/18 10:41 AM, Christoph Hellwig wrote:
> On Wed, Sep 05, 2018 at 03:03:18PM -0600, Logan Gunthorpe wrote:
>> There is no special p2p submission process. In the nvme-of case we are
>> using the existing process and with the code in blk-core it didn't
>> change it's process at all. Creating a helper will create one and I can
>> look at making a pci_p2pdma_submit_bio() for v6; but if the developer
>> screws up and still calls the regular submit_bio() things will only be
>> very subtly broken and that won't be obvious.
> 
> I just saw you added that "helper" in your tree.  Please don't, it is
> a negative value add as it doesn't help anything with the checking.

Alright, so what's the consensus then? Just have a check in
nvmet_bdev_execute_rw() to add REQ_NOMERGE when appropriate? Jens is
pretty dead set against adding to the common path.

Logan


P.S. Here's the commit in question for anyone else on the list:

https://github.com/sbates130272/linux-p2pmem/commit/eeabe0bc94491d3eec4fe872274a9e3b4cdea538

_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 265+ messages in thread

* [PATCH v5 07/13] block: Add PCI P2P flag for request queue and check support for requests
@ 2018-09-10 18:11                                       ` Logan Gunthorpe
  0 siblings, 0 replies; 265+ messages in thread
From: Logan Gunthorpe @ 2018-09-10 18:11 UTC (permalink / raw)




On 10/09/18 10:41 AM, Christoph Hellwig wrote:
> On Wed, Sep 05, 2018@03:03:18PM -0600, Logan Gunthorpe wrote:
>> There is no special p2p submission process. In the nvme-of case we are
>> using the existing process and with the code in blk-core it didn't
>> change it's process at all. Creating a helper will create one and I can
>> look at making a pci_p2pdma_submit_bio() for v6; but if the developer
>> screws up and still calls the regular submit_bio() things will only be
>> very subtly broken and that won't be obvious.
> 
> I just saw you added that "helper" in your tree.  Please don't, it is
> a negative value add as it doesn't help anything with the checking.

Alright, so what's the consensus then? Just have a check in
nvmet_bdev_execute_rw() to add REQ_NOMERGE when appropriate? Jens is
pretty dead set against adding to the common path.

Logan


P.S. Here's the commit in question for anyone else on the list:

https://github.com/sbates130272/linux-p2pmem/commit/eeabe0bc94491d3eec4fe872274a9e3b4cdea538

^ permalink raw reply	[flat|nested] 265+ messages in thread

* Re: [PATCH v5 07/13] block: Add PCI P2P flag for request queue and check support for requests
  2018-09-10 18:11                                       ` Logan Gunthorpe
@ 2018-09-11  7:10                                         ` Christoph Hellwig
  -1 siblings, 0 replies; 265+ messages in thread
From: Christoph Hellwig @ 2018-09-11  7:10 UTC (permalink / raw)
  To: Logan Gunthorpe
  Cc: Jens Axboe, Alex Williamson, linux-nvdimm, linux-rdma, linux-pci,
	linux-kernel, linux-nvme, linux-block, Jérôme Glisse,
	Jason Gunthorpe, Christian König, Benjamin Herrenschmidt,
	Bjorn Helgaas, Max Gurtovoy, Christoph Hellwig

On Mon, Sep 10, 2018 at 12:11:01PM -0600, Logan Gunthorpe wrote:
> > I just saw you added that "helper" in your tree.  Please don't, it is
> > a negative value add as it doesn't help anything with the checking.
> 
> Alright, so what's the consensus then? Just have a check in
> nvmet_bdev_execute_rw() to add REQ_NOMERGE when appropriate?

Yes.

^ permalink raw reply	[flat|nested] 265+ messages in thread

end of thread, other threads:[~2018-09-11  7:10 UTC | newest]

Thread overview: 265+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-08-30 18:53 [PATCH v5 00/13] Copy Offload in NVMe Fabrics with P2P PCI Memory Logan Gunthorpe
2018-08-30 18:53 ` [PATCH v5 01/13] PCI/P2PDMA: Support peer-to-peer memory Logan Gunthorpe
2018-08-31  8:04   ` Christian König
2018-08-31 15:48     ` Logan Gunthorpe
2018-09-01  8:27       ` Christoph Hellwig
2018-08-31 16:19   ` Jonathan Cameron
2018-08-31 16:26     ` Logan Gunthorpe
2018-08-30 18:53 ` [PATCH v5 02/13] PCI/P2PDMA: Add sysfs group to display p2pmem stats Logan Gunthorpe
2018-08-30 18:53 ` [PATCH v5 03/13] PCI/P2PDMA: Add PCI p2pmem DMA mappings to adjust the bus offset Logan Gunthorpe
2018-08-30 18:53 ` [PATCH v5 04/13] PCI/P2PDMA: Introduce configfs/sysfs enable attribute helpers Logan Gunthorpe
2018-08-30 18:53 ` [PATCH v5 05/13] docs-rst: Add a new directory for PCI documentation Logan Gunthorpe
2018-08-30 18:53 ` [PATCH v5 06/13] PCI/P2PDMA: Add P2P DMA driver writer's documentation Logan Gunthorpe
2018-08-31  0:34   ` Randy Dunlap
2018-08-31 15:44     ` Logan Gunthorpe
2018-08-31  8:08   ` Christian König
2018-08-31 15:51     ` Logan Gunthorpe
2018-08-31 17:38       ` Christian König
2018-08-31 19:11         ` Logan Gunthorpe
2018-08-30 18:53 ` [PATCH v5 07/13] block: Add PCI P2P flag for request queue and check support for requests Logan Gunthorpe
2018-08-30 19:11   ` Jens Axboe
2018-08-30 19:17     ` Logan Gunthorpe
2018-08-30 19:19       ` Jens Axboe
2018-09-01  8:28     ` Christoph Hellwig
2018-09-03 22:26       ` Logan Gunthorpe
2018-09-05 19:26         ` Jens Axboe
2018-09-05 19:33           ` Logan Gunthorpe
2018-09-05 19:45             ` Jens Axboe
2018-09-05 19:53               ` Logan Gunthorpe
2018-09-05 19:56               ` Christoph Hellwig
2018-09-05 19:54                 ` Jens Axboe
2018-09-05 20:11                   ` Christoph Hellwig
2018-09-05 20:09                     ` Logan Gunthorpe
2018-09-05 20:14                       ` Jens Axboe
2018-09-05 20:18                         ` Logan Gunthorpe
2018-09-05 20:19                           ` Jens Axboe
2018-09-05 20:32                             ` Logan Gunthorpe
2018-09-05 20:36                               ` Jens Axboe
2018-09-05 21:03                                 ` Logan Gunthorpe
2018-09-05 21:13                                   ` Christoph Hellwig
2018-09-05 21:18                                   ` Jens Axboe
2018-09-10 16:41                                   ` Christoph Hellwig
2018-09-10 18:11                                     ` Logan Gunthorpe
2018-09-11  7:10                                       ` Christoph Hellwig
2018-08-30 18:53 ` [PATCH v5 08/13] IB/core: Ensure we map P2P memory correctly in rdma_rw_ctx_[init|destroy]() Logan Gunthorpe
2018-08-31  0:18   ` Sagi Grimberg
2018-08-30 18:53 ` [PATCH v5 09/13] nvme-pci: Use PCI p2pmem subsystem to manage the CMB Logan Gunthorpe
2018-08-30 18:53 ` [PATCH v5 10/13] nvme-pci: Add support for P2P memory in requests Logan Gunthorpe
2018-09-04 15:16   ` Jason Gunthorpe
2018-09-04 15:47     ` Logan Gunthorpe
2018-09-05 19:22       ` Christoph Hellwig
2018-08-30 18:53 ` [PATCH v5 11/13] nvme-pci: Add a quirk for a pseudo CMB Logan Gunthorpe
2018-08-30 18:53 ` [PATCH v5 12/13] nvmet: Introduce helper functions to allocate and free request SGLs Logan Gunthorpe
2018-08-31  0:14   ` Sagi Grimberg
2018-08-30 18:53 ` [PATCH v5 13/13] nvmet: Optionally use PCI P2P memory Logan Gunthorpe
2018-08-31  0:25   ` Sagi Grimberg
2018-08-31 15:41     ` Logan Gunthorpe
2018-08-30 19:20 ` [PATCH v5 00/13] Copy Offload in NVMe Fabrics with P2P PCI Memory Jerome Glisse
2018-08-30 19:30   ` Logan Gunthorpe
