linux-block.vger.kernel.org archive mirror
* [PATCH 00/16] Add new DMA mapping operation for P2PDMA
@ 2021-04-08 17:01 Logan Gunthorpe
  2021-04-08 17:01 ` [PATCH 01/16] PCI/P2PDMA: Pass gfp_mask flags to upstream_bridge_distance_warn() Logan Gunthorpe
                   ` (18 more replies)
  0 siblings, 19 replies; 99+ messages in thread
From: Logan Gunthorpe @ 2021-04-08 17:01 UTC (permalink / raw)
  To: linux-kernel, linux-nvme, linux-block, linux-pci, linux-mm, iommu
  Cc: Stephen Bates, Christoph Hellwig, Dan Williams, Jason Gunthorpe,
	Christian König, John Hubbard, Don Dutile, Matthew Wilcox,
	Daniel Vetter, Jakowski Andrzej, Minturn Dave B, Jason Ekstrand,
	Dave Hansen, Xiong Jianxin, Bjorn Helgaas, Ira Weiny,
	Robin Murphy, Logan Gunthorpe

Hi,

This patchset continues my work to add P2PDMA support to the common
dma map operations. This allows for creating SGLs that have both P2PDMA
and regular pages, which is a necessary step toward allowing P2PDMA pages
in userspace.

The earlier RFC[1] generated a lot of great feedback and I heard no
show-stopping objections. Thus, I've incorporated all the feedback and have
decided to post this as a proper patch series with hopes of eventually
getting it in mainline.

I'm happy to do a few more passes if anyone has any further feedback
or better ideas.

This series is based on v5.12-rc6 and a git branch can be found here:

  https://github.com/sbates130272/linux-p2pmem/  p2pdma_map_ops_v1

Thanks,

Logan

[1] https://lore.kernel.org/linux-block/20210311233142.7900-1-logang@deltatee.com/


Changes since the RFC:
 * Added comment and fixed up the pci_get_slot patch. (per Bjorn)
 * Fixed glaring sg_phys() double offset bug. (per Robin)
 * Created a new map operation (dma_map_sg_p2pdma()) with a new calling
   convention instead of modifying the calling convention of
   dma_map_sg(). (per Robin)
 * Integrated the two similar pci_p2pdma_dma_map_type() and
   pci_p2pdma_map_type() functions into one (per Ira)
 * Reworked some of the logic in the map_sg() implementations into
   helpers in the p2pdma code. (per Christoph)
 * Dropped a bunch of unnecessary symbol exports (per Christoph)
 * Expanded the code in dma_pci_p2pdma_supported() for clarity. (per
   Ira and Christoph)
 * Finished off using the new dma_map_sg_p2pdma() call in rdma_rw
   and removed the old pci_p2pdma_[un]map_sg(). (per Jason)

--

Logan Gunthorpe (16):
  PCI/P2PDMA: Pass gfp_mask flags to upstream_bridge_distance_warn()
  PCI/P2PDMA: Avoid pci_get_slot() which sleeps
  PCI/P2PDMA: Attempt to set map_type if it has not been set
  PCI/P2PDMA: Refactor pci_p2pdma_map_type() to take pagmap and device
  dma-mapping: Introduce dma_map_sg_p2pdma()
  lib/scatterlist: Add flag for indicating P2PDMA segments in an SGL
  PCI/P2PDMA: Make pci_p2pdma_map_type() non-static
  PCI/P2PDMA: Introduce helpers for dma_map_sg implementations
  dma-direct: Support PCI P2PDMA pages in dma-direct map_sg
  dma-mapping: Add flags to dma_map_ops to indicate PCI P2PDMA support
  iommu/dma: Support PCI P2PDMA pages in dma-iommu map_sg
  nvme-pci: Check DMA ops when indicating support for PCI P2PDMA
  nvme-pci: Convert to using dma_map_sg_p2pdma for p2pdma pages
  nvme-rdma: Ensure dma support when using p2pdma
  RDMA/rw: use dma_map_sg_p2pdma()
  PCI/P2PDMA: Remove pci_p2pdma_[un]map_sg()

 drivers/infiniband/core/rw.c |  50 +++-------
 drivers/iommu/dma-iommu.c    |  66 ++++++++++--
 drivers/nvme/host/core.c     |   3 +-
 drivers/nvme/host/nvme.h     |   2 +-
 drivers/nvme/host/pci.c      |  39 ++++----
 drivers/nvme/target/rdma.c   |   3 +-
 drivers/pci/Kconfig          |   2 +-
 drivers/pci/p2pdma.c         | 188 +++++++++++++++++++----------------
 include/linux/dma-map-ops.h  |   3 +
 include/linux/dma-mapping.h  |  20 ++++
 include/linux/pci-p2pdma.h   |  53 ++++++----
 include/linux/scatterlist.h  |  49 ++++++++-
 include/rdma/ib_verbs.h      |  32 ++++++
 kernel/dma/direct.c          |  25 ++++-
 kernel/dma/mapping.c         |  70 +++++++++++--
 15 files changed, 416 insertions(+), 189 deletions(-)


base-commit: e49d033bddf5b565044e2abe4241353959bc9120
--
2.20.1


* [PATCH 01/16] PCI/P2PDMA: Pass gfp_mask flags to upstream_bridge_distance_warn()
  2021-04-08 17:01 [PATCH 00/16] Add new DMA mapping operation for P2PDMA Logan Gunthorpe
@ 2021-04-08 17:01 ` Logan Gunthorpe
  2021-05-02  3:58   ` John Hubbard
  2021-04-08 17:01 ` [PATCH 02/16] PCI/P2PDMA: Avoid pci_get_slot() which sleeps Logan Gunthorpe
                   ` (17 subsequent siblings)
  18 siblings, 1 reply; 99+ messages in thread
From: Logan Gunthorpe @ 2021-04-08 17:01 UTC (permalink / raw)
  To: linux-kernel, linux-nvme, linux-block, linux-pci, linux-mm, iommu
  Cc: Stephen Bates, Christoph Hellwig, Dan Williams, Jason Gunthorpe,
	Christian König, John Hubbard, Don Dutile, Matthew Wilcox,
	Daniel Vetter, Jakowski Andrzej, Minturn Dave B, Jason Ekstrand,
	Dave Hansen, Xiong Jianxin, Bjorn Helgaas, Ira Weiny,
	Robin Murphy, Logan Gunthorpe, Bjorn Helgaas

In order to call upstream_bridge_distance_warn() from a dma_map function,
it must not sleep. The only reason it does sleep is to allocate the seqbuf
to print which devices are within the ACS path.

Switch the kmalloc call to use a passed in gfp_mask and don't print that
message if the buffer fails to be allocated.
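
As a rough userspace illustration of the pattern described above (not the
kernel code itself), the sketch below lets the caller choose the allocation
policy and treats the diagnostic buffer as optional: if the allocation fails,
only the detailed message is skipped and the real result is still returned.
The names alloc_fn, alloc_ok, alloc_fail and check_path are hypothetical and
exist only for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a caller-supplied allocation policy (gfp_mask). */
typedef void *(*alloc_fn)(size_t size);

static void *alloc_ok(size_t size) { return malloc(size); }
static void *alloc_fail(size_t size) { (void)size; return NULL; }

/*
 * Returns 'result' unchanged whether or not the scratch buffer could be
 * allocated. *detail_printed reports if the optional message was emitted.
 */
static int check_path(alloc_fn alloc, int result, bool *detail_printed)
{
	const size_t len = 64;
	char *buf = alloc(len);

	assert(detail_printed);
	*detail_printed = false;

	if (buf) {
		size_t n;

		/* Build a ';'-terminated list, then drop the final ';'. */
		snprintf(buf, len, "%s;", "0000:01:00.0");
		n = strlen(buf);
		buf[n - 1] = '\0';
		fprintf(stderr, "path detail: %s\n", buf);
		*detail_printed = true;
		free(buf);
	}

	return result;
}
```

A caller in a context that may block would pass something like alloc_ok,
while a caller that must not block would pass a policy that is allowed to
fail, accepting that the extra message may be missing.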

Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
---
 drivers/pci/p2pdma.c | 21 +++++++++++----------
 1 file changed, 11 insertions(+), 10 deletions(-)

diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c
index 196382630363..bd89437faf06 100644
--- a/drivers/pci/p2pdma.c
+++ b/drivers/pci/p2pdma.c
@@ -267,7 +267,7 @@ static int pci_bridge_has_acs_redir(struct pci_dev *pdev)
 
 static void seq_buf_print_bus_devfn(struct seq_buf *buf, struct pci_dev *pdev)
 {
-	if (!buf)
+	if (!buf || !buf->buffer)
 		return;
 
 	seq_buf_printf(buf, "%s;", pci_name(pdev));
@@ -495,25 +495,26 @@ upstream_bridge_distance(struct pci_dev *provider, struct pci_dev *client,
 
 static enum pci_p2pdma_map_type
 upstream_bridge_distance_warn(struct pci_dev *provider, struct pci_dev *client,
-			      int *dist)
+			      int *dist, gfp_t gfp_mask)
 {
 	struct seq_buf acs_list;
 	bool acs_redirects;
 	int ret;
 
-	seq_buf_init(&acs_list, kmalloc(PAGE_SIZE, GFP_KERNEL), PAGE_SIZE);
-	if (!acs_list.buffer)
-		return -ENOMEM;
+	seq_buf_init(&acs_list, kmalloc(PAGE_SIZE, gfp_mask), PAGE_SIZE);
 
 	ret = upstream_bridge_distance(provider, client, dist, &acs_redirects,
 				       &acs_list);
 	if (acs_redirects) {
 		pci_warn(client, "ACS redirect is set between the client and provider (%s)\n",
 			 pci_name(provider));
-		/* Drop final semicolon */
-		acs_list.buffer[acs_list.len-1] = 0;
-		pci_warn(client, "to disable ACS redirect for this path, add the kernel parameter: pci=disable_acs_redir=%s\n",
-			 acs_list.buffer);
+
+		if (acs_list.buffer) {
+			/* Drop final semicolon */
+			acs_list.buffer[acs_list.len - 1] = 0;
+			pci_warn(client, "to disable ACS redirect for this path, add the kernel parameter: pci=disable_acs_redir=%s\n",
+				 acs_list.buffer);
+		}
 	}
 
 	if (ret == PCI_P2PDMA_MAP_NOT_SUPPORTED) {
@@ -566,7 +567,7 @@ int pci_p2pdma_distance_many(struct pci_dev *provider, struct device **clients,
 
 		if (verbose)
 			ret = upstream_bridge_distance_warn(provider,
-					pci_client, &distance);
+					pci_client, &distance, GFP_KERNEL);
 		else
 			ret = upstream_bridge_distance(provider, pci_client,
 						       &distance, NULL, NULL);
-- 
2.20.1



* [PATCH 02/16] PCI/P2PDMA: Avoid pci_get_slot() which sleeps
  2021-04-08 17:01 [PATCH 00/16] Add new DMA mapping operation for P2PDMA Logan Gunthorpe
  2021-04-08 17:01 ` [PATCH 01/16] PCI/P2PDMA: Pass gfp_mask flags to upstream_bridge_distance_warn() Logan Gunthorpe
@ 2021-04-08 17:01 ` Logan Gunthorpe
  2021-05-02  5:35   ` John Hubbard
  2021-05-11 16:05   ` Don Dutile
  2021-04-08 17:01 ` [PATCH 03/16] PCI/P2PDMA: Attempt to set map_type if it has not been set Logan Gunthorpe
                   ` (16 subsequent siblings)
  18 siblings, 2 replies; 99+ messages in thread
From: Logan Gunthorpe @ 2021-04-08 17:01 UTC (permalink / raw)
  To: linux-kernel, linux-nvme, linux-block, linux-pci, linux-mm, iommu
  Cc: Stephen Bates, Christoph Hellwig, Dan Williams, Jason Gunthorpe,
	Christian König, John Hubbard, Don Dutile, Matthew Wilcox,
	Daniel Vetter, Jakowski Andrzej, Minturn Dave B, Jason Ekstrand,
	Dave Hansen, Xiong Jianxin, Bjorn Helgaas, Ira Weiny,
	Robin Murphy, Logan Gunthorpe

In order to use upstream_bridge_distance_warn() from a dma_map function,
it must not sleep. However, pci_get_slot() takes the pci_bus_sem so it
might sleep.

In order to avoid this, try to get the host bridge's device from
bus->self, and if that is not set, just get the first element in the
device list. It should be impossible for the host bridge's device to
go away while references are held on child devices, so the first element
should not be able to change and, thus, this should be safe.
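
The lookup order described above can be sketched in plain C as follows. This
is a simplified userspace model under stated assumptions, not the kernel
implementation: struct dev, struct bus and bus_root_dev() are hypothetical
names, and a singly linked list stands in for the bus device list.

```c
#include <assert.h>
#include <stddef.h>

struct dev {
	unsigned int devfn;	/* 0 means device 00, function 0 */
	struct dev *next;
};

struct bus {
	struct dev *self;	/* bridge device, may be NULL */
	struct dev *devices;	/* head of the device list, may be NULL */
};

/*
 * Prefer bus->self; otherwise fall back to the first device on the list.
 * Reject the candidate if it is missing or is not at devfn 00.0.
 */
static struct dev *bus_root_dev(const struct bus *bus)
{
	struct dev *root;

	assert(bus);
	root = bus->self;

	if (!root)
		root = bus->devices;

	if (!root || root->devfn)
		return NULL;

	return root;
}
```

Because no reference is taken and no lock is acquired, the sketch relies on
the same lifetime assumption stated in the commit message: the first element
cannot change while the caller holds references on child devices.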

Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
---
 drivers/pci/p2pdma.c | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c
index bd89437faf06..473a08940fbc 100644
--- a/drivers/pci/p2pdma.c
+++ b/drivers/pci/p2pdma.c
@@ -311,16 +311,26 @@ static const struct pci_p2pdma_whitelist_entry {
 static bool __host_bridge_whitelist(struct pci_host_bridge *host,
 				    bool same_host_bridge)
 {
-	struct pci_dev *root = pci_get_slot(host->bus, PCI_DEVFN(0, 0));
 	const struct pci_p2pdma_whitelist_entry *entry;
+	struct pci_dev *root = host->bus->self;
 	unsigned short vendor, device;
 
+	/*
+	 * This makes the assumption that the first device on the bus is the
+	 * bridge itself and it has the devfn of 00.0. This assumption should
+	 * hold for the devices in the white list above, and if there are cases
+	 * where this isn't true they will have to be dealt with when such a
+	 * case is added to the whitelist.
+	 */
 	if (!root)
+		root = list_first_entry_or_null(&host->bus->devices,
+						struct pci_dev, bus_list);
+
+	if (!root || root->devfn)
 		return false;
 
 	vendor = root->vendor;
 	device = root->device;
-	pci_dev_put(root);
 
 	for (entry = pci_p2pdma_whitelist; entry->vendor; entry++) {
 		if (vendor != entry->vendor || device != entry->device)
-- 
2.20.1



* [PATCH 03/16] PCI/P2PDMA: Attempt to set map_type if it has not been set
  2021-04-08 17:01 [PATCH 00/16] Add new DMA mapping operation for P2PDMA Logan Gunthorpe
  2021-04-08 17:01 ` [PATCH 01/16] PCI/P2PDMA: Pass gfp_mask flags to upstream_bridge_distance_warn() Logan Gunthorpe
  2021-04-08 17:01 ` [PATCH 02/16] PCI/P2PDMA: Avoid pci_get_slot() which sleeps Logan Gunthorpe
@ 2021-04-08 17:01 ` Logan Gunthorpe
  2021-05-02 19:58   ` John Hubbard
  2021-04-08 17:01 ` [PATCH 04/16] PCI/P2PDMA: Refactor pci_p2pdma_map_type() to take pagmap and device Logan Gunthorpe
                   ` (15 subsequent siblings)
  18 siblings, 1 reply; 99+ messages in thread
From: Logan Gunthorpe @ 2021-04-08 17:01 UTC (permalink / raw)
  To: linux-kernel, linux-nvme, linux-block, linux-pci, linux-mm, iommu
  Cc: Stephen Bates, Christoph Hellwig, Dan Williams, Jason Gunthorpe,
	Christian König, John Hubbard, Don Dutile, Matthew Wilcox,
	Daniel Vetter, Jakowski Andrzej, Minturn Dave B, Jason Ekstrand,
	Dave Hansen, Xiong Jianxin, Bjorn Helgaas, Ira Weiny,
	Robin Murphy, Logan Gunthorpe

Attempt to find the mapping type for P2PDMA pages on the first
DMA map attempt if it has not been done ahead of time.

Previously, the mapping type was expected to be calculated ahead of
time, but if pages are to come from userspace then there's no
way to ensure the path was checked ahead of time.
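
A minimal userspace sketch of the "compute on first use" idea is shown
below. It assumes a small fixed-size cache indexed by a client number and a
made-up slow path; struct provider, compute_map_type() and
lookup_map_type() are hypothetical names. It also assumes the slow path
result is stored back into the cache, which this patch does not show.

```c
#include <assert.h>

enum map_type {
	MAP_UNKNOWN = 0,	/* zero so a cleared cache means "not computed" */
	MAP_NOT_SUPPORTED,
	MAP_BUS_ADDR,
	MAP_THRU_HOST_BRIDGE,
};

#define MAX_CLIENTS 8

struct provider {
	enum map_type cache[MAX_CLIENTS];
	int computations;	/* counts slow-path runs, for illustration */
};

/* Hypothetical slow path standing in for the topology walk. */
static enum map_type compute_map_type(struct provider *p, int client)
{
	p->computations++;
	return (client % 2) ? MAP_BUS_ADDR : MAP_THRU_HOST_BRIDGE;
}

/* Return the cached type, computing and caching it on the first lookup. */
static enum map_type lookup_map_type(struct provider *p, int client)
{
	assert(client >= 0 && client < MAX_CLIENTS);

	if (p->cache[client] == MAP_UNKNOWN)
		p->cache[client] = compute_map_type(p, client);

	return p->cache[client];
}
```

The second lookup for the same client returns the cached value without
running the slow path again.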

Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
---
 drivers/pci/p2pdma.c | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c
index 473a08940fbc..2574a062a255 100644
--- a/drivers/pci/p2pdma.c
+++ b/drivers/pci/p2pdma.c
@@ -825,11 +825,18 @@ EXPORT_SYMBOL_GPL(pci_p2pmem_publish);
 static enum pci_p2pdma_map_type pci_p2pdma_map_type(struct pci_dev *provider,
 						    struct pci_dev *client)
 {
+	enum pci_p2pdma_map_type ret;
+
 	if (!provider->p2pdma)
 		return PCI_P2PDMA_MAP_NOT_SUPPORTED;
 
-	return xa_to_value(xa_load(&provider->p2pdma->map_types,
-				   map_types_idx(client)));
+	ret = xa_to_value(xa_load(&provider->p2pdma->map_types,
+				  map_types_idx(client)));
+	if (ret != PCI_P2PDMA_MAP_UNKNOWN)
+		return ret;
+
+	return upstream_bridge_distance_warn(provider, client, NULL,
+					     GFP_ATOMIC);
 }
 
 static int __pci_p2pdma_map_sg(struct pci_p2pdma_pagemap *p2p_pgmap,
@@ -877,7 +884,6 @@ int pci_p2pdma_map_sg_attrs(struct device *dev, struct scatterlist *sg,
 	case PCI_P2PDMA_MAP_BUS_ADDR:
 		return __pci_p2pdma_map_sg(p2p_pgmap, dev, sg, nents);
 	default:
-		WARN_ON_ONCE(1);
 		return 0;
 	}
 }
-- 
2.20.1



* [PATCH 04/16] PCI/P2PDMA: Refactor pci_p2pdma_map_type() to take pagmap and device
  2021-04-08 17:01 [PATCH 00/16] Add new DMA mapping operation for P2PDMA Logan Gunthorpe
                   ` (2 preceding siblings ...)
  2021-04-08 17:01 ` [PATCH 03/16] PCI/P2PDMA: Attempt to set map_type if it has not been set Logan Gunthorpe
@ 2021-04-08 17:01 ` Logan Gunthorpe
  2021-05-02 20:41   ` John Hubbard
  2021-04-08 17:01 ` [PATCH 05/16] dma-mapping: Introduce dma_map_sg_p2pdma() Logan Gunthorpe
                   ` (14 subsequent siblings)
  18 siblings, 1 reply; 99+ messages in thread
From: Logan Gunthorpe @ 2021-04-08 17:01 UTC (permalink / raw)
  To: linux-kernel, linux-nvme, linux-block, linux-pci, linux-mm, iommu
  Cc: Stephen Bates, Christoph Hellwig, Dan Williams, Jason Gunthorpe,
	Christian König, John Hubbard, Don Dutile, Matthew Wilcox,
	Daniel Vetter, Jakowski Andrzej, Minturn Dave B, Jason Ekstrand,
	Dave Hansen, Xiong Jianxin, Bjorn Helgaas, Ira Weiny,
	Robin Murphy, Logan Gunthorpe

All callers of pci_p2pdma_map_type() have a struct dev_pagemap and a
struct device (of the client doing the DMA transfer). Thus move the
conversion to a struct pci_dev for the provider and the client into this
function.

Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
---
 drivers/pci/p2pdma.c | 29 +++++++++++------------------
 1 file changed, 11 insertions(+), 18 deletions(-)

diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c
index 2574a062a255..bcb1a6d6119d 100644
--- a/drivers/pci/p2pdma.c
+++ b/drivers/pci/p2pdma.c
@@ -822,14 +822,21 @@ void pci_p2pmem_publish(struct pci_dev *pdev, bool publish)
 }
 EXPORT_SYMBOL_GPL(pci_p2pmem_publish);
 
-static enum pci_p2pdma_map_type pci_p2pdma_map_type(struct pci_dev *provider,
-						    struct pci_dev *client)
+static enum pci_p2pdma_map_type pci_p2pdma_map_type(struct dev_pagemap *pgmap,
+						    struct device *dev)
 {
+	struct pci_dev *provider = to_p2p_pgmap(pgmap)->provider;
 	enum pci_p2pdma_map_type ret;
+	struct pci_dev *client;
 
 	if (!provider->p2pdma)
 		return PCI_P2PDMA_MAP_NOT_SUPPORTED;
 
+	if (!dev_is_pci(dev))
+		return PCI_P2PDMA_MAP_NOT_SUPPORTED;
+
+	client = to_pci_dev(dev);
+
 	ret = xa_to_value(xa_load(&provider->p2pdma->map_types,
 				  map_types_idx(client)));
 	if (ret != PCI_P2PDMA_MAP_UNKNOWN)
@@ -871,14 +878,8 @@ int pci_p2pdma_map_sg_attrs(struct device *dev, struct scatterlist *sg,
 {
 	struct pci_p2pdma_pagemap *p2p_pgmap =
 		to_p2p_pgmap(sg_page(sg)->pgmap);
-	struct pci_dev *client;
-
-	if (WARN_ON_ONCE(!dev_is_pci(dev)))
-		return 0;
 
-	client = to_pci_dev(dev);
-
-	switch (pci_p2pdma_map_type(p2p_pgmap->provider, client)) {
+	switch (pci_p2pdma_map_type(sg_page(sg)->pgmap, dev)) {
 	case PCI_P2PDMA_MAP_THRU_HOST_BRIDGE:
 		return dma_map_sg_attrs(dev, sg, nents, dir, attrs);
 	case PCI_P2PDMA_MAP_BUS_ADDR:
@@ -901,17 +902,9 @@ EXPORT_SYMBOL_GPL(pci_p2pdma_map_sg_attrs);
 void pci_p2pdma_unmap_sg_attrs(struct device *dev, struct scatterlist *sg,
 		int nents, enum dma_data_direction dir, unsigned long attrs)
 {
-	struct pci_p2pdma_pagemap *p2p_pgmap =
-		to_p2p_pgmap(sg_page(sg)->pgmap);
 	enum pci_p2pdma_map_type map_type;
-	struct pci_dev *client;
-
-	if (WARN_ON_ONCE(!dev_is_pci(dev)))
-		return;
-
-	client = to_pci_dev(dev);
 
-	map_type = pci_p2pdma_map_type(p2p_pgmap->provider, client);
+	map_type = pci_p2pdma_map_type(sg_page(sg)->pgmap, dev);
 
 	if (map_type == PCI_P2PDMA_MAP_THRU_HOST_BRIDGE)
 		dma_unmap_sg_attrs(dev, sg, nents, dir, attrs);
-- 
2.20.1



* [PATCH 05/16] dma-mapping: Introduce dma_map_sg_p2pdma()
  2021-04-08 17:01 [PATCH 00/16] Add new DMA mapping operation for P2PDMA Logan Gunthorpe
                   ` (3 preceding siblings ...)
  2021-04-08 17:01 ` [PATCH 04/16] PCI/P2PDMA: Refactor pci_p2pdma_map_type() to take pagmap and device Logan Gunthorpe
@ 2021-04-08 17:01 ` Logan Gunthorpe
  2021-04-27 19:22   ` Jason Gunthorpe
                     ` (3 more replies)
  2021-04-08 17:01 ` [PATCH 06/16] lib/scatterlist: Add flag for indicating P2PDMA segments in an SGL Logan Gunthorpe
                   ` (13 subsequent siblings)
  18 siblings, 4 replies; 99+ messages in thread
From: Logan Gunthorpe @ 2021-04-08 17:01 UTC (permalink / raw)
  To: linux-kernel, linux-nvme, linux-block, linux-pci, linux-mm, iommu
  Cc: Stephen Bates, Christoph Hellwig, Dan Williams, Jason Gunthorpe,
	Christian König, John Hubbard, Don Dutile, Matthew Wilcox,
	Daniel Vetter, Jakowski Andrzej, Minturn Dave B, Jason Ekstrand,
	Dave Hansen, Xiong Jianxin, Bjorn Helgaas, Ira Weiny,
	Robin Murphy, Logan Gunthorpe

dma_map_sg() either returns a positive number indicating the number
of entries mapped or zero indicating that resources were not available
to create the mapping. When zero is returned, it is always safe to retry
the mapping later once resources have been freed.

Once P2PDMA pages are mixed into the SGL, some pages may never be
successfully mapped for a given device because that device may not
actually be able to access them. Thus, multiple error conditions
need to be distinguished to determine whether a retry is safe.

Introduce dma_map_sg_p2pdma[_attrs]() with a different calling
convention from dma_map_sg(). The function will return a positive
integer on success or a negative errno on failure.

ENOMEM will be used to indicate a resource failure and EREMOTEIO to
indicate that a P2PDMA page is not mappable.

The __DMA_ATTR_PCI_P2PDMA attribute is introduced to inform the lower
level implementations that P2PDMA pages are allowed and to warn if a
caller introduces them into the regular dma_map_sg() interface.
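
To make the calling convention above concrete, here is a small userspace
sketch of how a caller might classify the return value. It does not call
the kernel API; enum map_outcome and classify_map_result() are hypothetical
names used only for illustration, and EREMOTEIO is defined locally in case
the C library does not provide it.

```c
#include <assert.h>
#include <errno.h>

#ifndef EREMOTEIO
#define EREMOTEIO 121	/* Linux value, used only if the libc lacks it */
#endif

enum map_outcome {
	MAP_OK,			/* ret > 0: number of mapped entries */
	MAP_RETRY_LATER,	/* -ENOMEM: resources may free up later */
	MAP_FAIL_PERMANENT,	/* -EREMOTEIO: retrying will not help */
};

/* Classify a return value that follows the convention described above. */
static enum map_outcome classify_map_result(int ret)
{
	if (ret > 0)
		return MAP_OK;

	if (ret == -ENOMEM)
		return MAP_RETRY_LATER;

	/* -EREMOTEIO, and conservatively anything else unexpected. */
	return MAP_FAIL_PERMANENT;
}
```

Treating unknown negative values as permanent failures is a design choice
of this sketch, not something the patch specifies.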

Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
---
 include/linux/dma-mapping.h | 15 +++++++++++
 kernel/dma/mapping.c        | 52 ++++++++++++++++++++++++++++++++-----
 2 files changed, 61 insertions(+), 6 deletions(-)

diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index 2a984cb4d1e0..50b8f586cf59 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -60,6 +60,12 @@
  * at least read-only at lesser-privileged levels).
  */
 #define DMA_ATTR_PRIVILEGED		(1UL << 9)
+/*
+ * __DMA_ATTR_PCI_P2PDMA: This should not be used directly, use
+ * dma_map_sg_p2pdma() instead. Used internally to indicate that the
+ * caller is using the dma_map_sg_p2pdma() interface.
+ */
+#define __DMA_ATTR_PCI_P2PDMA		(1UL << 10)
 
 /*
  * A dma_addr_t can hold any valid DMA or bus address for the platform.  It can
@@ -107,6 +113,8 @@ void dma_unmap_page_attrs(struct device *dev, dma_addr_t addr, size_t size,
 		enum dma_data_direction dir, unsigned long attrs);
 int dma_map_sg_attrs(struct device *dev, struct scatterlist *sg, int nents,
 		enum dma_data_direction dir, unsigned long attrs);
+int dma_map_sg_p2pdma_attrs(struct device *dev, struct scatterlist *sg,
+		int nents, enum dma_data_direction dir, unsigned long attrs);
 void dma_unmap_sg_attrs(struct device *dev, struct scatterlist *sg,
 				      int nents, enum dma_data_direction dir,
 				      unsigned long attrs);
@@ -160,6 +168,12 @@ static inline int dma_map_sg_attrs(struct device *dev, struct scatterlist *sg,
 {
 	return 0;
 }
+static inline int dma_map_sg_p2pdma_attrs(struct device *dev,
+		struct scatterlist *sg, int nents, enum dma_data_direction dir,
+		unsigned long attrs)
+{
+	return 0;
+}
 static inline void dma_unmap_sg_attrs(struct device *dev,
 		struct scatterlist *sg, int nents, enum dma_data_direction dir,
 		unsigned long attrs)
@@ -392,6 +406,7 @@ static inline void dma_sync_sgtable_for_device(struct device *dev,
 #define dma_map_single(d, a, s, r) dma_map_single_attrs(d, a, s, r, 0)
 #define dma_unmap_single(d, a, s, r) dma_unmap_single_attrs(d, a, s, r, 0)
 #define dma_map_sg(d, s, n, r) dma_map_sg_attrs(d, s, n, r, 0)
+#define dma_map_sg_p2pdma(d, s, n, r) dma_map_sg_p2pdma_attrs(d, s, n, r, 0)
 #define dma_unmap_sg(d, s, n, r) dma_unmap_sg_attrs(d, s, n, r, 0)
 #define dma_map_page(d, p, o, s, r) dma_map_page_attrs(d, p, o, s, r, 0)
 #define dma_unmap_page(d, a, s, r) dma_unmap_page_attrs(d, a, s, r, 0)
diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
index b6a633679933..923089c4267b 100644
--- a/kernel/dma/mapping.c
+++ b/kernel/dma/mapping.c
@@ -177,12 +177,8 @@ void dma_unmap_page_attrs(struct device *dev, dma_addr_t addr, size_t size,
 }
 EXPORT_SYMBOL(dma_unmap_page_attrs);
 
-/*
- * dma_maps_sg_attrs returns 0 on error and > 0 on success.
- * It should never return a value < 0.
- */
-int dma_map_sg_attrs(struct device *dev, struct scatterlist *sg, int nents,
-		enum dma_data_direction dir, unsigned long attrs)
+static int __dma_map_sg_attrs(struct device *dev, struct scatterlist *sg,
+		int nents, enum dma_data_direction dir, unsigned long attrs)
 {
 	const struct dma_map_ops *ops = get_dma_ops(dev);
 	int ents;
@@ -197,6 +193,20 @@ int dma_map_sg_attrs(struct device *dev, struct scatterlist *sg, int nents,
 		ents = dma_direct_map_sg(dev, sg, nents, dir, attrs);
 	else
 		ents = ops->map_sg(dev, sg, nents, dir, attrs);
+
+	return ents;
+}
+
+/*
+ * dma_maps_sg_attrs returns 0 on error and > 0 on success.
+ * It should never return a value < 0.
+ */
+int dma_map_sg_attrs(struct device *dev, struct scatterlist *sg, int nents,
+		enum dma_data_direction dir, unsigned long attrs)
+{
+	int ents;
+
+	ents = __dma_map_sg_attrs(dev, sg, nents, dir, attrs);
 	BUG_ON(ents < 0);
 	debug_dma_map_sg(dev, sg, nents, ents, dir);
 
@@ -204,6 +214,36 @@ int dma_map_sg_attrs(struct device *dev, struct scatterlist *sg, int nents,
 }
 EXPORT_SYMBOL(dma_map_sg_attrs);
 
+/*
+ * like dma_map_sg_attrs, but returns a negative errno on error (and > 0
+ * on success). This function must be used if PCI P2PDMA pages might
+ * be in the scatterlist.
+ *
+ * On error this function may return:
+ *    -ENOMEM indicating that there was not enough resources available and
+ *      the transfer may be retried later
+ *    -EREMOTEIO indicating that P2PDMA pages were included but cannot
+ *      be mapped by the specified device, retries will always fail
+ *
+ * The scatterlist should be unmapped with the regular dma_unmap_sg[_attrs]().
+ */
+int dma_map_sg_p2pdma_attrs(struct device *dev, struct scatterlist *sg,
+		int nents, enum dma_data_direction dir, unsigned long attrs)
+{
+	int ents;
+
+	ents = __dma_map_sg_attrs(dev, sg, nents, dir,
+				  attrs | __DMA_ATTR_PCI_P2PDMA);
+	if (!ents)
+		ents = -ENOMEM;
+
+	if (ents > 0)
+		debug_dma_map_sg(dev, sg, nents, ents, dir);
+
+	return ents;
+}
+EXPORT_SYMBOL_GPL(dma_map_sg_p2pdma_attrs);
+
 void dma_unmap_sg_attrs(struct device *dev, struct scatterlist *sg,
 				      int nents, enum dma_data_direction dir,
 				      unsigned long attrs)
-- 
2.20.1



* [PATCH 06/16] lib/scatterlist: Add flag for indicating P2PDMA segments in an SGL
  2021-04-08 17:01 [PATCH 00/16] Add new DMA mapping operation for P2PDMA Logan Gunthorpe
                   ` (4 preceding siblings ...)
  2021-04-08 17:01 ` [PATCH 05/16] dma-mapping: Introduce dma_map_sg_p2pdma() Logan Gunthorpe
@ 2021-04-08 17:01 ` Logan Gunthorpe
  2021-05-02 22:34   ` John Hubbard
  2021-04-08 17:01 ` [PATCH 07/16] PCI/P2PDMA: Make pci_p2pdma_map_type() non-static Logan Gunthorpe
                   ` (12 subsequent siblings)
  18 siblings, 1 reply; 99+ messages in thread
From: Logan Gunthorpe @ 2021-04-08 17:01 UTC (permalink / raw)
  To: linux-kernel, linux-nvme, linux-block, linux-pci, linux-mm, iommu
  Cc: Stephen Bates, Christoph Hellwig, Dan Williams, Jason Gunthorpe,
	Christian König, John Hubbard, Don Dutile, Matthew Wilcox,
	Daniel Vetter, Jakowski Andrzej, Minturn Dave B, Jason Ekstrand,
	Dave Hansen, Xiong Jianxin, Bjorn Helgaas, Ira Weiny,
	Robin Murphy, Logan Gunthorpe

Make use of the third free LSB in scatterlist's page_link on 64bit systems.

The extra bit will be used by dma_[un]map_sg_p2pdma() to determine when a
given SGL segment's dma_address points to a PCI bus address.
dma_unmap_sg_p2pdma() will need to perform different cleanup when a
segment is marked as P2PDMA.

Using this bit requires adding an additional dependency on CONFIG_64BIT to
CONFIG_PCI_P2PDMA. This should be acceptable as the majority of P2PDMA
use cases are restricted to newer root complexes and roughly require the
extra address space for memory BARs used in the transactions.
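
The low-bit stealing technique referred to above can be illustrated in
userspace as follows. This is only a sketch: struct entry and the FLAG_*
and entry_*() names are hypothetical, and it assumes the stored pointer is
at least 8-byte aligned so that three low bits are free.

```c
#include <assert.h>
#include <stdint.h>

#define FLAG_CHAIN	0x01UL
#define FLAG_END	0x02UL
#define FLAG_P2PDMA	0x04UL
#define FLAG_MASK	(FLAG_CHAIN | FLAG_END | FLAG_P2PDMA)

struct entry {
	uintptr_t link;		/* pointer bits plus flag bits */
};

/* Store a pointer while preserving whatever flags are already set. */
static void entry_set_ptr(struct entry *e, void *p)
{
	/* Three flag bits require at least 8-byte alignment. */
	assert(((uintptr_t)p & FLAG_MASK) == 0);
	e->link = (e->link & FLAG_MASK) | (uintptr_t)p;
}

/* Recover the pointer by masking off all flag bits. */
static void *entry_ptr(const struct entry *e)
{
	return (void *)(e->link & ~(uintptr_t)FLAG_MASK);
}

static void entry_mark_p2pdma(struct entry *e)
{
	e->link |= FLAG_P2PDMA;
}

static void entry_unmark_p2pdma(struct entry *e)
{
	e->link &= ~(uintptr_t)FLAG_P2PDMA;
}

static int entry_is_p2pdma(const struct entry *e)
{
	return (e->link & FLAG_P2PDMA) != 0;
}
```

Every accessor that extracts the pointer must use the full mask, which is
why the patch replaces each open-coded (SG_CHAIN | SG_END) with a single
SG_PAGE_LINK_MASK.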

Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
---
 drivers/pci/Kconfig         |  2 +-
 include/linux/scatterlist.h | 49 ++++++++++++++++++++++++++++++++++---
 2 files changed, 46 insertions(+), 5 deletions(-)

diff --git a/drivers/pci/Kconfig b/drivers/pci/Kconfig
index 0c473d75e625..90b4bddb3300 100644
--- a/drivers/pci/Kconfig
+++ b/drivers/pci/Kconfig
@@ -163,7 +163,7 @@ config PCI_PASID
 
 config PCI_P2PDMA
 	bool "PCI peer-to-peer transfer support"
-	depends on ZONE_DEVICE
+	depends on ZONE_DEVICE && 64BIT
 	select GENERIC_ALLOCATOR
 	help
 	  Enableѕ drivers to do PCI peer-to-peer transactions to and from
diff --git a/include/linux/scatterlist.h b/include/linux/scatterlist.h
index 6f70572b2938..5525d3ebf36f 100644
--- a/include/linux/scatterlist.h
+++ b/include/linux/scatterlist.h
@@ -58,6 +58,21 @@ struct sg_table {
 #define SG_CHAIN	0x01UL
 #define SG_END		0x02UL
 
+/*
+ * bit 2 is the third free bit in the page_link on 64bit systems which
+ * is used by dma_unmap_sg() to determine if the dma_address is a PCI
+ * bus address when doing P2PDMA.
+ * Note: CONFIG_PCI_P2PDMA depends on CONFIG_64BIT because of this.
+ */
+
+#ifdef CONFIG_PCI_P2PDMA
+#define SG_PCI_P2PDMA	0x04UL
+#else
+#define SG_PCI_P2PDMA	0x00UL
+#endif
+
+#define SG_PAGE_LINK_MASK (SG_CHAIN | SG_END | SG_PCI_P2PDMA)
+
 /*
  * We overload the LSB of the page pointer to indicate whether it's
  * a valid sg entry, or whether it points to the start of a new scatterlist.
@@ -65,8 +80,9 @@ struct sg_table {
  */
 #define sg_is_chain(sg)		((sg)->page_link & SG_CHAIN)
 #define sg_is_last(sg)		((sg)->page_link & SG_END)
+#define sg_is_pci_p2pdma(sg)	((sg)->page_link & SG_PCI_P2PDMA)
 #define sg_chain_ptr(sg)	\
-	((struct scatterlist *) ((sg)->page_link & ~(SG_CHAIN | SG_END)))
+	((struct scatterlist *) ((sg)->page_link & ~SG_PAGE_LINK_MASK))
 
 /**
  * sg_assign_page - Assign a given page to an SG entry
@@ -80,13 +96,13 @@ struct sg_table {
  **/
 static inline void sg_assign_page(struct scatterlist *sg, struct page *page)
 {
-	unsigned long page_link = sg->page_link & (SG_CHAIN | SG_END);
+	unsigned long page_link = sg->page_link & SG_PAGE_LINK_MASK;
 
 	/*
 	 * In order for the low bit stealing approach to work, pages
 	 * must be aligned at a 32-bit boundary as a minimum.
 	 */
-	BUG_ON((unsigned long) page & (SG_CHAIN | SG_END));
+	BUG_ON((unsigned long) page & SG_PAGE_LINK_MASK);
 #ifdef CONFIG_DEBUG_SG
 	BUG_ON(sg_is_chain(sg));
 #endif
@@ -120,7 +136,7 @@ static inline struct page *sg_page(struct scatterlist *sg)
 #ifdef CONFIG_DEBUG_SG
 	BUG_ON(sg_is_chain(sg));
 #endif
-	return (struct page *)((sg)->page_link & ~(SG_CHAIN | SG_END));
+	return (struct page *)((sg)->page_link & ~SG_PAGE_LINK_MASK);
 }
 
 /**
@@ -222,6 +238,31 @@ static inline void sg_unmark_end(struct scatterlist *sg)
 	sg->page_link &= ~SG_END;
 }
 
+/**
+ * sg_mark_pci_p2pdma - Mark the scatterlist entry for PCI p2pdma
+ * @sg:		 SG entryScatterlist
+ *
+ * Description:
+ *   Marks the passed in sg entry to indicate that the dma_address is
+ *   a PCI bus address.
+ **/
+static inline void sg_mark_pci_p2pdma(struct scatterlist *sg)
+{
+	sg->page_link |= SG_PCI_P2PDMA;
+}
+
+/**
+ * sg_unmark_pci_p2pdma - Unmark the scatterlist entry for PCI p2pdma
+ * @sg:		 SG entryScatterlist
+ *
+ * Description:
+ *   Clears the PCI P2PDMA mark
+ **/
+static inline void sg_unmark_pci_p2pdma(struct scatterlist *sg)
+{
+	sg->page_link &= ~SG_PCI_P2PDMA;
+}
+
 /**
  * sg_phys - Return physical address of an sg entry
  * @sg:	     SG entry
-- 
2.20.1



* [PATCH 07/16] PCI/P2PDMA: Make pci_p2pdma_map_type() non-static
  2021-04-08 17:01 [PATCH 00/16] Add new DMA mapping operation for P2PDMA Logan Gunthorpe
                   ` (5 preceding siblings ...)
  2021-04-08 17:01 ` [PATCH 06/16] lib/scatterlist: Add flag for indicating P2PDMA segments in an SGL Logan Gunthorpe
@ 2021-04-08 17:01 ` Logan Gunthorpe
  2021-05-02 22:44   ` John Hubbard
  2021-05-11 16:06   ` Don Dutile
  2021-04-08 17:01 ` [PATCH 08/16] PCI/P2PDMA: Introduce helpers for dma_map_sg implementations Logan Gunthorpe
                   ` (11 subsequent siblings)
  18 siblings, 2 replies; 99+ messages in thread
From: Logan Gunthorpe @ 2021-04-08 17:01 UTC (permalink / raw)
  To: linux-kernel, linux-nvme, linux-block, linux-pci, linux-mm, iommu
  Cc: Stephen Bates, Christoph Hellwig, Dan Williams, Jason Gunthorpe,
	Christian König, John Hubbard, Don Dutile, Matthew Wilcox,
	Daniel Vetter, Jakowski Andrzej, Minturn Dave B, Jason Ekstrand,
	Dave Hansen, Xiong Jianxin, Bjorn Helgaas, Ira Weiny,
	Robin Murphy, Logan Gunthorpe

pci_p2pdma_map_type() will be needed by the dma-iommu map_sg
implementation because it needs to determine the mapping type
before it creates the actual iommu mapping.

Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
---
 drivers/pci/p2pdma.c       | 34 +++++++++++++++++++++++-----------
 include/linux/pci-p2pdma.h | 15 +++++++++++++++
 2 files changed, 38 insertions(+), 11 deletions(-)

diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c
index bcb1a6d6119d..38c93f57a941 100644
--- a/drivers/pci/p2pdma.c
+++ b/drivers/pci/p2pdma.c
@@ -20,13 +20,6 @@
 #include <linux/seq_buf.h>
 #include <linux/xarray.h>
 
-enum pci_p2pdma_map_type {
-	PCI_P2PDMA_MAP_UNKNOWN = 0,
-	PCI_P2PDMA_MAP_NOT_SUPPORTED,
-	PCI_P2PDMA_MAP_BUS_ADDR,
-	PCI_P2PDMA_MAP_THRU_HOST_BRIDGE,
-};
-
 struct pci_p2pdma {
 	struct gen_pool *pool;
 	bool p2pmem_published;
@@ -822,13 +815,30 @@ void pci_p2pmem_publish(struct pci_dev *pdev, bool publish)
 }
 EXPORT_SYMBOL_GPL(pci_p2pmem_publish);
 
-static enum pci_p2pdma_map_type pci_p2pdma_map_type(struct dev_pagemap *pgmap,
-						    struct device *dev)
+/**
+ * pci_p2pdma_map_type - return the type of mapping that should be used for
+ *	a given device and pgmap
+ * @pgmap: the pagemap of a page to determine the mapping type for
+ * @dev: device that is mapping the page
+ * @dma_attrs: the attributes passed to the dma_map operation --
+ *	this is so they can be checked to ensure P2PDMA pages were not
+ *	introduced into an incorrect interface (like dma_map_sg). *
+ *
+ * Returns one of:
+ *	PCI_P2PDMA_MAP_NOT_SUPPORTED - The mapping should not be done
+ *	PCI_P2PDMA_MAP_BUS_ADDR - The mapping should use the PCI bus address
+ *	PCI_P2PDMA_MAP_THRU_HOST_BRIDGE - The mapping should be done directly
+ */
+enum pci_p2pdma_map_type pci_p2pdma_map_type(struct dev_pagemap *pgmap,
+		struct device *dev, unsigned long dma_attrs)
 {
 	struct pci_dev *provider = to_p2p_pgmap(pgmap)->provider;
 	enum pci_p2pdma_map_type ret;
 	struct pci_dev *client;
 
+	WARN_ONCE(!(dma_attrs & __DMA_ATTR_PCI_P2PDMA),
+		  "PCI P2PDMA pages were mapped with dma_map_sg!");
+
 	if (!provider->p2pdma)
 		return PCI_P2PDMA_MAP_NOT_SUPPORTED;
 
@@ -879,7 +889,8 @@ int pci_p2pdma_map_sg_attrs(struct device *dev, struct scatterlist *sg,
 	struct pci_p2pdma_pagemap *p2p_pgmap =
 		to_p2p_pgmap(sg_page(sg)->pgmap);
 
-	switch (pci_p2pdma_map_type(sg_page(sg)->pgmap, dev)) {
+	switch (pci_p2pdma_map_type(sg_page(sg)->pgmap, dev,
+				    __DMA_ATTR_PCI_P2PDMA)) {
 	case PCI_P2PDMA_MAP_THRU_HOST_BRIDGE:
 		return dma_map_sg_attrs(dev, sg, nents, dir, attrs);
 	case PCI_P2PDMA_MAP_BUS_ADDR:
@@ -904,7 +915,8 @@ void pci_p2pdma_unmap_sg_attrs(struct device *dev, struct scatterlist *sg,
 {
 	enum pci_p2pdma_map_type map_type;
 
-	map_type = pci_p2pdma_map_type(sg_page(sg)->pgmap, dev);
+	map_type = pci_p2pdma_map_type(sg_page(sg)->pgmap, dev,
+				       __DMA_ATTR_PCI_P2PDMA);
 
 	if (map_type == PCI_P2PDMA_MAP_THRU_HOST_BRIDGE)
 		dma_unmap_sg_attrs(dev, sg, nents, dir, attrs);
diff --git a/include/linux/pci-p2pdma.h b/include/linux/pci-p2pdma.h
index 8318a97c9c61..a06072ac3a52 100644
--- a/include/linux/pci-p2pdma.h
+++ b/include/linux/pci-p2pdma.h
@@ -16,6 +16,13 @@
 struct block_device;
 struct scatterlist;
 
+enum pci_p2pdma_map_type {
+	PCI_P2PDMA_MAP_UNKNOWN = 0,
+	PCI_P2PDMA_MAP_NOT_SUPPORTED,
+	PCI_P2PDMA_MAP_BUS_ADDR,
+	PCI_P2PDMA_MAP_THRU_HOST_BRIDGE,
+};
+
 #ifdef CONFIG_PCI_P2PDMA
 int pci_p2pdma_add_resource(struct pci_dev *pdev, int bar, size_t size,
 		u64 offset);
@@ -30,6 +37,8 @@ struct scatterlist *pci_p2pmem_alloc_sgl(struct pci_dev *pdev,
 					 unsigned int *nents, u32 length);
 void pci_p2pmem_free_sgl(struct pci_dev *pdev, struct scatterlist *sgl);
 void pci_p2pmem_publish(struct pci_dev *pdev, bool publish);
+enum pci_p2pdma_map_type pci_p2pdma_map_type(struct dev_pagemap *pgmap,
+		struct device *dev, unsigned long dma_attrs);
 int pci_p2pdma_map_sg_attrs(struct device *dev, struct scatterlist *sg,
 		int nents, enum dma_data_direction dir, unsigned long attrs);
 void pci_p2pdma_unmap_sg_attrs(struct device *dev, struct scatterlist *sg,
@@ -83,6 +92,12 @@ static inline void pci_p2pmem_free_sgl(struct pci_dev *pdev,
 static inline void pci_p2pmem_publish(struct pci_dev *pdev, bool publish)
 {
 }
+static inline enum pci_p2pdma_map_type pci_p2pdma_map_type(
+		struct dev_pagemap *pgmap, struct device *dev,
+		unsigned long dma_attrs)
+{
+	return PCI_P2PDMA_MAP_NOT_SUPPORTED;
+}
 static inline int pci_p2pdma_map_sg_attrs(struct device *dev,
 		struct scatterlist *sg, int nents, enum dma_data_direction dir,
 		unsigned long attrs)
-- 
2.20.1



* [PATCH 08/16] PCI/P2PDMA: Introduce helpers for dma_map_sg implementations
  2021-04-08 17:01 [PATCH 00/16] Add new DMA mapping operation for P2PDMA Logan Gunthorpe
                   ` (6 preceding siblings ...)
  2021-04-08 17:01 ` [PATCH 07/16] PCI/P2PDMA: Make pci_p2pdma_map_type() non-static Logan Gunthorpe
@ 2021-04-08 17:01 ` Logan Gunthorpe
  2021-05-02 22:52   ` John Hubbard
  2021-05-03  0:50   ` John Hubbard
  2021-04-08 17:01 ` [PATCH 09/16] dma-direct: Support PCI P2PDMA pages in dma-direct map_sg Logan Gunthorpe
                   ` (10 subsequent siblings)
  18 siblings, 2 replies; 99+ messages in thread
From: Logan Gunthorpe @ 2021-04-08 17:01 UTC (permalink / raw)
  To: linux-kernel, linux-nvme, linux-block, linux-pci, linux-mm, iommu
  Cc: Stephen Bates, Christoph Hellwig, Dan Williams, Jason Gunthorpe,
	Christian König, John Hubbard, Don Dutile, Matthew Wilcox,
	Daniel Vetter, Jakowski Andrzej, Minturn Dave B, Jason Ekstrand,
	Dave Hansen, Xiong Jianxin, Bjorn Helgaas, Ira Weiny,
	Robin Murphy, Logan Gunthorpe

Add pci_p2pdma_map_segment() as a helper for simple dma_map_sg()
implementations. It takes a scatterlist segment that must point to a
pci_p2pdma struct page and will map it if the mapping requires a bus
address.

The return value indicates whether the mapping required a bus address
or whether the caller still needs to map the segment normally. If the
segment should not be mapped, -EREMOTEIO is returned.

This helper uses a state structure to track changes to the pgmap
across calls, which avoids an xarray lookup for every page.
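
For readers who want to see the calling convention in isolation, here is a
minimal userspace sketch of the cached lookup and the three-way return
value. It is not the kernel code: every type and name below
(pgmap_model, seg_model, map_state_model, slow_lookup, map_segment_model)
is a hypothetical stand-in, and the "slow" lookup only counts how often
it runs.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#ifndef EREMOTEIO
#define EREMOTEIO 121	/* Linux value, only needed on non-Linux hosts */
#endif

/* Hypothetical stand-ins for the kernel types. */
enum map_type {
	MAP_UNKNOWN = 0,
	MAP_NOT_SUPPORTED,
	MAP_BUS_ADDR,
	MAP_THRU_HOST_BRIDGE,
};

struct pgmap_model {
	enum map_type type;	/* what the expensive lookup would return */
	uint64_t bus_offset;
};

struct seg_model {
	const struct pgmap_model *pgmap;
	uint64_t phys;
	uint64_t dma_address;
	int is_bus_addr;	/* models the P2PDMA mark on the segment */
};

struct map_state_model {
	const struct pgmap_model *pgmap;
	enum map_type map;
	uint64_t bus_off;
};

int lookups;	/* number of times the slow path ran */

enum map_type slow_lookup(const struct pgmap_model *pgmap)
{
	lookups++;
	return pgmap->type;
}

/*
 * Returns 1 if the segment was given a bus address, 0 if the caller
 * still has to map it normally, and -EREMOTEIO if it must not be mapped.
 */
int map_segment_model(struct map_state_model *state, struct seg_model *sg)
{
	assert(state && sg && sg->pgmap);

	/* Only redo the lookup when the pgmap changes between segments. */
	if (state->pgmap != sg->pgmap) {
		state->pgmap = sg->pgmap;
		state->map = slow_lookup(sg->pgmap);
		state->bus_off = sg->pgmap->bus_offset;
	}

	switch (state->map) {
	case MAP_BUS_ADDR:
		sg->dma_address = sg->phys + state->bus_off;
		sg->is_bus_addr = 1;
		return 1;
	case MAP_THRU_HOST_BRIDGE:
		return 0;
	default:
		return -EREMOTEIO;
	}
}
```

The state must be zero-initialized once outside the loop over segments,
so consecutive segments that share a pgmap cost a single lookup.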

Also add pci_p2pdma_map_bus_segment() which is useful for IOMMU
dma_map_sg() implementations where the sg segment containing the page
differs from the sg segment containing the DMA address.
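
As a rough illustration of why two segment pointers are needed, the
sketch below models a list where the DMA address for the page in one
entry is written into a different (earlier) entry. The bus_seg type and
map_bus_segment_model() are hypothetical, and the bus offset is stored in
the segment itself here, whereas the real helper takes it from the
page's pgmap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flattened model of a scatterlist entry. */
struct bus_seg {
	uint64_t phys;		/* physical address of the source page */
	uint64_t bus_offset;	/* assumption: kept per segment in this model */
	uint32_t length;
	uint64_t dma_address;
	uint32_t dma_len;
	bool p2p_bus_addr;	/* models the P2PDMA mark */
};

/* Take the page from pg_sg, store the resulting DMA data in dma_sg. */
void map_bus_segment_model(const struct bus_seg *pg_sg, struct bus_seg *dma_sg)
{
	assert(pg_sg && dma_sg);

	dma_sg->dma_address = pg_sg->phys + pg_sg->bus_offset;
	dma_sg->dma_len = pg_sg->length;
	dma_sg->p2p_bus_addr = true;
}
```

Note that the mark ends up on the entry holding the DMA address, not on
the entry holding the page, which is what an unmap path would inspect.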

Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
---
 drivers/pci/p2pdma.c       | 65 ++++++++++++++++++++++++++++++++++++++
 include/linux/pci-p2pdma.h | 21 ++++++++++++
 2 files changed, 86 insertions(+)

diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c
index 38c93f57a941..44ad7664e875 100644
--- a/drivers/pci/p2pdma.c
+++ b/drivers/pci/p2pdma.c
@@ -923,6 +923,71 @@ void pci_p2pdma_unmap_sg_attrs(struct device *dev, struct scatterlist *sg,
 }
 EXPORT_SYMBOL_GPL(pci_p2pdma_unmap_sg_attrs);
 
+/**
+ * pci_p2pdma_map_segment - map an sg segment determining the mapping type
+ * @state: State structure that should be declared on the stack outside of
+ *	the for_each_sg() loop and initialized to zero.
+ * @dev: DMA device that's doing the mapping operation
+ * @sg: scatterlist segment to map
+ * @attrs: dma mapping attributes
+ *
+ * This is a helper to be used by non-iommu dma_map_sg() implementations where
+ * the sg segment is the same for the page_link and the dma_address.
+ *
+ * Attempt to map a single segment in an SGL with the PCI bus address.
+ * The segment must point to a PCI P2PDMA page and thus must be
+ * wrapped in a is_pci_p2pdma_page(sg_page(sg)) check.
+ *
+ * Returns 1 if the segment was mapped, 0 if the segment should be mapped
+ * directly (or through the IOMMU) and -EREMOTEIO if the segment should not
+ * be mapped at all.
+ */
+int pci_p2pdma_map_segment(struct pci_p2pdma_map_state *state,
+			   struct device *dev, struct scatterlist *sg,
+			   unsigned long dma_attrs)
+{
+	if (state->pgmap != sg_page(sg)->pgmap) {
+		state->pgmap = sg_page(sg)->pgmap;
+		state->map = pci_p2pdma_map_type(state->pgmap, dev, dma_attrs);
+		state->bus_off = to_p2p_pgmap(state->pgmap)->bus_offset;
+	}
+
+	switch (state->map) {
+	case PCI_P2PDMA_MAP_BUS_ADDR:
+		sg->dma_address = sg_phys(sg) + state->bus_off;
+		sg_dma_len(sg) = sg->length;
+		sg_mark_pci_p2pdma(sg);
+		return 1;
+	case PCI_P2PDMA_MAP_THRU_HOST_BRIDGE:
+		return 0;
+	default:
+		return -EREMOTEIO;
+	}
+}
+
+/**
+ * pci_p2pdma_map_bus_segment - map an sg segment pre determined to
+ *	be mapped with PCI_P2PDMA_MAP_BUS_ADDR
+ * @pg_sg: scatterlist segment with the page to map
+ * @dma_sg: scatterlist segment to assign a dma address to
+ *
+ * This is a helper for iommu dma_map_sg() implementations when the
+ * segment for the dma address differs from the segment containing the
+ * source page.
+ *
+ * pci_p2pdma_map_type() must have already been called on the pg_sg and
+ * returned PCI_P2PDMA_MAP_BUS_ADDR.
+ */
+void pci_p2pdma_map_bus_segment(struct scatterlist *pg_sg,
+				struct scatterlist *dma_sg)
+{
+	struct pci_p2pdma_pagemap *pgmap = to_p2p_pgmap(sg_page(pg_sg)->pgmap);
+
+	dma_sg->dma_address = sg_phys(pg_sg) + pgmap->bus_offset;
+	sg_dma_len(dma_sg) = pg_sg->length;
+	sg_mark_pci_p2pdma(dma_sg);
+}
+
 /**
  * pci_p2pdma_enable_store - parse a configfs/sysfs attribute store
  *		to enable p2pdma
diff --git a/include/linux/pci-p2pdma.h b/include/linux/pci-p2pdma.h
index a06072ac3a52..49e7679403cf 100644
--- a/include/linux/pci-p2pdma.h
+++ b/include/linux/pci-p2pdma.h
@@ -13,6 +13,12 @@
 
 #include <linux/pci.h>
 
+struct pci_p2pdma_map_state {
+	struct dev_pagemap *pgmap;
+	int map;
+	u64 bus_off;
+};
+
 struct block_device;
 struct scatterlist;
 
@@ -43,6 +49,11 @@ int pci_p2pdma_map_sg_attrs(struct device *dev, struct scatterlist *sg,
 		int nents, enum dma_data_direction dir, unsigned long attrs);
 void pci_p2pdma_unmap_sg_attrs(struct device *dev, struct scatterlist *sg,
 		int nents, enum dma_data_direction dir, unsigned long attrs);
+int pci_p2pdma_map_segment(struct pci_p2pdma_map_state *state,
+		struct device *dev, struct scatterlist *sg,
+		unsigned long dma_attrs);
+void pci_p2pdma_map_bus_segment(struct scatterlist *pg_sg,
+				struct scatterlist *dma_sg);
 int pci_p2pdma_enable_store(const char *page, struct pci_dev **p2p_dev,
 			    bool *use_p2pdma);
 ssize_t pci_p2pdma_enable_show(char *page, struct pci_dev *p2p_dev,
@@ -109,6 +120,16 @@ static inline void pci_p2pdma_unmap_sg_attrs(struct device *dev,
 		unsigned long attrs)
 {
 }
+static inline int pci_p2pdma_map_segment(struct pci_p2pdma_map_state *state,
+		struct device *dev, struct scatterlist *sg,
+		unsigned long dma_attrs)
+{
+	return 0;
+}
+static inline void pci_p2pdma_map_bus_segment(struct scatterlist *pg_sg,
+					      struct scatterlist *dma_sg)
+{
+}
 static inline int pci_p2pdma_enable_store(const char *page,
 		struct pci_dev **p2p_dev, bool *use_p2pdma)
 {
-- 
2.20.1



* [PATCH 09/16] dma-direct: Support PCI P2PDMA pages in dma-direct map_sg
  2021-04-08 17:01 [PATCH 00/16] Add new DMA mapping operation for P2PDMA Logan Gunthorpe
                   ` (7 preceding siblings ...)
  2021-04-08 17:01 ` [PATCH 08/16] PCI/P2PDMA: Introduce helpers for dma_map_sg implementations Logan Gunthorpe
@ 2021-04-08 17:01 ` Logan Gunthorpe
  2021-04-27 19:33   ` Jason Gunthorpe
  2021-05-02 23:28   ` John Hubbard
  2021-04-08 17:01 ` [PATCH 10/16] dma-mapping: Add flags to dma_map_ops to indicate PCI P2PDMA support Logan Gunthorpe
                   ` (9 subsequent siblings)
  18 siblings, 2 replies; 99+ messages in thread
From: Logan Gunthorpe @ 2021-04-08 17:01 UTC (permalink / raw)
  To: linux-kernel, linux-nvme, linux-block, linux-pci, linux-mm, iommu
  Cc: Stephen Bates, Christoph Hellwig, Dan Williams, Jason Gunthorpe,
	Christian König, John Hubbard, Don Dutile, Matthew Wilcox,
	Daniel Vetter, Jakowski Andrzej, Minturn Dave B, Jason Ekstrand,
	Dave Hansen, Xiong Jianxin, Bjorn Helgaas, Ira Weiny,
	Robin Murphy, Logan Gunthorpe

Add PCI P2PDMA support for dma_direct_map_sg() so that it can map
PCI P2PDMA pages directly without a hack in the callers. This allows
for heterogeneous SGLs that contain both P2PDMA and regular pages.

SGL segments that contain PCI bus addresses are marked with
sg_mark_pci_p2pdma() and are ignored when unmapped.
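
The unmap side can be pictured with a small userspace model. This is
only a sketch under assumed names (direct_seg, unmap_sg_model); it shows
that a marked segment has its mark cleared and is otherwise left alone,
while regular segments go through the normal teardown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a mapped scatterlist entry. */
struct direct_seg {
	bool p2p_bus_addr;	/* models the P2PDMA mark */
	bool mapped;		/* models a regular mapping that needs teardown */
};

/* Returns how many segments went through the regular unmap path. */
size_t unmap_sg_model(struct direct_seg *sgl, size_t nents)
{
	size_t i, unmapped = 0;

	assert(sgl || nents == 0);

	for (i = 0; i < nents; i++) {
		if (sgl[i].p2p_bus_addr) {
			/* Bus address: nothing to undo, just clear the mark. */
			sgl[i].p2p_bus_addr = false;
			continue;
		}

		sgl[i].mapped = false;
		unmapped++;
	}

	return unmapped;
}
```

Clearing the mark on unmap leaves the list clean, so the same entries
can be mapped again later without a stale flag.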

Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
---
 kernel/dma/direct.c | 25 ++++++++++++++++++++++---
 1 file changed, 22 insertions(+), 3 deletions(-)

diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index 002268262c9a..108dfb4ecbd5 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -13,6 +13,7 @@
 #include <linux/vmalloc.h>
 #include <linux/set_memory.h>
 #include <linux/slab.h>
+#include <linux/pci-p2pdma.h>
 #include "direct.h"
 
 /*
@@ -387,19 +388,37 @@ void dma_direct_unmap_sg(struct device *dev, struct scatterlist *sgl,
 	struct scatterlist *sg;
 	int i;
 
-	for_each_sg(sgl, sg, nents, i)
+	for_each_sg(sgl, sg, nents, i) {
+		if (sg_is_pci_p2pdma(sg)) {
+			sg_unmark_pci_p2pdma(sg);
+			continue;
+		}
+
 		dma_direct_unmap_page(dev, sg->dma_address, sg_dma_len(sg), dir,
 			     attrs);
+	}
 }
 #endif
 
 int dma_direct_map_sg(struct device *dev, struct scatterlist *sgl, int nents,
 		enum dma_data_direction dir, unsigned long attrs)
 {
-	int i;
+	struct pci_p2pdma_map_state p2pdma_state = {};
 	struct scatterlist *sg;
+	int i, ret = 0;
 
 	for_each_sg(sgl, sg, nents, i) {
+		if (is_pci_p2pdma_page(sg_page(sg))) {
+			ret = pci_p2pdma_map_segment(&p2pdma_state, dev, sg,
+						     attrs);
+			if (ret < 0) {
+				goto out_unmap;
+			} else if (ret) {
+				ret = 0;
+				continue;
+			}
+		}
+
 		sg->dma_address = dma_direct_map_page(dev, sg_page(sg),
 				sg->offset, sg->length, dir, attrs);
 		if (sg->dma_address == DMA_MAPPING_ERROR)
@@ -411,7 +430,7 @@ int dma_direct_map_sg(struct device *dev, struct scatterlist *sgl, int nents,
 
 out_unmap:
 	dma_direct_unmap_sg(dev, sgl, i, dir, attrs | DMA_ATTR_SKIP_CPU_SYNC);
-	return 0;
+	return ret;
 }
 
 dma_addr_t dma_direct_map_resource(struct device *dev, phys_addr_t paddr,
-- 
2.20.1



* [PATCH 10/16] dma-mapping: Add flags to dma_map_ops to indicate PCI P2PDMA support
  2021-04-08 17:01 [PATCH 00/16] Add new DMA mapping operation for P2PDMA Logan Gunthorpe
                   ` (8 preceding siblings ...)
  2021-04-08 17:01 ` [PATCH 09/16] dma-direct: Support PCI P2PDMA pages in dma-direct map_sg Logan Gunthorpe
@ 2021-04-08 17:01 ` Logan Gunthorpe
  2021-05-03  0:32   ` John Hubbard
  2021-05-11 16:06   ` Don Dutile
  2021-04-08 17:01 ` [PATCH 11/16] iommu/dma: Support PCI P2PDMA pages in dma-iommu map_sg Logan Gunthorpe
                   ` (8 subsequent siblings)
  18 siblings, 2 replies; 99+ messages in thread
From: Logan Gunthorpe @ 2021-04-08 17:01 UTC (permalink / raw)
  To: linux-kernel, linux-nvme, linux-block, linux-pci, linux-mm, iommu
  Cc: Stephen Bates, Christoph Hellwig, Dan Williams, Jason Gunthorpe,
	Christian König, John Hubbard, Don Dutile, Matthew Wilcox,
	Daniel Vetter, Jakowski Andrzej, Minturn Dave B, Jason Ekstrand,
	Dave Hansen, Xiong Jianxin, Bjorn Helgaas, Ira Weiny,
	Robin Murphy, Logan Gunthorpe

Add a flags member to the dma_map_ops structure with one flag to
indicate support for PCI P2PDMA.

Also, add a helper to check if a device supports PCI P2PDMA.
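
The decision the helper makes is small enough to model in userspace.
The sketch below uses hypothetical names (dma_ops_model,
MODEL_F_PCI_P2PDMA_SUPPORTED, p2pdma_supported_model) and treats a NULL
ops pointer as "the device uses dma-direct".

```c
#include <stdbool.h>
#include <stddef.h>

#define MODEL_F_PCI_P2PDMA_SUPPORTED	(1u << 0)	/* stand-in flag bit */

/* Hypothetical model of an ops table that carries a flags word. */
struct dma_ops_model {
	unsigned int flags;
};

bool p2pdma_supported_model(const struct dma_ops_model *ops)
{
	/* No ops table: dma-direct is used, which supports P2PDMA. */
	if (!ops)
		return true;

	/* Otherwise the implementation has to opt in through its flags. */
	return (ops->flags & MODEL_F_PCI_P2PDMA_SUPPORTED) != 0;
}
```

With an opt-in flag, an ops implementation that was never taught about
P2PDMA pages reports no support by default.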

Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
---
 include/linux/dma-map-ops.h |  3 +++
 include/linux/dma-mapping.h |  5 +++++
 kernel/dma/mapping.c        | 18 ++++++++++++++++++
 3 files changed, 26 insertions(+)

diff --git a/include/linux/dma-map-ops.h b/include/linux/dma-map-ops.h
index 51872e736e7b..481892822104 100644
--- a/include/linux/dma-map-ops.h
+++ b/include/linux/dma-map-ops.h
@@ -12,6 +12,9 @@
 struct cma;
 
 struct dma_map_ops {
+	unsigned int flags;
+#define DMA_F_PCI_P2PDMA_SUPPORTED     (1 << 0)
+
 	void *(*alloc)(struct device *dev, size_t size,
 			dma_addr_t *dma_handle, gfp_t gfp,
 			unsigned long attrs);
diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index 50b8f586cf59..c31980ecca62 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -146,6 +146,7 @@ int dma_mmap_attrs(struct device *dev, struct vm_area_struct *vma,
 		unsigned long attrs);
 bool dma_can_mmap(struct device *dev);
 int dma_supported(struct device *dev, u64 mask);
+bool dma_pci_p2pdma_supported(struct device *dev);
 int dma_set_mask(struct device *dev, u64 mask);
 int dma_set_coherent_mask(struct device *dev, u64 mask);
 u64 dma_get_required_mask(struct device *dev);
@@ -247,6 +248,10 @@ static inline int dma_supported(struct device *dev, u64 mask)
 {
 	return 0;
 }
+static inline bool dma_pci_p2pdma_supported(struct device *dev)
+{
+	return false;
+}
 static inline int dma_set_mask(struct device *dev, u64 mask)
 {
 	return -EIO;
diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
index 923089c4267b..ce44a0fcc4e5 100644
--- a/kernel/dma/mapping.c
+++ b/kernel/dma/mapping.c
@@ -573,6 +573,24 @@ int dma_supported(struct device *dev, u64 mask)
 }
 EXPORT_SYMBOL(dma_supported);
 
+bool dma_pci_p2pdma_supported(struct device *dev)
+{
+	const struct dma_map_ops *ops = get_dma_ops(dev);
+
+	/* if ops is not set, dma direct will be used which supports P2PDMA */
+	if (!ops)
+		return true;
+
+	/*
+	 * Note: dma_ops_bypass is not checked here because P2PDMA should
+	 * not be used with dma mapping ops that do not have support even
+	 * if the specific device is bypassing them.
+	 */
+
+	return ops->flags & DMA_F_PCI_P2PDMA_SUPPORTED;
+}
+EXPORT_SYMBOL_GPL(dma_pci_p2pdma_supported);
+
 #ifdef CONFIG_ARCH_HAS_DMA_SET_MASK
 void arch_dma_set_mask(struct device *dev, u64 mask);
 #else
-- 
2.20.1



* [PATCH 11/16] iommu/dma: Support PCI P2PDMA pages in dma-iommu map_sg
  2021-04-08 17:01 [PATCH 00/16] Add new DMA mapping operation for P2PDMA Logan Gunthorpe
                   ` (9 preceding siblings ...)
  2021-04-08 17:01 ` [PATCH 10/16] dma-mapping: Add flags to dma_map_ops to indicate PCI P2PDMA support Logan Gunthorpe
@ 2021-04-08 17:01 ` Logan Gunthorpe
  2021-04-27 19:43   ` Jason Gunthorpe
                     ` (2 more replies)
  2021-04-08 17:01 ` [PATCH 12/16] nvme-pci: Check DMA ops when indicating support for PCI P2PDMA Logan Gunthorpe
                   ` (7 subsequent siblings)
  18 siblings, 3 replies; 99+ messages in thread
From: Logan Gunthorpe @ 2021-04-08 17:01 UTC (permalink / raw)
  To: linux-kernel, linux-nvme, linux-block, linux-pci, linux-mm, iommu
  Cc: Stephen Bates, Christoph Hellwig, Dan Williams, Jason Gunthorpe,
	Christian König, John Hubbard, Don Dutile, Matthew Wilcox,
	Daniel Vetter, Jakowski Andrzej, Minturn Dave B, Jason Ekstrand,
	Dave Hansen, Xiong Jianxin, Bjorn Helgaas, Ira Weiny,
	Robin Murphy, Logan Gunthorpe

When a PCI P2PDMA page is seen, set the IOVA length of the segment
to zero so that it is not mapped into the IOVA. Then, in finalise_sg(),
apply the appropriate bus address to the segment. The IOVA is not
created if the scatterlist only consists of P2PDMA pages.

Similar to dma-direct, the sg_mark_pci_p2pdma() flag is used to
indicate bus address segments. On unmap, P2PDMA segments are skipped
over when determining the start and end IOVA addresses.
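
To make the unmap rule concrete, here is a userspace sketch of how the
IOVA range could be derived while skipping marked segments. The names
(iova_seg, MODEL_MAPPING_ERROR, iova_range_model) are hypothetical, and
the function only computes the range instead of unmapping anything.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_MAPPING_ERROR	UINT64_MAX	/* stand-in sentinel value */

/* Hypothetical model of the DMA side of a scatterlist entry. */
struct iova_seg {
	bool p2p_bus_addr;	/* models the P2PDMA mark */
	uint64_t dma_address;
	uint64_t dma_len;
};

/*
 * Returns true and fills *start and *len when at least one segment lives
 * in the IOVA allocation; returns false when there is nothing to unmap.
 */
bool iova_range_model(struct iova_seg *sgl, size_t nents,
		      uint64_t *start, uint64_t *len)
{
	uint64_t first = MODEL_MAPPING_ERROR, end = 0;
	size_t i;

	assert(start && len);

	for (i = 0; i < nents; i++) {
		if (sgl[i].p2p_bus_addr) {
			/* Bus addresses are outside the IOVA range. */
			sgl[i].p2p_bus_addr = false;
			continue;
		}
		if (sgl[i].dma_len == 0)
			break;	/* no more DMA segments */

		if (first == MODEL_MAPPING_ERROR)
			first = sgl[i].dma_address;
		end = sgl[i].dma_address + sgl[i].dma_len;
	}

	if (first == MODEL_MAPPING_ERROR)
		return false;

	*start = first;
	*len = end - first;
	return true;
}
```

A list made up only of P2PDMA bus-address segments never leaves the
sentinel state, which matches the case where no IOVA was allocated.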

With this change, the flags variable in the dma_map_ops is
set to DMA_F_PCI_P2PDMA_SUPPORTED to indicate support for
P2PDMA pages.

Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
---
 drivers/iommu/dma-iommu.c | 66 ++++++++++++++++++++++++++++++++++-----
 1 file changed, 58 insertions(+), 8 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index af765c813cc8..ef49635f9819 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -20,6 +20,7 @@
 #include <linux/mm.h>
 #include <linux/mutex.h>
 #include <linux/pci.h>
+#include <linux/pci-p2pdma.h>
 #include <linux/swiotlb.h>
 #include <linux/scatterlist.h>
 #include <linux/vmalloc.h>
@@ -864,6 +865,16 @@ static int __finalise_sg(struct device *dev, struct scatterlist *sg, int nents,
 		sg_dma_address(s) = DMA_MAPPING_ERROR;
 		sg_dma_len(s) = 0;
 
+		if (is_pci_p2pdma_page(sg_page(s)) && !s_iova_len) {
+			if (i > 0)
+				cur = sg_next(cur);
+
+			pci_p2pdma_map_bus_segment(s, cur);
+			count++;
+			cur_len = 0;
+			continue;
+		}
+
 		/*
 		 * Now fill in the real DMA data. If...
 		 * - there is a valid output segment to append to
@@ -961,10 +972,12 @@ static int iommu_dma_map_sg(struct device *dev, struct scatterlist *sg,
 	struct iova_domain *iovad = &cookie->iovad;
 	struct scatterlist *s, *prev = NULL;
 	int prot = dma_info_to_prot(dir, dev_is_dma_coherent(dev), attrs);
+	struct dev_pagemap *pgmap = NULL;
+	enum pci_p2pdma_map_type map_type;
 	dma_addr_t iova;
 	size_t iova_len = 0;
 	unsigned long mask = dma_get_seg_boundary(dev);
-	int i;
+	int i, ret = 0;
 
 	if (static_branch_unlikely(&iommu_deferred_attach_enabled) &&
 	    iommu_deferred_attach(dev, domain))
@@ -993,6 +1006,31 @@ static int iommu_dma_map_sg(struct device *dev, struct scatterlist *sg,
 		s_length = iova_align(iovad, s_length + s_iova_off);
 		s->length = s_length;
 
+		if (is_pci_p2pdma_page(sg_page(s))) {
+			if (sg_page(s)->pgmap != pgmap) {
+				pgmap = sg_page(s)->pgmap;
+				map_type = pci_p2pdma_map_type(pgmap, dev,
+							       attrs);
+			}
+
+			switch (map_type) {
+			case PCI_P2PDMA_MAP_BUS_ADDR:
+				/*
+				 * A zero length will be ignored by
+				 * iommu_map_sg() and then can be detected
+				 * in __finalise_sg() to actually map the
+				 * bus address.
+				 */
+				s->length = 0;
+				continue;
+			case PCI_P2PDMA_MAP_THRU_HOST_BRIDGE:
+				break;
+			default:
+				ret = -EREMOTEIO;
+				goto out_restore_sg;
+			}
+		}
+
 		/*
 		 * Due to the alignment of our single IOVA allocation, we can
 		 * depend on these assumptions about the segment boundary mask:
@@ -1015,6 +1053,9 @@ static int iommu_dma_map_sg(struct device *dev, struct scatterlist *sg,
 		prev = s;
 	}
 
+	if (!iova_len)
+		return __finalise_sg(dev, sg, nents, 0);
+
 	iova = iommu_dma_alloc_iova(domain, iova_len, dma_get_mask(dev), dev);
 	if (!iova)
 		goto out_restore_sg;
@@ -1032,13 +1073,13 @@ static int iommu_dma_map_sg(struct device *dev, struct scatterlist *sg,
 	iommu_dma_free_iova(cookie, iova, iova_len, NULL);
 out_restore_sg:
 	__invalidate_sg(sg, nents);
-	return 0;
+	return ret;
 }
 
 static void iommu_dma_unmap_sg(struct device *dev, struct scatterlist *sg,
 		int nents, enum dma_data_direction dir, unsigned long attrs)
 {
-	dma_addr_t start, end;
+	dma_addr_t end, start = DMA_MAPPING_ERROR;
 	struct scatterlist *tmp;
 	int i;
 
@@ -1054,14 +1095,22 @@ static void iommu_dma_unmap_sg(struct device *dev, struct scatterlist *sg,
 	 * The scatterlist segments are mapped into a single
 	 * contiguous IOVA allocation, so this is incredibly easy.
 	 */
-	start = sg_dma_address(sg);
-	for_each_sg(sg_next(sg), tmp, nents - 1, i) {
+	for_each_sg(sg, tmp, nents, i) {
+		if (sg_is_pci_p2pdma(tmp)) {
+			sg_unmark_pci_p2pdma(tmp);
+			continue;
+		}
 		if (sg_dma_len(tmp) == 0)
 			break;
-		sg = tmp;
+
+		if (start == DMA_MAPPING_ERROR)
+			start = sg_dma_address(tmp);
+
+		end = sg_dma_address(tmp) + sg_dma_len(tmp);
 	}
-	end = sg_dma_address(sg) + sg_dma_len(sg);
-	__iommu_dma_unmap(dev, start, end - start);
+
+	if (start != DMA_MAPPING_ERROR)
+		__iommu_dma_unmap(dev, start, end - start);
 }
 
 static dma_addr_t iommu_dma_map_resource(struct device *dev, phys_addr_t phys,
@@ -1254,6 +1303,7 @@ static unsigned long iommu_dma_get_merge_boundary(struct device *dev)
 }
 
 static const struct dma_map_ops iommu_dma_ops = {
+	.flags			= DMA_F_PCI_P2PDMA_SUPPORTED,
 	.alloc			= iommu_dma_alloc,
 	.free			= iommu_dma_free,
 	.alloc_pages		= dma_common_alloc_pages,
-- 
2.20.1



* [PATCH 12/16] nvme-pci: Check DMA ops when indicating support for PCI P2PDMA
  2021-04-08 17:01 [PATCH 00/16] Add new DMA mapping operation for P2PDMA Logan Gunthorpe
                   ` (10 preceding siblings ...)
  2021-04-08 17:01 ` [PATCH 11/16] iommu/dma: Support PCI P2PDMA pages in dma-iommu map_sg Logan Gunthorpe
@ 2021-04-08 17:01 ` Logan Gunthorpe
  2021-05-03  1:29   ` John Hubbard
  2021-04-08 17:01 ` [PATCH 13/16] nvme-pci: Convert to using dma_map_sg_p2pdma for p2pdma pages Logan Gunthorpe
                   ` (6 subsequent siblings)
  18 siblings, 1 reply; 99+ messages in thread
From: Logan Gunthorpe @ 2021-04-08 17:01 UTC (permalink / raw)
  To: linux-kernel, linux-nvme, linux-block, linux-pci, linux-mm, iommu
  Cc: Stephen Bates, Christoph Hellwig, Dan Williams, Jason Gunthorpe,
	Christian König, John Hubbard, Don Dutile, Matthew Wilcox,
	Daniel Vetter, Jakowski Andrzej, Minturn Dave B, Jason Ekstrand,
	Dave Hansen, Xiong Jianxin, Bjorn Helgaas, Ira Weiny,
	Robin Murphy, Logan Gunthorpe

Introduce a supports_pci_p2pdma() operation in nvme_ctrl_ops to
replace the fixed NVME_F_PCI_P2PDMA flag such that the dma_map_ops
flags can be checked for PCI P2PDMA support.
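
The difference between a fixed flag and an operation is that the answer
can depend on the device. A minimal userspace sketch, with hypothetical
names throughout (ctrl_model, ctrl_ops_model, pci_supports_p2pdma_model,
queue_gets_p2pdma_flag):

```c
#include <stdbool.h>
#include <stddef.h>

struct ctrl_model;

/* Hypothetical ops table with an optional capability callback. */
struct ctrl_ops_model {
	bool (*supports_pci_p2pdma)(struct ctrl_model *ctrl);
};

struct ctrl_model {
	const struct ctrl_ops_model *ops;
	/* Stand-in for what the DMA layer reports for this device. */
	bool dma_ops_support_p2pdma;
};

/* Transport callback: defer to the per-device DMA capability. */
bool pci_supports_p2pdma_model(struct ctrl_model *ctrl)
{
	return ctrl->dma_ops_support_p2pdma;
}

/* Core side: transports without the callback never get the flag. */
bool queue_gets_p2pdma_flag(struct ctrl_model *ctrl)
{
	return ctrl->ops->supports_pci_p2pdma &&
	       ctrl->ops->supports_pci_p2pdma(ctrl);
}
```

Transports that leave the callback unset behave as if the old flag had
never been set for them.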

Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
---
 drivers/nvme/host/core.c |  3 ++-
 drivers/nvme/host/nvme.h |  2 +-
 drivers/nvme/host/pci.c  | 11 +++++++++--
 3 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 0896e21642be..223419454516 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -3907,7 +3907,8 @@ static void nvme_alloc_ns(struct nvme_ctrl *ctrl, unsigned nsid,
 		blk_queue_flag_set(QUEUE_FLAG_STABLE_WRITES, ns->queue);
 
 	blk_queue_flag_set(QUEUE_FLAG_NONROT, ns->queue);
-	if (ctrl->ops->flags & NVME_F_PCI_P2PDMA)
+	if (ctrl->ops->supports_pci_p2pdma &&
+	    ctrl->ops->supports_pci_p2pdma(ctrl))
 		blk_queue_flag_set(QUEUE_FLAG_PCI_P2PDMA, ns->queue);
 
 	ns->queue->queuedata = ns;
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 07b34175c6ce..9c04df982d2c 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -473,7 +473,6 @@ struct nvme_ctrl_ops {
 	unsigned int flags;
 #define NVME_F_FABRICS			(1 << 0)
 #define NVME_F_METADATA_SUPPORTED	(1 << 1)
-#define NVME_F_PCI_P2PDMA		(1 << 2)
 	int (*reg_read32)(struct nvme_ctrl *ctrl, u32 off, u32 *val);
 	int (*reg_write32)(struct nvme_ctrl *ctrl, u32 off, u32 val);
 	int (*reg_read64)(struct nvme_ctrl *ctrl, u32 off, u64 *val);
@@ -481,6 +480,7 @@ struct nvme_ctrl_ops {
 	void (*submit_async_event)(struct nvme_ctrl *ctrl);
 	void (*delete_ctrl)(struct nvme_ctrl *ctrl);
 	int (*get_address)(struct nvme_ctrl *ctrl, char *buf, int size);
+	bool (*supports_pci_p2pdma)(struct nvme_ctrl *ctrl);
 };
 
 #ifdef CONFIG_FAULT_INJECTION_DEBUG_FS
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 7249ae74f71f..14f092973792 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -2759,17 +2759,24 @@ static int nvme_pci_get_address(struct nvme_ctrl *ctrl, char *buf, int size)
 	return snprintf(buf, size, "%s\n", dev_name(&pdev->dev));
 }
 
+static bool nvme_pci_supports_pci_p2pdma(struct nvme_ctrl *ctrl)
+{
+	struct nvme_dev *dev = to_nvme_dev(ctrl);
+
+	return dma_pci_p2pdma_supported(dev->dev);
+}
+
 static const struct nvme_ctrl_ops nvme_pci_ctrl_ops = {
 	.name			= "pcie",
 	.module			= THIS_MODULE,
-	.flags			= NVME_F_METADATA_SUPPORTED |
-				  NVME_F_PCI_P2PDMA,
+	.flags			= NVME_F_METADATA_SUPPORTED,
 	.reg_read32		= nvme_pci_reg_read32,
 	.reg_write32		= nvme_pci_reg_write32,
 	.reg_read64		= nvme_pci_reg_read64,
 	.free_ctrl		= nvme_pci_free_ctrl,
 	.submit_async_event	= nvme_pci_submit_async_event,
 	.get_address		= nvme_pci_get_address,
+	.supports_pci_p2pdma	= nvme_pci_supports_pci_p2pdma,
 };
 
 static int nvme_dev_map(struct nvme_dev *dev)
-- 
2.20.1



* [PATCH 13/16] nvme-pci: Convert to using dma_map_sg_p2pdma for p2pdma pages
  2021-04-08 17:01 [PATCH 00/16] Add new DMA mapping operation for P2PDMA Logan Gunthorpe
                   ` (11 preceding siblings ...)
  2021-04-08 17:01 ` [PATCH 12/16] nvme-pci: Check DMA ops when indicating support for PCI P2PDMA Logan Gunthorpe
@ 2021-04-08 17:01 ` Logan Gunthorpe
  2021-05-03  1:34   ` John Hubbard
  2021-04-08 17:01 ` [PATCH 14/16] nvme-rdma: Ensure dma support when using p2pdma Logan Gunthorpe
                   ` (5 subsequent siblings)
  18 siblings, 1 reply; 99+ messages in thread
From: Logan Gunthorpe @ 2021-04-08 17:01 UTC (permalink / raw)
  To: linux-kernel, linux-nvme, linux-block, linux-pci, linux-mm, iommu
  Cc: Stephen Bates, Christoph Hellwig, Dan Williams, Jason Gunthorpe,
	Christian König, John Hubbard, Don Dutile, Matthew Wilcox,
	Daniel Vetter, Jakowski Andrzej, Minturn Dave B, Jason Ekstrand,
	Dave Hansen, Xiong Jianxin, Bjorn Helgaas, Ira Weiny,
	Robin Murphy, Logan Gunthorpe

Convert to using dma_map_sg_p2pdma() for PCI p2pdma pages.

This should be equivalent but allows for heterogeneous scatterlists
with both P2PDMA and regular pages. However, P2PDMA support will be
slightly more restricted (only dma-direct and dma-iommu are currently
supported).

Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
---
 drivers/nvme/host/pci.c | 28 ++++++++--------------------
 1 file changed, 8 insertions(+), 20 deletions(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 14f092973792..a1ed07ff38b7 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -577,17 +577,6 @@ static void nvme_free_sgls(struct nvme_dev *dev, struct request *req)
 
 }
 
-static void nvme_unmap_sg(struct nvme_dev *dev, struct request *req)
-{
-	struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
-
-	if (is_pci_p2pdma_page(sg_page(iod->sg)))
-		pci_p2pdma_unmap_sg(dev->dev, iod->sg, iod->nents,
-				    rq_dma_dir(req));
-	else
-		dma_unmap_sg(dev->dev, iod->sg, iod->nents, rq_dma_dir(req));
-}
-
 static void nvme_unmap_data(struct nvme_dev *dev, struct request *req)
 {
 	struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
@@ -600,7 +589,7 @@ static void nvme_unmap_data(struct nvme_dev *dev, struct request *req)
 
 	WARN_ON_ONCE(!iod->nents);
 
-	nvme_unmap_sg(dev, req);
+	dma_unmap_sg(dev->dev, iod->sg, iod->nents, rq_dma_dir(req));
 	if (iod->npages == 0)
 		dma_pool_free(dev->prp_small_pool, nvme_pci_iod_list(req)[0],
 			      iod->first_dma);
@@ -868,14 +857,13 @@ static blk_status_t nvme_map_data(struct nvme_dev *dev, struct request *req,
 	if (!iod->nents)
 		goto out_free_sg;
 
-	if (is_pci_p2pdma_page(sg_page(iod->sg)))
-		nr_mapped = pci_p2pdma_map_sg_attrs(dev->dev, iod->sg,
-				iod->nents, rq_dma_dir(req), DMA_ATTR_NO_WARN);
-	else
-		nr_mapped = dma_map_sg_attrs(dev->dev, iod->sg, iod->nents,
-					     rq_dma_dir(req), DMA_ATTR_NO_WARN);
-	if (!nr_mapped)
+	nr_mapped = dma_map_sg_p2pdma_attrs(dev->dev, iod->sg, iod->nents,
+				     rq_dma_dir(req), DMA_ATTR_NO_WARN);
+	if (nr_mapped < 0) {
+		if (nr_mapped != -ENOMEM)
+			ret = BLK_STS_TARGET;
 		goto out_free_sg;
+	}
 
 	iod->use_sgl = nvme_pci_use_sgls(dev, req);
 	if (iod->use_sgl)
@@ -887,7 +875,7 @@ static blk_status_t nvme_map_data(struct nvme_dev *dev, struct request *req,
 	return BLK_STS_OK;
 
 out_unmap_sg:
-	nvme_unmap_sg(dev, req);
+	dma_unmap_sg(dev->dev, iod->sg, iod->nents, rq_dma_dir(req));
 out_free_sg:
 	mempool_free(iod->sg, dev->iod_mempool);
 	return ret;
-- 
2.20.1



* [PATCH 14/16] nvme-rdma: Ensure dma support when using p2pdma
  2021-04-08 17:01 [PATCH 00/16] Add new DMA mapping operation for P2PDMA Logan Gunthorpe
                   ` (12 preceding siblings ...)
  2021-04-08 17:01 ` [PATCH 13/16] nvme-pci: Convert to using dma_map_sg_p2pdma for p2pdma pages Logan Gunthorpe
@ 2021-04-08 17:01 ` Logan Gunthorpe
  2021-04-27 19:47   ` Jason Gunthorpe
  2021-05-03  1:37   ` John Hubbard
  2021-04-08 17:01 ` [PATCH 15/16] RDMA/rw: use dma_map_sg_p2pdma() Logan Gunthorpe
                   ` (4 subsequent siblings)
  18 siblings, 2 replies; 99+ messages in thread
From: Logan Gunthorpe @ 2021-04-08 17:01 UTC (permalink / raw)
  To: linux-kernel, linux-nvme, linux-block, linux-pci, linux-mm, iommu
  Cc: Stephen Bates, Christoph Hellwig, Dan Williams, Jason Gunthorpe,
	Christian König, John Hubbard, Don Dutile, Matthew Wilcox,
	Daniel Vetter, Jakowski Andrzej, Minturn Dave B, Jason Ekstrand,
	Dave Hansen, Xiong Jianxin, Bjorn Helgaas, Ira Weiny,
	Robin Murphy, Logan Gunthorpe

Ensure the dma operations support p2pdma before using the RDMA
device for P2PDMA. This allows switching the RDMA driver from
pci_p2pdma_map_sg() to dma_map_sg_p2pdma().

Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
---
 drivers/nvme/target/rdma.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/nvme/target/rdma.c b/drivers/nvme/target/rdma.c
index 6c1f3ab7649c..3ec7e77e5416 100644
--- a/drivers/nvme/target/rdma.c
+++ b/drivers/nvme/target/rdma.c
@@ -414,7 +414,8 @@ static int nvmet_rdma_alloc_rsp(struct nvmet_rdma_device *ndev,
 	if (ib_dma_mapping_error(ndev->device, r->send_sge.addr))
 		goto out_free_rsp;
 
-	if (!ib_uses_virt_dma(ndev->device))
+	if (!ib_uses_virt_dma(ndev->device) &&
+	    dma_pci_p2pdma_supported(&ndev->device->dev))
 		r->req.p2p_client = &ndev->device->dev;
 	r->send_sge.length = sizeof(*r->req.cqe);
 	r->send_sge.lkey = ndev->pd->local_dma_lkey;
-- 
2.20.1



* [PATCH 15/16] RDMA/rw: use dma_map_sg_p2pdma()
  2021-04-08 17:01 [PATCH 00/16] Add new DMA mapping operation for P2PDMA Logan Gunthorpe
                   ` (13 preceding siblings ...)
  2021-04-08 17:01 ` [PATCH 14/16] nvme-rdma: Ensure dma support when using p2pdma Logan Gunthorpe
@ 2021-04-08 17:01 ` Logan Gunthorpe
  2021-04-08 17:01 ` [PATCH 16/16] PCI/P2PDMA: Remove pci_p2pdma_[un]map_sg() Logan Gunthorpe
                   ` (3 subsequent siblings)
  18 siblings, 0 replies; 99+ messages in thread
From: Logan Gunthorpe @ 2021-04-08 17:01 UTC (permalink / raw)
  To: linux-kernel, linux-nvme, linux-block, linux-pci, linux-mm, iommu
  Cc: Stephen Bates, Christoph Hellwig, Dan Williams, Jason Gunthorpe,
	Christian König, John Hubbard, Don Dutile, Matthew Wilcox,
	Daniel Vetter, Jakowski Andrzej, Minturn Dave B, Jason Ekstrand,
	Dave Hansen, Xiong Jianxin, Bjorn Helgaas, Ira Weiny,
	Robin Murphy, Logan Gunthorpe

Drop the use of pci_p2pdma_map_sg() in favour of dma_map_sg_p2pdma().

The new interface allows mapping scatterlists that mix both regular
and P2PDMA pages and will verify that the dma device can communicate
with the device the pages are on.

Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
---
 drivers/infiniband/core/rw.c | 50 ++++++++++--------------------------
 include/rdma/ib_verbs.h      | 32 +++++++++++++++++++++++
 2 files changed, 46 insertions(+), 36 deletions(-)

diff --git a/drivers/infiniband/core/rw.c b/drivers/infiniband/core/rw.c
index 31156e22d3e7..0c6213d9b044 100644
--- a/drivers/infiniband/core/rw.c
+++ b/drivers/infiniband/core/rw.c
@@ -273,26 +273,6 @@ static int rdma_rw_init_single_wr(struct rdma_rw_ctx *ctx, struct ib_qp *qp,
 	return 1;
 }
 
-static void rdma_rw_unmap_sg(struct ib_device *dev, struct scatterlist *sg,
-			     u32 sg_cnt, enum dma_data_direction dir)
-{
-	if (is_pci_p2pdma_page(sg_page(sg)))
-		pci_p2pdma_unmap_sg(dev->dma_device, sg, sg_cnt, dir);
-	else
-		ib_dma_unmap_sg(dev, sg, sg_cnt, dir);
-}
-
-static int rdma_rw_map_sg(struct ib_device *dev, struct scatterlist *sg,
-			  u32 sg_cnt, enum dma_data_direction dir)
-{
-	if (is_pci_p2pdma_page(sg_page(sg))) {
-		if (WARN_ON_ONCE(ib_uses_virt_dma(dev)))
-			return 0;
-		return pci_p2pdma_map_sg(dev->dma_device, sg, sg_cnt, dir);
-	}
-	return ib_dma_map_sg(dev, sg, sg_cnt, dir);
-}
-
 /**
  * rdma_rw_ctx_init - initialize a RDMA READ/WRITE context
  * @ctx:	context to initialize
@@ -315,9 +295,9 @@ int rdma_rw_ctx_init(struct rdma_rw_ctx *ctx, struct ib_qp *qp, u8 port_num,
 	struct ib_device *dev = qp->pd->device;
 	int ret;
 
-	ret = rdma_rw_map_sg(dev, sg, sg_cnt, dir);
-	if (!ret)
-		return -ENOMEM;
+	ret = ib_dma_map_sg_p2pdma(dev, sg, sg_cnt, dir);
+	if (ret < 0)
+		return ret;
 	sg_cnt = ret;
 
 	/*
@@ -354,7 +334,7 @@ int rdma_rw_ctx_init(struct rdma_rw_ctx *ctx, struct ib_qp *qp, u8 port_num,
 	return ret;
 
 out_unmap_sg:
-	rdma_rw_unmap_sg(dev, sg, sg_cnt, dir);
+	ib_dma_unmap_sg(dev, sg, sg_cnt, dir);
 	return ret;
 }
 EXPORT_SYMBOL(rdma_rw_ctx_init);
@@ -394,17 +374,15 @@ int rdma_rw_ctx_signature_init(struct rdma_rw_ctx *ctx, struct ib_qp *qp,
 		return -EINVAL;
 	}
 
-	ret = rdma_rw_map_sg(dev, sg, sg_cnt, dir);
-	if (!ret)
-		return -ENOMEM;
+	ret = ib_dma_map_sg_p2pdma(dev, sg, sg_cnt, dir);
+	if (ret < 0)
+		return ret;
 	sg_cnt = ret;
 
 	if (prot_sg_cnt) {
-		ret = rdma_rw_map_sg(dev, prot_sg, prot_sg_cnt, dir);
-		if (!ret) {
-			ret = -ENOMEM;
+		ret = ib_dma_map_sg_p2pdma(dev, prot_sg, prot_sg_cnt, dir);
+		if (ret < 0)
 			goto out_unmap_sg;
-		}
 		prot_sg_cnt = ret;
 	}
 
@@ -469,9 +447,9 @@ int rdma_rw_ctx_signature_init(struct rdma_rw_ctx *ctx, struct ib_qp *qp,
 	kfree(ctx->reg);
 out_unmap_prot_sg:
 	if (prot_sg_cnt)
-		rdma_rw_unmap_sg(dev, prot_sg, prot_sg_cnt, dir);
+		ib_dma_unmap_sg(dev, prot_sg, prot_sg_cnt, dir);
 out_unmap_sg:
-	rdma_rw_unmap_sg(dev, sg, sg_cnt, dir);
+	ib_dma_unmap_sg(dev, sg, sg_cnt, dir);
 	return ret;
 }
 EXPORT_SYMBOL(rdma_rw_ctx_signature_init);
@@ -603,7 +581,7 @@ void rdma_rw_ctx_destroy(struct rdma_rw_ctx *ctx, struct ib_qp *qp, u8 port_num,
 		break;
 	}
 
-	rdma_rw_unmap_sg(qp->pd->device, sg, sg_cnt, dir);
+	ib_dma_unmap_sg(qp->pd->device, sg, sg_cnt, dir);
 }
 EXPORT_SYMBOL(rdma_rw_ctx_destroy);
 
@@ -631,8 +609,8 @@ void rdma_rw_ctx_destroy_signature(struct rdma_rw_ctx *ctx, struct ib_qp *qp,
 	kfree(ctx->reg);
 
 	if (prot_sg_cnt)
-		rdma_rw_unmap_sg(qp->pd->device, prot_sg, prot_sg_cnt, dir);
-	rdma_rw_unmap_sg(qp->pd->device, sg, sg_cnt, dir);
+		ib_dma_unmap_sg(qp->pd->device, prot_sg, prot_sg_cnt, dir);
+	ib_dma_unmap_sg(qp->pd->device, sg, sg_cnt, dir);
 }
 EXPORT_SYMBOL(rdma_rw_ctx_destroy_signature);
 
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index ca28fca5736b..a541ed1702f5 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -4028,6 +4028,17 @@ static inline int ib_dma_map_sg_attrs(struct ib_device *dev,
 				dma_attrs);
 }
 
+static inline int ib_dma_map_sg_p2pdma_attrs(struct ib_device *dev,
+					     struct scatterlist *sg, int nents,
+					     enum dma_data_direction direction,
+					     unsigned long dma_attrs)
+{
+	if (ib_uses_virt_dma(dev))
+		return ib_dma_virt_map_sg(dev, sg, nents);
+	return dma_map_sg_p2pdma_attrs(dev->dma_device, sg, nents, direction,
+				       dma_attrs);
+}
+
 static inline void ib_dma_unmap_sg_attrs(struct ib_device *dev,
 					 struct scatterlist *sg, int nents,
 					 enum dma_data_direction direction,
@@ -4052,6 +4063,27 @@ static inline int ib_dma_map_sg(struct ib_device *dev,
 	return ib_dma_map_sg_attrs(dev, sg, nents, direction, 0);
 }
 
+/**
+ * ib_dma_map_sg_p2pdma - Map a scatter/gather list to DMA addresses
+ * @dev: The device for which the DMA addresses are to be created
+ * @sg: The array of scatter/gather entries
+ * @nents: The number of scatter/gather entries
+ * @direction: The direction of the DMA
+ *
+ * Map a scatter/gather list that might contain P2PDMA pages.
+ * Unlike ib_dma_map_sg() it will return either a negative errno or
+ * a positive value indicating the number of dma segments. See
+ * dma_map_sg_p2pdma_attrs() for details.
+ *
+ * The resulting list should be unmapped with ib_dma_unmap_sg().
+ */
+static inline int ib_dma_map_sg_p2pdma(struct ib_device *dev,
+				       struct scatterlist *sg, int nents,
+				       enum dma_data_direction direction)
+{
+	return ib_dma_map_sg_p2pdma_attrs(dev, sg, nents, direction, 0);
+}
+
 /**
  * ib_dma_unmap_sg - Unmap a scatter/gather list of DMA addresses
  * @dev: The device for which the DMA addresses were created
-- 
2.20.1
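
The change of calling convention in rw.c above can be sketched in plain
userspace C. This is only an illustration under stated assumptions:
mock_map_sg_old(), mock_map_sg_new(), init_ctx_old() and init_ctx_new()
are hypothetical stand-ins invented for this sketch, not kernel
functions, and the fail_errno parameter simply simulates a failure.

```c
#include <errno.h>

#ifndef EREMOTEIO
#define EREMOTEIO 121	/* Linux value; defined here only for portability */
#endif

/* Hypothetical stand-in, old convention: entries mapped, or 0 on failure. */
static int mock_map_sg_old(int nents, int fail_errno)
{
	return fail_errno ? 0 : nents;
}

/* Hypothetical stand-in, new convention: entries mapped, or -errno. */
static int mock_map_sg_new(int nents, int fail_errno)
{
	return fail_errno ? -fail_errno : nents;
}

/* Old-style caller: the cause is lost, so it has to guess -ENOMEM. */
static int init_ctx_old(int nents, int fail_errno)
{
	int ret = mock_map_sg_old(nents, fail_errno);

	if (!ret)
		return -ENOMEM;
	return ret;
}

/* New-style caller: the errno from the mapping call is passed through. */
static int init_ctx_new(int nents, int fail_errno)
{
	int ret = mock_map_sg_new(nents, fail_errno);

	if (ret < 0)
		return ret;	/* propagate the real reason */
	return ret;		/* number of mapped segments */
}
```

With these mocks, a simulated EREMOTEIO failure surfaces as -ENOMEM from
init_ctx_old() but as -EREMOTEIO from init_ctx_new(), which is the
distinction the new interface is meant to preserve.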



* [PATCH 16/16] PCI/P2PDMA: Remove pci_p2pdma_[un]map_sg()
  2021-04-08 17:01 [PATCH 00/16] Add new DMA mapping operation for P2PDMA Logan Gunthorpe
                   ` (14 preceding siblings ...)
  2021-04-08 17:01 ` [PATCH 15/16] RDMA/rw: use dma_map_sg_p2pdma() Logan Gunthorpe
@ 2021-04-08 17:01 ` Logan Gunthorpe
  2021-04-27 19:28 ` [PATCH 00/16] Add new DMA mapping operation for P2PDMA Jason Gunthorpe
                   ` (2 subsequent siblings)
  18 siblings, 0 replies; 99+ messages in thread
From: Logan Gunthorpe @ 2021-04-08 17:01 UTC (permalink / raw)
  To: linux-kernel, linux-nvme, linux-block, linux-pci, linux-mm, iommu
  Cc: Stephen Bates, Christoph Hellwig, Dan Williams, Jason Gunthorpe,
	Christian König, John Hubbard, Don Dutile, Matthew Wilcox,
	Daniel Vetter, Jakowski Andrzej, Minturn Dave B, Jason Ekstrand,
	Dave Hansen, Xiong Jianxin, Bjorn Helgaas, Ira Weiny,
	Robin Murphy, Logan Gunthorpe

This interface is superseded by the new dma_map_sg_p2pdma() interface
which supports heterogeneous scatterlists. There are no longer
any users, so remove it.

Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
---
 drivers/pci/p2pdma.c       | 67 --------------------------------------
 include/linux/pci-p2pdma.h | 27 ---------------
 2 files changed, 94 deletions(-)

diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c
index 44ad7664e875..2f2adcccfa11 100644
--- a/drivers/pci/p2pdma.c
+++ b/drivers/pci/p2pdma.c
@@ -856,73 +856,6 @@ enum pci_p2pdma_map_type pci_p2pdma_map_type(struct dev_pagemap *pgmap,
 					     GFP_ATOMIC);
 }
 
-static int __pci_p2pdma_map_sg(struct pci_p2pdma_pagemap *p2p_pgmap,
-		struct device *dev, struct scatterlist *sg, int nents)
-{
-	struct scatterlist *s;
-	int i;
-
-	for_each_sg(sg, s, nents, i) {
-		s->dma_address = sg_phys(s) - p2p_pgmap->bus_offset;
-		sg_dma_len(s) = s->length;
-	}
-
-	return nents;
-}
-
-/**
- * pci_p2pdma_map_sg_attrs - map a PCI peer-to-peer scatterlist for DMA
- * @dev: device doing the DMA request
- * @sg: scatter list to map
- * @nents: elements in the scatterlist
- * @dir: DMA direction
- * @attrs: DMA attributes passed to dma_map_sg() (if called)
- *
- * Scatterlists mapped with this function should be unmapped using
- * pci_p2pdma_unmap_sg_attrs().
- *
- * Returns the number of SG entries mapped or 0 on error.
- */
-int pci_p2pdma_map_sg_attrs(struct device *dev, struct scatterlist *sg,
-		int nents, enum dma_data_direction dir, unsigned long attrs)
-{
-	struct pci_p2pdma_pagemap *p2p_pgmap =
-		to_p2p_pgmap(sg_page(sg)->pgmap);
-
-	switch (pci_p2pdma_map_type(sg_page(sg)->pgmap, dev,
-				    __DMA_ATTR_PCI_P2PDMA)) {
-	case PCI_P2PDMA_MAP_THRU_HOST_BRIDGE:
-		return dma_map_sg_attrs(dev, sg, nents, dir, attrs);
-	case PCI_P2PDMA_MAP_BUS_ADDR:
-		return __pci_p2pdma_map_sg(p2p_pgmap, dev, sg, nents);
-	default:
-		return 0;
-	}
-}
-EXPORT_SYMBOL_GPL(pci_p2pdma_map_sg_attrs);
-
-/**
- * pci_p2pdma_unmap_sg_attrs - unmap a PCI peer-to-peer scatterlist that was
- *	mapped with pci_p2pdma_map_sg()
- * @dev: device doing the DMA request
- * @sg: scatter list to map
- * @nents: number of elements returned by pci_p2pdma_map_sg()
- * @dir: DMA direction
- * @attrs: DMA attributes passed to dma_unmap_sg() (if called)
- */
-void pci_p2pdma_unmap_sg_attrs(struct device *dev, struct scatterlist *sg,
-		int nents, enum dma_data_direction dir, unsigned long attrs)
-{
-	enum pci_p2pdma_map_type map_type;
-
-	map_type = pci_p2pdma_map_type(sg_page(sg)->pgmap, dev,
-				       __DMA_ATTR_PCI_P2PDMA);
-
-	if (map_type == PCI_P2PDMA_MAP_THRU_HOST_BRIDGE)
-		dma_unmap_sg_attrs(dev, sg, nents, dir, attrs);
-}
-EXPORT_SYMBOL_GPL(pci_p2pdma_unmap_sg_attrs);
-
 /**
  * pci_p2pdma_map_segment - map an sg segment determining the mapping type
  * @state: State structure that should be declared on the stack outside of
diff --git a/include/linux/pci-p2pdma.h b/include/linux/pci-p2pdma.h
index 49e7679403cf..2ec9c75fa097 100644
--- a/include/linux/pci-p2pdma.h
+++ b/include/linux/pci-p2pdma.h
@@ -45,10 +45,6 @@ void pci_p2pmem_free_sgl(struct pci_dev *pdev, struct scatterlist *sgl);
 void pci_p2pmem_publish(struct pci_dev *pdev, bool publish);
 enum pci_p2pdma_map_type pci_p2pdma_map_type(struct dev_pagemap *pgmap,
 		struct device *dev, unsigned long dma_attrs);
-int pci_p2pdma_map_sg_attrs(struct device *dev, struct scatterlist *sg,
-		int nents, enum dma_data_direction dir, unsigned long attrs);
-void pci_p2pdma_unmap_sg_attrs(struct device *dev, struct scatterlist *sg,
-		int nents, enum dma_data_direction dir, unsigned long attrs);
 int pci_p2pdma_map_segment(struct pci_p2pdma_map_state *state,
 		struct device *dev, struct scatterlist *sg,
 		unsigned long dma_attrs);
@@ -109,17 +105,6 @@ static inline enum pci_p2pdma_map_type pci_p2pdma_map_type(
 {
 	return PCI_P2PDMA_MAP_NOT_SUPPORTED;
 }
-static inline int pci_p2pdma_map_sg_attrs(struct device *dev,
-		struct scatterlist *sg, int nents, enum dma_data_direction dir,
-		unsigned long attrs)
-{
-	return 0;
-}
-static inline void pci_p2pdma_unmap_sg_attrs(struct device *dev,
-		struct scatterlist *sg, int nents, enum dma_data_direction dir,
-		unsigned long attrs)
-{
-}
 static inline int pci_p2pdma_map_segment(struct pci_p2pdma_map_state *state,
 		struct device *dev, struct scatterlist *sg,
 		unsigned long dma_attrs)
@@ -155,16 +140,4 @@ static inline struct pci_dev *pci_p2pmem_find(struct device *client)
 	return pci_p2pmem_find_many(&client, 1);
 }
 
-static inline int pci_p2pdma_map_sg(struct device *dev, struct scatterlist *sg,
-				    int nents, enum dma_data_direction dir)
-{
-	return pci_p2pdma_map_sg_attrs(dev, sg, nents, dir, 0);
-}
-
-static inline void pci_p2pdma_unmap_sg(struct device *dev,
-		struct scatterlist *sg, int nents, enum dma_data_direction dir)
-{
-	pci_p2pdma_unmap_sg_attrs(dev, sg, nents, dir, 0);
-}
-
 #endif /* _LINUX_PCI_P2P_H */
-- 
2.20.1



* Re: [PATCH 05/16] dma-mapping: Introduce dma_map_sg_p2pdma()
  2021-04-08 17:01 ` [PATCH 05/16] dma-mapping: Introduce dma_map_sg_p2pdma() Logan Gunthorpe
@ 2021-04-27 19:22   ` Jason Gunthorpe
  2021-04-27 22:49     ` Logan Gunthorpe
  2021-04-27 19:31   ` Jason Gunthorpe
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 99+ messages in thread
From: Jason Gunthorpe @ 2021-04-27 19:22 UTC (permalink / raw)
  To: Logan Gunthorpe
  Cc: linux-kernel, linux-nvme, linux-block, linux-pci, linux-mm,
	iommu, Stephen Bates, Christoph Hellwig, Dan Williams,
	Christian König, John Hubbard, Don Dutile, Matthew Wilcox,
	Daniel Vetter, Jakowski Andrzej, Minturn Dave B, Jason Ekstrand,
	Dave Hansen, Xiong Jianxin, Bjorn Helgaas, Ira Weiny,
	Robin Murphy

On Thu, Apr 08, 2021 at 11:01:12AM -0600, Logan Gunthorpe wrote:
> dma_map_sg() either returns a positive number indicating the number
> of entries mapped or zero indicating that resources were not available
> to create the mapping. When zero is returned, it is always safe to retry
> the mapping later once resources have been freed.
> 
> Once P2PDMA pages are mixed into the SGL there may be pages that may
> never be successfully mapped with a given device because that device may
> not actually be able to access those pages. Thus, multiple error
> conditions will need to be distinguished to determine whether a retry
> is safe.
> 
> Introduce dma_map_sg_p2pdma[_attrs]() with a different calling
> convention from dma_map_sg(). The function will return a positive
> integer on success or a negative errno on failure.
> 
> ENOMEM will be used to indicate a resource failure and EREMOTEIO to
> indicate that a P2PDMA page is not mappable.
> 
> The __DMA_ATTR_PCI_P2PDMA attribute is introduced to inform the lower
> level implementations that P2PDMA pages are allowed and to warn if a
> caller introduces them into the regular dma_map_sg() interface.

So this new API is all about being able to return an error code
because auditing the old API is basically terrifying?

OK, but why name everything new P2PDMA? It seems nicer to give this
some generic name and have some general program to gradually deprecate
normal non-error-capable dma_map_sg() ?

I think that will raise less questions when subsystem people see the
changes, as I was wondering why RW was being moved to use what looked
like a p2pdma only API.

dma_map_sg_or_err() would have been clearer

The flag is also clearer as to the purpose if it is named
__DMA_ATTR_ERROR_ALLOWED

Jason
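
The retry distinction described in the quoted commit message can be
sketched as follows. This is a minimal userspace illustration, assuming
only the two error codes named in the commit message;
map_error_is_retryable() is a hypothetical helper made up for this
sketch and does not exist in the kernel.

```c
#include <errno.h>
#include <stdbool.h>

#ifndef EREMOTEIO
#define EREMOTEIO 121	/* Linux value; defined here only for portability */
#endif

/*
 * Hypothetical helper: decide whether a failed mapping is worth retrying.
 * 'ret' is the negative errno returned by an error-capable map call.
 */
static bool map_error_is_retryable(int ret)
{
	switch (ret) {
	case -ENOMEM:		/* resources were not available; may free up later */
		return true;
	case -EREMOTEIO:	/* the device cannot reach the P2PDMA pages at all */
		return false;
	default:		/* unknown failure: do not assume a retry helps */
		return false;
	}
}
```

A caller could requeue the request when the helper returns true and fail
it permanently otherwise. Treating unknown codes as non-retryable is a
choice made for this sketch, not something stated in the thread.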


* Re: [PATCH 00/16] Add new DMA mapping operation for P2PDMA
  2021-04-08 17:01 [PATCH 00/16] Add new DMA mapping operation for P2PDMA Logan Gunthorpe
                   ` (15 preceding siblings ...)
  2021-04-08 17:01 ` [PATCH 16/16] PCI/P2PDMA: Remove pci_p2pdma_[un]map_sg() Logan Gunthorpe
@ 2021-04-27 19:28 ` Jason Gunthorpe
  2021-04-27 20:21   ` John Hubbard
  2021-05-02  1:22 ` John Hubbard
  2021-05-11 16:05 ` Don Dutile
  18 siblings, 1 reply; 99+ messages in thread
From: Jason Gunthorpe @ 2021-04-27 19:28 UTC (permalink / raw)
  To: Logan Gunthorpe
  Cc: linux-kernel, linux-nvme, linux-block, linux-pci, linux-mm,
	iommu, Stephen Bates, Christoph Hellwig, Dan Williams,
	Christian König, John Hubbard, Don Dutile, Matthew Wilcox,
	Daniel Vetter, Jakowski Andrzej, Minturn Dave B, Jason Ekstrand,
	Dave Hansen, Xiong Jianxin, Bjorn Helgaas, Ira Weiny,
	Robin Murphy

On Thu, Apr 08, 2021 at 11:01:07AM -0600, Logan Gunthorpe wrote:
> Hi,
> 
> This patchset continues my work to add P2PDMA support to the common
> dma map operations. This allows for creating SGLs that have both P2PDMA
> and regular pages which is a necessary step to allowing P2PDMA pages in
> userspace.
> 
> The earlier RFC[1] generated a lot of great feedback and I heard no show
> stopping objections. Thus, I've incorporated all the feedback and have
> decided to post this as a proper patch series with hopes of eventually
> getting it in mainline.
>
> I'm happy to do a few more passes if anyone has any further feedback
> or better ideas.

For the user of the DMA API the idea seems reasonable enough, the next
steps to integrate with pin_user_pages() seem fairly straightforward
too

Was there no feedback on this at all?

Jason


* Re: [PATCH 05/16] dma-mapping: Introduce dma_map_sg_p2pdma()
  2021-04-08 17:01 ` [PATCH 05/16] dma-mapping: Introduce dma_map_sg_p2pdma() Logan Gunthorpe
  2021-04-27 19:22   ` Jason Gunthorpe
@ 2021-04-27 19:31   ` Jason Gunthorpe
  2021-04-27 22:55     ` Logan Gunthorpe
  2021-05-02 21:23   ` John Hubbard
  2021-05-11 16:05   ` Don Dutile
  3 siblings, 1 reply; 99+ messages in thread
From: Jason Gunthorpe @ 2021-04-27 19:31 UTC (permalink / raw)
  To: Logan Gunthorpe
  Cc: linux-kernel, linux-nvme, linux-block, linux-pci, linux-mm,
	iommu, Stephen Bates, Christoph Hellwig, Dan Williams,
	Christian König, John Hubbard, Don Dutile, Matthew Wilcox,
	Daniel Vetter, Jakowski Andrzej, Minturn Dave B, Jason Ekstrand,
	Dave Hansen, Xiong Jianxin, Bjorn Helgaas, Ira Weiny,
	Robin Murphy

On Thu, Apr 08, 2021 at 11:01:12AM -0600, Logan Gunthorpe wrote:
> +/*
> + * dma_maps_sg_attrs returns 0 on error and > 0 on success.
> + * It should never return a value < 0.
> + */

Also it is weird a function that can't return 0 is returning an int type

> +int dma_map_sg_attrs(struct device *dev, struct scatterlist *sg, int nents,
> +		enum dma_data_direction dir, unsigned long attrs)
> +{
> +	int ents;
> +
> +	ents = __dma_map_sg_attrs(dev, sg, nents, dir, attrs);
>  	BUG_ON(ents < 0);

if (WARN_ON(ents < 0))
     return 0;

instead of bug on?

Also, I see only 8 users of this function. How about just fix them all
to support negative returns and use this as the p2p API instead of
adding new API?

Add the opposite logic flag, 'DMA_ATTRS_NO_ERROR' and pass it through
the other api entry callers that can't handle it?

Jason
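
The "warn and return 0" alternative to BUG_ON() suggested above can be
modelled in userspace roughly like this. It is a sketch under stated
assumptions: mock_inner_map() and mock_map_sg_attrs() are hypothetical
stand-ins for __dma_map_sg_attrs() and dma_map_sg_attrs(), and a counter
plus fprintf() stands in for the kernel's WARN_ON().

```c
#include <stdio.h>

/* Hypothetical stand-in for the inner helper that may return -errno. */
static int mock_inner_map(int nents, int err)
{
	return err ? -err : nents;
}

/* Counts how often the legacy wrapper saw an unexpected negative value. */
static int warn_count;

/*
 * Legacy-convention wrapper: callers only understand "0 means failure",
 * so a negative value is reported and clamped to 0 instead of crashing.
 */
static int mock_map_sg_attrs(int nents, int err)
{
	int ents = mock_inner_map(nents, err);

	if (ents < 0) {
		warn_count++;
		fprintf(stderr, "unexpected negative return %d\n", ents);
		return 0;
	}
	return ents;
}
```

The point of the design is that callers of the old interface keep seeing
only zero or a positive count, while the anomaly is still logged.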


* Re: [PATCH 09/16] dma-direct: Support PCI P2PDMA pages in dma-direct map_sg
  2021-04-08 17:01 ` [PATCH 09/16] dma-direct: Support PCI P2PDMA pages in dma-direct map_sg Logan Gunthorpe
@ 2021-04-27 19:33   ` Jason Gunthorpe
  2021-04-27 19:40     ` Jason Gunthorpe
  2021-05-02 23:28   ` John Hubbard
  1 sibling, 1 reply; 99+ messages in thread
From: Jason Gunthorpe @ 2021-04-27 19:33 UTC (permalink / raw)
  To: Logan Gunthorpe
  Cc: linux-kernel, linux-nvme, linux-block, linux-pci, linux-mm,
	iommu, Stephen Bates, Christoph Hellwig, Dan Williams,
	Christian König, John Hubbard, Don Dutile, Matthew Wilcox,
	Daniel Vetter, Jakowski Andrzej, Minturn Dave B, Jason Ekstrand,
	Dave Hansen, Xiong Jianxin, Bjorn Helgaas, Ira Weiny,
	Robin Murphy

On Thu, Apr 08, 2021 at 11:01:16AM -0600, Logan Gunthorpe wrote:
> Add PCI P2PDMA support for dma_direct_map_sg() so that it can map
> PCI P2PDMA pages directly without a hack in the callers. This allows
> for heterogeneous SGLs that contain both P2PDMA and regular pages.
> 
> SGL segments that contain PCI bus addresses are marked with
> sg_mark_pci_p2pdma() and are ignored when unmapped.
> 
> Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
>  kernel/dma/direct.c | 25 ++++++++++++++++++++++---
>  1 file changed, 22 insertions(+), 3 deletions(-)
> 
> diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
> index 002268262c9a..108dfb4ecbd5 100644
> +++ b/kernel/dma/direct.c
> @@ -13,6 +13,7 @@
>  #include <linux/vmalloc.h>
>  #include <linux/set_memory.h>
>  #include <linux/slab.h>
> +#include <linux/pci-p2pdma.h>
>  #include "direct.h"
>  
>  /*
> @@ -387,19 +388,37 @@ void dma_direct_unmap_sg(struct device *dev, struct scatterlist *sgl,
>  	struct scatterlist *sg;
>  	int i;
>  
> -	for_each_sg(sgl, sg, nents, i)
> +	for_each_sg(sgl, sg, nents, i) {
> +		if (sg_is_pci_p2pdma(sg)) {
> +			sg_unmark_pci_p2pdma(sg);

This doesn't seem nice, the DMA layer should only alter the DMA
portion of the SG, not the other portions. Is it necessary?

Jason


* Re: [PATCH 09/16] dma-direct: Support PCI P2PDMA pages in dma-direct map_sg
  2021-04-27 19:33   ` Jason Gunthorpe
@ 2021-04-27 19:40     ` Jason Gunthorpe
  2021-04-27 22:56       ` Logan Gunthorpe
  0 siblings, 1 reply; 99+ messages in thread
From: Jason Gunthorpe @ 2021-04-27 19:40 UTC (permalink / raw)
  To: Logan Gunthorpe
  Cc: linux-kernel, linux-nvme, linux-block, linux-pci, linux-mm,
	iommu, Stephen Bates, Christoph Hellwig, Dan Williams,
	Christian König, John Hubbard, Don Dutile, Matthew Wilcox,
	Daniel Vetter, Jakowski Andrzej, Minturn Dave B, Jason Ekstrand,
	Dave Hansen, Xiong Jianxin, Bjorn Helgaas, Ira Weiny,
	Robin Murphy

On Tue, Apr 27, 2021 at 04:33:51PM -0300, Jason Gunthorpe wrote:
> On Thu, Apr 08, 2021 at 11:01:16AM -0600, Logan Gunthorpe wrote:
> > Add PCI P2PDMA support for dma_direct_map_sg() so that it can map
> > PCI P2PDMA pages directly without a hack in the callers. This allows
> > for heterogeneous SGLs that contain both P2PDMA and regular pages.
> > 
> > SGL segments that contain PCI bus addresses are marked with
> > sg_mark_pci_p2pdma() and are ignored when unmapped.
> > 
> > Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
> >  kernel/dma/direct.c | 25 ++++++++++++++++++++++---
> >  1 file changed, 22 insertions(+), 3 deletions(-)
> > 
> > diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
> > index 002268262c9a..108dfb4ecbd5 100644
> > +++ b/kernel/dma/direct.c
> > @@ -13,6 +13,7 @@
> >  #include <linux/vmalloc.h>
> >  #include <linux/set_memory.h>
> >  #include <linux/slab.h>
> > +#include <linux/pci-p2pdma.h>
> >  #include "direct.h"
> >  
> >  /*
> > @@ -387,19 +388,37 @@ void dma_direct_unmap_sg(struct device *dev, struct scatterlist *sgl,
> >  	struct scatterlist *sg;
> >  	int i;
> >  
> > -	for_each_sg(sgl, sg, nents, i)
> > +	for_each_sg(sgl, sg, nents, i) {
> > +		if (sg_is_pci_p2pdma(sg)) {
> > +			sg_unmark_pci_p2pdma(sg);
> 
> This doesn't seem nice, the DMA layer should only alter the DMA
> portion of the SG, not the other portions. Is it necessary?

Oh, I got it completely wrong what this is for.

This should be named sg_dma_mark_pci_p2p() and similar for other
functions to make it clear it is part of the DMA side of the SG
interface (eg it is like sg_dma_address, sg_dma_len, etc)

Jason


* Re: [PATCH 11/16] iommu/dma: Support PCI P2PDMA pages in dma-iommu map_sg
  2021-04-08 17:01 ` [PATCH 11/16] iommu/dma: Support PCI P2PDMA pages in dma-iommu map_sg Logan Gunthorpe
@ 2021-04-27 19:43   ` Jason Gunthorpe
  2021-04-27 22:59     ` Logan Gunthorpe
  2021-05-03  1:14   ` John Hubbard
  2021-05-11 16:06   ` Don Dutile
  2 siblings, 1 reply; 99+ messages in thread
From: Jason Gunthorpe @ 2021-04-27 19:43 UTC (permalink / raw)
  To: Logan Gunthorpe
  Cc: linux-kernel, linux-nvme, linux-block, linux-pci, linux-mm,
	iommu, Stephen Bates, Christoph Hellwig, Dan Williams,
	Christian König, John Hubbard, Don Dutile, Matthew Wilcox,
	Daniel Vetter, Jakowski Andrzej, Minturn Dave B, Jason Ekstrand,
	Dave Hansen, Xiong Jianxin, Bjorn Helgaas, Ira Weiny,
	Robin Murphy

On Thu, Apr 08, 2021 at 11:01:18AM -0600, Logan Gunthorpe wrote:
> When a PCI P2PDMA page is seen, set the IOVA length of the segment
> to zero so that it is not mapped into the IOVA. Then, in finalise_sg(),
> apply the appropriate bus address to the segment. The IOVA is not
> created if the scatterlist only consists of P2PDMA pages.

I expect P2P to work with systems that use ATS, so we'd want to see
those systems have the IOMMU programmed with the bus address.

Is it OK like this because the other logic prohibits all PCI cases
that would lean on the IOMMU, like ATS, hairpinning through the root
port, or transiting the root complex?

If yes, the code deserves a big comment explaining this is incomplete,
and I'd want to know we can finish this to include ATS at least based
on this series.

Jason


* Re: [PATCH 14/16] nvme-rdma: Ensure dma support when using p2pdma
  2021-04-08 17:01 ` [PATCH 14/16] nvme-rdma: Ensure dma support when using p2pdma Logan Gunthorpe
@ 2021-04-27 19:47   ` Jason Gunthorpe
  2021-04-27 22:59     ` Logan Gunthorpe
  2021-05-03  1:37   ` John Hubbard
  1 sibling, 1 reply; 99+ messages in thread
From: Jason Gunthorpe @ 2021-04-27 19:47 UTC (permalink / raw)
  To: Logan Gunthorpe
  Cc: linux-kernel, linux-nvme, linux-block, linux-pci, linux-mm,
	iommu, Stephen Bates, Christoph Hellwig, Dan Williams,
	Christian König, John Hubbard, Don Dutile, Matthew Wilcox,
	Daniel Vetter, Jakowski Andrzej, Minturn Dave B, Jason Ekstrand,
	Dave Hansen, Xiong Jianxin, Bjorn Helgaas, Ira Weiny,
	Robin Murphy

On Thu, Apr 08, 2021 at 11:01:21AM -0600, Logan Gunthorpe wrote:
> Ensure the dma operations support p2pdma before using the RDMA
> device for P2PDMA. This allows switching the RDMA driver from
> pci_p2pdma_map_sg() to dma_map_sg_p2pdma().
> 
> Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
>  drivers/nvme/target/rdma.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/nvme/target/rdma.c b/drivers/nvme/target/rdma.c
> index 6c1f3ab7649c..3ec7e77e5416 100644
> +++ b/drivers/nvme/target/rdma.c
> @@ -414,7 +414,8 @@ static int nvmet_rdma_alloc_rsp(struct nvmet_rdma_device *ndev,
>  	if (ib_dma_mapping_error(ndev->device, r->send_sge.addr))
>  		goto out_free_rsp;
>  
> -	if (!ib_uses_virt_dma(ndev->device))
> +	if (!ib_uses_virt_dma(ndev->device) &&
> +	    dma_pci_p2pdma_supported(&ndev->device->dev))

ib_uses_virt_dma() should not be called by nvme and this is using the
wrong device pointer to query for DMA related properties.

I suspect this wants a ib_dma_pci_p2p_dma_supported() wrapper like
everything else.

Jason


* Re: [PATCH 00/16] Add new DMA mapping operation for P2PDMA
  2021-04-27 19:28 ` [PATCH 00/16] Add new DMA mapping operation for P2PDMA Jason Gunthorpe
@ 2021-04-27 20:21   ` John Hubbard
  2021-04-27 20:48     ` Dan Williams
  0 siblings, 1 reply; 99+ messages in thread
From: John Hubbard @ 2021-04-27 20:21 UTC (permalink / raw)
  To: Jason Gunthorpe, Logan Gunthorpe
  Cc: linux-kernel, linux-nvme, linux-block, linux-pci, linux-mm,
	iommu, Stephen Bates, Christoph Hellwig, Dan Williams,
	Christian König, Don Dutile, Matthew Wilcox, Daniel Vetter,
	Jakowski Andrzej, Minturn Dave B, Jason Ekstrand, Dave Hansen,
	Xiong Jianxin, Bjorn Helgaas, Ira Weiny, Robin Murphy

On 4/27/21 12:28 PM, Jason Gunthorpe wrote:
> On Thu, Apr 08, 2021 at 11:01:07AM -0600, Logan Gunthorpe wrote:
>> Hi,
>>
>> This patchset continues my work to add P2PDMA support to the common
>> dma map operations. This allows for creating SGLs that have both P2PDMA
>> and regular pages which is a necessary step to allowing P2PDMA pages in
>> userspace.
>>
>> The earlier RFC[1] generated a lot of great feedback and I heard no show
>> stopping objections. Thus, I've incorporated all the feedback and have
>> decided to post this as a proper patch series with hopes of eventually
>> getting it in mainline.
>>
>> I'm happy to do a few more passes if anyone has any further feedback
>> or better ideas.
> 
> For the user of the DMA API the idea seems reasonable enough, the next
> steps to integrate with pin_user_pages() seem fairly straightforward
> too
> 
> Was there no feedback on this at all?
> 

oops, I meant to review this a lot sooner, because this whole p2pdma thing is
actually very interesting and important...somehow it slipped but I'll take
a look now.

thanks,
-- 
John Hubbard
NVIDIA


* Re: [PATCH 00/16] Add new DMA mapping operation for P2PDMA
  2021-04-27 20:21   ` John Hubbard
@ 2021-04-27 20:48     ` Dan Williams
  0 siblings, 0 replies; 99+ messages in thread
From: Dan Williams @ 2021-04-27 20:48 UTC (permalink / raw)
  To: John Hubbard
  Cc: Jason Gunthorpe, Logan Gunthorpe, linux-kernel, linux-nvme,
	linux-block, linux-pci, linux-mm, iommu, Stephen Bates,
	Christoph Hellwig, Christian König, Don Dutile,
	Matthew Wilcox, Daniel Vetter, Jakowski Andrzej, Minturn Dave B,
	Jason Ekstrand, Dave Hansen, Xiong Jianxin, Bjorn Helgaas,
	Ira Weiny, Robin Murphy

On Tue, Apr 27, 2021 at 1:22 PM John Hubbard <jhubbard@nvidia.com> wrote:
>
> On 4/27/21 12:28 PM, Jason Gunthorpe wrote:
> > On Thu, Apr 08, 2021 at 11:01:07AM -0600, Logan Gunthorpe wrote:
> >> Hi,
> >>
> >> This patchset continues my work to add P2PDMA support to the common
> >> dma map operations. This allows for creating SGLs that have both P2PDMA
> >> and regular pages which is a necessary step to allowing P2PDMA pages in
> >> userspace.
> >>
> >> The earlier RFC[1] generated a lot of great feedback and I heard no show
> >> stopping objections. Thus, I've incorporated all the feedback and have
> >> decided to post this as a proper patch series with hopes of eventually
> >> getting it in mainline.
> >>
> >> I'm happy to do a few more passes if anyone has any further feedback
> >> or better ideas.
> >
> > For the user of the DMA API the idea seems reasonable enough, the next
> > steps to integrate with pin_user_pages() seem fairly straightforward
> > too
> >
> > Was there no feedback on this at all?
> >
>
> oops, I meant to review this a lot sooner, because this whole p2pdma thing is
> actually very interesting and important...somehow it slipped but I'll take
> a look now.

Still in my queue as well behind Joao's memmap consolidation series,
and a recent copy_mc_to_iter() fix series from Al.


* Re: [PATCH 05/16] dma-mapping: Introduce dma_map_sg_p2pdma()
  2021-04-27 19:22   ` Jason Gunthorpe
@ 2021-04-27 22:49     ` Logan Gunthorpe
  0 siblings, 0 replies; 99+ messages in thread
From: Logan Gunthorpe @ 2021-04-27 22:49 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: linux-kernel, linux-nvme, linux-block, linux-pci, linux-mm,
	iommu, Stephen Bates, Christoph Hellwig, Dan Williams,
	Christian König, John Hubbard, Don Dutile, Matthew Wilcox,
	Daniel Vetter, Jakowski Andrzej, Minturn Dave B, Jason Ekstrand,
	Dave Hansen, Xiong Jianxin, Bjorn Helgaas, Ira Weiny,
	Robin Murphy



On 2021-04-27 1:22 p.m., Jason Gunthorpe wrote:
> On Thu, Apr 08, 2021 at 11:01:12AM -0600, Logan Gunthorpe wrote:
>> dma_map_sg() either returns a positive number indicating the number
>> of entries mapped or zero indicating that resources were not available
>> to create the mapping. When zero is returned, it is always safe to retry
>> the mapping later once resources have been freed.
>>
>> Once P2PDMA pages are mixed into the SGL there may be pages that may
>> never be successfully mapped with a given device because that device may
>> not actually be able to access those pages. Thus, multiple error
>> conditions will need to be distinguished to determine whether a retry
>> is safe.
>>
>> Introduce dma_map_sg_p2pdma[_attrs]() with a different calling
>> convention from dma_map_sg(). The function will return a positive
>> integer on success or a negative errno on failure.
>>
>> ENOMEM will be used to indicate a resource failure and EREMOTEIO to
>> indicate that a P2PDMA page is not mappable.
>>
>> The __DMA_ATTR_PCI_P2PDMA attribute is introduced to inform the lower
>> level implementations that P2PDMA pages are allowed and to warn if a
>> caller introduces them into the regular dma_map_sg() interface.
> 
> So this new API is all about being able to return an error code
> because auditing the old API is basically terrifying?
> 
> OK, but why name everything new P2PDMA? It seems nicer to give this
> some generic name and have some general program to gradually deprecate
> normal non-error-capable dma_map_sg() ?
> 
> I think that will raise less questions when subsystem people see the
> changes, as I was wondering why RW was being moved to use what looked
> like a p2pdma only API.
> 
> dma_map_sg_or_err() would have been clearer
> 
> The flag is also clearer as to the purpose if it is named
> __DMA_ATTR_ERROR_ALLOWED

I'm not opposed to these names. I can use them for v2 if there are no
other opinions.
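
To make the proposed calling convention concrete, here is a minimal
userspace sketch of how a caller might act on the return value. It only
models the convention described in the commit message (positive count on
success, -ENOMEM for a retryable resource failure, -EREMOTEIO for an
unmappable P2PDMA page). The enum and classify_map_result() are
hypothetical names for illustration, not part of the series.

```c
#include <errno.h>

#ifndef EREMOTEIO
#define EREMOTEIO 121	/* Linux value; fallback for other platforms */
#endif

/* Hypothetical outcome of a mapping attempt, for illustration only. */
enum map_action {
	MAP_ACTION_PROCEED,	/* mapping succeeded, continue with the I/O */
	MAP_ACTION_RETRY,	/* transient resource shortage, retry later */
	MAP_ACTION_FAIL,	/* permanent failure, retrying cannot help */
};

/*
 * Classify the return value of an errno-returning map call.
 * A positive value is the number of mapped entries. -ENOMEM is the only
 * retryable error. Everything else, including -EREMOTEIO and an
 * unexpected 0, is treated as a permanent failure.
 */
static enum map_action classify_map_result(int ret)
{
	if (ret > 0)
		return MAP_ACTION_PROCEED;
	if (ret == -ENOMEM)
		return MAP_ACTION_RETRY;
	return MAP_ACTION_FAIL;
}
```

The point of the sketch is that the old "0 means failure" convention
cannot express the difference between the last two cases.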

Logan

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH 05/16] dma-mapping: Introduce dma_map_sg_p2pdma()
  2021-04-27 19:31   ` Jason Gunthorpe
@ 2021-04-27 22:55     ` Logan Gunthorpe
  2021-04-27 23:01       ` Jason Gunthorpe
  0 siblings, 1 reply; 99+ messages in thread
From: Logan Gunthorpe @ 2021-04-27 22:55 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: linux-kernel, linux-nvme, linux-block, linux-pci, linux-mm,
	iommu, Stephen Bates, Christoph Hellwig, Dan Williams,
	Christian König, John Hubbard, Don Dutile, Matthew Wilcox,
	Daniel Vetter, Jakowski Andrzej, Minturn Dave B, Jason Ekstrand,
	Dave Hansen, Xiong Jianxin, Bjorn Helgaas, Ira Weiny,
	Robin Murphy



On 2021-04-27 1:31 p.m., Jason Gunthorpe wrote:
> On Thu, Apr 08, 2021 at 11:01:12AM -0600, Logan Gunthorpe wrote:
>> +/*
>> + * dma_map_sg_attrs returns 0 on error and > 0 on success.
>> + * It should never return a value < 0.
>> + */
> 
> Also it is weird that a function that can't return < 0 is returning an int type

Yes, Christoph mentioned in the last series that this should probably
change to an unsigned but I wasn't really sure if that change should be
a part of the P2PDMA series.

>> +int dma_map_sg_attrs(struct device *dev, struct scatterlist *sg, int nents,
>> +		enum dma_data_direction dir, unsigned long attrs)
>> +{
>> +	int ents;
>> +
>> +	ents = __dma_map_sg_attrs(dev, sg, nents, dir, attrs);
>>  	BUG_ON(ents < 0);
> 
> if (WARN_ON(ents < 0))
>      return 0;
> 
> instead of bug on?

It was BUG_ON in the original code. So I felt I should leave it.
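
For comparison, the WARN_ON variant suggested above could behave roughly
like the userspace sketch below. warn_on() is a stand-in for the kernel's
WARN_ON() and legacy_map_result() is a made-up name standing in for the
dma_map_sg_attrs() wrapper. Neither is real kernel code.

```c
#include <errno.h>
#include <stdio.h>

/* Stand-in for the kernel's WARN_ON(): report the condition, return it. */
static int warn_on(int cond, const char *what)
{
	if (cond)
		fprintf(stderr, "WARNING: %s\n", what);
	return cond;
}

/*
 * Hypothetical legacy wrapper: its callers only understand "0 means
 * failure", so a negative errno from the inner implementation is
 * reported and collapsed to 0 instead of crashing the machine.
 */
static int legacy_map_result(int inner_ret)
{
	if (warn_on(inner_ret < 0, "negative return on legacy map path"))
		return 0;
	return inner_ret;
}
```

With this shape a buggy lower layer degrades into an ordinary mapping
failure for the caller, where BUG_ON() would halt instead.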

> Also, I see only 8 users of this function. How about just fix them all
> to support negative returns and use this as the p2p API instead of
> adding new API?

Well there might be 8 users of dma_map_sg_attrs() but there are a very
large number of dma_map_sg(). Seems odd to me to single out the first as
requiring these changes, but leave the latter.

> Add the opposite logic flag, 'DMA_ATTRS_NO_ERROR' and pass it through
> the other api entry callers that can't handle it?

I'm not that opposed to this. But it will make this series a fair bit
longer to change the 8 map_sg_attrs() usages.

Logan

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH 09/16] dma-direct: Support PCI P2PDMA pages in dma-direct map_sg
  2021-04-27 19:40     ` Jason Gunthorpe
@ 2021-04-27 22:56       ` Logan Gunthorpe
  0 siblings, 0 replies; 99+ messages in thread
From: Logan Gunthorpe @ 2021-04-27 22:56 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: linux-kernel, linux-nvme, linux-block, linux-pci, linux-mm,
	iommu, Stephen Bates, Christoph Hellwig, Dan Williams,
	Christian König, John Hubbard, Don Dutile, Matthew Wilcox,
	Daniel Vetter, Jakowski Andrzej, Minturn Dave B, Jason Ekstrand,
	Dave Hansen, Xiong Jianxin, Bjorn Helgaas, Ira Weiny,
	Robin Murphy



On 2021-04-27 1:40 p.m., Jason Gunthorpe wrote:
> On Tue, Apr 27, 2021 at 04:33:51PM -0300, Jason Gunthorpe wrote:
>> On Thu, Apr 08, 2021 at 11:01:16AM -0600, Logan Gunthorpe wrote:
>>> Add PCI P2PDMA support for dma_direct_map_sg() so that it can map
>>> PCI P2PDMA pages directly without a hack in the callers. This allows
>>> for heterogeneous SGLs that contain both P2PDMA and regular pages.
>>>
>>> SGL segments that contain PCI bus addresses are marked with
>>> sg_mark_pci_p2pdma() and are ignored when unmapped.
>>>
>>> Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
>>>  kernel/dma/direct.c | 25 ++++++++++++++++++++++---
>>>  1 file changed, 22 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
>>> index 002268262c9a..108dfb4ecbd5 100644
>>> +++ b/kernel/dma/direct.c
>>> @@ -13,6 +13,7 @@
>>>  #include <linux/vmalloc.h>
>>>  #include <linux/set_memory.h>
>>>  #include <linux/slab.h>
>>> +#include <linux/pci-p2pdma.h>
>>>  #include "direct.h"
>>>  
>>>  /*
>>> @@ -387,19 +388,37 @@ void dma_direct_unmap_sg(struct device *dev, struct scatterlist *sgl,
>>>  	struct scatterlist *sg;
>>>  	int i;
>>>  
>>> -	for_each_sg(sgl, sg, nents, i)
>>> +	for_each_sg(sgl, sg, nents, i) {
>>> +		if (sg_is_pci_p2pdma(sg)) {
>>> +			sg_unmark_pci_p2pdma(sg);
>>
>> This doesn't seem nice, the DMA layer should only alter the DMA
>> portion of the SG, not the other portions. Is it necessary?
> 
> Oh, I got it completely wrong what this is for.
> 
> This should be named sg_dma_mark_pci_p2p() and similar for other
> functions to make it clear it is part of the DMA side of the SG
> interface (eg it is like sg_dma_address, sg_dma_len, etc)

Fair point. Yes, I'll rename this for the next version.
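
As a rough illustration of the naming point, the sketch below models
helpers that only ever touch DMA-side state. struct sg_model, the
dma_flags word, and the SG_DMA_PCI_P2P bit are assumptions made for this
example. The actual patch may store the mark elsewhere in the
scatterlist.

```c
#include <stdbool.h>

/* Simplified stand-in for struct scatterlist; fields are illustrative. */
struct sg_model {
	unsigned long page_link;	/* page side, owned by the SGL builder */
	unsigned long dma_address;	/* DMA side, owned by the DMA layer */
	unsigned int dma_length;	/* DMA side */
	unsigned int dma_flags;		/* hypothetical DMA-side flag word */
};

#define SG_DMA_PCI_P2P	(1U << 0)	/* hypothetical flag bit */

/* Mark the segment's DMA address as a PCI bus address. */
static inline void sg_dma_mark_pci_p2p(struct sg_model *sg)
{
	sg->dma_flags |= SG_DMA_PCI_P2P;
}

/* Clear the mark again, typically at unmap time. */
static inline void sg_dma_unmark_pci_p2p(struct sg_model *sg)
{
	sg->dma_flags &= ~SG_DMA_PCI_P2P;
}

/* Test whether the segment carries a PCI bus address. */
static inline bool sg_dma_is_pci_p2p(const struct sg_model *sg)
{
	return sg->dma_flags & SG_DMA_PCI_P2P;
}
```

The sg_dma_ prefix then matches sg_dma_address() and sg_dma_len(), and
the page side of the entry is never modified by these helpers.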

Logan

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH 11/16] iommu/dma: Support PCI P2PDMA pages in dma-iommu map_sg
  2021-04-27 19:43   ` Jason Gunthorpe
@ 2021-04-27 22:59     ` Logan Gunthorpe
  0 siblings, 0 replies; 99+ messages in thread
From: Logan Gunthorpe @ 2021-04-27 22:59 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: linux-kernel, linux-nvme, linux-block, linux-pci, linux-mm,
	iommu, Stephen Bates, Christoph Hellwig, Dan Williams,
	Christian König, John Hubbard, Don Dutile, Matthew Wilcox,
	Daniel Vetter, Jakowski Andrzej, Minturn Dave B, Jason Ekstrand,
	Dave Hansen, Xiong Jianxin, Bjorn Helgaas, Ira Weiny,
	Robin Murphy



On 2021-04-27 1:43 p.m., Jason Gunthorpe wrote:
> On Thu, Apr 08, 2021 at 11:01:18AM -0600, Logan Gunthorpe wrote:
>> When a PCI P2PDMA page is seen, set the IOVA length of the segment
>> to zero so that it is not mapped into the IOVA. Then, in finalise_sg(),
>> apply the appropriate bus address to the segment. The IOVA is not
>> created if the scatterlist only consists of P2PDMA pages.
> 
> I expect P2P to work with systems that use ATS, so we'd want to see
> those systems have the IOMMU programmed with the bus address.

Oh, the paragraph you quote isn't quite as clear as it could be. The bus
address is only used in specific circumstances depending on how the
P2PDMA core code figures the addresses should be mapped (see the
documentation for upstream_bridge_distance()). The P2PDMA code
currently doesn't have any provisions for ATS (I haven't had access to
any such hardware) but I'm sure it wouldn't be too hard to add.

Logan

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH 14/16] nvme-rdma: Ensure dma support when using p2pdma
  2021-04-27 19:47   ` Jason Gunthorpe
@ 2021-04-27 22:59     ` Logan Gunthorpe
  0 siblings, 0 replies; 99+ messages in thread
From: Logan Gunthorpe @ 2021-04-27 22:59 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: linux-kernel, linux-nvme, linux-block, linux-pci, linux-mm,
	iommu, Stephen Bates, Christoph Hellwig, Dan Williams,
	Christian König, John Hubbard, Don Dutile, Matthew Wilcox,
	Daniel Vetter, Jakowski Andrzej, Minturn Dave B, Jason Ekstrand,
	Dave Hansen, Xiong Jianxin, Bjorn Helgaas, Ira Weiny,
	Robin Murphy



On 2021-04-27 1:47 p.m., Jason Gunthorpe wrote:
> On Thu, Apr 08, 2021 at 11:01:21AM -0600, Logan Gunthorpe wrote:
>> Ensure the dma operations support p2pdma before using the RDMA
>> device for P2PDMA. This allows switching the RDMA driver from
>> pci_p2pdma_map_sg() to dma_map_sg_p2pdma().
>>
>> Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
>>  drivers/nvme/target/rdma.c | 3 ++-
>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/nvme/target/rdma.c b/drivers/nvme/target/rdma.c
>> index 6c1f3ab7649c..3ec7e77e5416 100644
>> +++ b/drivers/nvme/target/rdma.c
>> @@ -414,7 +414,8 @@ static int nvmet_rdma_alloc_rsp(struct nvmet_rdma_device *ndev,
>>  	if (ib_dma_mapping_error(ndev->device, r->send_sge.addr))
>>  		goto out_free_rsp;
>>  
>> -	if (!ib_uses_virt_dma(ndev->device))
>> +	if (!ib_uses_virt_dma(ndev->device) &&
>> +	    dma_pci_p2pdma_supported(&ndev->device->dev))
> 
> ib_uses_virt_dma() should not be called by nvme and this is using the
> wrong device pointer to query for DMA related properties.
> 
> I suspect this wants a ib_dma_pci_p2p_dma_supported() wrapper like
> everything else.

Makes sense. Will add for v2.

Logan

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH 05/16] dma-mapping: Introduce dma_map_sg_p2pdma()
  2021-04-27 22:55     ` Logan Gunthorpe
@ 2021-04-27 23:01       ` Jason Gunthorpe
  2021-05-03 18:28         ` Christoph Hellwig
  0 siblings, 1 reply; 99+ messages in thread
From: Jason Gunthorpe @ 2021-04-27 23:01 UTC (permalink / raw)
  To: Logan Gunthorpe
  Cc: linux-kernel, linux-nvme, linux-block, linux-pci, linux-mm,
	iommu, Stephen Bates, Christoph Hellwig, Dan Williams,
	Christian König, John Hubbard, Don Dutile, Matthew Wilcox,
	Daniel Vetter, Jakowski Andrzej, Minturn Dave B, Jason Ekstrand,
	Dave Hansen, Xiong Jianxin, Bjorn Helgaas, Ira Weiny,
	Robin Murphy

On Tue, Apr 27, 2021 at 04:55:45PM -0600, Logan Gunthorpe wrote:

> > Also, I see only 8 users of this function. How about just fix them all
> > to support negative returns and use this as the p2p API instead of
> > adding new API?
> 
> Well there might be 8 users of dma_map_sg_attrs() but there are a very
> large number of dma_map_sg(). Seems odd to me to single out the first as
> requiring these changes, but leave the latter.

At a high level I'm OK with it. dma_map_sg_attrs() is the extra
extended version of dma_map_sg(), it already has a different
signature, a different return code is not out of the question.

dma_map_sg() is just the simple easy to use interface that can't do
advanced stuff.

> I'm not that opposed to this. But it will make this series a fair bit
> longer to change the 8 map_sg_attrs() usages.

Yes, but the result seems much nicer to not grow the DMA API further.

Jason

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH 00/16] Add new DMA mapping operation for P2PDMA
  2021-04-08 17:01 [PATCH 00/16] Add new DMA mapping operation for P2PDMA Logan Gunthorpe
                   ` (16 preceding siblings ...)
  2021-04-27 19:28 ` [PATCH 00/16] Add new DMA mapping operation for P2PDMA Jason Gunthorpe
@ 2021-05-02  1:22 ` John Hubbard
  2021-05-11 16:05 ` Don Dutile
  18 siblings, 0 replies; 99+ messages in thread
From: John Hubbard @ 2021-05-02  1:22 UTC (permalink / raw)
  To: Logan Gunthorpe, linux-kernel, linux-nvme, linux-block,
	linux-pci, linux-mm, iommu
  Cc: Stephen Bates, Christoph Hellwig, Dan Williams, Jason Gunthorpe,
	Christian König, Don Dutile, Matthew Wilcox, Daniel Vetter,
	Jakowski Andrzej, Minturn Dave B, Jason Ekstrand, Dave Hansen,
	Xiong Jianxin, Bjorn Helgaas, Ira Weiny, Robin Murphy

On 4/8/21 10:01 AM, Logan Gunthorpe wrote:
> Hi,
> 
> This patchset continues my work to add P2PDMA support to the common
> dma map operations. This allows for creating SGLs that have both P2PDMA
> and regular pages which is a necessary step to allowing P2PDMA pages in
> userspace.
> 
> The earlier RFC[1] generated a lot of great feedback and I heard no show
> stopping objections. Thus, I've incorporated all the feedback and have
> decided to post this as a proper patch series with hopes of eventually
> getting it in mainline.
> 
> I'm happy to do a few more passes if anyone has any further feedback
> or better ideas.
> 

After an initial pass through these, I think I like the approach. And I
don't have any huge structural comments or new ideas, just smaller comments
and notes.

I'll respond to each patch, but just wanted to say up front that this is
looking promising, in my opinion.


thanks,
-- 
John Hubbard
NVIDIA

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH 01/16] PCI/P2PDMA: Pass gfp_mask flags to upstream_bridge_distance_warn()
  2021-04-08 17:01 ` [PATCH 01/16] PCI/P2PDMA: Pass gfp_mask flags to upstream_bridge_distance_warn() Logan Gunthorpe
@ 2021-05-02  3:58   ` John Hubbard
  2021-05-03 15:57     ` Logan Gunthorpe
  2021-05-11 16:05     ` Don Dutile
  0 siblings, 2 replies; 99+ messages in thread
From: John Hubbard @ 2021-05-02  3:58 UTC (permalink / raw)
  To: Logan Gunthorpe, linux-kernel, linux-nvme, linux-block,
	linux-pci, linux-mm, iommu
  Cc: Stephen Bates, Christoph Hellwig, Dan Williams, Jason Gunthorpe,
	Christian König, Don Dutile, Matthew Wilcox, Daniel Vetter,
	Jakowski Andrzej, Minturn Dave B, Jason Ekstrand, Dave Hansen,
	Xiong Jianxin, Bjorn Helgaas, Ira Weiny, Robin Murphy,
	Bjorn Helgaas

On 4/8/21 10:01 AM, Logan Gunthorpe wrote:
> In order to call upstream_bridge_distance_warn() from a dma_map function,
> it must not sleep. The only reason it does sleep is to allocate the seqbuf
> to print which devices are within the ACS path.
> 
> Switch the kmalloc call to use a passed in gfp_mask and don't print that
> message if the buffer fails to be allocated.
> 
> Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
> Acked-by: Bjorn Helgaas <bhelgaas@google.com>
> ---
>   drivers/pci/p2pdma.c | 21 +++++++++++----------
>   1 file changed, 11 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c
> index 196382630363..bd89437faf06 100644
> --- a/drivers/pci/p2pdma.c
> +++ b/drivers/pci/p2pdma.c
> @@ -267,7 +267,7 @@ static int pci_bridge_has_acs_redir(struct pci_dev *pdev)
>   
>   static void seq_buf_print_bus_devfn(struct seq_buf *buf, struct pci_dev *pdev)
>   {
> -	if (!buf)
> +	if (!buf || !buf->buffer)

This is not great, sort of from an overall design point of view, even though
it makes the rest of the patch work. See below for other ideas, that will
avoid the need for this sort of odd point fix.

>   		return;
>   
>   	seq_buf_printf(buf, "%s;", pci_name(pdev));
> @@ -495,25 +495,26 @@ upstream_bridge_distance(struct pci_dev *provider, struct pci_dev *client,
>   
>   static enum pci_p2pdma_map_type
>   upstream_bridge_distance_warn(struct pci_dev *provider, struct pci_dev *client,
> -			      int *dist)
> +			      int *dist, gfp_t gfp_mask)
>   {
>   	struct seq_buf acs_list;
>   	bool acs_redirects;
>   	int ret;
>   
> -	seq_buf_init(&acs_list, kmalloc(PAGE_SIZE, GFP_KERNEL), PAGE_SIZE);
> -	if (!acs_list.buffer)
> -		return -ENOMEM;

Another odd thing: this used to check for memory failure and just give
up, and now it doesn't. Yes, I realize that it all still works at the
moment, but this is quirky and we shouldn't stop here.

Instead, a cleaner approach would be to push the memory allocation
slightly higher up the call stack, out to the
pci_p2pdma_distance_many(). So pci_p2pdma_distance_many() should make
the kmalloc() call, and fail out if it can't get a page for the seq_buf
buffer. Then you don't have to do all this odd stuff.

Furthermore, the call sites can then decide for themselves which GFP
flags, GFP_ATOMIC or GFP_KERNEL or whatever they want for kmalloc().
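
A minimal userspace sketch of that shape follows. The outer function owns
the allocation and fails with -ENOMEM, the leaf only appends to whatever
buffer it is given, and the caller picks the allocator (standing in for
the choice of GFP flags). All names here are hypothetical, and the alloc
callback is assumed to be malloc()-compatible so that free() can release
the buffer.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define ACS_BUF_SIZE 4096

/* Allocator chosen by the call site; models picking the GFP flags. */
typedef void *(*alloc_fn)(size_t);

/* Leaf helper: never allocates, appends "name;" if a buffer was given. */
static void append_devname(char *buf, size_t size, const char *name)
{
	size_t used;

	if (!buf)
		return;

	used = strlen(buf);
	if (used < size)
		snprintf(buf + used, size - used, "%s;", name);
}

/*
 * Outer function: allocates up front and fails cleanly with -ENOMEM.
 * On success the semicolon-separated list is copied to @out.
 */
static int report_acs_path(const char *const *names, size_t count,
			   alloc_fn alloc, char *out, size_t out_size)
{
	char *buf = alloc(ACS_BUF_SIZE);
	size_t i;

	if (!buf)
		return -ENOMEM;

	buf[0] = '\0';
	for (i = 0; i < count; i++)
		append_devname(buf, ACS_BUF_SIZE, names[i]);

	/* Drop the final semicolon, if anything was written. */
	if (buf[0])
		buf[strlen(buf) - 1] = '\0';

	snprintf(out, out_size, "%s", buf);
	free(buf);
	return 0;
}

/* Allocator that always fails, to exercise the error path. */
static void *failing_alloc(size_t size)
{
	(void)size;
	return NULL;
}
```

Calling report_acs_path() with malloc exercises the normal path, and with
failing_alloc the -ENOMEM path, without any special casing in the leaf.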

A related thing: this whole exercise would go better if there were a
preparatory patch or two that changed the return codes in this file to
something less crazy. There are too many functions that can fail, but
are treated as if they sort-of-mostly-would-never-fail, in the hopes of
using the return value directly for counting and such. This is badly
mistaken, and it leads developers to try to avoid returning -ENOMEM
(which is what we need here).

Really, these functions should all be doing "0 for success, -ERRNO for
failure, and pass other values, including results, in the arg list".
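
In sketch form, that convention looks something like the following
userspace example. The enum, the function name, and the toy "same bus
means bus address" rule are all made up for illustration and do not
reflect the real topology logic in p2pdma.c.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical map types, loosely modelled on enum pci_p2pdma_map_type. */
enum map_type_model {
	MAP_UNKNOWN = 0,
	MAP_NOT_SUPPORTED,
	MAP_BUS_ADDR,
	MAP_THRU_HOST_BRIDGE,
};

/*
 * Returns 0 on success or a negative errno on failure. Results are
 * passed back only through the out parameters: @type is required,
 * @dist is optional and may be NULL.
 */
static int lookup_map_type(int provider_bus, int client_bus,
			   enum map_type_model *type, int *dist)
{
	if (!type)
		return -EINVAL;
	if (provider_bus < 0 || client_bus < 0)
		return -ENODEV;

	/* Toy rule: devices on the same bus use bus addresses. */
	*type = (provider_bus == client_bus) ? MAP_BUS_ADDR
					     : MAP_THRU_HOST_BRIDGE;

	/* Toy distance: absolute difference of the bus numbers. */
	if (dist)
		*dist = provider_bus > client_bus ?
			provider_bus - client_bus :
			client_bus - provider_bus;

	return 0;
}
```

Because the return value is never reused as a count or an enum, an error
such as -ENOMEM can be added later without touching the result types.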


> +	seq_buf_init(&acs_list, kmalloc(PAGE_SIZE, gfp_mask), PAGE_SIZE);
>   
>   	ret = upstream_bridge_distance(provider, client, dist, &acs_redirects,
>   				       &acs_list);
>   	if (acs_redirects) {
>   		pci_warn(client, "ACS redirect is set between the client and provider (%s)\n",
>   			 pci_name(provider));
> -		/* Drop final semicolon */
> -		acs_list.buffer[acs_list.len-1] = 0;
> -		pci_warn(client, "to disable ACS redirect for this path, add the kernel parameter: pci=disable_acs_redir=%s\n",
> -			 acs_list.buffer);
> +
> +		if (acs_list.buffer) {
> +			/* Drop final semicolon */
> +			acs_list.buffer[acs_list.len - 1] = 0;
> +			pci_warn(client, "to disable ACS redirect for this path, add the kernel parameter: pci=disable_acs_redir=%s\n",
> +				 acs_list.buffer);
> +		}
>   	}
>   
>   	if (ret == PCI_P2PDMA_MAP_NOT_SUPPORTED) {
> @@ -566,7 +567,7 @@ int pci_p2pdma_distance_many(struct pci_dev *provider, struct device **clients,
>   
>   		if (verbose)
>   			ret = upstream_bridge_distance_warn(provider,
> -					pci_client, &distance);
> +					pci_client, &distance, GFP_KERNEL);
>   		else
>   			ret = upstream_bridge_distance(provider, pci_client,
>   						       &distance, NULL, NULL);
> 

thanks,
-- 
John Hubbard
NVIDIA

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH 02/16] PCI/P2PDMA: Avoid pci_get_slot() which sleeps
  2021-04-08 17:01 ` [PATCH 02/16] PCI/P2PDMA: Avoid pci_get_slot() which sleeps Logan Gunthorpe
@ 2021-05-02  5:35   ` John Hubbard
  2021-05-03 16:08     ` Logan Gunthorpe
  2021-05-11 16:05     ` Don Dutile
  2021-05-11 16:05   ` Don Dutile
  1 sibling, 2 replies; 99+ messages in thread
From: John Hubbard @ 2021-05-02  5:35 UTC (permalink / raw)
  To: Logan Gunthorpe, linux-kernel, linux-nvme, linux-block,
	linux-pci, linux-mm, iommu
  Cc: Stephen Bates, Christoph Hellwig, Dan Williams, Jason Gunthorpe,
	Christian König, Don Dutile, Matthew Wilcox, Daniel Vetter,
	Jakowski Andrzej, Minturn Dave B, Jason Ekstrand, Dave Hansen,
	Xiong Jianxin, Bjorn Helgaas, Ira Weiny, Robin Murphy

On 4/8/21 10:01 AM, Logan Gunthorpe wrote:
> In order to use upstream_bridge_distance_warn() from a dma_map function,
> it must not sleep. However, pci_get_slot() takes the pci_bus_sem so it
> might sleep.
> 
> In order to avoid this, try to get the host bridge's device from
> bus->self, and if that is not set, just get the first element in the
> device list. It should be impossible for the host bridge's device to
> go away while references are held on child devices, so the first element
> should not be able to change and, thus, this should be safe.
> 
> Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
> ---
>   drivers/pci/p2pdma.c | 14 ++++++++++++--
>   1 file changed, 12 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c
> index bd89437faf06..473a08940fbc 100644
> --- a/drivers/pci/p2pdma.c
> +++ b/drivers/pci/p2pdma.c
> @@ -311,16 +311,26 @@ static const struct pci_p2pdma_whitelist_entry {
>   static bool __host_bridge_whitelist(struct pci_host_bridge *host,
>   				    bool same_host_bridge)
>   {
> -	struct pci_dev *root = pci_get_slot(host->bus, PCI_DEVFN(0, 0));
>   	const struct pci_p2pdma_whitelist_entry *entry;
> +	struct pci_dev *root = host->bus->self;
>   	unsigned short vendor, device;
>   
> +	/*
> +	 * This makes the assumption that the first device on the bus is the
> +	 * bridge itself and it has the devfn of 00.0. This assumption should
> +	 * hold for the devices in the white list above, and if there are cases
> +	 * where this isn't true they will have to be dealt with when such a
> +	 * case is added to the whitelist.

Actually, it makes the assumption that the first device *in the list*
(the host->bus->devices list) is 00.0.  The previous code made the
assumption that you wrote.

By the way, pre-existing code comment: pci_p2pdma_whitelist[] seems
really short. From a naive point of view, I'd expect that there must be
a lot more CPUs/chipsets that can do pci p2p, what do you think? I
wonder if we have to be so super strict, anyway. It just seems extremely
limited, and I suspect there will be some additions to the list as soon
as we start to use this.


> +	 */
>   	if (!root)
> +		root = list_first_entry_or_null(&host->bus->devices,
> +						struct pci_dev, bus_list);

OK, yes this avoids taking the pci_bus_sem, but it's kind of cheating.
Why is it OK to avoid taking any locks in order to retrieve the
first entry from the list, but in order to retrieve any other entry, you
have to acquire the pci_bus_sem, and get a reference as well? Something
is inconsistent there.

The new version here also no longer takes a reference on the device,
which is also cheating. But I'm guessing that the unstated assumption
here is that there is always at least one entry in the list. But if
that's true, then it's better to show clearly that assumption, instead
of hiding it in an implicit call that skips both locking and reference
counting.

You could add a new function, which is a cut-down version of pci_get_slot(),
like this, and call this from __host_bridge_whitelist():

/*
  * A special purpose variant of pci_get_slot() that doesn't take the pci_bus_sem
  * lock, and only looks for the 00.0 bus-device-function. Once the PCI bus is
  * up, it is safe to call this, because there will always be a top-level PCI
  * root device.
  *
  * Other assumptions: the root device is the first device in the list, and the
  * root device is numbered 00.0.
  */
struct pci_dev *pci_get_root_slot(struct pci_bus *bus)
{
	struct pci_dev *root;
	unsigned int devfn = PCI_DEVFN(0, 0);

	root = list_first_entry_or_null(&bus->devices, struct pci_dev,
					bus_list);

	/* An empty list or an unexpected first device means no root. */
	if (root && root->devfn != devfn)
		root = NULL;

	/* pci_dev_get() accepts NULL and returns its argument. */
	return pci_dev_get(root);
}
EXPORT_SYMBOL(pci_get_root_slot);

...I think that's a lot clearer to the reader, about what's going on here.

Note that I'm not really sure if it *is* safe, I would need to ask other
PCIe subsystem developers with more experience. But I don't think anyone
is trying to make p2pdma calls so early that PCIe buses are uninitialized.


> +
> +	if (!root || root->devfn)
>   		return false;
>   
>   	vendor = root->vendor;
>   	device = root->device;
> -	pci_dev_put(root);
>   
>   	for (entry = pci_p2pdma_whitelist; entry->vendor; entry++) {
>   		if (vendor != entry->vendor || device != entry->device)
> 

thanks,
-- 
John Hubbard
NVIDIA

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH 03/16] PCI/P2PDMA: Attempt to set map_type if it has not been set
  2021-04-08 17:01 ` [PATCH 03/16] PCI/P2PDMA: Attempt to set map_type if it has not been set Logan Gunthorpe
@ 2021-05-02 19:58   ` John Hubbard
  2021-05-03 16:17     ` Logan Gunthorpe
  2021-05-11 16:05     ` Don Dutile
  0 siblings, 2 replies; 99+ messages in thread
From: John Hubbard @ 2021-05-02 19:58 UTC (permalink / raw)
  To: Logan Gunthorpe, linux-kernel, linux-nvme, linux-block,
	linux-pci, linux-mm, iommu
  Cc: Stephen Bates, Christoph Hellwig, Dan Williams, Jason Gunthorpe,
	Christian König, Don Dutile, Matthew Wilcox, Daniel Vetter,
	Jakowski Andrzej, Minturn Dave B, Jason Ekstrand, Dave Hansen,
	Xiong Jianxin, Bjorn Helgaas, Ira Weiny, Robin Murphy

On 4/8/21 10:01 AM, Logan Gunthorpe wrote:
> Attempt to find the mapping type for P2PDMA pages on the first
> DMA map attempt if it has not been done ahead of time.
> 
> Previously, the mapping type was expected to be calculated ahead of
> time, but if pages are to come from userspace then there's no
> way to ensure the path was checked ahead of time.
> 
> Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
> ---
>   drivers/pci/p2pdma.c | 12 +++++++++---
>   1 file changed, 9 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c
> index 473a08940fbc..2574a062a255 100644
> --- a/drivers/pci/p2pdma.c
> +++ b/drivers/pci/p2pdma.c
> @@ -825,11 +825,18 @@ EXPORT_SYMBOL_GPL(pci_p2pmem_publish);
>   static enum pci_p2pdma_map_type pci_p2pdma_map_type(struct pci_dev *provider,
>   						    struct pci_dev *client)
>   {
> +	enum pci_p2pdma_map_type ret;
> +
>   	if (!provider->p2pdma)
>   		return PCI_P2PDMA_MAP_NOT_SUPPORTED;
>   
> -	return xa_to_value(xa_load(&provider->p2pdma->map_types,
> -				   map_types_idx(client)));
> +	ret = xa_to_value(xa_load(&provider->p2pdma->map_types,
> +				  map_types_idx(client)));
> +	if (ret != PCI_P2PDMA_MAP_UNKNOWN)
> +		return ret;
> +
> +	return upstream_bridge_distance_warn(provider, client, NULL,
> +					     GFP_ATOMIC);

Returning a "bridge distance" from a "get map type" routine is jarring,
and I think it is because of a pre-existing problem: the above function
is severely misnamed. Let's try renaming it (and the other one) to
approximately:

     upstream_bridge_map_type_warn()
     upstream_bridge_map_type()

...and that should fix that. Well, that, plus tweaking the kernel doc
comments, which are also confused. I think someone started off thinking
about distances through PCIe, but in the end, the routine boils down to
just a few situations that are not distances at all.

Also, the above will read a little better if it is written like this:

	ret = xa_to_value(xa_load(&provider->p2pdma->map_types,
				  map_types_idx(client)));

	if (ret == PCI_P2PDMA_MAP_UNKNOWN)
		ret = upstream_bridge_map_type_warn(provider, client, NULL,
						    GFP_ATOMIC);
	
	return ret;
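
The control flow being suggested is a cached lookup with a compute-on-miss
fallback. A self-contained userspace model is sketched below. The array
stands in for the xarray, compute_map_type() stands in for the
bridge-walking routine, and storing the computed value back into the
cache is an assumption of this sketch.

```c
/* Hypothetical map types, loosely modelled on enum pci_p2pdma_map_type. */
enum map_type_model {
	MAP_UNKNOWN = 0,
	MAP_NOT_SUPPORTED,
	MAP_BUS_ADDR,
	MAP_THRU_HOST_BRIDGE,
};

#define MAX_CLIENTS 8

/* Stand-in for the provider's per-client map_types cache. */
struct provider_model {
	enum map_type_model map_types[MAX_CLIENTS];
	int computations;	/* counts slow-path invocations */
};

/* Stand-in for the slow topology walk; the rule here is a toy. */
static enum map_type_model compute_map_type(struct provider_model *p,
					    int client)
{
	p->computations++;
	return (client % 2) ? MAP_THRU_HOST_BRIDGE : MAP_BUS_ADDR;
}

/* Return the cached type, computing and caching it on first use. */
static enum map_type_model get_map_type(struct provider_model *p,
					int client)
{
	enum map_type_model ret;

	if (client < 0 || client >= MAX_CLIENTS)
		return MAP_NOT_SUPPORTED;

	ret = p->map_types[client];
	if (ret == MAP_UNKNOWN) {
		ret = compute_map_type(p, client);
		p->map_types[client] = ret;
	}

	return ret;
}
```

A second lookup for the same client then hits the cache and does not run
the slow path again.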


>   }
>   
>   static int __pci_p2pdma_map_sg(struct pci_p2pdma_pagemap *p2p_pgmap,
> @@ -877,7 +884,6 @@ int pci_p2pdma_map_sg_attrs(struct device *dev, struct scatterlist *sg,
>   	case PCI_P2PDMA_MAP_BUS_ADDR:
>   		return __pci_p2pdma_map_sg(p2p_pgmap, dev, sg, nents);
>   	default:
> -		WARN_ON_ONCE(1);

Why? Or at least, why, in this patch? It looks like an accidental
leftover from something, seeing as how it is not directly related to the
patch, and is not mentioned at all.


thanks,
-- 
John Hubbard
NVIDIA

>   		return 0;
>   	}
>   }
> 


^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH 04/16] PCI/P2PDMA: Refactor pci_p2pdma_map_type() to take pagmap and device
  2021-04-08 17:01 ` [PATCH 04/16] PCI/P2PDMA: Refactor pci_p2pdma_map_type() to take pagmap and device Logan Gunthorpe
@ 2021-05-02 20:41   ` John Hubbard
  2021-05-03 16:30     ` Logan Gunthorpe
  0 siblings, 1 reply; 99+ messages in thread
From: John Hubbard @ 2021-05-02 20:41 UTC (permalink / raw)
  To: Logan Gunthorpe, linux-kernel, linux-nvme, linux-block,
	linux-pci, linux-mm, iommu
  Cc: Stephen Bates, Christoph Hellwig, Dan Williams, Jason Gunthorpe,
	Christian König, Don Dutile, Matthew Wilcox, Daniel Vetter,
	Jakowski Andrzej, Minturn Dave B, Jason Ekstrand, Dave Hansen,
	Xiong Jianxin, Bjorn Helgaas, Ira Weiny, Robin Murphy

On 4/8/21 10:01 AM, Logan Gunthorpe wrote:
> All callers of pci_p2pdma_map_type() have a struct dev_pgmap and a
> struct device (of the client doing the DMA transfer). Thus move the
> conversion to struct pci_devs for the provider and client into this
> function.

Actually, this is the wrong direction to go! All of these pre-existing
pci_*() functions have a small problem already: they are dealing with
struct device, instead of struct pci_dev. And so refactoring should be
pushing the conversion to pci_dev *up* the calling stack, not lower as
the patch here proposes.

Also, there is no improvement in clarity by passing in (pgmap, dev)
instead of the previous (provider, client). Now you have to do more type
checking in the leaf function, which is another indication of a problem.

Let's go that direction, please? Just convert to pci_dev much higher in
the calling stack, and you'll find that everything fits together better.
And it's OK to pass in extra params if that turns out to be necessary,
after all.
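
Roughly what I have in mind, as a userspace sketch: the entry point is
the only place that sees the generic device and does the type check, and
the leaf takes typed pointers only. The *_model structs and function
names are invented for this example. to_pci_dev_model() mimics the
container_of() pattern behind to_pci_dev().

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for struct device and struct pci_dev. */
struct device_model {
	bool is_pci;
};

struct pci_dev_model {
	int devfn;
	struct device_model dev;	/* embedded, as in struct pci_dev */
};

/* container_of()-style conversion from the embedded device. */
#define to_pci_dev_model(d) \
	((struct pci_dev_model *)((char *)(d) - \
				  offsetof(struct pci_dev_model, dev)))

/* Leaf: works on typed pointers and needs no type checks. */
static bool same_function(const struct pci_dev_model *provider,
			  const struct pci_dev_model *client)
{
	return provider->devfn == client->devfn;
}

/*
 * Entry point: validates and converts once, near the top of the call
 * chain. Returns 0 on success or -EINVAL if @dev is not a PCI device.
 */
static int entry_check(struct pci_dev_model *provider,
		       struct device_model *dev, bool *same)
{
	if (!dev->is_pci)
		return -EINVAL;

	*same = same_function(provider, to_pci_dev_model(dev));
	return 0;
}
```

Everything below entry_check() can then assume it is dealing with PCI
devices, and the non-PCI case is rejected in exactly one place.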

thanks,
-- 
John Hubbard
NVIDIA

> 
> Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
> ---
>   drivers/pci/p2pdma.c | 29 +++++++++++------------------
>   1 file changed, 11 insertions(+), 18 deletions(-)
> 
> diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c
> index 2574a062a255..bcb1a6d6119d 100644
> --- a/drivers/pci/p2pdma.c
> +++ b/drivers/pci/p2pdma.c
> @@ -822,14 +822,21 @@ void pci_p2pmem_publish(struct pci_dev *pdev, bool publish)
>   }
>   EXPORT_SYMBOL_GPL(pci_p2pmem_publish);
>   
> -static enum pci_p2pdma_map_type pci_p2pdma_map_type(struct pci_dev *provider,
> -						    struct pci_dev *client)
> +static enum pci_p2pdma_map_type pci_p2pdma_map_type(struct dev_pagemap *pgmap,
> +						    struct device *dev)
>   {
> +	struct pci_dev *provider = to_p2p_pgmap(pgmap)->provider;
>   	enum pci_p2pdma_map_type ret;
> +	struct pci_dev *client;
>   
>   	if (!provider->p2pdma)
>   		return PCI_P2PDMA_MAP_NOT_SUPPORTED;
>   
> +	if (!dev_is_pci(dev))
> +		return PCI_P2PDMA_MAP_NOT_SUPPORTED;
> +
> +	client = to_pci_dev(dev);
> +
>   	ret = xa_to_value(xa_load(&provider->p2pdma->map_types,
>   				  map_types_idx(client)));
>   	if (ret != PCI_P2PDMA_MAP_UNKNOWN)
> @@ -871,14 +878,8 @@ int pci_p2pdma_map_sg_attrs(struct device *dev, struct scatterlist *sg,
>   {
>   	struct pci_p2pdma_pagemap *p2p_pgmap =
>   		to_p2p_pgmap(sg_page(sg)->pgmap);
> -	struct pci_dev *client;
> -
> -	if (WARN_ON_ONCE(!dev_is_pci(dev)))
> -		return 0;
>   
> -	client = to_pci_dev(dev);
> -
> -	switch (pci_p2pdma_map_type(p2p_pgmap->provider, client)) {
> +	switch (pci_p2pdma_map_type(sg_page(sg)->pgmap, dev)) {
>   	case PCI_P2PDMA_MAP_THRU_HOST_BRIDGE:
>   		return dma_map_sg_attrs(dev, sg, nents, dir, attrs);
>   	case PCI_P2PDMA_MAP_BUS_ADDR:
> @@ -901,17 +902,9 @@ EXPORT_SYMBOL_GPL(pci_p2pdma_map_sg_attrs);
>   void pci_p2pdma_unmap_sg_attrs(struct device *dev, struct scatterlist *sg,
>   		int nents, enum dma_data_direction dir, unsigned long attrs)
>   {
> -	struct pci_p2pdma_pagemap *p2p_pgmap =
> -		to_p2p_pgmap(sg_page(sg)->pgmap);
>   	enum pci_p2pdma_map_type map_type;
> -	struct pci_dev *client;
> -
> -	if (WARN_ON_ONCE(!dev_is_pci(dev)))
> -		return;
> -
> -	client = to_pci_dev(dev);
>   
> -	map_type = pci_p2pdma_map_type(p2p_pgmap->provider, client);
> +	map_type = pci_p2pdma_map_type(sg_page(sg)->pgmap, dev);
>   
>   	if (map_type == PCI_P2PDMA_MAP_THRU_HOST_BRIDGE)
>   		dma_unmap_sg_attrs(dev, sg, nents, dir, attrs);
> 


^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH 05/16] dma-mapping: Introduce dma_map_sg_p2pdma()
  2021-04-08 17:01 ` [PATCH 05/16] dma-mapping: Introduce dma_map_sg_p2pdma() Logan Gunthorpe
  2021-04-27 19:22   ` Jason Gunthorpe
  2021-04-27 19:31   ` Jason Gunthorpe
@ 2021-05-02 21:23   ` John Hubbard
  2021-05-03 16:38     ` Logan Gunthorpe
  2021-05-11 16:05   ` Don Dutile
  3 siblings, 1 reply; 99+ messages in thread
From: John Hubbard @ 2021-05-02 21:23 UTC (permalink / raw)
  To: Logan Gunthorpe, linux-kernel, linux-nvme, linux-block,
	linux-pci, linux-mm, iommu
  Cc: Stephen Bates, Christoph Hellwig, Dan Williams, Jason Gunthorpe,
	Christian König, Don Dutile, Matthew Wilcox, Daniel Vetter,
	Jakowski Andrzej, Minturn Dave B, Jason Ekstrand, Dave Hansen,
	Xiong Jianxin, Bjorn Helgaas, Ira Weiny, Robin Murphy

On 4/8/21 10:01 AM, Logan Gunthorpe wrote:
> dma_map_sg() either returns a positive number indicating the number
> of entries mapped or zero indicating that resources were not available
> to create the mapping. When zero is returned, it is always safe to retry
> the mapping later once resources have been freed.
> 
> Once P2PDMA pages are mixed into the SGL there may be pages that may
> never be successfully mapped with a given device because that device may
> not actually be able to access those pages. Thus, multiple error
> conditions will need to be distinguished to determine whether a retry
> is safe.
> 
> Introduce dma_map_sg_p2pdma[_attrs]() with a different calling
> convention from dma_map_sg(). The function will return a positive
> integer on success or a negative errno on failure.
> 
> ENOMEM will be used to indicate a resource failure and EREMOTEIO to
> indicate that a P2PDMA page is not mappable.
> 
> The __DMA_ATTR_PCI_P2PDMA attribute is introduced to inform the lower
> level implementations that P2PDMA pages are allowed and to warn if a
> caller introduces them into the regular dma_map_sg() interface.
> 
> Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
> ---
>   include/linux/dma-mapping.h | 15 +++++++++++
>   kernel/dma/mapping.c        | 52 ++++++++++++++++++++++++++++++++-----
>   2 files changed, 61 insertions(+), 6 deletions(-)
> 
> diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
> index 2a984cb4d1e0..50b8f586cf59 100644
> --- a/include/linux/dma-mapping.h
> +++ b/include/linux/dma-mapping.h
> @@ -60,6 +60,12 @@
>    * at least read-only at lesser-privileged levels).
>    */
>   #define DMA_ATTR_PRIVILEGED		(1UL << 9)
> +/*
> + * __DMA_ATTR_PCI_P2PDMA: This should not be used directly, use
> + * dma_map_sg_p2pdma() instead. Used internally to indicate that the
> + * caller is using the dma_map_sg_p2pdma() interface.
> + */
> +#define __DMA_ATTR_PCI_P2PDMA		(1UL << 10)
>

As mentioned near the top of this file,
Documentation/core-api/dma-attributes.rst also needs to be updated, for
this new item.


>   /*
>    * A dma_addr_t can hold any valid DMA or bus address for the platform.  It can
> @@ -107,6 +113,8 @@ void dma_unmap_page_attrs(struct device *dev, dma_addr_t addr, size_t size,
>   		enum dma_data_direction dir, unsigned long attrs);
>   int dma_map_sg_attrs(struct device *dev, struct scatterlist *sg, int nents,
>   		enum dma_data_direction dir, unsigned long attrs);
> +int dma_map_sg_p2pdma_attrs(struct device *dev, struct scatterlist *sg,
> +		int nents, enum dma_data_direction dir, unsigned long attrs);
>   void dma_unmap_sg_attrs(struct device *dev, struct scatterlist *sg,
>   				      int nents, enum dma_data_direction dir,
>   				      unsigned long attrs);
> @@ -160,6 +168,12 @@ static inline int dma_map_sg_attrs(struct device *dev, struct scatterlist *sg,
>   {
>   	return 0;
>   }
> +static inline int dma_map_sg_p2pdma_attrs(struct device *dev,
> +		struct scatterlist *sg, int nents, enum dma_data_direction dir,
> +		unsigned long attrs)
> +{
> +	return 0;
> +}
>   static inline void dma_unmap_sg_attrs(struct device *dev,
>   		struct scatterlist *sg, int nents, enum dma_data_direction dir,
>   		unsigned long attrs)
> @@ -392,6 +406,7 @@ static inline void dma_sync_sgtable_for_device(struct device *dev,
>   #define dma_map_single(d, a, s, r) dma_map_single_attrs(d, a, s, r, 0)
>   #define dma_unmap_single(d, a, s, r) dma_unmap_single_attrs(d, a, s, r, 0)
>   #define dma_map_sg(d, s, n, r) dma_map_sg_attrs(d, s, n, r, 0)
> +#define dma_map_sg_p2pdma(d, s, n, r) dma_map_sg_p2pdma_attrs(d, s, n, r, 0)

This hunk is fine, of course.

But, about pre-existing issues: note to self, or to anyone: send a patch to turn
these into inline functions. The macro redirection here is not adding value, but
it does make things just a little bit worse.


>   #define dma_unmap_sg(d, s, n, r) dma_unmap_sg_attrs(d, s, n, r, 0)
>   #define dma_map_page(d, p, o, s, r) dma_map_page_attrs(d, p, o, s, r, 0)
>   #define dma_unmap_page(d, a, s, r) dma_unmap_page_attrs(d, a, s, r, 0)
> diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
> index b6a633679933..923089c4267b 100644
> --- a/kernel/dma/mapping.c
> +++ b/kernel/dma/mapping.c
> @@ -177,12 +177,8 @@ void dma_unmap_page_attrs(struct device *dev, dma_addr_t addr, size_t size,
>   }
>   EXPORT_SYMBOL(dma_unmap_page_attrs);
>   
> -/*
> - * dma_maps_sg_attrs returns 0 on error and > 0 on success.
> - * It should never return a value < 0.
> - */

It would be better to leave the comment in place, given the non-standard
return values. However, looking around here, it would be better if we go
with the standard -ERRNO for error, and >0 for success.

There are pre-existing BUG_ON() and WARN_ON_ONCE() items that are partly
an attempt to compensate for not being able to return proper -ERRNO
codes. For example, this:

	    BUG_ON(!valid_dma_direction(dir));

...arguably should be more like this:

	    if (WARN_ON_ONCE(!valid_dma_direction(dir)))
		    return -EINVAL;


> -int dma_map_sg_attrs(struct device *dev, struct scatterlist *sg, int nents,
> -		enum dma_data_direction dir, unsigned long attrs)
> +static int __dma_map_sg_attrs(struct device *dev, struct scatterlist *sg,
> +		int nents, enum dma_data_direction dir, unsigned long attrs)
>   {
>   	const struct dma_map_ops *ops = get_dma_ops(dev);
>   	int ents;
> @@ -197,6 +193,20 @@ int dma_map_sg_attrs(struct device *dev, struct scatterlist *sg, int nents,
>   		ents = dma_direct_map_sg(dev, sg, nents, dir, attrs);
>   	else
>   		ents = ops->map_sg(dev, sg, nents, dir, attrs);
> +
> +	return ents;
> +}
> +
> +/*
> + * dma_maps_sg_attrs returns 0 on error and > 0 on success.
> + * It should never return a value < 0.
> + */
> +int dma_map_sg_attrs(struct device *dev, struct scatterlist *sg, int nents,
> +		enum dma_data_direction dir, unsigned long attrs)
> +{
> +	int ents;

Pre-existing note, feel free to ignore: ents and nents, used together in
the same routines, are way too close to each other in naming. Maybe
using "requested_nents", or "nents_arg", for the incoming value, would
help.

> +
> +	ents = __dma_map_sg_attrs(dev, sg, nents, dir, attrs);
>   	BUG_ON(ents < 0);
>   	debug_dma_map_sg(dev, sg, nents, ents, dir);
>   
> @@ -204,6 +214,36 @@ int dma_map_sg_attrs(struct device *dev, struct scatterlist *sg, int nents,
>   }
>   EXPORT_SYMBOL(dma_map_sg_attrs);
>   
> +/*
> + * like dma_map_sg_attrs, but returns a negative errno on error (and > 0
> + * on success). This function must be used if PCI P2PDMA pages might
> + * be in the scatterlist.

Let's turn this into a kernel doc comment block, seeing as how it clearly
wants to be--you're almost there already. You've even reinvented @Return,
below. :)

> + *
> + * On error this function may return:
> + *    -ENOMEM indicating that there was not enough resources available and
> + *      the transfer may be retried later
> + *    -EREMOTEIO indicating that P2PDMA pages were included but cannot
> + *      be mapped by the specified device, retries will always fail
> + *
> + * The scatterlist should be unmapped with the regular dma_unmap_sg[_attrs]().

How about:

"The scatterlist should be unmapped via dma_unmap_sg[_attrs]()."

> + */
> +int dma_map_sg_p2pdma_attrs(struct device *dev, struct scatterlist *sg,
> +		int nents, enum dma_data_direction dir, unsigned long attrs)
> +{
> +	int ents;
> +
> +	ents = __dma_map_sg_attrs(dev, sg, nents, dir,
> +				  attrs | __DMA_ATTR_PCI_P2PDMA);
> +	if (!ents)
> +		ents = -ENOMEM;
> +
> +	if (ents > 0)
> +		debug_dma_map_sg(dev, sg, nents, ents, dir);
> +
> +	return ents;
> +}
> +EXPORT_SYMBOL_GPL(dma_map_sg_p2pdma_attrs);
> +
>   void dma_unmap_sg_attrs(struct device *dev, struct scatterlist *sg,
>   				      int nents, enum dma_data_direction dir,
>   				      unsigned long attrs)
> 
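
To make the proposed return convention concrete, here is a minimal
userspace sketch (not kernel code) of the translation described in the
comment above: a raw 0 becomes -ENOMEM, and anything else passes through
unchanged. The model_* names are made up for illustration, and the retry
helper is only an assumption about how a caller might consume the result:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Userspace model only: mimics the return-value translation done by
 * dma_map_sg_p2pdma_attrs() in this patch. "raw" stands in for whatever
 * the lower-level map_sg implementation returned.
 */
static int model_map_sg_p2pdma_ret(int raw)
{
	if (raw == 0)
		return -ENOMEM;	/* legacy "0 means no resources" */
	return raw;		/* > 0: entries mapped, < 0: errno */
}

/* Hypothetical caller-side helper: is it worth retrying later? */
static bool model_map_retryable(int ret)
{
	assert(ret != 0);	/* the new convention never returns 0 */
	return ret == -ENOMEM;	/* -EREMOTEIO will fail again on retry */
}
```

With this shape, a caller would only requeue on -ENOMEM and would fail
the request outright on -EREMOTEIO.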

thanks,
-- 
John Hubbard
NVIDIA


* Re: [PATCH 06/16] lib/scatterlist: Add flag for indicating P2PDMA segments in an SGL
  2021-04-08 17:01 ` [PATCH 06/16] lib/scatterlist: Add flag for indicating P2PDMA segments in an SGL Logan Gunthorpe
@ 2021-05-02 22:34   ` John Hubbard
  0 siblings, 0 replies; 99+ messages in thread
From: John Hubbard @ 2021-05-02 22:34 UTC (permalink / raw)
  To: Logan Gunthorpe, linux-kernel, linux-nvme, linux-block,
	linux-pci, linux-mm, iommu
  Cc: Stephen Bates, Christoph Hellwig, Dan Williams, Jason Gunthorpe,
	Christian König, Don Dutile, Matthew Wilcox, Daniel Vetter,
	Jakowski Andrzej, Minturn Dave B, Jason Ekstrand, Dave Hansen,
	Xiong Jianxin, Bjorn Helgaas, Ira Weiny, Robin Murphy

On 4/8/21 10:01 AM, Logan Gunthorpe wrote:
> Make use of the third free LSB in scatterlist's page_link on 64bit systems.
> 
> The extra bit will be used by dma_[un]map_sg_p2pdma() to determine when a
> given SGL segment's dma_address points to a PCI bus address.
> dma_unmap_sg_p2pdma() will need to perform different cleanup when a
> segment is marked as P2PDMA.
> 
> Using this bit requires adding an additional dependency on CONFIG_64BIT to
> CONFIG_PCI_P2PDMA. This should be acceptable as the majority of P2PDMA
> use cases are restricted to newer root complexes and roughly require the
> extra address space for memory BARs used in the transactions.

Totally agree with the CONFIG_64BIT call.

Also, I have failed to find anything wrong with this patch. :)
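
For anyone following along, here is a small userspace sketch of the
low-bit stealing that the patch extends. It is a model, not the kernel
code: the MODEL_* and model_* names are invented here, and it assumes the
stored pointer is at least 8-byte aligned so that three low bits are free:

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_SG_CHAIN		0x01UL
#define MODEL_SG_END		0x02UL
#define MODEL_SG_PCI_P2PDMA	0x04UL
#define MODEL_SG_PAGE_LINK_MASK \
	(MODEL_SG_CHAIN | MODEL_SG_END | MODEL_SG_PCI_P2PDMA)

struct model_sg {
	uintptr_t page_link;	/* pointer bits plus flag bits */
};

/* Store a pointer while preserving whatever flag bits are already set. */
static void model_sg_assign_page(struct model_sg *sg, void *page)
{
	uintptr_t flags = sg->page_link & MODEL_SG_PAGE_LINK_MASK;

	/* The pointer must leave all three low bits clear. */
	assert(((uintptr_t)page & MODEL_SG_PAGE_LINK_MASK) == 0);
	sg->page_link = flags | (uintptr_t)page;
}

/* Recover the pointer by masking off every flag bit. */
static void *model_sg_page(const struct model_sg *sg)
{
	return (void *)(sg->page_link & ~(uintptr_t)MODEL_SG_PAGE_LINK_MASK);
}

static void model_sg_mark_p2pdma(struct model_sg *sg)
{
	sg->page_link |= MODEL_SG_PCI_P2PDMA;
}

static void model_sg_unmark_p2pdma(struct model_sg *sg)
{
	sg->page_link &= ~(uintptr_t)MODEL_SG_PCI_P2PDMA;
}

static int model_sg_is_p2pdma(const struct model_sg *sg)
{
	return (sg->page_link & MODEL_SG_PCI_P2PDMA) != 0;
}
```

The point of the shared mask is that every place that strips flag bits
keeps working when a third flag is added.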

Reviewed-by: John Hubbard <jhubbard@nvidia.com>

thanks,
-- 
John Hubbard
NVIDIA

> 
> Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
> ---
>   drivers/pci/Kconfig         |  2 +-
>   include/linux/scatterlist.h | 49 ++++++++++++++++++++++++++++++++++---
>   2 files changed, 46 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/pci/Kconfig b/drivers/pci/Kconfig
> index 0c473d75e625..90b4bddb3300 100644
> --- a/drivers/pci/Kconfig
> +++ b/drivers/pci/Kconfig
> @@ -163,7 +163,7 @@ config PCI_PASID
>   
>   config PCI_P2PDMA
>   	bool "PCI peer-to-peer transfer support"
> -	depends on ZONE_DEVICE
> +	depends on ZONE_DEVICE && 64BIT
>   	select GENERIC_ALLOCATOR
>   	help
>   	  Enables drivers to do PCI peer-to-peer transactions to and from
> diff --git a/include/linux/scatterlist.h b/include/linux/scatterlist.h
> index 6f70572b2938..5525d3ebf36f 100644
> --- a/include/linux/scatterlist.h
> +++ b/include/linux/scatterlist.h
> @@ -58,6 +58,21 @@ struct sg_table {
>   #define SG_CHAIN	0x01UL
>   #define SG_END		0x02UL
>   
> +/*
> + * bit 2 is the third free bit in the page_link on 64bit systems which
> + * is used by dma_unmap_sg() to determine if the dma_address is a PCI
> + * bus address when doing P2PDMA.
> + * Note: CONFIG_PCI_P2PDMA depends on CONFIG_64BIT because of this.
> + */
> +
> +#ifdef CONFIG_PCI_P2PDMA
> +#define SG_PCI_P2PDMA	0x04UL
> +#else
> +#define SG_PCI_P2PDMA	0x00UL
> +#endif
> +
> +#define SG_PAGE_LINK_MASK (SG_CHAIN | SG_END | SG_PCI_P2PDMA)
> +
>   /*
>    * We overload the LSB of the page pointer to indicate whether it's
>    * a valid sg entry, or whether it points to the start of a new scatterlist.
> @@ -65,8 +80,9 @@ struct sg_table {
>    */
>   #define sg_is_chain(sg)		((sg)->page_link & SG_CHAIN)
>   #define sg_is_last(sg)		((sg)->page_link & SG_END)
> +#define sg_is_pci_p2pdma(sg)	((sg)->page_link & SG_PCI_P2PDMA)
>   #define sg_chain_ptr(sg)	\
> -	((struct scatterlist *) ((sg)->page_link & ~(SG_CHAIN | SG_END)))
> +	((struct scatterlist *) ((sg)->page_link & ~SG_PAGE_LINK_MASK))
>   
>   /**
>    * sg_assign_page - Assign a given page to an SG entry
> @@ -80,13 +96,13 @@ struct sg_table {
>    **/
>   static inline void sg_assign_page(struct scatterlist *sg, struct page *page)
>   {
> -	unsigned long page_link = sg->page_link & (SG_CHAIN | SG_END);
> +	unsigned long page_link = sg->page_link & SG_PAGE_LINK_MASK;
>   
>   	/*
>   	 * In order for the low bit stealing approach to work, pages
>   	 * must be aligned at a 32-bit boundary as a minimum.
>   	 */
> -	BUG_ON((unsigned long) page & (SG_CHAIN | SG_END));
> +	BUG_ON((unsigned long) page & SG_PAGE_LINK_MASK);
>   #ifdef CONFIG_DEBUG_SG
>   	BUG_ON(sg_is_chain(sg));
>   #endif
> @@ -120,7 +136,7 @@ static inline struct page *sg_page(struct scatterlist *sg)
>   #ifdef CONFIG_DEBUG_SG
>   	BUG_ON(sg_is_chain(sg));
>   #endif
> -	return (struct page *)((sg)->page_link & ~(SG_CHAIN | SG_END));
> +	return (struct page *)((sg)->page_link & ~SG_PAGE_LINK_MASK);
>   }
>   
>   /**
> @@ -222,6 +238,31 @@ static inline void sg_unmark_end(struct scatterlist *sg)
>   	sg->page_link &= ~SG_END;
>   }
>   
> +/**
> + * sg_mark_pci_p2pdma - Mark the scatterlist entry for PCI p2pdma
> + * @sg:		 SG entryScatterlist
> + *
> + * Description:
> + *   Marks the passed in sg entry to indicate that the dma_address is
> + *   a PCI bus address.
> + **/
> +static inline void sg_mark_pci_p2pdma(struct scatterlist *sg)
> +{
> +	sg->page_link |= SG_PCI_P2PDMA;
> +}
> +
> +/**
> + * sg_unmark_pci_p2pdma - Unmark the scatterlist entry for PCI p2pdma
> + * @sg:		 SG entryScatterlist
> + *
> + * Description:
> + *   Clears the PCI P2PDMA mark
> + **/
> +static inline void sg_unmark_pci_p2pdma(struct scatterlist *sg)
> +{
> +	sg->page_link &= ~SG_PCI_P2PDMA;
> +}
> +
>   /**
>    * sg_phys - Return physical address of an sg entry
>    * @sg:	     SG entry
> 


* Re: [PATCH 07/16] PCI/P2PDMA: Make pci_p2pdma_map_type() non-static
  2021-04-08 17:01 ` [PATCH 07/16] PCI/P2PDMA: Make pci_p2pdma_map_type() non-static Logan Gunthorpe
@ 2021-05-02 22:44   ` John Hubbard
  2021-05-03 16:39     ` Logan Gunthorpe
  2021-05-11 16:06   ` Don Dutile
  1 sibling, 1 reply; 99+ messages in thread
From: John Hubbard @ 2021-05-02 22:44 UTC (permalink / raw)
  To: Logan Gunthorpe, linux-kernel, linux-nvme, linux-block,
	linux-pci, linux-mm, iommu
  Cc: Stephen Bates, Christoph Hellwig, Dan Williams, Jason Gunthorpe,
	Christian König, Don Dutile, Matthew Wilcox, Daniel Vetter,
	Jakowski Andrzej, Minturn Dave B, Jason Ekstrand, Dave Hansen,
	Xiong Jianxin, Bjorn Helgaas, Ira Weiny, Robin Murphy

On 4/8/21 10:01 AM, Logan Gunthorpe wrote:
> pci_p2pdma_map_type() will be needed by the dma-iommu map_sg
> implementation because it will need to determine the mapping type
> ahead of actually doing the mapping to create the actual iommu mapping.
> 
> Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
> ---
>   drivers/pci/p2pdma.c       | 34 +++++++++++++++++++++++-----------
>   include/linux/pci-p2pdma.h | 15 +++++++++++++++
>   2 files changed, 38 insertions(+), 11 deletions(-)
> 
> diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c
> index bcb1a6d6119d..38c93f57a941 100644
> --- a/drivers/pci/p2pdma.c
> +++ b/drivers/pci/p2pdma.c
> @@ -20,13 +20,6 @@
>   #include <linux/seq_buf.h>
>   #include <linux/xarray.h>
>   
> -enum pci_p2pdma_map_type {
> -	PCI_P2PDMA_MAP_UNKNOWN = 0,
> -	PCI_P2PDMA_MAP_NOT_SUPPORTED,
> -	PCI_P2PDMA_MAP_BUS_ADDR,
> -	PCI_P2PDMA_MAP_THRU_HOST_BRIDGE,
> -};
> -
>   struct pci_p2pdma {
>   	struct gen_pool *pool;
>   	bool p2pmem_published;
> @@ -822,13 +815,30 @@ void pci_p2pmem_publish(struct pci_dev *pdev, bool publish)
>   }
>   EXPORT_SYMBOL_GPL(pci_p2pmem_publish);
>   
> -static enum pci_p2pdma_map_type pci_p2pdma_map_type(struct dev_pagemap *pgmap,
> -						    struct device *dev)
> +/**
> + * pci_p2pdma_map_type - return the type of mapping that should be used for
> + *	a given device and pgmap
> + * @pgmap: the pagemap of a page to determine the mapping type for
> + * @dev: device that is mapping the page
> + * @dma_attrs: the attributes passed to the dma_map operation --
> + *	this is so they can be checked to ensure P2PDMA pages were not
> + *	introduced into an incorrect interface (like dma_map_sg). *
> + *
> + * Returns one of:
> + *	PCI_P2PDMA_MAP_NOT_SUPPORTED - The mapping should not be done
> + *	PCI_P2PDMA_MAP_BUS_ADDR - The mapping should use the PCI bus address
> + *	PCI_P2PDMA_MAP_THRU_HOST_BRIDGE - The mapping should be done directly
> + */
> +enum pci_p2pdma_map_type pci_p2pdma_map_type(struct dev_pagemap *pgmap,
> +		struct device *dev, unsigned long dma_attrs)
>   {
>   	struct pci_dev *provider = to_p2p_pgmap(pgmap)->provider;
>   	enum pci_p2pdma_map_type ret;
>   	struct pci_dev *client;
>   
> +	WARN_ONCE(!(dma_attrs & __DMA_ATTR_PCI_P2PDMA),
> +		  "PCI P2PDMA pages were mapped with dma_map_sg!");

This really ought to also return -EINVAL, assuming that my review suggestions
about return types, in earlier patches, are acceptable.

> +
>   	if (!provider->p2pdma)
>   		return PCI_P2PDMA_MAP_NOT_SUPPORTED;
>   
> @@ -879,7 +889,8 @@ int pci_p2pdma_map_sg_attrs(struct device *dev, struct scatterlist *sg,
>   	struct pci_p2pdma_pagemap *p2p_pgmap =
>   		to_p2p_pgmap(sg_page(sg)->pgmap);
>   
> -	switch (pci_p2pdma_map_type(sg_page(sg)->pgmap, dev)) {
> +	switch (pci_p2pdma_map_type(sg_page(sg)->pgmap, dev,
> +				    __DMA_ATTR_PCI_P2PDMA)) {
>   	case PCI_P2PDMA_MAP_THRU_HOST_BRIDGE:
>   		return dma_map_sg_attrs(dev, sg, nents, dir, attrs);
>   	case PCI_P2PDMA_MAP_BUS_ADDR:
> @@ -904,7 +915,8 @@ void pci_p2pdma_unmap_sg_attrs(struct device *dev, struct scatterlist *sg,
>   {
>   	enum pci_p2pdma_map_type map_type;
>   
> -	map_type = pci_p2pdma_map_type(sg_page(sg)->pgmap, dev);
> +	map_type = pci_p2pdma_map_type(sg_page(sg)->pgmap, dev,
> +				       __DMA_ATTR_PCI_P2PDMA);

These areas might end up looking a bit different, if my suggestion about
applying pci_dev type safety throughout is accepted.

The patch looks generally correct, aside from these details.

thanks,
-- 
John Hubbard
NVIDIA

>   
>   	if (map_type == PCI_P2PDMA_MAP_THRU_HOST_BRIDGE)
>   		dma_unmap_sg_attrs(dev, sg, nents, dir, attrs);
> diff --git a/include/linux/pci-p2pdma.h b/include/linux/pci-p2pdma.h
> index 8318a97c9c61..a06072ac3a52 100644
> --- a/include/linux/pci-p2pdma.h
> +++ b/include/linux/pci-p2pdma.h
> @@ -16,6 +16,13 @@
>   struct block_device;
>   struct scatterlist;
>   
> +enum pci_p2pdma_map_type {
> +	PCI_P2PDMA_MAP_UNKNOWN = 0,
> +	PCI_P2PDMA_MAP_NOT_SUPPORTED,
> +	PCI_P2PDMA_MAP_BUS_ADDR,
> +	PCI_P2PDMA_MAP_THRU_HOST_BRIDGE,
> +};
> +
>   #ifdef CONFIG_PCI_P2PDMA
>   int pci_p2pdma_add_resource(struct pci_dev *pdev, int bar, size_t size,
>   		u64 offset);
> @@ -30,6 +37,8 @@ struct scatterlist *pci_p2pmem_alloc_sgl(struct pci_dev *pdev,
>   					 unsigned int *nents, u32 length);
>   void pci_p2pmem_free_sgl(struct pci_dev *pdev, struct scatterlist *sgl);
>   void pci_p2pmem_publish(struct pci_dev *pdev, bool publish);
> +enum pci_p2pdma_map_type pci_p2pdma_map_type(struct dev_pagemap *pgmap,
> +		struct device *dev, unsigned long dma_attrs);
>   int pci_p2pdma_map_sg_attrs(struct device *dev, struct scatterlist *sg,
>   		int nents, enum dma_data_direction dir, unsigned long attrs);
>   void pci_p2pdma_unmap_sg_attrs(struct device *dev, struct scatterlist *sg,
> @@ -83,6 +92,12 @@ static inline void pci_p2pmem_free_sgl(struct pci_dev *pdev,
>   static inline void pci_p2pmem_publish(struct pci_dev *pdev, bool publish)
>   {
>   }
> +static inline enum pci_p2pdma_map_type pci_p2pdma_map_type(
> +		struct dev_pagemap *pgmap, struct device *dev,
> +		unsigned long dma_attrs)
> +{
> +	return PCI_P2PDMA_MAP_NOT_SUPPORTED;
> +}
>   static inline int pci_p2pdma_map_sg_attrs(struct device *dev,
>   		struct scatterlist *sg, int nents, enum dma_data_direction dir,
>   		unsigned long attrs)
> 



* Re: [PATCH 08/16] PCI/P2PDMA: Introduce helpers for dma_map_sg implementations
  2021-04-08 17:01 ` [PATCH 08/16] PCI/P2PDMA: Introduce helpers for dma_map_sg implementations Logan Gunthorpe
@ 2021-05-02 22:52   ` John Hubbard
  2021-05-03  0:50   ` John Hubbard
  1 sibling, 0 replies; 99+ messages in thread
From: John Hubbard @ 2021-05-02 22:52 UTC (permalink / raw)
  To: Logan Gunthorpe, linux-kernel, linux-nvme, linux-block,
	linux-pci, linux-mm, iommu
  Cc: Stephen Bates, Christoph Hellwig, Dan Williams, Jason Gunthorpe,
	Christian König, Don Dutile, Matthew Wilcox, Daniel Vetter,
	Jakowski Andrzej, Minturn Dave B, Jason Ekstrand, Dave Hansen,
	Xiong Jianxin, Bjorn Helgaas, Ira Weiny, Robin Murphy

On 4/8/21 10:01 AM, Logan Gunthorpe wrote:
> Add pci_p2pdma_map_segment() as a helper for simple dma_map_sg()
> implementations. It takes a scatterlist segment that must point to a
> pci_p2pdma struct page and will map it if the mapping requires a bus
> address.
> 
> The return value indicates whether the mapping required a bus address
> or whether the caller still needs to map the segment normally. If the
> segment should not be mapped, -EREMOTEIO is returned.
> 
> This helper uses a state structure to track the changes to the
> pgmap across calls and avoid needing to lookup into the xarray for
> every page.
> 
> Also add pci_p2pdma_map_bus_segment() which is useful for IOMMU
> dma_map_sg() implementations where the sg segment containing the page
> differs from the sg segment containing the DMA address.
> 

Hard to properly review this patch by itself, because it doesn't show
any callers of the new routine. If you end up shuffling patches and/or
refactoring for other reasons, it would be nice if the next version of
the series included a caller here. In particular, the new
pci_p2pdma_map_state concept is something I want to double-check, to
see if it hits any common pitfalls. I'm sure it doesn't, but still. :)

Meanwhile, I'll keep working through the series, and come back to this
one when I have seen the callers.
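
In the meantime, here is how I am reading the state caching, as a
userspace sketch. This is only a model under my own assumptions: the
model_* names are invented, the lookups counter is instrumentation that is
not in the patch, and model_lookup() merely stands in for
pci_p2pdma_map_type():

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

enum model_map_type {
	MODEL_MAP_UNKNOWN = 0,
	MODEL_MAP_NOT_SUPPORTED,
	MODEL_MAP_BUS_ADDR,
	MODEL_MAP_THRU_HOST_BRIDGE,
};

struct model_pgmap {
	enum model_map_type type;	/* what a real lookup would compute */
};

struct model_map_state {
	const struct model_pgmap *pgmap;
	enum model_map_type map;
	unsigned int lookups;		/* instrumentation, not in the patch */
};

/* Stand-in for pci_p2pdma_map_type(); counts how often it is called. */
static enum model_map_type model_lookup(struct model_map_state *state,
					const struct model_pgmap *pgmap)
{
	state->lookups++;
	return pgmap->type;
}

/*
 * Mirrors the documented return values: 1 if the segment was mapped with
 * a bus address, 0 if the caller still has to map it normally, and
 * -EREMOTEIO if it must not be mapped at all.
 */
static int model_map_segment(struct model_map_state *state,
			     const struct model_pgmap *pgmap)
{
	assert(pgmap != NULL);

	/* Only redo the lookup when the pgmap changes between segments. */
	if (state->pgmap != pgmap) {
		state->pgmap = pgmap;
		state->map = model_lookup(state, pgmap);
	}

	switch (state->map) {
	case MODEL_MAP_BUS_ADDR:
		return 1;
	case MODEL_MAP_THRU_HOST_BRIDGE:
		return 0;
	default:
		return -EREMOTEIO;
	}
}
```

If consecutive segments share a pgmap, the lookup runs once for the whole
run, which I assume is the case the state structure is optimizing for.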

thanks,
-- 
John Hubbard
NVIDIA

> Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
> ---
>   drivers/pci/p2pdma.c       | 65 ++++++++++++++++++++++++++++++++++++++
>   include/linux/pci-p2pdma.h | 21 ++++++++++++
>   2 files changed, 86 insertions(+)
> 
> diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c
> index 38c93f57a941..44ad7664e875 100644
> --- a/drivers/pci/p2pdma.c
> +++ b/drivers/pci/p2pdma.c
> @@ -923,6 +923,71 @@ void pci_p2pdma_unmap_sg_attrs(struct device *dev, struct scatterlist *sg,
>   }
>   EXPORT_SYMBOL_GPL(pci_p2pdma_unmap_sg_attrs);
>   
> +/**
> + * pci_p2pdma_map_segment - map an sg segment determining the mapping type
> + * @state: State structure that should be declared on the stack outside of
> + *	the for_each_sg() loop and initialized to zero.
> + * @dev: DMA device that's doing the mapping operation
> + * @sg: scatterlist segment to map
> + * @attrs: dma mapping attributes
> + *
> + * This is a helper to be used by non-iommu dma_map_sg() implementations where
> + * the sg segment is the same for the page_link and the dma_address.
> + *
> + * Attempt to map a single segment in an SGL with the PCI bus address.
> + * The segment must point to a PCI P2PDMA page and thus must be
> + * wrapped in a is_pci_p2pdma_page(sg_page(sg)) check.
> + *
> + * Returns 1 if the segment was mapped, 0 if the segment should be mapped
> + * directly (or through the IOMMU) and -EREMOTEIO if the segment should not
> + * be mapped at all.
> + */
> +int pci_p2pdma_map_segment(struct pci_p2pdma_map_state *state,
> +			   struct device *dev, struct scatterlist *sg,
> +			   unsigned long dma_attrs)
> +{
> +	if (state->pgmap != sg_page(sg)->pgmap) {
> +		state->pgmap = sg_page(sg)->pgmap;
> +		state->map = pci_p2pdma_map_type(state->pgmap, dev, dma_attrs);
> +		state->bus_off = to_p2p_pgmap(state->pgmap)->bus_offset;
> +	}
> +
> +	switch (state->map) {
> +	case PCI_P2PDMA_MAP_BUS_ADDR:
> +		sg->dma_address = sg_phys(sg) + state->bus_off;
> +		sg_dma_len(sg) = sg->length;
> +		sg_mark_pci_p2pdma(sg);
> +		return 1;
> +	case PCI_P2PDMA_MAP_THRU_HOST_BRIDGE:
> +		return 0;
> +	default:
> +		return -EREMOTEIO;
> +	}
> +}
> +
> +/**
> + * pci_p2pdma_map_bus_segment - map an sg segment pre determined to
> + *	be mapped with PCI_P2PDMA_MAP_BUS_ADDR
> + * @pg_sg: scatterlist segment with the page to map
> + * @dma_sg: scatterlist segment to assign a dma address to
> + *
> + * This is a helper for iommu dma_map_sg() implementations when the
> + * segment for the dma address differs from the segment containing the
> + * source page.
> + *
> + * pci_p2pdma_map_type() must have already been called on the pg_sg and
> + * returned PCI_P2PDMA_MAP_BUS_ADDR.
> + */
> +void pci_p2pdma_map_bus_segment(struct scatterlist *pg_sg,
> +				struct scatterlist *dma_sg)
> +{
> +	struct pci_p2pdma_pagemap *pgmap = to_p2p_pgmap(sg_page(pg_sg)->pgmap);
> +
> +	dma_sg->dma_address = sg_phys(pg_sg) + pgmap->bus_offset;
> +	sg_dma_len(dma_sg) = pg_sg->length;
> +	sg_mark_pci_p2pdma(dma_sg);
> +}
> +
>   /**
>    * pci_p2pdma_enable_store - parse a configfs/sysfs attribute store
>    *		to enable p2pdma
> diff --git a/include/linux/pci-p2pdma.h b/include/linux/pci-p2pdma.h
> index a06072ac3a52..49e7679403cf 100644
> --- a/include/linux/pci-p2pdma.h
> +++ b/include/linux/pci-p2pdma.h
> @@ -13,6 +13,12 @@
>   
>   #include <linux/pci.h>
>   
> +struct pci_p2pdma_map_state {
> +	struct dev_pagemap *pgmap;
> +	int map;
> +	u64 bus_off;
> +};
> +
>   struct block_device;
>   struct scatterlist;
>   
> @@ -43,6 +49,11 @@ int pci_p2pdma_map_sg_attrs(struct device *dev, struct scatterlist *sg,
>   		int nents, enum dma_data_direction dir, unsigned long attrs);
>   void pci_p2pdma_unmap_sg_attrs(struct device *dev, struct scatterlist *sg,
>   		int nents, enum dma_data_direction dir, unsigned long attrs);
> +int pci_p2pdma_map_segment(struct pci_p2pdma_map_state *state,
> +		struct device *dev, struct scatterlist *sg,
> +		unsigned long dma_attrs);
> +void pci_p2pdma_map_bus_segment(struct scatterlist *pg_sg,
> +				struct scatterlist *dma_sg);
>   int pci_p2pdma_enable_store(const char *page, struct pci_dev **p2p_dev,
>   			    bool *use_p2pdma);
>   ssize_t pci_p2pdma_enable_show(char *page, struct pci_dev *p2p_dev,
> @@ -109,6 +120,16 @@ static inline void pci_p2pdma_unmap_sg_attrs(struct device *dev,
>   		unsigned long attrs)
>   {
>   }
> +static inline int pci_p2pdma_map_segment(struct pci_p2pdma_map_state *state,
> +		struct device *dev, struct scatterlist *sg,
> +		unsigned long dma_attrs)
> +{
> +	return 0;
> +}
> +static inline void pci_p2pdma_map_bus_segment(struct scatterlist *pg_sg,
> +					      struct scatterlist *dma_sg)
> +{
> +}
>   static inline int pci_p2pdma_enable_store(const char *page,
>   		struct pci_dev **p2p_dev, bool *use_p2pdma)
>   {
> 



* Re: [PATCH 09/16] dma-direct: Support PCI P2PDMA pages in dma-direct map_sg
  2021-04-08 17:01 ` [PATCH 09/16] dma-direct: Support PCI P2PDMA pages in dma-direct map_sg Logan Gunthorpe
  2021-04-27 19:33   ` Jason Gunthorpe
@ 2021-05-02 23:28   ` John Hubbard
  2021-05-02 23:32     ` John Hubbard
                       ` (2 more replies)
  1 sibling, 3 replies; 99+ messages in thread
From: John Hubbard @ 2021-05-02 23:28 UTC (permalink / raw)
  To: Logan Gunthorpe, linux-kernel, linux-nvme, linux-block,
	linux-pci, linux-mm, iommu
  Cc: Stephen Bates, Christoph Hellwig, Dan Williams, Jason Gunthorpe,
	Christian König, Don Dutile, Matthew Wilcox, Daniel Vetter,
	Jakowski Andrzej, Minturn Dave B, Jason Ekstrand, Dave Hansen,
	Xiong Jianxin, Bjorn Helgaas, Ira Weiny, Robin Murphy

On 4/8/21 10:01 AM, Logan Gunthorpe wrote:
> Add PCI P2PDMA support for dma_direct_map_sg() so that it can map
> PCI P2PDMA pages directly without a hack in the callers. This allows
> for heterogeneous SGLs that contain both P2PDMA and regular pages.
> 
> SGL segments that contain PCI bus addresses are marked with
> sg_mark_pci_p2pdma() and are ignored when unmapped.
> 
> Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
> ---
>   kernel/dma/direct.c | 25 ++++++++++++++++++++++---
>   1 file changed, 22 insertions(+), 3 deletions(-)
> 
> diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
> index 002268262c9a..108dfb4ecbd5 100644
> --- a/kernel/dma/direct.c
> +++ b/kernel/dma/direct.c
> @@ -13,6 +13,7 @@
>   #include <linux/vmalloc.h>
>   #include <linux/set_memory.h>
>   #include <linux/slab.h>
> +#include <linux/pci-p2pdma.h>
>   #include "direct.h"
>   
>   /*
> @@ -387,19 +388,37 @@ void dma_direct_unmap_sg(struct device *dev, struct scatterlist *sgl,

This routine now deserves a little bit of commenting, now that it is
doing less obvious things. How about something like this:

/*
  * Unmaps pages, except for PCI_P2PDMA pages, which were never mapped in the
  * first place. Instead of unmapping PCI_P2PDMA entries, simply remove the
  * SG_PCI_P2PDMA mark
  */
void dma_direct_unmap_sg(struct device *dev, struct scatterlist *sgl,
		int nents, enum dma_data_direction dir, unsigned long attrs)
{


>   	struct scatterlist *sg;
>   	int i;
>   
> -	for_each_sg(sgl, sg, nents, i)
> +	for_each_sg(sgl, sg, nents, i) {
> +		if (sg_is_pci_p2pdma(sg)) {
> +			sg_unmark_pci_p2pdma(sg);
> +			continue;
> +		}
> +
>   		dma_direct_unmap_page(dev, sg->dma_address, sg_dma_len(sg), dir,
>   			     attrs);
> +	}

The same thing can be achieved with fewer lines and a bit more clarity.
Can we please do it like this instead:

	for_each_sg(sgl, sg, nents, i) {
		if (sg_is_pci_p2pdma(sg))
			sg_unmark_pci_p2pdma(sg);
		else
			dma_direct_unmap_page(dev, sg->dma_address,
					      sg_dma_len(sg), dir, attrs);
	}
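
To illustrate the intended behavior of that loop outside the kernel, here
is a tiny userspace model. The model_* names and the two bool fields are
assumptions made for the sketch: they stand in for the SG_PCI_P2PDMA bit
and for a live dma_direct mapping, respectively:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_seg {
	bool p2pdma_marked;	/* models the SG_PCI_P2PDMA bit */
	bool mapped;		/* models a live dma_direct mapping */
};

/* Returns how many segments were actually unmapped. */
static size_t model_unmap_sg(struct model_seg *segs, size_t nents)
{
	size_t i, unmapped = 0;

	for (i = 0; i < nents; i++) {
		if (segs[i].p2pdma_marked) {
			/* Bus-address segment: just clear the mark. */
			segs[i].p2pdma_marked = false;
		} else {
			/* Regular segment: tear down its mapping. */
			assert(segs[i].mapped);
			segs[i].mapped = false;
			unmapped++;
		}
	}
	return unmapped;
}
```

Each segment takes exactly one of the two paths, which is what the
if/else form makes obvious.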


>   }
>   #endif
>   

Also here, a block comment for the function would be nice. How about
approximately this:

/*
  * Maps each SG segment. Returns the number of entries mapped, or 0 upon
  * failure. If any entry could not be mapped, then no entries are mapped.
  */

I'll stop complaining about the pre-existing return code conventions,
since by now you know what I was thinking of saying. :)

>   int dma_direct_map_sg(struct device *dev, struct scatterlist *sgl, int nents,
>   		enum dma_data_direction dir, unsigned long attrs)
>   {
> -	int i;
> +	struct pci_p2pdma_map_state p2pdma_state = {};

Is it worth putting this stuff on the stack--is there a noticeable
performance improvement from caching the state? Because if it's
invisible, then simplicity is better. I suspect you're right, and that
it *is* worth it, but it's good to know for real.

>   	struct scatterlist *sg;
> +	int i, ret = 0;
>   
>   	for_each_sg(sgl, sg, nents, i) {
> +		if (is_pci_p2pdma_page(sg_page(sg))) {
> +			ret = pci_p2pdma_map_segment(&p2pdma_state, dev, sg,
> +						     attrs);
> +			if (ret < 0) {
> +				goto out_unmap;
> +			} else if (ret) {
> +				ret = 0;
> +				continue;

Is this a bug? If neither of those "if" branches fires (ret == 0), then
the code (probably unintentionally) falls through and continues on to
attempt to call dma_direct_map_page()--despite being a PCI_P2PDMA page!

See below for suggestions:

> +			}
> +		}
> +
>   		sg->dma_address = dma_direct_map_page(dev, sg_page(sg),
>   				sg->offset, sg->length, dir, attrs);
>   		if (sg->dma_address == DMA_MAPPING_ERROR)

This is another case in which "continue" is misleading and not as good
as "else". Because unless I'm wrong above, you really only want to take
one path *or* the other.

Also, the "else if (ret)" can be simplified to just setting ret = 0
unconditionally.

Given all that, here's a suggested alternative, which is both shorter
and clearer, IMHO:

	for_each_sg(sgl, sg, nents, i) {
		if (is_pci_p2pdma_page(sg_page(sg))) {
			ret = pci_p2pdma_map_segment(&p2pdma_state, dev, sg,
						     attrs);
			if (ret < 0)
				goto out_unmap;
			else
				ret = 0;
		} else {
			sg->dma_address = dma_direct_map_page(dev, sg_page(sg),
					sg->offset, sg->length, dir, attrs);
			if (sg->dma_address == DMA_MAPPING_ERROR)
				goto out_unmap;
			sg_dma_len(sg) = sg->length;
		}
	}

thanks,
-- 
John Hubbard
NVIDIA

> @@ -411,7 +430,7 @@ int dma_direct_map_sg(struct device *dev, struct scatterlist *sgl, int nents,
>   
>   out_unmap:
>   	dma_direct_unmap_sg(dev, sgl, i, dir, attrs | DMA_ATTR_SKIP_CPU_SYNC);
> -	return 0;
> +	return ret;
>   }
>   
>   dma_addr_t dma_direct_map_resource(struct device *dev, phys_addr_t paddr,
> 




* Re: [PATCH 09/16] dma-direct: Support PCI P2PDMA pages in dma-direct map_sg
  2021-05-02 23:28   ` John Hubbard
@ 2021-05-02 23:32     ` John Hubbard
  2021-05-03 17:06       ` Logan Gunthorpe
  2021-05-03 16:55     ` Logan Gunthorpe
  2021-05-03 17:04     ` Logan Gunthorpe
  2 siblings, 1 reply; 99+ messages in thread
From: John Hubbard @ 2021-05-02 23:32 UTC (permalink / raw)
  To: Logan Gunthorpe, linux-kernel, linux-nvme, linux-block,
	linux-pci, linux-mm, iommu
  Cc: Stephen Bates, Christoph Hellwig, Dan Williams, Jason Gunthorpe,
	Christian König, Don Dutile, Matthew Wilcox, Daniel Vetter,
	Jakowski Andrzej, Minturn Dave B, Jason Ekstrand, Dave Hansen,
	Xiong Jianxin, Bjorn Helgaas, Ira Weiny, Robin Murphy

On 5/2/21 4:28 PM, John Hubbard wrote:
> On 4/8/21 10:01 AM, Logan Gunthorpe wrote:
...
>> @@ -387,19 +388,37 @@ void dma_direct_unmap_sg(struct device *dev, struct scatterlist *sgl,
> 
> This routine now deserves a little bit of commenting, now that it is
> doing less obvious things. How about something like this:
> 
> /*
> * Unmaps pages, except for PCI_P2PDMA pages, which were never mapped in the
> * first place. Instead of unmapping PCI_P2PDMA entries, simply remove the
> * SG_PCI_P2PDMA mark
> */

I got that kind of wrong. They *were* mapped, but need to be left mostly
alone...maybe you can word it better. Here's my second draft:

/*
  * Unmaps pages, except for PCI_P2PDMA pages, which should not be unmapped at
  * this point. Instead of unmapping PCI_P2PDMA entries, simply remove the
  * SG_PCI_P2PDMA mark.
  */

...am I getting close? :)
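
To make the intent of that comment concrete, here is a userspace model
(hypothetical struct and function names, not the kernel scatterlist API) of
an unmap loop that leaves marked entries alone apart from clearing the mark:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a scatterlist entry; not the kernel struct. */
struct model_seg {
	bool p2pdma_mark;	/* models the SG_PCI_P2PDMA mark */
	bool mapped;		/* models a live DMA mapping that needs unmapping */
};

/* Returns how many entries were actually unmapped. */
static int model_unmap_sg(struct model_seg *segs, int nents)
{
	int i, unmapped = 0;

	for (i = 0; i < nents; i++) {
		if (segs[i].p2pdma_mark) {
			/* Bus-address entry: nothing to unmap, only clear the mark. */
			segs[i].p2pdma_mark = false;
		} else if (segs[i].mapped) {
			segs[i].mapped = false;
			unmapped++;
		}
	}
	return unmapped;
}
```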

thanks,
-- 
John Hubbard
NVIDIA

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH 10/16] dma-mapping: Add flags to dma_map_ops to indicate PCI P2PDMA support
  2021-04-08 17:01 ` [PATCH 10/16] dma-mapping: Add flags to dma_map_ops to indicate PCI P2PDMA support Logan Gunthorpe
@ 2021-05-03  0:32   ` John Hubbard
  2021-05-03 17:09     ` Logan Gunthorpe
  2021-05-11 16:06   ` Don Dutile
  1 sibling, 1 reply; 99+ messages in thread
From: John Hubbard @ 2021-05-03  0:32 UTC (permalink / raw)
  To: Logan Gunthorpe, linux-kernel, linux-nvme, linux-block,
	linux-pci, linux-mm, iommu
  Cc: Stephen Bates, Christoph Hellwig, Dan Williams, Jason Gunthorpe,
	Christian König, Don Dutile, Matthew Wilcox, Daniel Vetter,
	Jakowski Andrzej, Minturn Dave B, Jason Ekstrand, Dave Hansen,
	Xiong Jianxin, Bjorn Helgaas, Ira Weiny, Robin Murphy

On 4/8/21 10:01 AM, Logan Gunthorpe wrote:
> Add a flags member to the dma_map_ops structure with one flag to
> indicate support for PCI P2PDMA.
> 
> Also, add a helper to check if a device supports PCI P2PDMA.
> 
> Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
> ---
>   include/linux/dma-map-ops.h |  3 +++
>   include/linux/dma-mapping.h |  5 +++++
>   kernel/dma/mapping.c        | 18 ++++++++++++++++++
>   3 files changed, 26 insertions(+)
> 
> diff --git a/include/linux/dma-map-ops.h b/include/linux/dma-map-ops.h
> index 51872e736e7b..481892822104 100644
> --- a/include/linux/dma-map-ops.h
> +++ b/include/linux/dma-map-ops.h
> @@ -12,6 +12,9 @@
>   struct cma;
>   
>   struct dma_map_ops {
> +	unsigned int flags;
> +#define DMA_F_PCI_P2PDMA_SUPPORTED     (1 << 0)
> +

Can we move this up and out of the struct member area, so that it looks
more like this:

/*
  * Values for struct dma_map_ops.flags:
  *
  * DMA_F_PCI_P2PDMA_SUPPORTED: <documentation here...this is a good place to
  * explain exactly what this flag is for.>
  */
#define DMA_F_PCI_P2PDMA_SUPPORTED     (1 << 0)

struct dma_map_ops {
	unsigned int flags;


>   	void *(*alloc)(struct device *dev, size_t size,
>   			dma_addr_t *dma_handle, gfp_t gfp,
>   			unsigned long attrs);
> diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
> index 50b8f586cf59..c31980ecca62 100644
> --- a/include/linux/dma-mapping.h
> +++ b/include/linux/dma-mapping.h
> @@ -146,6 +146,7 @@ int dma_mmap_attrs(struct device *dev, struct vm_area_struct *vma,
>   		unsigned long attrs);
>   bool dma_can_mmap(struct device *dev);
>   int dma_supported(struct device *dev, u64 mask);
> +bool dma_pci_p2pdma_supported(struct device *dev);
>   int dma_set_mask(struct device *dev, u64 mask);
>   int dma_set_coherent_mask(struct device *dev, u64 mask);
>   u64 dma_get_required_mask(struct device *dev);
> @@ -247,6 +248,10 @@ static inline int dma_supported(struct device *dev, u64 mask)
>   {
>   	return 0;
>   }
> +static inline bool dma_pci_p2pdma_supported(struct device *dev)
> +{
> +	return 0;

Should be:
	
	return false;

> +}
>   static inline int dma_set_mask(struct device *dev, u64 mask)
>   {
>   	return -EIO;
> diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
> index 923089c4267b..ce44a0fcc4e5 100644
> --- a/kernel/dma/mapping.c
> +++ b/kernel/dma/mapping.c
> @@ -573,6 +573,24 @@ int dma_supported(struct device *dev, u64 mask)
>   }
>   EXPORT_SYMBOL(dma_supported);
>   
> +bool dma_pci_p2pdma_supported(struct device *dev)
> +{
> +	const struct dma_map_ops *ops = get_dma_ops(dev);
> +
> +	/* if ops is not set, dma direct will be used which supports P2PDMA */
> +	if (!ops)
> +		return true;
> +
> +	/*
> +	 * Note: dma_ops_bypass is not checked here because P2PDMA should
> +	 * not be used with dma mapping ops that do not have support even
> +	 * if the specific device is bypassing them.
> +	 */
> +
> +	return ops->flags & DMA_F_PCI_P2PDMA_SUPPORTED;

Wow, rather unusual combination of things in order to decide this. It feels
a bit over-complicated to have flags and ops and a bool function all
dealing with the same 1-bit answer, but there is no caller shown here,
so I'll have to come back to this after reviewing subsequent patches.
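
For my own reference, the decision as I read it from the hunk above, written
as a userspace model. The model_* types and the MODEL_F_* flag are
hypothetical stand-ins for struct dma_map_ops, struct device and
DMA_F_PCI_P2PDMA_SUPPORTED; the explicit "!= 0" is just my preference for a
function that returns bool:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_F_PCI_P2PDMA_SUPPORTED	(1u << 0)

/* Hypothetical stand-ins; not the kernel definitions. */
struct model_dma_map_ops {
	unsigned int flags;
};

struct model_device {
	const struct model_dma_map_ops *ops;	/* NULL models "dma-direct is used" */
};

static bool model_p2pdma_supported(const struct model_device *dev)
{
	const struct model_dma_map_ops *ops = dev->ops;

	/* No ops installed: the direct path, which the patch says supports P2PDMA. */
	if (!ops)
		return true;

	return (ops->flags & MODEL_F_PCI_P2PDMA_SUPPORTED) != 0;
}
```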

thanks,
-- 
John Hubbard
NVIDIA

> +}
> +EXPORT_SYMBOL_GPL(dma_pci_p2pdma_supported);
> +
>   #ifdef CONFIG_ARCH_HAS_DMA_SET_MASK
>   void arch_dma_set_mask(struct device *dev, u64 mask);
>   #else
> 



^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH 08/16] PCI/P2PDMA: Introduce helpers for dma_map_sg implementations
  2021-04-08 17:01 ` [PATCH 08/16] PCI/P2PDMA: Introduce helpers for dma_map_sg implementations Logan Gunthorpe
  2021-05-02 22:52   ` John Hubbard
@ 2021-05-03  0:50   ` John Hubbard
  2021-05-03 17:15     ` Logan Gunthorpe
  1 sibling, 1 reply; 99+ messages in thread
From: John Hubbard @ 2021-05-03  0:50 UTC (permalink / raw)
  To: Logan Gunthorpe, linux-kernel, linux-nvme, linux-block,
	linux-pci, linux-mm, iommu
  Cc: Stephen Bates, Christoph Hellwig, Dan Williams, Jason Gunthorpe,
	Christian König, Don Dutile, Matthew Wilcox, Daniel Vetter,
	Jakowski Andrzej, Minturn Dave B, Jason Ekstrand, Dave Hansen,
	Xiong Jianxin, Bjorn Helgaas, Ira Weiny, Robin Murphy

On 4/8/21 10:01 AM, Logan Gunthorpe wrote:
> Add pci_p2pdma_map_segment() as a helper for simple dma_map_sg()
> implementations. It takes an scatterlist segment that must point to a
> pci_p2pdma struct page and will map it if the mapping requires a bus
> address.
> 
> The return value indicates whether the mapping required a bus address
> or whether the caller still needs to map the segment normally. If the
> segment should not be mapped, -EREMOTEIO is returned.
> 
> This helper uses a state structure to track the changes to the
> pgmap across calls and avoid needing to lookup into the xarray for
> every page.
> 

OK, coming back to this patch, after seeing how it is used later in
the series...

> Also add pci_p2pdma_map_bus_segment() which is useful for IOMMU
> dma_map_sg() implementations where the sg segment containing the page
> differs from the sg segment containing the DMA address.
> 
> Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
> ---
>   drivers/pci/p2pdma.c       | 65 ++++++++++++++++++++++++++++++++++++++
>   include/linux/pci-p2pdma.h | 21 ++++++++++++
>   2 files changed, 86 insertions(+)
> 
> diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c
> index 38c93f57a941..44ad7664e875 100644
> --- a/drivers/pci/p2pdma.c
> +++ b/drivers/pci/p2pdma.c
> @@ -923,6 +923,71 @@ void pci_p2pdma_unmap_sg_attrs(struct device *dev, struct scatterlist *sg,
>   }
>   EXPORT_SYMBOL_GPL(pci_p2pdma_unmap_sg_attrs);
>   
> +/**
> + * pci_p2pdma_map_segment - map an sg segment determining the mapping type
> + * @state: State structure that should be declared on the stack outside of
> + *	the for_each_sg() loop and initialized to zero.

Silly fine point for the docs here: it doesn't actually have to be on
the stack, so I don't think you need to write that constraint in the
documentation. It just has be be somehow allocated and zeroed.


> + * @dev: DMA device that's doing the mapping operation
> + * @sg: scatterlist segment to map
> + * @attrs: dma mapping attributes
> + *
> + * This is a helper to be used by non-iommu dma_map_sg() implementations where
> + * the sg segment is the same for the page_link and the dma_address.
> + *
> + * Attempt to map a single segment in an SGL with the PCI bus address.
> + * The segment must point to a PCI P2PDMA page and thus must be
> + * wrapped in a is_pci_p2pdma_page(sg_page(sg)) check.

Should this be backed up with actual checks in the function, that
the prerequisites are met?
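
Something along these lines is what I have in mind, shown as a userspace
model with hypothetical names. In the kernel the guard would more likely be
a WARN_ON_ONCE() than a silent error return, and whether -EINVAL is the
right code is an open question:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a struct page; not the kernel definition. */
struct model_page {
	bool is_p2pdma;		/* models is_pci_p2pdma_page(page) */
};

/* Returns 1 when the segment would be mapped, -EINVAL on a violated prerequisite. */
static int model_map_segment_checked(const struct model_page *page)
{
	/* Enforce the documented prerequisite instead of trusting the caller. */
	if (!page || !page->is_p2pdma)
		return -EINVAL;

	return 1;
}
```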

> + *
> + * Returns 1 if the segment was mapped, 0 if the segment should be mapped
> + * directly (or through the IOMMU) and -EREMOTEIO if the segment should not
> + * be mapped at all.
> + */
> +int pci_p2pdma_map_segment(struct pci_p2pdma_map_state *state,
> +			   struct device *dev, struct scatterlist *sg,
> +			   unsigned long dma_attrs)
> +{
> +	if (state->pgmap != sg_page(sg)->pgmap) {
> +		state->pgmap = sg_page(sg)->pgmap;
> +		state->map = pci_p2pdma_map_type(state->pgmap, dev, dma_attrs);
> +		state->bus_off = to_p2p_pgmap(state->pgmap)->bus_offset;
> +	}

I'll quote myself from patch 9, because I had a comment there that actually
was meant for this patch:

Is it worth putting this stuff on the caller's stack? I mean, is there a
noticeable performance improvement from caching the state? Because if
it's invisible, then simplicity is better. I suspect you're right, and
that it *is* worth it, but it's good to know for real.
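
To show what the cache buys, here is a userspace model with hypothetical
names. The lookup counter stands in for the cost of pci_p2pdma_map_type(),
which per the commit message involves an xarray lookup; the real saving
would have to be measured:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; not the kernel definitions. */
struct model_pgmap {
	int map_type;
};

struct model_state {
	const struct model_pgmap *pgmap;	/* last pgmap seen, NULL when zeroed */
	int map;				/* cached map type for that pgmap */
};

static int model_lookups;	/* counts how often the slow path runs */

static int model_slow_lookup(const struct model_pgmap *pgmap)
{
	model_lookups++;
	return pgmap->map_type;
}

/* Only hits the slow path when the pgmap changes between segments. */
static int model_map_type(struct model_state *state,
			  const struct model_pgmap *pgmap)
{
	if (state->pgmap != pgmap) {
		state->pgmap = pgmap;
		state->map = model_slow_lookup(pgmap);
	}
	return state->map;
}
```

Nothing here requires the state to be a stack variable; it could equally be
embedded in a heap-allocated structure, as long as it starts out zeroed.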


> +
> +	switch (state->map) {
> +	case PCI_P2PDMA_MAP_BUS_ADDR:
> +		sg->dma_address = sg_phys(sg) + state->bus_off;
> +		sg_dma_len(sg) = sg->length;
> +		sg_mark_pci_p2pdma(sg);
> +		return 1;
> +	case PCI_P2PDMA_MAP_THRU_HOST_BRIDGE:
> +		return 0;
> +	default:
> +		return -EREMOTEIO;
> +	}
> +}
> +
> +/**
> + * pci_p2pdma_map_bus_segment - map an sg segment pre determined to
> + *	be mapped with PCI_P2PDMA_MAP_BUS_ADDR

Or:

  * pci_p2pdma_map_bus_segment - map an SG segment that is already known
  * to be mapped with PCI_P2PDMA_MAP_BUS_ADDR

Also, should that prerequisite be backed up with checks in the function?

> + * @pg_sg: scatterlist segment with the page to map
> + * @dma_sg: scatterlist segment to assign a dma address to
> + *
> + * This is a helper for iommu dma_map_sg() implementations when the
> + * segment for the dma address differs from the segment containing the
> + * source page.
> + *
> + * pci_p2pdma_map_type() must have already been called on the pg_sg and
> + * returned PCI_P2PDMA_MAP_BUS_ADDR.

Another prerequisite, so same question: do you think that the code should
also check that this prerequisite is met?

thanks,
-- 
John Hubbard
NVIDIA

> + */
> +void pci_p2pdma_map_bus_segment(struct scatterlist *pg_sg,
> +				struct scatterlist *dma_sg)
> +{
> +	struct pci_p2pdma_pagemap *pgmap = to_p2p_pgmap(sg_page(pg_sg)->pgmap);
> +
> +	dma_sg->dma_address = sg_phys(pg_sg) + pgmap->bus_offset;
> +	sg_dma_len(dma_sg) = pg_sg->length;
> +	sg_mark_pci_p2pdma(dma_sg);
> +}
> +
>   /**
>    * pci_p2pdma_enable_store - parse a configfs/sysfs attribute store
>    *		to enable p2pdma
> diff --git a/include/linux/pci-p2pdma.h b/include/linux/pci-p2pdma.h
> index a06072ac3a52..49e7679403cf 100644
> --- a/include/linux/pci-p2pdma.h
> +++ b/include/linux/pci-p2pdma.h
> @@ -13,6 +13,12 @@
>   
>   #include <linux/pci.h>
>   
> +struct pci_p2pdma_map_state {
> +	struct dev_pagemap *pgmap;
> +	int map;
> +	u64 bus_off;
> +};
> +
>   struct block_device;
>   struct scatterlist;
>   
> @@ -43,6 +49,11 @@ int pci_p2pdma_map_sg_attrs(struct device *dev, struct scatterlist *sg,
>   		int nents, enum dma_data_direction dir, unsigned long attrs);
>   void pci_p2pdma_unmap_sg_attrs(struct device *dev, struct scatterlist *sg,
>   		int nents, enum dma_data_direction dir, unsigned long attrs);
> +int pci_p2pdma_map_segment(struct pci_p2pdma_map_state *state,
> +		struct device *dev, struct scatterlist *sg,
> +		unsigned long dma_attrs);
> +void pci_p2pdma_map_bus_segment(struct scatterlist *pg_sg,
> +				struct scatterlist *dma_sg);
>   int pci_p2pdma_enable_store(const char *page, struct pci_dev **p2p_dev,
>   			    bool *use_p2pdma);
>   ssize_t pci_p2pdma_enable_show(char *page, struct pci_dev *p2p_dev,
> @@ -109,6 +120,16 @@ static inline void pci_p2pdma_unmap_sg_attrs(struct device *dev,
>   		unsigned long attrs)
>   {
>   }
> +static inline int pci_p2pdma_map_segment(struct pci_p2pdma_map_state *state,
> +		struct device *dev, struct scatterlist *sg,
> +		unsigned long dma_attrs)
> +{
> +	return 0;
> +}
> +static inline void pci_p2pdma_map_bus_segment(struct scatterlist *pg_sg,
> +					      struct scatterlist *dma_sg)
> +{
> +}
>   static inline int pci_p2pdma_enable_store(const char *page,
>   		struct pci_dev **p2p_dev, bool *use_p2pdma)
>   {
> 


^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH 11/16] iommu/dma: Support PCI P2PDMA pages in dma-iommu map_sg
  2021-04-08 17:01 ` [PATCH 11/16] iommu/dma: Support PCI P2PDMA pages in dma-iommu map_sg Logan Gunthorpe
  2021-04-27 19:43   ` Jason Gunthorpe
@ 2021-05-03  1:14   ` John Hubbard
  2021-05-06 23:59     ` Logan Gunthorpe
  2021-05-11 16:06   ` Don Dutile
  2 siblings, 1 reply; 99+ messages in thread
From: John Hubbard @ 2021-05-03  1:14 UTC (permalink / raw)
  To: Logan Gunthorpe, linux-kernel, linux-nvme, linux-block,
	linux-pci, linux-mm, iommu
  Cc: Stephen Bates, Christoph Hellwig, Dan Williams, Jason Gunthorpe,
	Christian König, Don Dutile, Matthew Wilcox, Daniel Vetter,
	Jakowski Andrzej, Minturn Dave B, Jason Ekstrand, Dave Hansen,
	Xiong Jianxin, Bjorn Helgaas, Ira Weiny, Robin Murphy

On 4/8/21 10:01 AM, Logan Gunthorpe wrote:
> When a PCI P2PDMA page is seen, set the IOVA length of the segment
> to zero so that it is not mapped into the IOVA. Then, in finalise_sg(),
> apply the appropriate bus address to the segment. The IOVA is not
> created if the scatterlist only consists of P2PDMA pages.
> 
> Similar to dma-direct, the sg_mark_pci_p2pdma() flag is used to
> indicate bus address segments. On unmap, P2PDMA segments are skipped
> over when determining the start and end IOVA addresses.
> 
> With this change, the flags variable in the dma_map_ops is
> set to DMA_F_PCI_P2PDMA_SUPPORTED to indicate support for
> P2PDMA pages.
> 
> Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
> ---
>   drivers/iommu/dma-iommu.c | 66 ++++++++++++++++++++++++++++++++++-----
>   1 file changed, 58 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> index af765c813cc8..ef49635f9819 100644
> --- a/drivers/iommu/dma-iommu.c
> +++ b/drivers/iommu/dma-iommu.c
> @@ -20,6 +20,7 @@
>   #include <linux/mm.h>
>   #include <linux/mutex.h>
>   #include <linux/pci.h>
> +#include <linux/pci-p2pdma.h>
>   #include <linux/swiotlb.h>
>   #include <linux/scatterlist.h>
>   #include <linux/vmalloc.h>
> @@ -864,6 +865,16 @@ static int __finalise_sg(struct device *dev, struct scatterlist *sg, int nents,
>   		sg_dma_address(s) = DMA_MAPPING_ERROR;
>   		sg_dma_len(s) = 0;
>   
> +		if (is_pci_p2pdma_page(sg_page(s)) && !s_iova_len) {

Newbie question: I'm in the dark as to why the !s_iova_len check is there,
can you please enlighten me?

> +			if (i > 0)
> +				cur = sg_next(cur);
> +
> +			pci_p2pdma_map_bus_segment(s, cur);
> +			count++;
> +			cur_len = 0;
> +			continue;
> +		}
> +

This is really an if/else condition. And arguably, it would be better
to split out two subroutines, and call one or the other depending on
the result of is_pci_p2pdma_page(), instead of this "continue" approach.
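
Here is a rough userspace model of the two-pass scheme as I currently read
it, based on the comment in the iommu_dma_map_sg() hunk further down: the
map pass zeroes the length of bus-address segments, and the finalise pass
uses "P2PDMA page with zero length" to pick the bus-address path. All names
are hypothetical and the real functions track considerably more state:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a scatterlist entry; not the kernel struct. */
struct model_seg {
	bool is_p2pdma;		/* models is_pci_p2pdma_page(sg_page(s)) */
	bool bus_addr;		/* models a PCI_P2PDMA_MAP_BUS_ADDR map type */
	size_t length;		/* zeroed by pass 1 for bus-address segments */
	bool got_bus_addr;	/* set by pass 2 */
	bool got_iova;		/* set by pass 2 */
};

/* Pass 1: total up the IOVA space needed, skipping bus-address segments. */
static size_t model_prepare(struct model_seg *segs, int nents)
{
	size_t iova_len = 0;
	int i;

	for (i = 0; i < nents; i++) {
		if (segs[i].is_p2pdma && segs[i].bus_addr)
			segs[i].length = 0;	/* marker consumed by pass 2 */
		else
			iova_len += segs[i].length;
	}
	return iova_len;
}

static void model_finalise_bus(struct model_seg *s)  { s->got_bus_addr = true; }
static void model_finalise_iova(struct model_seg *s) { s->got_iova = true; }

/* Pass 2: one helper or the other, chosen by page type plus zero length. */
static void model_finalise(struct model_seg *segs, int nents)
{
	int i;

	for (i = 0; i < nents; i++) {
		if (segs[i].is_p2pdma && segs[i].length == 0)
			model_finalise_bus(&segs[i]);
		else
			model_finalise_iova(&segs[i]);
	}
}
```

In this model a P2PDMA page that goes through the host bridge keeps its
length and therefore takes the IOVA path, which would explain why the page
type alone is not enough for the check.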

>   		/*
>   		 * Now fill in the real DMA data. If...
>   		 * - there is a valid output segment to append to
> @@ -961,10 +972,12 @@ static int iommu_dma_map_sg(struct device *dev, struct scatterlist *sg,
>   	struct iova_domain *iovad = &cookie->iovad;
>   	struct scatterlist *s, *prev = NULL;
>   	int prot = dma_info_to_prot(dir, dev_is_dma_coherent(dev), attrs);
> +	struct dev_pagemap *pgmap = NULL;
> +	enum pci_p2pdma_map_type map_type;
>   	dma_addr_t iova;
>   	size_t iova_len = 0;
>   	unsigned long mask = dma_get_seg_boundary(dev);
> -	int i;
> +	int i, ret = 0;
>   
>   	if (static_branch_unlikely(&iommu_deferred_attach_enabled) &&
>   	    iommu_deferred_attach(dev, domain))
> @@ -993,6 +1006,31 @@ static int iommu_dma_map_sg(struct device *dev, struct scatterlist *sg,
>   		s_length = iova_align(iovad, s_length + s_iova_off);
>   		s->length = s_length;
>   
> +		if (is_pci_p2pdma_page(sg_page(s))) {
> +			if (sg_page(s)->pgmap != pgmap) {
> +				pgmap = sg_page(s)->pgmap;
> +				map_type = pci_p2pdma_map_type(pgmap, dev,
> +							       attrs);
> +			}
> +
> +			switch (map_type) {
> +			case PCI_P2PDMA_MAP_BUS_ADDR:
> +				/*
> +				 * A zero length will be ignored by
> +				 * iommu_map_sg() and then can be detected
> +				 * in __finalise_sg() to actually map the
> +				 * bus address.
> +				 */
> +				s->length = 0;
> +				continue;
> +			case PCI_P2PDMA_MAP_THRU_HOST_BRIDGE:
> +				break;
> +			default:
> +				ret = -EREMOTEIO;
> +				goto out_restore_sg;
> +			}
> +		}
> +
>   		/*
>   		 * Due to the alignment of our single IOVA allocation, we can
>   		 * depend on these assumptions about the segment boundary mask:
> @@ -1015,6 +1053,9 @@ static int iommu_dma_map_sg(struct device *dev, struct scatterlist *sg,
>   		prev = s;
>   	}
>   
> +	if (!iova_len)
> +		return __finalise_sg(dev, sg, nents, 0);
> +

ohhh, we're really slicing up this function pretty severely, what with the
continue and the early out and several other control flow changes. I think
it would be better to spend some time factoring this function into two
cases, now that you're adding a second case for PCI P2PDMA. Roughly,
two subroutines would do it.

As it is, this leaves behind a routine that is extremely hard to mentally
verify as correct.


thanks,
-- 
John Hubbard
NVIDIA

>   	iova = iommu_dma_alloc_iova(domain, iova_len, dma_get_mask(dev), dev);
>   	if (!iova)
>   		goto out_restore_sg;
> @@ -1032,13 +1073,13 @@ static int iommu_dma_map_sg(struct device *dev, struct scatterlist *sg,
>   	iommu_dma_free_iova(cookie, iova, iova_len, NULL);
>   out_restore_sg:
>   	__invalidate_sg(sg, nents);
> -	return 0;
> +	return ret;
>   }
>   
>   static void iommu_dma_unmap_sg(struct device *dev, struct scatterlist *sg,
>   		int nents, enum dma_data_direction dir, unsigned long attrs)
>   {
> -	dma_addr_t start, end;
> +	dma_addr_t end, start = DMA_MAPPING_ERROR;
>   	struct scatterlist *tmp;
>   	int i;
>   
> @@ -1054,14 +1095,22 @@ static void iommu_dma_unmap_sg(struct device *dev, struct scatterlist *sg,
>   	 * The scatterlist segments are mapped into a single
>   	 * contiguous IOVA allocation, so this is incredibly easy.
>   	 */
> -	start = sg_dma_address(sg);
> -	for_each_sg(sg_next(sg), tmp, nents - 1, i) {
> +	for_each_sg(sg, tmp, nents, i) {
> +		if (sg_is_pci_p2pdma(tmp)) {
> +			sg_unmark_pci_p2pdma(tmp);
> +			continue;
> +		}
>   		if (sg_dma_len(tmp) == 0)
>   			break;
> -		sg = tmp;
> +
> +		if (start == DMA_MAPPING_ERROR)
> +			start = sg_dma_address(tmp);
> +
> +		end = sg_dma_address(tmp) + sg_dma_len(tmp);
>   	}
> -	end = sg_dma_address(sg) + sg_dma_len(sg);
> -	__iommu_dma_unmap(dev, start, end - start);
> +
> +	if (start != DMA_MAPPING_ERROR)
> +		__iommu_dma_unmap(dev, start, end - start);
>   }
>   
>   static dma_addr_t iommu_dma_map_resource(struct device *dev, phys_addr_t phys,
> @@ -1254,6 +1303,7 @@ static unsigned long iommu_dma_get_merge_boundary(struct device *dev)
>   }
>   
>   static const struct dma_map_ops iommu_dma_ops = {
> +	.flags			= DMA_F_PCI_P2PDMA_SUPPORTED,
>   	.alloc			= iommu_dma_alloc,
>   	.free			= iommu_dma_free,
>   	.alloc_pages		= dma_common_alloc_pages,
> 


^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH 12/16] nvme-pci: Check DMA ops when indicating support for PCI P2PDMA
  2021-04-08 17:01 ` [PATCH 12/16] nvme-pci: Check DMA ops when indicating support for PCI P2PDMA Logan Gunthorpe
@ 2021-05-03  1:29   ` John Hubbard
  2021-05-03 17:17     ` Logan Gunthorpe
  0 siblings, 1 reply; 99+ messages in thread
From: John Hubbard @ 2021-05-03  1:29 UTC (permalink / raw)
  To: Logan Gunthorpe, linux-kernel, linux-nvme, linux-block,
	linux-pci, linux-mm, iommu
  Cc: Stephen Bates, Christoph Hellwig, Dan Williams, Jason Gunthorpe,
	Christian König, Don Dutile, Matthew Wilcox, Daniel Vetter,
	Jakowski Andrzej, Minturn Dave B, Jason Ekstrand, Dave Hansen,
	Xiong Jianxin, Bjorn Helgaas, Ira Weiny, Robin Murphy

On 4/8/21 10:01 AM, Logan Gunthorpe wrote:
> Introduce a supports_pci_p2pdma() operation in nvme_ctrl_ops to
> replace the fixed NVME_F_PCI_P2PDMA flag such that the dma_map_ops
> flags can be checked for PCI P2PDMA support.
> 
> Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
> ---
>   drivers/nvme/host/core.c |  3 ++-
>   drivers/nvme/host/nvme.h |  2 +-
>   drivers/nvme/host/pci.c  | 11 +++++++++--
>   3 files changed, 12 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index 0896e21642be..223419454516 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -3907,7 +3907,8 @@ static void nvme_alloc_ns(struct nvme_ctrl *ctrl, unsigned nsid,
>   		blk_queue_flag_set(QUEUE_FLAG_STABLE_WRITES, ns->queue);
>   
>   	blk_queue_flag_set(QUEUE_FLAG_NONROT, ns->queue);
> -	if (ctrl->ops->flags & NVME_F_PCI_P2PDMA)
> +	if (ctrl->ops->supports_pci_p2pdma &&
> +	    ctrl->ops->supports_pci_p2pdma(ctrl))

This is a little excessive, as I suspected. How about providing a
default .supports_pci_p2pdma routine that returns false, so that
the op is always available (non-null)? By "default", maybe that
means either requiring an init_the_ops_struct() routine to be
used, and/or checking all the users of struct nvme_ctrl_ops.

Another idea: maybe you don't really need a bool .supports_pci_p2pdma()
routine at all, because the existing .flags really is about right.
You just need the flags to be filled in dynamically. So, do that
during nvme_pci setup/init time: that's when this module would call
dma_pci_p2pdma_supported().

Actually, I think that second idea simplifies things quite a
bit, but only if it's possible. I haven't worked through the
startup order of calls in nvme_pci.
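
A variation on the first idea, sketched as a userspace model with
hypothetical names: keep the optional op, but hide the NULL check in a
single helper so that callers never open-code it. The
dma_ops_support_p2pdma field only stands in for whatever
dma_pci_p2pdma_supported() would report:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_ctrl;

/* Hypothetical stand-ins for struct nvme_ctrl_ops and struct nvme_ctrl. */
struct model_ctrl_ops {
	bool (*supports_pci_p2pdma)(struct model_ctrl *ctrl);
};

struct model_ctrl {
	const struct model_ctrl_ops *ops;
	bool dma_ops_support_p2pdma;	/* models dma_pci_p2pdma_supported() */
};

/* What a PCI transport might install as its op. */
static bool model_pci_supports_p2pdma(struct model_ctrl *ctrl)
{
	return ctrl->dma_ops_support_p2pdma;
}

/* Single helper that treats a missing op as "not supported". */
static bool model_ctrl_supports_p2pdma(struct model_ctrl *ctrl)
{
	if (!ctrl->ops->supports_pci_p2pdma)
		return false;

	return ctrl->ops->supports_pci_p2pdma(ctrl);
}
```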

thanks,
-- 
John Hubbard
NVIDIA

>   		blk_queue_flag_set(QUEUE_FLAG_PCI_P2PDMA, ns->queue);
>   
>   	ns->queue->queuedata = ns;
> diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
> index 07b34175c6ce..9c04df982d2c 100644
> --- a/drivers/nvme/host/nvme.h
> +++ b/drivers/nvme/host/nvme.h
> @@ -473,7 +473,6 @@ struct nvme_ctrl_ops {
>   	unsigned int flags;
>   #define NVME_F_FABRICS			(1 << 0)
>   #define NVME_F_METADATA_SUPPORTED	(1 << 1)
> -#define NVME_F_PCI_P2PDMA		(1 << 2)
>   	int (*reg_read32)(struct nvme_ctrl *ctrl, u32 off, u32 *val);
>   	int (*reg_write32)(struct nvme_ctrl *ctrl, u32 off, u32 val);
>   	int (*reg_read64)(struct nvme_ctrl *ctrl, u32 off, u64 *val);
> @@ -481,6 +480,7 @@ struct nvme_ctrl_ops {
>   	void (*submit_async_event)(struct nvme_ctrl *ctrl);
>   	void (*delete_ctrl)(struct nvme_ctrl *ctrl);
>   	int (*get_address)(struct nvme_ctrl *ctrl, char *buf, int size);
> +	bool (*supports_pci_p2pdma)(struct nvme_ctrl *ctrl);
>   };
>   
>   #ifdef CONFIG_FAULT_INJECTION_DEBUG_FS
> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> index 7249ae74f71f..14f092973792 100644
> --- a/drivers/nvme/host/pci.c
> +++ b/drivers/nvme/host/pci.c
> @@ -2759,17 +2759,24 @@ static int nvme_pci_get_address(struct nvme_ctrl *ctrl, char *buf, int size)
>   	return snprintf(buf, size, "%s\n", dev_name(&pdev->dev));
>   }
>   
> +static bool nvme_pci_supports_pci_p2pdma(struct nvme_ctrl *ctrl)
> +{
> +	struct nvme_dev *dev = to_nvme_dev(ctrl);
> +
> +	return dma_pci_p2pdma_supported(dev->dev);
> +}
> +
>   static const struct nvme_ctrl_ops nvme_pci_ctrl_ops = {
>   	.name			= "pcie",
>   	.module			= THIS_MODULE,
> -	.flags			= NVME_F_METADATA_SUPPORTED |
> -				  NVME_F_PCI_P2PDMA,
> +	.flags			= NVME_F_METADATA_SUPPORTED,
>   	.reg_read32		= nvme_pci_reg_read32,
>   	.reg_write32		= nvme_pci_reg_write32,
>   	.reg_read64		= nvme_pci_reg_read64,
>   	.free_ctrl		= nvme_pci_free_ctrl,
>   	.submit_async_event	= nvme_pci_submit_async_event,
>   	.get_address		= nvme_pci_get_address,
> +	.supports_pci_p2pdma	= nvme_pci_supports_pci_p2pdma,
>   };
>   
>   static int nvme_dev_map(struct nvme_dev *dev)
> 


^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH 13/16] nvme-pci: Convert to using dma_map_sg_p2pdma for p2pdma pages
  2021-04-08 17:01 ` [PATCH 13/16] nvme-pci: Convert to using dma_map_sg_p2pdma for p2pdma pages Logan Gunthorpe
@ 2021-05-03  1:34   ` John Hubbard
  2021-05-03 17:19     ` Logan Gunthorpe
  0 siblings, 1 reply; 99+ messages in thread
From: John Hubbard @ 2021-05-03  1:34 UTC (permalink / raw)
  To: Logan Gunthorpe, linux-kernel, linux-nvme, linux-block,
	linux-pci, linux-mm, iommu
  Cc: Stephen Bates, Christoph Hellwig, Dan Williams, Jason Gunthorpe,
	Christian König, Don Dutile, Matthew Wilcox, Daniel Vetter,
	Jakowski Andrzej, Minturn Dave B, Jason Ekstrand, Dave Hansen,
	Xiong Jianxin, Bjorn Helgaas, Ira Weiny, Robin Murphy

On 4/8/21 10:01 AM, Logan Gunthorpe wrote:
> Convert to using dma_map_sg_p2pdma() for PCI p2pdma pages.
> 
> This should be equivalent but allows for heterogeneous scatterlists
> with both P2PDMA and regular pages. However, P2PDMA support will be
> slightly more restricted (only dma-direct and dma-iommu are currently
> supported).
> 
> Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
> ---
>   drivers/nvme/host/pci.c | 28 ++++++++--------------------
>   1 file changed, 8 insertions(+), 20 deletions(-)
> 
> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> index 14f092973792..a1ed07ff38b7 100644
> --- a/drivers/nvme/host/pci.c
> +++ b/drivers/nvme/host/pci.c
> @@ -577,17 +577,6 @@ static void nvme_free_sgls(struct nvme_dev *dev, struct request *req)
>   
>   }
>   
> -static void nvme_unmap_sg(struct nvme_dev *dev, struct request *req)
> -{
> -	struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
> -
> -	if (is_pci_p2pdma_page(sg_page(iod->sg)))
> -		pci_p2pdma_unmap_sg(dev->dev, iod->sg, iod->nents,
> -				    rq_dma_dir(req));
> -	else
> -		dma_unmap_sg(dev->dev, iod->sg, iod->nents, rq_dma_dir(req));
> -}
> -
>   static void nvme_unmap_data(struct nvme_dev *dev, struct request *req)
>   {
>   	struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
> @@ -600,7 +589,7 @@ static void nvme_unmap_data(struct nvme_dev *dev, struct request *req)
>   
>   	WARN_ON_ONCE(!iod->nents);
>   
> -	nvme_unmap_sg(dev, req);
> +	dma_unmap_sg(dev->dev, iod->sg, iod->nents, rq_dma_dir(req));


Nice simplification!


>   	if (iod->npages == 0)
>   		dma_pool_free(dev->prp_small_pool, nvme_pci_iod_list(req)[0],
>   			      iod->first_dma);
> @@ -868,14 +857,13 @@ static blk_status_t nvme_map_data(struct nvme_dev *dev, struct request *req,
>   	if (!iod->nents)
>   		goto out_free_sg;
>   
> -	if (is_pci_p2pdma_page(sg_page(iod->sg)))
> -		nr_mapped = pci_p2pdma_map_sg_attrs(dev->dev, iod->sg,
> -				iod->nents, rq_dma_dir(req), DMA_ATTR_NO_WARN);
> -	else
> -		nr_mapped = dma_map_sg_attrs(dev->dev, iod->sg, iod->nents,
> -					     rq_dma_dir(req), DMA_ATTR_NO_WARN);
> -	if (!nr_mapped)
> +	nr_mapped = dma_map_sg_p2pdma_attrs(dev->dev, iod->sg, iod->nents,
> +				     rq_dma_dir(req), DMA_ATTR_NO_WARN);
> +	if (nr_mapped < 0) {
> +		if (nr_mapped != -ENOMEM)
> +			ret = BLK_STS_TARGET;
>   		goto out_free_sg;
> +	}

But now the "nr_mapped == 0" case is no longer doing an early out_free_sg.
Is that OK?
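
If the new call can still return 0 (I can't tell from this hunk), one
defensive option would be to keep treating "nothing mapped" as a failure. A
userspace model of the status mapping, with hypothetical names and assuming
the function's default status is a resource-type error:

```c
#include <assert.h>
#include <errno.h>

#ifndef EREMOTEIO
#define EREMOTEIO 121	/* Linux value, defined here only for portability */
#endif

/* Hypothetical stand-ins for the blk_status_t values involved. */
enum model_blk_status {
	MODEL_STS_OK,
	MODEL_STS_RESOURCE,	/* assumed default: retryable resource shortage */
	MODEL_STS_TARGET,	/* non-retryable mapping failure */
};

static enum model_blk_status model_map_status(int nr_mapped)
{
	if (nr_mapped < 0)
		return nr_mapped == -ENOMEM ? MODEL_STS_RESOURCE
					    : MODEL_STS_TARGET;

	/* Assumption: zero mapped entries is still treated as a failure. */
	if (nr_mapped == 0)
		return MODEL_STS_RESOURCE;

	return MODEL_STS_OK;
}
```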

>   
>   	iod->use_sgl = nvme_pci_use_sgls(dev, req);
>   	if (iod->use_sgl)
> @@ -887,7 +875,7 @@ static blk_status_t nvme_map_data(struct nvme_dev *dev, struct request *req,
>   	return BLK_STS_OK;
>   
>   out_unmap_sg:
> -	nvme_unmap_sg(dev, req);
> +	dma_unmap_sg(dev->dev, iod->sg, iod->nents, rq_dma_dir(req));
>   out_free_sg:
>   	mempool_free(iod->sg, dev->iod_mempool);
>   	return ret;
> 

thanks,
-- 
John Hubbard
NVIDIA

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH 14/16] nvme-rdma: Ensure dma support when using p2pdma
  2021-04-08 17:01 ` [PATCH 14/16] nvme-rdma: Ensure dma support when using p2pdma Logan Gunthorpe
  2021-04-27 19:47   ` Jason Gunthorpe
@ 2021-05-03  1:37   ` John Hubbard
  1 sibling, 0 replies; 99+ messages in thread
From: John Hubbard @ 2021-05-03  1:37 UTC (permalink / raw)
  To: Logan Gunthorpe, linux-kernel, linux-nvme, linux-block,
	linux-pci, linux-mm, iommu
  Cc: Stephen Bates, Christoph Hellwig, Dan Williams, Jason Gunthorpe,
	Christian König, Don Dutile, Matthew Wilcox, Daniel Vetter,
	Jakowski Andrzej, Minturn Dave B, Jason Ekstrand, Dave Hansen,
	Xiong Jianxin, Bjorn Helgaas, Ira Weiny, Robin Murphy

On 4/8/21 10:01 AM, Logan Gunthorpe wrote:
> Ensure the dma operations support p2pdma before using the RDMA
> device for P2PDMA. This allows switching the RDMA driver from
> pci_p2pdma_map_sg() to dma_map_sg_p2pdma().

Tentatively, this looks right, but it really should be combined
with a following patch that uses it. Then you don't have to try
to explain, above, why it's needed. :)

> 
> Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
> ---
>   drivers/nvme/target/rdma.c | 3 ++-
>   1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/nvme/target/rdma.c b/drivers/nvme/target/rdma.c
> index 6c1f3ab7649c..3ec7e77e5416 100644
> --- a/drivers/nvme/target/rdma.c
> +++ b/drivers/nvme/target/rdma.c
> @@ -414,7 +414,8 @@ static int nvmet_rdma_alloc_rsp(struct nvmet_rdma_device *ndev,
>   	if (ib_dma_mapping_error(ndev->device, r->send_sge.addr))
>   		goto out_free_rsp;
>   
> -	if (!ib_uses_virt_dma(ndev->device))
> +	if (!ib_uses_virt_dma(ndev->device) &&
> +	    dma_pci_p2pdma_supported(&ndev->device->dev))
>   		r->req.p2p_client = &ndev->device->dev;
>   	r->send_sge.length = sizeof(*r->req.cqe);
>   	r->send_sge.lkey = ndev->pd->local_dma_lkey;
> 

thanks,
-- 
John Hubbard
NVIDIA

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH 01/16] PCI/P2PDMA: Pass gfp_mask flags to upstream_bridge_distance_warn()
  2021-05-02  3:58   ` John Hubbard
@ 2021-05-03 15:57     ` Logan Gunthorpe
  2021-05-03 18:17       ` John Hubbard
  2021-05-11 16:05     ` Don Dutile
  1 sibling, 1 reply; 99+ messages in thread
From: Logan Gunthorpe @ 2021-05-03 15:57 UTC (permalink / raw)
  To: John Hubbard, linux-kernel, linux-nvme, linux-block, linux-pci,
	linux-mm, iommu
  Cc: Stephen Bates, Christoph Hellwig, Dan Williams, Jason Gunthorpe,
	Christian König, Don Dutile, Matthew Wilcox, Daniel Vetter,
	Jakowski Andrzej, Minturn Dave B, Jason Ekstrand, Dave Hansen,
	Xiong Jianxin, Bjorn Helgaas, Ira Weiny, Robin Murphy,
	Bjorn Helgaas



On 2021-05-01 9:58 p.m., John Hubbard wrote:
> Another odd thing: this used to check for memory failure and just give
> up, and now it doesn't. Yes, I realize that it all still works at the
> moment, but this is quirky and we shouldn't stop here.
> 
> Instead, a cleaner approach would be to push the memory allocation
> slightly higher up the call stack, out to the
> pci_p2pdma_distance_many(). So pci_p2pdma_distance_many() should make
> the kmalloc() call, and fail out if it can't get a page for the seq_buf
> buffer. Then you don't have to do all this odd stuff.

I don't really agree with this assessment. If kmalloc fails to
initialize the seq_buf() (which should be very rare), the only thing
that is lost is the one warning print that tells the user the command
line parameter needed to disable ACS. Everything else works fine,
nothing else can fail. I don't see the need to add extra complexity just
so the code errors out in no-mem instead of just skipping the one,
slightly more informative, warning line.

Also, keep in mind that the results of all these functions are cached,
so the check only ever happens once. For this to matter, the user would
have to do their first transaction between two devices exactly at the
time memory allocations would fail.
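
To make the argument above concrete, here is a minimal user-space sketch
(plain C, not kernel code) of a best-effort diagnostic buffer. The names
check_path(), check_path_warn() and the simulate_oom flag are
hypothetical stand-ins, not functions from p2pdma.c. The only point is
that a failed allocation drops the hint and changes nothing else:

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical result codes; not the kernel's enum. */
enum path_result { PATH_OK = 1, PATH_NOT_SUPPORTED = 2 };

/*
 * Hypothetical path check. 'msg' is an optional diagnostic buffer:
 * when it is NULL the hint is skipped, but the result is unchanged.
 */
static enum path_result check_path(int acs_blocks_p2p, char *msg, size_t len)
{
	if (!acs_blocks_p2p)
		return PATH_OK;

	if (msg)
		snprintf(msg, len, "to fix, disable ACS redirect on the bridge");
	return PATH_NOT_SUPPORTED;
}

/*
 * Treats the buffer allocation as best-effort. 'simulate_oom' stands in
 * for a failed allocation. Returns the path result and reports through
 * 'warned' whether the hint could be printed.
 */
static enum path_result check_path_warn(int acs_blocks_p2p, int simulate_oom,
					int *warned)
{
	size_t len = 128;
	char *msg = simulate_oom ? NULL : malloc(len);
	enum path_result ret;

	assert(warned != NULL);
	if (msg)
		msg[0] = '\0';
	ret = check_path(acs_blocks_p2p, msg, len);

	*warned = msg && msg[0] != '\0';
	if (*warned)
		fprintf(stderr, "warning: %s\n", msg);
	free(msg);	/* free(NULL) is a no-op */
	return ret;
}
```

With the same inputs, the return value is identical whether or not the
allocation succeeded; only the extra warning line differs.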


> Furthermore, the call sites can then decide for themselves which GFP
> flags, GFP_ATOMIC or GFP_KERNEL or whatever they want for kmalloc().
> 
> A related thing: this whole exercise would go better if there were a
> preparatory patch or two that changed the return codes in this file to
> something less crazy. There are too many functions that can fail, but
> are treated as if they sort-of-mostly-would-never-fail, in the hopes of
> using the return value directly for counting and such. This is badly
> mistaken, and it leads developers to try to avoid returning -ENOMEM
> (which is what we need here).

Hmm? Which functions can fail, and how?

Logan


* Re: [PATCH 02/16] PCI/P2PDMA: Avoid pci_get_slot() which sleeps
  2021-05-02  5:35   ` John Hubbard
@ 2021-05-03 16:08     ` Logan Gunthorpe
  2021-05-03 18:20       ` John Hubbard
  2021-05-03 18:25       ` Christoph Hellwig
  2021-05-11 16:05     ` Don Dutile
  1 sibling, 2 replies; 99+ messages in thread
From: Logan Gunthorpe @ 2021-05-03 16:08 UTC (permalink / raw)
  To: John Hubbard, linux-kernel, linux-nvme, linux-block, linux-pci,
	linux-mm, iommu
  Cc: Stephen Bates, Christoph Hellwig, Dan Williams, Jason Gunthorpe,
	Christian König, Don Dutile, Matthew Wilcox, Daniel Vetter,
	Jakowski Andrzej, Minturn Dave B, Jason Ekstrand, Dave Hansen,
	Xiong Jianxin, Bjorn Helgaas, Ira Weiny, Robin Murphy



On 2021-05-01 11:35 p.m., John Hubbard wrote:
> On 4/8/21 10:01 AM, Logan Gunthorpe wrote:
>> In order to use upstream_bridge_distance_warn() from a dma_map function,
>> it must not sleep. However, pci_get_slot() takes the pci_bus_sem so it
>> might sleep.
>>
>> In order to avoid this, try to get the host bridge's device from
>> bus->self, and if that is not set, just get the first element in the
>> device list. It should be impossible for the host bridge's device to
>> go away while references are held on child devices, so the first element
>> should not be able to change and, thus, this should be safe.
>>
>> Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
>> ---
>>   drivers/pci/p2pdma.c | 14 ++++++++++++--
>>   1 file changed, 12 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c
>> index bd89437faf06..473a08940fbc 100644
>> --- a/drivers/pci/p2pdma.c
>> +++ b/drivers/pci/p2pdma.c
>> @@ -311,16 +311,26 @@ static const struct pci_p2pdma_whitelist_entry {
>>   static bool __host_bridge_whitelist(struct pci_host_bridge *host,
>>   				    bool same_host_bridge)
>>   {
>> -	struct pci_dev *root = pci_get_slot(host->bus, PCI_DEVFN(0, 0));
>>   	const struct pci_p2pdma_whitelist_entry *entry;
>> +	struct pci_dev *root = host->bus->self;
>>   	unsigned short vendor, device;
>>   
>> +	/*
>> +	 * This makes the assumption that the first device on the bus is the
>> +	 * bridge itself and it has the devfn of 00.0. This assumption should
>> +	 * hold for the devices in the white list above, and if there are cases
>> +	 * where this isn't true they will have to be dealt with when such a
>> +	 * case is added to the whitelist.
> 
> Actually, it makes the assumption that the first device *in the list*
> (the host->bus->devices list) is 00.0.  The previous code made the
> assumption that you wrote.

The comment notes two assumptions (although the grammar is poor, which I
will fix). Yes, the previous code made the second assumption; the new
code makes both assumptions.

> By the way, pre-existing code comment: pci_p2pdma_whitelist[] seems
> really short. From a naive point of view, I'd expect that there must be
> a lot more CPUs/chipsets that can do pci p2p, what do you think? I
> wonder if we have to be so super strict, anyway. It just seems extremely
> limited, and I suspect there will be some additions to the list as soon
> as we start to use this.

Yes, well, unfortunately we have no other way to determine which host
bridges can communicate with P2P. We settled on a whitelist when the
series was first posted. Nobody likes that situation, but nobody has
found anything better. We've been hoping standards bodies would give us
a flag, but I haven't heard anything about that. At least AMD has been
able to guarantee us that all CPUs newer than Zen will support it, so
that covers a large swath. It would be nice if we could say something
similar for Intel.

> OK, yes this avoids taking the pci_bus_sem, but it's kind of cheating.
> Why is it OK to avoid taking any locks in order to retrieve the
> first entry from the list, but in order to retrieve any other entry, you
> have to acquire the pci_bus_sem, and get a reference as well? Something
> is inconsistent there.
> 
> The new version here also no longer takes a reference on the device,
> which is also cheating. But I'm guessing that the unstated assumption
> here is that there is always at least one entry in the list. But if
> that's true, then it's better to show clearly that assumption, instead
> of hiding it in an implicit call that skips both locking and reference
> counting.

Because we hold a reference to a child device of the bus, the host
bus device can't go away until the child device has been released. An
earlier version of the P2PDMA patchset had a lot more extraneous get
device calls until someone else pointed this out.

> You could add a new function, which is a cut-down version of pci_get_slot(),
> like this, and call this from __host_bridge_whitelist():
> 
> /*
>   * A special purpose variant of pci_get_slot() that doesn't take the pci_bus_sem
>   * lock, and only looks for the 00.0 bus-device-function. Once the PCI bus is
>   * up, it is safe to call this, because there will always be a top-level PCI
>   * root device.
>   *
>   * Other assumptions: the root device is the first device in the list, and the
>   * root device is numbered 00.0.
>   */
> struct pci_dev *pci_get_root_slot(struct pci_bus *bus)
> {
> 	struct pci_dev *root;
> 	unsigned devfn = PCI_DEVFN(0, 0);
> 
> 	root = list_first_entry_or_null(&bus->devices, struct pci_dev,
> 					bus_list);
> 	if (root && root->devfn == devfn)
> 		goto out;
> 
> 	root = NULL;
>   out:
> 	pci_dev_get(root);
> 	return root;
> }
> EXPORT_SYMBOL(pci_get_root_slot);
> 
> ...I think that's a lot clearer to the reader, about what's going on here.

Per above, I think the reference count is unnecessary. But I could wrap
it in a static function for clarity. (There's no reason to export this
function).

> Note that I'm not really sure if it *is* safe, I would need to ask other
> PCIe subsystem developers with more experience. But I don't think anyone
> is trying to make p2pdma calls so early that PCIe buses are uninitialized.

Yeah, it's impossible to make a p2pdma call before the PCIe bus is
initialized. A caller has to have access to at least one PCI device
before it can even attempt one.

Logan


* Re: [PATCH 03/16] PCI/P2PDMA: Attempt to set map_type if it has not been set
  2021-05-02 19:58   ` John Hubbard
@ 2021-05-03 16:17     ` Logan Gunthorpe
  2021-05-03 18:22       ` John Hubbard
  2021-05-03 18:35       ` Christoph Hellwig
  2021-05-11 16:05     ` Don Dutile
  1 sibling, 2 replies; 99+ messages in thread
From: Logan Gunthorpe @ 2021-05-03 16:17 UTC (permalink / raw)
  To: John Hubbard, linux-kernel, linux-nvme, linux-block, linux-pci,
	linux-mm, iommu
  Cc: Stephen Bates, Christoph Hellwig, Dan Williams, Jason Gunthorpe,
	Christian König, Don Dutile, Matthew Wilcox, Daniel Vetter,
	Jakowski Andrzej, Minturn Dave B, Jason Ekstrand, Dave Hansen,
	Xiong Jianxin, Bjorn Helgaas, Ira Weiny, Robin Murphy



On 2021-05-02 1:58 p.m., John Hubbard wrote:
> On 4/8/21 10:01 AM, Logan Gunthorpe wrote:
>> Attempt to find the mapping type for P2PDMA pages on the first
>> DMA map attempt if it has not been done ahead of time.
>>
>> Previously, the mapping type was expected to be calculated ahead of
>> time, but if pages are to come from userspace then there's no
>> way to ensure the path was checked ahead of time.
>>
>> Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
>> ---
>>   drivers/pci/p2pdma.c | 12 +++++++++---
>>   1 file changed, 9 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c
>> index 473a08940fbc..2574a062a255 100644
>> --- a/drivers/pci/p2pdma.c
>> +++ b/drivers/pci/p2pdma.c
>> @@ -825,11 +825,18 @@ EXPORT_SYMBOL_GPL(pci_p2pmem_publish);
>>   static enum pci_p2pdma_map_type pci_p2pdma_map_type(struct pci_dev *provider,
>>   						    struct pci_dev *client)
>>   {
>> +	enum pci_p2pdma_map_type ret;
>> +
>>   	if (!provider->p2pdma)
>>   		return PCI_P2PDMA_MAP_NOT_SUPPORTED;
>>   
>> -	return xa_to_value(xa_load(&provider->p2pdma->map_types,
>> -				   map_types_idx(client)));
>> +	ret = xa_to_value(xa_load(&provider->p2pdma->map_types,
>> +				  map_types_idx(client)));
>> +	if (ret != PCI_P2PDMA_MAP_UNKNOWN)
>> +		return ret;
>> +
>> +	return upstream_bridge_distance_warn(provider, client, NULL,
>> +					     GFP_ATOMIC);
> 
> Returning a "bridge distance" from a "get map type" routine is jarring,
> and I think it is because of a pre-existing problem: the above function
> is severely misnamed. Let's try renaming it (and the other one) to
> approximately:
> 
>      upstream_bridge_map_type_warn()
>      upstream_bridge_map_type()
> 
> ...and that should fix that. Well, that, plus tweaking the kernel doc
> comments, which are also confused. I think someone started off thinking
> about distances through PCIe, but in the end, the routine boils down to
> just a few situations that are not distances at all.
> 
> Also, the above will read a little better if it is written like this:
> 
> 	ret = xa_to_value(xa_load(&provider->p2pdma->map_types,
> 				  map_types_idx(client)));
> 
> 	if (ret == PCI_P2PDMA_MAP_UNKNOWN)
> 		ret = upstream_bridge_map_type_warn(provider, client, NULL,
> 						    GFP_ATOMIC);
> 	
> 	return ret;
> 
> 
>>   }

I agree that some of this has evolved in a way that some of the names
are a bit odd now. Could definitely use a cleanup, but that's not really
part of this series. When I have some time I can look at doing a cleanup
series to help with some of this.

>>   static int __pci_p2pdma_map_sg(struct pci_p2pdma_pagemap *p2p_pgmap,
>> @@ -877,7 +884,6 @@ int pci_p2pdma_map_sg_attrs(struct device *dev, struct scatterlist *sg,
>>   	case PCI_P2PDMA_MAP_BUS_ADDR:
>>   		return __pci_p2pdma_map_sg(p2p_pgmap, dev, sg, nents);
>>   	default:
>> -		WARN_ON_ONCE(1);
> 
> Why? Or at least, why, in this patch? It looks like an accidental
> leftover from something, seeing as how it is not directly related to the
> patch, and is not mentioned at all.

Before this patch, it was required that users of P2PDMA call
pci_p2pdma_distance_many() in some form before calling
pci_p2pdma_map_sg(). So, by convention, a usable map type had to already
be in the cache. The warning was there to yell at anyone who wrote code
that violated that convention.

This patch removes that convention and allows users to map P2PDMA pages
sight unseen: if the mapping type isn't in the cache, it is determined
at DMA mapping time. Thus, the warning can be removed and the function
can fail normally if the mapping is unsupported.
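
As an illustration of the lookup-then-compute behaviour described above,
here is a small user-space sketch in plain C. It is not the kernel
implementation: the array cache, compute_map_type() and get_map_type()
are hypothetical stand-ins for the xarray lookup and the bridge-distance
calculation, and the even/odd rule is invented purely to have something
to return:

```c
#include <assert.h>
#include <stddef.h>

/* Mirrors the idea of an "unknown" value of zero meaning "not cached". */
enum map_type {
	MAP_UNKNOWN = 0,
	MAP_NOT_SUPPORTED,
	MAP_BUS_ADDR,
	MAP_THRU_HOST_BRIDGE,
};

#define MAX_CLIENTS 8

static enum map_type map_type_cache[MAX_CLIENTS];
static int slow_path_calls;

/* Hypothetical slow path; the even/odd rule is made up for the sketch. */
static enum map_type compute_map_type(size_t client)
{
	slow_path_calls++;
	return (client % 2 == 0) ? MAP_BUS_ADDR : MAP_THRU_HOST_BRIDGE;
}

/* Return the cached type, computing and storing it on first use. */
static enum map_type get_map_type(size_t client)
{
	enum map_type ret;

	assert(client < MAX_CLIENTS);
	ret = map_type_cache[client];
	if (ret != MAP_UNKNOWN)
		return ret;

	ret = compute_map_type(client);
	map_type_cache[client] = ret;
	return ret;
}
```

A caller that never primed the cache still gets an answer on its first
call, and later calls for the same client skip the slow path.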

Logan



* Re: [PATCH 04/16] PCI/P2PDMA: Refactor pci_p2pdma_map_type() to take pagmap and device
  2021-05-02 20:41   ` John Hubbard
@ 2021-05-03 16:30     ` Logan Gunthorpe
  2021-05-03 18:31       ` John Hubbard
  0 siblings, 1 reply; 99+ messages in thread
From: Logan Gunthorpe @ 2021-05-03 16:30 UTC (permalink / raw)
  To: John Hubbard, linux-kernel, linux-nvme, linux-block, linux-pci,
	linux-mm, iommu
  Cc: Stephen Bates, Christoph Hellwig, Dan Williams, Jason Gunthorpe,
	Christian König, Don Dutile, Matthew Wilcox, Daniel Vetter,
	Jakowski Andrzej, Minturn Dave B, Jason Ekstrand, Dave Hansen,
	Xiong Jianxin, Bjorn Helgaas, Ira Weiny, Robin Murphy



On 2021-05-02 2:41 p.m., John Hubbard wrote:
> On 4/8/21 10:01 AM, Logan Gunthorpe wrote:
>> All callers of pci_p2pdma_map_type() have a struct dev_pgmap and a
>> struct device (of the client doing the DMA transfer). Thus move the
>> conversion to struct pci_devs for the provider and client into this
>> function.
> 
> Actually, this is the wrong direction to go! All of these pre-existing
> pci_*() functions have a small problem already: they are dealing with
> struct device, instead of struct pci_dev. And so refactoring should be
> pushing the conversion to pci_dev *up* the calling stack, not lower as
> the patch here proposes.
> 
> Also, there is no improvement in clarity by passing in (pgmap, dev)
> instead of the previous (provider, client). Now you have to do more type
> checking in the leaf function, which is another indication of a problem.
> 
> Let's go that direction, please? Just convert to pci_dev much higher in
> the calling stack, and you'll find that everything fits together better.
> And it's OK to pass in extra params if that turns out to be necessary,
> after all.

No, I disagree with this and it seems a bit confused. This change
allows callers to call the function with what they have and does more
of the checks inside the called function. This allows for *fewer* checks
overall, not more. (I mean, look at the patch itself: it moves a bunch
of checks from both call sites into the callee and makes the code a lot
cleaner -- it's removing more lines than it adds).

A similar argument can be made for pci_p2pdma_distance_many() (which
I assume you are referring to). If the function took struct pci_dev
instead of struct device, every caller would need to do all the checks
and conversions to struct pci_dev. That is not an improvement.

Logan


* Re: [PATCH 05/16] dma-mapping: Introduce dma_map_sg_p2pdma()
  2021-05-02 21:23   ` John Hubbard
@ 2021-05-03 16:38     ` Logan Gunthorpe
  0 siblings, 0 replies; 99+ messages in thread
From: Logan Gunthorpe @ 2021-05-03 16:38 UTC (permalink / raw)
  To: John Hubbard, linux-kernel, linux-nvme, linux-block, linux-pci,
	linux-mm, iommu
  Cc: Stephen Bates, Christoph Hellwig, Dan Williams, Jason Gunthorpe,
	Christian König, Don Dutile, Matthew Wilcox, Daniel Vetter,
	Jakowski Andrzej, Minturn Dave B, Jason Ekstrand, Dave Hansen,
	Xiong Jianxin, Bjorn Helgaas, Ira Weiny, Robin Murphy



On 2021-05-02 3:23 p.m., John Hubbard wrote:
> On 4/8/21 10:01 AM, Logan Gunthorpe wrote:
>> dma_map_sg() either returns a positive number indicating the number
>> of entries mapped or zero indicating that resources were not available
>> to create the mapping. When zero is returned, it is always safe to retry
>> the mapping later once resources have been freed.
>>
>> Once P2PDMA pages are mixed into the SGL there may be pages that may
>> never be successfully mapped with a given device because that device may
>> not actually be able to access those pages. Thus, multiple error
>> conditions will need to be distinguished to determine whether a retry
>> is safe.
>>
>> Introduce dma_map_sg_p2pdma[_attrs]() with a different calling
>> convention from dma_map_sg(). The function will return a positive
>> integer on success or a negative errno on failure.
>>
>> ENOMEM will be used to indicate a resource failure and EREMOTEIO to
>> indicate that a P2PDMA page is not mappable.
>>
>> The __DMA_ATTR_PCI_P2PDMA attribute is introduced to inform the lower
>> level implementations that P2PDMA pages are allowed and to warn if a
>> caller introduces them into the regular dma_map_sg() interface.
>>
>> Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
>> ---
>>   include/linux/dma-mapping.h | 15 +++++++++++
>>   kernel/dma/mapping.c        | 52 ++++++++++++++++++++++++++++++++-----
>>   2 files changed, 61 insertions(+), 6 deletions(-)
>>
>> diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
>> index 2a984cb4d1e0..50b8f586cf59 100644
>> --- a/include/linux/dma-mapping.h
>> +++ b/include/linux/dma-mapping.h
>> @@ -60,6 +60,12 @@
>>    * at least read-only at lesser-privileged levels).
>>    */
>>   #define DMA_ATTR_PRIVILEGED		(1UL << 9)
>> +/*
>> + * __DMA_ATTR_PCI_P2PDMA: This should not be used directly, use
>> + * dma_map_sg_p2pdma() instead. Used internally to indicate that the
>> + * caller is using the dma_map_sg_p2pdma() interface.
>> + */
>> +#define __DMA_ATTR_PCI_P2PDMA		(1UL << 10)
>>
> 
> As mentioned near the top of this file,
> Documentation/core-api/dma-attributes.rst also needs to be updated, for
> this new item.

As this attribute is not meant to be used by anyone outside the dma
functions, I don't think it should be documented there. (That's why it
has a double underscore prefix).

>>   /*
>>    * A dma_addr_t can hold any valid DMA or bus address for the platform.  It can
>> @@ -107,6 +113,8 @@ void dma_unmap_page_attrs(struct device *dev, dma_addr_t addr, size_t size,
>>   		enum dma_data_direction dir, unsigned long attrs);
>>   int dma_map_sg_attrs(struct device *dev, struct scatterlist *sg, int nents,
>>   		enum dma_data_direction dir, unsigned long attrs);
>> +int dma_map_sg_p2pdma_attrs(struct device *dev, struct scatterlist *sg,
>> +		int nents, enum dma_data_direction dir, unsigned long attrs);
>>   void dma_unmap_sg_attrs(struct device *dev, struct scatterlist *sg,
>>   				      int nents, enum dma_data_direction dir,
>>   				      unsigned long attrs);
>> @@ -160,6 +168,12 @@ static inline int dma_map_sg_attrs(struct device *dev, struct scatterlist *sg,
>>   {
>>   	return 0;
>>   }
>> +static inline int dma_map_sg_p2pdma_attrs(struct device *dev,
>> +		struct scatterlist *sg, int nents, enum dma_data_direction dir,
>> +		unsigned long attrs)
>> +{
>> +	return 0;
>> +}
>>   static inline void dma_unmap_sg_attrs(struct device *dev,
>>   		struct scatterlist *sg, int nents, enum dma_data_direction dir,
>>   		unsigned long attrs)
>> @@ -392,6 +406,7 @@ static inline void dma_sync_sgtable_for_device(struct device *dev,
>>   #define dma_map_single(d, a, s, r) dma_map_single_attrs(d, a, s, r, 0)
>>   #define dma_unmap_single(d, a, s, r) dma_unmap_single_attrs(d, a, s, r, 0)
>>   #define dma_map_sg(d, s, n, r) dma_map_sg_attrs(d, s, n, r, 0)
>> +#define dma_map_sg_p2pdma(d, s, n, r) dma_map_sg_p2pdma_attrs(d, s, n, r, 0)
> 
> This hunk is fine, of course.
> 
> But, about pre-existing issues: note to self, or to anyone: send a patch to turn
> these into inline functions. The macro redirection here is not adding value, but
> it does make things just a little bit worse.
> 
> 
>>   #define dma_unmap_sg(d, s, n, r) dma_unmap_sg_attrs(d, s, n, r, 0)
>>   #define dma_map_page(d, p, o, s, r) dma_map_page_attrs(d, p, o, s, r, 0)
>>   #define dma_unmap_page(d, a, s, r) dma_unmap_page_attrs(d, a, s, r, 0)
>> diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
>> index b6a633679933..923089c4267b 100644
>> --- a/kernel/dma/mapping.c
>> +++ b/kernel/dma/mapping.c
>> @@ -177,12 +177,8 @@ void dma_unmap_page_attrs(struct device *dev, dma_addr_t addr, size_t size,
>>   }
>>   EXPORT_SYMBOL(dma_unmap_page_attrs);
>>   
>> -/*
>> - * dma_maps_sg_attrs returns 0 on error and > 0 on success.
>> - * It should never return a value < 0.
>> - */
> 
> It would be better to leave the comment in place, given the non-standard
> return values. However, looking around here, it would be better if we go
> with the standard -ERRNO for error, and >0 for success.

The comment is actually left in place. The diff just makes it look like
it was removed. It is added back lower down in the diff.

> There are pre-existing BUG_ON() and WARN_ON_ONCE() items that are partly
> an attempt to compensate for not being able to return proper -ERRNO
> codes. For example, this:
> 
> 	    BUG_ON(!valid_dma_direction(dir));
> 
> ...arguably should be more like this:
> 
>          if(WARN_ON_ONCE(!valid_dma_direction(dir)))
>                  return -EINVAL;

Yes, but you'll have to see the discussion in the RFC. The complaint was
that the calling convention for dma_map_sg() is not expected to return
anything other than 0 or the number of entries mapped. It can't return a
negative error code. That's why BUG_ON(ents < 0) is in the existing
code. That's also why this series introduces the new dma_map_sg_p2pdma()
function. (Though, Jason has made some suggestions to further change this).
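
For readers comparing the two conventions, here is a minimal user-space
sketch of how a caller might act on an errno-style return: retry only on
-ENOMEM and give up immediately on -EREMOTEIO. Everything in it is a
hypothetical mock (struct mock_dev, mock_map_sg_p2pdma(),
map_with_retry()); it only mimics the return values described in the
commit message and is not the kernel function:

```c
#include <assert.h>
#include <errno.h>

#ifndef EREMOTEIO
#define EREMOTEIO 121	/* Linux value; fallback for other platforms */
#endif

/* Hypothetical mock device used to drive the different outcomes. */
struct mock_dev {
	int can_reach_peer;	/* 0: P2PDMA pages are never mappable */
	int oom_failures;	/* fail with -ENOMEM this many times first */
};

/* Mock mapper: > 0 on success, negative errno on failure. */
static int mock_map_sg_p2pdma(struct mock_dev *dev, int nents)
{
	if (!dev->can_reach_peer)
		return -EREMOTEIO;	/* retrying can never help */
	if (dev->oom_failures > 0) {
		dev->oom_failures--;
		return -ENOMEM;		/* transient, may be retried */
	}
	return nents;
}

/* Retry on -ENOMEM only; any other result is returned right away. */
static int map_with_retry(struct mock_dev *dev, int nents, int max_tries)
{
	int ret = -ENOMEM;

	assert(nents > 0 && max_tries > 0);
	while (max_tries-- > 0) {
		ret = mock_map_sg_p2pdma(dev, nents);
		if (ret != -ENOMEM)
			break;
	}
	return ret;
}
```

With the old "0 means failure" convention the two failure cases would be
indistinguishable, so a caller could not know whether a retry is safe.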

> 
>> -int dma_map_sg_attrs(struct device *dev, struct scatterlist *sg, int nents,
>> -		enum dma_data_direction dir, unsigned long attrs)
>> +static int __dma_map_sg_attrs(struct device *dev, struct scatterlist *sg,
>> +		int nents, enum dma_data_direction dir, unsigned long attrs)
>>   {
>>   	const struct dma_map_ops *ops = get_dma_ops(dev);
>>   	int ents;
>> @@ -197,6 +193,20 @@ int dma_map_sg_attrs(struct device *dev, struct scatterlist *sg, int nents,
>>   		ents = dma_direct_map_sg(dev, sg, nents, dir, attrs);
>>   	else
>>   		ents = ops->map_sg(dev, sg, nents, dir, attrs);
>> +
>> +	return ents;
>> +}
>> +
>> +/*
>> + * dma_maps_sg_attrs returns 0 on error and > 0 on success.
>> + * It should never return a value < 0.
>> + */
>> +int dma_map_sg_attrs(struct device *dev, struct scatterlist *sg, int nents,
>> +		enum dma_data_direction dir, unsigned long attrs)
>> +{
>> +	int ents;
> 
> Pre-existing note, feel free to ignore: the ents and nents in the same
> routines together, are way too close to each other in naming. Maybe
> using "requested_nents", or "nents_arg", for the incoming value, would
> help.

Ok, will change.

>> +
>> +	ents = __dma_map_sg_attrs(dev, sg, nents, dir, attrs);
>>   	BUG_ON(ents < 0);
>>   	debug_dma_map_sg(dev, sg, nents, ents, dir);
>>   
>> @@ -204,6 +214,36 @@ int dma_map_sg_attrs(struct device *dev, struct scatterlist *sg, int nents,
>>   }
>>   EXPORT_SYMBOL(dma_map_sg_attrs);
>>   
>> +/*
>> + * like dma_map_sg_attrs, but returns a negative errno on error (and > 0
>> + * on success). This function must be used if PCI P2PDMA pages might
>> + * be in the scatterlist.
> 
> Let's turn this into a kernel doc comment block, seeing as how it clearly
> wants to be--you're almost there already. You've even reinvented @Return,
> below. :)

Just trying to follow the convention in this file. But I can make it a
kernel doc.

>> + *
>> + * On error this function may return:
>> + *    -ENOMEM indicating that there was not enough resources available and
>> + *      the transfer may be retried later
>> + *    -EREMOTEIO indicating that P2PDMA pages were included but cannot
>> + *      be mapped by the specified device, retries will always fail
>> + *
>> + * The scatterlist should be unmapped with the regular dma_unmap_sg[_attrs]().
> 
> How about:
> 
> "The scatterlist should be unmapped via dma_unmap_sg[_attrs]()."

Ok

Logan


* Re: [PATCH 07/16] PCI/P2PDMA: Make pci_p2pdma_map_type() non-static
  2021-05-02 22:44   ` John Hubbard
@ 2021-05-03 16:39     ` Logan Gunthorpe
  0 siblings, 0 replies; 99+ messages in thread
From: Logan Gunthorpe @ 2021-05-03 16:39 UTC (permalink / raw)
  To: John Hubbard, linux-kernel, linux-nvme, linux-block, linux-pci,
	linux-mm, iommu
  Cc: Stephen Bates, Christoph Hellwig, Dan Williams, Jason Gunthorpe,
	Christian König, Don Dutile, Matthew Wilcox, Daniel Vetter,
	Jakowski Andrzej, Minturn Dave B, Jason Ekstrand, Dave Hansen,
	Xiong Jianxin, Bjorn Helgaas, Ira Weiny, Robin Murphy



On 2021-05-02 4:44 p.m., John Hubbard wrote:
> On 4/8/21 10:01 AM, Logan Gunthorpe wrote:
>> pci_p2pdma_map_type() will be needed by the dma-iommu map_sg
>> implementation because it will need to determine the mapping type
>> ahead of actually doing the mapping to create the actual iommu mapping.
>>
>> Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
>> ---
>>   drivers/pci/p2pdma.c       | 34 +++++++++++++++++++++++-----------
>>   include/linux/pci-p2pdma.h | 15 +++++++++++++++
>>   2 files changed, 38 insertions(+), 11 deletions(-)
>>
>> diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c
>> index bcb1a6d6119d..38c93f57a941 100644
>> --- a/drivers/pci/p2pdma.c
>> +++ b/drivers/pci/p2pdma.c
>> @@ -20,13 +20,6 @@
>>   #include <linux/seq_buf.h>
>>   #include <linux/xarray.h>
>>   
>> -enum pci_p2pdma_map_type {
>> -	PCI_P2PDMA_MAP_UNKNOWN = 0,
>> -	PCI_P2PDMA_MAP_NOT_SUPPORTED,
>> -	PCI_P2PDMA_MAP_BUS_ADDR,
>> -	PCI_P2PDMA_MAP_THRU_HOST_BRIDGE,
>> -};
>> -
>>   struct pci_p2pdma {
>>   	struct gen_pool *pool;
>>   	bool p2pmem_published;
>> @@ -822,13 +815,30 @@ void pci_p2pmem_publish(struct pci_dev *pdev, bool publish)
>>   }
>>   EXPORT_SYMBOL_GPL(pci_p2pmem_publish);
>>   
>> -static enum pci_p2pdma_map_type pci_p2pdma_map_type(struct dev_pagemap *pgmap,
>> -						    struct device *dev)
>> +/**
>> + * pci_p2pdma_map_type - return the type of mapping that should be used for
>> + *	a given device and pgmap
>> + * @pgmap: the pagemap of a page to determine the mapping type for
>> + * @dev: device that is mapping the page
>> + * @dma_attrs: the attributes passed to the dma_map operation --
>> + *	this is so they can be checked to ensure P2PDMA pages were not
>> + *	introduced into an incorrect interface (like dma_map_sg). *
>> + *
>> + * Returns one of:
>> + *	PCI_P2PDMA_MAP_NOT_SUPPORTED - The mapping should not be done
>> + *	PCI_P2PDMA_MAP_BUS_ADDR - The mapping should use the PCI bus address
>> + *	PCI_P2PDMA_MAP_THRU_HOST_BRIDGE - The mapping should be done directly
>> + */
>> +enum pci_p2pdma_map_type pci_p2pdma_map_type(struct dev_pagemap *pgmap,
>> +		struct device *dev, unsigned long dma_attrs)
>>   {
>>   	struct pci_dev *provider = to_p2p_pgmap(pgmap)->provider;
>>   	enum pci_p2pdma_map_type ret;
>>   	struct pci_dev *client;
>>   
>> +	WARN_ONCE(!(dma_attrs & __DMA_ATTR_PCI_P2PDMA),
>> +		  "PCI P2PDMA pages were mapped with dma_map_sg!");
> 
> This really ought to also return -EINVAL, assuming that my review suggestions
> about return types, in earlier patches, are acceptable.

That can't happen because, by convention, dma_map_sg() cannot return
-EINVAL. I think the best we can do is proceed normally and just warn
loudly.

Logan


* Re: [PATCH 09/16] dma-direct: Support PCI P2PDMA pages in dma-direct map_sg
  2021-05-02 23:28   ` John Hubbard
  2021-05-02 23:32     ` John Hubbard
@ 2021-05-03 16:55     ` Logan Gunthorpe
  2021-05-04  0:12       ` John Hubbard
  2021-05-03 17:04     ` Logan Gunthorpe
  2 siblings, 1 reply; 99+ messages in thread
From: Logan Gunthorpe @ 2021-05-03 16:55 UTC (permalink / raw)
  To: John Hubbard, linux-kernel, linux-nvme, linux-block, linux-pci,
	linux-mm, iommu
  Cc: Stephen Bates, Christoph Hellwig, Dan Williams, Jason Gunthorpe,
	Christian König, Don Dutile, Matthew Wilcox, Daniel Vetter,
	Jakowski Andrzej, Minturn Dave B, Jason Ekstrand, Dave Hansen,
	Xiong Jianxin, Bjorn Helgaas, Ira Weiny, Robin Murphy



On 2021-05-02 5:28 p.m., John Hubbard wrote:
>> @@ -387,19 +388,37 @@ void dma_direct_unmap_sg(struct device *dev, struct scatterlist *sgl,
> 
> This routine now deserves a little bit of commenting, now that it is
> doing less obvious things. How about something like this:
> 
> /*
>   * Unmaps pages, except for PCI_P2PDMA pages, which were never mapped in the
>   * first place. Instead of unmapping PCI_P2PDMA entries, simply remove the
>   * SG_PCI_P2PDMA mark
>   */
> void dma_direct_unmap_sg(struct device *dev, struct scatterlist *sgl,
> 		int nents, enum dma_data_direction dir, unsigned long attrs)
> {
> 

Ok.

>>   	struct scatterlist *sg;
>>   	int i;
>>   
>> -	for_each_sg(sgl, sg, nents, i)
>> +	for_each_sg(sgl, sg, nents, i) {
>> +		if (sg_is_pci_p2pdma(sg)) {
>> +			sg_unmark_pci_p2pdma(sg);
>> +			continue;
>> +		}
>> +
>>   		dma_direct_unmap_page(dev, sg->dma_address, sg_dma_len(sg), dir,
>>   			     attrs);
>> +	}
> 
> The same thing can be achieved with fewer lines and a bit more clarity.
> Can we please do it like this instead:
> 
> 	for_each_sg(sgl, sg, nents, i) {
> 		if (sg_is_pci_p2pdma(sg))
> 			sg_unmark_pci_p2pdma(sg);
> 		else
> 			dma_direct_unmap_page(dev, sg->dma_address,
> 					      sg_dma_len(sg), dir, attrs);
> 	}
> 
> 

That's debatable (the way I did it emphasizes the common case). But I'll
consider changing it.

> 
> Also here, a block comment for the function would be nice. How about
> approximately this:
> 
> /*
>   * Maps each SG segment. Returns the number of entries mapped, or 0 upon
>   * failure. If any entry could not be mapped, then no entries are mapped.
>   */
> 
> I'll stop complaining about the pre-existing return code conventions,
> since by now you know what I was thinking of saying. :)

Not really part of this patchset... Seems like if you think there should
be a comment like that here, you should send a patch. But this patch
starts returning a negative value here.

>>   int dma_direct_map_sg(struct device *dev, struct scatterlist *sgl, int nents,
>>   		enum dma_data_direction dir, unsigned long attrs)
>>   {
>> -	int i;
>> +	struct pci_p2pdma_map_state p2pdma_state = {};
> 
> Is it worth putting this stuff on the stack--is there a noticeable
> performance improvement from caching the state? Because if it's
> invisible, then simplicity is better. I suspect you're right, and that
> it *is* worth it, but it's good to know for real.
> 
>>   	struct scatterlist *sg;
>> +	int i, ret = 0;
>>   
>>   	for_each_sg(sgl, sg, nents, i) {
>> +		if (is_pci_p2pdma_page(sg_page(sg))) {
>> +			ret = pci_p2pdma_map_segment(&p2pdma_state, dev, sg,
>> +						     attrs);
>> +			if (ret < 0) {
>> +				goto out_unmap;
>> +			} else if (ret) {
>> +				ret = 0;
>> +				continue;
> 
> Is this a bug? If neither of those "if" branches fires (ret == 0), then
> the code (probably unintentionally) falls through and continues on to
> attempt to call dma_direct_map_page()--despite being a PCI_P2PDMA page!

No, it's not a bug. Per the documentation of pci_p2pdma_map_segment(),
if it returns zero the segment should be mapped normally. P2PDMA pages
must be mapped with physical addresses (or IOVA addresses) if the TLPs
for the transaction will go through the host bridge.

> See below for suggestions:
> 
>> +			}
>> +		}
>> +
>>   		sg->dma_address = dma_direct_map_page(dev, sg_page(sg),
>>   				sg->offset, sg->length, dir, attrs);
>>   		if (sg->dma_address == DMA_MAPPING_ERROR)
> 
> This is another case in which "continue" is misleading and not as good
> as "else". Because unless I'm wrong above, you really only want to take
> one path *or* the other.

No, per above, it's not one path or the other. If it's a P2PDMA page it
may still need to be mapped normally.

> Also, the "else if (ret)" can be simplified to just setting ret = 0
> unconditionally.

I don't follow. If ret is set, we need to unset it before the end of the
loop.

> Given all that, here's a suggested alternative, which is both shorter
> and clearer, IMHO:
> 
> 	for_each_sg(sgl, sg, nents, i) {
> 		if (is_pci_p2pdma_page(sg_page(sg))) {
> 			ret = pci_p2pdma_map_segment(&p2pdma_state, dev, sg,
> 						     attrs);
> 			if (ret < 0)
> 				goto out_unmap;
> 			else
> 				ret = 0;
> 		} else {
> 			sg->dma_address = dma_direct_map_page(dev, sg_page(sg),
> 					sg->offset, sg->length, dir, attrs);
> 			if (sg->dma_address == DMA_MAPPING_ERROR)
> 				goto out_unmap;
> 			sg_dma_len(sg) = sg->length;
> 		}
> 	}

No, per the comments above, this does not accomplish the same thing and
is not correct.

I'll try to add a comment to the code to make it clearer. But the
kernel doc on pci_p2pdma_map_segment() does explicitly mention what must
be done for the different return values.
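
The three-way handling discussed in this message can be sketched in
user-space C as follows. This is an illustration only: struct seg,
enum seg_kind, mock_p2p_map_segment() and map_segments() are
hypothetical, and the sketch returns on error instead of unwinding
earlier mappings the way the real code does with out_unmap:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#ifndef EREMOTEIO
#define EREMOTEIO 121	/* Linux value; fallback for other platforms */
#endif

enum seg_kind {
	SEG_NORMAL,		/* ordinary memory */
	SEG_P2P_SAME_SWITCH,	/* P2P page, bus address usable */
	SEG_P2P_VIA_HOST,	/* P2P page routed through the host bridge */
	SEG_P2P_UNREACHABLE,	/* P2P page the device cannot reach */
};

struct seg {
	enum seg_kind kind;
	int bus_mapped;		/* set when mapped with a bus address */
	int normal_mapped;	/* set when mapped through the normal path */
};

/*
 * Hypothetical per-segment helper:
 *   < 0   the segment cannot be mapped
 *   > 0   mapped with a bus address, skip the normal path
 *   == 0  still has to be mapped normally
 */
static int mock_p2p_map_segment(struct seg *s)
{
	switch (s->kind) {
	case SEG_P2P_SAME_SWITCH:
		s->bus_mapped = 1;
		return 1;
	case SEG_P2P_VIA_HOST:
		return 0;
	default:
		return -EREMOTEIO;
	}
}

/* Returns the number of segments mapped, or a negative errno. */
static int map_segments(struct seg *segs, size_t n)
{
	size_t i;
	int ret;

	assert(segs != NULL);
	for (i = 0; i < n; i++) {
		struct seg *s = &segs[i];

		if (s->kind != SEG_NORMAL) {
			ret = mock_p2p_map_segment(s);
			if (ret < 0)
				return ret;
			if (ret > 0)
				continue;
			/* ret == 0: fall through to the normal path */
		}
		s->normal_mapped = 1;
	}
	return (int)n;
}
```

The fall-through for a zero return is the case an if/else split would
lose: a P2P page routed through the host bridge takes the normal path.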

Logan


* Re: [PATCH 09/16] dma-direct: Support PCI P2PDMA pages in dma-direct map_sg
  2021-05-02 23:28   ` John Hubbard
  2021-05-02 23:32     ` John Hubbard
  2021-05-03 16:55     ` Logan Gunthorpe
@ 2021-05-03 17:04     ` Logan Gunthorpe
  2021-05-04  0:01       ` John Hubbard
  2 siblings, 1 reply; 99+ messages in thread
From: Logan Gunthorpe @ 2021-05-03 17:04 UTC (permalink / raw)
  To: John Hubbard, linux-kernel, linux-nvme, linux-block, linux-pci,
	linux-mm, iommu
  Cc: Stephen Bates, Christoph Hellwig, Dan Williams, Jason Gunthorpe,
	Christian König, Don Dutile, Matthew Wilcox, Daniel Vetter,
	Jakowski Andrzej, Minturn Dave B, Jason Ekstrand, Dave Hansen,
	Xiong Jianxin, Bjorn Helgaas, Ira Weiny, Robin Murphy

Oops missed a comment:

On 2021-05-02 5:28 p.m., John Hubbard wrote:
>>   int dma_direct_map_sg(struct device *dev, struct scatterlist *sgl, int nents,
>>   		enum dma_data_direction dir, unsigned long attrs)
>>   {
>> -	int i;
>> +	struct pci_p2pdma_map_state p2pdma_state = {};
> 
> Is it worth putting this stuff on the stack--is there a noticeable
> performance improvement from caching the state? Because if it's
> invisible, then simplicity is better. I suspect you're right, and that
> it *is* worth it, but it's good to know for real.

I haven't measured it (it would be hard to measure), but I think it's
fairly clear here. Without the state, xa_load() would need to be called
on *every* page in an SGL that maps only P2PDMA memory from one device.
With the state, it only needs to be called once. xa_load() is cheap, but
it is not that cheap.

There's essentially the same optimization in get_user_pages for
ZONE_DEVICE pages. So, if it is necessary there, it should be necessary
here.
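
To make the cost argument concrete, here is a small userspace sketch (not
kernel code) of the caching pattern. The lookup function is a hypothetical
stand-in for the xa_load()-based lookup, and every name ending in _model is
invented for the example. It counts how many lookups a walk over a list of
pages needs when the last result is cached in a state structure.

```c
#include <assert.h>
#include <stddef.h>

static int lookup_count_model;	/* counts the "expensive" lookups */

/* Hypothetical stand-in for the per-pgmap map type lookup. */
static int lookup_map_type_model(int pgmap_id)
{
	lookup_count_model++;
	return pgmap_id % 2;	/* arbitrary result, only for the model */
}

struct map_state_model {
	int valid;	/* zero-initialized state means "nothing cached" */
	int pgmap_id;
	int map_type;
};

static int cached_map_type_model(struct map_state_model *state, int pgmap_id)
{
	/* Only look up again when the page belongs to a different pgmap. */
	if (!state->valid || state->pgmap_id != pgmap_id) {
		state->valid = 1;
		state->pgmap_id = pgmap_id;
		state->map_type = lookup_map_type_model(pgmap_id);
	}
	return state->map_type;
}

/* Walk n pages and return how many lookups were needed. */
static int count_lookups_model(const int *page_pgmaps, size_t n)
{
	struct map_state_model state = {0};

	lookup_count_model = 0;
	for (size_t i = 0; i < n; i++)
		(void)cached_map_type_model(&state, page_pgmaps[i]);
	return lookup_count_model;
}
```

With all pages coming from one pgmap the model performs a single lookup
regardless of the list length; it only looks up again when the pgmap changes
between consecutive pages.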

Logan


* Re: [PATCH 09/16] dma-direct: Support PCI P2PDMA pages in dma-direct map_sg
  2021-05-02 23:32     ` John Hubbard
@ 2021-05-03 17:06       ` Logan Gunthorpe
  0 siblings, 0 replies; 99+ messages in thread
From: Logan Gunthorpe @ 2021-05-03 17:06 UTC (permalink / raw)
  To: John Hubbard, linux-kernel, linux-nvme, linux-block, linux-pci,
	linux-mm, iommu
  Cc: Stephen Bates, Christoph Hellwig, Dan Williams, Jason Gunthorpe,
	Christian König, Don Dutile, Matthew Wilcox, Daniel Vetter,
	Jakowski Andrzej, Minturn Dave B, Jason Ekstrand, Dave Hansen,
	Xiong Jianxin, Bjorn Helgaas, Ira Weiny, Robin Murphy



On 2021-05-02 5:32 p.m., John Hubbard wrote:
> On 5/2/21 4:28 PM, John Hubbard wrote:
>> On 4/8/21 10:01 AM, Logan Gunthorpe wrote:
> ...
>>> @@ -387,19 +388,37 @@ void dma_direct_unmap_sg(struct device *dev, struct scatterlist *sgl,
>>
>> This routine now deserves a little bit of commenting, now that it is
>> doing less obvious things. How about something like this:
>>
>> /*
>> * Unmaps pages, except for PCI_P2PDMA pages, which were never mapped in the
>> * first place. Instead of unmapping PCI_P2PDMA entries, simply remove the
>> * SG_PCI_P2PDMA mark
>> */
> 
> I got that kind of wrong. They *were* mapped, but need to be left mostly
> alone...maybe you can word it better. Here's my second draft:
> 
> /*
>   * Unmaps pages, except for PCI_P2PDMA pages, which should not be unmapped at
>   * this point. Instead of unmapping PCI_P2PDMA entries, simply remove the
>   * SG_PCI_P2PDMA mark.
>   */
> 
> ...am I getting close? :)

I don't think your original comment was wrong per se. But I guess it
depends on your definition of "mapped". In dma-direct the physical
address is added to the SGL and, on some arches, the address has to be
synced on unmap. With P2PDMA, the PCI bus address is sometimes added to
the SGL and no sync is necessary at the end.

Logan



* Re: [PATCH 10/16] dma-mapping: Add flags to dma_map_ops to indicate PCI P2PDMA support
  2021-05-03  0:32   ` John Hubbard
@ 2021-05-03 17:09     ` Logan Gunthorpe
  0 siblings, 0 replies; 99+ messages in thread
From: Logan Gunthorpe @ 2021-05-03 17:09 UTC (permalink / raw)
  To: John Hubbard, linux-kernel, linux-nvme, linux-block, linux-pci,
	linux-mm, iommu
  Cc: Stephen Bates, Christoph Hellwig, Dan Williams, Jason Gunthorpe,
	Christian König, Don Dutile, Matthew Wilcox, Daniel Vetter,
	Jakowski Andrzej, Minturn Dave B, Jason Ekstrand, Dave Hansen,
	Xiong Jianxin, Bjorn Helgaas, Ira Weiny, Robin Murphy



On 2021-05-02 6:32 p.m., John Hubbard wrote:
> On 4/8/21 10:01 AM, Logan Gunthorpe wrote:
>> Add a flags member to the dma_map_ops structure with one flag to
>> indicate support for PCI P2PDMA.
>>
>> Also, add a helper to check if a device supports PCI P2PDMA.
>>
>> Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
>> ---
>>   include/linux/dma-map-ops.h |  3 +++
>>   include/linux/dma-mapping.h |  5 +++++
>>   kernel/dma/mapping.c        | 18 ++++++++++++++++++
>>   3 files changed, 26 insertions(+)
>>
>> diff --git a/include/linux/dma-map-ops.h b/include/linux/dma-map-ops.h
>> index 51872e736e7b..481892822104 100644
>> --- a/include/linux/dma-map-ops.h
>> +++ b/include/linux/dma-map-ops.h
>> @@ -12,6 +12,9 @@
>>   struct cma;
>>   
>>   struct dma_map_ops {
>> +	unsigned int flags;
>> +#define DMA_F_PCI_P2PDMA_SUPPORTED     (1 << 0)
>> +
> 
> Can we move this up and out of the struct member area, so that it looks
> more like this:
> 
> /*
>   * Values for struct dma_map_ops.flags:
>   *
>   * DMA_F_PCI_P2PDMA_SUPPORTED: <documentation here...this is a good place to
>   * explain exactly what this flag is for.>
>   */
> #define DMA_F_PCI_P2PDMA_SUPPORTED     (1 << 0)
> 
> struct dma_map_ops {
> 	unsigned int flags;
> 

Sure, I don't care that much. I was just following the style in nvme.h.

>>   	void *(*alloc)(struct device *dev, size_t size,
>>   			dma_addr_t *dma_handle, gfp_t gfp,
>>   			unsigned long attrs);
>> diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
>> index 50b8f586cf59..c31980ecca62 100644
>> --- a/include/linux/dma-mapping.h
>> +++ b/include/linux/dma-mapping.h
>> @@ -146,6 +146,7 @@ int dma_mmap_attrs(struct device *dev, struct vm_area_struct *vma,
>>   		unsigned long attrs);
>>   bool dma_can_mmap(struct device *dev);
>>   int dma_supported(struct device *dev, u64 mask);
>> +bool dma_pci_p2pdma_supported(struct device *dev);
>>   int dma_set_mask(struct device *dev, u64 mask);
>>   int dma_set_coherent_mask(struct device *dev, u64 mask);
>>   u64 dma_get_required_mask(struct device *dev);
>> @@ -247,6 +248,10 @@ static inline int dma_supported(struct device *dev, u64 mask)
>>   {
>>   	return 0;
>>   }
>> +static inline bool dma_pci_p2pdma_supported(struct device *dev)
>> +{
>> +	return 0;
> 
> Should be:
> 	
> 	return false;

Yup, will fix.

>> +}
>>   static inline int dma_set_mask(struct device *dev, u64 mask)
>>   {
>>   	return -EIO;
>> diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
>> index 923089c4267b..ce44a0fcc4e5 100644
>> --- a/kernel/dma/mapping.c
>> +++ b/kernel/dma/mapping.c
>> @@ -573,6 +573,24 @@ int dma_supported(struct device *dev, u64 mask)
>>   }
>>   EXPORT_SYMBOL(dma_supported);
>>   
>> +bool dma_pci_p2pdma_supported(struct device *dev)
>> +{
>> +	const struct dma_map_ops *ops = get_dma_ops(dev);
>> +
>> +	/* if ops is not set, dma direct will be used which supports P2PDMA */
>> +	if (!ops)
>> +		return true;
>> +
>> +	/*
>> +	 * Note: dma_ops_bypass is not checked here because P2PDMA should
>> +	 * not be used with dma mapping ops that do not have support even
>> +	 * if the specific device is bypassing them.
>> +	 */
>> +
>> +	return ops->flags & DMA_F_PCI_P2PDMA_SUPPORTED;
> 
> Wow, rather unusual combination of things in order to decide this. It feels
> a bit over-complicated to have flags and ops and a bool function all
> dealing with the same 1-bit answer, but there is no caller shown here,
> so I'll have to come back to this after reviewing subsequent patches.

Yeah, I originally had it much simpler, but it confused Ira and it was
clear it had to be written more explicitly and commented better.
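
For readers following along, the decision made by the quoted helper can be
modelled in userspace roughly as follows. This is only a sketch: the struct,
the flag name and the function name are hypothetical stand-ins for
dma_map_ops, DMA_F_PCI_P2PDMA_SUPPORTED and dma_pci_p2pdma_supported().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DMA_F_P2PDMA_MODEL (1u << 0)	/* hypothetical flag bit */

struct dma_ops_model {
	unsigned int flags;
};

/*
 * ops == NULL models "no dma_map_ops set", i.e. the direct mapping path,
 * which the patch treats as supporting P2PDMA. Otherwise the answer comes
 * from the flag in the ops structure.
 */
static bool p2pdma_supported_model(const struct dma_ops_model *ops)
{
	if (!ops)
		return true;
	return (ops->flags & DMA_F_P2PDMA_MODEL) != 0;
}
```

The explicit comparison against zero keeps the bool conversion obvious even
if the flag ever moves to a higher bit.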

Logan


* Re: [PATCH 08/16] PCI/P2PDMA: Introduce helpers for dma_map_sg implementations
  2021-05-03  0:50   ` John Hubbard
@ 2021-05-03 17:15     ` Logan Gunthorpe
  0 siblings, 0 replies; 99+ messages in thread
From: Logan Gunthorpe @ 2021-05-03 17:15 UTC (permalink / raw)
  To: John Hubbard, linux-kernel, linux-nvme, linux-block, linux-pci,
	linux-mm, iommu
  Cc: Stephen Bates, Christoph Hellwig, Dan Williams, Jason Gunthorpe,
	Christian König, Don Dutile, Matthew Wilcox, Daniel Vetter,
	Jakowski Andrzej, Minturn Dave B, Jason Ekstrand, Dave Hansen,
	Xiong Jianxin, Bjorn Helgaas, Ira Weiny, Robin Murphy



On 2021-05-02 6:50 p.m., John Hubbard wrote:
> On 4/8/21 10:01 AM, Logan Gunthorpe wrote:
>> Add pci_p2pdma_map_segment() as a helper for simple dma_map_sg()
>> implementations. It takes an scatterlist segment that must point to a
>> pci_p2pdma struct page and will map it if the mapping requires a bus
>> address.
>>
>> The return value indicates whether the mapping required a bus address
>> or whether the caller still needs to map the segment normally. If the
>> segment should not be mapped, -EREMOTEIO is returned.
>>
>> This helper uses a state structure to track the changes to the
>> pgmap across calls and avoid needing to lookup into the xarray for
>> every page.
>>
> 
> OK, coming back to this patch, after seeing how it is used later in
> the series...
> 
>> Also add pci_p2pdma_map_bus_segment() which is useful for IOMMU
>> dma_map_sg() implementations where the sg segment containing the page
>> differs from the sg segment containing the DMA address.
>>
>> Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
>> ---
>>   drivers/pci/p2pdma.c       | 65 ++++++++++++++++++++++++++++++++++++++
>>   include/linux/pci-p2pdma.h | 21 ++++++++++++
>>   2 files changed, 86 insertions(+)
>>
>> diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c
>> index 38c93f57a941..44ad7664e875 100644
>> --- a/drivers/pci/p2pdma.c
>> +++ b/drivers/pci/p2pdma.c
>> @@ -923,6 +923,71 @@ void pci_p2pdma_unmap_sg_attrs(struct device *dev, struct scatterlist *sg,
>>   }
>>   EXPORT_SYMBOL_GPL(pci_p2pdma_unmap_sg_attrs);
>>   
>> +/**
>> + * pci_p2pdma_map_segment - map an sg segment determining the mapping type
>> + * @state: State structure that should be declared on the stack outside of
>> + *	the for_each_sg() loop and initialized to zero.
> 
> Silly fine point for the docs here: it doesn't actually have to be on
> the stack, so I don't think you need to write that constraint in the
> documentation. It just has to be somehow allocated and zeroed.

Yeah, that's true, but there's really no reason it would ever not be
allocated on the stack.

> 
>> + * @dev: DMA device that's doing the mapping operation
>> + * @sg: scatterlist segment to map
>> + * @attrs: dma mapping attributes
>> + *
>> + * This is a helper to be used by non-iommu dma_map_sg() implementations where
>> + * the sg segment is the same for the page_link and the dma_address.
>> + *
>> + * Attempt to map a single segment in an SGL with the PCI bus address.
>> + * The segment must point to a PCI P2PDMA page and thus must be
>> + * wrapped in a is_pci_p2pdma_page(sg_page(sg)) check.
> 
> Should this be backed up with actual checks in the function, that
> the prerequisites are met?

I think that would be unnecessary. All callers are going to call this
inside an is_pci_p2pdma_page() check, otherwise it would slow down the
fast path.

>> + *
>> + * Returns 1 if the segment was mapped, 0 if the segment should be mapped
>> + * directly (or through the IOMMU) and -EREMOTEIO if the segment should not
>> + * be mapped at all.
>> + */
>> +int pci_p2pdma_map_segment(struct pci_p2pdma_map_state *state,
>> +			   struct device *dev, struct scatterlist *sg,
>> +			   unsigned long dma_attrs)
>> +{
>> +	if (state->pgmap != sg_page(sg)->pgmap) {
>> +		state->pgmap = sg_page(sg)->pgmap;
>> +		state->map = pci_p2pdma_map_type(state->pgmap, dev, dma_attrs);
>> +		state->bus_off = to_p2p_pgmap(state->pgmap)->bus_offset;
>> +	}
> 
> I'll quote myself from patch 9, because I had a comment there that actually
> was meant for this patch:
> 
> Is it worth putting this stuff on the caller's stack? I mean, is there a
> noticeable performance improvement from caching the state? Because if
> it's invisible, then simplicity is better. I suspect you're right, and
> that it *is* worth it, but it's good to know for real.

Yeah, I responded to this in another email.

> 
>> +
>> +	switch (state->map) {
>> +	case PCI_P2PDMA_MAP_BUS_ADDR:
>> +		sg->dma_address = sg_phys(sg) + state->bus_off;
>> +		sg_dma_len(sg) = sg->length;
>> +		sg_mark_pci_p2pdma(sg);
>> +		return 1;
>> +	case PCI_P2PDMA_MAP_THRU_HOST_BRIDGE:
>> +		return 0;
>> +	default:
>> +		return -EREMOTEIO;
>> +	}
>> +}
>> +
>> +/**
>> + * pci_p2pdma_map_bus_segment - map an sg segment pre determined to
>> + *	be mapped with PCI_P2PDMA_MAP_BUS_ADDR
> 
> Or:
> 
>   * pci_p2pdma_map_bus_segment - map an SG segment that is already known
>   * to be mapped with PCI_P2PDMA_MAP_BUS_ADDR
> 
> Also, should that prerequisite be backed up with checks in the function?

No, this function only really exists for the needs of iommu_dma_map_sg().

>> + * @pg_sg: scatterlist segment with the page to map
>> + * @dma_sg: scatterlist segment to assign a dma address to
>> + *
>> + * This is a helper for iommu dma_map_sg() implementations when the
>> + * segment for the dma address differs from the segment containing the
>> + * source page.
>> + *
>> + * pci_p2pdma_map_type() must have already been called on the pg_sg and
>> + * returned PCI_P2PDMA_MAP_BUS_ADDR.
> 
> Another prerequisite, so same question: do you think that the code should
> also check that this prerequisite is met?

Again, no. It is this way simply because of what iommu_dma requires.

Logan


* Re: [PATCH 12/16] nvme-pci: Check DMA ops when indicating support for PCI P2PDMA
  2021-05-03  1:29   ` John Hubbard
@ 2021-05-03 17:17     ` Logan Gunthorpe
  2021-05-04  0:17       ` John Hubbard
  0 siblings, 1 reply; 99+ messages in thread
From: Logan Gunthorpe @ 2021-05-03 17:17 UTC (permalink / raw)
  To: John Hubbard, linux-kernel, linux-nvme, linux-block, linux-pci,
	linux-mm, iommu
  Cc: Stephen Bates, Christoph Hellwig, Dan Williams, Jason Gunthorpe,
	Christian König, Don Dutile, Matthew Wilcox, Daniel Vetter,
	Jakowski Andrzej, Minturn Dave B, Jason Ekstrand, Dave Hansen,
	Xiong Jianxin, Bjorn Helgaas, Ira Weiny, Robin Murphy



On 2021-05-02 7:29 p.m., John Hubbard wrote:
> On 4/8/21 10:01 AM, Logan Gunthorpe wrote:
>> Introduce a supports_pci_p2pdma() operation in nvme_ctrl_ops to
>> replace the fixed NVME_F_PCI_P2PDMA flag such that the dma_map_ops
>> flags can be checked for PCI P2PDMA support.
>>
>> Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
>> ---
>>   drivers/nvme/host/core.c |  3 ++-
>>   drivers/nvme/host/nvme.h |  2 +-
>>   drivers/nvme/host/pci.c  | 11 +++++++++--
>>   3 files changed, 12 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
>> index 0896e21642be..223419454516 100644
>> --- a/drivers/nvme/host/core.c
>> +++ b/drivers/nvme/host/core.c
>> @@ -3907,7 +3907,8 @@ static void nvme_alloc_ns(struct nvme_ctrl *ctrl, unsigned nsid,
>>   		blk_queue_flag_set(QUEUE_FLAG_STABLE_WRITES, ns->queue);
>>   
>>   	blk_queue_flag_set(QUEUE_FLAG_NONROT, ns->queue);
>> -	if (ctrl->ops->flags & NVME_F_PCI_P2PDMA)
>> +	if (ctrl->ops->supports_pci_p2pdma &&
>> +	    ctrl->ops->supports_pci_p2pdma(ctrl))
> 
> This is a little excessive, as I suspected. How about providing a
> default .supports_pci_p2pdma routine that returns false, so that
> the op is always available (non-null)? By "default", maybe that
> means either requiring an init_the_ops_struct() routine to be
> used, and/or checking all the users of struct nvme_ctrl_ops.

Honestly that sounds much more messy to me than simply checking if it's
NULL before using it (which is a common, accepted pattern for ops).

> Another idea: maybe you don't really need a bool .supports_pci_p2pdma()
> routine at all, because the existing .flags really is about right.
> You just need the flags to be filled in dynamically. So, do that
> during nvme_pci setup/init time: that's when this module would call
> dma_pci_p2pdma_supported().

If the flag is filled in dynamically, then the ops struct would have to
be non-constant. Ops structs should be constant for security reasons.
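
A userspace sketch of the pattern being defended here, with an optional
callback in a constant ops structure and a NULL check at the call site. All
types and names below are hypothetical models, not the nvme structures. The
point is that the per-device answer can vary at run time while the ops
tables themselves stay const.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct ctrl_model;

struct ctrl_ops_model {
	const char *name;
	/* Optional: transports without P2PDMA support leave this NULL. */
	bool (*supports_p2pdma)(const struct ctrl_model *ctrl);
};

struct ctrl_model {
	const struct ctrl_ops_model *ops;
	bool dma_ops_support_p2pdma;	/* per-device, known only at run time */
};

static bool pci_supports_p2pdma_model(const struct ctrl_model *ctrl)
{
	return ctrl->dma_ops_support_p2pdma;
}

/* Both ops tables are const; nothing is filled in dynamically. */
static const struct ctrl_ops_model pci_ops_model = {
	.name = "pcie",
	.supports_p2pdma = pci_supports_p2pdma_model,
};

static const struct ctrl_ops_model fabrics_ops_model = {
	.name = "fabrics",
	/* .supports_p2pdma intentionally left NULL */
};

static bool ctrl_supports_p2pdma_model(const struct ctrl_model *ctrl)
{
	return ctrl->ops->supports_p2pdma &&
	       ctrl->ops->supports_p2pdma(ctrl);
}
```

The alternative of writing the result into a flags field would require a
writable ops structure per device, which is what the reply above objects to.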

Logan


* Re: [PATCH 13/16] nvme-pci: Convert to using dma_map_sg_p2pdma for p2pdma pages
  2021-05-03  1:34   ` John Hubbard
@ 2021-05-03 17:19     ` Logan Gunthorpe
  2021-05-04  0:26       ` John Hubbard
  0 siblings, 1 reply; 99+ messages in thread
From: Logan Gunthorpe @ 2021-05-03 17:19 UTC (permalink / raw)
  To: John Hubbard, linux-kernel, linux-nvme, linux-block, linux-pci,
	linux-mm, iommu
  Cc: Stephen Bates, Christoph Hellwig, Dan Williams, Jason Gunthorpe,
	Christian König, Don Dutile, Matthew Wilcox, Daniel Vetter,
	Jakowski Andrzej, Minturn Dave B, Jason Ekstrand, Dave Hansen,
	Xiong Jianxin, Bjorn Helgaas, Ira Weiny, Robin Murphy



On 2021-05-02 7:34 p.m., John Hubbard wrote:
>>   	if (iod->npages == 0)
>>   		dma_pool_free(dev->prp_small_pool, nvme_pci_iod_list(req)[0],
>>   			      iod->first_dma);
>> @@ -868,14 +857,13 @@ static blk_status_t nvme_map_data(struct nvme_dev *dev, struct request *req,
>>   	if (!iod->nents)
>>   		goto out_free_sg;
>>   
>> -	if (is_pci_p2pdma_page(sg_page(iod->sg)))
>> -		nr_mapped = pci_p2pdma_map_sg_attrs(dev->dev, iod->sg,
>> -				iod->nents, rq_dma_dir(req), DMA_ATTR_NO_WARN);
>> -	else
>> -		nr_mapped = dma_map_sg_attrs(dev->dev, iod->sg, iod->nents,
>> -					     rq_dma_dir(req), DMA_ATTR_NO_WARN);
>> -	if (!nr_mapped)
>> +	nr_mapped = dma_map_sg_p2pdma_attrs(dev->dev, iod->sg, iod->nents,
>> +				     rq_dma_dir(req), DMA_ATTR_NO_WARN);
>> +	if (nr_mapped < 0) {
>> +		if (nr_mapped != -ENOMEM)
>> +			ret = BLK_STS_TARGET;
>>   		goto out_free_sg;
>> +	}
> 
> But now the "nr_mapped == 0" case is no longer doing an early out_free_sg.
> Is that OK?

dma_map_sg_p2pdma_attrs() never returns zero. It will return -ENOMEM in
the same situation and results in the same goto out_free_sg.
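
The error handling in the quoted hunk can be modelled in userspace as shown
below. This is a sketch under assumptions: the status enum and function name
are hypothetical, MODEL_EREMOTEIO is a placeholder for the kernel's
-EREMOTEIO, and the "resource" default for -ENOMEM is assumed because the
quoted hunk does not show the initial value of ret.

```c
#include <assert.h>
#include <errno.h>

#define MODEL_EREMOTEIO 121	/* placeholder value, assumption for the sketch */

/* Hypothetical stand-ins for block layer status codes. */
enum blk_status_model { STS_OK_MODEL, STS_RESOURCE_MODEL, STS_TARGET_MODEL };

/*
 * Models a map call that returns either a positive segment count or a
 * negative errno, and never zero.
 */
static enum blk_status_model map_result_to_status_model(int nr_mapped)
{
	assert(nr_mapped != 0);	/* zero is not a valid result in this model */

	if (nr_mapped < 0) {
		if (nr_mapped != -ENOMEM)
			return STS_TARGET_MODEL;	/* e.g. unsupported mapping */
		return STS_RESOURCE_MODEL;		/* assumed default */
	}
	return STS_OK_MODEL;
}
```

Under this convention the old "!nr_mapped" check has no case left to catch,
since every failure arrives as a negative value.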

Logan


* Re: [PATCH 01/16] PCI/P2PDMA: Pass gfp_mask flags to upstream_bridge_distance_warn()
  2021-05-03 15:57     ` Logan Gunthorpe
@ 2021-05-03 18:17       ` John Hubbard
  2021-05-03 18:20         ` Logan Gunthorpe
  2021-05-03 18:24         ` Christoph Hellwig
  0 siblings, 2 replies; 99+ messages in thread
From: John Hubbard @ 2021-05-03 18:17 UTC (permalink / raw)
  To: Logan Gunthorpe, linux-kernel, linux-nvme, linux-block,
	linux-pci, linux-mm, iommu
  Cc: Stephen Bates, Christoph Hellwig, Dan Williams, Jason Gunthorpe,
	Christian König, Don Dutile, Matthew Wilcox, Daniel Vetter,
	Jakowski Andrzej, Minturn Dave B, Jason Ekstrand, Dave Hansen,
	Xiong Jianxin, Bjorn Helgaas, Ira Weiny, Robin Murphy,
	Bjorn Helgaas

On 5/3/21 8:57 AM, Logan Gunthorpe wrote:
> 
> 
> On 2021-05-01 9:58 p.m., John Hubbard wrote:
>> Another odd thing: this used to check for memory failure and just give
>> up, and now it doesn't. Yes, I realize that it all still works at the
>> moment, but this is quirky and we shouldn't stop here.
>>
>> Instead, a cleaner approach would be to push the memory allocation
>> slightly higher up the call stack, out to the
>> pci_p2pdma_distance_many(). So pci_p2pdma_distance_many() should make
>> the kmalloc() call, and fail out if it can't get a page for the seq_buf
>> buffer. Then you don't have to do all this odd stuff.
> 
> I don't really agree with this assessment. If kmalloc fails to
> initialize the seq_buf() (which should be very rare), the only thing
> that is lost is the one warning print that tells the user the command
> line parameter needed to disable ACS. Everything else works fine,
> nothing else can fail. I don't see the need to add extra complexity just
> so the code errors out in no-mem instead of just skipping the one,
> slightly more informative, warning line.

That's the thing: memory failure should be exceedingly rare for this.
Therefore, just fail out entirely (which I don't expect we'll likely
ever see), instead of doing all this weird stuff to try to continue
on if you cannot allocate a single page. If you are in that case, the
system is not in a state that is going to run your dma p2p setup well
anyway.

I think it's *less* complexity to allocate up front, fail early if
allocation fails, and then not have to deal with these really odd
quirks at the lower levels.

> 
> Also, keep in mind the result of all these functions are cached so it
> only ever happens once. So for this to matter, the user would have to do
> their first transaction between two devices exactly at the time memory
> allocations would fail.
> 
> 
>> Furthermore, the call sites can then decide for themselves which GFP
>> flags, GFP_ATOMIC or GFP_KERNEL or whatever they want for kmalloc().
>>
>> A related thing: this whole exercise would go better if there were a
>> preparatory patch or two that changed the return codes in this file to
>> something less crazy. There are too many functions that can fail, but
>> are treated as if they sort-of-mostly-would-never-fail, in the hopes of
>> using the return value directly for counting and such. This is badly
>> mistaken, and it leads developers to try to avoid returning -ENOMEM
>> (which is what we need here).
> 
> Hmm? Which functions can fail? and how?
> 

Let's defer that to the other patches, I was sort of looking ahead to
those, sorry.

thanks,
-- 
John Hubbard
NVIDIA


* Re: [PATCH 02/16] PCI/P2PDMA: Avoid pci_get_slot() which sleeps
  2021-05-03 16:08     ` Logan Gunthorpe
@ 2021-05-03 18:20       ` John Hubbard
  2021-05-03 18:25       ` Christoph Hellwig
  1 sibling, 0 replies; 99+ messages in thread
From: John Hubbard @ 2021-05-03 18:20 UTC (permalink / raw)
  To: Logan Gunthorpe, linux-kernel, linux-nvme, linux-block,
	linux-pci, linux-mm, iommu
  Cc: Stephen Bates, Christoph Hellwig, Dan Williams, Jason Gunthorpe,
	Christian König, Don Dutile, Matthew Wilcox, Daniel Vetter,
	Jakowski Andrzej, Minturn Dave B, Jason Ekstrand, Dave Hansen,
	Xiong Jianxin, Bjorn Helgaas, Ira Weiny, Robin Murphy

On 5/3/21 9:08 AM, Logan Gunthorpe wrote:
...
>> By the way, pre-existing code comment: pci_p2pdma_whitelist[] seems
>> really short. From a naive point of view, I'd expect that there must be
>> a lot more CPUs/chipsets that can do pci p2p, what do you think? I
>> wonder if we have to be so super strict, anyway. It just seems extremely
>> limited, and I suspect there will be some additions to the list as soon
>> as we start to use this.
> 
> Yes, well unfortunately we have no other way to determine what host
> bridges can communicate with P2P. We settled on a whitelist when the
> series was first posted. Nobody likes that situation, but nobody has
> found anything better. We've been hoping standards bodies would give us
> a flag but I haven't heard anything about that. At least AMD has been
> able to guarantee us that all CPUs newer than Zen will support it, so that
> covers a large swath. It would be nice if we could say something similar
> for Intel.

Thanks for explaining the situation!

> 
>> OK, yes this avoids taking the pci_bus_sem, but it's kind of cheating.
>> Why is it OK to avoid taking any locks in order to retrieve the
>> first entry from the list, but in order to retrieve any other entry, you
>> have to acquire the pci_bus_sem, and get a reference as well? Something
>> is inconsistent there.
>>
>> The new version here also no longer takes a reference on the device,
>> which is also cheating. But I'm guessing that the unstated assumption
>> here is that there is always at least one entry in the list. But if
>> that's true, then it's better to show clearly that assumption, instead
>> of hiding it in an implicit call that skips both locking and reference
>> counting.
> 
> Because we hold a reference to a child device of the bus. So the host
> bus device can't go away until the child device has been released. An
> earlier version of the P2PDMA patchset had a lot more extraneous get
> device calls until someone else pointed this out.
> 
>> You could add a new function, which is a cut-down version of pci_get_slot(),
>> like this, and call this from __host_bridge_whitelist():
>>
>> /*
>>    * A special purpose variant of pci_get_slot() that doesn't take the pci_bus_sem
>>    * lock, and only looks for the 00.0 bus-device-function. Once the PCI bus is
>>    * up, it is safe to call this, because there will always be a top-level PCI
>>    * root device.
>>    *
>>    * Other assumptions: the root device is the first device in the list, and the
>>    * root device is numbered 00.0.
>>    */
>> struct pci_dev *pci_get_root_slot(struct pci_bus *bus)
>> {
>> 	struct pci_dev *root;
>> 	unsigned devfn = PCI_DEVFN(0, 0);
>>
>> 	root = list_first_entry_or_null(&bus->devices, struct pci_dev,
>> 					bus_list);
>> 	if (root && root->devfn == devfn)
>> 		goto out;
>>
>> 	root = NULL;
>>    out:
>> 	pci_dev_get(root);
>> 	return root;
>> }
>> EXPORT_SYMBOL(pci_get_root_slot);
>>
>> ...I think that's a lot clearer to the reader, about what's going on here.
> 
> Per above, I think the reference count is unnecessary. But I could wrap
> it in a static function for clarity. (There's no reason to export this
> function).
> 

Yes, please.


thanks,
-- 
John Hubbard
NVIDIA


* Re: [PATCH 01/16] PCI/P2PDMA: Pass gfp_mask flags to upstream_bridge_distance_warn()
  2021-05-03 18:17       ` John Hubbard
@ 2021-05-03 18:20         ` Logan Gunthorpe
  2021-05-03 18:23           ` John Hubbard
  2021-05-03 18:24         ` Christoph Hellwig
  1 sibling, 1 reply; 99+ messages in thread
From: Logan Gunthorpe @ 2021-05-03 18:20 UTC (permalink / raw)
  To: John Hubbard, linux-kernel, linux-nvme, linux-block, linux-pci,
	linux-mm, iommu
  Cc: Stephen Bates, Christoph Hellwig, Dan Williams, Jason Gunthorpe,
	Christian König, Don Dutile, Matthew Wilcox, Daniel Vetter,
	Jakowski Andrzej, Minturn Dave B, Jason Ekstrand, Dave Hansen,
	Xiong Jianxin, Bjorn Helgaas, Ira Weiny, Robin Murphy,
	Bjorn Helgaas



On 2021-05-03 12:17 p.m., John Hubbard wrote:
> On 5/3/21 8:57 AM, Logan Gunthorpe wrote:
>>
>>
>> On 2021-05-01 9:58 p.m., John Hubbard wrote:
>>> Another odd thing: this used to check for memory failure and just give
>>> up, and now it doesn't. Yes, I realize that it all still works at the
>>> moment, but this is quirky and we shouldn't stop here.
>>>
>>> Instead, a cleaner approach would be to push the memory allocation
>>> slightly higher up the call stack, out to the
>>> pci_p2pdma_distance_many(). So pci_p2pdma_distance_many() should make
>>> the kmalloc() call, and fail out if it can't get a page for the seq_buf
>>> buffer. Then you don't have to do all this odd stuff.
>>
>> I don't really agree with this assessment. If kmalloc fails to
>> initialize the seq_buf() (which should be very rare), the only thing
>> that is lost is the one warning print that tells the user the command
>> line parameter needed to disable ACS. Everything else works fine,
>> nothing else can fail. I don't see the need to add extra complexity just
>> so the code errors out in no-mem instead of just skipping the one,
>> slightly more informative, warning line.
> 
> That's the thing: memory failure should be exceedingly rare for this.
> Therefore, just fail out entirely (which I don't expect we'll likely
> ever see), instead of doing all this weird stuff to try to continue
> on if you cannot allocate a single page. If you are in that case, the
> system is not in a state that is going to run your dma p2p setup well
> anyway.
> 
> I think it's *less* complexity to allocate up front, fail early if
> allocation fails, and then not have to deal with these really odd
> quirks at the lower levels.
>

I don't see how it's all that weird. We're skipping a warning if we
can't allocate memory to calculate part of the message. It's really not
necessary. If the memory really can't be allocated then something else
will fail, but we really don't need to fail here because we couldn't
print a verbose warning message.
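
The behaviour argued for here can be sketched in userspace as follows. The
function and its parameters are hypothetical and only model the idea: a NULL
buffer stands for a failed allocation, and the only thing lost in that case
is the extra detail text, not the result of the check.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical model: returns 0 if the path is usable and -1 if not.
 * buf may be NULL (modelling a failed allocation); then the verbose
 * detail is skipped and the return value is unchanged.
 * *detail_written reports whether the detail text was produced.
 */
static int check_path_model(int path_blocked, char *buf, size_t buflen,
			    int *detail_written)
{
	*detail_written = 0;
	if (!path_blocked)
		return 0;
	if (buf && buflen > 0) {
		snprintf(buf, buflen, "path blocked; see log for details");
		*detail_written = 1;
	}
	return -1;
}
```

The caller sees the same return value with or without the buffer, so an
allocation failure degrades the message but not the decision.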

Logan


* Re: [PATCH 03/16] PCI/P2PDMA: Attempt to set map_type if it has not been set
  2021-05-03 16:17     ` Logan Gunthorpe
@ 2021-05-03 18:22       ` John Hubbard
  2021-05-03 18:35       ` Christoph Hellwig
  1 sibling, 0 replies; 99+ messages in thread
From: John Hubbard @ 2021-05-03 18:22 UTC (permalink / raw)
  To: Logan Gunthorpe, linux-kernel, linux-nvme, linux-block,
	linux-pci, linux-mm, iommu
  Cc: Stephen Bates, Christoph Hellwig, Dan Williams, Jason Gunthorpe,
	Christian König, Don Dutile, Matthew Wilcox, Daniel Vetter,
	Jakowski Andrzej, Minturn Dave B, Jason Ekstrand, Dave Hansen,
	Xiong Jianxin, Bjorn Helgaas, Ira Weiny, Robin Murphy

On 5/3/21 9:17 AM, Logan Gunthorpe wrote:
>> Returning a "bridge distance" from a "get map type" routine is jarring,
>> and I think it is because of a pre-existing problem: the above function
>> is severely misnamed. Let's try renaming it (and the other one) to
>> approximately:
>>
>>       upstream_bridge_map_type_warn()
>>       upstream_bridge_map_type()
>>
>> ...and that should fix that. Well, that, plus tweaking the kernel doc
>> comments, which are also confused. I think someone started off thinking
>> about distances through PCIe, but in the end, the routine boils down to
>> just a few situations that are not distances at all.
>>
>> Also, the above will read a little better if it is written like this:
>>
>> 	ret = xa_to_value(xa_load(&provider->p2pdma->map_types,
>> 				  map_types_idx(client)));
>>
>> 	if (ret == PCI_P2PDMA_MAP_UNKNOWN)
>> 		ret = upstream_bridge_map_type_warn(provider, client, NULL,
>> 						    GFP_ATOMIC);
>> 	
>> 	return ret;
>>
>>
>>>    }
> 
> I agree that some of this has evolved in a way that some of the names
> are a bit odd now. Could definitely use a cleanup, but that's not really
> part of this series. When I have some time I can look at doing a cleanup
> series to help with some of this.

I'm OK with doing cleanup later. I just tend to call it out when I see it.

> 
>>>    static int __pci_p2pdma_map_sg(struct pci_p2pdma_pagemap *p2p_pgmap,
>>> @@ -877,7 +884,6 @@ int pci_p2pdma_map_sg_attrs(struct device *dev, struct scatterlist *sg,
>>>    	case PCI_P2PDMA_MAP_BUS_ADDR:
>>>    		return __pci_p2pdma_map_sg(p2p_pgmap, dev, sg, nents);
>>>    	default:
>>> -		WARN_ON_ONCE(1);
>>
>> Why? Or at least, why, in this patch? It looks like an accidental
>> leftover from something, seeing as how it is not directly related to the
>> patch, and is not mentioned at all.
> 
> Before this patch, it was required that users of P2PDMA call
> pci_p2pdma_distance_many() in some form before calling
> pci_p2pdma_map_sg(). So, by convention, a usable map type had to already
> be in the cache. The warning was there to yell at anyone who wrote code
> that violated that convention.
> 
> This patch removes that convention and allows users to map P2PDMA pages
> sight unseen and if the mapping type isn't in the cache, then it will
> determine the mapping type at dma mapping time. Thus, the warning can be
> removed and the function can fail normally if the mapping is unsupported.
> 

Let's add some of those words to the commit description, perhaps, it's nice
to have. Obviously a minor point though.

thanks,
-- 
John Hubbard
NVIDIA


* Re: [PATCH 01/16] PCI/P2PDMA: Pass gfp_mask flags to upstream_bridge_distance_warn()
  2021-05-03 18:20         ` Logan Gunthorpe
@ 2021-05-03 18:23           ` John Hubbard
  0 siblings, 0 replies; 99+ messages in thread
From: John Hubbard @ 2021-05-03 18:23 UTC (permalink / raw)
  To: Logan Gunthorpe, linux-kernel, linux-nvme, linux-block,
	linux-pci, linux-mm, iommu
  Cc: Stephen Bates, Christoph Hellwig, Dan Williams, Jason Gunthorpe,
	Christian König, Don Dutile, Matthew Wilcox, Daniel Vetter,
	Jakowski Andrzej, Minturn Dave B, Jason Ekstrand, Dave Hansen,
	Xiong Jianxin, Bjorn Helgaas, Ira Weiny, Robin Murphy,
	Bjorn Helgaas

On 5/3/21 11:20 AM, Logan Gunthorpe wrote:
...
>> That's the thing: memory failure should be exceedingly rare for this.
>> Therefore, just fail out entirely (which I don't expect we'll likely
>> ever see), instead of doing all this weird stuff to try to continue
>> on if you cannot allocate a single page. If you are in that case, the
>> system is not in a state that is going to run your dma p2p setup well
>> anyway.
>>
>> I think it's *less* complexity to allocate up front, fail early if
>> allocation fails, and then not have to deal with these really odd
>> quirks at the lower levels.
>>
> 
> I don't see how it's all that weird. We're skipping a warning if we
> can't allocate memory to calculate part of the message. It's really not
> necessary. If the memory really can't be allocated then something else
> will fail, but we really don't need to fail here because we couldn't
> print a verbose warning message.
> 

Well, I really dislike the result we have in this particular patch, but
I won't stand in the way of progress if that's how you really are going
to do it.

thanks,
-- 
John Hubbard
NVIDIA

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH 01/16] PCI/P2PDMA: Pass gfp_mask flags to upstream_bridge_distance_warn()
  2021-05-03 18:17       ` John Hubbard
  2021-05-03 18:20         ` Logan Gunthorpe
@ 2021-05-03 18:24         ` Christoph Hellwig
  1 sibling, 0 replies; 99+ messages in thread
From: Christoph Hellwig @ 2021-05-03 18:24 UTC (permalink / raw)
  To: John Hubbard
  Cc: Logan Gunthorpe, linux-kernel, linux-nvme, linux-block,
	linux-pci, linux-mm, iommu, Stephen Bates, Christoph Hellwig,
	Dan Williams, Jason Gunthorpe, Christian König, Don Dutile,
	Matthew Wilcox, Daniel Vetter, Jakowski Andrzej, Minturn Dave B,
	Jason Ekstrand, Dave Hansen, Xiong Jianxin, Bjorn Helgaas,
	Ira Weiny, Robin Murphy, Bjorn Helgaas

On Mon, May 03, 2021 at 11:17:31AM -0700, John Hubbard wrote:
> That's the thing: memory failure should be exceedingly rare for this.
> Therefore, just fail out entirely (which I don't expect we'll likely
> ever see), instead of doing all this weird stuff to try to continue
> on if you cannot allocate a single page. If you are in that case, the
> system is not in a state that is going to run your dma p2p setup well
> anyway.
>
> I think it's *less* complexity to allocate up front, fail early if
> allocation fails, and then not have to deal with these really odd
> quirks at the lower levels.

Agreed.

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH 02/16] PCI/P2PDMA: Avoid pci_get_slot() which sleeps
  2021-05-03 16:08     ` Logan Gunthorpe
  2021-05-03 18:20       ` John Hubbard
@ 2021-05-03 18:25       ` Christoph Hellwig
  1 sibling, 0 replies; 99+ messages in thread
From: Christoph Hellwig @ 2021-05-03 18:25 UTC (permalink / raw)
  To: Logan Gunthorpe
  Cc: John Hubbard, linux-kernel, linux-nvme, linux-block, linux-pci,
	linux-mm, iommu, Stephen Bates, Christoph Hellwig, Dan Williams,
	Jason Gunthorpe, Christian König, Don Dutile,
	Matthew Wilcox, Daniel Vetter, Jakowski Andrzej, Minturn Dave B,
	Jason Ekstrand, Dave Hansen, Xiong Jianxin, Bjorn Helgaas,
	Ira Weiny, Robin Murphy

On Mon, May 03, 2021 at 10:08:34AM -0600, Logan Gunthorpe wrote:
> Per above, I think the reference count is unnecessary. But I could wrap
> it in a static function for clarity. (There's no reason to export this
> function).

A well documented helper function would really help to improve the
code for the casual reader I think.

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH 05/16] dma-mapping: Introduce dma_map_sg_p2pdma()
  2021-04-27 23:01       ` Jason Gunthorpe
@ 2021-05-03 18:28         ` Christoph Hellwig
  2021-05-03 18:31           ` Logan Gunthorpe
  0 siblings, 1 reply; 99+ messages in thread
From: Christoph Hellwig @ 2021-05-03 18:28 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Logan Gunthorpe, linux-kernel, linux-nvme, linux-block,
	linux-pci, linux-mm, iommu, Stephen Bates, Christoph Hellwig,
	Dan Williams, Christian König, John Hubbard, Don Dutile,
	Matthew Wilcox, Daniel Vetter, Jakowski Andrzej, Minturn Dave B,
	Jason Ekstrand, Dave Hansen, Xiong Jianxin, Bjorn Helgaas,
	Ira Weiny, Robin Murphy

On Tue, Apr 27, 2021 at 08:01:13PM -0300, Jason Gunthorpe wrote:
> At a high level I'm OK with it. dma_map_sg_attrs() is the extra
> extended version of dma_map_sg(), it already has a different
> signature, a different return code is not out of the question.
> 
> dma_map_sg() is just the simple easy to use interface that can't do
> advanced stuff.
> 
> > I'm not that opposed to this. But it will make this series a fair bit
> > longer to change the 8 map_sg_attrs() usages.
> 
> Yes, but the result seems much nicer to not grow the DMA API further.

We already have a mapping function that can return errors:
dma_map_sgtable.

I think it might make more sense to piggy back on that, as the sg_table
abstraction is pretty useful basically everywhere that we deal with
scatterlists anyway.

In the hopefully not too long run I plan to get rid of scatterlists in
at least NVMe and other high performance devices anyway.

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH 05/16] dma-mapping: Introduce dma_map_sg_p2pdma()
  2021-05-03 18:28         ` Christoph Hellwig
@ 2021-05-03 18:31           ` Logan Gunthorpe
  0 siblings, 0 replies; 99+ messages in thread
From: Logan Gunthorpe @ 2021-05-03 18:31 UTC (permalink / raw)
  To: Christoph Hellwig, Jason Gunthorpe
  Cc: linux-kernel, linux-nvme, linux-block, linux-pci, linux-mm,
	iommu, Stephen Bates, Dan Williams, Christian König,
	John Hubbard, Don Dutile, Matthew Wilcox, Daniel Vetter,
	Jakowski Andrzej, Minturn Dave B, Jason Ekstrand, Dave Hansen,
	Xiong Jianxin, Bjorn Helgaas, Ira Weiny, Robin Murphy



On 2021-05-03 12:28 p.m., Christoph Hellwig wrote:
> On Tue, Apr 27, 2021 at 08:01:13PM -0300, Jason Gunthorpe wrote:
>> At a high level I'm OK with it. dma_map_sg_attrs() is the extra
>> extended version of dma_map_sg(), it already has a different
>> signature, a different return code is not out of the question.
>>
>> dma_map_sg() is just the simple easy to use interface that can't do
>> advanced stuff.
>>
>>> I'm not that opposed to this. But it will make this series a fair bit
>>> longer to change the 8 map_sg_attrs() usages.
>>
>> Yes, but the result seems much nicer to not grow the DMA API further.
> 
> We already have a mapping function that can return errors:
> dma_map_sgtable.
> 
> I think it might make more sense to piggy back on that, as the sg_table
> abstraction is pretty useful basically everywhere that we deal with
> scatterlists anyway.

Oh, I didn't even realize that existed. I'll use dma_map_sgtable() for v2.

Thanks,

Logan

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH 04/16] PCI/P2PDMA: Refactor pci_p2pdma_map_type() to take pagmap and device
  2021-05-03 16:30     ` Logan Gunthorpe
@ 2021-05-03 18:31       ` John Hubbard
  2021-05-03 18:56         ` Logan Gunthorpe
  0 siblings, 1 reply; 99+ messages in thread
From: John Hubbard @ 2021-05-03 18:31 UTC (permalink / raw)
  To: Logan Gunthorpe, linux-kernel, linux-nvme, linux-block,
	linux-pci, linux-mm, iommu
  Cc: Stephen Bates, Christoph Hellwig, Dan Williams, Jason Gunthorpe,
	Christian König, Don Dutile, Matthew Wilcox, Daniel Vetter,
	Jakowski Andrzej, Minturn Dave B, Jason Ekstrand, Dave Hansen,
	Xiong Jianxin, Bjorn Helgaas, Ira Weiny, Robin Murphy

On 5/3/21 9:30 AM, Logan Gunthorpe wrote:
> 
> 
> On 2021-05-02 2:41 p.m., John Hubbard wrote:
>> On 4/8/21 10:01 AM, Logan Gunthorpe wrote:
>>> All callers of pci_p2pdma_map_type() have a struct dev_pgmap and a
>>> struct device (of the client doing the DMA transfer). Thus move the
>>> conversion to struct pci_devs for the provider and client into this
>>> function.
>>
>> Actually, this is the wrong direction to go! All of these pre-existing
>> pci_*() functions have a small problem already: they are dealing with
>> struct device, instead of struct pci_dev. And so refactoring should be
>> pushing the conversion to pci_dev *up* the calling stack, not lower as
>> the patch here proposes.
>>
>> Also, there is no improvement in clarity by passing in (pgmap, dev)
>> instead of the previous (provider, client). Now you have to do more type
>> checking in the leaf function, which is another indication of a problem.
>>
>> Let's go that direction, please? Just convert to pci_dev much higher in
>> the calling stack, and you'll find that everything fits together better.
>> And it's OK to pass in extra params if that turns out to be necessary,
>> after all.
> 
> No, I disagree with this and it seems a bit confused. This change is

I am not confused here, no. Other places, yes, but not at this moment. :)

> allowing callers to call the function with what they have and doing more
> checks inside the called function. This allows for *less* checks in the
> leaf function, not more checks. (I mean, look at the patch itself, it
> puts a bunch of checks in both call sites into the callee and makes the
> code a lot cleaner -- it's removing more lines than it adds).
> 
> Similar argument can be made with the pci_p2pdma_distance_many() (which
> I assume you are referring to). If the function took struct pci_dev
> instead of struct device, every caller would need to do all checks and
> conversions to struct pci_dev. That is not an improvement.
> 


IMHO, it is better to have all of the pci_*() functions dealing with pci_dev
instead of dev, but it is also true that this is a larger change, so I
won't press the point too hard right now.

The reason I commented was that this refactoring goes in the opposite
direction that I would be going in, if I were to start "improving" this
part of the kernel, via refactoring.

Anyway, I'll leave it alone.

thanks,
-- 
John Hubbard
NVIDIA

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH 03/16] PCI/P2PDMA: Attempt to set map_type if it has not been set
  2021-05-03 16:17     ` Logan Gunthorpe
  2021-05-03 18:22       ` John Hubbard
@ 2021-05-03 18:35       ` Christoph Hellwig
  2021-05-03 18:46         ` Logan Gunthorpe
  1 sibling, 1 reply; 99+ messages in thread
From: Christoph Hellwig @ 2021-05-03 18:35 UTC (permalink / raw)
  To: Logan Gunthorpe
  Cc: John Hubbard, linux-kernel, linux-nvme, linux-block, linux-pci,
	linux-mm, iommu, Stephen Bates, Christoph Hellwig, Dan Williams,
	Jason Gunthorpe, Christian König, Don Dutile,
	Matthew Wilcox, Daniel Vetter, Jakowski Andrzej, Minturn Dave B,
	Jason Ekstrand, Dave Hansen, Xiong Jianxin, Bjorn Helgaas,
	Ira Weiny, Robin Murphy

On Mon, May 03, 2021 at 10:17:59AM -0600, Logan Gunthorpe wrote:
> I agree that some of this has evolved in a way that some of the names
> are a bit odd now. Could definitely use a cleanup, but that's not really
> part of this series. When I have some time I can look at doing a cleanup
> series to help with some of this.

I think adding it to the series would be very helpful.

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH 03/16] PCI/P2PDMA: Attempt to set map_type if it has not been set
  2021-05-03 18:35       ` Christoph Hellwig
@ 2021-05-03 18:46         ` Logan Gunthorpe
  0 siblings, 0 replies; 99+ messages in thread
From: Logan Gunthorpe @ 2021-05-03 18:46 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: John Hubbard, linux-kernel, linux-nvme, linux-block, linux-pci,
	linux-mm, iommu, Stephen Bates, Dan Williams, Jason Gunthorpe,
	Christian König, Don Dutile, Matthew Wilcox, Daniel Vetter,
	Jakowski Andrzej, Minturn Dave B, Jason Ekstrand, Dave Hansen,
	Xiong Jianxin, Bjorn Helgaas, Ira Weiny, Robin Murphy



On 2021-05-03 12:35 p.m., Christoph Hellwig wrote:
> On Mon, May 03, 2021 at 10:17:59AM -0600, Logan Gunthorpe wrote:
>> I agree that some of this has evolved in a way that some of the names
>> are a bit odd now. Could definitely use a cleanup, but that's not really
>> part of this series. When I have some time I can look at doing a cleanup
>> series to help with some of this.
> 
> I think adding it to the series would be very helpful.

Ok, I'll prepend a handful of cleanup patches.

Logan

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH 04/16] PCI/P2PDMA: Refactor pci_p2pdma_map_type() to take pagmap and device
  2021-05-03 18:31       ` John Hubbard
@ 2021-05-03 18:56         ` Logan Gunthorpe
  2021-05-03 21:54           ` John Hubbard
  0 siblings, 1 reply; 99+ messages in thread
From: Logan Gunthorpe @ 2021-05-03 18:56 UTC (permalink / raw)
  To: John Hubbard, linux-kernel, linux-nvme, linux-block, linux-pci,
	linux-mm, iommu
  Cc: Stephen Bates, Christoph Hellwig, Dan Williams, Jason Gunthorpe,
	Christian König, Don Dutile, Matthew Wilcox, Daniel Vetter,
	Jakowski Andrzej, Minturn Dave B, Jason Ekstrand, Dave Hansen,
	Xiong Jianxin, Bjorn Helgaas, Ira Weiny, Robin Murphy



On 2021-05-03 12:31 p.m., John Hubbard wrote:
> On 5/3/21 9:30 AM, Logan Gunthorpe wrote:
>>
>>
>> On 2021-05-02 2:41 p.m., John Hubbard wrote:
>>> On 4/8/21 10:01 AM, Logan Gunthorpe wrote:
>>>> All callers of pci_p2pdma_map_type() have a struct dev_pgmap and a
>>>> struct device (of the client doing the DMA transfer). Thus move the
>>>> conversion to struct pci_devs for the provider and client into this
>>>> function.
>>>
>>> Actually, this is the wrong direction to go! All of these pre-existing
>>> pci_*() functions have a small problem already: they are dealing with
>>> struct device, instead of struct pci_dev. And so refactoring should be
>>> pushing the conversion to pci_dev *up* the calling stack, not lower as
>>> the patch here proposes.
>>>
>>> Also, there is no improvement in clarity by passing in (pgmap, dev)
>>> instead of the previous (provider, client). Now you have to do more type
>>> checking in the leaf function, which is another indication of a problem.
>>>
>>> Let's go that direction, please? Just convert to pci_dev much higher in
>>> the calling stack, and you'll find that everything fits together better.
>>> And it's OK to pass in extra params if that turns out to be necessary,
>>> after all.
>>
>> No, I disagree with this and it seems a bit confused. This change is
> 
> I am not confused here, no. Other places, yes, but not at this moment. :)

I only meant confused because you suggested that such a change would
reduce checks in the leaf functions when in fact it's the opposite.

>> allowing callers to call the function with what they have and doing more
>> checks inside the called function. This allows for *less* checks in the
>> leaf function, not more checks. (I mean, look at the patch itself, it
>> puts a bunch of checks in both call sites into the callee and makes the
>> code a lot cleaner -- it's removing more lines than it adds).
>>
>> Similar argument can be made with the pci_p2pdma_distance_many() (which
>> I assume you are referring to). If the function took struct pci_dev
>> instead of struct device, every caller would need to do all checks and
>> conversions to struct pci_dev. That is not an improvement.
>>
> 
> 
> IMHO, it is better to have all of the pci_*() functions dealing with pci_dev
> instead of dev, but it is also true that this is a larger change, so I
> won't press the point too hard right now.

As a general rule, I'd agree with you. However, it's not good to blindly
follow the rule when there might be good reasons to do it differently.

In this case, the caller doesn't have PCI devices. The nvme fabrics code
has a number of block devices and an RDMA device. It doesn't even know
if these devices are backed by PCI devices and it doesn't have a direct
path to obtain the pci_dev.

Each struct device might be turned into a pci_dev using the static
function find_parent_pci_dev(). If any device is not a PCI device then
we reject the P2PDMA transaction as not supported. Pushing the
find_parent_pci_dev() logic into the callers is, IMO, just asking the
callers to replicate a bunch of logic they shouldn't even be aware of,
creating messier code as a result.
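
The callee-side conversion described above can be sketched in a small userspace model. This is only an illustration under stated assumptions, not kernel code: every `toy_*` name is hypothetical, the struct is not the real `struct device`, and the error code is picked for the example.

```c
#include <errno.h>
#include <stddef.h>

/* Toy stand-in for the kernel's struct device; not the real layout. */
struct toy_device {
	struct toy_device *parent;
	int is_pci;		/* nonzero if this node is a PCI device */
};

/* Walk up the parent chain until a PCI device is found, or return NULL. */
static struct toy_device *toy_find_parent_pci_dev(struct toy_device *dev)
{
	for (; dev; dev = dev->parent)
		if (dev->is_pci)
			return dev;
	return NULL;
}

/*
 * Callee-side conversion: callers pass whatever devices they hold, and
 * the "is this backed by PCI at all?" check lives in exactly one place.
 */
static int toy_p2pdma_check(struct toy_device *provider,
			    struct toy_device *client)
{
	if (!toy_find_parent_pci_dev(provider) ||
	    !toy_find_parent_pci_dev(client))
		return -EOPNOTSUPP;	/* illustrative error code */
	return 0;
}
```

A caller that holds only a generic device (say, an RDMA device whose parent is a PCI function) passes it as-is; a device with no PCI ancestor is rejected without the caller knowing why.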

Logan

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH 04/16] PCI/P2PDMA: Refactor pci_p2pdma_map_type() to take pagmap and device
  2021-05-03 18:56         ` Logan Gunthorpe
@ 2021-05-03 21:54           ` John Hubbard
  2021-05-03 22:57             ` Jason Gunthorpe
  0 siblings, 1 reply; 99+ messages in thread
From: John Hubbard @ 2021-05-03 21:54 UTC (permalink / raw)
  To: Logan Gunthorpe, linux-kernel, linux-nvme, linux-block,
	linux-pci, linux-mm, iommu
  Cc: Stephen Bates, Christoph Hellwig, Dan Williams, Jason Gunthorpe,
	Christian König, Don Dutile, Matthew Wilcox, Daniel Vetter,
	Jakowski Andrzej, Minturn Dave B, Jason Ekstrand, Dave Hansen,
	Xiong Jianxin, Bjorn Helgaas, Ira Weiny, Robin Murphy

On 5/3/21 11:56 AM, Logan Gunthorpe wrote:
...
>> IMHO, it is better to have all of the pci_*() functions dealing with pci_dev
>> instead of dev, but it is also true that this is a larger change, so I
>> won't press the point too hard right now.
> 
> As a general rule, I'd agree with you. However, it's not good to blindly
> follow the rule when there might be good reasons to do it differently.
> 
> In this case, the caller doesn't have PCI devices. The nvme fabrics code
> has a number of block devices and an RDMA device. It doesn't even know
> if these devices are backed by PCI devices and it doesn't have a direct
> path to obtain the pci_dev.
> 
> Each struct device might be turned into a pci_dev using the static
> function find_parent_pci_dev(). If any device is not a PCI device then
> we reject the P2PDMA transaction as not supported. Pushing the
> find_parent_pci_dev() logic into the callers is, IMO, just asking the
> callers to replicate a bunch of logic they shouldn't even be aware of,
> creating messier code as a result.
> 

I guess my main concern here is that there are these pci*() functions
that somehow want to pass around struct device. If a layer is carefully
named throughout with pci in the function names, then something is still
misaligned.  This can happen over time, of course. But the really best
patchsets try to avoid or mitigate the effect, by keeping names and
functionality carefully aligned.

Anyway, I've bugged you enough, I should just wait and see what the next
round looks like, at this point. :)

thanks,
-- 
John Hubbard
NVIDIA

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH 04/16] PCI/P2PDMA: Refactor pci_p2pdma_map_type() to take pagmap and device
  2021-05-03 21:54           ` John Hubbard
@ 2021-05-03 22:57             ` Jason Gunthorpe
  2021-05-03 23:40               ` John Hubbard
  0 siblings, 1 reply; 99+ messages in thread
From: Jason Gunthorpe @ 2021-05-03 22:57 UTC (permalink / raw)
  To: John Hubbard
  Cc: Logan Gunthorpe, linux-kernel, linux-nvme, linux-block,
	linux-pci, linux-mm, iommu, Stephen Bates, Christoph Hellwig,
	Dan Williams, Christian König, Don Dutile, Matthew Wilcox,
	Daniel Vetter, Jakowski Andrzej, Minturn Dave B, Jason Ekstrand,
	Dave Hansen, Xiong Jianxin, Bjorn Helgaas, Ira Weiny,
	Robin Murphy

On Mon, May 03, 2021 at 02:54:26PM -0700, John Hubbard wrote:

> I guess my main concern here is that there are these pci*() functions
> that somehow want to pass around struct device.

Well, this is the main issue - helpers being used inside IOMMU code
should not be called pci* functions. This is some generic device p2p
interface that happens to only support PCI to PCI transfers today.

Jason

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH 04/16] PCI/P2PDMA: Refactor pci_p2pdma_map_type() to take pagmap and device
  2021-05-03 22:57             ` Jason Gunthorpe
@ 2021-05-03 23:40               ` John Hubbard
  0 siblings, 0 replies; 99+ messages in thread
From: John Hubbard @ 2021-05-03 23:40 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Logan Gunthorpe, linux-kernel, linux-nvme, linux-block,
	linux-pci, linux-mm, iommu, Stephen Bates, Christoph Hellwig,
	Dan Williams, Christian König, Don Dutile, Matthew Wilcox,
	Daniel Vetter, Jakowski Andrzej, Minturn Dave B, Jason Ekstrand,
	Dave Hansen, Xiong Jianxin, Bjorn Helgaas, Ira Weiny,
	Robin Murphy

On 5/3/21 3:57 PM, Jason Gunthorpe wrote:
> On Mon, May 03, 2021 at 02:54:26PM -0700, John Hubbard wrote:
> 
>> I guess my main concern here is that there are these pci*() functions
>> that somehow want to pass around struct device.
> 
> Well, this is the main issue - helpers being used inside IOMMU code
> should not be called pci* functions. This is some generic device p2p
> interface that happens to only support PCI to PCI transfers today.
> 

Yes, maybe renaming a few levels of functions would help at least clarify
what the code can do. Once the code reaches layers that truly are
PCI-specific, that's where it should transition to using pci_dev args,
and that's also where it should return -ENOTSUPP back up the calling
stack.

thanks,
-- 
John Hubbard
NVIDIA

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH 09/16] dma-direct: Support PCI P2PDMA pages in dma-direct map_sg
  2021-05-03 17:04     ` Logan Gunthorpe
@ 2021-05-04  0:01       ` John Hubbard
  0 siblings, 0 replies; 99+ messages in thread
From: John Hubbard @ 2021-05-04  0:01 UTC (permalink / raw)
  To: Logan Gunthorpe, linux-kernel, linux-nvme, linux-block,
	linux-pci, linux-mm, iommu
  Cc: Stephen Bates, Christoph Hellwig, Dan Williams, Jason Gunthorpe,
	Christian König, Don Dutile, Matthew Wilcox, Daniel Vetter,
	Jakowski Andrzej, Minturn Dave B, Jason Ekstrand, Dave Hansen,
	Xiong Jianxin, Bjorn Helgaas, Ira Weiny, Robin Murphy

On 5/3/21 10:04 AM, Logan Gunthorpe wrote:
> Oops missed a comment:
> 
> On 2021-05-02 5:28 p.m., John Hubbard wrote:
>>>    int dma_direct_map_sg(struct device *dev, struct scatterlist *sgl, int nents,
>>>    		enum dma_data_direction dir, unsigned long attrs)
>>>    {
>>> -	int i;
>>> +	struct pci_p2pdma_map_state p2pdma_state = {};
>>
>> Is it worth putting this stuff on the stack--is there a noticeable
>> performance improvement from caching the state? Because if it's
>> invisible, then simplicity is better. I suspect you're right, and that
>> it *is* worth it, but it's good to know for real.
> 
> I haven't measured it (it would be hard to measure), but I think it's
> fairly clear here. Without the state, xa_load() would need to be called
> on *every* page in an SGL that maps only P2PDMA memory from one device.
> With the state, it only needs to be called once. xa_load() is cheap, but
> it is not that cheap.

OK, thanks for spelling it out for me. :)

> 
> There's essentially the same optimization in get_user_pages for
> ZONE_DEVICE pages. So, if it is necessary there, it should be necessary
> here.
> 

Right, that's a pretty solid example.
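
The caching argument can be made concrete with a toy userspace model. This is a sketch under assumptions, not the patch's code: the `toy_*` names are hypothetical, and `toy_slow_lookup()` merely stands in for the per-pagemap lookup (xa_load() in the discussion above).

```c
#include <stddef.h>

enum toy_map_type {
	TOY_MAP_UNKNOWN = 0,
	TOY_MAP_BUS_ADDR,
	TOY_MAP_THRU_HOST_BRIDGE,
};

/* Toy pagemap: in this model it simply records its own map type. */
struct toy_pgmap {
	enum toy_map_type type;
};

/* Per-call state kept on the stack while walking one scatterlist. */
struct toy_map_state {
	const struct toy_pgmap *pgmap;	/* pagemap of the last P2PDMA page seen */
	enum toy_map_type map;		/* cached result for that pagemap */
};

static unsigned long toy_slow_lookups;	/* counts the expensive lookups */

/* Stand-in for the comparatively expensive lookup. */
static enum toy_map_type toy_slow_lookup(const struct toy_pgmap *pgmap)
{
	toy_slow_lookups++;
	return pgmap->type;
}

/* Only redo the lookup when the pagemap differs from the cached one. */
static enum toy_map_type toy_cached_map_type(struct toy_map_state *state,
					     const struct toy_pgmap *pgmap)
{
	if (state->pgmap != pgmap) {
		state->pgmap = pgmap;
		state->map = toy_slow_lookup(pgmap);
	}
	return state->map;
}
```

With a zero-initialised state, an SGL whose pages all come from one device triggers a single slow lookup no matter how many segments it has.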

thanks,
-- 
John Hubbard
NVIDIA

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH 09/16] dma-direct: Support PCI P2PDMA pages in dma-direct map_sg
  2021-05-03 16:55     ` Logan Gunthorpe
@ 2021-05-04  0:12       ` John Hubbard
  0 siblings, 0 replies; 99+ messages in thread
From: John Hubbard @ 2021-05-04  0:12 UTC (permalink / raw)
  To: Logan Gunthorpe, linux-kernel, linux-nvme, linux-block,
	linux-pci, linux-mm, iommu
  Cc: Stephen Bates, Christoph Hellwig, Dan Williams, Jason Gunthorpe,
	Christian König, Don Dutile, Matthew Wilcox, Daniel Vetter,
	Jakowski Andrzej, Minturn Dave B, Jason Ekstrand, Dave Hansen,
	Xiong Jianxin, Bjorn Helgaas, Ira Weiny, Robin Murphy

On 5/3/21 9:55 AM, Logan Gunthorpe wrote:
...
>> The same thing can be achieved with fewer lines and a bit more clarity.
>> Can we please do it like this instead:
>>
>> 	for_each_sg(sgl, sg, nents, i) {
>> 		if (sg_is_pci_p2pdma(sg))
>> 			sg_unmark_pci_p2pdma(sg);
>> 		else
>> 			dma_direct_unmap_page(dev, sg->dma_address,
>> 					      sg_dma_len(sg), dir, attrs);
>> 	}
>>
>>
> 
> That's debatable (the way I did it emphasizes the common case). But I'll
> consider changing it.
> 

Thanks for considering it.

>>
>> Also here, a block comment for the function would be nice. How about
>> approximately this:
>>
>> /*
>>    * Maps each SG segment. Returns the number of entries mapped, or 0 upon
>>    * failure. If any entry could not be mapped, then no entries are mapped.
>>    */
>>
>> I'll stop complaining about the pre-existing return code conventions,
>> since by now you know what I was thinking of saying. :)
> 
> Not really part of this patchset... Seems like if you think there should
> be a comment like that here, you should send a patch. But this patch
> starts returning a negative value here.

OK, that's fine. Like you say, that comment is rather beyond this patchset.

>>>    	for_each_sg(sgl, sg, nents, i) {
>>> +		if (is_pci_p2pdma_page(sg_page(sg))) {
>>> +			ret = pci_p2pdma_map_segment(&p2pdma_state, dev, sg,
>>> +						     attrs);
>>> +			if (ret < 0) {
>>> +				goto out_unmap;
>>> +			} else if (ret) {
>>> +				ret = 0;
>>> +				continue;
>>
>> Is this a bug? If neither of those "if" branches fires (ret == 0), then
>> the code (probably unintentionally) falls through and continues on to
>> attempt to call dma_direct_map_page()--despite being a PCI_P2PDMA page!
> 
> No, it's not a bug. Per the documentation of pci_p2pdma_map_segment(),
> if it returns zero the segment should be mapped normally. P2PDMA pages
> must be mapped with physical addresses (or IOVA addresses) if the TLPs
> for the transaction will go through the host bridge.

Could we maybe put a little comment there, to that effect? It would be
nice to call out that point, especially since it is common to miss one
case (negative, 0, positive) when handling return values. Sort of like
we used to put "// fallthrough" in the case statements. Not a big deal
of course.
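
For illustration, the three-way return convention (negative, zero, positive) discussed here might be modelled in userspace as below. This is a sketch, not the patch: all `toy_*` names are hypothetical and -EIO is an arbitrary stand-in for whatever error the real helper returns.

```c
#include <errno.h>

enum toy_seg_kind {
	TOY_SEG_NORMAL,		/* ordinary memory */
	TOY_SEG_P2P_BUS,	/* P2PDMA, mapped with a PCI bus address */
	TOY_SEG_P2P_HOST,	/* P2PDMA, TLPs go through the host bridge */
	TOY_SEG_P2P_BAD,	/* P2PDMA, unsupported topology */
};

struct toy_seg {
	enum toy_seg_kind kind;
	int bus_mapped;
	int normally_mapped;
};

/* <0: error; >0: segment got a bus address; 0: caller must map normally. */
static int toy_p2pdma_map_segment(struct toy_seg *s)
{
	switch (s->kind) {
	case TOY_SEG_P2P_BUS:
		s->bus_mapped = 1;
		return 1;
	case TOY_SEG_P2P_HOST:
		return 0;
	default:
		return -EIO;	/* illustrative error code */
	}
}

/* Returns nents on success, or a negative errno with nothing left mapped. */
static int toy_map_sg(struct toy_seg *segs, int nents)
{
	int i, ret;

	for (i = 0; i < nents; i++) {
		struct toy_seg *s = &segs[i];

		if (s->kind != TOY_SEG_NORMAL) {
			ret = toy_p2pdma_map_segment(s);
			if (ret < 0)
				goto out_unmap;
			if (ret > 0)
				continue;	/* bus address already filled in */
			/* ret == 0: fall through and map like a normal page */
		}
		s->normally_mapped = 1;
	}
	return nents;

out_unmap:
	/* Undo every segment mapped before the failing one. */
	while (i-- > 0)
		segs[i].bus_mapped = segs[i].normally_mapped = 0;
	return ret;
}
```

The explicit comment on the zero case plays the role of the old "fallthrough" annotation: it shows that a P2PDMA segment routed through the host bridge deliberately reaches the normal mapping path.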

> 
>> See below for suggestions:
>>
>>> +			}
>>> +		}
>>> +
>>>    		sg->dma_address = dma_direct_map_page(dev, sg_page(sg),
>>>    				sg->offset, sg->length, dir, attrs);
>>>    		if (sg->dma_address == DMA_MAPPING_ERROR)
>>
>> This is another case in which "continue" is misleading and not as good
>> as "else". Because unless I'm wrong above, you really only want to take
>> one path *or* the other.
> 
> No, per above, it's not one path or the other. If it's a P2PDMA page it
> may still need to be mapped normally.
> 

Right. That follows.

thanks,
-- 
John Hubbard
NVIDIA

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH 12/16] nvme-pci: Check DMA ops when indicating support for PCI P2PDMA
  2021-05-03 17:17     ` Logan Gunthorpe
@ 2021-05-04  0:17       ` John Hubbard
  0 siblings, 0 replies; 99+ messages in thread
From: John Hubbard @ 2021-05-04  0:17 UTC (permalink / raw)
  To: Logan Gunthorpe, linux-kernel, linux-nvme, linux-block,
	linux-pci, linux-mm, iommu
  Cc: Stephen Bates, Christoph Hellwig, Dan Williams, Jason Gunthorpe,
	Christian König, Don Dutile, Matthew Wilcox, Daniel Vetter,
	Jakowski Andrzej, Minturn Dave B, Jason Ekstrand, Dave Hansen,
	Xiong Jianxin, Bjorn Helgaas, Ira Weiny, Robin Murphy

On 5/3/21 10:17 AM, Logan Gunthorpe wrote:
...
>>>    	blk_queue_flag_set(QUEUE_FLAG_NONROT, ns->queue);
>>> -	if (ctrl->ops->flags & NVME_F_PCI_P2PDMA)
>>> +	if (ctrl->ops->supports_pci_p2pdma &&
>>> +	    ctrl->ops->supports_pci_p2pdma(ctrl))
>>
>> This is a little excessive, as I suspected. How about providing a
>> default .supports_pci_p2pdma routine that returns false, so that
>> the op is always available (non-null)? By "default", maybe that
>> means either requiring an init_the_ops_struct() routine to be
>> used, and/or checking all the users of struct nvme_ctrl_ops.
> 
> Honestly that sounds much more messy to me than simply checking if it's
> NULL before using it (which is a common, accepted pattern for ops).

OK, it's a minor suggestion, so feel free to ignore if you prefer it
the other way, sure.

> 
>> Another idea: maybe you don't really need a bool .supports_pci_p2pdma()
>> routine at all, because the existing .flags really is about right.
>> You just need the flags to be filled in dynamically. So, do that
>> during nvme_pci setup/init time: that's when this module would call
>> dma_pci_p2pdma_supported().
> 
> If the flag is filled in dynamically, then the ops struct would have to
> be non-constant. Ops structs should be constant for security reasons.
> 

Hadn't thought about keeping ops structs constant. OK.
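
A minimal userspace sketch of the pattern being discussed: a constant ops table with an optional callback that the caller NULL-checks. The `toy_*` names are hypothetical and the structs are not the real nvme ones; the per-controller `dma_ok` field is an assumption standing in for whatever the real callback would query at run time.

```c
#include <stdbool.h>
#include <stddef.h>

struct toy_ctrl;

/* Ops tables stay const; run-time state lives in the controller instead. */
struct toy_ctrl_ops {
	const char *name;
	bool (*supports_pci_p2pdma)(struct toy_ctrl *ctrl);	/* optional */
};

struct toy_ctrl {
	const struct toy_ctrl_ops *ops;
	bool dma_ok;	/* stand-in for "the DMA ops support P2PDMA" */
};

/* The answer is computed per controller, so the table never changes. */
static bool toy_pci_supports_p2pdma(struct toy_ctrl *ctrl)
{
	return ctrl->dma_ok;
}

static const struct toy_ctrl_ops toy_pci_ops = {
	.name = "pci",
	.supports_pci_p2pdma = toy_pci_supports_p2pdma,
};

/* A transport that leaves the optional callback unset (NULL). */
static const struct toy_ctrl_ops toy_tcp_ops = {
	.name = "tcp",
};

/* Caller-side check: a missing callback simply means "not supported". */
static bool toy_ctrl_supports_p2pdma(struct toy_ctrl *ctrl)
{
	return ctrl->ops->supports_pci_p2pdma &&
	       ctrl->ops->supports_pci_p2pdma(ctrl);
}
```

Because the decision is made inside the callback, the dynamic answer does not require writing to the ops struct, which is what lets it remain const.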

thanks,
-- 
John Hubbard
NVIDIA

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH 13/16] nvme-pci: Convert to using dma_map_sg_p2pdma for p2pdma pages
  2021-05-03 17:19     ` Logan Gunthorpe
@ 2021-05-04  0:26       ` John Hubbard
  0 siblings, 0 replies; 99+ messages in thread
From: John Hubbard @ 2021-05-04  0:26 UTC (permalink / raw)
  To: Logan Gunthorpe, linux-kernel, linux-nvme, linux-block,
	linux-pci, linux-mm, iommu
  Cc: Stephen Bates, Christoph Hellwig, Dan Williams, Jason Gunthorpe,
	Christian König, Don Dutile, Matthew Wilcox, Daniel Vetter,
	Jakowski Andrzej, Minturn Dave B, Jason Ekstrand, Dave Hansen,
	Xiong Jianxin, Bjorn Helgaas, Ira Weiny, Robin Murphy

On 5/3/21 10:19 AM, Logan Gunthorpe wrote:
...
>>> +	nr_mapped = dma_map_sg_p2pdma_attrs(dev->dev, iod->sg, iod->nents,
>>> +				     rq_dma_dir(req), DMA_ATTR_NO_WARN);
>>> +	if (nr_mapped < 0) {
>>> +		if (nr_mapped != -ENOMEM)
>>> +			ret = BLK_STS_TARGET;
>>>    		goto out_free_sg;
>>> +	}
>>
>> But now the "nr_mapped == 0" case is no longer doing an early out_free_sg.
>> Is that OK?
> 
> dma_map_sg_p2pdma_attrs() never returns zero. It will return -ENOMEM in
> the same situation and results in the same goto out_free_sg.
> 

OK...that's true, it doesn't return zero. A comment or WARN or something
might be nice, to show that the zero case hasn't been overlooked. It's
true that the dma_map_sg_p2pdma_attrs() documentation sort of says
that (although not quite out loud).
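
One way to make the zero case visibly handled is sketched below as a userspace model. It is an illustration only: the `toy_*` names are hypothetical, and treating zero like -ENOMEM is an assumption of this sketch, not something the patch does.

```c
#include <errno.h>

enum toy_blk_status {
	TOY_STS_OK,
	TOY_STS_RESOURCE,	/* transient, the request may be retried */
	TOY_STS_TARGET,		/* permanent failure for this target */
};

/* Translate the mapping result into a block-layer style status. */
static enum toy_blk_status toy_status_from_nr_mapped(int nr_mapped)
{
	if (nr_mapped > 0)
		return TOY_STS_OK;

	/*
	 * Zero is not expected from the new mapping call, which reports
	 * -ENOMEM instead; handle it defensively as a resource error so
	 * the case is clearly not overlooked.
	 */
	if (nr_mapped == 0 || nr_mapped == -ENOMEM)
		return TOY_STS_RESOURCE;

	return TOY_STS_TARGET;
}
```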

thanks,
-- 
John Hubbard
NVIDIA

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH 11/16] iommu/dma: Support PCI P2PDMA pages in dma-iommu map_sg
  2021-05-03  1:14   ` John Hubbard
@ 2021-05-06 23:59     ` Logan Gunthorpe
  0 siblings, 0 replies; 99+ messages in thread
From: Logan Gunthorpe @ 2021-05-06 23:59 UTC (permalink / raw)
  To: John Hubbard, linux-kernel, linux-nvme, linux-block, linux-pci,
	linux-mm, iommu
  Cc: Stephen Bates, Christoph Hellwig, Dan Williams, Jason Gunthorpe,
	Christian König, Don Dutile, Matthew Wilcox, Daniel Vetter,
	Jakowski Andrzej, Minturn Dave B, Jason Ekstrand, Dave Hansen,
	Xiong Jianxin, Bjorn Helgaas, Ira Weiny, Robin Murphy

Sorry, I think I missed responding to this one so here are the answers:

On 2021-05-02 7:14 p.m., John Hubbard wrote:
> On 4/8/21 10:01 AM, Logan Gunthorpe wrote:
>> When a PCI P2PDMA page is seen, set the IOVA length of the segment
>> to zero so that it is not mapped into the IOVA. Then, in finalise_sg(),
>> apply the appropriate bus address to the segment. The IOVA is not
>> created if the scatterlist only consists of P2PDMA pages.
>>
>> Similar to dma-direct, the sg_mark_pci_p2pdma() flag is used to
>> indicate bus address segments. On unmap, P2PDMA segments are skipped
>> over when determining the start and end IOVA addresses.
>>
>> With this change, the flags variable in the dma_map_ops is
>> set to DMA_F_PCI_P2PDMA_SUPPORTED to indicate support for
>> P2PDMA pages.
>>
>> Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
>> ---
>>   drivers/iommu/dma-iommu.c | 66 ++++++++++++++++++++++++++++++++++-----
>>   1 file changed, 58 insertions(+), 8 deletions(-)
>>
>> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
>> index af765c813cc8..ef49635f9819 100644
>> --- a/drivers/iommu/dma-iommu.c
>> +++ b/drivers/iommu/dma-iommu.c
>> @@ -20,6 +20,7 @@
>>   #include <linux/mm.h>
>>   #include <linux/mutex.h>
>>   #include <linux/pci.h>
>> +#include <linux/pci-p2pdma.h>
>>   #include <linux/swiotlb.h>
>>   #include <linux/scatterlist.h>
>>   #include <linux/vmalloc.h>
>> @@ -864,6 +865,16 @@ static int __finalise_sg(struct device *dev,
>> struct scatterlist *sg, int nents,
>>           sg_dma_address(s) = DMA_MAPPING_ERROR;
>>           sg_dma_len(s) = 0;
>>   +        if (is_pci_p2pdma_page(sg_page(s)) && !s_iova_len) {
> 
> Newbie question: I'm in the dark as to why the !s_iova_len check is there,
> can you please enlighten me?

The loop in iommu_dma_map_sg() will decide what to do with P2PDMA pages.
If it is to map it with the bus address it will set s_iova_len to zero
so that no space is allocated in the IOVA. If it is to map it through
the host bridge, then it will leave s_iova_len alone and create the
appropriate mapping with the CPU physical address.

This condition notices that s_iova_len was set to zero and fills in an SG
segment with the PCI bus address for that region.


> 
>> +            if (i > 0)
>> +                cur = sg_next(cur);
>> +
>> +            pci_p2pdma_map_bus_segment(s, cur);
>> +            count++;
>> +            cur_len = 0;
>> +            continue;
>> +        }
>> +
> 
> This is really an if/else condition. And arguably, it would be better
> to split out two subroutines, and call one or the other depending on
> the result of if is_pci_p2pdma_page(), instead of this "continue" approach.

I really disagree here. Putting the exceptional condition in its own if
statement and leaving the normal case un-indented is easier to read and
understand. It also saves an extra level of indentation in code that is
already starting to look a little squished.


>>           /*
>>            * Now fill in the real DMA data. If...
>>            * - there is a valid output segment to append to
>> @@ -961,10 +972,12 @@ static int iommu_dma_map_sg(struct device *dev,
>> struct scatterlist *sg,
>>       struct iova_domain *iovad = &cookie->iovad;
>>       struct scatterlist *s, *prev = NULL;
>>       int prot = dma_info_to_prot(dir, dev_is_dma_coherent(dev), attrs);
>> +    struct dev_pagemap *pgmap = NULL;
>> +    enum pci_p2pdma_map_type map_type;
>>       dma_addr_t iova;
>>       size_t iova_len = 0;
>>       unsigned long mask = dma_get_seg_boundary(dev);
>> -    int i;
>> +    int i, ret = 0;
>>         if (static_branch_unlikely(&iommu_deferred_attach_enabled) &&
>>           iommu_deferred_attach(dev, domain))
>> @@ -993,6 +1006,31 @@ static int iommu_dma_map_sg(struct device *dev,
>> struct scatterlist *sg,
>>           s_length = iova_align(iovad, s_length + s_iova_off);
>>           s->length = s_length;
>>   +        if (is_pci_p2pdma_page(sg_page(s))) {
>> +            if (sg_page(s)->pgmap != pgmap) {
>> +                pgmap = sg_page(s)->pgmap;
>> +                map_type = pci_p2pdma_map_type(pgmap, dev,
>> +                                   attrs);
>> +            }
>> +
>> +            switch (map_type) {
>> +            case PCI_P2PDMA_MAP_BUS_ADDR:
>> +                /*
>> +                 * A zero length will be ignored by
>> +                 * iommu_map_sg() and then can be detected
>> +                 * in __finalise_sg() to actually map the
>> +                 * bus address.
>> +                 */
>> +                s->length = 0;
>> +                continue;
>> +            case PCI_P2PDMA_MAP_THRU_HOST_BRIDGE:
>> +                break;
>> +            default:
>> +                ret = -EREMOTEIO;
>> +                goto out_restore_sg;
>> +            }
>> +        }
>> +
>>           /*
>>            * Due to the alignment of our single IOVA allocation, we can
>>            * depend on these assumptions about the segment boundary mask:
>> @@ -1015,6 +1053,9 @@ static int iommu_dma_map_sg(struct device *dev,
>> struct scatterlist *sg,
>>           prev = s;
>>       }
>>   +    if (!iova_len)
>> +        return __finalise_sg(dev, sg, nents, 0);
>> +
> 
> ohhh, we're really slicing up this function pretty severely, what with the
> continue and the early out and several other control flow changes. I think
> it would be better to spend some time factoring this function into two
> cases, now that you're adding a second case for PCI P2PDMA. Roughly,
> two subroutines would do it.

I don't see how we can factor this into two cases. The SGL may contain
normal pages or P2PDMA pages or a mix of both and we have to create an
IOVA area for all the regions that map the CPU physical address (i.e.,
normal pages and some P2PDMA pages) then also insert segments for any
PCI bus address.

> As it is, this leaves behind a routine that is extremely hard to mentally
> verify as correct.

Yes, this is tricky code, but not that incomprehensible. Most of the
difficulty is in understanding how it works before adding the P2PDMA bits.

There are two loops: one to prepare the IOVA region and another to fill
in the SGL. We have to add cases in both loops to skip the segments that
need to be mapped with the bus address in the first loop, and insert the
dma SGL segments in the second loop.
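
The two loops described above can be modelled in a few lines of userspace C. This is a simplified sketch, not the dma-iommu code: struct seg_model and both functions are hypothetical, the per-segment iova_len field stands in for the kernel's trick of temporarily zeroing s->length, and segment merging is left out entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one scatterlist segment. */
struct seg_model {
	bool is_p2p;		/* models is_pci_p2pdma_page() */
	bool use_bus_addr;	/* models a PCI_P2PDMA_MAP_BUS_ADDR map type */
	size_t length;		/* segment length in bytes */
	size_t iova_len;	/* space reserved in the IOVA; 0 means "skip" */
	unsigned long bus_addr;	/* PCI bus address, used for bus mappings */
	unsigned long dma_addr;	/* result filled in by the second loop */
};

/* Loop 1: size the IOVA region, skipping bus-address segments. */
static size_t prepare_iova_model(struct seg_model *segs, int nents)
{
	size_t total = 0;
	int i;

	assert(segs != NULL);

	for (i = 0; i < nents; i++) {
		if (segs[i].is_p2p && segs[i].use_bus_addr) {
			/* A zero length marks the segment for loop 2. */
			segs[i].iova_len = 0;
			continue;
		}
		segs[i].iova_len = segs[i].length;
		total += segs[i].iova_len;
	}
	return total;
}

/* Loop 2: fill in DMA addresses; returns the number of segments filled. */
static int finalise_model(struct seg_model *segs, int nents,
			  unsigned long iova_base)
{
	unsigned long next = iova_base;
	int i, count = 0;

	assert(segs != NULL);

	for (i = 0; i < nents; i++) {
		if (segs[i].is_p2p && !segs[i].iova_len) {
			/* Marked in loop 1: use the bus address directly. */
			segs[i].dma_addr = segs[i].bus_addr;
			count++;
			continue;
		}
		/* Normal page, or P2PDMA page routed through the host bridge. */
		segs[i].dma_addr = next;
		next += segs[i].iova_len;
		count++;
	}
	return count;
}
```

In this model a list consisting only of bus-address segments yields a total of zero from the first loop, which corresponds to the case where no IOVA is created at all.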

Logan


* Re: [PATCH 00/16] Add new DMA mapping operation for P2PDMA
  2021-04-08 17:01 [PATCH 00/16] Add new DMA mapping operation for P2PDMA Logan Gunthorpe
                   ` (17 preceding siblings ...)
  2021-05-02  1:22 ` John Hubbard
@ 2021-05-11 16:05 ` Don Dutile
  18 siblings, 0 replies; 99+ messages in thread
From: Don Dutile @ 2021-05-11 16:05 UTC (permalink / raw)
  To: Logan Gunthorpe, linux-kernel, linux-nvme, linux-block,
	linux-pci, linux-mm, iommu
  Cc: Stephen Bates, Christoph Hellwig, Dan Williams, Jason Gunthorpe,
	Christian König, John Hubbard, Matthew Wilcox,
	Daniel Vetter, Jakowski Andrzej, Minturn Dave B, Jason Ekstrand,
	Dave Hansen, Xiong Jianxin, Bjorn Helgaas, Ira Weiny,
	Robin Murphy

On 4/8/21 1:01 PM, Logan Gunthorpe wrote:
> Hi,
>
> This patchset continues my work to to add P2PDMA support to the common
> dma map operations. This allows for creating SGLs that have both P2PDMA
> and regular pages which is a necessary step to allowing P2PDMA pages in
> userspace.
>
> The earlier RFC[1] generated a lot of great feedback and I heard no show
> stopping objections. Thus, I've incorporated all the feedback and have
> decided to post this as a proper patch series with hopes of eventually
> getting it in mainline.
>
> I'm happy to do a few more passes if anyone has any further feedback
> or better ideas.
>
> This series is based on v5.12-rc6 and a git branch can be found here:
>
>    https://github.com/sbates130272/linux-p2pmem/  p2pdma_map_ops_v1
>
> Thanks,
>
> Logan
>
> [1] https://lore.kernel.org/linux-block/20210311233142.7900-1-logang@deltatee.com/
>
>
> Changes since the RFC:
>   * Added comment and fixed up the pci_get_slot patch. (per Bjorn)
>   * Fixed glaring sg_phys() double offset bug. (per Robin)
>   * Created a new map operation (dma_map_sg_p2pdma()) with a new calling
>     convention instead of modifying the calling convention of
>     dma_map_sg(). (per Robin)
>   * Integrated the two similar pci_p2pdma_dma_map_type() and
>     pci_p2pdma_map_type() functions into one (per Ira)
>   * Reworked some of the logic in the map_sg() implementations into
>     helpers in the p2pdma code. (per Christoph)
>   * Dropped a bunch of unnecessary symbol exports (per Christoph)
>   * Expanded the code in dma_pci_p2pdma_supported() for clarity. (per
>     Ira and Christoph)
>   * Finished off using the new dma_map_sg_p2pdma() call in rdma_rw
>     and removed the old pci_p2pdma_[un]map_sg(). (per Jason)
>
> --
>
> Logan Gunthorpe (16):
>    PCI/P2PDMA: Pass gfp_mask flags to upstream_bridge_distance_warn()
>    PCI/P2PDMA: Avoid pci_get_slot() which sleeps
>    PCI/P2PDMA: Attempt to set map_type if it has not been set
>    PCI/P2PDMA: Refactor pci_p2pdma_map_type() to take pagmap and device
>    dma-mapping: Introduce dma_map_sg_p2pdma()
>    lib/scatterlist: Add flag for indicating P2PDMA segments in an SGL
>    PCI/P2PDMA: Make pci_p2pdma_map_type() non-static
>    PCI/P2PDMA: Introduce helpers for dma_map_sg implementations
>    dma-direct: Support PCI P2PDMA pages in dma-direct map_sg
>    dma-mapping: Add flags to dma_map_ops to indicate PCI P2PDMA support
>    iommu/dma: Support PCI P2PDMA pages in dma-iommu map_sg
>    nvme-pci: Check DMA ops when indicating support for PCI P2PDMA
>    nvme-pci: Convert to using dma_map_sg_p2pdma for p2pdma pages
>    nvme-rdma: Ensure dma support when using p2pdma
>    RDMA/rw: use dma_map_sg_p2pdma()
>    PCI/P2PDMA: Remove pci_p2pdma_[un]map_sg()
>
>   drivers/infiniband/core/rw.c |  50 +++-------
>   drivers/iommu/dma-iommu.c    |  66 ++++++++++--
>   drivers/nvme/host/core.c     |   3 +-
>   drivers/nvme/host/nvme.h     |   2 +-
>   drivers/nvme/host/pci.c      |  39 ++++----
>   drivers/nvme/target/rdma.c   |   3 +-
>   drivers/pci/Kconfig          |   2 +-
>   drivers/pci/p2pdma.c         | 188 +++++++++++++++++++----------------
>   include/linux/dma-map-ops.h  |   3 +
>   include/linux/dma-mapping.h  |  20 ++++
>   include/linux/pci-p2pdma.h   |  53 ++++++----
>   include/linux/scatterlist.h  |  49 ++++++++-
>   include/rdma/ib_verbs.h      |  32 ++++++
>   kernel/dma/direct.c          |  25 ++++-
>   kernel/dma/mapping.c         |  70 +++++++++++--
>   15 files changed, 416 insertions(+), 189 deletions(-)
>
>
> base-commit: e49d033bddf5b565044e2abe4241353959bc9120
> --
> 2.20.1
>
Apologies for the delay in providing feedback; climbing out of several deep trenches at the mother ship :-/

Replying to some directly, and to others indirectly (mostly through JohnH's replies).

General comments:
1) nits in 1,2,3,5;
    4: I agree w/JohnH & JasonG -- seems like it needs a device layer that gets to a bus layer, but I'm wearing my 'broader than PCI' hat in this review; I see a (classic) ChristophH refactoring and cleanup in this area, and I'm wondering if we ought to clean it up now, since CH has done so much to clean up the dma-mapping system and make it so much easier to add/modify/review, thanks to the broad arch (& bus) cleanup that has been done.  If that delays it too much, then add a TODO to do so.
2) 6: yes! let's not worry about or even bother supporting 32-bit anything wrt p2pdma.
3) 7: nit
4) 8: ok;
5) 9: ditto to JohnH's feedback on added / clearer comment & code flow (if-else).
6) 10: nits; q: should p2pdma mapping go through dma-ops so it is generalized for future interconnects (CXL, GenZ)?
7) 11: It says it is supporting p2pdma in dma-iommu's map_sg, but it seems like it is just leveraging shared code and short-circuiting IOMMU use.
8) 12-14: didn't review; letting the block/nvme/direct-io folks cover this space
9) 15: Looking to JasonG to sanitize
10) 16: cleanup; a-ok.

- DonD



* Re: [PATCH 01/16] PCI/P2PDMA: Pass gfp_mask flags to upstream_bridge_distance_warn()
  2021-05-02  3:58   ` John Hubbard
  2021-05-03 15:57     ` Logan Gunthorpe
@ 2021-05-11 16:05     ` Don Dutile
  2021-05-11 16:12       ` Logan Gunthorpe
  1 sibling, 1 reply; 99+ messages in thread
From: Don Dutile @ 2021-05-11 16:05 UTC (permalink / raw)
  To: John Hubbard, Logan Gunthorpe, linux-kernel, linux-nvme,
	linux-block, linux-pci, linux-mm, iommu
  Cc: Stephen Bates, Christoph Hellwig, Dan Williams, Jason Gunthorpe,
	Christian König, Matthew Wilcox, Daniel Vetter,
	Jakowski Andrzej, Minturn Dave B, Jason Ekstrand, Dave Hansen,
	Xiong Jianxin, Bjorn Helgaas, Ira Weiny, Robin Murphy,
	Bjorn Helgaas

On 5/1/21 11:58 PM, John Hubbard wrote:
> On 4/8/21 10:01 AM, Logan Gunthorpe wrote:
>> In order to call upstream_bridge_distance_warn() from a dma_map function,
>> it must not sleep. The only reason it does sleep is to allocate the seqbuf
>> to print which devices are within the ACS path.
>>
>> Switch the kmalloc call to use a passed in gfp_mask and don't print that
>> message if the buffer fails to be allocated.
>>
>> Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
>> Acked-by: Bjorn Helgaas <bhelgaas@google.com>
>> ---
>>   drivers/pci/p2pdma.c | 21 +++++++++++----------
>>   1 file changed, 11 insertions(+), 10 deletions(-)
>>
>> diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c
>> index 196382630363..bd89437faf06 100644
>> --- a/drivers/pci/p2pdma.c
>> +++ b/drivers/pci/p2pdma.c
>> @@ -267,7 +267,7 @@ static int pci_bridge_has_acs_redir(struct pci_dev *pdev)
>>     static void seq_buf_print_bus_devfn(struct seq_buf *buf, struct pci_dev *pdev)
>>   {
>> -    if (!buf)
>> +    if (!buf || !buf->buffer)
>
> This is not great, sort of from an overall design point of view, even though
> it makes the rest of the patch work. See below for other ideas, that will
> avoid the need for this sort of odd point fix.
>
+1.
In fact, I don't see how the kmalloc() was changed... you refactored the code to pass in the
GFP_KERNEL that was originally hard-coded into upstream_bridge_distance_warn();
I don't see how that avoids the kmalloc() call.
In fact, I also see you lost the failed-kmalloc() check, so it seems to have taken a step back.

>>           return;
>>         seq_buf_printf(buf, "%s;", pci_name(pdev));
>> @@ -495,25 +495,26 @@ upstream_bridge_distance(struct pci_dev *provider, struct pci_dev *client,
>>     static enum pci_p2pdma_map_type
>>   upstream_bridge_distance_warn(struct pci_dev *provider, struct pci_dev *client,
>> -                  int *dist)
>> +                  int *dist, gfp_t gfp_mask)
>>   {
>>       struct seq_buf acs_list;
>>       bool acs_redirects;
>>       int ret;
>>   -    seq_buf_init(&acs_list, kmalloc(PAGE_SIZE, GFP_KERNEL), PAGE_SIZE);
>> -    if (!acs_list.buffer)
>> -        return -ENOMEM;
>
> Another odd thing: this used to check for memory failure and just give
> up, and now it doesn't. Yes, I realize that it all still works at the
> moment, but this is quirky and we shouldn't stop here.
>
> Instead, a cleaner approach would be to push the memory allocation
> slightly higher up the call stack, out to the
> pci_p2pdma_distance_many(). So pci_p2pdma_distance_many() should make
> the kmalloc() call, and fail out if it can't get a page for the seq_buf
> buffer. Then you don't have to do all this odd stuff.
>
> Furthermore, the call sites can then decide for themselves which GFP
> flags, GFP_ATOMIC or GFP_KERNEL or whatever they want for kmalloc().
>
agree, good proposal to avoid a sleep due to kmalloc().
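
A rough userspace sketch of the "allocate higher up the call stack" idea, under stated assumptions: distance_many_model() and acs_list_fill() are hypothetical names, malloc() stands in for kmalloc(), and the choice of GFP flags has no userspace equivalent, so it is not modelled. The point is only that the inner helper never allocates and the outer function owns the -ENOMEM path.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define ACS_BUF_SIZE 4096

/*
 * Hypothetical inner helper: appends "name;" for each device to a buffer
 * owned by the caller. It never allocates, so it has no -ENOMEM path.
 */
static void acs_list_fill(char *buf, size_t len, const char **names, int n)
{
	size_t used = 0;
	int i;

	assert(buf != NULL && len > 0);

	buf[0] = '\0';
	for (i = 0; i < n && used < len; i++)
		used += (size_t)snprintf(buf + used, len - used, "%s;",
					 names[i]);
}

/*
 * Hypothetical outer entry point: owns the allocation and reports failure.
 * Returns 0 on success or -ENOMEM if the scratch buffer is unavailable.
 */
static int distance_many_model(const char **names, int n,
			       char *out, size_t out_len)
{
	char *buf;
	size_t used;

	assert(out != NULL && out_len > 0);

	buf = malloc(ACS_BUF_SIZE);
	if (!buf)
		return -ENOMEM;

	acs_list_fill(buf, ACS_BUF_SIZE, names, n);

	/* Drop the final semicolon, if anything was written. */
	used = strlen(buf);
	if (used)
		buf[used - 1] = '\0';

	snprintf(out, out_len, "%s", buf);
	free(buf);
	return 0;
}
```

Because the buffer arrives from above, the helper no longer needs a "!buf->buffer" style check of its own.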

> A related thing: this whole exercise would go better if there were a
> preparatory patch or two that changed the return codes in this file to
> something less crazy. There are too many functions that can fail, but
> are treated as if they sort-of-mostly-would-never-fail, in the hopes of
> using the return value directly for counting and such. This is badly
> mistaken, and it leads developers to try to avoid returning -ENOMEM
> (which is what we need here).
>
> Really, these functions should all be doing "0 for success, -ERRNO for
> failure, and pass other values, including results, in the arg list".
>
WFM!
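
For illustration, the "0 for success, -ERRNO for failure, results in the arg list" convention might look like the following userspace sketch. Everything here is an assumption made for the example: the enum, the function name, the two boolean inputs, and the use of -EREMOTEIO for the unsupported case are not taken from p2pdma.c.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical mirror of the map types named in the thread. */
enum map_type_model {
	MAP_NOT_SUPPORTED,
	MAP_BUS_ADDR,
	MAP_THRU_HOST_BRIDGE,
};

/*
 * Returns 0 on success or a negative errno on failure. The map type is
 * passed back through the argument list, so the return value is never
 * overloaded as an enum or a count.
 */
static int bridge_map_type_model(int same_switch, int bridge_whitelisted,
				 enum map_type_model *type)
{
	assert(type != NULL);

	if (same_switch) {
		*type = MAP_BUS_ADDR;
		return 0;
	}

	if (bridge_whitelisted) {
		*type = MAP_THRU_HOST_BRIDGE;
		return 0;
	}

	*type = MAP_NOT_SUPPORTED;
	return -EREMOTEIO;	/* assumption: errno chosen for the example */
}
```

A caller then checks the return value for errors first and only afterwards looks at the result, which leaves room to return -ENOMEM without colliding with a valid map type.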

>
>> +    seq_buf_init(&acs_list, kmalloc(PAGE_SIZE, gfp_mask), PAGE_SIZE);
>>         ret = upstream_bridge_distance(provider, client, dist, &acs_redirects,
>>                          &acs_list);
>>       if (acs_redirects) {
>>           pci_warn(client, "ACS redirect is set between the client and provider (%s)\n",
>>                pci_name(provider));
>> -        /* Drop final semicolon */
>> -        acs_list.buffer[acs_list.len-1] = 0;
>> -        pci_warn(client, "to disable ACS redirect for this path, add the kernel parameter: pci=disable_acs_redir=%s\n",
>> -             acs_list.buffer);
>> +
>> +        if (acs_list.buffer) {
>> +            /* Drop final semicolon */
>> +            acs_list.buffer[acs_list.len - 1] = 0;
>> +            pci_warn(client, "to disable ACS redirect for this path, add the kernel parameter: pci=disable_acs_redir=%s\n",
>> +                 acs_list.buffer);
>> +        }
>>       }
>>         if (ret == PCI_P2PDMA_MAP_NOT_SUPPORTED) {
>> @@ -566,7 +567,7 @@ int pci_p2pdma_distance_many(struct pci_dev *provider, struct device **clients,
>>             if (verbose)
>>               ret = upstream_bridge_distance_warn(provider,
>> -                    pci_client, &distance);
>> +                    pci_client, &distance, GFP_KERNEL);
>>           else
>>               ret = upstream_bridge_distance(provider, pci_client,
>>                                  &distance, NULL, NULL);
>>
>
> thanks,



* Re: [PATCH 02/16] PCI/P2PDMA: Avoid pci_get_slot() which sleeps
  2021-04-08 17:01 ` [PATCH 02/16] PCI/P2PDMA: Avoid pci_get_slot() which sleeps Logan Gunthorpe
  2021-05-02  5:35   ` John Hubbard
@ 2021-05-11 16:05   ` Don Dutile
  2021-05-11 16:14     ` Logan Gunthorpe
  1 sibling, 1 reply; 99+ messages in thread
From: Don Dutile @ 2021-05-11 16:05 UTC (permalink / raw)
  To: Logan Gunthorpe, linux-kernel, linux-nvme, linux-block,
	linux-pci, linux-mm, iommu
  Cc: Stephen Bates, Christoph Hellwig, Dan Williams, Jason Gunthorpe,
	Christian König, John Hubbard, Matthew Wilcox,
	Daniel Vetter, Jakowski Andrzej, Minturn Dave B, Jason Ekstrand,
	Dave Hansen, Xiong Jianxin, Bjorn Helgaas, Ira Weiny,
	Robin Murphy

On 4/8/21 1:01 PM, Logan Gunthorpe wrote:
> In order to use upstream_bridge_distance_warn() from a dma_map function,
> it must not sleep. However, pci_get_slot() takes the pci_bus_sem so it
> might sleep.
>
> In order to avoid this, try to get the host bridge's device from
> bus->self, and if that is not set, just get the first element in the
> device list. It should be impossible for the host bridge's device to
> go away while references are held on child devices, so the first element
> should not be able to change and, thus, this should be safe.
Bjorn:
Why wouldn't (shouldn't?) the bus->self field be set for a host bridge device?
Should this situation be repaired in the host-bridge config/setup code elsewhere in the kernel?
... and here, do a check-and-fail with info on what doesn't have it set up (another new PCI function to do the check & pr_info), so it can point to the offending host bridge and, thus, the code that needs to be updated?


> Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
> ---
>   drivers/pci/p2pdma.c | 14 ++++++++++++--
>   1 file changed, 12 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c
> index bd89437faf06..473a08940fbc 100644
> --- a/drivers/pci/p2pdma.c
> +++ b/drivers/pci/p2pdma.c
> @@ -311,16 +311,26 @@ static const struct pci_p2pdma_whitelist_entry {
>   static bool __host_bridge_whitelist(struct pci_host_bridge *host,
>   				    bool same_host_bridge)
>   {
> -	struct pci_dev *root = pci_get_slot(host->bus, PCI_DEVFN(0, 0));
>   	const struct pci_p2pdma_whitelist_entry *entry;
> +	struct pci_dev *root = host->bus->self;
>   	unsigned short vendor, device;
>   
> +	/*
> +	 * This makes the assumption that the first device on the bus is the
> +	 * bridge itself and it has the devfn of 00.0. This assumption should
> +	 * hold for the devices in the white list above, and if there are cases
> +	 * where this isn't true they will have to be dealt with when such a
> +	 * case is added to the whitelist.
> +	 */
>   	if (!root)
> +		root = list_first_entry_or_null(&host->bus->devices,
> +						struct pci_dev, bus_list);
> +
> +	if (!root || root->devfn)
>   		return false;
>   
>   	vendor = root->vendor;
>   	device = root->device;
> -	pci_dev_put(root);
>   
>   	for (entry = pci_p2pdma_whitelist; entry->vendor; entry++) {
>   		if (vendor != entry->vendor || device != entry->device)



* Re: [PATCH 02/16] PCI/P2PDMA: Avoid pci_get_slot() which sleeps
  2021-05-02  5:35   ` John Hubbard
  2021-05-03 16:08     ` Logan Gunthorpe
@ 2021-05-11 16:05     ` Don Dutile
  2021-05-11 16:16       ` Logan Gunthorpe
  1 sibling, 1 reply; 99+ messages in thread
From: Don Dutile @ 2021-05-11 16:05 UTC (permalink / raw)
  To: John Hubbard, Logan Gunthorpe, linux-kernel, linux-nvme,
	linux-block, linux-pci, linux-mm, iommu
  Cc: Stephen Bates, Christoph Hellwig, Dan Williams, Jason Gunthorpe,
	Christian König, Matthew Wilcox, Daniel Vetter,
	Jakowski Andrzej, Minturn Dave B, Jason Ekstrand, Dave Hansen,
	Xiong Jianxin, Bjorn Helgaas, Ira Weiny, Robin Murphy

On 5/2/21 1:35 AM, John Hubbard wrote:
> On 4/8/21 10:01 AM, Logan Gunthorpe wrote:
>> In order to use upstream_bridge_distance_warn() from a dma_map function,
>> it must not sleep. However, pci_get_slot() takes the pci_bus_sem so it
>> might sleep.
>>
>> In order to avoid this, try to get the host bridge's device from
>> bus->self, and if that is not set, just get the first element in the
>> device list. It should be impossible for the host bridge's device to
>> go away while references are held on child devices, so the first element
>> should not be able to change and, thus, this should be safe.
>>
>> Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
>> ---
>>   drivers/pci/p2pdma.c | 14 ++++++++++++--
>>   1 file changed, 12 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c
>> index bd89437faf06..473a08940fbc 100644
>> --- a/drivers/pci/p2pdma.c
>> +++ b/drivers/pci/p2pdma.c
>> @@ -311,16 +311,26 @@ static const struct pci_p2pdma_whitelist_entry {
>>   static bool __host_bridge_whitelist(struct pci_host_bridge *host,
>>                       bool same_host_bridge)
>>   {
>> -    struct pci_dev *root = pci_get_slot(host->bus, PCI_DEVFN(0, 0));
>>       const struct pci_p2pdma_whitelist_entry *entry;
>> +    struct pci_dev *root = host->bus->self;
>>       unsigned short vendor, device;
>>   +    /*
>> +     * This makes the assumption that the first device on the bus is the
>> +     * bridge itself and it has the devfn of 00.0. This assumption should
>> +     * hold for the devices in the white list above, and if there are cases
>> +     * where this isn't true they will have to be dealt with when such a
>> +     * case is added to the whitelist.
>
> Actually, it makes the assumption that the first device *in the list*
> (the host->bus-devices list) is 00.0.  The previous code made the
> assumption that you wrote.
>
> By the way, pre-existing code comment: pci_p2pdma_whitelist[] seems
> really short. From a naive point of view, I'd expect that there must be
> a lot more CPUs/chipsets that can do pci p2p, what do you think? I
> wonder if we have to be so super strict, anyway. It just seems extremely
> limited, and I suspect there will be some additions to the list as soon
> as we start to use this.
>
>
>> +     */
>>       if (!root)
>> +        root = list_first_entry_or_null(&host->bus->devices,
>> +                        struct pci_dev, bus_list);
>
> OK, yes this avoids taking the pci_bus_sem, but it's kind of cheating.
> Why is it OK to avoid taking any locks in order to retrieve the
> first entry from the list, but in order to retrieve any other entry, you
> have to aquire the pci_bus_sem, and get a reference as well? Something
> is inconsistent there.
>
> The new version here also no longer takes a reference on the device,
> which is also cheating. But I'm guessing that the unstated assumption
> here is that there is always at least one entry in the list. But if
> that's true, then it's better to show clearly that assumption, instead
> of hiding it in an implicit call that skips both locking and reference
> counting.
>
> You could add a new function, which is a cut-down version of pci_get_slot(),
> like this, and call this from __host_bridge_whitelist():
>
> /*
>  * A special purpose variant of pci_get_slot() that doesn't take the pci_bus_sem
>  * lock, and only looks for the 00.0 bus-device-function. Once the PCI bus is
>  * up, it is safe to call this, because there will always be a top-level PCI
>  * root device.
>  *
>  * Other assumptions: the root device is the first device in the list, and the
>  * root device is numbered 00.0.
>  */
> struct pci_dev *pci_get_root_slot(struct pci_bus *bus)
> {
>     struct pci_dev *root;
>     unsigned devfn = PCI_DEVFN(0, 0);
>
>     root = list_first_entry_or_null(&bus->devices, struct pci_dev,
>                     bus_list);
>     if (root->devfn == devfn)
>         goto out;
>
... add a flag (set for p2pdma use) to the function to print out what the root->devfn is, and what
the device is, so the needed quirk &/or modification can be added to handle the case when this assumption fails;
or make it a pr_debug that can be flipped on for this failing situation, again, to add the needed change to accommodate it.

>     root = NULL;
>  out:
>     pci_dev_get(root);
>     return root;
> }
> EXPORT_SYMBOL(pci_get_root_slot);
>
> ...I think that's a lot clearer to the reader, about what's going on here.
>
> Note that I'm not really sure if it *is* safe, I would need to ask other
> PCIe subsystem developers with more experience. But I don't think anyone
> is trying to make p2pdma calls so early that PCIe buses are uninitialized.
>
>
>> +
>> +    if (!root || root->devfn)
>>           return false;
>>         vendor = root->vendor;
>>       device = root->device;
>> -    pci_dev_put(root);
And the reason to remove the pci_dev_put() is b/c it can sleep as well?
Is that OK, given the pci_dev_get() that John put into the new pci_get_root_slot()?
... seems like a locking version with no get/puts is needed, or fix the host-bridge setups so there are no NULL self pointers.


>>         for (entry = pci_p2pdma_whitelist; entry->vendor; entry++) {
>>           if (vendor != entry->vendor || device != entry->device)
>>
>
> thanks,



* Re: [PATCH 03/16] PCI/P2PDMA: Attempt to set map_type if it has not been set
  2021-05-02 19:58   ` John Hubbard
  2021-05-03 16:17     ` Logan Gunthorpe
@ 2021-05-11 16:05     ` Don Dutile
  1 sibling, 0 replies; 99+ messages in thread
From: Don Dutile @ 2021-05-11 16:05 UTC (permalink / raw)
  To: John Hubbard, Logan Gunthorpe, linux-kernel, linux-nvme,
	linux-block, linux-pci, linux-mm, iommu
  Cc: Stephen Bates, Christoph Hellwig, Dan Williams, Jason Gunthorpe,
	Christian König, Matthew Wilcox, Daniel Vetter,
	Jakowski Andrzej, Minturn Dave B, Jason Ekstrand, Dave Hansen,
	Xiong Jianxin, Bjorn Helgaas, Ira Weiny, Robin Murphy

On 5/2/21 3:58 PM, John Hubbard wrote:
> On 4/8/21 10:01 AM, Logan Gunthorpe wrote:
>> Attempt to find the mapping type for P2PDMA pages on the first
>> DMA map attempt if it has not been done ahead of time.
>>
>> Previously, the mapping type was expected to be calculated ahead of
>> time, but if pages are to come from userspace then there's no
>> way to ensure the path was checked ahead of time.
>>
>> Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
>> ---
>>   drivers/pci/p2pdma.c | 12 +++++++++---
>>   1 file changed, 9 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c
>> index 473a08940fbc..2574a062a255 100644
>> --- a/drivers/pci/p2pdma.c
>> +++ b/drivers/pci/p2pdma.c
>> @@ -825,11 +825,18 @@ EXPORT_SYMBOL_GPL(pci_p2pmem_publish);
>>   static enum pci_p2pdma_map_type pci_p2pdma_map_type(struct pci_dev *provider,
>>                               struct pci_dev *client)
>>   {
>> +    enum pci_p2pdma_map_type ret;
>> +
>>       if (!provider->p2pdma)
>>           return PCI_P2PDMA_MAP_NOT_SUPPORTED;
>>   -    return xa_to_value(xa_load(&provider->p2pdma->map_types,
>> -                   map_types_idx(client)));
>> +    ret = xa_to_value(xa_load(&provider->p2pdma->map_types,
>> +                  map_types_idx(client)));
>> +    if (ret != PCI_P2PDMA_MAP_UNKNOWN)
>> +        return ret;
>> +
>> +    return upstream_bridge_distance_warn(provider, client, NULL,
>> +                         GFP_ATOMIC);
>
> Returning a "bridge distance" from a "get map type" routine is jarring,
> and I think it is because of a pre-existing problem: the above function
> is severely misnamed. Let's try renaming it (and the other one) to
> approximately:
>
>     upstream_bridge_map_type_warn()
>     upstream_bridge_map_type()
>
> ...and that should fix that. Well, that, plus tweaking the kernel doc
> comments, which are also confused. I think someone started off thinking
> about distances through PCIe, but in the end, the routine boils down to
> just a few situations that are not distances at all.
>
+1. I didn't like the 'distance' check for a 'connection check' in the beginning, and it looks like this is the time to clean it out.
:)

> Also, the above will read a little better if it is written like this:
>
>     ret = xa_to_value(xa_load(&provider->p2pdma->map_types,
>                   map_types_idx(client)));
>
>     if (ret == PCI_P2PDMA_MAP_UNKNOWN)
>         ret = upstream_bridge_map_type_warn(provider, client, NULL,
>                             GFP_ATOMIC);
>
>     return ret;
>
>
>>   }
>>     static int __pci_p2pdma_map_sg(struct pci_p2pdma_pagemap *p2p_pgmap,
>> @@ -877,7 +884,6 @@ int pci_p2pdma_map_sg_attrs(struct device *dev, struct scatterlist *sg,
>>       case PCI_P2PDMA_MAP_BUS_ADDR:
>>           return __pci_p2pdma_map_sg(p2p_pgmap, dev, sg, nents);
>>       default:
>> -        WARN_ON_ONCE(1);
>
> Why? Or at least, why, in this patch? It looks like an accidental
> leftover from something, seeing as how it is not directly related to the
> patch, and is not mentioned at all.
>
>
> thanks,



* Re: [PATCH 05/16] dma-mapping: Introduce dma_map_sg_p2pdma()
  2021-04-08 17:01 ` [PATCH 05/16] dma-mapping: Introduce dma_map_sg_p2pdma() Logan Gunthorpe
                     ` (2 preceding siblings ...)
  2021-05-02 21:23   ` John Hubbard
@ 2021-05-11 16:05   ` Don Dutile
  3 siblings, 0 replies; 99+ messages in thread
From: Don Dutile @ 2021-05-11 16:05 UTC (permalink / raw)
  To: Logan Gunthorpe, linux-kernel, linux-nvme, linux-block,
	linux-pci, linux-mm, iommu
  Cc: Stephen Bates, Christoph Hellwig, Dan Williams, Jason Gunthorpe,
	Christian König, John Hubbard, Matthew Wilcox,
	Daniel Vetter, Jakowski Andrzej, Minturn Dave B, Jason Ekstrand,
	Dave Hansen, Xiong Jianxin, Bjorn Helgaas, Ira Weiny,
	Robin Murphy

On 4/8/21 1:01 PM, Logan Gunthorpe wrote:
> dma_map_sg() either returns a positive number indicating the number
> of entries mapped or zero indicating that resources were not available
> to create the mapping. When zero is returned, it is always safe to retry
> the mapping later once resources have been freed.
>
> Once P2PDMA pages are mixed into the SGL there may be pages that may
> never be successfully mapped with a given device because that device may
> not actually be able to access those pages. Thus, multiple error
> conditions will need to be distinguished to determine weather a retry
s/weather/whether/
> is safe.
>
> Introduce dma_map_sg_p2pdma[_attrs]() with a different calling
> convention from dma_map_sg(). The function will return a positive
> integer on success or a negative errno on failure.
>
> ENOMEM will be used to indicate a resource failure and EREMOTEIO to
> indicate that a P2PDMA page is not mappable.
>
> The __DMA_ATTR_PCI_P2PDMA attribute is introduced to inform the lower
> level implementations that P2PDMA pages are allowed and to warn if a
> caller introduces them into the regular dma_map_sg() interface.
>
> Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
John caught any other comment I had (and more).
-dd
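
To make the new calling convention concrete, here is a small userspace sketch of how a caller might act on the return value described in the commit message: a positive count means mapped, -ENOMEM means a retry may succeed later, and -EREMOTEIO means the device cannot reach the pages. The enum and classify_map_result() are hypothetical, and treating zero and any other negative value as a permanent failure is an assumption of this sketch.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical caller-side decision for a dma_map_sg_p2pdma()-style result. */
enum map_action {
	MAP_DONE,		/* ret > 0: that many entries were mapped */
	MAP_RETRY_LATER,	/* -ENOMEM: resources may free up */
	MAP_FAIL_PERMANENT,	/* -EREMOTEIO etc.: retrying will not help */
};

static enum map_action classify_map_result(int ret)
{
	if (ret > 0)
		return MAP_DONE;

	if (ret == -ENOMEM)
		return MAP_RETRY_LATER;

	/*
	 * -EREMOTEIO means a P2PDMA page is not mappable by this device.
	 * Assumption: zero and unknown errors are treated the same way,
	 * as the conservative choice.
	 */
	return MAP_FAIL_PERMANENT;
}
```

This is the distinction plain dma_map_sg() cannot express, since its only failure value is zero.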

> ---
>   include/linux/dma-mapping.h | 15 +++++++++++
>   kernel/dma/mapping.c        | 52 ++++++++++++++++++++++++++++++++-----
>   2 files changed, 61 insertions(+), 6 deletions(-)
>
> diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
> index 2a984cb4d1e0..50b8f586cf59 100644
> --- a/include/linux/dma-mapping.h
> +++ b/include/linux/dma-mapping.h
> @@ -60,6 +60,12 @@
>    * at least read-only at lesser-privileged levels).
>    */
>   #define DMA_ATTR_PRIVILEGED		(1UL << 9)
> +/*
> + * __DMA_ATTR_PCI_P2PDMA: This should not be used directly, use
> + * dma_map_sg_p2pdma() instead. Used internally to indicate that the
> + * caller is using the dma_map_sg_p2pdma() interface.
> + */
> +#define __DMA_ATTR_PCI_P2PDMA		(1UL << 10)
>   
>   /*
>    * A dma_addr_t can hold any valid DMA or bus address for the platform.  It can
> @@ -107,6 +113,8 @@ void dma_unmap_page_attrs(struct device *dev, dma_addr_t addr, size_t size,
>   		enum dma_data_direction dir, unsigned long attrs);
>   int dma_map_sg_attrs(struct device *dev, struct scatterlist *sg, int nents,
>   		enum dma_data_direction dir, unsigned long attrs);
> +int dma_map_sg_p2pdma_attrs(struct device *dev, struct scatterlist *sg,
> +		int nents, enum dma_data_direction dir, unsigned long attrs);
>   void dma_unmap_sg_attrs(struct device *dev, struct scatterlist *sg,
>   				      int nents, enum dma_data_direction dir,
>   				      unsigned long attrs);
> @@ -160,6 +168,12 @@ static inline int dma_map_sg_attrs(struct device *dev, struct scatterlist *sg,
>   {
>   	return 0;
>   }
> +static inline int dma_map_sg_p2pdma_attrs(struct device *dev,
> +		struct scatterlist *sg, int nents, enum dma_data_direction dir,
> +		unsigned long attrs)
> +{
> +	return 0;
> +}
>   static inline void dma_unmap_sg_attrs(struct device *dev,
>   		struct scatterlist *sg, int nents, enum dma_data_direction dir,
>   		unsigned long attrs)
> @@ -392,6 +406,7 @@ static inline void dma_sync_sgtable_for_device(struct device *dev,
>   #define dma_map_single(d, a, s, r) dma_map_single_attrs(d, a, s, r, 0)
>   #define dma_unmap_single(d, a, s, r) dma_unmap_single_attrs(d, a, s, r, 0)
>   #define dma_map_sg(d, s, n, r) dma_map_sg_attrs(d, s, n, r, 0)
> +#define dma_map_sg_p2pdma(d, s, n, r) dma_map_sg_p2pdma_attrs(d, s, n, r, 0)
>   #define dma_unmap_sg(d, s, n, r) dma_unmap_sg_attrs(d, s, n, r, 0)
>   #define dma_map_page(d, p, o, s, r) dma_map_page_attrs(d, p, o, s, r, 0)
>   #define dma_unmap_page(d, a, s, r) dma_unmap_page_attrs(d, a, s, r, 0)
> diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
> index b6a633679933..923089c4267b 100644
> --- a/kernel/dma/mapping.c
> +++ b/kernel/dma/mapping.c
> @@ -177,12 +177,8 @@ void dma_unmap_page_attrs(struct device *dev, dma_addr_t addr, size_t size,
>   }
>   EXPORT_SYMBOL(dma_unmap_page_attrs);
>   
> -/*
> - * dma_maps_sg_attrs returns 0 on error and > 0 on success.
> - * It should never return a value < 0.
> - */
> -int dma_map_sg_attrs(struct device *dev, struct scatterlist *sg, int nents,
> -		enum dma_data_direction dir, unsigned long attrs)
> +static int __dma_map_sg_attrs(struct device *dev, struct scatterlist *sg,
> +		int nents, enum dma_data_direction dir, unsigned long attrs)
>   {
>   	const struct dma_map_ops *ops = get_dma_ops(dev);
>   	int ents;
> @@ -197,6 +193,20 @@ int dma_map_sg_attrs(struct device *dev, struct scatterlist *sg, int nents,
>   		ents = dma_direct_map_sg(dev, sg, nents, dir, attrs);
>   	else
>   		ents = ops->map_sg(dev, sg, nents, dir, attrs);
> +
> +	return ents;
> +}
> +
> +/*
> + * dma_maps_sg_attrs returns 0 on error and > 0 on success.
> + * It should never return a value < 0.
> + */
> +int dma_map_sg_attrs(struct device *dev, struct scatterlist *sg, int nents,
> +		enum dma_data_direction dir, unsigned long attrs)
> +{
> +	int ents;
> +
> +	ents = __dma_map_sg_attrs(dev, sg, nents, dir, attrs);
>   	BUG_ON(ents < 0);
>   	debug_dma_map_sg(dev, sg, nents, ents, dir);
>   
> @@ -204,6 +214,36 @@ int dma_map_sg_attrs(struct device *dev, struct scatterlist *sg, int nents,
>   }
>   EXPORT_SYMBOL(dma_map_sg_attrs);
>   
> +/*
> + * like dma_map_sg_attrs, but returns a negative errno on error (and > 0
> + * on success). This function must be used if PCI P2PDMA pages might
> + * be in the scatterlist.
> + *
> + * On error this function may return:
> + *    -ENOMEM indicating that there was not enough resources available and
> + *      the transfer may be retried later
> + *    -EREMOTEIO indicating that P2PDMA pages were included but cannot
> + *      be mapped by the specified device, retries will always fail
> + *
> + * The scatterlist should be unmapped with the regular dma_unmap_sg[_attrs]().
> + */
> +int dma_map_sg_p2pdma_attrs(struct device *dev, struct scatterlist *sg,
> +		int nents, enum dma_data_direction dir, unsigned long attrs)
> +{
> +	int ents;
> +
> +	ents = __dma_map_sg_attrs(dev, sg, nents, dir,
> +				  attrs | __DMA_ATTR_PCI_P2PDMA);
> +	if (!ents)
> +		ents = -ENOMEM;
> +
> +	if (ents > 0)
> +		debug_dma_map_sg(dev, sg, nents, ents, dir);
> +
> +	return ents;
> +}
> +EXPORT_SYMBOL_GPL(dma_map_sg_p2pdma_attrs);
> +
>   void dma_unmap_sg_attrs(struct device *dev, struct scatterlist *sg,
>   				      int nents, enum dma_data_direction dir,
>   				      unsigned long attrs)



* Re: [PATCH 07/16] PCI/P2PDMA: Make pci_p2pdma_map_type() non-static
  2021-04-08 17:01 ` [PATCH 07/16] PCI/P2PDMA: Make pci_p2pdma_map_type() non-static Logan Gunthorpe
  2021-05-02 22:44   ` John Hubbard
@ 2021-05-11 16:06   ` Don Dutile
  2021-05-11 16:17     ` Logan Gunthorpe
  1 sibling, 1 reply; 99+ messages in thread
From: Don Dutile @ 2021-05-11 16:06 UTC (permalink / raw)
  To: Logan Gunthorpe, linux-kernel, linux-nvme, linux-block,
	linux-pci, linux-mm, iommu
  Cc: Stephen Bates, Christoph Hellwig, Dan Williams, Jason Gunthorpe,
	Christian König, John Hubbard, Matthew Wilcox,
	Daniel Vetter, Jakowski Andrzej, Minturn Dave B, Jason Ekstrand,
	Dave Hansen, Xiong Jianxin, Bjorn Helgaas, Ira Weiny,
	Robin Murphy

On 4/8/21 1:01 PM, Logan Gunthorpe wrote:
> pci_p2pdma_map_type() will be needed by the dma-iommu map_sg
> implementation because it will need to determine the mapping type
> ahead of actually doing the mapping to create the actual iommu mapping.
>
> Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
> ---
>   drivers/pci/p2pdma.c       | 34 +++++++++++++++++++++++-----------
>   include/linux/pci-p2pdma.h | 15 +++++++++++++++
>   2 files changed, 38 insertions(+), 11 deletions(-)
>
> diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c
> index bcb1a6d6119d..38c93f57a941 100644
> --- a/drivers/pci/p2pdma.c
> +++ b/drivers/pci/p2pdma.c
> @@ -20,13 +20,6 @@
>   #include <linux/seq_buf.h>
>   #include <linux/xarray.h>
>   
> -enum pci_p2pdma_map_type {
> -	PCI_P2PDMA_MAP_UNKNOWN = 0,
> -	PCI_P2PDMA_MAP_NOT_SUPPORTED,
> -	PCI_P2PDMA_MAP_BUS_ADDR,
> -	PCI_P2PDMA_MAP_THRU_HOST_BRIDGE,
> -};
> -
>   struct pci_p2pdma {
>   	struct gen_pool *pool;
>   	bool p2pmem_published;
> @@ -822,13 +815,30 @@ void pci_p2pmem_publish(struct pci_dev *pdev, bool publish)
>   }
>   EXPORT_SYMBOL_GPL(pci_p2pmem_publish);
>   
> -static enum pci_p2pdma_map_type pci_p2pdma_map_type(struct dev_pagemap *pgmap,
> -						    struct device *dev)
> +/**
> + * pci_p2pdma_map_type - return the type of mapping that should be used for
> + *	a given device and pgmap
> + * @pgmap: the pagemap of a page to determine the mapping type for
> + * @dev: device that is mapping the page
> + * @dma_attrs: the attributes passed to the dma_map operation --
> + *	this is so they can be checked to ensure P2PDMA pages were not
> + *	introduced into an incorrect interface (like dma_map_sg). *
> + *
> + * Returns one of:
> + *	PCI_P2PDMA_MAP_NOT_SUPPORTED - The mapping should not be done
> + *	PCI_P2PDMA_MAP_BUS_ADDR - The mapping should use the PCI bus address
> + *	PCI_P2PDMA_MAP_THRU_HOST_BRIDGE - The mapping should be done directly
> + */
I'd recommend putting these descriptions on the enum values in pci-p2pdma.h.
Also, can you use a better description for THRU_HOST_BRIDGE? It leaves the reader wondering what 'done directly' means.

Thanks.
-dd

> +enum pci_p2pdma_map_type pci_p2pdma_map_type(struct dev_pagemap *pgmap,
> +		struct device *dev, unsigned long dma_attrs)
>   {
>   	struct pci_dev *provider = to_p2p_pgmap(pgmap)->provider;
>   	enum pci_p2pdma_map_type ret;
>   	struct pci_dev *client;
>   
> +	WARN_ONCE(!(dma_attrs & __DMA_ATTR_PCI_P2PDMA),
> +		  "PCI P2PDMA pages were mapped with dma_map_sg!");
> +
>   	if (!provider->p2pdma)
>   		return PCI_P2PDMA_MAP_NOT_SUPPORTED;
>   
> @@ -879,7 +889,8 @@ int pci_p2pdma_map_sg_attrs(struct device *dev, struct scatterlist *sg,
>   	struct pci_p2pdma_pagemap *p2p_pgmap =
>   		to_p2p_pgmap(sg_page(sg)->pgmap);
>   
> -	switch (pci_p2pdma_map_type(sg_page(sg)->pgmap, dev)) {
> +	switch (pci_p2pdma_map_type(sg_page(sg)->pgmap, dev,
> +				    __DMA_ATTR_PCI_P2PDMA)) {
>   	case PCI_P2PDMA_MAP_THRU_HOST_BRIDGE:
>   		return dma_map_sg_attrs(dev, sg, nents, dir, attrs);
>   	case PCI_P2PDMA_MAP_BUS_ADDR:
> @@ -904,7 +915,8 @@ void pci_p2pdma_unmap_sg_attrs(struct device *dev, struct scatterlist *sg,
>   {
>   	enum pci_p2pdma_map_type map_type;
>   
> -	map_type = pci_p2pdma_map_type(sg_page(sg)->pgmap, dev);
> +	map_type = pci_p2pdma_map_type(sg_page(sg)->pgmap, dev,
> +				       __DMA_ATTR_PCI_P2PDMA);
>   
>   	if (map_type == PCI_P2PDMA_MAP_THRU_HOST_BRIDGE)
>   		dma_unmap_sg_attrs(dev, sg, nents, dir, attrs);
> diff --git a/include/linux/pci-p2pdma.h b/include/linux/pci-p2pdma.h
> index 8318a97c9c61..a06072ac3a52 100644
> --- a/include/linux/pci-p2pdma.h
> +++ b/include/linux/pci-p2pdma.h
> @@ -16,6 +16,13 @@
>   struct block_device;
>   struct scatterlist;
>   
> +enum pci_p2pdma_map_type {
> +	PCI_P2PDMA_MAP_UNKNOWN = 0,
> +	PCI_P2PDMA_MAP_NOT_SUPPORTED,
> +	PCI_P2PDMA_MAP_BUS_ADDR,
> +	PCI_P2PDMA_MAP_THRU_HOST_BRIDGE,
> +};
> +
>   #ifdef CONFIG_PCI_P2PDMA
>   int pci_p2pdma_add_resource(struct pci_dev *pdev, int bar, size_t size,
>   		u64 offset);
> @@ -30,6 +37,8 @@ struct scatterlist *pci_p2pmem_alloc_sgl(struct pci_dev *pdev,
>   					 unsigned int *nents, u32 length);
>   void pci_p2pmem_free_sgl(struct pci_dev *pdev, struct scatterlist *sgl);
>   void pci_p2pmem_publish(struct pci_dev *pdev, bool publish);
> +enum pci_p2pdma_map_type pci_p2pdma_map_type(struct dev_pagemap *pgmap,
> +		struct device *dev, unsigned long dma_attrs);
>   int pci_p2pdma_map_sg_attrs(struct device *dev, struct scatterlist *sg,
>   		int nents, enum dma_data_direction dir, unsigned long attrs);
>   void pci_p2pdma_unmap_sg_attrs(struct device *dev, struct scatterlist *sg,
> @@ -83,6 +92,12 @@ static inline void pci_p2pmem_free_sgl(struct pci_dev *pdev,
>   static inline void pci_p2pmem_publish(struct pci_dev *pdev, bool publish)
>   {
>   }
> +static inline enum pci_p2pdma_map_type pci_p2pdma_map_type(
> +		struct dev_pagemap *pgmap, struct device *dev,
> +		unsigned long dma_attrs)
> +{
> +	return PCI_P2PDMA_MAP_NOT_SUPPORTED;
> +}
>   static inline int pci_p2pdma_map_sg_attrs(struct device *dev,
>   		struct scatterlist *sg, int nents, enum dma_data_direction dir,
>   		unsigned long attrs)
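
A rough userspace sketch of the review suggestion above: the per-value descriptions sit on the enum itself, next to two helpers that mirror checks visible in the quoted patches. The comment wording for UNKNOWN and THRU_HOST_BRIDGE is an assumption made for illustration, not the text of the series. MODEL_DMA_ATTR_PCI_P2PDMA, attrs_allow_p2pdma() and map_type_to_errno() are hypothetical names; the bit position and the -EREMOTEIO result are taken from patches 05 and 11.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#ifndef EREMOTEIO
#define EREMOTEIO 121	/* Linux value; only needed on non-Linux hosts */
#endif

/* Same bit as __DMA_ATTR_PCI_P2PDMA in patch 05; renamed for userspace. */
#define MODEL_DMA_ATTR_PCI_P2PDMA	(1UL << 10)

enum pci_p2pdma_map_type {
	/* Assumed meaning: mapping type not determined yet. */
	PCI_P2PDMA_MAP_UNKNOWN = 0,
	/* The mapping should not be done. */
	PCI_P2PDMA_MAP_NOT_SUPPORTED,
	/* The mapping should use the PCI bus address. */
	PCI_P2PDMA_MAP_BUS_ADDR,
	/*
	 * Assumed wording: traffic is routed through the host bridge, so
	 * the pages are mapped the same way as regular memory.
	 */
	PCI_P2PDMA_MAP_THRU_HOST_BRIDGE,
};

/* Mirrors the WARN_ONCE() condition: P2PDMA pages need the attribute. */
static bool attrs_allow_p2pdma(unsigned long dma_attrs)
{
	return (dma_attrs & MODEL_DMA_ATTR_PCI_P2PDMA) != 0;
}

/* Mirrors the switch in patch 11: unmappable types yield -EREMOTEIO. */
static int map_type_to_errno(enum pci_p2pdma_map_type type)
{
	switch (type) {
	case PCI_P2PDMA_MAP_BUS_ADDR:
	case PCI_P2PDMA_MAP_THRU_HOST_BRIDGE:
		return 0;
	default:
		return -EREMOTEIO;
	}
}

/* Usage example. */
static void demo_map_type(void)
{
	assert(!attrs_allow_p2pdma(0));	/* plain dma_map_sg() passes 0 */
	assert(map_type_to_errno(PCI_P2PDMA_MAP_NOT_SUPPORTED) == -EREMOTEIO);
}
```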



* Re: [PATCH 10/16] dma-mapping: Add flags to dma_map_ops to indicate PCI P2PDMA support
  2021-04-08 17:01 ` [PATCH 10/16] dma-mapping: Add flags to dma_map_ops to indicate PCI P2PDMA support Logan Gunthorpe
  2021-05-03  0:32   ` John Hubbard
@ 2021-05-11 16:06   ` Don Dutile
  2021-05-11 16:19     ` Logan Gunthorpe
  1 sibling, 1 reply; 99+ messages in thread
From: Don Dutile @ 2021-05-11 16:06 UTC (permalink / raw)
  To: Logan Gunthorpe, linux-kernel, linux-nvme, linux-block,
	linux-pci, linux-mm, iommu
  Cc: Stephen Bates, Christoph Hellwig, Dan Williams, Jason Gunthorpe,
	Christian König, John Hubbard, Matthew Wilcox,
	Daniel Vetter, Jakowski Andrzej, Minturn Dave B, Jason Ekstrand,
	Dave Hansen, Xiong Jianxin, Bjorn Helgaas, Ira Weiny,
	Robin Murphy

On 4/8/21 1:01 PM, Logan Gunthorpe wrote:
> Add a flags member to the dma_map_ops structure with one flag to
> indicate support for PCI P2PDMA.
>
> Also, add a helper to check if a device supports PCI P2PDMA.
>
> Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
> ---
>   include/linux/dma-map-ops.h |  3 +++
>   include/linux/dma-mapping.h |  5 +++++
>   kernel/dma/mapping.c        | 18 ++++++++++++++++++
>   3 files changed, 26 insertions(+)
>
> diff --git a/include/linux/dma-map-ops.h b/include/linux/dma-map-ops.h
> index 51872e736e7b..481892822104 100644
> --- a/include/linux/dma-map-ops.h
> +++ b/include/linux/dma-map-ops.h
> @@ -12,6 +12,9 @@
>   struct cma;
>   
>   struct dma_map_ops {
> +	unsigned int flags;
> +#define DMA_F_PCI_P2PDMA_SUPPORTED     (1 << 0)
> +
I'm not a fan of in-line defines; if we're going to add a flags field to the dma-ops
-- and logically it'd be good to have p2pdma go through the dma-ops struct --
then let's move this up in front of the dma-ops description.

And now that the dma-ops struct is being 'opened' for p2pdma, should p2pdma ops be added
to this struct, so all this work can be mimicked/reflected/leveraged/refactored for CXL, GenZ, etc. p2pdma in the (near?) future?

>   	void *(*alloc)(struct device *dev, size_t size,
>   			dma_addr_t *dma_handle, gfp_t gfp,
>   			unsigned long attrs);
> diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
> index 50b8f586cf59..c31980ecca62 100644
> --- a/include/linux/dma-mapping.h
> +++ b/include/linux/dma-mapping.h
> @@ -146,6 +146,7 @@ int dma_mmap_attrs(struct device *dev, struct vm_area_struct *vma,
>   		unsigned long attrs);
>   bool dma_can_mmap(struct device *dev);
>   int dma_supported(struct device *dev, u64 mask);
> +bool dma_pci_p2pdma_supported(struct device *dev);
>   int dma_set_mask(struct device *dev, u64 mask);
>   int dma_set_coherent_mask(struct device *dev, u64 mask);
>   u64 dma_get_required_mask(struct device *dev);
> @@ -247,6 +248,10 @@ static inline int dma_supported(struct device *dev, u64 mask)
>   {
>   	return 0;
>   }
> +static inline bool dma_pci_p2pdma_supported(struct device *dev)
> +{
> +	return 0;
> +}
>   static inline int dma_set_mask(struct device *dev, u64 mask)
>   {
>   	return -EIO;
> diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
> index 923089c4267b..ce44a0fcc4e5 100644
> --- a/kernel/dma/mapping.c
> +++ b/kernel/dma/mapping.c
> @@ -573,6 +573,24 @@ int dma_supported(struct device *dev, u64 mask)
>   }
>   EXPORT_SYMBOL(dma_supported);
>   
> +bool dma_pci_p2pdma_supported(struct device *dev)
> +{
> +	const struct dma_map_ops *ops = get_dma_ops(dev);
> +
> +	/* if ops is not set, dma direct will be used which supports P2PDMA */
> +	if (!ops)
> +		return true;
So, this means one cannot have p2pdma with IOMMUs?
Or is this 'for now' and it may change? If it may change, add a note here.

> +
> +	/*
> +	 * Note: dma_ops_bypass is not checked here because P2PDMA should
> +	 * not be used with dma mapping ops that do not have support even
> +	 * if the specific device is bypassing them.
> +	 */
> +
> +	return ops->flags & DMA_F_PCI_P2PDMA_SUPPORTED;
that's a bool?

> +}
> +EXPORT_SYMBOL_GPL(dma_pci_p2pdma_supported);
> +
>   #ifdef CONFIG_ARCH_HAS_DMA_SET_MASK
>   void arch_dma_set_mask(struct device *dev, u64 mask);
>   #else
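
On the "that's a bool?" question above, a small userspace sketch of what C does with `return ops->flags & FLAG;` in a function returning bool: any nonzero value converts to true, so the masked result is safe even for a high bit. The struct dma_map_ops_model type, the DMA_F_HYPOTHETICAL_HIGH flag and the helper names are made up for illustration; only DMA_F_PCI_P2PDMA_SUPPORTED and the NULL-ops rule come from the quoted patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DMA_F_PCI_P2PDMA_SUPPORTED	(1 << 0)
#define DMA_F_HYPOTHETICAL_HIGH		(1 << 8)	/* illustration only */

struct dma_map_ops_model {
	unsigned int flags;
};

/* Same shape as the patch: the masked value converts to bool on return. */
static bool flag_set_implicit(const struct dma_map_ops_model *ops,
			      unsigned int flag)
{
	return ops->flags & flag;
}

/* Counter-example: a narrow integer return type truncates bit 8 to 0. */
static unsigned char flag_set_truncating(const struct dma_map_ops_model *ops,
					 unsigned int flag)
{
	return (unsigned char)(ops->flags & flag);
}

/* Models dma_pci_p2pdma_supported(): NULL ops means dma-direct. */
static bool p2pdma_supported_model(const struct dma_map_ops_model *ops)
{
	if (!ops)
		return true;
	return flag_set_implicit(ops, DMA_F_PCI_P2PDMA_SUPPORTED);
}

/* Usage example. */
static void demo_flags(void)
{
	struct dma_map_ops_model high = { .flags = DMA_F_HYPOTHETICAL_HIGH };

	assert(flag_set_implicit(&high, DMA_F_HYPOTHETICAL_HIGH) == true);
	assert(flag_set_truncating(&high, DMA_F_HYPOTHETICAL_HIGH) == 0);
	assert(p2pdma_supported_model(NULL) == true);
}
```

An explicit `!!(ops->flags & FLAG)` behaves identically for a bool return; it only makes the conversion visible to the reader.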



* Re: [PATCH 11/16] iommu/dma: Support PCI P2PDMA pages in dma-iommu map_sg
  2021-04-08 17:01 ` [PATCH 11/16] iommu/dma: Support PCI P2PDMA pages in dma-iommu map_sg Logan Gunthorpe
  2021-04-27 19:43   ` Jason Gunthorpe
  2021-05-03  1:14   ` John Hubbard
@ 2021-05-11 16:06   ` Don Dutile
  2021-05-11 16:35     ` Logan Gunthorpe
  2 siblings, 1 reply; 99+ messages in thread
From: Don Dutile @ 2021-05-11 16:06 UTC (permalink / raw)
  To: Logan Gunthorpe, linux-kernel, linux-nvme, linux-block,
	linux-pci, linux-mm, iommu
  Cc: Stephen Bates, Christoph Hellwig, Dan Williams, Jason Gunthorpe,
	Christian König, John Hubbard, Matthew Wilcox,
	Daniel Vetter, Jakowski Andrzej, Minturn Dave B, Jason Ekstrand,
	Dave Hansen, Xiong Jianxin, Bjorn Helgaas, Ira Weiny,
	Robin Murphy

On 4/8/21 1:01 PM, Logan Gunthorpe wrote:
> When a PCI P2PDMA page is seen, set the IOVA length of the segment
> to zero so that it is not mapped into the IOVA. Then, in finalise_sg(),
> apply the appropriate bus address to the segment. The IOVA is not
> created if the scatterlist only consists of P2PDMA pages.
>
> Similar to dma-direct, the sg_mark_pci_p2pdma() flag is used to
> indicate bus address segments. On unmap, P2PDMA segments are skipped
> over when determining the start and end IOVA addresses.
>
> With this change, the flags variable in the dma_map_ops is
> set to DMA_F_PCI_P2PDMA_SUPPORTED to indicate support for
> P2PDMA pages.
>
> Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
So, this code prevents use of p2pdma using an IOMMU, which wasn't checked and
short-circuited by other checks to use dma-direct?

So my overall comment on this code & the related comments is that it should be sprinkled
with notes like "doesn't support IOMMU" and/or "TODO" for when/if IOMMU is to be supported.
Or, if IOMMU-based p2pdma isn't supported in these routines directly, where/how will it be supported?

> ---
>   drivers/iommu/dma-iommu.c | 66 ++++++++++++++++++++++++++++++++++-----
>   1 file changed, 58 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> index af765c813cc8..ef49635f9819 100644
> --- a/drivers/iommu/dma-iommu.c
> +++ b/drivers/iommu/dma-iommu.c
> @@ -20,6 +20,7 @@
>   #include <linux/mm.h>
>   #include <linux/mutex.h>
>   #include <linux/pci.h>
> +#include <linux/pci-p2pdma.h>
>   #include <linux/swiotlb.h>
>   #include <linux/scatterlist.h>
>   #include <linux/vmalloc.h>
> @@ -864,6 +865,16 @@ static int __finalise_sg(struct device *dev, struct scatterlist *sg, int nents,
>   		sg_dma_address(s) = DMA_MAPPING_ERROR;
>   		sg_dma_len(s) = 0;
>   
> +		if (is_pci_p2pdma_page(sg_page(s)) && !s_iova_len) {
> +			if (i > 0)
> +				cur = sg_next(cur);
> +
> +			pci_p2pdma_map_bus_segment(s, cur);
> +			count++;
> +			cur_len = 0;
> +			continue;
> +		}
> +
>   		/*
>   		 * Now fill in the real DMA data. If...
>   		 * - there is a valid output segment to append to
> @@ -961,10 +972,12 @@ static int iommu_dma_map_sg(struct device *dev, struct scatterlist *sg,
>   	struct iova_domain *iovad = &cookie->iovad;
>   	struct scatterlist *s, *prev = NULL;
>   	int prot = dma_info_to_prot(dir, dev_is_dma_coherent(dev), attrs);
> +	struct dev_pagemap *pgmap = NULL;
> +	enum pci_p2pdma_map_type map_type;
>   	dma_addr_t iova;
>   	size_t iova_len = 0;
>   	unsigned long mask = dma_get_seg_boundary(dev);
> -	int i;
> +	int i, ret = 0;
>   
>   	if (static_branch_unlikely(&iommu_deferred_attach_enabled) &&
>   	    iommu_deferred_attach(dev, domain))
> @@ -993,6 +1006,31 @@ static int iommu_dma_map_sg(struct device *dev, struct scatterlist *sg,
>   		s_length = iova_align(iovad, s_length + s_iova_off);
>   		s->length = s_length;
>   
> +		if (is_pci_p2pdma_page(sg_page(s))) {
> +			if (sg_page(s)->pgmap != pgmap) {
> +				pgmap = sg_page(s)->pgmap;
> +				map_type = pci_p2pdma_map_type(pgmap, dev,
> +							       attrs);
> +			}
> +
> +			switch (map_type) {
> +			case PCI_P2PDMA_MAP_BUS_ADDR:
> +				/*
> +				 * A zero length will be ignored by
> +				 * iommu_map_sg() and then can be detected
> +				 * in __finalise_sg() to actually map the
> +				 * bus address.
> +				 */
> +				s->length = 0;
> +				continue;

> +			case PCI_P2PDMA_MAP_THRU_HOST_BRIDGE:
> +				break;
So, this 'short-circuits' the use of the IOMMU, silently?
This seems ripe for users enabling the IOMMU for secure-computing reasons and also enabling p2pdma,
without realizing that the combination isn't as secure as 1+1=2 appears to be.
If my understanding is wrong, please point me to the Documentation or code that corrects this misunderstanding. I could have missed a warning, in a past patch set, for when both are enabled.
Thanks.
--dd
> +			default:
> +				ret = -EREMOTEIO;
> +				goto out_restore_sg;
> +			}
> +		}
> +
>   		/*
>   		 * Due to the alignment of our single IOVA allocation, we can
>   		 * depend on these assumptions about the segment boundary mask:
> @@ -1015,6 +1053,9 @@ static int iommu_dma_map_sg(struct device *dev, struct scatterlist *sg,
>   		prev = s;
>   	}
>   
> +	if (!iova_len)
> +		return __finalise_sg(dev, sg, nents, 0);
> +
>   	iova = iommu_dma_alloc_iova(domain, iova_len, dma_get_mask(dev), dev);
>   	if (!iova)
>   		goto out_restore_sg;
> @@ -1032,13 +1073,13 @@ static int iommu_dma_map_sg(struct device *dev, struct scatterlist *sg,
>   	iommu_dma_free_iova(cookie, iova, iova_len, NULL);
>   out_restore_sg:
>   	__invalidate_sg(sg, nents);
> -	return 0;
> +	return ret;
>   }
>   
>   static void iommu_dma_unmap_sg(struct device *dev, struct scatterlist *sg,
>   		int nents, enum dma_data_direction dir, unsigned long attrs)
>   {
> -	dma_addr_t start, end;
> +	dma_addr_t end, start = DMA_MAPPING_ERROR;
>   	struct scatterlist *tmp;
>   	int i;
>   
> @@ -1054,14 +1095,22 @@ static void iommu_dma_unmap_sg(struct device *dev, struct scatterlist *sg,
>   	 * The scatterlist segments are mapped into a single
>   	 * contiguous IOVA allocation, so this is incredibly easy.
>   	 */
> -	start = sg_dma_address(sg);
> -	for_each_sg(sg_next(sg), tmp, nents - 1, i) {
> +	for_each_sg(sg, tmp, nents, i) {
> +		if (sg_is_pci_p2pdma(tmp)) {
> +			sg_unmark_pci_p2pdma(tmp);
> +			continue;
> +		}
>   		if (sg_dma_len(tmp) == 0)
>   			break;
> -		sg = tmp;
> +
> +		if (start == DMA_MAPPING_ERROR)
> +			start = sg_dma_address(tmp);
> +
> +		end = sg_dma_address(tmp) + sg_dma_len(tmp);
>   	}
> -	end = sg_dma_address(sg) + sg_dma_len(sg);
> -	__iommu_dma_unmap(dev, start, end - start);
> +
> +	if (start != DMA_MAPPING_ERROR)
> +		__iommu_dma_unmap(dev, start, end - start);
>   }
>   
Overall, compared with fiddling with the generic dma-iommu code, a dma-ops-based p2pdma function that has this carved out and refactored to be cleaner seems less complicated, but I'm guessing you tried that and it was too complicated to do?
--dd

>   static dma_addr_t iommu_dma_map_resource(struct device *dev, phys_addr_t phys,
> @@ -1254,6 +1303,7 @@ static unsigned long iommu_dma_get_merge_boundary(struct device *dev)
>   }
>   
>   static const struct dma_map_ops iommu_dma_ops = {
> +	.flags			= DMA_F_PCI_P2PDMA_SUPPORTED,
wait, it's a const that's always turned on?
shouldn't the define for this flag be 0 for non-p2pdma configs?

>   	.alloc			= iommu_dma_alloc,
>   	.free			= iommu_dma_free,
>   	.alloc_pages		= dma_common_alloc_pages,
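
A userspace sketch of the unmap walk in the hunk above: P2PDMA bus-address segments are skipped, and the IOVA range to unmap runs from the first remaining segment to the end of the last one. struct seg_model, MODEL_MAPPING_ERROR and iova_range() are hypothetical stand-ins for the scatterlist, DMA_MAPPING_ERROR and the loop in iommu_dma_unmap_sg(). Clearing the P2PDMA mark is left out of the model.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_MAPPING_ERROR	UINT64_MAX	/* stands in for DMA_MAPPING_ERROR */

struct seg_model {
	bool is_p2pdma;		/* stands in for sg_is_pci_p2pdma() */
	uint64_t dma_addr;	/* stands in for sg_dma_address() */
	uint32_t dma_len;	/* stands in for sg_dma_len() */
};

/*
 * Returns true and fills *start and *len when at least one IOVA-backed
 * segment exists. Returns false when the list holds only P2PDMA
 * bus-address segments, in which case nothing needs unmapping.
 */
static bool iova_range(const struct seg_model *segs, size_t n,
		       uint64_t *start, uint64_t *len)
{
	uint64_t s = MODEL_MAPPING_ERROR, e = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (segs[i].is_p2pdma)
			continue;	/* bus address, never in the IOVA */
		if (segs[i].dma_len == 0)
			break;		/* end of the mapped segments */
		if (s == MODEL_MAPPING_ERROR)
			s = segs[i].dma_addr;
		e = segs[i].dma_addr + segs[i].dma_len;
	}

	if (s == MODEL_MAPPING_ERROR)
		return false;

	*start = s;
	*len = e - s;
	return true;
}

/* Usage example. */
static void demo_iova_range(void)
{
	const struct seg_model segs[] = {
		{ true,  0xf0000000u, 0x1000 },
		{ false, 0x10000,     0x1000 },
		{ false, 0x11000,     0x2000 },
	};
	uint64_t start = 0, len = 0;

	assert(iova_range(segs, 3, &start, &len));
	assert(start == 0x10000 && len == 0x3000);
}
```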



* Re: [PATCH 01/16] PCI/P2PDMA: Pass gfp_mask flags to upstream_bridge_distance_warn()
  2021-05-11 16:05     ` Don Dutile
@ 2021-05-11 16:12       ` Logan Gunthorpe
  2021-05-11 16:23         ` Don Dutile
  0 siblings, 1 reply; 99+ messages in thread
From: Logan Gunthorpe @ 2021-05-11 16:12 UTC (permalink / raw)
  To: Don Dutile, John Hubbard, linux-kernel, linux-nvme, linux-block,
	linux-pci, linux-mm, iommu
  Cc: Stephen Bates, Christoph Hellwig, Dan Williams, Jason Gunthorpe,
	Christian König, Matthew Wilcox, Daniel Vetter,
	Jakowski Andrzej, Minturn Dave B, Jason Ekstrand, Dave Hansen,
	Xiong Jianxin, Bjorn Helgaas, Ira Weiny, Robin Murphy,
	Bjorn Helgaas



On 2021-05-11 10:05 a.m., Don Dutile wrote:
> On 5/1/21 11:58 PM, John Hubbard wrote:
>> On 4/8/21 10:01 AM, Logan Gunthorpe wrote:
>>> In order to call upstream_bridge_distance_warn() from a dma_map function,
>>> it must not sleep. The only reason it does sleep is to allocate the seqbuf
>>> to print which devices are within the ACS path.
>>>
>>> Switch the kmalloc call to use a passed in gfp_mask and don't print that
>>> message if the buffer fails to be allocated.
>>>
>>> Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
>>> Acked-by: Bjorn Helgaas <bhelgaas@google.com>
>>> ---
>>>   drivers/pci/p2pdma.c | 21 +++++++++++----------
>>>   1 file changed, 11 insertions(+), 10 deletions(-)
>>>
>>> diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c
>>> index 196382630363..bd89437faf06 100644
>>> --- a/drivers/pci/p2pdma.c
>>> +++ b/drivers/pci/p2pdma.c
>>> @@ -267,7 +267,7 @@ static int pci_bridge_has_acs_redir(struct pci_dev *pdev)
>>>     static void seq_buf_print_bus_devfn(struct seq_buf *buf, struct pci_dev *pdev)
>>>   {
>>> -    if (!buf)
>>> +    if (!buf || !buf->buffer)
>>
>> This is not great, sort of from an overall design point of view, even though
>> it makes the rest of the patch work. See below for other ideas, that will
>> avoid the need for this sort of odd point fix.
>>
> +1.
> In fact, I didn't see how the kmalloc was changed... you refactored the code to pass-in the
> GFP_KERNEL that was originally hard-coded into upstream_bridge_distance_warn();
> I don't see how that avoided the kmalloc() call.
> in fact, I also see you lost a failed kmalloc() check, so it seems to have taken a step back.

I've changed this in v2 to just use some memory allocated on the stack.
That avoids this argument altogether.

Logan
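
A small userspace sketch of the general idea mentioned above: formatting device names into a fixed-size buffer on the caller's stack, so no allocation (and no sleeping) is needed. This is not the v2 code; struct small_buf, its 64-byte size, the "bus:dev.fn;" format and the helper names are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

struct small_buf {
	char data[64];	/* lives on the caller's stack */
	size_t len;	/* bytes used, excluding the terminating NUL */
};

static void small_buf_init(struct small_buf *b)
{
	b->len = 0;
	b->data[0] = '\0';
}

/*
 * Append one "bus:dev.fn;" entry. Returns false and leaves the buffer
 * contents unchanged when the entry would not fit.
 */
static bool small_buf_add_devfn(struct small_buf *b, unsigned int bus,
				unsigned int dev, unsigned int fn)
{
	size_t room = sizeof(b->data) - b->len;
	int n = snprintf(b->data + b->len, room, "%02x:%02x.%x;",
			 bus, dev, fn);

	if (n < 0 || (size_t)n >= room) {
		b->data[b->len] = '\0';	/* drop the partial write */
		return false;
	}

	b->len += (size_t)n;
	return true;
}

/* Usage example. */
static void demo_small_buf(void)
{
	struct small_buf b;

	small_buf_init(&b);
	assert(small_buf_add_devfn(&b, 0, 0x1f, 3));
	assert(b.len == 8);	/* "00:1f.3;" */
}
```

With a fixed buffer the message is simply truncated when the path is long, which is a trade-off of this sketch rather than a statement about what v2 does.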


* Re: [PATCH 02/16] PCI/P2PDMA: Avoid pci_get_slot() which sleeps
  2021-05-11 16:05   ` Don Dutile
@ 2021-05-11 16:14     ` Logan Gunthorpe
  0 siblings, 0 replies; 99+ messages in thread
From: Logan Gunthorpe @ 2021-05-11 16:14 UTC (permalink / raw)
  To: Don Dutile, linux-kernel, linux-nvme, linux-block, linux-pci,
	linux-mm, iommu
  Cc: Stephen Bates, Christoph Hellwig, Dan Williams, Jason Gunthorpe,
	Christian König, John Hubbard, Matthew Wilcox,
	Daniel Vetter, Jakowski Andrzej, Minturn Dave B, Jason Ekstrand,
	Dave Hansen, Xiong Jianxin, Bjorn Helgaas, Ira Weiny,
	Robin Murphy



On 2021-05-11 10:05 a.m., Don Dutile wrote:
> On 4/8/21 1:01 PM, Logan Gunthorpe wrote:
>> In order to use upstream_bridge_distance_warn() from a dma_map function,
>> it must not sleep. However, pci_get_slot() takes the pci_bus_sem so it
>> might sleep.
>>
>> In order to avoid this, try to get the host bridge's device from
>> bus->self, and if that is not set, just get the first element in the
>> device list. It should be impossible for the host bridge's device to
>> go away while references are held on child devices, so the first element
>> should not be able to change and, thus, this should be safe.
> Bjorn:
> Why wouldn't (shouldn't?) the bus->self field be set for a host bridge device?
> Should this situation be repaired in the host-bridge config/setup code elsewhere in the kernel?
> ... and here, a check-and-fail with info on what doesn't have it set up (another new pci function to do the check & pr_info), so it can point to the offending host-bridge, and thus, the code that needs to be updated?

I've dropped the bus->self thing in v2. Seems bus->self is explicitly
unset for root bridges. There are remnants in the PCI code that used to
check bus->self to see if the bridge is the root bridge.

I tried setting bus->self with the pci device of the root bridge but
that just caused my machine not to boot and I didn't dig any further.

Logan


* Re: [PATCH 02/16] PCI/P2PDMA: Avoid pci_get_slot() which sleeps
  2021-05-11 16:05     ` Don Dutile
@ 2021-05-11 16:16       ` Logan Gunthorpe
  0 siblings, 0 replies; 99+ messages in thread
From: Logan Gunthorpe @ 2021-05-11 16:16 UTC (permalink / raw)
  To: Don Dutile, John Hubbard, linux-kernel, linux-nvme, linux-block,
	linux-pci, linux-mm, iommu
  Cc: Stephen Bates, Christoph Hellwig, Dan Williams, Jason Gunthorpe,
	Christian König, Matthew Wilcox, Daniel Vetter,
	Jakowski Andrzej, Minturn Dave B, Jason Ekstrand, Dave Hansen,
	Xiong Jianxin, Bjorn Helgaas, Ira Weiny, Robin Murphy



On 2021-05-11 10:05 a.m., Don Dutile wrote:
> ... add a flag (set for p2pdma use) to the function to print out what the root->devfn is, and what
> the device is, so the needed quirk and/or modification can be added to handle the case where this assumption fails;
> or make it a pr_debug that can be flipped on for this failing situation, again, to add the needed change to accommodate it.

Good idea! Will add.

>>     root = NULL;
>>  out:
>>     pci_dev_get(root);
>>     return root;
>> }
>> EXPORT_SYMBOL(pci_get_root_slot);
>>
>> ...I think that's a lot clearer to the reader, about what's going on here.
>>
>> Note that I'm not really sure if it *is* safe, I would need to ask other
>> PCIe subsystem developers with more experience. But I don't think anyone
>> is trying to make p2pdma calls so early that PCIe buses are uninitialized.
>>
>>
>>> +
>>> +    if (!root || root->devfn)
>>>           return false;
>>>         vendor = root->vendor;
>>>       device = root->device;
>>> -    pci_dev_put(root);
> and the reason to remove the dev_put is b/c it can sleep as well?
> is that ok, given the dev_get that John put into the new pci_get_root_slot()?
> ... seems like a locking version with no get/puts is needed, or, fix the host-bridge setups so there are no NULL self pointers.

The dev_get is redundant here, seeing as we hold references to child
devices. It was only in the previous code because we were using
pci_get_slot() to get the device which did the get for us.

Logan


* Re: [PATCH 07/16] PCI/P2PDMA: Make pci_p2pdma_map_type() non-static
  2021-05-11 16:06   ` Don Dutile
@ 2021-05-11 16:17     ` Logan Gunthorpe
  0 siblings, 0 replies; 99+ messages in thread
From: Logan Gunthorpe @ 2021-05-11 16:17 UTC (permalink / raw)
  To: Don Dutile, linux-kernel, linux-nvme, linux-block, linux-pci,
	linux-mm, iommu
  Cc: Stephen Bates, Christoph Hellwig, Dan Williams, Jason Gunthorpe,
	Christian König, John Hubbard, Matthew Wilcox,
	Daniel Vetter, Jakowski Andrzej, Minturn Dave B, Jason Ekstrand,
	Dave Hansen, Xiong Jianxin, Bjorn Helgaas, Ira Weiny,
	Robin Murphy



On 2021-05-11 10:06 a.m., Don Dutile wrote:
> On 4/8/21 1:01 PM, Logan Gunthorpe wrote:
>> pci_p2pdma_map_type() will be needed by the dma-iommu map_sg
>> implementation because it will need to determine the mapping type
>> ahead of actually doing the mapping to create the actual iommu mapping.
>>
>> Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
>> ---
>>   drivers/pci/p2pdma.c       | 34 +++++++++++++++++++++++-----------
>>   include/linux/pci-p2pdma.h | 15 +++++++++++++++
>>   2 files changed, 38 insertions(+), 11 deletions(-)
>>
>> diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c
>> index bcb1a6d6119d..38c93f57a941 100644
>> --- a/drivers/pci/p2pdma.c
>> +++ b/drivers/pci/p2pdma.c
>> @@ -20,13 +20,6 @@
>>   #include <linux/seq_buf.h>
>>   #include <linux/xarray.h>
>>   
>> -enum pci_p2pdma_map_type {
>> -	PCI_P2PDMA_MAP_UNKNOWN = 0,
>> -	PCI_P2PDMA_MAP_NOT_SUPPORTED,
>> -	PCI_P2PDMA_MAP_BUS_ADDR,
>> -	PCI_P2PDMA_MAP_THRU_HOST_BRIDGE,
>> -};
>> -
>>   struct pci_p2pdma {
>>   	struct gen_pool *pool;
>>   	bool p2pmem_published;
>> @@ -822,13 +815,30 @@ void pci_p2pmem_publish(struct pci_dev *pdev, bool publish)
>>   }
>>   EXPORT_SYMBOL_GPL(pci_p2pmem_publish);
>>   
>> -static enum pci_p2pdma_map_type pci_p2pdma_map_type(struct dev_pagemap *pgmap,
>> -						    struct device *dev)
>> +/**
>> + * pci_p2pdma_map_type - return the type of mapping that should be used for
>> + *	a given device and pgmap
>> + * @pgmap: the pagemap of a page to determine the mapping type for
>> + * @dev: device that is mapping the page
>> + * @dma_attrs: the attributes passed to the dma_map operation --
>> + *	this is so they can be checked to ensure P2PDMA pages were not
>> + *	introduced into an incorrect interface (like dma_map_sg). *
>> + *
>> + * Returns one of:
>> + *	PCI_P2PDMA_MAP_NOT_SUPPORTED - The mapping should not be done
>> + *	PCI_P2PDMA_MAP_BUS_ADDR - The mapping should use the PCI bus address
>> + *	PCI_P2PDMA_MAP_THRU_HOST_BRIDGE - The mapping should be done directly
>> + */
> I'd recommend putting these descriptions on the enum in pci-p2pdma.h.
> Also, can you use a better description for THRU_HOST_BRIDGE? It leaves the reader wondering what 'done directly' means.

Will do.

Logan


* Re: [PATCH 10/16] dma-mapping: Add flags to dma_map_ops to indicate PCI P2PDMA support
  2021-05-11 16:06   ` Don Dutile
@ 2021-05-11 16:19     ` Logan Gunthorpe
  0 siblings, 0 replies; 99+ messages in thread
From: Logan Gunthorpe @ 2021-05-11 16:19 UTC (permalink / raw)
  To: Don Dutile, linux-kernel, linux-nvme, linux-block, linux-pci,
	linux-mm, iommu
  Cc: Stephen Bates, Christoph Hellwig, Dan Williams, Jason Gunthorpe,
	Christian König, John Hubbard, Matthew Wilcox,
	Daniel Vetter, Jakowski Andrzej, Minturn Dave B, Jason Ekstrand,
	Dave Hansen, Xiong Jianxin, Bjorn Helgaas, Ira Weiny,
	Robin Murphy



On 2021-05-11 10:06 a.m., Don Dutile wrote:
> On 4/8/21 1:01 PM, Logan Gunthorpe wrote:
>> Add a flags member to the dma_map_ops structure with one flag to
>> indicate support for PCI P2PDMA.
>>
>> Also, add a helper to check if a device supports PCI P2PDMA.
>>
>> Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
>> ---
>>   include/linux/dma-map-ops.h |  3 +++
>>   include/linux/dma-mapping.h |  5 +++++
>>   kernel/dma/mapping.c        | 18 ++++++++++++++++++
>>   3 files changed, 26 insertions(+)
>>
>> diff --git a/include/linux/dma-map-ops.h b/include/linux/dma-map-ops.h
>> index 51872e736e7b..481892822104 100644
>> --- a/include/linux/dma-map-ops.h
>> +++ b/include/linux/dma-map-ops.h
>> @@ -12,6 +12,9 @@
>>   struct cma;
>>   
>>   struct dma_map_ops {
>> +	unsigned int flags;
>> +#define DMA_F_PCI_P2PDMA_SUPPORTED     (1 << 0)
>> +
> I'm not a fan of in-line defines; if we're going to add a flags field to the dma-ops
> -- and logically it'd be good to have p2pdma go through the dma-ops struct --
> then let's move this up in front of the dma-ops description.

Already changed for v2.

> And now that the dma-ops struct is being 'opened' for p2pdma, should p2pdma ops be added
> to this struct, so all this work can be mimic'd/reflected/leveraged/refactored for CXL, GenZ, etc. p2pdma in (the near?) future?

v2 no longer has a specific op for p2pdma. We are now using
dma_map_sgtable() which already has the error return we need.

I think any work to support CXL, GenZ, etc will need to be done when
they add their own support. I can't and shouldn't guess at their needs now.

Logan


* Re: [PATCH 01/16] PCI/P2PDMA: Pass gfp_mask flags to upstream_bridge_distance_warn()
  2021-05-11 16:12       ` Logan Gunthorpe
@ 2021-05-11 16:23         ` Don Dutile
  0 siblings, 0 replies; 99+ messages in thread
From: Don Dutile @ 2021-05-11 16:23 UTC (permalink / raw)
  To: Logan Gunthorpe, John Hubbard, linux-kernel, linux-nvme,
	linux-block, linux-pci, linux-mm, iommu
  Cc: Stephen Bates, Christoph Hellwig, Dan Williams, Jason Gunthorpe,
	Christian König, Matthew Wilcox, Daniel Vetter,
	Jakowski Andrzej, Minturn Dave B, Jason Ekstrand, Dave Hansen,
	Xiong Jianxin, Bjorn Helgaas, Ira Weiny, Robin Murphy,
	Bjorn Helgaas

On 5/11/21 12:12 PM, Logan Gunthorpe wrote:
>
> On 2021-05-11 10:05 a.m., Don Dutile wrote:
>> On 5/1/21 11:58 PM, John Hubbard wrote:
>>> On 4/8/21 10:01 AM, Logan Gunthorpe wrote:
>>>> In order to call upstream_bridge_distance_warn() from a dma_map function,
>>>> it must not sleep. The only reason it does sleep is to allocate the seqbuf
>>>> to print which devices are within the ACS path.
>>>>
>>>> Switch the kmalloc call to use a passed in gfp_mask and don't print that
>>>> message if the buffer fails to be allocated.
>>>>
>>>> Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
>>>> Acked-by: Bjorn Helgaas <bhelgaas@google.com>
>>>> ---
>>>>    drivers/pci/p2pdma.c | 21 +++++++++++----------
>>>>    1 file changed, 11 insertions(+), 10 deletions(-)
>>>>
>>>> diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c
>>>> index 196382630363..bd89437faf06 100644
>>>> --- a/drivers/pci/p2pdma.c
>>>> +++ b/drivers/pci/p2pdma.c
>>>> @@ -267,7 +267,7 @@ static int pci_bridge_has_acs_redir(struct pci_dev *pdev)
>>>>      static void seq_buf_print_bus_devfn(struct seq_buf *buf, struct pci_dev *pdev)
>>>>    {
>>>> -    if (!buf)
>>>> +    if (!buf || !buf->buffer)
>>> This is not great, sort of from an overall design point of view, even though
>>> it makes the rest of the patch work. See below for other ideas, that will
>>> avoid the need for this sort of odd point fix.
>>>
>> +1.
>> In fact, I didn't see how the kmalloc was changed... you refactored the code to pass-in the
>> GFP_KERNEL that was originally hard-coded into upstream_bridge_distance_warn();
>> I don't see how that avoided the kmalloc() call.
>> in fact, I also see you lost a failed kmalloc() check, so it seems to have taken a step back.
> I've changed this in v2 to just use some memory allocated on the stack.
> That avoids this argument altogether.
>
> Logan
>
Looking fwd to the v2; again, my apologies for the delay, and the redundancy it's adding to your feedback review & changes.
-Don
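
The stack-allocation approach mentioned in the quoted reply could look roughly like the user-space sketch below. It is not the v2 code: `struct model_buf`, `model_buf_init()` and `model_buf_add_dev()` are hypothetical stand-ins for the kernel's seq_buf helpers, and dropping an entry that does not fit is an assumption made here for simplicity. The point is only that a caller-provided, fixed-size buffer needs no allocation and therefore cannot sleep or fail.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical bounded buffer; stands in for the kernel's seq_buf. */
struct model_buf {
	char *data;	/* caller-provided storage, e.g. an array on the stack */
	size_t size;	/* total capacity including the trailing NUL */
	size_t len;	/* bytes used so far, excluding the trailing NUL */
};

static void model_buf_init(struct model_buf *b, char *storage, size_t size)
{
	assert(storage != NULL && size > 0);
	b->data = storage;
	b->size = size;
	b->len = 0;
	storage[0] = '\0';
}

/*
 * Append "name;" to the buffer. If the entry does not fit, it is dropped
 * whole (an assumption of this sketch), so the buffer always holds a
 * well-formed list and no allocation is ever needed.
 */
static void model_buf_add_dev(struct model_buf *b, const char *name)
{
	size_t room = b->size - b->len;
	int n = snprintf(b->data + b->len, room, "%s;", name);

	if (n < 0 || (size_t)n >= room) {
		b->data[b->len] = '\0';	/* undo any partial write */
		return;
	}
	b->len += (size_t)n;
}
```

A caller would declare something like `char storage[128];` on its own stack, initialise the buffer over it, and append one device name per bridge on the path.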



* Re: [PATCH 11/16] iommu/dma: Support PCI P2PDMA pages in dma-iommu map_sg
  2021-05-11 16:06   ` Don Dutile
@ 2021-05-11 16:35     ` Logan Gunthorpe
  0 siblings, 0 replies; 99+ messages in thread
From: Logan Gunthorpe @ 2021-05-11 16:35 UTC (permalink / raw)
  To: Don Dutile, linux-kernel, linux-nvme, linux-block, linux-pci,
	linux-mm, iommu
  Cc: Stephen Bates, Christoph Hellwig, Dan Williams, Jason Gunthorpe,
	Christian König, John Hubbard, Matthew Wilcox,
	Daniel Vetter, Jakowski Andrzej, Minturn Dave B, Jason Ekstrand,
	Dave Hansen, Xiong Jianxin, Bjorn Helgaas, Ira Weiny,
	Robin Murphy



On 2021-05-11 10:06 a.m., Don Dutile wrote:
> On 4/8/21 1:01 PM, Logan Gunthorpe wrote:
>> When a PCI P2PDMA page is seen, set the IOVA length of the segment
>> to zero so that it is not mapped into the IOVA. Then, in finalise_sg(),
>> apply the appropriate bus address to the segment. The IOVA is not
>> created if the scatterlist only consists of P2PDMA pages.
>>
>> Similar to dma-direct, the sg_mark_pci_p2pdma() flag is used to
>> indicate bus address segments. On unmap, P2PDMA segments are skipped
>> over when determining the start and end IOVA addresses.
>>
>> With this change, the flags variable in the dma_map_ops is
>> set to DMA_F_PCI_P2PDMA_SUPPORTED to indicate support for
>> P2PDMA pages.
>>
>> Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
> So, this code prevents use of p2pdma using an IOMMU, which wasn't checked and
> short-circuited by other checks to use dma-direct?

No, not at all. This patch adds support for P2PDMA pages to IOMMUs
that use the dma-iommu abstraction. Arch-specific IOMMU implementations
that don't use dma-iommu are left unsupported. Support would need to be
added to them or, better yet, they should be ported to dma-iommu.
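
The unmap behaviour described in the quoted commit message (P2PDMA segments are skipped when determining the start and end IOVA) can be modelled in user space as follows. This is a sketch only: `struct model_seg`, `MODEL_MAPPING_ERROR` and `model_unmap_range()` are hypothetical names, and the struct is a much-simplified stand-in for a scatterlist entry. The loop mirrors the shape of the patched iommu_dma_unmap_sg() quoted below.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Sentinel meaning "no IOVA seen yet"; plays the role of DMA_MAPPING_ERROR. */
#define MODEL_MAPPING_ERROR UINT64_MAX

/* Hypothetical, simplified scatterlist entry after mapping. */
struct model_seg {
	bool is_bus_addr;	/* segment was mapped with a PCI bus address */
	uint64_t dma_addr;
	uint64_t dma_len;
};

/*
 * Compute the single contiguous IOVA range to unmap. Bus-address segments
 * never consumed IOVA space, so they are skipped. Returns false when the
 * list held only bus-address segments and there is nothing to unmap.
 */
static bool model_unmap_range(const struct model_seg *segs, size_t nents,
			      uint64_t *start, uint64_t *len)
{
	uint64_t s = MODEL_MAPPING_ERROR, e = 0;
	size_t i;

	assert(segs != NULL || nents == 0);

	for (i = 0; i < nents; i++) {
		if (segs[i].is_bus_addr)
			continue;
		if (segs[i].dma_len == 0)
			break;	/* end of the mapped DMA segments */
		if (s == MODEL_MAPPING_ERROR)
			s = segs[i].dma_addr;
		e = segs[i].dma_addr + segs[i].dma_len;
	}

	if (s == MODEL_MAPPING_ERROR)
		return false;

	*start = s;
	*len = e - s;
	return true;
}
```

Starting from a sentinel rather than from the first entry matters because the first entry may itself be a bus-address segment, in which case its DMA address is not part of the IOVA allocation at all.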

> 
> So my overall comment to this code & related comments is that it should be sprinkled
> with notes like "doesn't support IOMMU" and / or "TODO" when/if IOMMU is to be supported.
> Or, if IOMMU-based p2pdma isn't supported in these routines directly, where/how they will be supported?
> 
>> ---
>>   drivers/iommu/dma-iommu.c | 66 ++++++++++++++++++++++++++++++++++-----
>>   1 file changed, 58 insertions(+), 8 deletions(-)
>>
>> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
>> index af765c813cc8..ef49635f9819 100644
>> --- a/drivers/iommu/dma-iommu.c
>> +++ b/drivers/iommu/dma-iommu.c
>> @@ -20,6 +20,7 @@
>>   #include <linux/mm.h>
>>   #include <linux/mutex.h>
>>   #include <linux/pci.h>
>> +#include <linux/pci-p2pdma.h>
>>   #include <linux/swiotlb.h>
>>   #include <linux/scatterlist.h>
>>   #include <linux/vmalloc.h>
>> @@ -864,6 +865,16 @@ static int __finalise_sg(struct device *dev, struct scatterlist *sg, int nents,
>>   		sg_dma_address(s) = DMA_MAPPING_ERROR;
>>   		sg_dma_len(s) = 0;
>>   
>> +		if (is_pci_p2pdma_page(sg_page(s)) && !s_iova_len) {
>> +			if (i > 0)
>> +				cur = sg_next(cur);
>> +
>> +			pci_p2pdma_map_bus_segment(s, cur);
>> +			count++;
>> +			cur_len = 0;
>> +			continue;
>> +		}
>> +
>>   		/*
>>   		 * Now fill in the real DMA data. If...
>>   		 * - there is a valid output segment to append to
>> @@ -961,10 +972,12 @@ static int iommu_dma_map_sg(struct device *dev, struct scatterlist *sg,
>>   	struct iova_domain *iovad = &cookie->iovad;
>>   	struct scatterlist *s, *prev = NULL;
>>   	int prot = dma_info_to_prot(dir, dev_is_dma_coherent(dev), attrs);
>> +	struct dev_pagemap *pgmap = NULL;
>> +	enum pci_p2pdma_map_type map_type;
>>   	dma_addr_t iova;
>>   	size_t iova_len = 0;
>>   	unsigned long mask = dma_get_seg_boundary(dev);
>> -	int i;
>> +	int i, ret = 0;
>>   
>>   	if (static_branch_unlikely(&iommu_deferred_attach_enabled) &&
>>   	    iommu_deferred_attach(dev, domain))
>> @@ -993,6 +1006,31 @@ static int iommu_dma_map_sg(struct device *dev, struct scatterlist *sg,
>>   		s_length = iova_align(iovad, s_length + s_iova_off);
>>   		s->length = s_length;
>>   
>> +		if (is_pci_p2pdma_page(sg_page(s))) {
>> +			if (sg_page(s)->pgmap != pgmap) {
>> +				pgmap = sg_page(s)->pgmap;
>> +				map_type = pci_p2pdma_map_type(pgmap, dev,
>> +							       attrs);
>> +			}
>> +
>> +			switch (map_type) {
>> +			case PCI_P2PDMA_MAP_BUS_ADDR:
>> +				/*
>> +				 * A zero length will be ignored by
>> +				 * iommu_map_sg() and then can be detected
>> +				 * in __finalise_sg() to actually map the
>> +				 * bus address.
>> +				 */
>> +				s->length = 0;
>> +				continue;
> 
>> +			case PCI_P2PDMA_MAP_THRU_HOST_BRIDGE:
>> +				break;
> So, this 'short-circuits' the use of the IOMMU, silently?
> This seems ripe for users to enable IOMMU for secure computing reasons, and using/enabling p2pdma,
> and not realizing that it isn't as secure as 1+1=2  appears to be.
> If my understanding is wrong, please point me to the Documentation or code that corrects this mis-understanding.  I could have missed a warning when both are enabled in a past patch set.


Yes, you've misunderstood this. Part of this dovetails with your comment
about the documentation for PCI_P2PDMA_MAP_THRU_HOST_BRIDGE.

This does not short-circuit the IOMMU in any way. THRU_HOST_BRIDGE mode
means the TLPs for this transaction will reach the CPU's host bridge, so
the IOMMU has to be involved. In this case the IOMMU is programmed
with the physical address of the memory (which is normal) and everything
works.

One could argue that PCI_P2PDMA_MAP_BUS_ADDR short-circuits the IOMMU
by using the PCI bus address in the DMA transaction. But this requires
the user to do special setup with the ACS bits ahead of time (not part
of this series).

For the user to use BUS_ADDR with an IOMMU, they need to
specifically disable the ACS redirect bits on specific PCI switch bridge
ports using a kernel command line option. When they do this, the IOMMU
code will put those devices in the same IOMMU group, making it
impossible for the user to place devices that can do P2PDMA transactions
with each other in different security domains.

This was all hashed out in the original P2PDMA patchset and does make sense.
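
To make the distinction between the two mapping types concrete, here is a small user-space model of the per-segment decision made in the quoted iommu_dma_map_sg() hunk. It is a sketch under stated assumptions: `struct model_seg`, `enum model_map_type` and `model_plan_iova()` are hypothetical names, the enum only mirrors the one in the patch, and the real code also handles alignment and boundary masks that are omitted here.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#ifndef EREMOTEIO
#define EREMOTEIO 121	/* Linux value; fallback for non-Linux libcs */
#endif

/* Mirrors the enum from the patch; names are shortened for this model. */
enum model_map_type {
	MODEL_MAP_UNKNOWN = 0,
	MODEL_MAP_NOT_SUPPORTED,
	MODEL_MAP_BUS_ADDR,
	MODEL_MAP_THRU_HOST_BRIDGE,
};

/* Hypothetical, simplified scatterlist entry. */
struct model_seg {
	bool is_p2pdma;
	enum model_map_type type;	/* only meaningful when is_p2pdma */
	size_t length;
	size_t iova_len;		/* bytes to be mapped through the IOMMU */
};

/*
 * Decide, per segment, how much IOVA space is needed.
 *  - BUS_ADDR: no IOVA at all; the PCI bus address is filled in later.
 *  - THRU_HOST_BRIDGE: treated like ordinary memory, i.e. it *is* mapped
 *    through the IOMMU.
 *  - anything else: the whole mapping fails with -EREMOTEIO.
 * Returns the total IOVA length, or a negative errno.
 */
static long model_plan_iova(struct model_seg *segs, size_t nents)
{
	long total = 0;
	size_t i;

	assert(segs != NULL || nents == 0);

	for (i = 0; i < nents; i++) {
		struct model_seg *s = &segs[i];

		s->iova_len = s->length;

		if (s->is_p2pdma) {
			switch (s->type) {
			case MODEL_MAP_BUS_ADDR:
				s->iova_len = 0;
				continue;
			case MODEL_MAP_THRU_HOST_BRIDGE:
				break;
			default:
				return -EREMOTEIO;
			}
		}

		total += (long)s->iova_len;
	}

	return total;
}
```

A total of zero corresponds to the case in the patch where the scatterlist consists only of bus-address segments and no IOVA is allocated.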

>> +			default:
>> +				ret = -EREMOTEIO;
>> +				goto out_restore_sg;
>> +			}
>> +		}
>> +
>>   		/*
>>   		 * Due to the alignment of our single IOVA allocation, we can
>>   		 * depend on these assumptions about the segment boundary mask:
>> @@ -1015,6 +1053,9 @@ static int iommu_dma_map_sg(struct device *dev, struct scatterlist *sg,
>>   		prev = s;
>>   	}
>>   
>> +	if (!iova_len)
>> +		return __finalise_sg(dev, sg, nents, 0);
>> +
>>   	iova = iommu_dma_alloc_iova(domain, iova_len, dma_get_mask(dev), dev);
>>   	if (!iova)
>>   		goto out_restore_sg;
>> @@ -1032,13 +1073,13 @@ static int iommu_dma_map_sg(struct device *dev, struct scatterlist *sg,
>>   	iommu_dma_free_iova(cookie, iova, iova_len, NULL);
>>   out_restore_sg:
>>   	__invalidate_sg(sg, nents);
>> -	return 0;
>> +	return ret;
>>   }
>>   
>>   static void iommu_dma_unmap_sg(struct device *dev, struct scatterlist *sg,
>>   		int nents, enum dma_data_direction dir, unsigned long attrs)
>>   {
>> -	dma_addr_t start, end;
>> +	dma_addr_t end, start = DMA_MAPPING_ERROR;
>>   	struct scatterlist *tmp;
>>   	int i;
>>   
>> @@ -1054,14 +1095,22 @@ static void iommu_dma_unmap_sg(struct device *dev, struct scatterlist *sg,
>>   	 * The scatterlist segments are mapped into a single
>>   	 * contiguous IOVA allocation, so this is incredibly easy.
>>   	 */
>> -	start = sg_dma_address(sg);
>> -	for_each_sg(sg_next(sg), tmp, nents - 1, i) {
>> +	for_each_sg(sg, tmp, nents, i) {
>> +		if (sg_is_pci_p2pdma(tmp)) {
>> +			sg_unmark_pci_p2pdma(tmp);
>> +			continue;
>> +		}
>>   		if (sg_dma_len(tmp) == 0)
>>   			break;
>> -		sg = tmp;
>> +
>> +		if (start == DMA_MAPPING_ERROR)
>> +			start = sg_dma_address(tmp);
>> +
>> +		end = sg_dma_address(tmp) + sg_dma_len(tmp);
>>   	}
>> -	end = sg_dma_address(sg) + sg_dma_len(sg);
>> -	__iommu_dma_unmap(dev, start, end - start);
>> +
>> +	if (start != DMA_MAPPING_ERROR)
>> +		__iommu_dma_unmap(dev, start, end - start);
>>   }
>>   
> overall, fiddling with the generic dma-iommu code instead of using a dma-ops-based, p2pdma function that has it carved out and separated/refactored out to be cleaner seems less complicated, but I'm guessing you tried that and it was too complicated to do?

I don't think you've understood this code correctly. What it does can't
be done in the dma-ops.

>>   static const struct dma_map_ops iommu_dma_ops = {
>> +	.flags			= DMA_F_PCI_P2PDMA_SUPPORTED,
> wait, it's a const that's always turned on?
> shouldn't the define for this flag be 0 for non-p2pdma configs?

All this flag says is that iommu_dma_map_sg() has support for
handling P2PDMA pages. Yes, it is a constant. The point is to reject
P2PDMA pages for map_sg implementations that have not done the above
work (e.g. arm_iommu_map_sg).

Hopefully, more of the arch-specific implementations will convert to the
generic dma-iommu code in time but those that don't simply won't support
P2PDMA until they do (or add their own support).
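
As a rough illustration of how such a capability flag is consumed, here is a self-contained user-space model. `struct model_dma_map_ops` and `model_p2pdma_supported()` are hypothetical names, and treating a NULL ops pointer as "dma-direct, which supports P2PDMA" is an assumption of this sketch based on patch 09/16 adding that support to dma-direct. Only the flag name and value are taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DMA_F_PCI_P2PDMA_SUPPORTED	(1 << 0)

/* Hypothetical stand-in for struct dma_map_ops; only the flags matter here. */
struct model_dma_map_ops {
	unsigned int flags;
};

/*
 * Returns true if P2PDMA pages may be passed to the device's map_sg
 * implementation. NULL ops models the dma-direct path (assumption).
 */
static bool model_p2pdma_supported(const struct model_dma_map_ops *ops)
{
	if (ops == NULL)
		return true;

	return (ops->flags & DMA_F_PCI_P2PDMA_SUPPORTED) != 0;
}
```

An implementation that has done the map_sg work sets the flag in its static ops table; one that has not simply leaves `flags` at zero and is rejected by the check without any further code changes.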

Logan


Thread overview: 99+ messages
2021-04-08 17:01 [PATCH 00/16] Add new DMA mapping operation for P2PDMA Logan Gunthorpe
2021-04-08 17:01 ` [PATCH 01/16] PCI/P2PDMA: Pass gfp_mask flags to upstream_bridge_distance_warn() Logan Gunthorpe
2021-05-02  3:58   ` John Hubbard
2021-05-03 15:57     ` Logan Gunthorpe
2021-05-03 18:17       ` John Hubbard
2021-05-03 18:20         ` Logan Gunthorpe
2021-05-03 18:23           ` John Hubbard
2021-05-03 18:24         ` Christoph Hellwig
2021-05-11 16:05     ` Don Dutile
2021-05-11 16:12       ` Logan Gunthorpe
2021-05-11 16:23         ` Don Dutile
2021-04-08 17:01 ` [PATCH 02/16] PCI/P2PDMA: Avoid pci_get_slot() which sleeps Logan Gunthorpe
2021-05-02  5:35   ` John Hubbard
2021-05-03 16:08     ` Logan Gunthorpe
2021-05-03 18:20       ` John Hubbard
2021-05-03 18:25       ` Christoph Hellwig
2021-05-11 16:05     ` Don Dutile
2021-05-11 16:16       ` Logan Gunthorpe
2021-05-11 16:05   ` Don Dutile
2021-05-11 16:14     ` Logan Gunthorpe
2021-04-08 17:01 ` [PATCH 03/16] PCI/P2PDMA: Attempt to set map_type if it has not been set Logan Gunthorpe
2021-05-02 19:58   ` John Hubbard
2021-05-03 16:17     ` Logan Gunthorpe
2021-05-03 18:22       ` John Hubbard
2021-05-03 18:35       ` Christoph Hellwig
2021-05-03 18:46         ` Logan Gunthorpe
2021-05-11 16:05     ` Don Dutile
2021-04-08 17:01 ` [PATCH 04/16] PCI/P2PDMA: Refactor pci_p2pdma_map_type() to take pagmap and device Logan Gunthorpe
2021-05-02 20:41   ` John Hubbard
2021-05-03 16:30     ` Logan Gunthorpe
2021-05-03 18:31       ` John Hubbard
2021-05-03 18:56         ` Logan Gunthorpe
2021-05-03 21:54           ` John Hubbard
2021-05-03 22:57             ` Jason Gunthorpe
2021-05-03 23:40               ` John Hubbard
2021-04-08 17:01 ` [PATCH 05/16] dma-mapping: Introduce dma_map_sg_p2pdma() Logan Gunthorpe
2021-04-27 19:22   ` Jason Gunthorpe
2021-04-27 22:49     ` Logan Gunthorpe
2021-04-27 19:31   ` Jason Gunthorpe
2021-04-27 22:55     ` Logan Gunthorpe
2021-04-27 23:01       ` Jason Gunthorpe
2021-05-03 18:28         ` Christoph Hellwig
2021-05-03 18:31           ` Logan Gunthorpe
2021-05-02 21:23   ` John Hubbard
2021-05-03 16:38     ` Logan Gunthorpe
2021-05-11 16:05   ` Don Dutile
2021-04-08 17:01 ` [PATCH 06/16] lib/scatterlist: Add flag for indicating P2PDMA segments in an SGL Logan Gunthorpe
2021-05-02 22:34   ` John Hubbard
2021-04-08 17:01 ` [PATCH 07/16] PCI/P2PDMA: Make pci_p2pdma_map_type() non-static Logan Gunthorpe
2021-05-02 22:44   ` John Hubbard
2021-05-03 16:39     ` Logan Gunthorpe
2021-05-11 16:06   ` Don Dutile
2021-05-11 16:17     ` Logan Gunthorpe
2021-04-08 17:01 ` [PATCH 08/16] PCI/P2PDMA: Introduce helpers for dma_map_sg implementations Logan Gunthorpe
2021-05-02 22:52   ` John Hubbard
2021-05-03  0:50   ` John Hubbard
2021-05-03 17:15     ` Logan Gunthorpe
2021-04-08 17:01 ` [PATCH 09/16] dma-direct: Support PCI P2PDMA pages in dma-direct map_sg Logan Gunthorpe
2021-04-27 19:33   ` Jason Gunthorpe
2021-04-27 19:40     ` Jason Gunthorpe
2021-04-27 22:56       ` Logan Gunthorpe
2021-05-02 23:28   ` John Hubbard
2021-05-02 23:32     ` John Hubbard
2021-05-03 17:06       ` Logan Gunthorpe
2021-05-03 16:55     ` Logan Gunthorpe
2021-05-04  0:12       ` John Hubbard
2021-05-03 17:04     ` Logan Gunthorpe
2021-05-04  0:01       ` John Hubbard
2021-04-08 17:01 ` [PATCH 10/16] dma-mapping: Add flags to dma_map_ops to indicate PCI P2PDMA support Logan Gunthorpe
2021-05-03  0:32   ` John Hubbard
2021-05-03 17:09     ` Logan Gunthorpe
2021-05-11 16:06   ` Don Dutile
2021-05-11 16:19     ` Logan Gunthorpe
2021-04-08 17:01 ` [PATCH 11/16] iommu/dma: Support PCI P2PDMA pages in dma-iommu map_sg Logan Gunthorpe
2021-04-27 19:43   ` Jason Gunthorpe
2021-04-27 22:59     ` Logan Gunthorpe
2021-05-03  1:14   ` John Hubbard
2021-05-06 23:59     ` Logan Gunthorpe
2021-05-11 16:06   ` Don Dutile
2021-05-11 16:35     ` Logan Gunthorpe
2021-04-08 17:01 ` [PATCH 12/16] nvme-pci: Check DMA ops when indicating support for PCI P2PDMA Logan Gunthorpe
2021-05-03  1:29   ` John Hubbard
2021-05-03 17:17     ` Logan Gunthorpe
2021-05-04  0:17       ` John Hubbard
2021-04-08 17:01 ` [PATCH 13/16] nvme-pci: Convert to using dma_map_sg_p2pdma for p2pdma pages Logan Gunthorpe
2021-05-03  1:34   ` John Hubbard
2021-05-03 17:19     ` Logan Gunthorpe
2021-05-04  0:26       ` John Hubbard
2021-04-08 17:01 ` [PATCH 14/16] nvme-rdma: Ensure dma support when using p2pdma Logan Gunthorpe
2021-04-27 19:47   ` Jason Gunthorpe
2021-04-27 22:59     ` Logan Gunthorpe
2021-05-03  1:37   ` John Hubbard
2021-04-08 17:01 ` [PATCH 15/16] RDMA/rw: use dma_map_sg_p2pdma() Logan Gunthorpe
2021-04-08 17:01 ` [PATCH 16/16] PCI/P2PDMA: Remove pci_p2pdma_[un]map_sg() Logan Gunthorpe
2021-04-27 19:28 ` [PATCH 00/16] Add new DMA mapping operation for P2PDMA Jason Gunthorpe
2021-04-27 20:21   ` John Hubbard
2021-04-27 20:48     ` Dan Williams
2021-05-02  1:22 ` John Hubbard
2021-05-11 16:05 ` Don Dutile
