* [PATCH v3 1/2] habanalabs: define uAPI to export FD for DMA-BUF
@ 2021-06-18 12:36 ` Oded Gabbay
  0 siblings, 0 replies; 143+ messages in thread
From: Oded Gabbay @ 2021-06-18 12:36 UTC (permalink / raw)
  To: linux-kernel, gregkh
  Cc: sumit.semwal, christian.koenig, daniel.vetter, galpress, sleybo,
	dri-devel, Tomer Tayar

A user process might want to share device memory with another
driver/device and allow that device to access it over PCIe (P2P).

To enable this, we utilize the dma-buf mechanism and add dma-buf
exporter support, so the other driver can import the device memory and
access it.

The device memory is allocated using our existing allocation uAPI,
where the user will get a handle that represents the allocation.

The user will then need to call the new uAPI
(HL_MEM_OP_EXPORT_DMABUF_FD) and pass the handle as a parameter.

The driver will return an FD that represents the DMA-BUF object that
was created to match that allocation.
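
For illustration only, a minimal user-space sketch of this flow could
look as follows. The ioctl name (HL_IOCTL_MEMORY), the header path and
the device node come from the existing habanalabs uAPI and driver, not
from this patch, so treat them as assumptions rather than a definitive
recipe.

#include <fcntl.h>		/* O_RDWR, O_CLOEXEC */
#include <string.h>		/* memset */
#include <sys/ioctl.h>		/* ioctl */
#include <misc/habanalabs.h>	/* existing habanalabs uAPI header */

/* Sketch: export device memory as a dma-buf FD.
 * hl_fd is an open file descriptor of the habanalabs device,
 * e.g. /dev/hl0.
 */
static int export_hl_mem_as_dmabuf(int hl_fd, __u64 handle, __u64 mem_size)
{
	union hl_mem_args args;

	memset(&args, 0, sizeof(args));
	args.in.op = HL_MEM_OP_EXPORT_DMABUF_FD;
	/* ALLOC handle, or a device memory physical address on Gaudi */
	args.in.export_dmabuf_fd.handle = handle;
	/* Size of the allocation; relevant only for Gaudi */
	args.in.export_dmabuf_fd.mem_size = mem_size;
	/* DMA-BUF file/FD flags */
	args.in.flags = O_RDWR | O_CLOEXEC;

	if (ioctl(hl_fd, HL_IOCTL_MEMORY, &args))
		return -1;

	/* FD representing the DMA-BUF object; hand it to the importer */
	return (int)args.out.fd;
}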

Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Reviewed-by: Tomer Tayar <ttayar@habana.ai>
---
 include/uapi/misc/habanalabs.h | 28 +++++++++++++++++++++++++++-
 1 file changed, 27 insertions(+), 1 deletion(-)

diff --git a/include/uapi/misc/habanalabs.h b/include/uapi/misc/habanalabs.h
index a47a731e4527..aa3d8e0ba060 100644
--- a/include/uapi/misc/habanalabs.h
+++ b/include/uapi/misc/habanalabs.h
@@ -808,6 +808,10 @@ union hl_wait_cs_args {
 #define HL_MEM_OP_UNMAP			3
 /* Opcode to map a hw block */
 #define HL_MEM_OP_MAP_BLOCK		4
+/* Opcode to create DMA-BUF object for an existing device memory allocation
+ * and to export an FD of that DMA-BUF back to the caller
+ */
+#define HL_MEM_OP_EXPORT_DMABUF_FD	5
 
 /* Memory flags */
 #define HL_MEM_CONTIGUOUS	0x1
@@ -878,11 +882,26 @@ struct hl_mem_in {
 			/* Virtual address returned from HL_MEM_OP_MAP */
 			__u64 device_virt_addr;
 		} unmap;
+
+		/* HL_MEM_OP_EXPORT_DMABUF_FD */
+		struct {
+			/* Handle returned from HL_MEM_OP_ALLOC. In Gaudi,
+			 * where we don't have MMU for the device memory, the
+			 * driver expects a physical address (instead of
+			 * a handle) in the device memory space.
+			 */
+			__u64 handle;
+			/* Size of memory allocation. Relevant only for GAUDI */
+			__u64 mem_size;
+		} export_dmabuf_fd;
 	};
 
 	/* HL_MEM_OP_* */
 	__u32 op;
-	/* HL_MEM_* flags */
+	/* HL_MEM_* flags.
+	 * For the HL_MEM_OP_EXPORT_DMABUF_FD opcode, this field holds the
+	 * DMA-BUF file/FD flags.
+	 */
 	__u32 flags;
 	/* Context ID - Currently not in use */
 	__u32 ctx_id;
@@ -919,6 +938,13 @@ struct hl_mem_out {
 
 			__u32 pad;
 		};
+
+		/* Returned in HL_MEM_OP_EXPORT_DMABUF_FD. Represents the
+		 * DMA-BUF object that was created to describe a memory
+		 * allocation on the device's memory space. The FD should be
+		 * passed to the importer driver
+		 */
+		__u64 fd;
 	};
 };
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 143+ messages in thread

* [PATCH v3 2/2] habanalabs: add support for dma-buf exporter
  2021-06-18 12:36 ` Oded Gabbay
@ 2021-06-18 12:36   ` Oded Gabbay
  -1 siblings, 0 replies; 143+ messages in thread
From: Oded Gabbay @ 2021-06-18 12:36 UTC (permalink / raw)
  To: linux-kernel, gregkh
  Cc: sumit.semwal, christian.koenig, daniel.vetter, galpress, sleybo,
	dri-devel, Tomer Tayar

From: Tomer Tayar <ttayar@habana.ai>

Implement the calls to the dma-buf kernel API to create a dma-buf
object that is exported to the caller as an FD.

We block the option to mmap the DMA-BUF object because we don't support
DIRECT_IO and implicit P2P. We only implement support for explicit P2P
through importing the FD of the DMA-BUF.

In the export phase, we provide a static SG list to the DMA-BUF object
because in Habana Labs' ASICs the device memory is pinned and
immutable. Therefore, there is no need for dynamic mappings and pinning
callbacks.

Note that in GAUDI we don't have an MMU towards the device memory, so
the user works with physical addresses. Therefore, the user doesn't go
through the kernel driver to allocate memory there. As a result, for
GAUDI only, we receive from the user a device memory physical address
(instead of a handle) and a size.
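
For context, the importing side of this explicit P2P flow uses the
standard dma-buf import calls. The sketch below is generic kernel code
against the current dma-buf API and is not part of this series; the
function name and error handling are illustrative only.

/* Generic import sketch: map the memory that user space exported via
 * HL_MEM_OP_EXPORT_DMABUF_FD for DMA on the importing device. The
 * exporter hands back its static SG table.
 */
#include <linux/dma-buf.h>
#include <linux/scatterlist.h>

static int import_exported_memory(struct device *dev, int fd)
{
	struct dma_buf_attachment *attach;
	struct dma_buf *dmabuf;
	struct sg_table *sgt;
	int rc = 0;

	dmabuf = dma_buf_get(fd);
	if (IS_ERR(dmabuf))
		return PTR_ERR(dmabuf);

	attach = dma_buf_attach(dmabuf, dev);
	if (IS_ERR(attach)) {
		rc = PTR_ERR(attach);
		goto put_buf;
	}

	sgt = dma_buf_map_attachment(attach, DMA_BIDIRECTIONAL);
	if (IS_ERR(sgt)) {
		rc = PTR_ERR(sgt);
		goto detach;
	}

	/* ... program the importing device using sg_dma_address()/sg_dma_len() ... */

	dma_buf_unmap_attachment(attach, sgt, DMA_BIDIRECTIONAL);
detach:
	dma_buf_detach(dmabuf, attach);
put_buf:
	dma_buf_put(dmabuf);
	return rc;
}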

Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Reviewed-by: Gal Pressman <galpress@amazon.com>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
---
Changes in v3:
 - remove calls to dev_dbg()
 - remove hl_dmabuf_wrapper.fd as it is only for debug prints
 - clear attachment->priv in the detach callback
 - add a comment explaining why an empty unmap_dma_buf callback is needed
 - modify exporting_cnt from atomic_t to u32 and add locking where it was missing
 - remove hl_ctx.dmabuf_list which wasn't really used
 - replace dev_err with dev_err_ratelimited when checking user's inputs
 - read vm_type directly from phys_pg_pack instead of casting the struct

 drivers/misc/habanalabs/Kconfig             |   1 +
 drivers/misc/habanalabs/common/habanalabs.h |  24 ++
 drivers/misc/habanalabs/common/memory.c     | 400 +++++++++++++++++++-
 drivers/misc/habanalabs/gaudi/gaudi.c       |   1 +
 drivers/misc/habanalabs/goya/goya.c         |   1 +
 5 files changed, 424 insertions(+), 3 deletions(-)

diff --git a/drivers/misc/habanalabs/Kconfig b/drivers/misc/habanalabs/Kconfig
index 293d79811372..c82d2e7b2035 100644
--- a/drivers/misc/habanalabs/Kconfig
+++ b/drivers/misc/habanalabs/Kconfig
@@ -8,6 +8,7 @@ config HABANA_AI
 	depends on PCI && HAS_IOMEM
 	select GENERIC_ALLOCATOR
 	select HWMON
+	select DMA_SHARED_BUFFER
 	help
 	  Enables PCIe card driver for Habana's AI Processors (AIP) that are
 	  designed to accelerate Deep Learning inference and training workloads.
diff --git a/drivers/misc/habanalabs/common/habanalabs.h b/drivers/misc/habanalabs/common/habanalabs.h
index 09b89fdeba0b..0e418be61727 100644
--- a/drivers/misc/habanalabs/common/habanalabs.h
+++ b/drivers/misc/habanalabs/common/habanalabs.h
@@ -20,6 +20,7 @@
 #include <linux/scatterlist.h>
 #include <linux/hashtable.h>
 #include <linux/debugfs.h>
+#include <linux/dma-buf.h>
 #include <linux/bitfield.h>
 #include <linux/genalloc.h>
 #include <linux/sched/signal.h>
@@ -1290,6 +1291,25 @@ struct hl_pending_cb {
 	u32			hw_queue_id;
 };
 
+/**
+ * struct hl_dmabuf_wrapper - a dma-buf wrapper object.
+ * @dmabuf: pointer to dma-buf object.
+ * @ctx: pointer to the dma-buf owner's context.
+ * @phys_pg_pack: pointer to physical page pack if the dma-buf was exported for
+ *                memory allocation handle.
+ * @sgt: scatter-gather table that holds the exported pages.
+ * @total_size: total size of all exported pages.
+ * @handle: allocation handle or physical address of the exported memory.
+ */
+struct hl_dmabuf_wrapper {
+	struct dma_buf			*dmabuf;
+	struct hl_ctx			*ctx;
+	struct hl_vm_phys_pg_pack	*phys_pg_pack;
+	struct sg_table			sgt;
+	u64				total_size;
+	u64				handle;
+};
+
 /**
  * struct hl_ctx - user/kernel context.
  * @mem_hash: holds mapping from virtual address to virtual memory area
@@ -1598,6 +1618,7 @@ struct hl_vm_hw_block_list_node {
  * @npages: num physical pages in the pack.
  * @total_size: total size of all the pages in this list.
  * @mapping_cnt: number of shared mappings.
+ * @exporting_cnt: number of dma-buf exporting.
  * @asid: the context related to this list.
  * @page_size: size of each page in the pack.
  * @flags: HL_MEM_* flags related to this list.
@@ -1612,6 +1633,7 @@ struct hl_vm_phys_pg_pack {
 	u64			npages;
 	u64			total_size;
 	atomic_t		mapping_cnt;
+	u32			exporting_cnt;
 	u32			asid;
 	u32			page_size;
 	u32			flags;
@@ -2137,6 +2159,7 @@ struct hl_mmu_funcs {
  *                          the error will be ignored by the driver during
  *                          device initialization. Mainly used to debug and
  *                          workaround firmware bugs
+ * @dram_pci_bar_start: start bus address of PCIe bar towards DRAM.
  * @last_successful_open_jif: timestamp (jiffies) of the last successful
  *                            device open.
  * @last_open_session_duration_jif: duration (jiffies) of the last device open
@@ -2266,6 +2289,7 @@ struct hl_device {
 	u64				max_power;
 	u64				clock_gating_mask;
 	u64				boot_error_status_mask;
+	u64				dram_pci_bar_start;
 	u64				last_successful_open_jif;
 	u64				last_open_session_duration_jif;
 	u64				open_counter;
diff --git a/drivers/misc/habanalabs/common/memory.c b/drivers/misc/habanalabs/common/memory.c
index af339ce1ab4f..45d663f746a3 100644
--- a/drivers/misc/habanalabs/common/memory.c
+++ b/drivers/misc/habanalabs/common/memory.c
@@ -15,7 +15,14 @@
 #define HL_MMU_DEBUG	0
 
 /* use small pages for supporting non-pow2 (32M/40M/48M) DRAM phys page sizes */
-#define DRAM_POOL_PAGE_SIZE SZ_8M
+#define DRAM_POOL_PAGE_SIZE		SZ_8M
+
+/* dma-buf alignment requirements when exporting memory with address/size */
+#define DMA_BUF_MEM_ADDR_ALIGNMENT	SZ_32M
+#define DMA_BUF_MEM_SIZE_ALIGNMENT	SZ_32M
+
+/* dma-buf chunk size cannot exceed the scatterlist "unsigned int" length */
+#define DMA_BUF_CHUNK_MAX_SIZE		SZ_512M
 
 /*
  * The va ranges in context object contain a list with the available chunks of
@@ -347,6 +354,13 @@ static int free_device_memory(struct hl_ctx *ctx, struct hl_mem_in *args)
 			return -EINVAL;
 		}
 
+		if (phys_pg_pack->exporting_cnt) {
+			dev_err(hdev->dev,
+				"handle %u is exported, cannot free\n",	handle);
+			spin_unlock(&vm->idr_lock);
+			return -EINVAL;
+		}
+
 		/*
 		 * must remove from idr before the freeing of the physical
 		 * pages as the refcount of the pool is also the trigger of the
@@ -1411,13 +1425,367 @@ int hl_hw_block_mmap(struct hl_fpriv *hpriv, struct vm_area_struct *vma)
 	return 0;
 }
 
+static int hl_dmabuf_attach(struct dma_buf *dmabuf,
+				struct dma_buf_attachment *attachment)
+{
+	struct hl_dmabuf_wrapper *hl_dmabuf = dmabuf->priv;
+
+	attachment->priv = hl_dmabuf;
+
+	return 0;
+}
+
+static void hl_dmabuf_detach(struct dma_buf *dmabuf,
+				struct dma_buf_attachment *attachment)
+{
+	attachment->priv = NULL;
+}
+
+static struct sg_table *hl_map_dmabuf(struct dma_buf_attachment *attachment,
+					enum dma_data_direction dir)
+{
+	struct hl_dmabuf_wrapper *hl_dmabuf = attachment->priv;
+
+	return &hl_dmabuf->sgt;
+}
+
+static void hl_unmap_dmabuf(struct dma_buf_attachment *attachment,
+				  struct sg_table *table,
+				  enum dma_data_direction dir)
+{
+	/* Nothing to do here but the "unmap_dma_buf" callback is mandatory */
+}
+
+static void hl_release_dmabuf(struct dma_buf *dmabuf)
+{
+	struct hl_dmabuf_wrapper *hl_dmabuf = dmabuf->priv;
+	struct hl_ctx *ctx = hl_dmabuf->ctx;
+	struct hl_device *hdev = ctx->hdev;
+	struct hl_vm *vm = &hdev->vm;
+
+	if (hl_dmabuf->phys_pg_pack) {
+		spin_lock(&vm->idr_lock);
+		hl_dmabuf->phys_pg_pack->exporting_cnt--;
+		spin_unlock(&vm->idr_lock);
+	}
+
+	hl_ctx_put(hl_dmabuf->ctx);
+
+	sg_free_table(&hl_dmabuf->sgt);
+	kfree(hl_dmabuf);
+}
+
+static const struct dma_buf_ops habanalabs_dmabuf_ops = {
+	.attach = hl_dmabuf_attach,
+	.detach = hl_dmabuf_detach,
+	.map_dma_buf = hl_map_dmabuf,
+	.unmap_dma_buf = hl_unmap_dmabuf,
+	.release = hl_release_dmabuf,
+};
+
+static int alloc_sgt_from_device_pages(struct hl_ctx *ctx, struct sg_table *sgt,
+					u64 *pages, u64 npages, u64 page_size)
+{
+	struct hl_device *hdev = ctx->hdev;
+	struct asic_fixed_properties *prop = &hdev->asic_prop;
+	u64 chunk_size, bar_address;
+	struct scatterlist *sg;
+	int rc, i, j, nents, cur_page;
+
+	/* Get number of non-contiguous chunks */
+	for (i = 1, nents = 1, chunk_size = page_size ; i < npages ; i++) {
+		if (pages[i - 1] + page_size != pages[i] ||
+				chunk_size + page_size >
+					DMA_BUF_CHUNK_MAX_SIZE) {
+			nents++;
+			chunk_size = page_size;
+			continue;
+		}
+
+		chunk_size += page_size;
+	}
+
+	rc = sg_alloc_table(sgt, nents, GFP_KERNEL | __GFP_ZERO);
+	if (rc)
+		return rc;
+
+	/* Merge pages and put them into the scatterlist */
+	cur_page = 0;
+	for_each_sg(sgt->sgl, sg, nents, i) {
+		chunk_size = page_size;
+		for (j = cur_page + 1 ; j < npages ; j++) {
+			if (pages[j - 1] + page_size != pages[j] ||
+					chunk_size + page_size >
+						DMA_BUF_CHUNK_MAX_SIZE)
+				break;
+			chunk_size += page_size;
+		}
+
+		bar_address = hdev->dram_pci_bar_start +
+				(pages[cur_page] - prop->dram_base_address);
+		if (bar_address + chunk_size >
+				hdev->dram_pci_bar_start +
+						prop->dram_pci_bar_size) {
+			dev_err_ratelimited(hdev->dev,
+				"DRAM memory range is outside of PCI BAR boundaries, address 0x%llx, size 0x%llx\n",
+				pages[cur_page], chunk_size);
+			rc = -EINVAL;
+			goto err_sg_free_table;
+		}
+
+		sg_set_page(sg, NULL, chunk_size, 0);
+		sg_dma_address(sg) = bar_address;
+		sg_dma_len(sg) = chunk_size;
+
+		cur_page = j;
+	}
+
+	return 0;
+
+err_sg_free_table:
+	sg_free_table(sgt);
+	return rc;
+}
+
+static int _export_dmabuf_common(struct hl_ctx *ctx,
+			struct hl_dmabuf_wrapper *hl_dmabuf, u64 *pages,
+			u64 npages, u64 page_size, int flags, int *dmabuf_fd)
+{
+	struct hl_device *hdev = ctx->hdev;
+	DEFINE_DMA_BUF_EXPORT_INFO(exp_info);
+	int rc, fd;
+
+	rc = alloc_sgt_from_device_pages(ctx, &hl_dmabuf->sgt, pages, npages,
+						page_size);
+	if (rc) {
+		dev_err(hdev->dev,
+			"failed to create a scatterlist table for exported device memory\n");
+		return rc;
+	}
+
+	exp_info.ops = &habanalabs_dmabuf_ops;
+	exp_info.size = hl_dmabuf->total_size;
+	exp_info.flags = flags;
+	exp_info.priv = hl_dmabuf;
+	hl_dmabuf->dmabuf = dma_buf_export(&exp_info);
+	if (IS_ERR(hl_dmabuf->dmabuf)) {
+		dev_err(hdev->dev, "failed to export dma-buf\n");
+		rc = PTR_ERR(hl_dmabuf->dmabuf);
+		goto err_sg_free_table;
+	}
+
+	fd = dma_buf_fd(hl_dmabuf->dmabuf, flags);
+	if (fd < 0) {
+		dev_err(hdev->dev,
+			"failed to get a file descriptor for a dma-buf\n");
+		rc = fd;
+		goto err_dma_buf_put;
+	}
+
+	hl_dmabuf->ctx = ctx;
+	hl_ctx_get(hdev, hl_dmabuf->ctx);
+
+	*dmabuf_fd = fd;
+
+	return 0;
+
+err_dma_buf_put:
+	dma_buf_put(hl_dmabuf->dmabuf);
+err_sg_free_table:
+	sg_free_table(&hl_dmabuf->sgt);
+	return rc;
+}
+
+static int export_dmabuf_common(struct hl_ctx *ctx,
+			struct hl_dmabuf_wrapper *hl_dmabuf, u64 *pages,
+			u64 npages, u64 page_size, int flags, int *dmabuf_fd)
+{
+	u64 *split_pages, npages_orig;
+	u32 split_factor;
+	int rc, i, j;
+
+	if (page_size <= DMA_BUF_CHUNK_MAX_SIZE)
+		return _export_dmabuf_common(ctx, hl_dmabuf, pages, npages,
+						page_size, flags, dmabuf_fd);
+
+	/* page_size is a multiple of DMA_BUF_MEM_SIZE_ALIGNMENT */
+	split_factor = (u32) div_u64(page_size, DMA_BUF_MEM_SIZE_ALIGNMENT);
+	npages_orig = npages;
+	npages *= split_factor;
+	page_size = DMA_BUF_MEM_SIZE_ALIGNMENT;
+
+	split_pages = kcalloc(npages, sizeof(*split_pages), GFP_KERNEL);
+	if (!split_pages)
+		return -ENOMEM;
+
+	for (i = 0 ; i < npages_orig ; i++)
+		for (j = 0 ; j < split_factor ; j++)
+			split_pages[i * split_factor + j] =
+					pages[i] + j * page_size;
+
+	rc = _export_dmabuf_common(ctx, hl_dmabuf, split_pages, npages,
+					page_size, flags, dmabuf_fd);
+
+	kfree(split_pages);
+
+	return rc;
+}
+
+/**
+ * export_dmabuf_from_addr() - export a dma-buf object for the given memory
+ *                             address and size.
+ * @ctx: pointer to the context structure.
+ * @device_addr:  device memory physical address.
+ * @size: size of device memory.
+ * @flags: DMA-BUF file/FD flags.
+ * @dmabuf_fd: pointer to result FD that represents the dma-buf object.
+ *
+ * Create and export a dma-buf object for an existing memory allocation inside
+ * the device memory, and return a FD which is associated with the dma-buf
+ * object.
+ *
+ * Return: 0 on success, non-zero for failure.
+ */
+static int export_dmabuf_from_addr(struct hl_ctx *ctx, u64 device_addr,
+					u64 size, int flags, int *dmabuf_fd)
+{
+	struct hl_device *hdev = ctx->hdev;
+	struct asic_fixed_properties *prop = &hdev->asic_prop;
+	struct hl_dmabuf_wrapper *hl_dmabuf;
+	int rc;
+
+	if (!IS_ALIGNED(device_addr, DMA_BUF_MEM_ADDR_ALIGNMENT)) {
+		dev_err_ratelimited(hdev->dev,
+			"address of exported device memory should be aligned to 0x%x, address 0x%llx\n",
+			DMA_BUF_MEM_ADDR_ALIGNMENT, device_addr);
+		return -EINVAL;
+	}
+
+	if (!size) {
+		dev_err_ratelimited(hdev->dev,
+			"size of exported device memory should be greater than 0\n");
+		return -EINVAL;
+	}
+
+	if (!IS_ALIGNED(size, DMA_BUF_MEM_SIZE_ALIGNMENT)) {
+		dev_err_ratelimited(hdev->dev,
+			"size of exported device memory should be aligned to 0x%x, size 0x%llx\n",
+			DMA_BUF_MEM_SIZE_ALIGNMENT, size);
+		return -EINVAL;
+	}
+
+	if (device_addr < prop->dram_user_base_address ||
+			device_addr + size > prop->dram_end_address) {
+		dev_err_ratelimited(hdev->dev,
+			"DRAM memory range is outside of DRAM boundaries, address 0x%llx, size 0x%llx\n",
+			device_addr, size);
+		return -EINVAL;
+	}
+
+	hl_dmabuf = kzalloc(sizeof(*hl_dmabuf), GFP_KERNEL);
+	if (!hl_dmabuf)
+		return -ENOMEM;
+
+	hl_dmabuf->handle = device_addr;
+	hl_dmabuf->total_size = size;
+
+	rc = export_dmabuf_common(ctx, hl_dmabuf, &device_addr, 1, size, flags,
+					dmabuf_fd);
+	if (rc)
+		goto err_free_dmabuf_wrapper;
+
+	return 0;
+
+err_free_dmabuf_wrapper:
+	kfree(hl_dmabuf);
+	return rc;
+}
+
+/**
+ * export_dmabuf_from_handle() - export a dma-buf object for the given memory
+ *                               handle.
+ * @ctx: pointer to the context structure.
+ * @handle: device memory allocation handle.
+ * @flags: DMA-BUF file/FD flags.
+ * @dmabuf_fd: pointer to result FD that represents the dma-buf object.
+ *
+ * Create and export a dma-buf object for an existing memory allocation inside
+ * the device memory, and return a FD which is associated with the dma-buf
+ * object.
+ *
+ * Return: 0 on success, non-zero for failure.
+ */
+static int export_dmabuf_from_handle(struct hl_ctx *ctx, u64 handle, int flags,
+					int *dmabuf_fd)
+{
+	struct hl_device *hdev = ctx->hdev;
+	struct hl_vm_phys_pg_pack *phys_pg_pack;
+	struct hl_dmabuf_wrapper *hl_dmabuf;
+	struct hl_vm *vm = &hdev->vm;
+	u32 idr_handle;
+	int rc;
+
+	idr_handle = lower_32_bits(handle);
+
+	spin_lock(&vm->idr_lock);
+
+	phys_pg_pack = idr_find(&vm->phys_pg_pack_handles, idr_handle);
+	if (!phys_pg_pack) {
+		spin_unlock(&vm->idr_lock);
+		dev_err_ratelimited(hdev->dev, "no match for handle 0x%x\n",
+				idr_handle);
+		return -EINVAL;
+	}
+
+	/* increment now to avoid freeing device memory while exporting */
+	phys_pg_pack->exporting_cnt++;
+
+	spin_unlock(&vm->idr_lock);
+
+	if (phys_pg_pack->vm_type != VM_TYPE_PHYS_PACK) {
+		dev_err_ratelimited(hdev->dev,
+				"handle 0x%llx is not for DRAM memory\n",
+				handle);
+		rc = -EINVAL;
+		goto err_dec_exporting_cnt;
+	}
+
+	hl_dmabuf = kzalloc(sizeof(*hl_dmabuf), GFP_KERNEL);
+	if (!hl_dmabuf) {
+		rc = -ENOMEM;
+		goto err_dec_exporting_cnt;
+	}
+
+	hl_dmabuf->phys_pg_pack = phys_pg_pack;
+	hl_dmabuf->handle = handle;
+	hl_dmabuf->total_size = phys_pg_pack->total_size;
+
+	rc = export_dmabuf_common(ctx, hl_dmabuf, phys_pg_pack->pages,
+				phys_pg_pack->npages, phys_pg_pack->page_size,
+				flags, dmabuf_fd);
+	if (rc)
+		goto err_free_dmabuf_wrapper;
+
+	return 0;
+
+err_free_dmabuf_wrapper:
+	kfree(hl_dmabuf);
+
+err_dec_exporting_cnt:
+	spin_lock(&vm->idr_lock);
+	phys_pg_pack->exporting_cnt--;
+	spin_unlock(&vm->idr_lock);
+
+	return rc;
+}
+
 static int mem_ioctl_no_mmu(struct hl_fpriv *hpriv, union hl_mem_args *args)
 {
 	struct hl_device *hdev = hpriv->hdev;
 	struct hl_ctx *ctx = hpriv->ctx;
 	u64 block_handle, device_addr = 0;
 	u32 handle = 0, block_size;
-	int rc;
+	int rc, dmabuf_fd = -EBADF;
 
 	switch (args->in.op) {
 	case HL_MEM_OP_ALLOC:
@@ -1466,6 +1834,16 @@ static int mem_ioctl_no_mmu(struct hl_fpriv *hpriv, union hl_mem_args *args)
 		args->out.block_size = block_size;
 		break;
 
+	case HL_MEM_OP_EXPORT_DMABUF_FD:
+		rc = export_dmabuf_from_addr(ctx,
+				args->in.export_dmabuf_fd.handle,
+				args->in.export_dmabuf_fd.mem_size,
+				args->in.flags,
+				&dmabuf_fd);
+		memset(args, 0, sizeof(*args));
+		args->out.fd = dmabuf_fd;
+		break;
+
 	default:
 		dev_err(hdev->dev, "Unknown opcode for memory IOCTL\n");
 		rc = -ENOTTY;
@@ -1484,7 +1862,7 @@ int hl_mem_ioctl(struct hl_fpriv *hpriv, void *data)
 	struct hl_ctx *ctx = hpriv->ctx;
 	u64 block_handle, device_addr = 0;
 	u32 handle = 0, block_size;
-	int rc;
+	int rc, dmabuf_fd = -EBADF;
 
 	if (!hl_device_operational(hdev, &status)) {
 		dev_warn_ratelimited(hdev->dev,
@@ -1575,6 +1953,22 @@ int hl_mem_ioctl(struct hl_fpriv *hpriv, void *data)
 		args->out.block_size = block_size;
 		break;
 
+	case HL_MEM_OP_EXPORT_DMABUF_FD:
+		if (hdev->asic_prop.dram_supports_virtual_memory)
+			rc = export_dmabuf_from_handle(ctx,
+					args->in.export_dmabuf_fd.handle,
+					args->in.flags,
+					&dmabuf_fd);
+		else
+			rc = export_dmabuf_from_addr(ctx,
+					args->in.export_dmabuf_fd.handle,
+					args->in.export_dmabuf_fd.mem_size,
+					args->in.flags,
+					&dmabuf_fd);
+		memset(args, 0, sizeof(*args));
+		args->out.fd = dmabuf_fd;
+		break;
+
 	default:
 		dev_err(hdev->dev, "Unknown opcode for memory IOCTL\n");
 		rc = -ENOTTY;
diff --git a/drivers/misc/habanalabs/gaudi/gaudi.c b/drivers/misc/habanalabs/gaudi/gaudi.c
index be830948e051..33f36da766fc 100644
--- a/drivers/misc/habanalabs/gaudi/gaudi.c
+++ b/drivers/misc/habanalabs/gaudi/gaudi.c
@@ -685,6 +685,7 @@ static int gaudi_early_init(struct hl_device *hdev)
 	}
 
 	prop->dram_pci_bar_size = pci_resource_len(pdev, HBM_BAR_ID);
+	hdev->dram_pci_bar_start = pci_resource_start(pdev, HBM_BAR_ID);
 
 	/* If FW security is enabled at this point it means no access to ELBI */
 	if (hdev->asic_prop.fw_security_enabled) {
diff --git a/drivers/misc/habanalabs/goya/goya.c b/drivers/misc/habanalabs/goya/goya.c
index 5a837c0b4d76..ad2c6f788030 100644
--- a/drivers/misc/habanalabs/goya/goya.c
+++ b/drivers/misc/habanalabs/goya/goya.c
@@ -617,6 +617,7 @@ static int goya_early_init(struct hl_device *hdev)
 	}
 
 	prop->dram_pci_bar_size = pci_resource_len(pdev, DDR_BAR_ID);
+	hdev->dram_pci_bar_start = pci_resource_start(pdev, DDR_BAR_ID);
 
 	/* If FW security is enabled at this point it means no access to ELBI */
 	if (hdev->asic_prop.fw_security_enabled) {
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 143+ messages in thread

* Re: [PATCH v3 1/2] habanalabs: define uAPI to export FD for DMA-BUF
  2021-06-18 12:36 ` Oded Gabbay
  (?)
@ 2021-06-21 12:28   ` Daniel Vetter
  -1 siblings, 0 replies; 143+ messages in thread
From: Daniel Vetter @ 2021-06-21 12:28 UTC (permalink / raw)
  To: Oded Gabbay, Jason Gunthorpe, linux-rdma,
	open list:DMA BUFFER SHARING FRAMEWORK, Doug Ledford, airlied
  Cc: Linux Kernel Mailing List, Greg KH, Sumit Semwal,
	Christian König, Gal Pressman, sleybo, dri-devel,
	Tomer Tayar, moderated list:DMA BUFFER SHARING FRAMEWORK,
	amd-gfx list, Alex Deucher, Leon Romanovsky, Christoph Hellwig

On Fri, Jun 18, 2021 at 2:36 PM Oded Gabbay <ogabbay@kernel.org> wrote:
> User process might want to share the device memory with another
> driver/device, and to allow it to access it over PCIe (P2P).
>
> To enable this, we utilize the dma-buf mechanism and add a dma-buf
> exporter support, so the other driver can import the device memory and
> access it.
>
> The device memory is allocated using our existing allocation uAPI,
> where the user will get a handle that represents the allocation.
>
> The user will then need to call the new
> uAPI (HL_MEM_OP_EXPORT_DMABUF_FD) and give the handle as a parameter.
>
> The driver will return a FD that represents the DMA-BUF object that
> was created to match that allocation.
>
> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
> Reviewed-by: Tomer Tayar <ttayar@habana.ai>

Mission accomplished, we've gone full circle, and the totally-not-a-gpu
driver is now trying to use gpu infrastructure. And seems to have
gained vram meanwhile too. Next up is going to be synchronization
using dma_fence so you can pass buffers back&forth without stalls
among drivers.

Bonus points for this being at v3 before it shows up on dri-devel and
cc's dma-buf folks properly (not quite all, I added the missing
people).

I think we roughly have two options here

a) Greg continues to piss off dri-devel folks while trying to look
cute&cuddly and steadfastly claiming that this accelerator doesn't work
like any of the other accelerator drivers we have in drivers/gpu/drm.
All while the driver ever more looks like one of these other accel
drivers.

b) We finally do what we should have done years back and treat this as
a proper driver submission and review it on dri-devel instead of
sneaking it in through other channels because the merge criteria
dri-devel has are too onerous and people who don't have experience
with accel stacks for the past 20 years or so don't like them.

"But this probably means a new driver and big disruption!"

Not my problem, I'm not the dude who has to come up with an excuse for
this because I didn't merge the driver in the first place. I do get to
throw a "we all told you so" in though, but that's not helping.

Also I'm wondering which is the other driver that we share buffers
with. The gaudi stuff doesn't have real struct pages as backing
storage, it only fills out the dma_addr_t. That tends to blow up with
other drivers, and the only place where this is guaranteed to work is
if you have a dynamic importer which sets the allow_peer2peer flag.
Adding maintainers from other subsystems who might want to chime in
here. So even aside from the big question, as-is this is broken.
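
For reference, a dynamic importer opts in roughly like this; a generic
sketch of the existing dma-buf API, where my_move_notify stands in for
whatever invalidation logic the importer implements:

#include <linux/dma-buf.h>

/* Sketch: a dynamic importer opting in to peer-to-peer. */
static void my_move_notify(struct dma_buf_attachment *attach)
{
	/* Invalidate cached mappings; remap later under the dma_resv lock */
}

static const struct dma_buf_attach_ops my_attach_ops = {
	.allow_peer2peer = true,	/* accept device/MMIO backed buffers */
	.move_notify = my_move_notify,
};

static int my_p2p_attach(struct dma_buf *dmabuf, struct device *dev,
			 void *importer_priv)
{
	struct dma_buf_attachment *attach;

	attach = dma_buf_dynamic_attach(dmabuf, dev, &my_attach_ops,
					importer_priv);
	if (IS_ERR(attach))
		return PTR_ERR(attach);

	/* mappings are then created under the dma_resv lock and may move */
	return 0;
}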

Currently only 2 drivers set allow_peer2peer, so those are the only
ones who can consume these buffers from device memory. Pinging those
folks specifically.

Doug/Jason from infiniband: Should we add linux-rdma to the dma-buf
wildcard match so that you can catch these next time around too? At
least when people use scripts/get_maintainer.pl correctly. All the
other subsystems using dma-buf are on there already (dri-devel,
linux-media and linaro-mm-sig for android/arm embedded stuff).

Cheers, Daniel



> ---
>  include/uapi/misc/habanalabs.h | 28 +++++++++++++++++++++++++++-
>  1 file changed, 27 insertions(+), 1 deletion(-)
>
> diff --git a/include/uapi/misc/habanalabs.h b/include/uapi/misc/habanalabs.h
> index a47a731e4527..aa3d8e0ba060 100644
> --- a/include/uapi/misc/habanalabs.h
> +++ b/include/uapi/misc/habanalabs.h
> @@ -808,6 +808,10 @@ union hl_wait_cs_args {
>  #define HL_MEM_OP_UNMAP                        3
>  /* Opcode to map a hw block */
>  #define HL_MEM_OP_MAP_BLOCK            4
> +/* Opcode to create DMA-BUF object for an existing device memory allocation
> + * and to export an FD of that DMA-BUF back to the caller
> + */
> +#define HL_MEM_OP_EXPORT_DMABUF_FD     5
>
>  /* Memory flags */
>  #define HL_MEM_CONTIGUOUS      0x1
> @@ -878,11 +882,26 @@ struct hl_mem_in {
>                         /* Virtual address returned from HL_MEM_OP_MAP */
>                         __u64 device_virt_addr;
>                 } unmap;
> +
> +               /* HL_MEM_OP_EXPORT_DMABUF_FD */
> +               struct {
> +                       /* Handle returned from HL_MEM_OP_ALLOC. In Gaudi,
> +                        * where we don't have MMU for the device memory, the
> +                        * driver expects a physical address (instead of
> +                        * a handle) in the device memory space.
> +                        */
> +                       __u64 handle;
> +                       /* Size of memory allocation. Relevant only for GAUDI */
> +                       __u64 mem_size;
> +               } export_dmabuf_fd;
>         };
>
>         /* HL_MEM_OP_* */
>         __u32 op;
> -       /* HL_MEM_* flags */
> +       /* HL_MEM_* flags.
> +        * For the HL_MEM_OP_EXPORT_DMABUF_FD opcode, this field holds the
> +        * DMA-BUF file/FD flags.
> +        */
>         __u32 flags;
>         /* Context ID - Currently not in use */
>         __u32 ctx_id;
> @@ -919,6 +938,13 @@ struct hl_mem_out {
>
>                         __u32 pad;
>                 };
> +
> +               /* Returned in HL_MEM_OP_EXPORT_DMABUF_FD. Represents the
> +                * DMA-BUF object that was created to describe a memory
> +                * allocation on the device's memory space. The FD should be
> +                * passed to the importer driver
> +                */
> +               __u64 fd;
>         };
>  };
>
> --
> 2.25.1
>


--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 143+ messages in thread

* Re: [PATCH v3 1/2] habanalabs: define uAPI to export FD for DMA-BUF
@ 2021-06-21 12:28   ` Daniel Vetter
  0 siblings, 0 replies; 143+ messages in thread
From: Daniel Vetter @ 2021-06-21 12:28 UTC (permalink / raw)
  To: Oded Gabbay, Jason Gunthorpe, linux-rdma,
	open list:DMA BUFFER SHARING FRAMEWORK, Doug Ledford, airlied
  Cc: Greg KH, sleybo, Christoph Hellwig, Linux Kernel Mailing List,
	dri-devel, Gal Pressman,
	moderated list:DMA BUFFER SHARING FRAMEWORK, Tomer Tayar,
	amd-gfx list, Alex Deucher, Christian König,
	Leon Romanovsky

On Fri, Jun 18, 2021 at 2:36 PM Oded Gabbay <ogabbay@kernel.org> wrote:
> User process might want to share the device memory with another
> driver/device, and to allow it to access it over PCIe (P2P).
>
> To enable this, we utilize the dma-buf mechanism and add a dma-buf
> exporter support, so the other driver can import the device memory and
> access it.
>
> The device memory is allocated using our existing allocation uAPI,
> where the user will get a handle that represents the allocation.
>
> The user will then need to call the new
> uAPI (HL_MEM_OP_EXPORT_DMABUF_FD) and give the handle as a parameter.
>
> The driver will return a FD that represents the DMA-BUF object that
> was created to match that allocation.
>
> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
> Reviewed-by: Tomer Tayar <ttayar@habana.ai>

Mission acomplished, we've gone full circle, and the totally-not-a-gpu
driver is now trying to use gpu infrastructure. And seems to have
gained vram meanwhile too. Next up is going to be synchronization
using dma_fence so you can pass buffers back&forth without stalls
among drivers.

Bonus points for this being at v3 before it shows up on dri-devel and
cc's dma-buf folks properly (not quite all, I added the missing
people).

I think we roughly have two options here

a) Greg continues to piss off dri-devel folks while trying to look
cute&cuddly and steadfastly claiming that this accelator doesn't work
like any of the other accelerator drivers we have in drivers/gpu/drm.
All while the driver ever more looks like one of these other accel
drivers.

b) We finally do what we should have done years back and treat this as
a proper driver submission and review it on dri-devel instead of
sneaking it in through other channels because the merge criteria
dri-devel has are too onerous and people who don't have experience
with accel stacks for the past 20 years or so don't like them.

"But this probably means a new driver and big disruption!"

Not my problem, I'm not the dude who has to come up with an excuse for
this because I didn't merge the driver in the first place. I do get to
throw a "we all told you so" in though, but that's not helping.

Also I'm wondering which is the other driver that we share buffers
with. The gaudi stuff doesn't have real struct pages as backing
storage, it only fills out the dma_addr_t. That tends to blow up with
other drivers, and the only place where this is guaranteed to work is
if you have a dynamic importer which sets the allow_peer2peer flag.
Adding maintainers from other subsystems who might want to chime in
here. So even aside of the big question as-is this is broken.

Currently only 2 drivers set allow_peer2peer, so those are the only
ones who can consume these buffers from device memory. Pinging those
folks specifically.

Doug/Jason from infiniband: Should we add linux-rdma to the dma-buf
wildcard match so that you can catch these next time around too? At
least when people use scripts/get_maintainers.pl correctly. All the
other subsystems using dma-buf are on there already (dri-devel,
linux-media and linaro-mm-sig for android/arm embedded stuff).

Cheers, Daniel



> ---
>  include/uapi/misc/habanalabs.h | 28 +++++++++++++++++++++++++++-
>  1 file changed, 27 insertions(+), 1 deletion(-)
>
> diff --git a/include/uapi/misc/habanalabs.h b/include/uapi/misc/habanalabs.h
> index a47a731e4527..aa3d8e0ba060 100644
> --- a/include/uapi/misc/habanalabs.h
> +++ b/include/uapi/misc/habanalabs.h
> @@ -808,6 +808,10 @@ union hl_wait_cs_args {
>  #define HL_MEM_OP_UNMAP                        3
>  /* Opcode to map a hw block */
>  #define HL_MEM_OP_MAP_BLOCK            4
> +/* Opcode to create DMA-BUF object for an existing device memory allocation
> + * and to export an FD of that DMA-BUF back to the caller
> + */
> +#define HL_MEM_OP_EXPORT_DMABUF_FD     5
>
>  /* Memory flags */
>  #define HL_MEM_CONTIGUOUS      0x1
> @@ -878,11 +882,26 @@ struct hl_mem_in {
>                         /* Virtual address returned from HL_MEM_OP_MAP */
>                         __u64 device_virt_addr;
>                 } unmap;
> +
> +               /* HL_MEM_OP_EXPORT_DMABUF_FD */
> +               struct {
> +                       /* Handle returned from HL_MEM_OP_ALLOC. In Gaudi,
> +                        * where we don't have MMU for the device memory, the
> +                        * driver expects a physical address (instead of
> +                        * a handle) in the device memory space.
> +                        */
> +                       __u64 handle;
> +                       /* Size of memory allocation. Relevant only for GAUDI */
> +                       __u64 mem_size;
> +               } export_dmabuf_fd;
>         };
>
>         /* HL_MEM_OP_* */
>         __u32 op;
> -       /* HL_MEM_* flags */
> +       /* HL_MEM_* flags.
> +        * For the HL_MEM_OP_EXPORT_DMABUF_FD opcode, this field holds the
> +        * DMA-BUF file/FD flags.
> +        */
>         __u32 flags;
>         /* Context ID - Currently not in use */
>         __u32 ctx_id;
> @@ -919,6 +938,13 @@ struct hl_mem_out {
>
>                         __u32 pad;
>                 };
> +
> +               /* Returned in HL_MEM_OP_EXPORT_DMABUF_FD. Represents the
> +                * DMA-BUF object that was created to describe a memory
> +                * allocation on the device's memory space. The FD should be
> +                * passed to the importer driver
> +                */
> +               __u64 fd;
>         };
>  };
>
> --
> 2.25.1
>


--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 143+ messages in thread

* Re: [PATCH v3 1/2] habanalabs: define uAPI to export FD for DMA-BUF
  2021-06-21 12:28   ` Daniel Vetter
@ 2021-06-21 13:02     ` Greg KH
  -1 siblings, 0 replies; 143+ messages in thread
From: Greg KH @ 2021-06-21 13:02 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: Oded Gabbay, Jason Gunthorpe, linux-rdma,
	open list:DMA BUFFER SHARING FRAMEWORK, Doug Ledford, airlied,
	Linux Kernel Mailing List, Sumit Semwal, Christian König,
	Gal Pressman, sleybo, dri-devel, Tomer Tayar,
	moderated list:DMA BUFFER SHARING FRAMEWORK, amd-gfx list,
	Alex Deucher, Leon Romanovsky, Christoph Hellwig

On Mon, Jun 21, 2021 at 02:28:48PM +0200, Daniel Vetter wrote:
> On Fri, Jun 18, 2021 at 2:36 PM Oded Gabbay <ogabbay@kernel.org> wrote:
> > User process might want to share the device memory with another
> > driver/device, and to allow it to access it over PCIe (P2P).
> >
> > To enable this, we utilize the dma-buf mechanism and add a dma-buf
> > exporter support, so the other driver can import the device memory and
> > access it.
> >
> > The device memory is allocated using our existing allocation uAPI,
> > where the user will get a handle that represents the allocation.
> >
> > The user will then need to call the new
> > uAPI (HL_MEM_OP_EXPORT_DMABUF_FD) and give the handle as a parameter.
> >
> > The driver will return a FD that represents the DMA-BUF object that
> > was created to match that allocation.
> >
> > Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
> > Reviewed-by: Tomer Tayar <ttayar@habana.ai>
> 
> Mission acomplished, we've gone full circle, and the totally-not-a-gpu
> driver is now trying to use gpu infrastructure. And seems to have
> gained vram meanwhile too. Next up is going to be synchronization
> using dma_fence so you can pass buffers back&forth without stalls
> among drivers.

What's wrong with other drivers using dmabufs and even dma_fence?  It's
a common problem when shuffling memory around systems, so why is that
somehow only allowed for gpu drivers?

There are many users of these structures in the kernel today that are
not gpu drivers (tee, fastrpc, virtio, xen, IB, etc.), as this is a
common thing that drivers want to do (throw chunks of memory around
from userspace to hardware).

I'm not trying to be a pain here, but I really do not understand why
this is a problem.  A kernel API is present, so why shouldn't other
in-kernel drivers use it?  We had the problem in the past where
subsystems were trying to create their own interfaces for the same
thing, which is why you all created the dmabuf API to help unify this.

> Also I'm wondering which is the other driver that we share buffers
> with. The gaudi stuff doesn't have real struct pages as backing
> storage, it only fills out the dma_addr_t. That tends to blow up with
> other drivers, and the only place where this is guaranteed to work is
> if you have a dynamic importer which sets the allow_peer2peer flag.
> Adding maintainers from other subsystems who might want to chime in
> here. So even aside of the big question as-is this is broken.

From what I can tell this driver is sending the buffers to other
instances of the same hardware, as that's what is on the other "end" of
the network connection.  No different from IB's use of RDMA, right?

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 143+ messages in thread

* Re: [PATCH v3 1/2] habanalabs: define uAPI to export FD for DMA-BUF
  2021-06-21 13:02     ` Greg KH
@ 2021-06-21 14:12       ` Jason Gunthorpe
  -1 siblings, 0 replies; 143+ messages in thread
From: Jason Gunthorpe @ 2021-06-21 14:12 UTC (permalink / raw)
  To: Greg KH
  Cc: Daniel Vetter, Oded Gabbay, linux-rdma,
	open list:DMA BUFFER SHARING FRAMEWORK, Doug Ledford, airlied,
	Linux Kernel Mailing List, Sumit Semwal, Christian König,
	Gal Pressman, sleybo, dri-devel, Tomer Tayar,
	moderated list:DMA BUFFER SHARING FRAMEWORK, amd-gfx list,
	Alex Deucher, Leon Romanovsky, Christoph Hellwig

On Mon, Jun 21, 2021 at 03:02:10PM +0200, Greg KH wrote:
> On Mon, Jun 21, 2021 at 02:28:48PM +0200, Daniel Vetter wrote:

> > Also I'm wondering which is the other driver that we share buffers
> > with. The gaudi stuff doesn't have real struct pages as backing
> > storage, it only fills out the dma_addr_t. That tends to blow up with
> > other drivers, and the only place where this is guaranteed to work is
> > if you have a dynamic importer which sets the allow_peer2peer flag.
> > Adding maintainers from other subsystems who might want to chime in
> > here. So even aside of the big question as-is this is broken.
> 
> From what I can tell this driver is sending the buffers to other
> instances of the same hardware,

A dmabuf is consumed by something else in the kernel calling
dma_buf_map_attachment() on the FD.

What is the other side of this? I don't see any
dma_buf_map_attachment() calls in drivers/misc, or added in this patch
set.

AFAIK the only viable in-tree other side is in mlx5 (look in
umem_dmabuf.c)
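
For completeness, whatever sits on the other side has to do roughly
this sequence (sketch only, no error handling; importer_ops is the
dma_buf_attach_ops the importing driver provides, with allow_peer2peer
set for device memory like this):

    struct dma_buf *dmabuf = dma_buf_get(fd);  /* FD from the new uAPI */
    struct dma_buf_attachment *attach;
    struct sg_table *sgt;

    attach = dma_buf_dynamic_attach(dmabuf, importer_dev,
                                    &importer_ops, NULL);

    dma_resv_lock(dmabuf->resv, NULL);
    sgt = dma_buf_map_attachment(attach, DMA_BIDIRECTIONAL);
    /* ... program the importing device with the addresses in sgt ... */
    dma_buf_unmap_attachment(attach, sgt, DMA_BIDIRECTIONAL);
    dma_resv_unlock(dmabuf->resv);

    dma_buf_detach(dmabuf, attach);
    dma_buf_put(dmabuf);

Nothing like that is added anywhere in this series, hence the question.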

Though, as we already discussed, habana has their own networking (out
of tree, presumably), so I suspect this is really to support some
out-of-tree stuff??

Jason

^ permalink raw reply	[flat|nested] 143+ messages in thread

* Re: [PATCH v3 1/2] habanalabs: define uAPI to export FD for DMA-BUF
  2021-06-21 12:28   ` Daniel Vetter
@ 2021-06-21 14:17     ` Jason Gunthorpe
  -1 siblings, 0 replies; 143+ messages in thread
From: Jason Gunthorpe @ 2021-06-21 14:17 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: Oded Gabbay, Christoph Hellwig, linux-rdma,
	open list:DMA BUFFER SHARING FRAMEWORK, Doug Ledford, airlied,
	Linux Kernel Mailing List, Greg KH, Sumit Semwal,
	Christian König, Gal Pressman, sleybo, dri-devel,
	Tomer Tayar, moderated list:DMA BUFFER SHARING FRAMEWORK,
	amd-gfx list, Alex Deucher, Leon Romanovsky

On Mon, Jun 21, 2021 at 02:28:48PM +0200, Daniel Vetter wrote:

> Mission acomplished, we've gone full circle, and the totally-not-a-gpu
> driver is now trying to use gpu infrastructure. And seems to have
> gained vram meanwhile too. Next up is going to be synchronization
> using dma_fence so you can pass buffers back&forth without stalls
> among drivers.

Well, we can't even see the other side of this, so who knows.

This is a new uAPI, so where is the userspace? In RDMA, at least, I
require seeing the new userspace and test suite before changes to
include/uapi/rdma can go ahead.

> Doug/Jason from infiniband: Should we add linux-rdma to the dma-buf
> wildcard match so that you can catch these next time around too? At
> least when people use scripts/get_maintainers.pl correctly. All the
> other subsystems using dma-buf are on there already (dri-devel,
> linux-media and linaro-mm-sig for android/arm embedded stuff).

My bigger concern is that this doesn't seem to be implementing PCI P2P
DMA correctly. It is following the same hacky NULL-page approach that
Christoph Hellwig already NAK'd for AMD.

This should not be allowed to proliferate.

I would be much happier seeing this be done using the approach of
Logan's series here:

https://lore.kernel.org/linux-block/20210513223203.5542-1-logang@deltatee.com/

Jason

^ permalink raw reply	[flat|nested] 143+ messages in thread

* Re: [PATCH v3 1/2] habanalabs: define uAPI to export FD for DMA-BUF
  2021-06-21 13:02     ` Greg KH
@ 2021-06-21 14:20       ` Daniel Vetter
  -1 siblings, 0 replies; 143+ messages in thread
From: Daniel Vetter @ 2021-06-21 14:20 UTC (permalink / raw)
  To: Greg KH
  Cc: Daniel Vetter, Oded Gabbay, Jason Gunthorpe, linux-rdma,
	open list:DMA BUFFER SHARING FRAMEWORK, Doug Ledford, airlied,
	Linux Kernel Mailing List, Sumit Semwal, Christian König,
	Gal Pressman, sleybo, dri-devel, Tomer Tayar,
	moderated list:DMA BUFFER SHARING FRAMEWORK, amd-gfx list,
	Alex Deucher, Leon Romanovsky, Christoph Hellwig

On Mon, Jun 21, 2021 at 03:02:10PM +0200, Greg KH wrote:
> On Mon, Jun 21, 2021 at 02:28:48PM +0200, Daniel Vetter wrote:
> > On Fri, Jun 18, 2021 at 2:36 PM Oded Gabbay <ogabbay@kernel.org> wrote:
> > > User process might want to share the device memory with another
> > > driver/device, and to allow it to access it over PCIe (P2P).
> > >
> > > To enable this, we utilize the dma-buf mechanism and add a dma-buf
> > > exporter support, so the other driver can import the device memory and
> > > access it.
> > >
> > > The device memory is allocated using our existing allocation uAPI,
> > > where the user will get a handle that represents the allocation.
> > >
> > > The user will then need to call the new
> > > uAPI (HL_MEM_OP_EXPORT_DMABUF_FD) and give the handle as a parameter.
> > >
> > > The driver will return a FD that represents the DMA-BUF object that
> > > was created to match that allocation.
> > >
> > > Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
> > > Reviewed-by: Tomer Tayar <ttayar@habana.ai>
> > 
> > Mission acomplished, we've gone full circle, and the totally-not-a-gpu
> > driver is now trying to use gpu infrastructure. And seems to have
> > gained vram meanwhile too. Next up is going to be synchronization
> > using dma_fence so you can pass buffers back&forth without stalls
> > among drivers.
> 
> What's wrong with other drivers using dmabufs and even dma_fence?  It's
> a common problem when shuffling memory around systems, why is that
> somehow only allowed for gpu drivers?
> 
> There are many users of these structures in the kernel today that are
> not gpu drivers (tee, fastrpc, virtio, xen, IB, etc) as this is a common
> thing that drivers want to do (throw chunks of memory around from
> userspace to hardware).
> 
> I'm not trying to be a pain here, but I really do not understand why
> this is a problem.  A kernel api is present, why not use it by other
> in-kernel drivers?  We had the problem in the past where subsystems were
> trying to create their own interfaces for the same thing, which is why
> you all created the dmabuf api to help unify this.

It's the same thing as ever. 90% of an accel driver is in userspace,
that's where all the fun is, that's where the big picture review needs to
happen, and we've very conveniently bypassed all that a few years back
because it was too annoying.

Once we have the full driver stack and can start reviewing it I have no
objections to totally-not-gpus using all this stuff too. But until we can
do that this is all just causing headaches.

Ofc if you assume that userspace doesn't matter then you don't care, which
is where this gigantic disconnect comes from.

Also, unless we're actually doing this properly there's zero incentive for
me to review the kernel code and check whether it follows the rules
correctly, so you have excellent chances that you just break the rules.
And dma_buf/fence are tricky enough that you're pretty much guaranteed to
break the rules if you're not involved in the discussions. Just now we
have a big one where everyone involved (all of whom have been doing this
for 10+ years at least) realizes we've fucked up big time.

Anyway we've had this discussion, we're not going to move anyone here at
all, so *shrug*. I'll keep seeing accelerators in drivers/misc as blatant
bypassing of review by the actual accelerator folks, and you'll keep
seeing dri-devel as ... well I dunno, people who don't know what they're
talking about maybe. Or not relevant to your totally-not-a-gpu thing.

> > Also I'm wondering which is the other driver that we share buffers
> > with. The gaudi stuff doesn't have real struct pages as backing
> > storage, it only fills out the dma_addr_t. That tends to blow up with
> > other drivers, and the only place where this is guaranteed to work is
> > if you have a dynamic importer which sets the allow_peer2peer flag.
> > Adding maintainers from other subsystems who might want to chime in
> > here. So even aside of the big question as-is this is broken.
> 
> From what I can tell this driver is sending the buffers to other
> instances of the same hardware, as that's what is on the other "end" of
> the network connection.  No different from IB's use of RDMA, right?

There's no import afaict, but maybe I missed it. Assuming I haven't missed
it, the importing necessarily has to happen in some other driver.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 143+ messages in thread

* Re: [PATCH v3 1/2] habanalabs: define uAPI to export FD for DMA-BUF
  2021-06-21 14:20       ` Daniel Vetter
@ 2021-06-21 14:49         ` Jason Gunthorpe
  -1 siblings, 0 replies; 143+ messages in thread
From: Jason Gunthorpe @ 2021-06-21 14:49 UTC (permalink / raw)
  To: Greg KH, Oded Gabbay, linux-rdma,
	open list:DMA BUFFER SHARING FRAMEWORK, Doug Ledford, airlied,
	Linux Kernel Mailing List, Sumit Semwal, Christian König,
	Gal Pressman, sleybo, dri-devel, Tomer Tayar,
	moderated list:DMA BUFFER SHARING FRAMEWORK, amd-gfx list,
	Alex Deucher, Leon Romanovsky, Christoph Hellwig

On Mon, Jun 21, 2021 at 04:20:35PM +0200, Daniel Vetter wrote:

> Also unless we're actually doing this properly there's zero incentive for
> me to review the kernel code and check whether it follows the rules
> correctly, so you have excellent chances that you just break the rules.
> And dma_buf/fence are tricky enough that you pretty much guaranteed to
> break the rules if you're not involved in the discussions. Just now we
> have a big one where everyone involved (who's been doing this for 10+
> years all at least) realizes we've fucked up big time.

This is where I come from on dmabuf: it is fiendishly
complicated. Don't use it unless you absolutely have to, are in DRM,
and have people like Daniel helping to make sure you use it right.

Its whole premise and design is compromised by specialty historical
implementation choices on the GPU side.

Jason

^ permalink raw reply	[flat|nested] 143+ messages in thread

* Re: [PATCH v3 1/2] habanalabs: define uAPI to export FD for DMA-BUF
  2021-06-21 14:12       ` Jason Gunthorpe
@ 2021-06-21 16:26         ` Oded Gabbay
  -1 siblings, 0 replies; 143+ messages in thread
From: Oded Gabbay @ 2021-06-21 16:26 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Greg KH, Daniel Vetter, Oded Gabbay, linux-rdma,
	open list:DMA BUFFER SHARING FRAMEWORK, Doug Ledford, airlied,
	Linux Kernel Mailing List, Sumit Semwal, Christian König,
	Gal Pressman, sleybo, dri-devel, Tomer Tayar,
	moderated list:DMA BUFFER SHARING FRAMEWORK, amd-gfx list,
	Alex Deucher, Leon Romanovsky, Christoph Hellwig

On Mon, Jun 21, 2021 at 5:12 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
>
> On Mon, Jun 21, 2021 at 03:02:10PM +0200, Greg KH wrote:
> > On Mon, Jun 21, 2021 at 02:28:48PM +0200, Daniel Vetter wrote:
>
> > > Also I'm wondering which is the other driver that we share buffers
> > > with. The gaudi stuff doesn't have real struct pages as backing
> > > storage, it only fills out the dma_addr_t. That tends to blow up with
> > > other drivers, and the only place where this is guaranteed to work is
> > > if you have a dynamic importer which sets the allow_peer2peer flag.
> > > Adding maintainers from other subsystems who might want to chime in
> > > here. So even aside of the big question as-is this is broken.
> >
> > From what I can tell this driver is sending the buffers to other
> > instances of the same hardware,
>
> A dmabuf is consumed by something else in the kernel calling
> dma_buf_map_attachment() on the FD.
>
> What is the other side of this? I don't see any
> dma_buf_map_attachment() calls in drivers/misc, or added in this patch
> set.

This patch-set only enables support for the exporter side.
The "other side" is any generic RDMA networking device that wants
to perform p2p communication over PCIe with our GAUDI accelerator.
An example is indeed the mlx5 card, which has already integrated
support for being an "importer".
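
For the userspace side, the export is just the existing memory ioctl
with the new opcode, roughly like this (sketch only; it assumes the
HL_IOCTL_MEMORY / union hl_mem_args plumbing already in this header,
and alloc_handle / alloc_size / device_fd are placeholders for what
the application got from the allocation path):

    #include <fcntl.h>
    #include <sys/ioctl.h>
    #include <misc/habanalabs.h>

    union hl_mem_args args = {0};
    int dmabuf_fd = -1;

    args.in.op    = HL_MEM_OP_EXPORT_DMABUF_FD;
    args.in.flags = O_RDWR | O_CLOEXEC;               /* dma-buf FD flags */
    /* handle from HL_MEM_OP_ALLOC, or a device physical address on GAUDI */
    args.in.export_dmabuf_fd.handle   = alloc_handle;
    args.in.export_dmabuf_fd.mem_size = alloc_size;   /* GAUDI only */

    if (!ioctl(device_fd, HL_IOCTL_MEMORY, &args))
            dmabuf_fd = (int)args.out.fd;

    /* dmabuf_fd is then handed to the importing driver's own uAPI,
     * e.g. an RDMA verbs MR registration that accepts a dma-buf FD
     */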

This is *not* used for communication with another GAUDI device. If I
want to communicate with another GAUDI device, our userspace
communications library will use our internal network links, without
any need for dma-buf.

Oded

>
> AFAIK the only viable in-tree other side is in mlx5 (look in
> umem_dmabuf.c)
>
> Though as we already talked habana has their own networking (out of
> tree, presumably) so I suspect this is really to support some out of
> tree stuff??
>
> Jason

^ permalink raw reply	[flat|nested] 143+ messages in thread

* Re: [PATCH v3 1/2] habanalabs: define uAPI to export FD for DMA-BUF
  2021-06-21 16:26         ` Oded Gabbay
@ 2021-06-21 17:55           ` Jason Gunthorpe
  -1 siblings, 0 replies; 143+ messages in thread
From: Jason Gunthorpe @ 2021-06-21 17:55 UTC (permalink / raw)
  To: Oded Gabbay
  Cc: Greg KH, Daniel Vetter, Oded Gabbay, linux-rdma,
	open list:DMA BUFFER SHARING FRAMEWORK, Doug Ledford, airlied,
	Linux Kernel Mailing List, Sumit Semwal, Christian König,
	Gal Pressman, sleybo, dri-devel, Tomer Tayar,
	moderated list:DMA BUFFER SHARING FRAMEWORK, amd-gfx list,
	Alex Deucher, Leon Romanovsky, Christoph Hellwig

On Mon, Jun 21, 2021 at 07:26:14PM +0300, Oded Gabbay wrote:
> On Mon, Jun 21, 2021 at 5:12 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> >
> > On Mon, Jun 21, 2021 at 03:02:10PM +0200, Greg KH wrote:
> > > On Mon, Jun 21, 2021 at 02:28:48PM +0200, Daniel Vetter wrote:
> >
> > > > Also I'm wondering which is the other driver that we share buffers
> > > > with. The gaudi stuff doesn't have real struct pages as backing
> > > > storage, it only fills out the dma_addr_t. That tends to blow up with
> > > > other drivers, and the only place where this is guaranteed to work is
> > > > if you have a dynamic importer which sets the allow_peer2peer flag.
> > > > Adding maintainers from other subsystems who might want to chime in
> > > > here. So even aside of the big question as-is this is broken.
> > >
> > > From what I can tell this driver is sending the buffers to other
> > > instances of the same hardware,
> >
> > A dmabuf is consumed by something else in the kernel calling
> > dma_buf_map_attachment() on the FD.
> >
> > What is the other side of this? I don't see any
> > dma_buf_map_attachment() calls in drivers/misc, or added in this patch
> > set.
> 
> This patch-set is only to enable the support for the exporter side.
> The "other side" is any generic RDMA networking device that will want
> to perform p2p communication over PCIe with our GAUDI accelerator.
> An example is indeed the mlnx5 card which has already integrated
> support for being an "importer".

It raises the question of how you are testing this if you aren't using
it with the only in-tree driver: mlx5.

Jason

^ permalink raw reply	[flat|nested] 143+ messages in thread

* Re: [PATCH v3 1/2] habanalabs: define uAPI to export FD for DMA-BUF
@ 2021-06-21 17:55           ` Jason Gunthorpe
  0 siblings, 0 replies; 143+ messages in thread
From: Jason Gunthorpe @ 2021-06-21 17:55 UTC (permalink / raw)
  To: Oded Gabbay
  Cc: Gal Pressman, sleybo, linux-rdma, Greg KH, Oded Gabbay,
	Christoph Hellwig, Linux Kernel Mailing List, dri-devel,
	Christian König,
	moderated list:DMA BUFFER SHARING FRAMEWORK, Doug Ledford,
	Tomer Tayar, amd-gfx list, Daniel Vetter, Alex Deucher, airlied,
	Sumit Semwal, Leon Romanovsky,
	open list:DMA BUFFER SHARING FRAMEWORK

On Mon, Jun 21, 2021 at 07:26:14PM +0300, Oded Gabbay wrote:
> On Mon, Jun 21, 2021 at 5:12 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> >
> > On Mon, Jun 21, 2021 at 03:02:10PM +0200, Greg KH wrote:
> > > On Mon, Jun 21, 2021 at 02:28:48PM +0200, Daniel Vetter wrote:
> >
> > > > Also I'm wondering which is the other driver that we share buffers
> > > > with. The gaudi stuff doesn't have real struct pages as backing
> > > > storage, it only fills out the dma_addr_t. That tends to blow up with
> > > > other drivers, and the only place where this is guaranteed to work is
> > > > if you have a dynamic importer which sets the allow_peer2peer flag.
> > > > Adding maintainers from other subsystems who might want to chime in
> > > > here. So even aside of the big question as-is this is broken.
> > >
> > > From what I can tell this driver is sending the buffers to other
> > > instances of the same hardware,
> >
> > A dmabuf is consumed by something else in the kernel calling
> > dma_buf_map_attachment() on the FD.
> >
> > What is the other side of this? I don't see any
> > dma_buf_map_attachment() calls in drivers/misc, or added in this patch
> > set.
> 
> This patch-set is only to enable the support for the exporter side.
> The "other side" is any generic RDMA networking device that will want
> to perform p2p communication over PCIe with our GAUDI accelerator.
> An example is indeed the mlnx5 card which has already integrated
> support for being an "importer".

It raises the question of how you are testing this if you aren't using
it with the only intree driver: mlx5.

Jason
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 143+ messages in thread

* Re: [PATCH v3 1/2] habanalabs: define uAPI to export FD for DMA-BUF
  2021-06-21 17:55           ` Jason Gunthorpe
@ 2021-06-21 18:27             ` Daniel Vetter
From: Daniel Vetter @ 2021-06-21 18:27 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Oded Gabbay, Greg KH, Oded Gabbay, linux-rdma,
	open list:DMA BUFFER SHARING FRAMEWORK, Doug Ledford, airlied,
	Linux Kernel Mailing List, Sumit Semwal, Christian König,
	Gal Pressman, sleybo, dri-devel, Tomer Tayar,
	moderated list:DMA BUFFER SHARING FRAMEWORK, amd-gfx list,
	Alex Deucher, Leon Romanovsky, Christoph Hellwig

On Mon, Jun 21, 2021 at 7:55 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> On Mon, Jun 21, 2021 at 07:26:14PM +0300, Oded Gabbay wrote:
> > On Mon, Jun 21, 2021 at 5:12 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> > >
> > > On Mon, Jun 21, 2021 at 03:02:10PM +0200, Greg KH wrote:
> > > > On Mon, Jun 21, 2021 at 02:28:48PM +0200, Daniel Vetter wrote:
> > >
> > > > > Also I'm wondering which is the other driver that we share buffers
> > > > > with. The gaudi stuff doesn't have real struct pages as backing
> > > > > storage, it only fills out the dma_addr_t. That tends to blow up with
> > > > > other drivers, and the only place where this is guaranteed to work is
> > > > > if you have a dynamic importer which sets the allow_peer2peer flag.
> > > > > Adding maintainers from other subsystems who might want to chime in
> > > > > here. So even aside of the big question as-is this is broken.
> > > >
> > > > From what I can tell this driver is sending the buffers to other
> > > > instances of the same hardware,
> > >
> > > A dmabuf is consumed by something else in the kernel calling
> > > dma_buf_map_attachment() on the FD.
> > >
> > > What is the other side of this? I don't see any
> > > dma_buf_map_attachment() calls in drivers/misc, or added in this patch
> > > set.
> >
> > This patch-set is only to enable the support for the exporter side.
> > The "other side" is any generic RDMA networking device that will want
> > to perform p2p communication over PCIe with our GAUDI accelerator.
> > An example is indeed the mlnx5 card which has already integrated
> > support for being an "importer".
>
> It raises the question of how you are testing this if you aren't using
> it with the only intree driver: mlx5.

For p2p dma-buf there's also amdgpu as a possible in-tree candidate
driver, that's why I added amdgpu folks. Otoh I'm not aware of AI+GPU
combos being much in use, at least with upstream gpu drivers (nvidia
blob is a different story ofc, but I don't care what they do in their
own world).
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

* Re: [PATCH v3 1/2] habanalabs: define uAPI to export FD for DMA-BUF
  2021-06-21 18:27             ` Daniel Vetter
@ 2021-06-21 19:24               ` Oded Gabbay
From: Oded Gabbay @ 2021-06-21 19:24 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: Jason Gunthorpe, Greg KH, Oded Gabbay, linux-rdma,
	open list:DMA BUFFER SHARING FRAMEWORK, Doug Ledford, airlied,
	Linux Kernel Mailing List, Sumit Semwal, Christian König,
	Gal Pressman, sleybo, dri-devel, Tomer Tayar,
	moderated list:DMA BUFFER SHARING FRAMEWORK, amd-gfx list,
	Alex Deucher, Leon Romanovsky, Christoph Hellwig

On Mon, Jun 21, 2021 at 9:27 PM Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
>
> On Mon, Jun 21, 2021 at 7:55 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> > On Mon, Jun 21, 2021 at 07:26:14PM +0300, Oded Gabbay wrote:
> > > On Mon, Jun 21, 2021 at 5:12 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> > > >
> > > > On Mon, Jun 21, 2021 at 03:02:10PM +0200, Greg KH wrote:
> > > > > On Mon, Jun 21, 2021 at 02:28:48PM +0200, Daniel Vetter wrote:
> > > >
> > > > > > Also I'm wondering which is the other driver that we share buffers
> > > > > > with. The gaudi stuff doesn't have real struct pages as backing
> > > > > > storage, it only fills out the dma_addr_t. That tends to blow up with
> > > > > > other drivers, and the only place where this is guaranteed to work is
> > > > > > if you have a dynamic importer which sets the allow_peer2peer flag.
> > > > > > Adding maintainers from other subsystems who might want to chime in
> > > > > > here. So even aside of the big question as-is this is broken.
> > > > >
> > > > > From what I can tell this driver is sending the buffers to other
> > > > > instances of the same hardware,
> > > >
> > > > A dmabuf is consumed by something else in the kernel calling
> > > > dma_buf_map_attachment() on the FD.
> > > >
> > > > What is the other side of this? I don't see any
> > > > dma_buf_map_attachment() calls in drivers/misc, or added in this patch
> > > > set.
> > >
> > > This patch-set is only to enable the support for the exporter side.
> > > The "other side" is any generic RDMA networking device that will want
> > > to perform p2p communication over PCIe with our GAUDI accelerator.
> > > An example is indeed the mlnx5 card which has already integrated
> > > support for being an "importer".
> >
> > It raises the question of how you are testing this if you aren't using
> > it with the only intree driver: mlx5.
>
> For p2p dma-buf there's also amdgpu as a possible in-tree candiate
> driver, that's why I added amdgpu folks. Otoh I'm not aware of AI+GPU
> combos being much in use, at least with upstream gpu drivers (nvidia
> blob is a different story ofc, but I don't care what they do in their
> own world).
> -Daniel
> --
We have been doing three things:
1. I wrote a simple "importer" driver that emulates an RDMA driver. It
calls all the IB_UMEM_DMABUF functions, the same way the mlx5 driver
does. Instead of using h/w, it accesses the BAR directly. We wrote
several tests that emulate the real application, i.e. they ask the
habanalabs driver to create a dma-buf object and export its FD back to
userspace. The userspace then sends the FD to the "importer" driver,
which attaches to it, gets the SG list and accesses the memory on the
GAUDI device (a rough sketch of that flow is shown below). This gave me
confidence that the way we integrated the exporter is basically
correct/working.

2. We are trying to do a POC with a Mellanox card we have; WIP.

3. We are working with another 3rd-party RDMA device whose driver is
now adding support for being an "importer"; also WIP.

In both points 2&3 We haven't yet reached the actual stage of checking
this feature.
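
For reference, the sketch mentioned in point 1: the emulated importer
basically does the standard dma-buf sequence on the FD it receives from
userspace. Roughly (error handling omitted; "importer_dev" stands in
for whatever struct device the importer driver owns):

#include <linux/device.h>
#include <linux/dma-buf.h>
#include <linux/dma-mapping.h>
#include <linux/scatterlist.h>

static int import_exported_fd(struct device *importer_dev, int fd)
{
	struct dma_buf *dmabuf = dma_buf_get(fd);
	struct dma_buf_attachment *attach;
	struct sg_table *sgt;
	struct scatterlist *sg;
	int i;

	attach = dma_buf_attach(dmabuf, importer_dev);
	sgt = dma_buf_map_attachment(attach, DMA_BIDIRECTIONAL);

	/* the SG list carries the DMA addresses of the device memory */
	for_each_sgtable_dma_sg(sgt, sg, i)
		dev_info(importer_dev, "seg %d: dma addr %pad, len %u\n",
			 i, &sg_dma_address(sg), sg_dma_len(sg));

	dma_buf_unmap_attachment(attach, sgt, DMA_BIDIRECTIONAL);
	dma_buf_detach(dmabuf, attach);
	dma_buf_put(dmabuf);
	return 0;
}

The real tests obviously read/write the memory through the BAR instead
of just printing the segments.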

Another thing I want to emphasize is that we are doing p2p only
through the export/import of the FD. We do *not* allow the user to
mmap the dma-buf as we do not support direct IO. So there is no access
to these pages through the userspace.

Thanks,
Oded

* Re: [PATCH v3 1/2] habanalabs: define uAPI to export FD for DMA-BUF
  2021-06-21 19:24               ` Oded Gabbay
@ 2021-06-21 23:29                 ` Jason Gunthorpe
From: Jason Gunthorpe @ 2021-06-21 23:29 UTC (permalink / raw)
  To: Oded Gabbay
  Cc: Daniel Vetter, Greg KH, Oded Gabbay, linux-rdma,
	open list:DMA BUFFER SHARING FRAMEWORK, Doug Ledford, airlied,
	Linux Kernel Mailing List, Sumit Semwal, Christian König,
	Gal Pressman, sleybo, dri-devel, Tomer Tayar,
	moderated list:DMA BUFFER SHARING FRAMEWORK, amd-gfx list,
	Alex Deucher, Leon Romanovsky, Christoph Hellwig

On Mon, Jun 21, 2021 at 10:24:16PM +0300, Oded Gabbay wrote:

> Another thing I want to emphasize is that we are doing p2p only
> through the export/import of the FD. We do *not* allow the user to
> mmap the dma-buf as we do not support direct IO. So there is no access
> to these pages through the userspace.

Arguably mmapping the memory is a better choice, and is the direction
that Logan's series goes in. Here the use of DMABUF was specifically
designed to allow hitless revocation of the memory, which this isn't
even using.
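
To spell out what that revocation buys you: a dynamic importer attaches
with move_notify() and allow_peer2peer, and the exporter can then pull
the mapping back at any time. A minimal sketch (names illustrative, no
error handling):

#include <linux/dma-buf.h>
#include <linux/dma-mapping.h>
#include <linux/dma-resv.h>

static void demo_move_notify(struct dma_buf_attachment *attach)
{
	/* exporter is moving/revoking the backing storage: stop HW
	 * access to the old mapping here, re-map later under the resv
	 */
}

static const struct dma_buf_attach_ops demo_attach_ops = {
	.allow_peer2peer = true,	/* needed for BAR-only exporters */
	.move_notify = demo_move_notify,
};

static struct sg_table *demo_dynamic_map(struct dma_buf *dmabuf,
					 struct device *dev)
{
	struct dma_buf_attachment *attach;
	struct sg_table *sgt;

	attach = dma_buf_dynamic_attach(dmabuf, dev, &demo_attach_ops, NULL);

	/* dynamic importers map/unmap with the reservation lock held */
	dma_resv_lock(dmabuf->resv, NULL);
	sgt = dma_buf_map_attachment(attach, DMA_BIDIRECTIONAL);
	dma_resv_unlock(dmabuf->resv);

	return sgt;
}

That is essentially the machinery the mlx5/umem_dmabuf path uses; an
exporter that never revokes isn't really getting anything out of it.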

So you are taking the hit of very limited hardware support and reduced
performance just to squeeze into DMABUF..

Jason

* Re: [Linaro-mm-sig] [PATCH v3 1/2] habanalabs: define uAPI to export FD for DMA-BUF
  2021-06-21 23:29                 ` Jason Gunthorpe
@ 2021-06-22  6:37                   ` Christian König
From: Christian König @ 2021-06-22  6:37 UTC (permalink / raw)
  To: Jason Gunthorpe, Oded Gabbay
  Cc: Gal Pressman, sleybo, linux-rdma, Oded Gabbay, Christoph Hellwig,
	Linux Kernel Mailing List, dri-devel, Christian König,
	moderated list:DMA BUFFER SHARING FRAMEWORK, Doug Ledford,
	Tomer Tayar, amd-gfx list, Greg KH, Alex Deucher,
	Leon Romanovsky, open list:DMA BUFFER SHARING FRAMEWORK

On 22.06.21 at 01:29, Jason Gunthorpe wrote:
> On Mon, Jun 21, 2021 at 10:24:16PM +0300, Oded Gabbay wrote:
>
>> Another thing I want to emphasize is that we are doing p2p only
>> through the export/import of the FD. We do *not* allow the user to
>> mmap the dma-buf as we do not support direct IO. So there is no access
>> to these pages through the userspace.
> Arguably mmaping the memory is a better choice, and is the direction
> that Logan's series goes in. Here the use of DMABUF was specifically
> designed to allow hitless revokation of the memory, which this isn't
> even using.

The major problem with this approach is that DMA-buf is also used for 
memory which isn't CPU accessible.

That was one of the reasons we didn't even consider using the
memory-mapping approach for GPUs.

Regards,
Christian.

>
> So you are taking the hit of very limited hardware support and reduced
> performance just to squeeze into DMABUF..
>
> Jason


* Re: [Linaro-mm-sig] [PATCH v3 1/2] habanalabs: define uAPI to export FD for DMA-BUF
  2021-06-22  6:37                   ` Christian König
@ 2021-06-22  8:42                     ` Oded Gabbay
From: Oded Gabbay @ 2021-06-22  8:42 UTC (permalink / raw)
  To: Christian König
  Cc: Jason Gunthorpe, Gal Pressman, sleybo, linux-rdma, Oded Gabbay,
	Christoph Hellwig, Linux Kernel Mailing List, dri-devel,
	Christian König,
	moderated list:DMA BUFFER SHARING FRAMEWORK, Doug Ledford,
	Tomer Tayar, amd-gfx list, Greg KH, Alex Deucher,
	Leon Romanovsky, open list:DMA BUFFER SHARING FRAMEWORK

On Tue, Jun 22, 2021 at 9:37 AM Christian König
<ckoenig.leichtzumerken@gmail.com> wrote:
>
> Am 22.06.21 um 01:29 schrieb Jason Gunthorpe:
> > On Mon, Jun 21, 2021 at 10:24:16PM +0300, Oded Gabbay wrote:
> >
> >> Another thing I want to emphasize is that we are doing p2p only
> >> through the export/import of the FD. We do *not* allow the user to
> >> mmap the dma-buf as we do not support direct IO. So there is no access
> >> to these pages through the userspace.
> > Arguably mmaping the memory is a better choice, and is the direction
> > that Logan's series goes in. Here the use of DMABUF was specifically
> > designed to allow hitless revokation of the memory, which this isn't
> > even using.
>
> The major problem with this approach is that DMA-buf is also used for
> memory which isn't CPU accessible.
>
> That was one of the reasons we didn't even considered using the mapping
> memory approach for GPUs.
>
> Regards,
> Christian.
>
> >
> > So you are taking the hit of very limited hardware support and reduced
> > performance just to squeeze into DMABUF..

Thanks Jason for the clarification, but I honestly prefer to use
DMA-BUF at the moment.
It gives us just what we need (even more than we need, as you pointed
out), it is *already* integrated and tested in the RDMA subsystem, and
I feel comfortable using it as I'm somewhat familiar with it from my
AMD days.

I'll go and read Logan's patch-set to see if that will work for us in
the future. Please remember, as Daniel said, we don't have struct page
backing our device memory, so if that is a requirement to connect to
Logan's work, then I don't think we will want to do it at this point.

Thanks,
Oded

> >
> > Jason

* Re: [Linaro-mm-sig] [PATCH v3 1/2] habanalabs: define uAPI to export FD for DMA-BUF
  2021-06-22  8:42                     ` Oded Gabbay
@ 2021-06-22 12:01                       ` Jason Gunthorpe
From: Jason Gunthorpe @ 2021-06-22 12:01 UTC (permalink / raw)
  To: Oded Gabbay
  Cc: Christian König, Gal Pressman, sleybo, linux-rdma,
	Oded Gabbay, Christoph Hellwig, Linux Kernel Mailing List,
	dri-devel, Christian König,
	moderated list:DMA BUFFER SHARING FRAMEWORK, Doug Ledford,
	Tomer Tayar, amd-gfx list, Greg KH, Alex Deucher,
	Leon Romanovsky, open list:DMA BUFFER SHARING FRAMEWORK

On Tue, Jun 22, 2021 at 11:42:27AM +0300, Oded Gabbay wrote:
> On Tue, Jun 22, 2021 at 9:37 AM Christian König
> <ckoenig.leichtzumerken@gmail.com> wrote:
> >
> > Am 22.06.21 um 01:29 schrieb Jason Gunthorpe:
> > > On Mon, Jun 21, 2021 at 10:24:16PM +0300, Oded Gabbay wrote:
> > >
> > >> Another thing I want to emphasize is that we are doing p2p only
> > >> through the export/import of the FD. We do *not* allow the user to
> > >> mmap the dma-buf as we do not support direct IO. So there is no access
> > >> to these pages through the userspace.
> > > Arguably mmaping the memory is a better choice, and is the direction
> > > that Logan's series goes in. Here the use of DMABUF was specifically
> > > designed to allow hitless revokation of the memory, which this isn't
> > > even using.
> >
> > The major problem with this approach is that DMA-buf is also used for
> > memory which isn't CPU accessible.

That isn't an issue here because the memory is only intended to be
used with P2P transfers so it must be CPU accessible.

> > That was one of the reasons we didn't even considered using the mapping
> > memory approach for GPUs.

Well, now we have DEVICE_PRIVATE memory that can meet this need
too. Just nobody has wired it up to hmm_range_fault().

> > > So you are taking the hit of very limited hardware support and reduced
> > > performance just to squeeze into DMABUF..
> 
> Thanks Jason for the clarification, but I honestly prefer to use
> DMA-BUF at the moment.
> It gives us just what we need (even more than what we need as you
> pointed out), it is *already* integrated and tested in the RDMA
> subsystem, and I'm feeling comfortable using it as I'm somewhat
> familiar with it from my AMD days.

You still have the issue that this patch is doing all of this P2P
stuff wrong - following the already NAK'd AMD approach.

> I'll go and read Logan's patch-set to see if that will work for us in
> the future. Please remember, as Daniel said, we don't have struct page
> backing our device memory, so if that is a requirement to connect to
> Logan's work, then I don't think we will want to do it at this point.

It is trivial to get the struct page for a PCI BAR.
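
Something along these lines at probe time is all it takes with the
p2pdma infrastructure (BAR number and allocation size are made up for
the example, error handling dropped):

#include <linux/pci.h>
#include <linux/pci-p2pdma.h>
#include <linux/sizes.h>

static void *expose_bar_as_p2pmem(struct pci_dev *pdev, int bar)
{
	/* create ZONE_DEVICE struct pages covering the whole BAR */
	if (pci_p2pdma_add_resource(pdev, bar,
				    pci_resource_len(pdev, bar), 0))
		return NULL;

	/* let other devices find this p2p memory pool */
	pci_p2pmem_publish(pdev, true);

	/* allocations from here on are backed by real struct pages */
	return pci_alloc_p2pmem(pdev, SZ_1M);
}

That is the struct-page route Logan's series builds on to allow
mmapping this kind of memory.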

Jason

* Re: [Linaro-mm-sig] [PATCH v3 1/2] habanalabs: define uAPI to export FD for DMA-BUF
  2021-06-22 12:01                       ` Jason Gunthorpe
@ 2021-06-22 12:04                         ` Oded Gabbay
From: Oded Gabbay @ 2021-06-22 12:04 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Christian König, Gal Pressman, sleybo, linux-rdma,
	Oded Gabbay, Christoph Hellwig, Linux Kernel Mailing List,
	dri-devel, Christian König,
	moderated list:DMA BUFFER SHARING FRAMEWORK, Doug Ledford,
	Tomer Tayar, amd-gfx list, Greg KH, Alex Deucher,
	Leon Romanovsky, open list:DMA BUFFER SHARING FRAMEWORK

On Tue, Jun 22, 2021 at 3:01 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
>
> On Tue, Jun 22, 2021 at 11:42:27AM +0300, Oded Gabbay wrote:
> > On Tue, Jun 22, 2021 at 9:37 AM Christian König
> > <ckoenig.leichtzumerken@gmail.com> wrote:
> > >
> > > Am 22.06.21 um 01:29 schrieb Jason Gunthorpe:
> > > > On Mon, Jun 21, 2021 at 10:24:16PM +0300, Oded Gabbay wrote:
> > > >
> > > >> Another thing I want to emphasize is that we are doing p2p only
> > > >> through the export/import of the FD. We do *not* allow the user to
> > > >> mmap the dma-buf as we do not support direct IO. So there is no access
> > > >> to these pages through the userspace.
> > > > Arguably mmaping the memory is a better choice, and is the direction
> > > > that Logan's series goes in. Here the use of DMABUF was specifically
> > > > designed to allow hitless revokation of the memory, which this isn't
> > > > even using.
> > >
> > > The major problem with this approach is that DMA-buf is also used for
> > > memory which isn't CPU accessible.
>
> That isn't an issue here because the memory is only intended to be
> used with P2P transfers so it must be CPU accessible.
>
> > > That was one of the reasons we didn't even considered using the mapping
> > > memory approach for GPUs.
>
> Well, now we have DEVICE_PRIVATE memory that can meet this need
> too.. Just nobody has wired it up to hmm_range_fault()
>
> > > > So you are taking the hit of very limited hardware support and reduced
> > > > performance just to squeeze into DMABUF..
> >
> > Thanks Jason for the clarification, but I honestly prefer to use
> > DMA-BUF at the moment.
> > It gives us just what we need (even more than what we need as you
> > pointed out), it is *already* integrated and tested in the RDMA
> > subsystem, and I'm feeling comfortable using it as I'm somewhat
> > familiar with it from my AMD days.
>
> You still have the issue that this patch is doing all of this P2P
> stuff wrong - following the already NAK'd AMD approach.

Could you please point me exactly to the lines of code that are wrong
in your opinion?
I find it hard to understand from your statement what exactly you
think we are doing wrong.
The implementation is found in the second patch in this patch-set.

Thanks,
Oded
>
> > I'll go and read Logan's patch-set to see if that will work for us in
> > the future. Please remember, as Daniel said, we don't have struct page
> > backing our device memory, so if that is a requirement to connect to
> > Logan's work, then I don't think we will want to do it at this point.
>
> It is trivial to get the struct page for a PCI BAR.
>
> Jason

^ permalink raw reply	[flat|nested] 143+ messages in thread

* Re: [Linaro-mm-sig] [PATCH v3 1/2] habanalabs: define uAPI to export FD for DMA-BUF
  2021-06-22 12:04                         ` Oded Gabbay
@ 2021-06-22 12:15                           ` Jason Gunthorpe
  -1 siblings, 0 replies; 143+ messages in thread
From: Jason Gunthorpe @ 2021-06-22 12:15 UTC (permalink / raw)
  To: Oded Gabbay
  Cc: Christian König, Gal Pressman, sleybo, linux-rdma,
	Oded Gabbay, Christoph Hellwig, Linux Kernel Mailing List,
	dri-devel, Christian König,
	moderated list:DMA BUFFER SHARING FRAMEWORK, Doug Ledford,
	Tomer Tayar, amd-gfx list, Greg KH, Alex Deucher,
	Leon Romanovsky, open list:DMA BUFFER SHARING FRAMEWORK

On Tue, Jun 22, 2021 at 03:04:30PM +0300, Oded Gabbay wrote:
> On Tue, Jun 22, 2021 at 3:01 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> >
> > On Tue, Jun 22, 2021 at 11:42:27AM +0300, Oded Gabbay wrote:
> > > On Tue, Jun 22, 2021 at 9:37 AM Christian König
> > > <ckoenig.leichtzumerken@gmail.com> wrote:
> > > >
> > > > Am 22.06.21 um 01:29 schrieb Jason Gunthorpe:
> > > > > On Mon, Jun 21, 2021 at 10:24:16PM +0300, Oded Gabbay wrote:
> > > > >
> > > > >> Another thing I want to emphasize is that we are doing p2p only
> > > > >> through the export/import of the FD. We do *not* allow the user to
> > > > >> mmap the dma-buf as we do not support direct IO. So there is no access
> > > > >> to these pages through the userspace.
> > > > > Arguably mmaping the memory is a better choice, and is the direction
> > > > > that Logan's series goes in. Here the use of DMABUF was specifically
> > > > > designed to allow hitless revokation of the memory, which this isn't
> > > > > even using.
> > > >
> > > > The major problem with this approach is that DMA-buf is also used for
> > > > memory which isn't CPU accessible.
> >
> > That isn't an issue here because the memory is only intended to be
> > used with P2P transfers so it must be CPU accessible.
> >
> > > > That was one of the reasons we didn't even considered using the mapping
> > > > memory approach for GPUs.
> >
> > Well, now we have DEVICE_PRIVATE memory that can meet this need
> > too.. Just nobody has wired it up to hmm_range_fault()
> >
> > > > > So you are taking the hit of very limited hardware support and reduced
> > > > > performance just to squeeze into DMABUF..
> > >
> > > Thanks Jason for the clarification, but I honestly prefer to use
> > > DMA-BUF at the moment.
> > > It gives us just what we need (even more than what we need as you
> > > pointed out), it is *already* integrated and tested in the RDMA
> > > subsystem, and I'm feeling comfortable using it as I'm somewhat
> > > familiar with it from my AMD days.
> >
> > You still have the issue that this patch is doing all of this P2P
> > stuff wrong - following the already NAK'd AMD approach.
> 
> Could you please point me exactly to the lines of code that are wrong
> in your opinion ?

1) Setting sg_page to NULL
2) 'mapping' pages for P2P DMA without going through the iommu
3) Allowing P2P DMA without using the p2p dma API to validate that it
   can work at all in the first place.

All of these result in functional bugs in certain system
configurations.
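
For concreteness, a minimal sketch of what points 2 and 3 ask of an
exporter, using the in-tree PCI P2PDMA and DMA API helpers (the helper
name, parameters and error handling below are made up for illustration
and are not taken from the habanalabs patches):

/*
 * Hypothetical helper: check that P2P between the exporting PCI device
 * and the importer is possible at all, then let the DMA API (and thus
 * the IOMMU, if one translates for the importer) produce the address
 * the peer actually has to use.
 */
#include <linux/pci-p2pdma.h>
#include <linux/dma-mapping.h>

static int example_map_bar_for_peer(struct pci_dev *exporter,
				    struct device *importer,
				    phys_addr_t bar_phys, size_t size,
				    dma_addr_t *dma_addr)
{
	struct device *clients[] = { importer };

	/* Point 3: can these two devices reach each other over PCI at all? */
	if (pci_p2pdma_distance_many(exporter, clients, 1, true) < 0)
		return -ENXIO;

	/* Point 2: go through the DMA API so an IOMMU mapping is set up. */
	*dma_addr = dma_map_resource(importer, bar_phys, size,
				     DMA_BIDIRECTIONAL,
				     DMA_ATTR_SKIP_CPU_SYNC);
	if (dma_mapping_error(importer, *dma_addr))
		return -ENOMEM;

	return 0;
}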

Jason

^ permalink raw reply	[flat|nested] 143+ messages in thread

* Re: [Linaro-mm-sig] [PATCH v3 1/2] habanalabs: define uAPI to export FD for DMA-BUF
  2021-06-22 12:01                       ` Jason Gunthorpe
@ 2021-06-22 12:23                         ` Christian König
  -1 siblings, 0 replies; 143+ messages in thread
From: Christian König @ 2021-06-22 12:23 UTC (permalink / raw)
  To: Jason Gunthorpe, Oded Gabbay
  Cc: Christian König, Gal Pressman, sleybo, linux-rdma,
	Oded Gabbay, Christoph Hellwig, Linux Kernel Mailing List,
	dri-devel, moderated list:DMA BUFFER SHARING FRAMEWORK,
	Doug Ledford, Tomer Tayar, amd-gfx list, Greg KH, Alex Deucher,
	Leon Romanovsky, open list:DMA BUFFER SHARING FRAMEWORK

On 22.06.21 at 14:01, Jason Gunthorpe wrote:
> On Tue, Jun 22, 2021 at 11:42:27AM +0300, Oded Gabbay wrote:
>> On Tue, Jun 22, 2021 at 9:37 AM Christian König
>> <ckoenig.leichtzumerken@gmail.com> wrote:
>>> Am 22.06.21 um 01:29 schrieb Jason Gunthorpe:
>>>> On Mon, Jun 21, 2021 at 10:24:16PM +0300, Oded Gabbay wrote:
>>>>
>>>>> Another thing I want to emphasize is that we are doing p2p only
>>>>> through the export/import of the FD. We do *not* allow the user to
>>>>> mmap the dma-buf as we do not support direct IO. So there is no access
>>>>> to these pages through the userspace.
>>>> Arguably mmaping the memory is a better choice, and is the direction
>>>> that Logan's series goes in. Here the use of DMABUF was specifically
>>>> designed to allow hitless revokation of the memory, which this isn't
>>>> even using.
>>> The major problem with this approach is that DMA-buf is also used for
>>> memory which isn't CPU accessible.
> That isn't an issue here because the memory is only intended to be
> used with P2P transfers so it must be CPU accessible.

No, especially P2P is often done on memory resources which are not even 
remotely CPU accessible.

That's one of the major reasons why we use P2P in the first place. See 
the whole XGMI implementation for example.

> Thanks Jason for the clarification, but I honestly prefer to use
> DMA-BUF at the moment.
> It gives us just what we need (even more than what we need as you
> pointed out), it is *already* integrated and tested in the RDMA
> subsystem, and I'm feeling comfortable using it as I'm somewhat
> familiar with it from my AMD days.
>>> That was one of the reasons we didn't even considered using the mapping
>>> memory approach for GPUs.
> Well, now we have DEVICE_PRIVATE memory that can meet this need
> too.. Just nobody has wired it up to hmm_range_fault()
>
>>>> So you are taking the hit of very limited hardware support and reduced
>>>> performance just to squeeze into DMABUF..
> You still have the issue that this patch is doing all of this P2P
> stuff wrong - following the already NAK'd AMD approach.

Well that stuff was NAKed because we still use sg_tables, not because we 
don't want to allocate struct pages.

The plan is to push this forward since DEVICE_PRIVATE clearly can't 
handle all of our use cases and is not really a good fit to be honest.

IOMMU is now working as well, so as far as I can see we are all good here.

>> I'll go and read Logan's patch-set to see if that will work for us in
>> the future. Please remember, as Daniel said, we don't have struct page
>> backing our device memory, so if that is a requirement to connect to
>> Logan's work, then I don't think we will want to do it at this point.
> It is trivial to get the struct page for a PCI BAR.

Yeah, but it doesn't make much sense. Why should we create a struct page 
for something that isn't even memory in a lot of cases?

Regards,
Christian.



^ permalink raw reply	[flat|nested] 143+ messages in thread

* Re: [Linaro-mm-sig] [PATCH v3 1/2] habanalabs: define uAPI to export FD for DMA-BUF
  2021-06-22 12:15                           ` Jason Gunthorpe
@ 2021-06-22 13:12                             ` Oded Gabbay
  -1 siblings, 0 replies; 143+ messages in thread
From: Oded Gabbay @ 2021-06-22 13:12 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Christian König, Gal Pressman, sleybo, linux-rdma,
	Oded Gabbay, Christoph Hellwig, Linux Kernel Mailing List,
	dri-devel, Christian König,
	moderated list:DMA BUFFER SHARING FRAMEWORK, Doug Ledford,
	Tomer Tayar, amd-gfx list, Greg KH, Alex Deucher,
	Leon Romanovsky, open list:DMA BUFFER SHARING FRAMEWORK

On Tue, Jun 22, 2021 at 3:15 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
>
> On Tue, Jun 22, 2021 at 03:04:30PM +0300, Oded Gabbay wrote:
> > On Tue, Jun 22, 2021 at 3:01 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> > >
> > > On Tue, Jun 22, 2021 at 11:42:27AM +0300, Oded Gabbay wrote:
> > > > On Tue, Jun 22, 2021 at 9:37 AM Christian König
> > > > <ckoenig.leichtzumerken@gmail.com> wrote:
> > > > >
> > > > > Am 22.06.21 um 01:29 schrieb Jason Gunthorpe:
> > > > > > On Mon, Jun 21, 2021 at 10:24:16PM +0300, Oded Gabbay wrote:
> > > > > >
> > > > > >> Another thing I want to emphasize is that we are doing p2p only
> > > > > >> through the export/import of the FD. We do *not* allow the user to
> > > > > >> mmap the dma-buf as we do not support direct IO. So there is no access
> > > > > >> to these pages through the userspace.
> > > > > > Arguably mmaping the memory is a better choice, and is the direction
> > > > > > that Logan's series goes in. Here the use of DMABUF was specifically
> > > > > > designed to allow hitless revokation of the memory, which this isn't
> > > > > > even using.
> > > > >
> > > > > The major problem with this approach is that DMA-buf is also used for
> > > > > memory which isn't CPU accessible.
> > >
> > > That isn't an issue here because the memory is only intended to be
> > > used with P2P transfers so it must be CPU accessible.
> > >
> > > > > That was one of the reasons we didn't even considered using the mapping
> > > > > memory approach for GPUs.
> > >
> > > Well, now we have DEVICE_PRIVATE memory that can meet this need
> > > too.. Just nobody has wired it up to hmm_range_fault()
> > >
> > > > > > So you are taking the hit of very limited hardware support and reduced
> > > > > > performance just to squeeze into DMABUF..
> > > >
> > > > Thanks Jason for the clarification, but I honestly prefer to use
> > > > DMA-BUF at the moment.
> > > > It gives us just what we need (even more than what we need as you
> > > > pointed out), it is *already* integrated and tested in the RDMA
> > > > subsystem, and I'm feeling comfortable using it as I'm somewhat
> > > > familiar with it from my AMD days.
> > >
> > > You still have the issue that this patch is doing all of this P2P
> > > stuff wrong - following the already NAK'd AMD approach.
> >
> > Could you please point me exactly to the lines of code that are wrong
> > in your opinion ?
>
> 1) Setting sg_page to NULL
> 2) 'mapping' pages for P2P DMA without going through the iommu
> 3) Allowing P2P DMA without using the p2p dma API to validate that it
>    can work at all in the first place.
>
> All of these result in functional bugs in certain system
> configurations.
>
> Jason

Hi Jason,
Thanks for the feedback.
Regarding point 1, why is that a problem if we disable the option to
mmap the dma-buf from user-space? We don't want to support CPU
fallback/Direct IO.
In addition, I didn't see any problem with sg_page being NULL in the
RDMA p2p dma-buf code. Did I miss something here?

Regarding points 2 & 3, I want to examine them more closely in a KVM
virtual machine environment with IOMMU enabled.
I will take two GAUDI devices and use one as an exporter and one as an
importer. I want to see that the solution works end-to-end, with real
device DMA from importer to exporter.
I fear that the dummy importer I wrote is bypassing these two issues
you brought up.

So thanks again and I'll get back and update once I've finished testing it.

Oded

^ permalink raw reply	[flat|nested] 143+ messages in thread

* Re: [Linaro-mm-sig] [PATCH v3 1/2] habanalabs: define uAPI to export FD for DMA-BUF
  2021-06-22 13:12                             ` Oded Gabbay
@ 2021-06-22 15:11                               ` Jason Gunthorpe
  -1 siblings, 0 replies; 143+ messages in thread
From: Jason Gunthorpe @ 2021-06-22 15:11 UTC (permalink / raw)
  To: Oded Gabbay
  Cc: Christian König, Gal Pressman, sleybo, linux-rdma,
	Oded Gabbay, Christoph Hellwig, Linux Kernel Mailing List,
	dri-devel, Christian König,
	moderated list:DMA BUFFER SHARING FRAMEWORK, Doug Ledford,
	Tomer Tayar, amd-gfx list, Greg KH, Alex Deucher,
	Leon Romanovsky, open list:DMA BUFFER SHARING FRAMEWORK

On Tue, Jun 22, 2021 at 04:12:26PM +0300, Oded Gabbay wrote:

> > 1) Setting sg_page to NULL
> > 2) 'mapping' pages for P2P DMA without going through the iommu
> > 3) Allowing P2P DMA without using the p2p dma API to validate that it
> >    can work at all in the first place.
> >
> > All of these result in functional bugs in certain system
> > configurations.
> >
> > Jason
> 
> Hi Jason,
> Thanks for the feedback.
> Regarding point 1, why is that a problem if we disable the option to
> mmap the dma-buf from user-space ? 

Userspace has nothing to do with needing struct pages or not.

Points 1 and 2 mostly go together: your iommu support is not nice
if you don't have struct pages.

You should study Logan's patches I pointed you at as they are solving
exactly this problem.

> In addition, I didn't see any problem with sg_page being NULL in the
> RDMA p2p dma-buf code. Did I miss something here ?

No, the design of the dmabuf requires the exporter to do the dma maps
and so it is only the exporter that is wrong to omit all the iommu and
p2p logic.
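
As a rough illustration of where that responsibility sits, a minimal
exporter-side map_dma_buf callback could look like the sketch below
(all names are hypothetical and the bookkeeping is trimmed; the only
point is that the exporter runs the DMA API against the importer's
device, attach->dev):

#include <linux/dma-buf.h>
#include <linux/dma-mapping.h>
#include <linux/err.h>
#include <linux/scatterlist.h>
#include <linux/slab.h>

struct example_buf {		/* hypothetical exporter bookkeeping */
	phys_addr_t bar_phys;
	size_t size;
};

static struct sg_table *example_map_dma_buf(struct dma_buf_attachment *attach,
					    enum dma_data_direction dir)
{
	struct example_buf *buf = attach->dmabuf->priv;
	struct sg_table *sgt;
	dma_addr_t addr;

	sgt = kzalloc(sizeof(*sgt), GFP_KERNEL);
	if (!sgt)
		return ERR_PTR(-ENOMEM);

	if (sg_alloc_table(sgt, 1, GFP_KERNEL)) {
		kfree(sgt);
		return ERR_PTR(-ENOMEM);
	}

	/* The exporter, not the importer, maps for attach->dev. */
	addr = dma_map_resource(attach->dev, buf->bar_phys, buf->size, dir,
				DMA_ATTR_SKIP_CPU_SYNC);
	if (dma_mapping_error(attach->dev, addr)) {
		sg_free_table(sgt);
		kfree(sgt);
		return ERR_PTR(-ENOMEM);
	}

	/* This is the sg_page == NULL construct under discussion. */
	sg_set_page(sgt->sgl, NULL, buf->size, 0);
	sg_dma_address(sgt->sgl) = addr;
	sg_dma_len(sgt->sgl) = buf->size;

	return sgt;
}

static const struct dma_buf_ops example_dmabuf_ops = {
	.map_dma_buf	= example_map_dma_buf,
	/* .unmap_dma_buf, .release, etc. omitted */
};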

RDMA is OK today only because nobody has implemented dma buf support
in rxe/si - mainly because the only implementations of exporters don't
set the struct page and are thus buggy.

> I will take two GAUDI devices and use one as an exporter and one as an
> importer. I want to see that the solution works end-to-end, with real
> device DMA from importer to exporter.

I can tell you it doesn't. Stuffing physical addresses directly into
the sg list doesn't involve any of the IOMMU code so any configuration
that requires IOMMU page table setup will not work.

Jason

^ permalink raw reply	[flat|nested] 143+ messages in thread

* Re: [Linaro-mm-sig] [PATCH v3 1/2] habanalabs: define uAPI to export FD for DMA-BUF
  2021-06-22 12:23                         ` Christian König
@ 2021-06-22 15:23                           ` Jason Gunthorpe
  -1 siblings, 0 replies; 143+ messages in thread
From: Jason Gunthorpe @ 2021-06-22 15:23 UTC (permalink / raw)
  To: Christian König
  Cc: Oded Gabbay, Christian König, Gal Pressman, sleybo,
	linux-rdma, Oded Gabbay, Christoph Hellwig,
	Linux Kernel Mailing List, dri-devel,
	moderated list:DMA BUFFER SHARING FRAMEWORK, Doug Ledford,
	Tomer Tayar, amd-gfx list, Greg KH, Alex Deucher,
	Leon Romanovsky, open list:DMA BUFFER SHARING FRAMEWORK

On Tue, Jun 22, 2021 at 02:23:03PM +0200, Christian König wrote:
> Am 22.06.21 um 14:01 schrieb Jason Gunthorpe:
> > On Tue, Jun 22, 2021 at 11:42:27AM +0300, Oded Gabbay wrote:
> > > On Tue, Jun 22, 2021 at 9:37 AM Christian König
> > > <ckoenig.leichtzumerken@gmail.com> wrote:
> > > > Am 22.06.21 um 01:29 schrieb Jason Gunthorpe:
> > > > > On Mon, Jun 21, 2021 at 10:24:16PM +0300, Oded Gabbay wrote:
> > > > > 
> > > > > > Another thing I want to emphasize is that we are doing p2p only
> > > > > > through the export/import of the FD. We do *not* allow the user to
> > > > > > mmap the dma-buf as we do not support direct IO. So there is no access
> > > > > > to these pages through the userspace.
> > > > > Arguably mmaping the memory is a better choice, and is the direction
> > > > > that Logan's series goes in. Here the use of DMABUF was specifically
> > > > > designed to allow hitless revokation of the memory, which this isn't
> > > > > even using.
> > > > The major problem with this approach is that DMA-buf is also used for
> > > > memory which isn't CPU accessible.
> > That isn't an issue here because the memory is only intended to be
> > used with P2P transfers so it must be CPU accessible.
> 
> No, especially P2P is often done on memory resources which are not even
> remotely CPU accessible.

That is a special AMD thing; P2P here is PCI P2P, and all PCI memory is
CPU accessible.

> > > > > So you are taking the hit of very limited hardware support and reduced
> > > > > performance just to squeeze into DMABUF..
> > You still have the issue that this patch is doing all of this P2P
> > stuff wrong - following the already NAK'd AMD approach.
> 
> Well that stuff was NAKed because we still use sg_tables, not because we
> don't want to allocate struct pages.

sg lists in general.
 
> The plan is to push this forward since DEVICE_PRIVATE clearly can't handle
> all of our use cases and is not really a good fit to be honest.
> 
> IOMMU is now working as well, so as far as I can see we are all good here.

How? Is that more AMD special stuff?

This patch series never calls into the iommu driver, AFAICT.

> > > I'll go and read Logan's patch-set to see if that will work for us in
> > > the future. Please remember, as Daniel said, we don't have struct page
> > > backing our device memory, so if that is a requirement to connect to
> > > Logan's work, then I don't think we will want to do it at this point.
> > It is trivial to get the struct page for a PCI BAR.
> 
> Yeah, but it doesn't make much sense. Why should we create a struct page for
> something that isn't even memory in a lot of cases?

Because the iommu and other places need this handle to set up their
stuff. Nobody has yet been brave enough to try to change those flows
to be able to use a physical CPU address.

This is why we have a special struct page type just for PCI BAR
memory.
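
For reference, this is roughly how a driver that owns a BAR gets those
special struct pages today through the in-tree P2PDMA helpers (the
function name and BAR number below are made up for the example):

#include <linux/pci.h>
#include <linux/pci-p2pdma.h>

static int example_publish_bar(struct pci_dev *pdev)
{
	int rc;

	/*
	 * Creates ZONE_DEVICE struct pages of the PCI P2PDMA type that
	 * cover BAR 4 and hands the range to the P2PDMA allocator.
	 */
	rc = pci_p2pdma_add_resource(pdev, 4, pci_resource_len(pdev, 4), 0);
	if (rc)
		return rc;

	/* Let other subsystems find and use this memory for P2P. */
	pci_p2pmem_publish(pdev, true);

	return 0;
}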

Jason

^ permalink raw reply	[flat|nested] 143+ messages in thread

* Re: [Linaro-mm-sig] [PATCH v3 1/2] habanalabs: define uAPI to export FD for DMA-BUF
  2021-06-22 15:11                               ` Jason Gunthorpe
@ 2021-06-22 15:24                                 ` Christian König
  -1 siblings, 0 replies; 143+ messages in thread
From: Christian König @ 2021-06-22 15:24 UTC (permalink / raw)
  To: Jason Gunthorpe, Oded Gabbay
  Cc: Christian König, Gal Pressman, sleybo, linux-rdma,
	Oded Gabbay, Christoph Hellwig, Linux Kernel Mailing List,
	dri-devel, moderated list:DMA BUFFER SHARING FRAMEWORK,
	Doug Ledford, Tomer Tayar, amd-gfx list, Greg KH, Alex Deucher,
	Leon Romanovsky, open list:DMA BUFFER SHARING FRAMEWORK



On 22.06.21 at 17:11, Jason Gunthorpe wrote:
> On Tue, Jun 22, 2021 at 04:12:26PM +0300, Oded Gabbay wrote:
>
>>> 1) Setting sg_page to NULL
>>> 2) 'mapping' pages for P2P DMA without going through the iommu
>>> 3) Allowing P2P DMA without using the p2p dma API to validate that it
>>>     can work at all in the first place.
>>>
>>> All of these result in functional bugs in certain system
>>> configurations.
>>>
>>> Jason
>> Hi Jason,
>> Thanks for the feedback.
>> Regarding point 1, why is that a problem if we disable the option to
>> mmap the dma-buf from user-space ?
> Userspace has nothing to do with needing struct pages or not
>
> Point 1 and 2 mostly go together, you supporting the iommu is not nice
> if you dont have struct pages.
>
> You should study Logan's patches I pointed you at as they are solving
> exactly this problem.
>
>> In addition, I didn't see any problem with sg_page being NULL in the
>> RDMA p2p dma-buf code. Did I miss something here ?
> No, the design of the dmabuf requires the exporter to do the dma maps
> and so it is only the exporter that is wrong to omit all the iommu and
> p2p logic.
>
> RDMA is OK today only because nobody has implemented dma buf support
> in rxe/si - mainly because the only implementations of exporters don't
> set the struct page and are thus buggy.
>
>> I will take two GAUDI devices and use one as an exporter and one as an
>> importer. I want to see that the solution works end-to-end, with real
>> device DMA from importer to exporter.
> I can tell you it doesn't. Stuffing physical addresses directly into
> the sg list doesn't involve any of the IOMMU code so any configuration
> that requires IOMMU page table setup will not work.

Sure it does. See amdgpu_vram_mgr_alloc_sgt:

         amdgpu_res_first(res, offset, length, &cursor);
         for_each_sgtable_sg((*sgt), sg, i) {
                 phys_addr_t phys = cursor.start + adev->gmc.aper_base;
                 size_t size = cursor.size;
                 dma_addr_t addr;

                 addr = dma_map_resource(dev, phys, size, dir,
                                         DMA_ATTR_SKIP_CPU_SYNC);
                 r = dma_mapping_error(dev, addr);
                 if (r)
                         goto error_unmap;

                 sg_set_page(sg, NULL, size, 0);
                 sg_dma_address(sg) = addr;
                 sg_dma_len(sg) = size;

                 amdgpu_res_next(&cursor, cursor.size);
         }

dma_map_resource() does the IOMMU mapping for us.
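
(For completeness, a rough sketch of the matching teardown in the free
path, assuming sgt here is the struct sg_table that was mapped above
and dev/dir are the same as at map time:)

	for_each_sgtable_sg(sgt, sg, i)
		dma_unmap_resource(dev, sg_dma_address(sg), sg_dma_len(sg),
				   dir, DMA_ATTR_SKIP_CPU_SYNC);
	sg_free_table(sgt);
	kfree(sgt);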

Regards,
Christian.


>
> Jason


^ permalink raw reply	[flat|nested] 143+ messages in thread

* Re: [Linaro-mm-sig] [PATCH v3 1/2] habanalabs: define uAPI to export FD for DMA-BUF
  2021-06-22 15:11                               ` Jason Gunthorpe
  (?)
@ 2021-06-22 15:24                                 ` Oded Gabbay
  -1 siblings, 0 replies; 143+ messages in thread
From: Oded Gabbay @ 2021-06-22 15:24 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Christian König, Gal Pressman, sleybo, linux-rdma,
	Oded Gabbay, Christoph Hellwig, Linux Kernel Mailing List,
	dri-devel, Christian König,
	moderated list:DMA BUFFER SHARING FRAMEWORK, Doug Ledford,
	Tomer Tayar, amd-gfx list, Greg KH, Alex Deucher,
	Leon Romanovsky, open list:DMA BUFFER SHARING FRAMEWORK

On Tue, Jun 22, 2021 at 6:11 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
>
> On Tue, Jun 22, 2021 at 04:12:26PM +0300, Oded Gabbay wrote:
>
> > > 1) Setting sg_page to NULL
> > > 2) 'mapping' pages for P2P DMA without going through the iommu
> > > 3) Allowing P2P DMA without using the p2p dma API to validate that it
> > >    can work at all in the first place.
> > >
> > > All of these result in functional bugs in certain system
> > > configurations.
> > >
> > > Jason
> >
> > Hi Jason,
> > Thanks for the feedback.
> > Regarding point 1, why is that a problem if we disable the option to
> > mmap the dma-buf from user-space ?
>
> Userspace has nothing to do with needing struct pages or not
>
> Point 1 and 2 mostly go together, you supporting the iommu is not nice
> if you dont have struct pages.
>
> You should study Logan's patches I pointed you at as they are solving
> exactly this problem.
Yes, I do need to study them. I agree with you here. It appears I have
a hole in my understanding.
I'm missing the connection between iommu support (which I must have of
course) and struct pages.

>
> > In addition, I didn't see any problem with sg_page being NULL in the
> > RDMA p2p dma-buf code. Did I miss something here ?
>
> No, the design of the dmabuf requires the exporter to do the dma maps
> and so it is only the exporter that is wrong to omit all the iommu and
> p2p logic.
>
> RDMA is OK today only because nobody has implemented dma buf support
> in rxe/si - mainly because the only implementations of exporters don't
Can you please educate me, what is rxe/si ?

> set the struct page and are thus buggy.

ok...
so how come that patch-set was merged into 5.12 if it's buggy ?
Because the current exporters are buggy ?  I probably need a history
lesson here.
But I understand why you think it's a bad idea to add a new buggy exporter.

>
> > I will take two GAUDI devices and use one as an exporter and one as an
> > importer. I want to see that the solution works end-to-end, with real
> > device DMA from importer to exporter.
>
> I can tell you it doesn't. Stuffing physical addresses directly into
> the sg list doesn't involve any of the IOMMU code so any configuration
> that requires IOMMU page table setup will not work.
>
> Jason

Yes, that's what I expect to see. But I want to see it with my own
eyes and then figure out how to solve this.
Maybe the result will be to go with Logan's path, maybe something else,
but I need to start by seeing the failure in a real system.

Thanks for the information, it is really helpful.

Oded

^ permalink raw reply	[flat|nested] 143+ messages in thread

* Re: [Linaro-mm-sig] [PATCH v3 1/2] habanalabs: define uAPI to export FD for DMA-BUF
  2021-06-22 15:24                                 ` Christian König
  (?)
@ 2021-06-22 15:28                                   ` Jason Gunthorpe
  -1 siblings, 0 replies; 143+ messages in thread
From: Jason Gunthorpe @ 2021-06-22 15:28 UTC (permalink / raw)
  To: Christian König
  Cc: Oded Gabbay, Christian König, Gal Pressman, sleybo,
	linux-rdma, Oded Gabbay, Christoph Hellwig,
	Linux Kernel Mailing List, dri-devel,
	moderated list:DMA BUFFER SHARING FRAMEWORK, Doug Ledford,
	Tomer Tayar, amd-gfx list, Greg KH, Alex Deucher,
	Leon Romanovsky, open list:DMA BUFFER SHARING FRAMEWORK

On Tue, Jun 22, 2021 at 05:24:08PM +0200, Christian König wrote:

> > > I will take two GAUDI devices and use one as an exporter and one as an
> > > importer. I want to see that the solution works end-to-end, with real
> > > device DMA from importer to exporter.
> > I can tell you it doesn't. Stuffing physical addresses directly into
> > the sg list doesn't involve any of the IOMMU code so any configuration
> > that requires IOMMU page table setup will not work.
> 
> Sure it does. See amdgpu_vram_mgr_alloc_sgt:
> 
>         amdgpu_res_first(res, offset, length, &cursor);
         ^^^^^^^^^^

I'm not talking about the AMD driver, I'm talking about this patch.

+		bar_address = hdev->dram_pci_bar_start +
+				(pages[cur_page] - prop->dram_base_address);
+		sg_dma_address(sg) = bar_address;

Jason

^ permalink raw reply	[flat|nested] 143+ messages in thread

* Re: [Linaro-mm-sig] [PATCH v3 1/2] habanalabs: define uAPI to export FD for DMA-BUF
  2021-06-22 15:23                           ` Jason Gunthorpe
  (?)
@ 2021-06-22 15:29                             ` Christian König
  -1 siblings, 0 replies; 143+ messages in thread
From: Christian König @ 2021-06-22 15:29 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Oded Gabbay, Christian König, Gal Pressman, sleybo,
	linux-rdma, Oded Gabbay, Christoph Hellwig,
	Linux Kernel Mailing List, dri-devel,
	moderated list:DMA BUFFER SHARING FRAMEWORK, Doug Ledford,
	Tomer Tayar, amd-gfx list, Greg KH, Alex Deucher,
	Leon Romanovsky, open list:DMA BUFFER SHARING FRAMEWORK

Am 22.06.21 um 17:23 schrieb Jason Gunthorpe:
> On Tue, Jun 22, 2021 at 02:23:03PM +0200, Christian König wrote:
>> Am 22.06.21 um 14:01 schrieb Jason Gunthorpe:
>>> On Tue, Jun 22, 2021 at 11:42:27AM +0300, Oded Gabbay wrote:
>>>> On Tue, Jun 22, 2021 at 9:37 AM Christian König
>>>> <ckoenig.leichtzumerken@gmail.com> wrote:
>>>>> Am 22.06.21 um 01:29 schrieb Jason Gunthorpe:
>>>>>> On Mon, Jun 21, 2021 at 10:24:16PM +0300, Oded Gabbay wrote:
>>>>>>
>>>>>>> Another thing I want to emphasize is that we are doing p2p only
>>>>>>> through the export/import of the FD. We do *not* allow the user to
>>>>>>> mmap the dma-buf as we do not support direct IO. So there is no access
>>>>>>> to these pages through the userspace.
>>>>>> Arguably mmaping the memory is a better choice, and is the direction
>>>>>> that Logan's series goes in. Here the use of DMABUF was specifically
>>>>>> designed to allow hitless revokation of the memory, which this isn't
>>>>>> even using.
>>>>> The major problem with this approach is that DMA-buf is also used for
>>>>> memory which isn't CPU accessible.
>>> That isn't an issue here because the memory is only intended to be
>>> used with P2P transfers so it must be CPU accessible.
>> No, especially P2P is often done on memory resources which are not even
>> remotely CPU accessible.
> That is a special AMD thing, P2P here is PCI P2P and all PCI memory is
> CPU accessible.

No, absolutely not. NVidia GPUs work exactly the same way.

And you have tons of similar cases in embedded and SoC systems where 
intermediate memory between devices isn't directly addressable with the CPU.

>>>>>> So you are taking the hit of very limited hardware support and reduced
>>>>>> performance just to squeeze into DMABUF..
>>> You still have the issue that this patch is doing all of this P2P
>>> stuff wrong - following the already NAK'd AMD approach.
>> Well that stuff was NAKed because we still use sg_tables, not because we
>> don't want to allocate struct pages.
> sg lists in general.
>   
>> The plan is to push this forward since DEVICE_PRIVATE clearly can't handle
>> all of our use cases and is not really a good fit to be honest.
>>
>> IOMMU is now working as well, so as far as I can see we are all good here.
> How? Is that more AMD special stuff?

No, just using the dma_map_resource() interface.

We have that working on tons of IOMMU enabled systems.

> This patch series never calls to the iommu driver, AFAICT.
>
>>>> I'll go and read Logan's patch-set to see if that will work for us in
>>>> the future. Please remember, as Daniel said, we don't have struct page
>>>> backing our device memory, so if that is a requirement to connect to
>>>> Logan's work, then I don't think we will want to do it at this point.
>>> It is trivial to get the struct page for a PCI BAR.
>> Yeah, but it doesn't make much sense. Why should we create a struct page for
>> something that isn't even memory in a lot of cases?
> Because the iommu and other places need this handle to setup their
> stuff. Nobody has yet been brave enough to try to change those flows
> to be able to use a physical CPU address.

Well that is certainly not true. I'm just not sure if that works with
all IOMMU drivers though.

Would need to ping Felix about when the support for this was merged.

Regards,
Christian.

>
> This is why we have a special struct page type just for PCI BAR
> memory.
>
> Jason


^ permalink raw reply	[flat|nested] 143+ messages in thread

* Re: [Linaro-mm-sig] [PATCH v3 1/2] habanalabs: define uAPI to export FD for DMA-BUF
  2021-06-22 15:28                                   ` Jason Gunthorpe
  (?)
@ 2021-06-22 15:31                                     ` Oded Gabbay
  -1 siblings, 0 replies; 143+ messages in thread
From: Oded Gabbay @ 2021-06-22 15:31 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Christian König, Christian König, Gal Pressman, sleybo,
	linux-rdma, Oded Gabbay, Christoph Hellwig,
	Linux Kernel Mailing List, dri-devel,
	moderated list:DMA BUFFER SHARING FRAMEWORK, Doug Ledford,
	Tomer Tayar, amd-gfx list, Greg KH, Alex Deucher,
	Leon Romanovsky, open list:DMA BUFFER SHARING FRAMEWORK

On Tue, Jun 22, 2021 at 6:28 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
>
> On Tue, Jun 22, 2021 at 05:24:08PM +0200, Christian König wrote:
>
> > > > I will take two GAUDI devices and use one as an exporter and one as an
> > > > importer. I want to see that the solution works end-to-end, with real
> > > > device DMA from importer to exporter.
> > > I can tell you it doesn't. Stuffing physical addresses directly into
> > > the sg list doesn't involve any of the IOMMU code so any configuration
> > > that requires IOMMU page table setup will not work.
> >
> > Sure it does. See amdgpu_vram_mgr_alloc_sgt:
> >
> >         amdgpu_res_first(res, offset, length, &cursor);
>          ^^^^^^^^^^
>
> I'm not talking about the AMD driver, I'm talking about this patch.
>
> +               bar_address = hdev->dram_pci_bar_start +
> +                               (pages[cur_page] - prop->dram_base_address);
> +               sg_dma_address(sg) = bar_address;
>
> Jason
Yes, you are correct of course, but what will happen, Jason, if I
add a call to dma_map_resource() like Christian said?
Won't that solve that specific issue?
That's why I want to try it...

Oded

^ permalink raw reply	[flat|nested] 143+ messages in thread

* Re: [Linaro-mm-sig] [PATCH v3 1/2] habanalabs: define uAPI to export FD for DMA-BUF
  2021-06-22 15:28                                   ` Jason Gunthorpe
  (?)
@ 2021-06-22 15:31                                     ` Christian König
  -1 siblings, 0 replies; 143+ messages in thread
From: Christian König @ 2021-06-22 15:31 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Oded Gabbay, Christian König, Gal Pressman, sleybo,
	linux-rdma, Oded Gabbay, Christoph Hellwig,
	Linux Kernel Mailing List, dri-devel,
	moderated list:DMA BUFFER SHARING FRAMEWORK, Doug Ledford,
	Tomer Tayar, amd-gfx list, Greg KH, Alex Deucher,
	Leon Romanovsky, open list:DMA BUFFER SHARING FRAMEWORK



Am 22.06.21 um 17:28 schrieb Jason Gunthorpe:
> On Tue, Jun 22, 2021 at 05:24:08PM +0200, Christian König wrote:
>
>>>> I will take two GAUDI devices and use one as an exporter and one as an
>>>> importer. I want to see that the solution works end-to-end, with real
>>>> device DMA from importer to exporter.
>>> I can tell you it doesn't. Stuffing physical addresses directly into
>>> the sg list doesn't involve any of the IOMMU code so any configuration
>>> that requires IOMMU page table setup will not work.
>> Sure it does. See amdgpu_vram_mgr_alloc_sgt:
>>
>>          amdgpu_res_first(res, offset, length, &cursor);
>           ^^^^^^^^^^
>
> I'm not talking about the AMD driver, I'm talking about this patch.
>
> +		bar_address = hdev->dram_pci_bar_start +
> +				(pages[cur_page] - prop->dram_base_address);
> +		sg_dma_address(sg) = bar_address;

Yeah, that is indeed not working.

Oded, you need to use dma_map_resource() for this.
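
A rough sketch of what that could look like, reusing the names from the
hunk quoted above (hdev, pages[cur_page], prop) and assuming dev, dir,
page_size and sg come from the surrounding export code; unwinding of
already-mapped entries on error is omitted:

	phys_addr_t bar_address;
	dma_addr_t addr;

	bar_address = hdev->dram_pci_bar_start +
			(pages[cur_page] - prop->dram_base_address);

	addr = dma_map_resource(dev, bar_address, page_size, dir,
				DMA_ATTR_SKIP_CPU_SYNC);
	if (dma_mapping_error(dev, addr))
		return -ENOMEM;

	sg_set_page(sg, NULL, page_size, 0);
	sg_dma_address(sg) = addr;
	sg_dma_len(sg) = page_size;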

Christian.



>
> Jason


^ permalink raw reply	[flat|nested] 143+ messages in thread

* Re: [Linaro-mm-sig] [PATCH v3 1/2] habanalabs: define uAPI to export FD for DMA-BUF
  2021-06-22 15:24                                 ` Oded Gabbay
  (?)
@ 2021-06-22 15:34                                   ` Jason Gunthorpe
  -1 siblings, 0 replies; 143+ messages in thread
From: Jason Gunthorpe @ 2021-06-22 15:34 UTC (permalink / raw)
  To: Oded Gabbay
  Cc: Christian König, Gal Pressman, sleybo, linux-rdma,
	Oded Gabbay, Christoph Hellwig, Linux Kernel Mailing List,
	dri-devel, Christian König,
	moderated list:DMA BUFFER SHARING FRAMEWORK, Doug Ledford,
	Tomer Tayar, amd-gfx list, Greg KH, Alex Deucher,
	Leon Romanovsky, open list:DMA BUFFER SHARING FRAMEWORK

On Tue, Jun 22, 2021 at 06:24:28PM +0300, Oded Gabbay wrote:
> On Tue, Jun 22, 2021 at 6:11 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> >
> > On Tue, Jun 22, 2021 at 04:12:26PM +0300, Oded Gabbay wrote:
> >
> > > > 1) Setting sg_page to NULL
> > > > 2) 'mapping' pages for P2P DMA without going through the iommu
> > > > 3) Allowing P2P DMA without using the p2p dma API to validate that it
> > > >    can work at all in the first place.
> > > >
> > > > All of these result in functional bugs in certain system
> > > > configurations.
> > > >
> > > > Jason
> > >
> > > Hi Jason,
> > > Thanks for the feedback.
> > > Regarding point 1, why is that a problem if we disable the option to
> > > mmap the dma-buf from user-space ?
> >
> > Userspace has nothing to do with needing struct pages or not
> >
> > Point 1 and 2 mostly go together, you supporting the iommu is not nice
> > if you dont have struct pages.
> >
> > You should study Logan's patches I pointed you at as they are solving
> > exactly this problem.

> Yes, I do need to study them. I agree with you here. It appears I
> have a hole in my understanding.  I'm missing the connection between
> iommu support (which I must have of course) and struct pages.

Christian explained what the AMD driver is doing by calling
dma_map_resource().

Which is a hacky and slow way of achieving what Logan's series is
doing.

> > No, the design of the dmabuf requires the exporter to do the dma maps
> > and so it is only the exporter that is wrong to omit all the iommu and
> > p2p logic.
> >
> > RDMA is OK today only because nobody has implemented dma buf support
> > in rxe/si - mainly because the only implementations of exporters don't
>
> Can you please educate me, what is rxe/si ?

Sorry, rxe/siw - these are the all-software implementations of RDMA
and they require the struct page to do a SW memory copy. They can't
implement dmabuf without it.

> ok...
> so how come that patch-set was merged into 5.12 if it's buggy ?

We only implemented true dma devices for RDMA DMABUF support, so it
isn't buggy right now.

> Yes, that's what I expect to see. But I want to see it with my own
> eyes and then figure out how to solve this.

It might be tricky to test because you have to ensure the iommu is
turned on and has a non-identity page table. Basically if it doesn't
trigger an IOMMU failure then the IOMMU isn't set up properly.

Jason

^ permalink raw reply	[flat|nested] 143+ messages in thread

* Re: [Linaro-mm-sig] [PATCH v3 1/2] habanalabs: define uAPI to export FD for DMA-BUF
  2021-06-22 15:29                             ` Christian König
  (?)
@ 2021-06-22 15:40                               ` Jason Gunthorpe
  -1 siblings, 0 replies; 143+ messages in thread
From: Jason Gunthorpe @ 2021-06-22 15:40 UTC (permalink / raw)
  To: Christian König
  Cc: Oded Gabbay, Christian König, Gal Pressman, sleybo,
	linux-rdma, Oded Gabbay, Christoph Hellwig,
	Linux Kernel Mailing List, dri-devel,
	moderated list:DMA BUFFER SHARING FRAMEWORK, Doug Ledford,
	Tomer Tayar, amd-gfx list, Greg KH, Alex Deucher,
	Leon Romanovsky, open list:DMA BUFFER SHARING FRAMEWORK

On Tue, Jun 22, 2021 at 05:29:01PM +0200, Christian König wrote:
> Am 22.06.21 um 17:23 schrieb Jason Gunthorpe:
> > On Tue, Jun 22, 2021 at 02:23:03PM +0200, Christian König wrote:
> > > Am 22.06.21 um 14:01 schrieb Jason Gunthorpe:
> > > > On Tue, Jun 22, 2021 at 11:42:27AM +0300, Oded Gabbay wrote:
> > > > > On Tue, Jun 22, 2021 at 9:37 AM Christian König
> > > > > <ckoenig.leichtzumerken@gmail.com> wrote:
> > > > > > Am 22.06.21 um 01:29 schrieb Jason Gunthorpe:
> > > > > > > On Mon, Jun 21, 2021 at 10:24:16PM +0300, Oded Gabbay wrote:
> > > > > > > 
> > > > > > > > Another thing I want to emphasize is that we are doing p2p only
> > > > > > > > through the export/import of the FD. We do *not* allow the user to
> > > > > > > > mmap the dma-buf as we do not support direct IO. So there is no access
> > > > > > > > to these pages through the userspace.
> > > > > > > Arguably mmaping the memory is a better choice, and is the direction
> > > > > > > that Logan's series goes in. Here the use of DMABUF was specifically
> > > > > > > designed to allow hitless revokation of the memory, which this isn't
> > > > > > > even using.
> > > > > > The major problem with this approach is that DMA-buf is also used for
> > > > > > memory which isn't CPU accessible.
> > > > That isn't an issue here because the memory is only intended to be
> > > > used with P2P transfers so it must be CPU accessible.
> > > No, especially P2P is often done on memory resources which are not even
> > > remotely CPU accessible.
> > That is a special AMD thing, P2P here is PCI P2P and all PCI memory is
> > CPU accessible.
> 
> No absolutely not. NVidia GPUs work exactly the same way.
>
> And you have tons of similar cases in embedded and SoC systems where
> intermediate memory between devices isn't directly addressable with the CPU.

None of that is PCI P2P.

It is all some specialty direct transfer.

You can't reasonably call dma_map_resource() on non-CPU-mapped memory,
for instance: what address would you pass?

Do not confuse "I am doing transfers between two HW blocks" with PCI
Peer to Peer DMA transfers - the latter is a very narrow subcase.

> No, just using the dma_map_resource() interface.

Ik, but yes that does "work". Logan's series is better.

> > > > > I'll go and read Logan's patch-set to see if that will work for us in
> > > > > the future. Please remember, as Daniel said, we don't have struct page
> > > > > backing our device memory, so if that is a requirement to connect to
> > > > > Logan's work, then I don't think we will want to do it at this point.
> > > > It is trivial to get the struct page for a PCI BAR.
> > > Yeah, but it doesn't make much sense. Why should we create a struct page for
> > > something that isn't even memory in a lot of cases?
> > Because the iommu and other places need this handle to setup their
> > stuff. Nobody has yet been brave enough to try to change those flows
> > to be able to use a physical CPU address.
> 
> Well that is certainly not true. I'm just not sure if that works with all
> IOMMU drivers thought.

Huh? All the iommu interfaces except for the dma_map_resource() are
struct page based. dma_map_resource() is slow and limited in what it
can do.
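
For reference, the two interfaces being contrasted here, side by side;
a sketch only, with DMA_BIDIRECTIONAL picked arbitrarily:

#include <linux/dma-mapping.h>

/* The streaming DMA API is keyed on struct page ... */
static dma_addr_t map_via_struct_page(struct device *dev, struct page *pg,
				      size_t size)
{
	return dma_map_page(dev, pg, 0, size, DMA_BIDIRECTIONAL);
}

/* ... while dma_map_resource() takes a bare physical address and
 * supports far fewer code paths (no bounce buffering, no P2P
 * awareness in the core).
 */
static dma_addr_t map_via_phys_addr(struct device *dev, phys_addr_t phys,
				    size_t size)
{
	return dma_map_resource(dev, phys, size, DMA_BIDIRECTIONAL, 0);
}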

Jason

^ permalink raw reply	[flat|nested] 143+ messages in thread

* Re: [Linaro-mm-sig] [PATCH v3 1/2] habanalabs: define uAPI to export FD for DMA-BUF
  2021-06-22 15:31                                     ` Christian König
@ 2021-06-22 15:40                                       ` Oded Gabbay
  -1 siblings, 0 replies; 143+ messages in thread
From: Oded Gabbay @ 2021-06-22 15:40 UTC (permalink / raw)
  To: Christian König
  Cc: Jason Gunthorpe, Christian König, Gal Pressman, sleybo,
	linux-rdma, Oded Gabbay, Christoph Hellwig,
	Linux Kernel Mailing List, dri-devel,
	moderated list:DMA BUFFER SHARING FRAMEWORK, Doug Ledford,
	Tomer Tayar, amd-gfx list, Greg KH, Alex Deucher,
	Leon Romanovsky, open list:DMA BUFFER SHARING FRAMEWORK

On Tue, Jun 22, 2021 at 6:31 PM Christian König
<christian.koenig@amd.com> wrote:
>
>
>
> Am 22.06.21 um 17:28 schrieb Jason Gunthorpe:
> > On Tue, Jun 22, 2021 at 05:24:08PM +0200, Christian König wrote:
> >
> >>>> I will take two GAUDI devices and use one as an exporter and one as an
> >>>> importer. I want to see that the solution works end-to-end, with real
> >>>> device DMA from importer to exporter.
> >>> I can tell you it doesn't. Stuffing physical addresses directly into
> >>> the sg list doesn't involve any of the IOMMU code so any configuration
> >>> that requires IOMMU page table setup will not work.
> >> Sure it does. See amdgpu_vram_mgr_alloc_sgt:
> >>
> >>          amdgpu_res_first(res, offset, length, &cursor);
> >           ^^^^^^^^^^
> >
> > I'm not talking about the AMD driver, I'm talking about this patch.
> >
> > +             bar_address = hdev->dram_pci_bar_start +
> > +                             (pages[cur_page] - prop->dram_base_address);
> > +             sg_dma_address(sg) = bar_address;
>
> Yeah, that is indeed not working.
>
> Oded you need to use dma_map_resource() for this.
>
> Christian.
Yes, of course.
But will it be enough?
Jason said that supporting IOMMU isn't nice when we don't have struct pages.
I fail to understand the connection; I need to dig into this.

Oded

>
>
>
> >
> > Jason
>

^ permalink raw reply	[flat|nested] 143+ messages in thread

* Re: [Linaro-mm-sig] [PATCH v3 1/2] habanalabs: define uAPI to export FD for DMA-BUF
  2021-06-22 15:40                               ` Jason Gunthorpe
@ 2021-06-22 15:48                                 ` Christian König
  -1 siblings, 0 replies; 143+ messages in thread
From: Christian König @ 2021-06-22 15:48 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Oded Gabbay, Christian König, Gal Pressman, sleybo,
	linux-rdma, Oded Gabbay, Christoph Hellwig,
	Linux Kernel Mailing List, dri-devel,
	moderated list:DMA BUFFER SHARING FRAMEWORK, Doug Ledford,
	Tomer Tayar, amd-gfx list, Greg KH, Alex Deucher,
	Leon Romanovsky, open list:DMA BUFFER SHARING FRAMEWORK

Am 22.06.21 um 17:40 schrieb Jason Gunthorpe:
> On Tue, Jun 22, 2021 at 05:29:01PM +0200, Christian König wrote:
>> [SNIP]
>> No absolutely not. NVidia GPUs work exactly the same way.
>>
>> And you have tons of similar cases in embedded and SoC systems where
>> intermediate memory between devices isn't directly addressable with the CPU.
> None of that is PCI P2P.
>
> It is all some specialty direct transfer.
>
> You can't reasonably call dma_map_resource() on non CPU mapped memory
> for instance, what address would you pass?
>
> Do not confuse "I am doing transfers between two HW blocks" with PCI
> Peer to Peer DMA transfers - the latter is a very narrow subcase.
>
>> No, just using the dma_map_resource() interface.
> Ik, but yes that does "work". Logan's series is better.

No it isn't. It makes devices depend on allocating struct pages for
their BARs, which is neither necessary nor desired.

How do you prevent direct I/O on those pages for example?

Allocating struct pages has its use case, for example for exposing
VRAM as memory for HMM. But that is something very specific and should
not limit PCIe P2P DMA in general.
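
For context, a very rough sketch of that VRAM-for-HMM case: device
private struct pages created with devm_memremap_pages(). This is not
the amdgpu implementation; the pagemap ops (page_free, migrate_to_ram)
and the surrounding error handling are left out, and the range
parameters are placeholders:

#include <linux/memremap.h>
#include <linux/err.h>

static int expose_vram_for_hmm(struct device *dev,
			       struct dev_pagemap *pgmap,
			       u64 vram_start, u64 vram_size)
{
	void *ret;

	pgmap->type = MEMORY_DEVICE_PRIVATE;
	pgmap->range.start = vram_start;
	pgmap->range.end = vram_start + vram_size - 1;
	pgmap->nr_range = 1;
	/* pgmap->ops must supply page_free() and migrate_to_ram() */

	ret = devm_memremap_pages(dev, pgmap);

	return IS_ERR(ret) ? PTR_ERR(ret) : 0;
}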

>> [SNIP]
>> Well that is certainly not true. I'm just not sure if that works with all
>> IOMMU drivers thought.
> Huh? All the iommu interfaces except for the dma_map_resource() are
> struct page based. dma_map_resource() is slow ad limited in what it
> can do.

Yeah, but that is exactly the functionality we need. And as far as I can 
see that is also what Oded wants here.

Mapping stuff into userspace and then doing direct DMA to it is only a 
very limited use case and we need to be more flexible here.

Christian.

>
> Jason


^ permalink raw reply	[flat|nested] 143+ messages in thread

* Re: [Linaro-mm-sig] [PATCH v3 1/2] habanalabs: define uAPI to export FD for DMA-BUF
  2021-06-22 15:40                                       ` Oded Gabbay
@ 2021-06-22 15:49                                         ` Christian König
  -1 siblings, 0 replies; 143+ messages in thread
From: Christian König @ 2021-06-22 15:49 UTC (permalink / raw)
  To: Oded Gabbay
  Cc: Jason Gunthorpe, Christian König, Gal Pressman, sleybo,
	linux-rdma, Oded Gabbay, Christoph Hellwig,
	Linux Kernel Mailing List, dri-devel,
	moderated list:DMA BUFFER SHARING FRAMEWORK, Doug Ledford,
	Tomer Tayar, amd-gfx list, Greg KH, Alex Deucher,
	Leon Romanovsky, open list:DMA BUFFER SHARING FRAMEWORK

Am 22.06.21 um 17:40 schrieb Oded Gabbay:
> On Tue, Jun 22, 2021 at 6:31 PM Christian König
> <christian.koenig@amd.com> wrote:
>>
>>
>> Am 22.06.21 um 17:28 schrieb Jason Gunthorpe:
>>> On Tue, Jun 22, 2021 at 05:24:08PM +0200, Christian König wrote:
>>>
>>>>>> I will take two GAUDI devices and use one as an exporter and one as an
>>>>>> importer. I want to see that the solution works end-to-end, with real
>>>>>> device DMA from importer to exporter.
>>>>> I can tell you it doesn't. Stuffing physical addresses directly into
>>>>> the sg list doesn't involve any of the IOMMU code so any configuration
>>>>> that requires IOMMU page table setup will not work.
>>>> Sure it does. See amdgpu_vram_mgr_alloc_sgt:
>>>>
>>>>           amdgpu_res_first(res, offset, length, &cursor);
>>>            ^^^^^^^^^^
>>>
>>> I'm not talking about the AMD driver, I'm talking about this patch.
>>>
>>> +             bar_address = hdev->dram_pci_bar_start +
>>> +                             (pages[cur_page] - prop->dram_base_address);
>>> +             sg_dma_address(sg) = bar_address;
>> Yeah, that is indeed not working.
>>
>> Oded you need to use dma_map_resource() for this.
>>
>> Christian.
> Yes, of course.
> But will it be enough ?
> Jason said that supporting IOMMU isn't nice when we don't have struct pages.
> I fail to understand the connection, I need to dig into this.

The question is what you want to do with this.

A struct page is always needed if you want to do stuff like HMM with it;
if you only want P2P between devices I actually recommend avoiding it.

Christian.

>
> Oded
>
>>
>>
>>> Jason


^ permalink raw reply	[flat|nested] 143+ messages in thread

* Re: [Linaro-mm-sig] [PATCH v3 1/2] habanalabs: define uAPI to export FD for DMA-BUF
  2021-06-22 15:48                                 ` Christian König
@ 2021-06-22 16:05                                   ` Jason Gunthorpe
  -1 siblings, 0 replies; 143+ messages in thread
From: Jason Gunthorpe @ 2021-06-22 16:05 UTC (permalink / raw)
  To: Christian König
  Cc: Oded Gabbay, Christian König, Gal Pressman, sleybo,
	linux-rdma, Oded Gabbay, Christoph Hellwig,
	Linux Kernel Mailing List, dri-devel,
	moderated list:DMA BUFFER SHARING FRAMEWORK, Doug Ledford,
	Tomer Tayar, amd-gfx list, Greg KH, Alex Deucher,
	Leon Romanovsky, open list:DMA BUFFER SHARING FRAMEWORK

On Tue, Jun 22, 2021 at 05:48:10PM +0200, Christian König wrote:
> Am 22.06.21 um 17:40 schrieb Jason Gunthorpe:
> > On Tue, Jun 22, 2021 at 05:29:01PM +0200, Christian König wrote:
> > > [SNIP]
> > > No absolutely not. NVidia GPUs work exactly the same way.
> > > 
> > > And you have tons of similar cases in embedded and SoC systems where
> > > intermediate memory between devices isn't directly addressable with the CPU.
> > None of that is PCI P2P.
> > 
> > It is all some specialty direct transfer.
> > 
> > You can't reasonably call dma_map_resource() on non CPU mapped memory
> > for instance, what address would you pass?
> > 
> > Do not confuse "I am doing transfers between two HW blocks" with PCI
> > Peer to Peer DMA transfers - the latter is a very narrow subcase.
> > 
> > > No, just using the dma_map_resource() interface.
> > Ik, but yes that does "work". Logan's series is better.
>
> No it isn't. It makes devices depend on allocating struct pages for their
> BARs which is not necessary nor desired.

Which dramatically reduces the cost of establishing DMA mappings; a
loop of dma_map_resource() is very expensive.
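
That batching is what the existing p2pdma core (which Logan's series
builds on) provides: register the BAR once so the memory gets struct
pages, then hand out regular scatterlists and map them in a single
call. A hedged sketch, assuming BAR 4 and skipping all error
unwinding:

#include <linux/pci-p2pdma.h>
#include <linux/dma-mapping.h>
#include <linux/scatterlist.h>

static int export_p2p_mem(struct pci_dev *pdev, struct device *importer,
			  u32 size)
{
	struct scatterlist *sgl;
	unsigned int nents;
	int ret;

	/* Give the BAR struct pages (ZONE_DEVICE P2PDMA). Done once. */
	ret = pci_p2pdma_add_resource(pdev, 4 /* BAR, assumed */, size, 0);
	if (ret)
		return ret;

	/* Allocate p2p memory and get it back as a scatterlist. */
	sgl = pci_p2pmem_alloc_sgl(pdev, &nents, size);
	if (!sgl)
		return -ENOMEM;

	/* One call maps the whole list for the importing device. */
	if (!pci_p2pdma_map_sg(importer, sgl, nents, DMA_BIDIRECTIONAL))
		return -EIO;

	return 0;
}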
 
> How do you prevent direct I/O on those pages for example?

GUP fails.

> Allocating a struct pages has their use case, for example for exposing VRAM
> as memory for HMM. But that is something very specific and should not limit
> PCIe P2P DMA in general.

Sure, but that is an ideal we are far from obtaining, and nobody wants
to work on it, preferring to do hacky hacks like this.

If you believe in this then remove the scatter list from dmabuf, add a
new set of dma_map* APIs to work on physical addresses and all the
other stuff needed.
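
Purely as a strawman of what such phys_addr_t based interfaces could
look like; nothing like this exists in the kernel today and every name
below is invented:

#include <linux/dma-mapping.h>

/*
 * HYPOTHETICAL: a range descriptor that dma-buf could carry instead
 * of a struct page backed scatterlist, plus a mapping call that would
 * have to understand the IOMMU and P2P without struct pages.
 */
struct dma_phys_range {
	phys_addr_t	phys;
	size_t		len;
	dma_addr_t	dma;	/* filled in by the (hypothetical) map call */
};

int dma_map_phys_ranges(struct device *dev, struct dma_phys_range *ranges,
			unsigned int nr, enum dma_data_direction dir);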

Otherwise, we have what we have and drivers don't get to opt out. This
is why the stuff in AMDGPU was NAK'd.

Jason

^ permalink raw reply	[flat|nested] 143+ messages in thread

* Re: [Linaro-mm-sig] [PATCH v3 1/2] habanalabs: define uAPI to export FD for DMA-BUF
  2021-06-22 15:29                             ` Christian König
@ 2021-06-22 16:50                               ` Felix Kuehling
  -1 siblings, 0 replies; 143+ messages in thread
From: Felix Kuehling @ 2021-06-22 16:50 UTC (permalink / raw)
  To: Christian König, Jason Gunthorpe
  Cc: Oded Gabbay, linux-rdma, Christian König, sleybo,
	Gal Pressman, dri-devel, Christoph Hellwig,
	moderated list:DMA BUFFER SHARING FRAMEWORK, Doug Ledford,
	Tomer Tayar, amd-gfx list, Greg KH, Alex Deucher,
	Leon Romanovsky, Oded Gabbay, Linux Kernel Mailing List,
	open list:DMA BUFFER SHARING FRAMEWORK

Am 2021-06-22 um 11:29 a.m. schrieb Christian König:
> Am 22.06.21 um 17:23 schrieb Jason Gunthorpe:
>> On Tue, Jun 22, 2021 at 02:23:03PM +0200, Christian König wrote:
>>> Am 22.06.21 um 14:01 schrieb Jason Gunthorpe:
>>>> On Tue, Jun 22, 2021 at 11:42:27AM +0300, Oded Gabbay wrote:
>>>>> On Tue, Jun 22, 2021 at 9:37 AM Christian König
>>>>> <ckoenig.leichtzumerken@gmail.com> wrote:
>>>>>> Am 22.06.21 um 01:29 schrieb Jason Gunthorpe:
>>>>>>> On Mon, Jun 21, 2021 at 10:24:16PM +0300, Oded Gabbay wrote:
>>>>>>>
>>>>>>>> Another thing I want to emphasize is that we are doing p2p only
>>>>>>>> through the export/import of the FD. We do *not* allow the user to
>>>>>>>> mmap the dma-buf as we do not support direct IO. So there is no
>>>>>>>> access
>>>>>>>> to these pages through the userspace.
>>>>>>> Arguably mmaping the memory is a better choice, and is the
>>>>>>> direction
>>>>>>> that Logan's series goes in. Here the use of DMABUF was
>>>>>>> specifically
>>>>>>> designed to allow hitless revokation of the memory, which this
>>>>>>> isn't
>>>>>>> even using.
>>>>>> The major problem with this approach is that DMA-buf is also used
>>>>>> for
>>>>>> memory which isn't CPU accessible.
>>>> That isn't an issue here because the memory is only intended to be
>>>> used with P2P transfers so it must be CPU accessible.
>>> No, especially P2P is often done on memory resources which are not even
>>> remotely CPU accessible.
>> That is a special AMD thing, P2P here is PCI P2P and all PCI memory is
>> CPU accessible.
>
> No absolutely not. NVidia GPUs work exactly the same way.
>
> And you have tons of similar cases in embedded and SoC systems where
> intermediate memory between devices isn't directly addressable with
> the CPU.
>
>>>>>>> So you are taking the hit of very limited hardware support and
>>>>>>> reduced
>>>>>>> performance just to squeeze into DMABUF..
>>>> You still have the issue that this patch is doing all of this P2P
>>>> stuff wrong - following the already NAK'd AMD approach.
>>> Well that stuff was NAKed because we still use sg_tables, not
>>> because we
>>> don't want to allocate struct pages.
>> sg lists in general.
>>  
>>> The plan is to push this forward since DEVICE_PRIVATE clearly can't
>>> handle
>>> all of our use cases and is not really a good fit to be honest.
>>>
>>> IOMMU is now working as well, so as far as I can see we are all good
>>> here.
>> How? Is that more AMD special stuff?
>
> No, just using the dma_map_resource() interface.
>
> We have that working on tons of IOMMU enabled systems.
>
>> This patch series never calls to the iommu driver, AFAICT.
>>
>>>>> I'll go and read Logan's patch-set to see if that will work for us in
>>>>> the future. Please remember, as Daniel said, we don't have struct
>>>>> page
>>>>> backing our device memory, so if that is a requirement to connect to
>>>>> Logan's work, then I don't think we will want to do it at this point.
>>>> It is trivial to get the struct page for a PCI BAR.
>>> Yeah, but it doesn't make much sense. Why should we create a struct
>>> page for
>>> something that isn't even memory in a lot of cases?
>> Because the iommu and other places need this handle to setup their
>> stuff. Nobody has yet been brave enough to try to change those flows
>> to be able to use a physical CPU address.
>
> Well that is certainly not true. I'm just not sure if that works with
> all IOMMU drivers thought.
>
> Would need to ping Felix when the support for this was merged.

We have been working on IOMMU support for all our multi-GPU memory
mappings in KFD. The PCIe P2P side of this is currently only merged on
our internal branch. Before we can actually use this, we need
CONFIG_DMABUF_MOVE_NOTIFY enabled (which is still documented as
experimental and disabled by default). Otherwise we'll end up pinning
all our VRAM.
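
For readers not familiar with it, the importer side of that dynamic
(non-pinning) path looks roughly like the sketch below; it assumes
CONFIG_DMABUF_MOVE_NOTIFY and reduces the actual eviction handling to
a comment:

#include <linux/dma-buf.h>

/* Called by the exporter (with the reservation lock held) when it
 * wants to move the buffer; the importer must invalidate its mappings
 * and re-map later instead of keeping the memory pinned.
 */
static void my_move_notify(struct dma_buf_attachment *attach)
{
	/* tear down device page tables / stop work that uses the buffer */
}

static const struct dma_buf_attach_ops my_attach_ops = {
	.allow_peer2peer = true,
	.move_notify = my_move_notify,
};

static struct dma_buf_attachment *my_attach(struct dma_buf *dmabuf,
					    struct device *dev, void *priv)
{
	/* Dynamic attachment: no pinning, relies on move_notify instead. */
	return dma_buf_dynamic_attach(dmabuf, dev, &my_attach_ops, priv);
}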

I think we'll try to put together an upstream patch series of all our
PCIe P2P support in a few weeks or so. This will include IOMMU mappings,
checking that PCIe P2P is actually possible between two devices, and KFD
topology updates to correctly report those capabilities to user mode.

It will not use struct pages for exported VRAM buffers.

Regards,
  Felix


>
> Regards,
> Christian.
>
>>
>> This is why we have a special struct page type just for PCI BAR
>> memory.
>>
>> Jason
>

^ permalink raw reply	[flat|nested] 143+ messages in thread

* Re: [Linaro-mm-sig] [PATCH v3 1/2] habanalabs: define uAPI to export FD for DMA-BUF
  2021-06-22 16:05                                   ` Jason Gunthorpe
@ 2021-06-23  8:57                                     ` Christian König
  -1 siblings, 0 replies; 143+ messages in thread
From: Christian König @ 2021-06-23  8:57 UTC (permalink / raw)
  To: Jason Gunthorpe, Christian König
  Cc: Oded Gabbay, Gal Pressman, sleybo, linux-rdma, Oded Gabbay,
	Christoph Hellwig, Linux Kernel Mailing List, dri-devel,
	moderated list:DMA BUFFER SHARING FRAMEWORK, Doug Ledford,
	Tomer Tayar, amd-gfx list, Greg KH, Alex Deucher,
	Leon Romanovsky, open list:DMA BUFFER SHARING FRAMEWORK

Am 22.06.21 um 18:05 schrieb Jason Gunthorpe:
> On Tue, Jun 22, 2021 at 05:48:10PM +0200, Christian König wrote:
>> Am 22.06.21 um 17:40 schrieb Jason Gunthorpe:
>>> On Tue, Jun 22, 2021 at 05:29:01PM +0200, Christian König wrote:
>>>> [SNIP]
>>>> No absolutely not. NVidia GPUs work exactly the same way.
>>>>
>>>> And you have tons of similar cases in embedded and SoC systems where
>>>> intermediate memory between devices isn't directly addressable with the CPU.
>>> None of that is PCI P2P.
>>>
>>> It is all some specialty direct transfer.
>>>
>>> You can't reasonably call dma_map_resource() on non CPU mapped memory
>>> for instance, what address would you pass?
>>>
>>> Do not confuse "I am doing transfers between two HW blocks" with PCI
>>> Peer to Peer DMA transfers - the latter is a very narrow subcase.
>>>
>>>> No, just using the dma_map_resource() interface.
>>> Ik, but yes that does "work". Logan's series is better.
>> No it isn't. It makes devices depend on allocating struct pages for their
>> BARs which is not necessary nor desired.
> Which dramatically reduces the cost of establishing DMA mappings, a
> loop of dma_map_resource() is very expensive.

Yeah, but that is perfectly ok. Our BAR allocations are either in chunks 
of at least 2MiB or only a single 4KiB page.

Oded might run into more performance problems, but those DMA-buf 
mappings are usually set up only once.

>> How do you prevent direct I/O on those pages for example?
> GUP fails.

At least that is calming.

>> Allocating a struct pages has their use case, for example for exposing VRAM
>> as memory for HMM. But that is something very specific and should not limit
>> PCIe P2P DMA in general.
> Sure, but that is an ideal we are far from obtaining, and nobody wants
> to work on it prefering to do hacky hacky like this.
>
> If you believe in this then remove the scatter list from dmabuf, add a
> new set of dma_map* APIs to work on physical addresses and all the
> other stuff needed.

Yeah, that's what I totally agree on. And I actually hoped that the new 
P2P work for PCIe would go into that direction, but that didn't 
materialize.

But allocating struct pages for PCIe BARs, which are essentially 
registers and not memory, is much more hacky than the dma_map_resource() 
approach.

To re-iterate why I think that having struct pages for those BARs is a 
bad idea: Our doorbells on AMD GPUs are write and read pointers for ring 
buffers.

When you write to the BAR you essentially tell the firmware that you 
have either filled the ring buffer or read a bunch of it. This in turn 
then triggers an interrupt in the hardware/firmware, which may have been 
asleep.

By using PCIe P2P we want to avoid the round trip to the CPU when one 
device has filled the ring buffer and another device must be woken up to 
process it.

Think of it as MSI-X in reverse; allocating struct pages for those 
BARs just to work around the shortcomings of the DMA API makes no sense 
at all to me.


We also do have the VRAM BAR, and for HMM we do allocate struct pages 
for the address range exposed there. But this is a different use case.
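
For reference, that HMM flavour boils down to something like the sketch
below: the driver registers the device memory as ZONE_DEVICE pages so the
core MM can migrate to and from it. The sketch_* names are invented, and the
MEMORY_DEVICE_PRIVATE type with a placeholder physical range follows the
generic HMM pattern; the exact flavour differs from driver to driver:

#include <linux/memremap.h>
#include <linux/mm.h>
#include <linux/ioport.h>

static void sketch_page_free(struct page *page)
{
        /* hand the backing VRAM page back to the driver's allocator */
}

static vm_fault_t sketch_migrate_to_ram(struct vm_fault *vmf)
{
        /* a real driver migrates the data back to system RAM here */
        return VM_FAULT_SIGBUS;
}

static const struct dev_pagemap_ops sketch_pgmap_ops = {
        .page_free      = sketch_page_free,
        .migrate_to_ram = sketch_migrate_to_ram,
};

static int sketch_register_vram_pages(struct device *dev, resource_size_t size,
                                      struct dev_pagemap *pgmap)
{
        struct resource *res;
        void *p;

        /* device-private pages are not CPU addressable, so they get a
         * placeholder physical range rather than the BAR itself
         */
        res = request_free_mem_region(&iomem_resource, size, "sketch-vram");
        if (IS_ERR(res))
                return PTR_ERR(res);

        pgmap->type = MEMORY_DEVICE_PRIVATE;
        pgmap->range.start = res->start;
        pgmap->range.end = res->end;
        pgmap->nr_range = 1;
        pgmap->ops = &sketch_pgmap_ops;

        p = devm_memremap_pages(dev, pgmap);    /* creates the struct pages */
        return IS_ERR(p) ? PTR_ERR(p) : 0;
}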

Regards,
Christian.

>
> Otherwise, we have what we have and drivers don't get to opt out. This
> is why the stuff in AMDGPU was NAK'd.
>
> Jason


^ permalink raw reply	[flat|nested] 143+ messages in thread

* Re: [Linaro-mm-sig] [PATCH v3 1/2] habanalabs: define uAPI to export FD for DMA-BUF
  2021-06-23  8:57                                     ` Christian König
  (?)
@ 2021-06-23  9:14                                       ` Oded Gabbay
  -1 siblings, 0 replies; 143+ messages in thread
From: Oded Gabbay @ 2021-06-23  9:14 UTC (permalink / raw)
  To: Christian König
  Cc: Jason Gunthorpe, Christian König, Gal Pressman, sleybo,
	linux-rdma, Oded Gabbay, Christoph Hellwig,
	Linux Kernel Mailing List, dri-devel,
	moderated list:DMA BUFFER SHARING FRAMEWORK, Doug Ledford,
	Tomer Tayar, amd-gfx list, Greg KH, Alex Deucher,
	Leon Romanovsky, open list:DMA BUFFER SHARING FRAMEWORK

On Wed, Jun 23, 2021 at 11:57 AM Christian König
<ckoenig.leichtzumerken@gmail.com> wrote:
>
> Am 22.06.21 um 18:05 schrieb Jason Gunthorpe:
> > On Tue, Jun 22, 2021 at 05:48:10PM +0200, Christian König wrote:
> >> Am 22.06.21 um 17:40 schrieb Jason Gunthorpe:
> >>> On Tue, Jun 22, 2021 at 05:29:01PM +0200, Christian König wrote:
> >>>> [SNIP]
> >>>> No absolutely not. NVidia GPUs work exactly the same way.
> >>>>
> >>>> And you have tons of similar cases in embedded and SoC systems where
> >>>> intermediate memory between devices isn't directly addressable with the CPU.
> >>> None of that is PCI P2P.
> >>>
> >>> It is all some specialty direct transfer.
> >>>
> >>> You can't reasonably call dma_map_resource() on non CPU mapped memory
> >>> for instance, what address would you pass?
> >>>
> >>> Do not confuse "I am doing transfers between two HW blocks" with PCI
> >>> Peer to Peer DMA transfers - the latter is a very narrow subcase.
> >>>
> >>>> No, just using the dma_map_resource() interface.
> >>> Ik, but yes that does "work". Logan's series is better.
> >> No it isn't. It makes devices depend on allocating struct pages for their
> >> BARs which is not necessary nor desired.
> > Which dramatically reduces the cost of establishing DMA mappings, a
> > loop of dma_map_resource() is very expensive.
>
> Yeah, but that is perfectly ok. Our BAR allocations are either in chunks
> of at least 2MiB or only a single 4KiB page.
>
> Oded might run into more performance problems, but those DMA-buf
> mappings are usually set up only once.
>
> >> How do you prevent direct I/O on those pages for example?
> > GUP fails.
>
> At least that is calming.
>
> >> Allocating a struct pages has their use case, for example for exposing VRAM
> >> as memory for HMM. But that is something very specific and should not limit
> >> PCIe P2P DMA in general.
> > Sure, but that is an ideal we are far from obtaining, and nobody wants
> > to work on it prefering to do hacky hacky like this.
> >
> > If you believe in this then remove the scatter list from dmabuf, add a
> > new set of dma_map* APIs to work on physical addresses and all the
> > other stuff needed.
>
> Yeah, that's what I totally agree on. And I actually hoped that the new
> P2P work for PCIe would go into that direction, but that didn't
> materialize.
>
> But allocating struct pages for PCIe BARs which are essentially
> registers and not memory is much more hacky than the dma_map_resource()
> approach.
>
> To re-iterate why I think that having struct pages for those BARs is a
> bad idea: Our doorbells on AMD GPUs are write and read pointers for ring
> buffers.
>
> When you write to the BAR you essentially tell the firmware that you
> have either filled the ring buffer or read a bunch of it. This in turn
> then triggers an interrupt in the hardware/firmware, which may have been
> asleep.
>
> By using PCIe P2P we want to avoid the round trip to the CPU when one
> device has filled the ring buffer and another device must be woken up to
> process it.
>
> Think of it as MSI-X in reverse; allocating struct pages for those
> BARs just to work around the shortcomings of the DMA API makes no sense
> at all to me.
We would also like to do that *in the future*.
In Gaudi it will never be supported (due to security limitations), but
I definitely see it happening in future ASICs.

Oded

>
>
> We also do have the VRAM BAR, and for HMM we do allocate struct pages
> for the address range exposed there. But this is a different use case.
>
> Regards,
> Christian.
>
> >
> > Otherwise, we have what we have and drivers don't get to opt out. This
> > is why the stuff in AMDGPU was NAK'd.
> >
> > Jason
>

^ permalink raw reply	[flat|nested] 143+ messages in thread

* Re: [Linaro-mm-sig] [PATCH v3 1/2] habanalabs: define uAPI to export FD for DMA-BUF
  2021-06-23  8:57                                     ` Christian König
  (?)
@ 2021-06-23 18:24                                       ` Jason Gunthorpe
  -1 siblings, 0 replies; 143+ messages in thread
From: Jason Gunthorpe @ 2021-06-23 18:24 UTC (permalink / raw)
  To: Christian König
  Cc: Christian König, Oded Gabbay, Gal Pressman, sleybo,
	linux-rdma, Oded Gabbay, Christoph Hellwig,
	Linux Kernel Mailing List, dri-devel,
	moderated list:DMA BUFFER SHARING FRAMEWORK, Doug Ledford,
	Tomer Tayar, amd-gfx list, Greg KH, Alex Deucher,
	Leon Romanovsky, open list:DMA BUFFER SHARING FRAMEWORK

On Wed, Jun 23, 2021 at 10:57:35AM +0200, Christian König wrote:

> > > No it isn't. It makes devices depend on allocating struct pages for their
> > > BARs which is not necessary nor desired.
> > Which dramatically reduces the cost of establishing DMA mappings, a
> > loop of dma_map_resource() is very expensive.
> 
> Yeah, but that is perfectly ok. Our BAR allocations are either in chunks of
> at least 2MiB or only a single 4KiB page.

And very small apparently
 
> > > Allocating a struct pages has their use case, for example for exposing VRAM
> > > as memory for HMM. But that is something very specific and should not limit
> > > PCIe P2P DMA in general.
> > Sure, but that is an ideal we are far from obtaining, and nobody wants
> > to work on it prefering to do hacky hacky like this.
> > 
> > If you believe in this then remove the scatter list from dmabuf, add a
> > new set of dma_map* APIs to work on physical addresses and all the
> > other stuff needed.
> 
> Yeah, that's what I totally agree on. And I actually hoped that the new P2P
> work for PCIe would go into that direction, but that didn't materialize.

It is a lot of work and the only gain is to save a bit of memory for
struct pages. Not a very big payoff.
 
> But allocating struct pages for PCIe BARs which are essentially registers
> and not memory is much more hacky than the dma_map_resource() approach.

It doesn't really matter. The pages are in a special zone and are only
being used as handles for the BAR memory.
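
Concretely this is the pci_p2pdma machinery from Logan's work; a rough sketch
of how a driver publishes a BAR through it is below, with the BAR index and
the allocation size picked arbitrarily for illustration:

#include <linux/pci.h>
#include <linux/pci-p2pdma.h>
#include <linux/sizes.h>

static int sketch_publish_bar(struct pci_dev *pdev)
{
        pci_bus_addr_t bus;
        void *buf;
        int rc;

        /* create MEMORY_DEVICE_PCI_P2PDMA struct pages covering BAR 4 */
        rc = pci_p2pdma_add_resource(pdev, 4, pci_resource_len(pdev, 4), 0);
        if (rc)
                return rc;

        /* carve out a chunk of that BAR; as noted earlier in the thread,
         * GUP on these handle pages fails, so direct I/O is kept away
         */
        buf = pci_alloc_p2pmem(pdev, SZ_2M);
        if (!buf)
                return -ENOMEM;

        /* this is the bus address a peer device would DMA to */
        bus = pci_p2pmem_virt_to_bus(pdev, buf);
        dev_info(&pdev->dev, "p2p chunk at bus address 0x%llx\n",
                 (unsigned long long)bus);
        return 0;
}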

> By using PCIe P2P we want to avoid the round trip to the CPU when one device
> has filled the ring buffer and another device must be woken up to process
> it.

Sure, we all have these scenarios; what is inside the memory doesn't
really matter. The mechanism is generic and the struct pages don't care
much if they point at something memory-like or at something
register-like.

They are already in big trouble because you can't portably use CPU
instructions to access them anyhow.

Jason

^ permalink raw reply	[flat|nested] 143+ messages in thread

* Re: [Linaro-mm-sig] [PATCH v3 1/2] habanalabs: define uAPI to export FD for DMA-BUF
  2021-06-23 18:24                                       ` Jason Gunthorpe
  (?)
@ 2021-06-23 18:43                                         ` Oded Gabbay
  -1 siblings, 0 replies; 143+ messages in thread
From: Oded Gabbay @ 2021-06-23 18:43 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Christian König, Christian König, Gal Pressman, sleybo,
	linux-rdma, Oded Gabbay, Christoph Hellwig,
	Linux Kernel Mailing List, dri-devel,
	moderated list:DMA BUFFER SHARING FRAMEWORK, Doug Ledford,
	Tomer Tayar, amd-gfx list, Greg KH, Alex Deucher,
	Leon Romanovsky, open list:DMA BUFFER SHARING FRAMEWORK

On Wed, Jun 23, 2021 at 9:24 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
>
> On Wed, Jun 23, 2021 at 10:57:35AM +0200, Christian König wrote:
>
> > > > No it isn't. It makes devices depend on allocating struct pages for their
> > > > BARs which is not necessary nor desired.
> > > Which dramatically reduces the cost of establishing DMA mappings, a
> > > loop of dma_map_resource() is very expensive.
> >
> > Yeah, but that is perfectly ok. Our BAR allocations are either in chunks of
> > at least 2MiB or only a single 4KiB page.
>
> And very small apparently
>
> > > > Allocating a struct pages has their use case, for example for exposing VRAM
> > > > as memory for HMM. But that is something very specific and should not limit
> > > > PCIe P2P DMA in general.
> > > Sure, but that is an ideal we are far from obtaining, and nobody wants
> > > to work on it prefering to do hacky hacky like this.
> > >
> > > If you believe in this then remove the scatter list from dmabuf, add a
> > > new set of dma_map* APIs to work on physical addresses and all the
> > > other stuff needed.
> >
> > Yeah, that's what I totally agree on. And I actually hoped that the new P2P
> > work for PCIe would go into that direction, but that didn't materialize.
>
> It is a lot of work and the only gain is to save a bit of memory for
> struct pages. Not a very big payoff.
>
> > But allocating struct pages for PCIe BARs which are essentially registers
> > and not memory is much more hacky than the dma_map_resource() approach.
>
> It doesn't really matter. The pages are in a special zone and are only
> being used as handles for the BAR memory.
>
> > By using PCIe P2P we want to avoid the round trip to the CPU when one device
> > has filled the ring buffer and another device must be woken up to process
> > it.
>
> Sure, we all have these scenarios; what is inside the memory doesn't
> really matter. The mechanism is generic and the struct pages don't care
> much if they point at something memory-like or at something
> register-like.
>
> They are already in big trouble because you can't portably use CPU
> instructions to access them anyhow.
>
> Jason

Jason,
Can you please explain why it is so important to (allow) accessing them
through the CPU?
In regard to p2p, where is the use case for that?
The whole purpose is that the other device accesses my device,
bypassing the CPU.

Thanks,
Oded

^ permalink raw reply	[flat|nested] 143+ messages in thread

* Re: [Linaro-mm-sig] [PATCH v3 1/2] habanalabs: define uAPI to export FD for DMA-BUF
  2021-06-23 18:43                                         ` Oded Gabbay
  (?)
@ 2021-06-23 18:50                                           ` Jason Gunthorpe
  -1 siblings, 0 replies; 143+ messages in thread
From: Jason Gunthorpe @ 2021-06-23 18:50 UTC (permalink / raw)
  To: Oded Gabbay
  Cc: Christian König, Christian König, Gal Pressman, sleybo,
	linux-rdma, Oded Gabbay, Christoph Hellwig,
	Linux Kernel Mailing List, dri-devel,
	moderated list:DMA BUFFER SHARING FRAMEWORK, Doug Ledford,
	Tomer Tayar, amd-gfx list, Greg KH, Alex Deucher,
	Leon Romanovsky, open list:DMA BUFFER SHARING FRAMEWORK

On Wed, Jun 23, 2021 at 09:43:04PM +0300, Oded Gabbay wrote:

> Can you please explain why it is so important to (allow) accessing them
> through the CPU?

It is not so much important, as it reflects significant design choices
that are already tightly baked into a lot of our stacks.

A SGL is CPU accessible by design - that is baked into this thing and
places all over the place assume it. Even in RDMA we have
RXE/SWI/HFI1/qib that might want to use the CPU side (grep for sg_page
to see)
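
A hand-written illustration (not from any driver) of the two ways a
scatterlist gets walked today: the DMA-side loop is all a pure P2P importer
needs, while the CPU-side loop is what the software fallbacks rely on and
what breaks when no pages sit behind the entries:

#include <linux/scatterlist.h>
#include <linux/highmem.h>
#include <linux/printk.h>

static void sketch_walk_sgt(struct sg_table *sgt)
{
        struct scatterlist *sg;
        int i;

        /* device view: bus addresses, fine for BAR-backed exporters */
        for_each_sgtable_dma_sg(sgt, sg, i)
                pr_info("dma 0x%llx len %u\n",
                        (unsigned long long)sg_dma_address(sg),
                        sg_dma_len(sg));

        /* CPU view: software paths assume sg_page() points at real memory,
         * and this is what falls apart for a P2P-only exporter
         */
        for_each_sgtable_sg(sgt, sg, i) {
                void *va = kmap_local_page(sg_page(sg));

                /* ... memcpy() to/from va + sg->offset ... */
                kunmap_local(va);
        }
}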

So, the thing at the top of the stack - in this case the gaudi driver
- simply can't assume what the rest of the stack is going to do and
omit the CPU side. It breaks everything.

Logan's patch series is the most fully developed way out of this
predicament so far.

> The whole purpose is that the other device accesses my device,
> bypassing the CPU.

Sure, but you don't know that will happen, or if it is even possible
in any given system configuration. The purpose is to allow for that
optimization when possible, not exclude CPU based approaches.

Jason

^ permalink raw reply	[flat|nested] 143+ messages in thread

* Re: [Linaro-mm-sig] [PATCH v3 1/2] habanalabs: define uAPI to export FD for DMA-BUF
  2021-06-23 18:50                                           ` Jason Gunthorpe
  (?)
@ 2021-06-23 19:00                                             ` Oded Gabbay
  -1 siblings, 0 replies; 143+ messages in thread
From: Oded Gabbay @ 2021-06-23 19:00 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Christian König, Christian König, Gal Pressman, sleybo,
	linux-rdma, Oded Gabbay, Christoph Hellwig,
	Linux Kernel Mailing List, dri-devel,
	moderated list:DMA BUFFER SHARING FRAMEWORK, Doug Ledford,
	Tomer Tayar, amd-gfx list, Greg KH, Alex Deucher,
	Leon Romanovsky, open list:DMA BUFFER SHARING FRAMEWORK

On Wed, Jun 23, 2021 at 9:50 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
>
> On Wed, Jun 23, 2021 at 09:43:04PM +0300, Oded Gabbay wrote:
>
> > Can you please explain why it is so important to (allow) accessing them
> > through the CPU?
>
> It is not so much important, as it reflects significant design choices
> that are already tightly baked into a lot of our stacks.
>
> A SGL is CPU accessible by design - that is baked into this thing and
> places all over the place assume it. Even in RDMA we have
> RXE/SWI/HFI1/qib that might want to use the CPU side (grep for sg_page
> to see)
>
> So, the thing at the top of the stack - in this case the gaudi driver
> - simply can't assume what the rest of the stack is going to do and
> omit the CPU side. It breaks everything.
>
> Logan's patch series is the most fully developed way out of this
> predicament so far.

I understand the argument and I agree that for the generic case, the
top of the stack can't assume anything.
Having said that, in this case the SGL is encapsulated inside a dma-buf object.

Maybe it's a stupid/over-simplified suggestion, but can't we add a
property to the dma-buf object that will be set by the exporter, which
will "tell" the importer it can't use any CPU fallback? Only "real" p2p?
Won't that solve the problem by eliminating the unsupported access methods?
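
Nothing like that property exists in dma-buf today; the closest existing
piece is the importer-side allow_peer2peer flag in struct dma_buf_attach_ops,
which works in the opposite direction (the importer declares it can cope with
mappings that have no struct pages behind them). A purely hypothetical sketch
of the exporter-side property being suggested, just to make the idea concrete:

#include <linux/types.h>

/* Hypothetical only -- no such field or helper exists in dma-buf today. */
struct sketch_export_caps {
        bool no_cpu_fallback;   /* exporter: mapping is device-access only */
};

static bool sketch_importer_may_attach(const struct sketch_export_caps *caps,
                                       bool importer_needs_cpu_access)
{
        /* an importer that needs sg_page()/vmap-style access has to bail
         * out instead of attaching when the exporter forbids CPU fallbacks
         */
        return !(caps->no_cpu_fallback && importer_needs_cpu_access);
}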

Oded

>
> > The whole purpose is that the other device accesses my device,
> > bypassing the CPU.
>
> Sure, but you don't know that will happen, or if it is even possible
> in any given system configuration. The purpose is to allow for that
> optimization when possible, not exclude CPU based approaches.
>
> Jason

^ permalink raw reply	[flat|nested] 143+ messages in thread

* Re: [Linaro-mm-sig] [PATCH v3 1/2] habanalabs: define uAPI to export FD for DMA-BUF
  2021-06-23 19:00                                             ` Oded Gabbay
  (?)
@ 2021-06-23 19:34                                               ` Jason Gunthorpe
  -1 siblings, 0 replies; 143+ messages in thread
From: Jason Gunthorpe @ 2021-06-23 19:34 UTC (permalink / raw)
  To: Oded Gabbay
  Cc: Christian König, Christian König, Gal Pressman, sleybo,
	linux-rdma, Oded Gabbay, Christoph Hellwig,
	Linux Kernel Mailing List, dri-devel,
	moderated list:DMA BUFFER SHARING FRAMEWORK, Doug Ledford,
	Tomer Tayar, amd-gfx list, Greg KH, Alex Deucher,
	Leon Romanovsky, open list:DMA BUFFER SHARING FRAMEWORK

On Wed, Jun 23, 2021 at 10:00:29PM +0300, Oded Gabbay wrote:
> On Wed, Jun 23, 2021 at 9:50 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> >
> > On Wed, Jun 23, 2021 at 09:43:04PM +0300, Oded Gabbay wrote:
> >
> > > Can you please explain why it is so important to (allow) accessing them
> > > through the CPU?
> >
> > It is not so much important, as it reflects significant design choices
> > that are already tightly baked into a lot of our stacks.
> >
> > A SGL is CPU accessible by design - that is baked into this thing and
> > places all over the place assume it. Even in RDMA we have
> > RXE/SWI/HFI1/qib that might want to use the CPU side (grep for sg_page
> > to see)
> >
> > So, the thing at the top of the stack - in this case the gaudi driver
> > - simply can't assume what the rest of the stack is going to do and
> > omit the CPU side. It breaks everything.
> >
> > Logan's patch series is the most fully developed way out of this
> > predicament so far.
> 
> I understand the argument and I agree that for the generic case, the
> top of the stack can't assume anything.
> Having said that, in this case the SGL is encapsulated inside a dma-buf object.
>
> Maybe it's a stupid/over-simplified suggestion, but can't we add a
> property to the dma-buf object that will be set by the exporter, which
> will "tell" the importer it can't use any CPU fallback? Only "real" p2p?

The block stack has been trying to do something like this.

The flag doesn't solve the DMA API/IOMMU problems though.

Jason

^ permalink raw reply	[flat|nested] 143+ messages in thread

* Re: [Linaro-mm-sig] [PATCH v3 1/2] habanalabs: define uAPI to export FD for DMA-BUF
  2021-06-23 19:34                                               ` Jason Gunthorpe
  (?)
@ 2021-06-23 19:39                                                 ` Oded Gabbay
  -1 siblings, 0 replies; 143+ messages in thread
From: Oded Gabbay @ 2021-06-23 19:39 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Christian König, Christian König, Gal Pressman, sleybo,
	linux-rdma, Oded Gabbay, Christoph Hellwig,
	Linux Kernel Mailing List, dri-devel,
	moderated list:DMA BUFFER SHARING FRAMEWORK, Doug Ledford,
	Tomer Tayar, amd-gfx list, Greg KH, Alex Deucher,
	Leon Romanovsky, open list:DMA BUFFER SHARING FRAMEWORK

On Wed, Jun 23, 2021 at 10:34 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
>
> On Wed, Jun 23, 2021 at 10:00:29PM +0300, Oded Gabbay wrote:
> > On Wed, Jun 23, 2021 at 9:50 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> > >
> > > On Wed, Jun 23, 2021 at 09:43:04PM +0300, Oded Gabbay wrote:
> > >
> > > > Can you please explain why it is so important to (allow) access them
> > > > through the CPU ?
> > >
> > > It is not so much important, as it reflects significant design choices
> > > that are already tightly baked into alot of our stacks.
> > >
> > > A SGL is CPU accessible by design - that is baked into this thing and
> > > places all over the place assume it. Even in RDMA we have
> > > RXE/SWI/HFI1/qib that might want to use the CPU side (grep for sg_page
> > > to see)
> > >
> > > So, the thing at the top of the stack - in this case the gaudi driver
> > > - simply can't assume what the rest of the stack is going to do and
> > > omit the CPU side. It breaks everything.
> > >
> > > Logan's patch series is the most fully developed way out of this
> > > predicament so far.
> >
> > I understand the argument and I agree that for the generic case, the
> > top of the stack can't assume anything.
> > Having said that, in this case the SGL is encapsulated inside a dma-buf object.
> >
> > Maybe its a stupid/over-simplified suggestion, but can't we add a
> > property to the dma-buf object,
> > that will be set by the exporter, which will "tell" the importer it
> > can't use any CPU fallback ? Only "real" p2p ?
>
> The block stack has been trying to do something like this.
>
> The flag doesn't solve the DMA API/IOMMU problems though.
Hmm, I thought using dma_map_resource() would solve the IOMMU issues, no?
We talked about it yesterday, and you said that it would "work"
(although I noticed a tone of reluctance when you said that).

If I use dma_map_resource() to set the addresses inside the SGL before I
export the dma-buf, and guarantee that no one will use the SGL in the
dma-buf for any purpose other than device P2P, what else is needed?
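
For concreteness, a minimal sketch of that flow, assuming a single
contiguous device-memory region and an importer device already known at
map time (the function and variable names are illustrative, not the
actual habanalabs code):

#include <linux/dma-mapping.h>
#include <linux/scatterlist.h>

/*
 * Illustrative sketch only: map one contiguous chunk of device memory
 * (bar_base, size) for DMA by the importer and store the result in a
 * single-entry sg_table.  No struct page is involved, which is exactly
 * the property being debated in this thread.
 */
static int sketch_fill_sgl(struct device *importer, struct sg_table *sgt,
			   phys_addr_t bar_base, size_t size)
{
	dma_addr_t dma_addr;
	int ret;

	ret = sg_alloc_table(sgt, 1, GFP_KERNEL);
	if (ret)
		return ret;

	dma_addr = dma_map_resource(importer, bar_base, size,
				    DMA_BIDIRECTIONAL, 0);
	if (dma_mapping_error(importer, dma_addr)) {
		sg_free_table(sgt);
		return -ENOMEM;
	}

	/* Only the DMA side of the entry is filled in; sg_page() stays unusable. */
	sg_dma_address(sgt->sgl) = dma_addr;
	sg_dma_len(sgt->sgl) = size;
	return 0;
}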

Oded

>
> Jason

^ permalink raw reply	[flat|nested] 143+ messages in thread

* Re: [Linaro-mm-sig] [PATCH v3 1/2] habanalabs: define uAPI to export FD for DMA-BUF
  2021-06-23 19:39                                                 ` Oded Gabbay
  (?)
@ 2021-06-24  0:45                                                   ` Jason Gunthorpe
  -1 siblings, 0 replies; 143+ messages in thread
From: Jason Gunthorpe @ 2021-06-24  0:45 UTC (permalink / raw)
  To: Oded Gabbay
  Cc: Christian König, Christian König, Gal Pressman, sleybo,
	linux-rdma, Oded Gabbay, Christoph Hellwig,
	Linux Kernel Mailing List, dri-devel,
	moderated list:DMA BUFFER SHARING FRAMEWORK, Doug Ledford,
	Tomer Tayar, amd-gfx list, Greg KH, Alex Deucher,
	Leon Romanovsky, open list:DMA BUFFER SHARING FRAMEWORK

On Wed, Jun 23, 2021 at 10:39:48PM +0300, Oded Gabbay wrote:
> On Wed, Jun 23, 2021 at 10:34 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> >
> > On Wed, Jun 23, 2021 at 10:00:29PM +0300, Oded Gabbay wrote:
> > > On Wed, Jun 23, 2021 at 9:50 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> > > >
> > > > On Wed, Jun 23, 2021 at 09:43:04PM +0300, Oded Gabbay wrote:
> > > >
> > > > > Can you please explain why it is so important to (allow) access them
> > > > > through the CPU ?
> > > >
> > > > It is not so much important, as it reflects significant design choices
> > > > that are already tightly baked into alot of our stacks.
> > > >
> > > > A SGL is CPU accessible by design - that is baked into this thing and
> > > > places all over the place assume it. Even in RDMA we have
> > > > RXE/SWI/HFI1/qib that might want to use the CPU side (grep for sg_page
> > > > to see)
> > > >
> > > > So, the thing at the top of the stack - in this case the gaudi driver
> > > > - simply can't assume what the rest of the stack is going to do and
> > > > omit the CPU side. It breaks everything.
> > > >
> > > > Logan's patch series is the most fully developed way out of this
> > > > predicament so far.
> > >
> > > I understand the argument and I agree that for the generic case, the
> > > top of the stack can't assume anything.
> > > Having said that, in this case the SGL is encapsulated inside a dma-buf object.
> > >
> > > Maybe its a stupid/over-simplified suggestion, but can't we add a
> > > property to the dma-buf object,
> > > that will be set by the exporter, which will "tell" the importer it
> > > can't use any CPU fallback ? Only "real" p2p ?
> >
> > The block stack has been trying to do something like this.
> >
> > The flag doesn't solve the DMA API/IOMMU problems though.
> hmm, I thought using dma_map_resource will solve the IOMMU issues,
> no ?

dma_map_resource() will configure the IOMMU, but it is not the correct
API to use when building an SG list for DMA; that would be dma_map_sg()
or dma_map_sgtable().

So it works, but it is an abuse of the API to build things this way.

> If I use dma_map_resource to set the addresses inside the SGL before I
> export the dma-buf, and guarantee no one will use the SGL in the
> dma-buf for any other purpose than device p2p, what else is needed ?

You still have to check the P2P support to ensure that P2P is even
possible.

And this approach is misusing all the APIs and has been NAK'd by
Christoph, so it is up to Greg whether he wants to take it or insist
you work with Logan to get the proper generalized solution finished.

Jason

^ permalink raw reply	[flat|nested] 143+ messages in thread

* Re: [Linaro-mm-sig] [PATCH v3 1/2] habanalabs: define uAPI to export FD for DMA-BUF
  2021-06-23 19:00                                             ` Oded Gabbay
@ 2021-06-24  5:34                                               ` Christoph Hellwig
  -1 siblings, 0 replies; 143+ messages in thread
From: Christoph Hellwig @ 2021-06-24  5:34 UTC (permalink / raw)
  To: Oded Gabbay
  Cc: Jason Gunthorpe, Christian König, Christian König,
	Gal Pressman, sleybo, linux-rdma, Oded Gabbay, Christoph Hellwig,
	Linux Kernel Mailing List, dri-devel,
	moderated list:DMA BUFFER SHARING FRAMEWORK, Doug Ledford,
	Tomer Tayar, amd-gfx list, Greg KH, Alex Deucher,
	Leon Romanovsky, open list:DMA BUFFER SHARING FRAMEWORK

On Wed, Jun 23, 2021 at 10:00:29PM +0300, Oded Gabbay wrote:
> I understand the argument and I agree that for the generic case, the
> top of the stack can't assume anything.
> Having said that, in this case the SGL is encapsulated inside a dma-buf object.

But the scatterlist is defined to have a valid page.  If in dma-bufs you
can't do that, dma-bufs are completely broken.  Apparently the GPU folks
can somehow live with that and deal with the pitfalls, but for dma-buf
users outside of their little fiefdom, where they arbitrarily break the
rules, it simply is not acceptable.
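
To make that concrete, the importer-side pattern being defended looks
roughly like the sketch below; software providers such as rxe or siw do
the equivalent when they fall back to CPU processing.  The helper name
is illustrative.

#include <linux/highmem.h>
#include <linux/scatterlist.h>
#include <linux/string.h>

/*
 * Sketch of a CPU fallback walking a scatterlist: every entry is
 * assumed to carry a valid struct page.  With a page-less SGL the
 * sg_page_iter_page() call below hands back garbage, which is the
 * objection.
 */
static void sketch_cpu_touch(struct sg_table *sgt)
{
	struct sg_page_iter piter;

	for_each_sgtable_page(sgt, &piter, 0) {
		void *va = kmap_local_page(sg_page_iter_page(&piter));

		memset(va, 0, PAGE_SIZE);	/* any CPU access */
		kunmap_local(va);
	}
}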

^ permalink raw reply	[flat|nested] 143+ messages in thread

* Re: [Linaro-mm-sig] [PATCH v3 1/2] habanalabs: define uAPI to export FD for DMA-BUF
  2021-06-23 19:39                                                 ` Oded Gabbay
@ 2021-06-24  5:40                                                   ` Christoph Hellwig
  -1 siblings, 0 replies; 143+ messages in thread
From: Christoph Hellwig @ 2021-06-24  5:40 UTC (permalink / raw)
  To: Oded Gabbay
  Cc: Jason Gunthorpe, Christian König, Christian König,
	Gal Pressman, sleybo, linux-rdma, Oded Gabbay, Christoph Hellwig,
	Linux Kernel Mailing List, dri-devel,
	moderated list:DMA BUFFER SHARING FRAMEWORK, Doug Ledford,
	Tomer Tayar, amd-gfx list, Greg KH, Alex Deucher,
	Leon Romanovsky, open list:DMA BUFFER SHARING FRAMEWORK

On Wed, Jun 23, 2021 at 10:39:48PM +0300, Oded Gabbay wrote:
> hmm, I thought using dma_map_resource will solve the IOMMU issues, no ?
> We talked about it yesterday, and you said that it will "work"
> (although I noticed a tone of reluctance when you said that).
> 
> If I use dma_map_resource to set the addresses inside the SGL before I
> export the dma-buf, and guarantee no one will use the SGL in the
> dma-buf for any other purpose than device p2p, what else is needed ?

dma_map_resource() works in the sense that it helps with mapping an
arbitrary phys_addr_t for DMA.  It does not take the various pitfalls of
PCI P2P into account, such as the offset between the CPU physical
address and the PCIe bus address, or the whole support for mapping between
two devices behind a switch without going through the limited root
port support.

Comparing dma_direct_map_resource/iommu_dma_map_resource with
pci_p2pdma_map_sg_attrs/__pci_p2pdma_map_sg should make that
very clear.

So if you want a non-page-based mapping you need a "resource"-level
version of pci_p2pdma_map_sg_attrs.  That is totally doable, and in fact
mostly trivial.  But no one has even looked into providing one and just
keeps arguing.
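
The shape of such a helper might look roughly like the following.  This
is a hypothetical sketch (no such function exists in the tree), and the
bus-offset handling it would need is exactly the part the current
p2pdma code keeps private:

#include <linux/pci.h>
#include <linux/pci-p2pdma.h>
#include <linux/dma-mapping.h>

/* Hypothetical sketch of a "resource-level" P2P mapping helper. */
static dma_addr_t sketch_p2pdma_map_resource(struct pci_dev *provider,
					     struct device *client,
					     phys_addr_t phys, size_t size,
					     enum dma_data_direction dir)
{
	struct device *clients[] = { client };

	/* Refuse outright if there is no usable P2P path between the two. */
	if (pci_p2pdma_distance_many(provider, clients, 1, true) < 0)
		return DMA_MAPPING_ERROR;

	/*
	 * Good enough for traffic routed through the host bridge; a real
	 * implementation would have to use the provider's CPU-to-PCI-bus
	 * offset for the behind-a-switch case instead of mapping the CPU
	 * physical address, which is the missing piece described above.
	 */
	return dma_map_resource(client, phys, size, dir, 0);
}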

^ permalink raw reply	[flat|nested] 143+ messages in thread

* Re: [Linaro-mm-sig] [PATCH v3 1/2] habanalabs: define uAPI to export FD for DMA-BUF
  2021-06-24  5:34                                               ` Christoph Hellwig
  (?)
@ 2021-06-24  8:07                                                 ` Christian König
  -1 siblings, 0 replies; 143+ messages in thread
From: Christian König @ 2021-06-24  8:07 UTC (permalink / raw)
  To: Christoph Hellwig, Oded Gabbay
  Cc: Jason Gunthorpe, Christian König, Gal Pressman, sleybo,
	linux-rdma, Oded Gabbay, Linux Kernel Mailing List, dri-devel,
	moderated list:DMA BUFFER SHARING FRAMEWORK, Doug Ledford,
	Tomer Tayar, amd-gfx list, Greg KH, Alex Deucher,
	Leon Romanovsky, open list:DMA BUFFER SHARING FRAMEWORK

Am 24.06.21 um 07:34 schrieb Christoph Hellwig:
> On Wed, Jun 23, 2021 at 10:00:29PM +0300, Oded Gabbay wrote:
>> I understand the argument and I agree that for the generic case, the
>> top of the stack can't assume anything.
>> Having said that, in this case the SGL is encapsulated inside a dma-buf object.
> But the scatterlist is defined to have a valid page.  If in dma-bufs you
> can't do that dmabufs are completely broken.  Apparently the gpu folks
> can somehow live with that and deal with the pitfals, but for dma-buf
> users outside of their little fiefdom were they arbitrarily break rules
> it simply is not acceptable.

The key point is that accessing the underlying pages is illegal even
when DMA-bufs are backed by system memory. Daniel even created a patch
which mangles the page pointers in the sg_tables used by DMA-buf to make
sure that people don't try to use them.

So the conclusion is that sg_table was just the wrong data structure for
the DMA-buf framework and we should have invented a new one.

But then people would have complained that we have duplicated
infrastructure (which is essentially true).

My best plan to get out of this mess is that we change the DMA-buf
interface to use an array of dma_addresses instead of the sg_table
object, and I have already been actively working on this for the last
few months.
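
To give an idea of the direction, a purely illustrative sketch of such
an interface could be as simple as the following (this is not an
existing kernel structure; the names are made up):

#include <linux/types.h>
#include <linux/dma-mapping.h>

/* Hypothetical: a DMA-buf mapping as a bare array of DMA address ranges. */
struct dma_buf_dma_range {
	dma_addr_t addr;
	unsigned int len;
};

struct dma_buf_dma_map {
	unsigned int nr_ranges;
	struct dma_buf_dma_range ranges[];	/* no struct page anywhere */
};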

Regards,
Christian.

^ permalink raw reply	[flat|nested] 143+ messages in thread

* Re: [Linaro-mm-sig] [PATCH v3 1/2] habanalabs: define uAPI to export FD for DMA-BUF
  2021-06-24  8:07                                                 ` Christian König
@ 2021-06-24  8:12                                                   ` Christoph Hellwig
  -1 siblings, 0 replies; 143+ messages in thread
From: Christoph Hellwig @ 2021-06-24  8:12 UTC (permalink / raw)
  To: Christian König
  Cc: Christoph Hellwig, Oded Gabbay, Jason Gunthorpe,
	Christian König, Gal Pressman, sleybo, linux-rdma,
	Oded Gabbay, Linux Kernel Mailing List, dri-devel,
	moderated list:DMA BUFFER SHARING FRAMEWORK, Doug Ledford,
	Tomer Tayar, amd-gfx list, Greg KH, Alex Deucher,
	Leon Romanovsky, open list:DMA BUFFER SHARING FRAMEWORK

On Thu, Jun 24, 2021 at 10:07:14AM +0200, Christian König wrote:
> The key point is that accessing the underlying pages even when DMA-bufs are 
> backed by system memory is illegal. Daniel even created a patch which 
> mangles the page pointers in sg_tables used by DMA-buf to make sure that 
> people don't try to use them.

Which is another goddamn layering violation of a subsystem that has no
business at all poking into the scatterlist structure, yes.

> So the conclusion is that using sg_table in the DMA-buf framework was just 
> the wrong data structure and we should have invented a new one.

I think so.

> But then people would have complained that we have a duplicated 
> infrastructure (which is essentially true).

I doubt it.  At least if you had actually talked to the relevant people,
which seems to be a major issue with what is going on in GPU land.

> My best plan to get out of this mess is that we change the DMA-buf 
> interface to use an array of dma_addresses instead of the sg_table object 
> and I have already been working on this actively the last few month.

Awesome!  I have a bit of related work on the DMA mapping subsystems, so
let's sync up as soon as you have some first sketches.

Btw, one thing I noticed when looking over the dma-buf instances is that
there is a lot of duplicated code for creating a sg_table from pages,
and then mapping it.  It would be good if we could move toward common
helpers instead of duplicating that all over again.

^ permalink raw reply	[flat|nested] 143+ messages in thread

* Re: [Linaro-mm-sig] [PATCH v3 1/2] habanalabs: define uAPI to export FD for DMA-BUF
  2021-06-24  8:12                                                   ` Christoph Hellwig
  (?)
@ 2021-06-24  9:52                                                     ` Christian König
  -1 siblings, 0 replies; 143+ messages in thread
From: Christian König @ 2021-06-24  9:52 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Oded Gabbay, Jason Gunthorpe, Christian König, Gal Pressman,
	sleybo, linux-rdma, Oded Gabbay, Linux Kernel Mailing List,
	dri-devel, moderated list:DMA BUFFER SHARING FRAMEWORK,
	Doug Ledford, Tomer Tayar, amd-gfx list, Greg KH, Alex Deucher,
	Leon Romanovsky, open list:DMA BUFFER SHARING FRAMEWORK

Am 24.06.21 um 10:12 schrieb Christoph Hellwig:
> On Thu, Jun 24, 2021 at 10:07:14AM +0200, Christian König wrote:
>> The key point is that accessing the underlying pages even when DMA-bufs are
>> backed by system memory is illegal. Daniel even created a patch which
>> mangles the page pointers in sg_tables used by DMA-buf to make sure that
>> people don't try to use them.
> Which is another goddamn layering violation of a subsystem that has no
> business at all poking into the scatterlist structure, yes.

Completely agree, but it is also the easiest way to get away from the
scatterlist as the transport vehicle for the dma_addresses.

[SNIP]

>> My best plan to get out of this mess is that we change the DMA-buf
>> interface to use an array of dma_addresses instead of the sg_table object
>> and I have already been working on this actively the last few month.
> Awesome!  I have a bit of related work on the DMA mapping subsystems, so
> let's sync up as soon as you have some first sketches.

Don't start cheering too fast.

I've already converted a bunch of the GPU drivers, but there are at
least 6 GPU drivers still needing to be fixed, and on top of that come
VA-API and a few others.

What are your plans for the DMA mapping subsystem?

> Btw, one thing I noticed when looking over the dma-buf instances is that
> there is a lot of duplicated code for creating a sg_table from pages,
> and then mapping it.  It would be good if we could move toward common
> helpers instead of duplicating that all over again.

Can you give an example?

Thanks,
Christian.

^ permalink raw reply	[flat|nested] 143+ messages in thread

* Re: [Linaro-mm-sig] [PATCH v3 1/2] habanalabs: define uAPI to export FD for DMA-BUF
  2021-06-24  9:52                                                     ` Christian König
@ 2021-06-24 13:22                                                       ` Christoph Hellwig
  -1 siblings, 0 replies; 143+ messages in thread
From: Christoph Hellwig @ 2021-06-24 13:22 UTC (permalink / raw)
  To: Christian König
  Cc: Christoph Hellwig, Oded Gabbay, Jason Gunthorpe,
	Christian König, Gal Pressman, sleybo, linux-rdma,
	Oded Gabbay, Linux Kernel Mailing List, dri-devel,
	moderated list:DMA BUFFER SHARING FRAMEWORK, Doug Ledford,
	Tomer Tayar, amd-gfx list, Greg KH, Alex Deucher,
	Leon Romanovsky, open list:DMA BUFFER SHARING FRAMEWORK

On Thu, Jun 24, 2021 at 11:52:47AM +0200, Christian König wrote:
> I've already converted a bunch of the GPU drivers, but there are at least 6 
> GPU still needing to be fixed and on top of that comes VA-API and a few 
> others.
>
> What are your plans for the DMA mapping subsystem?

Building a new API that allows batched DMA mapping without the scatterlist.
The main input for my use case would be bio_vecs, but I plan to make it
a little more flexible, and the output would be a list of [dma_addr,len]
tuples, with the API being flexible enough to just return a single
[dma_addr,len] for the common IOMMU coalescing case.
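
As a rough sketch of what that could look like (neither the structure
nor the function exists yet; the names are placeholders):

#include <linux/bvec.h>
#include <linux/dma-mapping.h>

/* Placeholder output type: one mapped range. */
struct dma_range {
	dma_addr_t addr;
	size_t len;
};

/*
 * Hypothetical batched mapping call: takes an array of bio_vecs and
 * fills 'out' with up to 'max_out' [dma_addr,len] ranges -- possibly
 * just one when the IOMMU coalesces the whole input -- returning the
 * number of ranges produced or a negative errno.
 */
int dma_map_ranges(struct device *dev, const struct bio_vec *bvecs,
		   int nr_bvecs, struct dma_range *out, int max_out,
		   enum dma_data_direction dir, unsigned long attrs);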

>
>> Btw, one thing I noticed when looking over the dma-buf instances is that
>> there is a lot of duplicated code for creating a sg_table from pages,
>> and then mapping it.  It would be good if we could move toward common
>> helpers instead of duplicating that all over again.
>
> Can you give an example?

Take a look at the get_sg_table and put_sg_table helpers in udmabuf.
Those would also be useful in armada, i915, tegra, gntdev-dmabuf and
mbochs in one form or another.

Similarly for variants that use contiguous regions.
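
Roughly the kind of shared helper being suggested, sketched here with an
illustrative name (udmabuf open-codes the same steps locally today):

#include <linux/dma-mapping.h>
#include <linux/scatterlist.h>
#include <linux/slab.h>
#include <linux/err.h>

/* Sketch of a common "pages -> mapped sg_table" helper for exporters. */
static struct sg_table *sketch_map_pages(struct device *dev,
					 struct page **pages,
					 unsigned int npages,
					 enum dma_data_direction dir)
{
	struct sg_table *sgt;
	int ret;

	sgt = kzalloc(sizeof(*sgt), GFP_KERNEL);
	if (!sgt)
		return ERR_PTR(-ENOMEM);

	ret = sg_alloc_table_from_pages(sgt, pages, npages, 0,
					(unsigned long)npages << PAGE_SHIFT,
					GFP_KERNEL);
	if (ret)
		goto err_free;

	ret = dma_map_sgtable(dev, sgt, dir, 0);
	if (ret)
		goto err_table;

	return sgt;

err_table:
	sg_free_table(sgt);
err_free:
	kfree(sgt);
	return ERR_PTR(ret);
}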

^ permalink raw reply	[flat|nested] 143+ messages in thread

end of thread, other threads:[~2021-06-25  7:28 UTC | newest]

Thread overview: 143+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-06-18 12:36 [PATCH v3 1/2] habanalabs: define uAPI to export FD for DMA-BUF Oded Gabbay
2021-06-18 12:36 ` Oded Gabbay
2021-06-18 12:36 ` [PATCH v3 2/2] habanalabs: add support for dma-buf exporter Oded Gabbay
2021-06-18 12:36   ` Oded Gabbay
2021-06-21 12:28 ` [PATCH v3 1/2] habanalabs: define uAPI to export FD for DMA-BUF Daniel Vetter
2021-06-21 12:28   ` Daniel Vetter
2021-06-21 12:28   ` Daniel Vetter
2021-06-21 13:02   ` Greg KH
2021-06-21 13:02     ` Greg KH
2021-06-21 13:02     ` Greg KH
2021-06-21 14:12     ` Jason Gunthorpe
2021-06-21 14:12       ` Jason Gunthorpe
2021-06-21 14:12       ` Jason Gunthorpe
2021-06-21 16:26       ` Oded Gabbay
2021-06-21 16:26         ` Oded Gabbay
2021-06-21 16:26         ` Oded Gabbay
2021-06-21 17:55         ` Jason Gunthorpe
2021-06-21 17:55           ` Jason Gunthorpe
2021-06-21 17:55           ` Jason Gunthorpe
2021-06-21 18:27           ` Daniel Vetter
2021-06-21 18:27             ` Daniel Vetter
2021-06-21 18:27             ` Daniel Vetter
2021-06-21 19:24             ` Oded Gabbay
2021-06-21 19:24               ` Oded Gabbay
2021-06-21 19:24               ` Oded Gabbay
2021-06-21 23:29               ` Jason Gunthorpe
2021-06-21 23:29                 ` Jason Gunthorpe
2021-06-21 23:29                 ` Jason Gunthorpe
2021-06-22  6:37                 ` [Linaro-mm-sig] " Christian König
2021-06-22  6:37                   ` Christian König
2021-06-22  6:37                   ` Christian König
2021-06-22  8:42                   ` Oded Gabbay
2021-06-22  8:42                     ` Oded Gabbay
2021-06-22  8:42                     ` Oded Gabbay
2021-06-22 12:01                     ` Jason Gunthorpe
2021-06-22 12:01                       ` Jason Gunthorpe
2021-06-22 12:01                       ` Jason Gunthorpe
2021-06-22 12:04                       ` Oded Gabbay
2021-06-22 12:04                         ` Oded Gabbay
2021-06-22 12:04                         ` Oded Gabbay
2021-06-22 12:15                         ` Jason Gunthorpe
2021-06-22 12:15                           ` Jason Gunthorpe
2021-06-22 12:15                           ` Jason Gunthorpe
2021-06-22 13:12                           ` Oded Gabbay
2021-06-22 13:12                             ` Oded Gabbay
2021-06-22 13:12                             ` Oded Gabbay
2021-06-22 15:11                             ` Jason Gunthorpe
2021-06-22 15:11                               ` Jason Gunthorpe
2021-06-22 15:11                               ` Jason Gunthorpe
2021-06-22 15:24                               ` Christian König
2021-06-22 15:24                                 ` Christian König
2021-06-22 15:24                                 ` Christian König
2021-06-22 15:28                                 ` Jason Gunthorpe
2021-06-22 15:28                                   ` Jason Gunthorpe
2021-06-22 15:28                                   ` Jason Gunthorpe
2021-06-22 15:31                                   ` Oded Gabbay
2021-06-22 15:31                                     ` Oded Gabbay
2021-06-22 15:31                                     ` Oded Gabbay
2021-06-22 15:31                                   ` Christian König
2021-06-22 15:31                                     ` Christian König
2021-06-22 15:31                                     ` Christian König
2021-06-22 15:40                                     ` Oded Gabbay
2021-06-22 15:40                                       ` Oded Gabbay
2021-06-22 15:40                                       ` Oded Gabbay
2021-06-22 15:49                                       ` Christian König
2021-06-22 15:49                                         ` Christian König
2021-06-22 15:49                                         ` Christian König
2021-06-22 15:24                               ` Oded Gabbay
2021-06-22 15:24                                 ` Oded Gabbay
2021-06-22 15:24                                 ` Oded Gabbay
2021-06-22 15:34                                 ` Jason Gunthorpe
2021-06-22 15:34                                   ` Jason Gunthorpe
2021-06-22 15:34                                   ` Jason Gunthorpe
2021-06-22 12:23                       ` Christian König
2021-06-22 12:23                         ` Christian König
2021-06-22 12:23                         ` Christian König
2021-06-22 15:23                         ` Jason Gunthorpe
2021-06-22 15:23                           ` Jason Gunthorpe
2021-06-22 15:23                           ` Jason Gunthorpe
2021-06-22 15:29                           ` Christian König
2021-06-22 15:29                             ` Christian König
2021-06-22 15:29                             ` Christian König
2021-06-22 15:40                             ` Jason Gunthorpe
2021-06-22 15:40                               ` Jason Gunthorpe
2021-06-22 15:40                               ` Jason Gunthorpe
2021-06-22 15:48                               ` Christian König
2021-06-22 15:48                                 ` Christian König
2021-06-22 15:48                                 ` Christian König
2021-06-22 16:05                                 ` Jason Gunthorpe
2021-06-22 16:05                                   ` Jason Gunthorpe
2021-06-22 16:05                                   ` Jason Gunthorpe
2021-06-23  8:57                                   ` Christian König
2021-06-23  8:57                                     ` Christian König
2021-06-23  8:57                                     ` Christian König
2021-06-23  9:14                                     ` Oded Gabbay
2021-06-23  9:14                                       ` Oded Gabbay
2021-06-23  9:14                                       ` Oded Gabbay
2021-06-23 18:24                                     ` Jason Gunthorpe
2021-06-23 18:24                                       ` Jason Gunthorpe
2021-06-23 18:24                                       ` Jason Gunthorpe
2021-06-23 18:43                                       ` Oded Gabbay
2021-06-23 18:43                                         ` Oded Gabbay
2021-06-23 18:43                                         ` Oded Gabbay
2021-06-23 18:50                                         ` Jason Gunthorpe
2021-06-23 18:50                                           ` Jason Gunthorpe
2021-06-23 18:50                                           ` Jason Gunthorpe
2021-06-23 19:00                                           ` Oded Gabbay
2021-06-23 19:00                                             ` Oded Gabbay
2021-06-23 19:00                                             ` Oded Gabbay
2021-06-23 19:34                                             ` Jason Gunthorpe
2021-06-23 19:34                                               ` Jason Gunthorpe
2021-06-23 19:34                                               ` Jason Gunthorpe
2021-06-23 19:39                                               ` Oded Gabbay
2021-06-23 19:39                                                 ` Oded Gabbay
2021-06-23 19:39                                                 ` Oded Gabbay
2021-06-24  0:45                                                 ` Jason Gunthorpe
2021-06-24  0:45                                                   ` Jason Gunthorpe
2021-06-24  0:45                                                   ` Jason Gunthorpe
2021-06-24  5:40                                                 ` Christoph Hellwig
2021-06-24  5:40                                                   ` Christoph Hellwig
2021-06-24  5:34                                             ` Christoph Hellwig
2021-06-24  5:34                                               ` Christoph Hellwig
2021-06-24  8:07                                               ` Christian König
2021-06-24  8:07                                                 ` Christian König
2021-06-24  8:07                                                 ` Christian König
2021-06-24  8:12                                                 ` Christoph Hellwig
2021-06-24  8:12                                                   ` Christoph Hellwig
2021-06-24  9:52                                                   ` Christian König
2021-06-24  9:52                                                     ` Christian König
2021-06-24  9:52                                                     ` Christian König
2021-06-24 13:22                                                     ` Christoph Hellwig
2021-06-24 13:22                                                       ` Christoph Hellwig
2021-06-22 16:50                             ` Felix Kuehling
2021-06-22 16:50                               ` Felix Kuehling
2021-06-22 16:50                               ` Felix Kuehling
2021-06-21 14:20     ` Daniel Vetter
2021-06-21 14:20       ` Daniel Vetter
2021-06-21 14:20       ` Daniel Vetter
2021-06-21 14:49       ` Jason Gunthorpe
2021-06-21 14:49         ` Jason Gunthorpe
2021-06-21 14:17   ` Jason Gunthorpe
2021-06-21 14:17     ` Jason Gunthorpe
2021-06-21 14:17     ` Jason Gunthorpe
