linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 1/2] habanalabs: support vmalloc memory mapping
@ 2019-10-10 14:06 Omer Shpigelman
  2019-10-10 14:06 ` [PATCH 2/2] habanalabs: handle large memory on MMU Omer Shpigelman
  2019-10-10 14:09 ` [PATCH 1/2] habanalabs: support vmalloc memory mapping Christoph Hellwig
  0 siblings, 2 replies; 8+ messages in thread
From: Omer Shpigelman @ 2019-10-10 14:06 UTC (permalink / raw)
  To: oded.gabbay; +Cc: linux-kernel

This patch adds support in mapping of vmalloc memory.
In contrary to user memory, vmalloc memory is already pinned and has no
vm_area structure. Therefore a new capability was needed in order to map
this memory.
Mapping vmalloc memory is needed for Gaudi ASIC.

Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai>
---
 drivers/misc/habanalabs/goya/goya.c  |   2 +-
 drivers/misc/habanalabs/habanalabs.h |  13 +-
 drivers/misc/habanalabs/memory.c     | 405 ++++++++++++++++-----------
 3 files changed, 261 insertions(+), 159 deletions(-)

diff --git a/drivers/misc/habanalabs/goya/goya.c b/drivers/misc/habanalabs/goya/goya.c
index d49f5ecd903b..949d3c54b351 100644
--- a/drivers/misc/habanalabs/goya/goya.c
+++ b/drivers/misc/habanalabs/goya/goya.c
@@ -3936,7 +3936,7 @@ static int goya_parse_cb_no_ext_queue(struct hl_device *hdev,
 		return 0;
 
 	dev_err(hdev->dev,
-		"Internal CB address %px + 0x%x is not in SRAM nor in DRAM\n",
+		"Internal CB address 0x%px + 0x%x is not in SRAM nor in DRAM\n",
 		parser->user_cb, parser->user_cb_size);
 
 	return -EFAULT;
diff --git a/drivers/misc/habanalabs/habanalabs.h b/drivers/misc/habanalabs/habanalabs.h
index 91445371b08b..8577f29f54fe 100644
--- a/drivers/misc/habanalabs/habanalabs.h
+++ b/drivers/misc/habanalabs/habanalabs.h
@@ -1527,9 +1527,20 @@ void hl_vm_ctx_fini(struct hl_ctx *ctx);
 int hl_vm_init(struct hl_device *hdev);
 void hl_vm_fini(struct hl_device *hdev);
 
+int hl_dma_map_host_va(struct hl_device *hdev, u64 addr, u64 size,
+				struct hl_userptr **p_userptr);
+void hl_dma_unmap_host_va(struct hl_device *hdev, struct hl_userptr *userptr);
+int hl_init_phys_pg_pack_from_userptr(u32 asid, struct hl_userptr *userptr,
+				struct hl_vm_phys_pg_pack **pphys_pg_pack);
+void hl_free_phys_pg_pack(struct hl_device *hdev,
+				struct hl_vm_phys_pg_pack *phys_pg_pack);
+int hl_map_phys_pg_pack(struct hl_ctx *ctx, u64 vaddr,
+				struct hl_vm_phys_pg_pack *phys_pg_pack);
+void hl_unmap_phys_pg_pack(struct hl_ctx *ctx, u64 vaddr,
+				struct hl_vm_phys_pg_pack *phys_pg_pack);
 int hl_pin_host_memory(struct hl_device *hdev, u64 addr, u64 size,
 			struct hl_userptr *userptr);
-int hl_unpin_host_memory(struct hl_device *hdev, struct hl_userptr *userptr);
+void hl_unpin_host_memory(struct hl_device *hdev, struct hl_userptr *userptr);
 void hl_userptr_delete_list(struct hl_device *hdev,
 				struct list_head *userptr_list);
 bool hl_userptr_is_pinned(struct hl_device *hdev, u64 addr, u32 size,
diff --git a/drivers/misc/habanalabs/memory.c b/drivers/misc/habanalabs/memory.c
index 365fb0cb8dff..cec4155533af 100644
--- a/drivers/misc/habanalabs/memory.c
+++ b/drivers/misc/habanalabs/memory.c
@@ -159,20 +159,20 @@ static int alloc_device_memory(struct hl_ctx *ctx, struct hl_mem_in *args,
 }
 
 /*
- * get_userptr_from_host_va - initialize userptr structure from given host
- *                            virtual address
- *
- * @hdev                : habanalabs device structure
- * @args                : parameters containing the virtual address and size
- * @p_userptr           : pointer to result userptr structure
+ * hl_dma_map_host_va() - DMA mapping of the given host virtual address.
+ * @hdev: habanalabs device structure.
+ * @addr: the host virtual address of the memory area.
+ * @size: the size of the memory area.
+ * @p_userptr: pointer to result userptr structure.
  *
  * This function does the following:
- * - Allocate userptr structure
- * - Pin the given host memory using the userptr structure
- * - Perform DMA mapping to have the DMA addresses of the pages
+ * - Allocate userptr structure.
+ * - Pin the given host memory using the userptr structure - for user memory
+ *   only.
+ * - Perform DMA mapping to have the DMA addresses of the pages.
  */
-static int get_userptr_from_host_va(struct hl_device *hdev,
-		struct hl_mem_in *args, struct hl_userptr **p_userptr)
+int hl_dma_map_host_va(struct hl_device *hdev, u64 addr, u64 size,
+					struct hl_userptr **p_userptr)
 {
 	struct hl_userptr *userptr;
 	int rc;
@@ -183,8 +183,7 @@ static int get_userptr_from_host_va(struct hl_device *hdev,
 		goto userptr_err;
 	}
 
-	rc = hl_pin_host_memory(hdev, args->map_host.host_virt_addr,
-			args->map_host.mem_size, userptr);
+	rc = hl_pin_host_memory(hdev, addr, size, userptr);
 	if (rc) {
 		dev_err(hdev->dev, "Failed to pin host memory\n");
 		goto pin_err;
@@ -215,16 +214,15 @@ static int get_userptr_from_host_va(struct hl_device *hdev,
 }
 
 /*
- * free_userptr - free userptr structure
- *
- * @hdev                : habanalabs device structure
- * @userptr             : userptr to free
+ * hl_dma_unmap_host_va() - DMA unmapping of the given host virtual address.
+ * @hdev: habanalabs device structure.
+ * @userptr: userptr to free.
  *
  * This function does the following:
- * - Unpins the physical pages
- * - Frees the userptr structure
+ * - Unpins the physical pages.
+ * - Frees the userptr structure.
  */
-static void free_userptr(struct hl_device *hdev, struct hl_userptr *userptr)
+void hl_dma_unmap_host_va(struct hl_device *hdev, struct hl_userptr *userptr)
 {
 	hl_unpin_host_memory(hdev, userptr);
 	kfree(userptr);
@@ -253,18 +251,17 @@ static void dram_pg_pool_do_release(struct kref *ref)
 }
 
 /*
- * free_phys_pg_pack   - free physical page pack
- *
- * @hdev               : habanalabs device structure
- * @phys_pg_pack       : physical page pack to free
+ * hl_free_phys_pg_pack() - free physical page pack.
+ * @hdev: habanalabs device structure.
+ * @phys_pg_pack: physical page pack to free.
  *
  * This function does the following:
- * - For DRAM memory only, iterate over the pack and free each physical block
- *   structure by returning it to the general pool
- * - Free the hl_vm_phys_pg_pack structure
+ * - For DRAM memory only, iterate over the pack and free each physical block.
+ *   structure by returning it to the general pool.
+ * - Free the hl_vm_phys_pg_pack structure.
  */
-static void free_phys_pg_pack(struct hl_device *hdev,
-		struct hl_vm_phys_pg_pack *phys_pg_pack)
+void hl_free_phys_pg_pack(struct hl_device *hdev,
+				struct hl_vm_phys_pg_pack *phys_pg_pack)
 {
 	struct hl_vm *vm = &hdev->vm;
 	u64 i;
@@ -328,7 +325,7 @@ static int free_device_memory(struct hl_ctx *ctx, u32 handle)
 		atomic64_sub(phys_pg_pack->total_size, &ctx->dram_phys_mem);
 		atomic64_sub(phys_pg_pack->total_size, &hdev->dram_used_mem);
 
-		free_phys_pg_pack(hdev, phys_pg_pack);
+		hl_free_phys_pg_pack(hdev, phys_pg_pack);
 	} else {
 		spin_unlock(&vm->idr_lock);
 		dev_err(hdev->dev,
@@ -630,21 +627,19 @@ static u32 get_sg_info(struct scatterlist *sg, dma_addr_t *dma_addr)
 }
 
 /*
- * init_phys_pg_pack_from_userptr - initialize physical page pack from host
- *                                   memory
- *
- * @ctx                : current context
- * @userptr            : userptr to initialize from
- * @pphys_pg_pack      : res pointer
+ * hl_init_phys_pg_pack_from_userptr() - initialize physical page pack from host
+ *                                       memory
+ * @asid: current context ASID.
+ * @userptr: userptr to initialize from.
+ * @pphys_pg_pack: result pointer.
  *
  * This function does the following:
- * - Pin the physical pages related to the given virtual block
+ * - Pin the physical pages related to the given virtual block.
  * - Create a physical page pack from the physical pages related to the given
- *   virtual block
+ *   virtual block.
  */
-static int init_phys_pg_pack_from_userptr(struct hl_ctx *ctx,
-		struct hl_userptr *userptr,
-		struct hl_vm_phys_pg_pack **pphys_pg_pack)
+int hl_init_phys_pg_pack_from_userptr(u32 asid, struct hl_userptr *userptr,
+				struct hl_vm_phys_pg_pack **pphys_pg_pack)
 {
 	struct hl_vm_phys_pg_pack *phys_pg_pack;
 	struct scatterlist *sg;
@@ -660,7 +655,7 @@ static int init_phys_pg_pack_from_userptr(struct hl_ctx *ctx,
 
 	phys_pg_pack->vm_type = userptr->vm_type;
 	phys_pg_pack->created_from_userptr = true;
-	phys_pg_pack->asid = ctx->asid;
+	phys_pg_pack->asid = asid;
 	atomic_set(&phys_pg_pack->mapping_cnt, 1);
 
 	/* Only if all dma_addrs are aligned to 2MB and their
@@ -731,19 +726,18 @@ static int init_phys_pg_pack_from_userptr(struct hl_ctx *ctx,
 }
 
 /*
- * map_phys_page_pack - maps the physical page pack
- *
- * @ctx                : current context
- * @vaddr              : start address of the virtual area to map from
- * @phys_pg_pack       : the pack of physical pages to map to
+ * hl_map_phys_pg_pack() - maps the physical page pack.
+ * @ctx: current context.
+ * @vaddr: start address of the virtual area to map from.
+ * @phys_pg_pack: the pack of physical pages to map to.
  *
  * This function does the following:
- * - Maps each chunk of virtual memory to matching physical chunk
- * - Stores number of successful mappings in the given argument
+ * - Maps each chunk of virtual memory to matching physical chunk.
+ * - Stores number of successful mappings in the given argument.
  * - Returns 0 on success, error code otherwise.
  */
-static int map_phys_page_pack(struct hl_ctx *ctx, u64 vaddr,
-		struct hl_vm_phys_pg_pack *phys_pg_pack)
+int hl_map_phys_pg_pack(struct hl_ctx *ctx, u64 vaddr,
+				struct hl_vm_phys_pg_pack *phys_pg_pack)
 {
 	struct hl_device *hdev = ctx->hdev;
 	u64 next_vaddr = vaddr, paddr, mapped_pg_cnt = 0, i;
@@ -783,6 +777,36 @@ static int map_phys_page_pack(struct hl_ctx *ctx, u64 vaddr,
 	return rc;
 }
 
+/*
+ * hl_unmap_phys_pg_pack() - unmaps the physical page pack.
+ * @ctx: current context.
+ * @vaddr: start address of the virtual area to unmap.
+ * @phys_pg_pack: the pack of physical pages to unmap.
+ */
+void hl_unmap_phys_pg_pack(struct hl_ctx *ctx, u64 vaddr,
+				struct hl_vm_phys_pg_pack *phys_pg_pack)
+{
+	struct hl_device *hdev = ctx->hdev;
+	u64 next_vaddr, i;
+	u32 page_size;
+
+	page_size = phys_pg_pack->page_size;
+	next_vaddr = vaddr;
+
+	for (i = 0 ; i < phys_pg_pack->npages ; i++, next_vaddr += page_size) {
+		if (hl_mmu_unmap(ctx, next_vaddr, page_size))
+			dev_warn_ratelimited(hdev->dev,
+			"unmap failed for vaddr: 0x%llx\n", next_vaddr);
+
+		/*
+		 * unmapping on Palladium can be really long, so avoid a CPU
+		 * soft lockup bug by sleeping a little between unmapping pages
+		 */
+		if (hdev->pldm)
+			usleep_range(500, 1000);
+	}
+}
+
 static int get_paddr_from_handle(struct hl_ctx *ctx, struct hl_mem_in *args,
 				u64 *paddr)
 {
@@ -827,7 +851,7 @@ static int map_device_va(struct hl_ctx *ctx, struct hl_mem_in *args,
 	struct hl_device *hdev = ctx->hdev;
 	struct hl_vm *vm = &hdev->vm;
 	struct hl_vm_phys_pg_pack *phys_pg_pack;
-	struct hl_userptr *userptr = NULL;
+	struct hl_userptr *userptr;
 	struct hl_vm_hash_node *hnode;
 	enum vm_type_t *vm_type;
 	u64 ret_vaddr, hint_addr;
@@ -839,18 +863,21 @@ static int map_device_va(struct hl_ctx *ctx, struct hl_mem_in *args,
 	*device_addr = 0;
 
 	if (is_userptr) {
-		rc = get_userptr_from_host_va(hdev, args, &userptr);
+		u64 addr = args->map_host.host_virt_addr,
+			size = args->map_host.mem_size;
+
+		rc = hl_dma_map_host_va(hdev, addr, size, &userptr);
 		if (rc) {
 			dev_err(hdev->dev, "failed to get userptr from va\n");
 			return rc;
 		}
 
-		rc = init_phys_pg_pack_from_userptr(ctx, userptr,
+		rc = hl_init_phys_pg_pack_from_userptr(ctx->asid, userptr,
 				&phys_pg_pack);
 		if (rc) {
 			dev_err(hdev->dev,
 				"unable to init page pack for vaddr 0x%llx\n",
-				args->map_host.host_virt_addr);
+				addr);
 			goto init_page_pack_err;
 		}
 
@@ -909,7 +936,7 @@ static int map_device_va(struct hl_ctx *ctx, struct hl_mem_in *args,
 
 	mutex_lock(&ctx->mmu_lock);
 
-	rc = map_phys_page_pack(ctx, ret_vaddr, phys_pg_pack);
+	rc = hl_map_phys_pg_pack(ctx, ret_vaddr, phys_pg_pack);
 	if (rc) {
 		mutex_unlock(&ctx->mmu_lock);
 		dev_err(hdev->dev, "mapping page pack failed for handle %u\n",
@@ -933,7 +960,7 @@ static int map_device_va(struct hl_ctx *ctx, struct hl_mem_in *args,
 	*device_addr = ret_vaddr;
 
 	if (is_userptr)
-		free_phys_pg_pack(hdev, phys_pg_pack);
+		hl_free_phys_pg_pack(hdev, phys_pg_pack);
 
 	return 0;
 
@@ -952,10 +979,10 @@ static int map_device_va(struct hl_ctx *ctx, struct hl_mem_in *args,
 shared_err:
 	atomic_dec(&phys_pg_pack->mapping_cnt);
 	if (is_userptr)
-		free_phys_pg_pack(hdev, phys_pg_pack);
+		hl_free_phys_pg_pack(hdev, phys_pg_pack);
 init_page_pack_err:
 	if (is_userptr)
-		free_userptr(hdev, userptr);
+		hl_dma_unmap_host_va(hdev, userptr);
 
 	return rc;
 }
@@ -977,8 +1004,6 @@ static int unmap_device_va(struct hl_ctx *ctx, u64 vaddr)
 	struct hl_vm_hash_node *hnode = NULL;
 	struct hl_userptr *userptr = NULL;
 	enum vm_type_t *vm_type;
-	u64 next_vaddr, i;
-	u32 page_size;
 	bool is_userptr;
 	int rc;
 
@@ -1004,8 +1029,8 @@ static int unmap_device_va(struct hl_ctx *ctx, u64 vaddr)
 	if (*vm_type == VM_TYPE_USERPTR) {
 		is_userptr = true;
 		userptr = hnode->ptr;
-		rc = init_phys_pg_pack_from_userptr(ctx, userptr,
-				&phys_pg_pack);
+		rc = hl_init_phys_pg_pack_from_userptr(ctx->asid, userptr,
+							&phys_pg_pack);
 		if (rc) {
 			dev_err(hdev->dev,
 				"unable to init page pack for vaddr 0x%llx\n",
@@ -1029,24 +1054,11 @@ static int unmap_device_va(struct hl_ctx *ctx, u64 vaddr)
 		goto mapping_cnt_err;
 	}
 
-	page_size = phys_pg_pack->page_size;
-	vaddr &= ~(((u64) page_size) - 1);
-
-	next_vaddr = vaddr;
+	vaddr &= ~(((u64) phys_pg_pack->page_size) - 1);
 
 	mutex_lock(&ctx->mmu_lock);
 
-	for (i = 0 ; i < phys_pg_pack->npages ; i++, next_vaddr += page_size) {
-		if (hl_mmu_unmap(ctx, next_vaddr, page_size))
-			dev_warn_ratelimited(hdev->dev,
-			"unmap failed for vaddr: 0x%llx\n", next_vaddr);
-
-		/* unmapping on Palladium can be really long, so avoid a CPU
-		 * soft lockup bug by sleeping a little between unmapping pages
-		 */
-		if (hdev->pldm)
-			usleep_range(500, 1000);
-	}
+	hl_unmap_phys_pg_pack(ctx, vaddr, phys_pg_pack);
 
 	hdev->asic_funcs->mmu_invalidate_cache(hdev, true);
 
@@ -1063,15 +1075,15 @@ static int unmap_device_va(struct hl_ctx *ctx, u64 vaddr)
 	kfree(hnode);
 
 	if (is_userptr) {
-		free_phys_pg_pack(hdev, phys_pg_pack);
-		free_userptr(hdev, userptr);
+		hl_free_phys_pg_pack(hdev, phys_pg_pack);
+		hl_dma_unmap_host_va(hdev, userptr);
 	}
 
 	return 0;
 
 mapping_cnt_err:
 	if (is_userptr)
-		free_phys_pg_pack(hdev, phys_pg_pack);
+		hl_free_phys_pg_pack(hdev, phys_pg_pack);
 vm_type_err:
 	mutex_lock(&ctx->mem_hash_lock);
 	hash_add(ctx->mem_hash, &hnode->node, vaddr);
@@ -1203,56 +1215,60 @@ int hl_mem_ioctl(struct hl_fpriv *hpriv, void *data)
 	return rc;
 }
 
-/*
- * hl_pin_host_memory - pins a chunk of host memory
- *
- * @hdev                : pointer to the habanalabs device structure
- * @addr                : the user-space virtual address of the memory area
- * @size                : the size of the memory area
- * @userptr	        : pointer to hl_userptr structure
- *
- * This function does the following:
- * - Pins the physical pages
- * - Create a SG list from those pages
- */
-int hl_pin_host_memory(struct hl_device *hdev, u64 addr, u64 size,
-			struct hl_userptr *userptr)
+static int init_sg_list_for_vmalloc_memory(struct hl_device *hdev, u64 addr,
+						u64 size, u32 npages,
+						u32 offset,
+						struct sg_table *sgt)
 {
-	u64 start, end;
-	u32 npages, offset;
-	int rc;
+	struct page *page, **pages;
+	u64 tmp_addr;
+	int i, rc;
 
-	if (!size) {
-		dev_err(hdev->dev, "size to pin is invalid - %llu\n", size);
-		return -EINVAL;
-	}
+	pages = kmalloc_array(npages, sizeof(struct page *), GFP_KERNEL);
+	if (!pages)
+		return -ENOMEM;
 
-	if (!access_ok((void __user *) (uintptr_t) addr, size)) {
-		dev_err(hdev->dev, "user pointer is invalid - 0x%llx\n", addr);
-		return -EFAULT;
+	tmp_addr = addr;
+
+	for (i = 0 ; i < npages ; i++) {
+		page = vmalloc_to_page((void *) (uintptr_t) tmp_addr);
+		if (!page) {
+			dev_err(hdev->dev,
+				"failed to get physical page for va 0x%llx\n",
+				tmp_addr);
+			rc = -EFAULT;
+			goto free_pages;
+		}
+		pages[i] = page;
+		tmp_addr += PAGE_SIZE;
 	}
 
-	/*
-	 * If the combination of the address and size requested for this memory
-	 * region causes an integer overflow, return error.
-	 */
-	if (((addr + size) < addr) ||
-			PAGE_ALIGN(addr + size) < (addr + size)) {
-		dev_err(hdev->dev,
-			"user pointer 0x%llx + %llu causes integer overflow\n",
-			addr, size);
-		return -EINVAL;
+	rc = sg_alloc_table_from_pages(sgt, pages, npages, offset, size,
+					GFP_KERNEL);
+	if (rc < 0) {
+		dev_err(hdev->dev, "failed to create SG table from pages\n");
+		goto free_pages;
 	}
 
-	start = addr & PAGE_MASK;
-	offset = addr & ~PAGE_MASK;
-	end = PAGE_ALIGN(addr + size);
-	npages = (end - start) >> PAGE_SHIFT;
+	kfree(pages);
 
-	userptr->size = size;
-	userptr->addr = addr;
-	userptr->dma_mapped = false;
-	INIT_LIST_HEAD(&userptr->job_node);
+	return 0;
+
+free_pages:
+	kfree(pages);
+	return rc;
+}
+
+static int get_user_memory(struct hl_device *hdev, u64 addr, u64 size,
+				u32 npages, u64 start, u32 offset,
+				struct hl_userptr *userptr)
+{
+	int rc;
+
+	if (!access_ok((void __user *) (uintptr_t) addr, size)) {
+		dev_err(hdev->dev, "user pointer is invalid - 0x%llx\n", addr);
+		return -EFAULT;
+	}
 
 	userptr->vec = frame_vector_create(npages);
 	if (!userptr->vec) {
@@ -1279,26 +1295,16 @@ int hl_pin_host_memory(struct hl_device *hdev, u64 addr, u64 size,
 		goto put_framevec;
 	}
 
-	userptr->sgt = kzalloc(sizeof(*userptr->sgt), GFP_ATOMIC);
-	if (!userptr->sgt) {
-		rc = -ENOMEM;
-		goto put_framevec;
-	}
-
 	rc = sg_alloc_table_from_pages(userptr->sgt,
 					frame_vector_pages(userptr->vec),
 					npages, offset, size, GFP_ATOMIC);
 	if (rc < 0) {
 		dev_err(hdev->dev, "failed to create SG table from pages\n");
-		goto free_sgt;
+		goto put_framevec;
 	}
 
-	hl_debugfs_add_userptr(hdev, userptr);
-
 	return 0;
 
-free_sgt:
-	kfree(userptr->sgt);
 put_framevec:
 	put_vaddr_frames(userptr->vec);
 destroy_framevec:
@@ -1307,43 +1313,128 @@ int hl_pin_host_memory(struct hl_device *hdev, u64 addr, u64 size,
 }
 
 /*
- * hl_unpin_host_memory - unpins a chunk of host memory
+ * hl_pin_host_memory() - pins a chunk of host memory.
+ * @hdev: pointer to the habanalabs device structure.
+ * @addr: the host virtual address of the memory area.
+ * @size: the size of the memory area.
+ * @userptr: pointer to hl_userptr structure.
  *
- * @hdev                : pointer to the habanalabs device structure
- * @userptr             : pointer to hl_userptr structure
+ * This function does the following:
+ * - Pins the physical pages - for user memory only, as vmalloc pages are
+ *   already pinned.
+ * - Create an SG list from those pages.
+ */
+int hl_pin_host_memory(struct hl_device *hdev, u64 addr, u64 size,
+					struct hl_userptr *userptr)
+{
+	u64 start, end;
+	u32 npages, offset;
+	int rc;
+
+	if (!size) {
+		dev_err(hdev->dev, "size to pin is invalid - %llu\n", size);
+		return -EINVAL;
+	}
+
+	/*
+	 * If the combination of the address and size requested for this memory
+	 * region causes an integer overflow, return error.
+	 */
+	if (((addr + size) < addr) ||
+			PAGE_ALIGN(addr + size) < (addr + size)) {
+		dev_err(hdev->dev,
+			"user pointer 0x%llx + %llu causes integer overflow\n",
+			addr, size);
+		return -EINVAL;
+	}
+
+	/*
+	 * This function can be called also from data path, hence use atomic
+	 * always as it is not a big allocation.
+	 */
+	userptr->sgt = kzalloc(sizeof(*userptr->sgt), GFP_ATOMIC);
+	if (!userptr->sgt)
+		return -ENOMEM;
+
+	start = addr & PAGE_MASK;
+	offset = addr & ~PAGE_MASK;
+	end = PAGE_ALIGN(addr + size);
+	npages = (end - start) >> PAGE_SHIFT;
+
+	userptr->size = size;
+	userptr->addr = addr;
+	userptr->dma_mapped = false;
+	INIT_LIST_HEAD(&userptr->job_node);
+
+	if (is_vmalloc_addr((void *) (uintptr_t) addr)) {
+		/* no need in pinning, only init an SG list */
+		rc = init_sg_list_for_vmalloc_memory(hdev, addr, size, npages,
+							offset, userptr->sgt);
+		if (rc) {
+			dev_err(hdev->dev,
+				"failed to init an SG list for vmalloc address 0x%llx\n",
+				addr);
+			goto free_sgt;
+		}
+	} else {
+		rc = get_user_memory(hdev, addr, size, npages, start, offset,
+					userptr);
+		if (rc) {
+			dev_err(hdev->dev,
+				"failed to get user memory for address 0x%llx\n",
+				addr);
+			goto free_sgt;
+		}
+	}
+
+	hl_debugfs_add_userptr(hdev, userptr);
+
+	return 0;
+
+free_sgt:
+	kfree(userptr->sgt);
+	return rc;
+}
+
+/*
+ * hl_unpin_host_memory() - unpins a chunk of host memory.
+ * @hdev: pointer to the habanalabs device structure.
+ * @userptr: pointer to hl_userptr structure.
  *
  * This function does the following:
- * - Unpins the physical pages related to the host memory
- * - Free the SG list
+ * - Unpins the physical pages related to the host memory - for user memory
+ *   only.
+ * - Free the SG list.
  */
-int hl_unpin_host_memory(struct hl_device *hdev, struct hl_userptr *userptr)
+void hl_unpin_host_memory(struct hl_device *hdev,
+					struct hl_userptr *userptr)
 {
 	struct page **pages;
 
 	hl_debugfs_remove_userptr(hdev, userptr);
 
 	if (userptr->dma_mapped)
-		hdev->asic_funcs->hl_dma_unmap_sg(hdev,
-				userptr->sgt->sgl,
-				userptr->sgt->nents,
-				userptr->dir);
-
-	pages = frame_vector_pages(userptr->vec);
-	if (!IS_ERR(pages)) {
-		int i;
-
-		for (i = 0; i < frame_vector_count(userptr->vec); i++)
-			set_page_dirty_lock(pages[i]);
+		hdev->asic_funcs->hl_dma_unmap_sg(hdev, userptr->sgt->sgl,
+							userptr->sgt->nents,
+							userptr->dir);
+
+	/* only user memory should be manually unpinned */
+	if (!is_vmalloc_addr((void *) (uintptr_t) userptr->addr)) {
+		pages = frame_vector_pages(userptr->vec);
+		if (!IS_ERR(pages)) {
+			int i;
+
+			for (i = 0; i < frame_vector_count(userptr->vec); i++)
+				set_page_dirty_lock(pages[i]);
+		}
+		put_vaddr_frames(userptr->vec);
+		frame_vector_destroy(userptr->vec);
 	}
-	put_vaddr_frames(userptr->vec);
-	frame_vector_destroy(userptr->vec);
 
 	list_del(&userptr->job_node);
 
 	sg_free_table(userptr->sgt);
 	kfree(userptr->sgt);
-
-	return 0;
 }
 
 /*
@@ -1627,11 +1718,11 @@ void hl_vm_ctx_fini(struct hl_ctx *ctx)
 	idr_for_each_entry(&vm->phys_pg_pack_handles, phys_pg_list, i)
 		if (phys_pg_list->asid == ctx->asid) {
 			dev_dbg(hdev->dev,
-				"page list 0x%p of asid %d is still alive\n",
+				"page list 0x%px of asid %d is still alive\n",
 				phys_pg_list, ctx->asid);
 			atomic64_sub(phys_pg_list->total_size,
 					&hdev->dram_used_mem);
-			free_phys_pg_pack(hdev, phys_pg_list);
+			hl_free_phys_pg_pack(hdev, phys_pg_list);
 			idr_remove(&vm->phys_pg_pack_handles, i);
 		}
 	spin_unlock(&vm->idr_lock);
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH 2/2] habanalabs: handle large memory on MMU
  2019-10-10 14:06 [PATCH 1/2] habanalabs: support vmalloc memory mapping Omer Shpigelman
@ 2019-10-10 14:06 ` Omer Shpigelman
  2019-10-10 14:09 ` [PATCH 1/2] habanalabs: support vmalloc memory mapping Christoph Hellwig
  1 sibling, 0 replies; 8+ messages in thread
From: Omer Shpigelman @ 2019-10-10 14:06 UTC (permalink / raw)
  To: oded.gabbay; +Cc: linux-kernel

This patch changes the allocation of the host memory pages array to use
vmalloc if needed. This in order to support mapping of large memory
chunks.

Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai>
---
 drivers/misc/habanalabs/memory.c | 12 +++---------
 1 file changed, 3 insertions(+), 9 deletions(-)

diff --git a/drivers/misc/habanalabs/memory.c b/drivers/misc/habanalabs/memory.c
index cec4155533af..0cce30922871 100644
--- a/drivers/misc/habanalabs/memory.c
+++ b/drivers/misc/habanalabs/memory.c
@@ -1224,7 +1224,7 @@ static int init_sg_list_for_vmalloc_memory(struct hl_device *hdev, u64 addr,
 	u64 tmp_addr;
 	int i, rc;
 
-	pages = kmalloc_array(npages, sizeof(struct page *), GFP_KERNEL);
+	pages = kvmalloc_array(npages, sizeof(struct page *), GFP_KERNEL);
 	if (!pages)
 		return -ENOMEM;
 
@@ -1245,17 +1245,11 @@ static int init_sg_list_for_vmalloc_memory(struct hl_device *hdev, u64 addr,
 
 	rc = sg_alloc_table_from_pages(sgt, pages, npages, offset, size,
 					GFP_KERNEL);
-	if (rc < 0) {
+	if (rc < 0)
 		dev_err(hdev->dev, "failed to create SG table from pages\n");
-		goto free_pages;
-	}
-
-	kfree(pages);
-
-	return 0;
 
 free_pages:
-	kfree(pages);
+	kvfree(pages);
 	return rc;
 }
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH 1/2] habanalabs: support vmalloc memory mapping
  2019-10-10 14:06 [PATCH 1/2] habanalabs: support vmalloc memory mapping Omer Shpigelman
  2019-10-10 14:06 ` [PATCH 2/2] habanalabs: handle large memory on MMU Omer Shpigelman
@ 2019-10-10 14:09 ` Christoph Hellwig
  2019-10-10 19:54   ` Omer Shpigelman
  1 sibling, 1 reply; 8+ messages in thread
From: Christoph Hellwig @ 2019-10-10 14:09 UTC (permalink / raw)
  To: Omer Shpigelman; +Cc: oded.gabbay, linux-kernel

On Thu, Oct 10, 2019 at 02:06:22PM +0000, Omer Shpigelman wrote:
> This patch adds support in mapping of vmalloc memory.
> In contrary to user memory, vmalloc memory is already pinned and has no
> vm_area structure. Therefore a new capability was needed in order to map
> this memory.

Unless I am missing something you mix user and kernel pointers in
your is_vmalloc_addr checks.  That will break on those architectures
that have separate kernel and user address spaces.

> Mapping vmalloc memory is needed for Gaudi ASIC.

How does that ASIC pass in the vmalloc memory?  I don't fully understand
the code, but it seems like the addresses are fed from ioctl, which
means they only come from userspace.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* RE: [PATCH 1/2] habanalabs: support vmalloc memory mapping
  2019-10-10 14:09 ` [PATCH 1/2] habanalabs: support vmalloc memory mapping Christoph Hellwig
@ 2019-10-10 19:54   ` Omer Shpigelman
  2019-10-11  8:10     ` Christoph Hellwig
  0 siblings, 1 reply; 8+ messages in thread
From: Omer Shpigelman @ 2019-10-10 19:54 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: oded.gabbay, linux-kernel

On Thu, Oct 10, 2019 at 5:10 PM Christoph Hellwig <hch@infradead.org> wrote: 
> On Thu, Oct 10, 2019 at 02:06:22PM +0000, Omer Shpigelman wrote:
> > This patch adds support in mapping of vmalloc memory.
> > In contrary to user memory, vmalloc memory is already pinned and has
> > no vm_area structure. Therefore a new capability was needed in order
> > to map this memory.
> 
> Unless I am missing something you mix user and kernel pointers in your
> is_vmalloc_addr checks.  That will break on those architectures that have
> separate kernel and user address spaces.

The is_vmalloc_addr checks are for user pointers and for memory which was allocated by the driver with vmalloc_user.
> 
> > Mapping vmalloc memory is needed for Gaudi ASIC.
> 
> How does that ASIC pass in the vmalloc memory?  I don't fully understand
> the code, but it seems like the addresses are fed from ioctl, which means
> they only come from userspace.

The user pointers are indeed fed from ioctl for DMA purpose, but as I wrote above the vmalloc memory is allocated by the driver with vmalloc_user which will be mmapped later on in order to create a shared buffer between the driver and the userspace process.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 1/2] habanalabs: support vmalloc memory mapping
  2019-10-10 19:54   ` Omer Shpigelman
@ 2019-10-11  8:10     ` Christoph Hellwig
  2019-10-11  9:19       ` Oded Gabbay
  0 siblings, 1 reply; 8+ messages in thread
From: Christoph Hellwig @ 2019-10-11  8:10 UTC (permalink / raw)
  To: Omer Shpigelman; +Cc: Christoph Hellwig, oded.gabbay, linux-kernel

On Thu, Oct 10, 2019 at 07:54:07PM +0000, Omer Shpigelman wrote:
> The is_vmalloc_addr checks are for user pointers and for memory which was allocated by the driver with vmalloc_user.

This does not make any sense whatsoever.  vmalloc_user returns a kernel
address, it just does a GFP_USER instead of GFP_KERNEL allocation, which
is just accounting differences.

> > > Mapping vmalloc memory is needed for Gaudi ASIC.
> > 
> > How does that ASIC pass in the vmalloc memory?  I don't fully understand
> > the code, but it seems like the addresses are fed from ioctl, which means
> > they only come from userspace.
> 
> The user pointers are indeed fed from ioctl for DMA purpose, but as I wrote above the vmalloc memory is allocated by the driver with vmalloc_user which will be mmapped later on in order to create a shared buffer between the driver and the userspace process.

Again, you can't pass pointers obtained from vmalloc* to userspace.  You
can map the underlying pages into user pagetables, but is_vmalloc_addr
won't know that.  I think you guys need to read up on virtual memory 101
first and then come back and actually explain what you are trying to do.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 1/2] habanalabs: support vmalloc memory mapping
  2019-10-11  8:10     ` Christoph Hellwig
@ 2019-10-11  9:19       ` Oded Gabbay
  2019-10-11  9:26         ` Christoph Hellwig
  0 siblings, 1 reply; 8+ messages in thread
From: Oded Gabbay @ 2019-10-11  9:19 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Omer Shpigelman, linux-kernel

On Fri, Oct 11, 2019 at 11:10 AM Christoph Hellwig <hch@infradead.org> wrote:
>
> On Thu, Oct 10, 2019 at 07:54:07PM +0000, Omer Shpigelman wrote:
> > The is_vmalloc_addr checks are for user pointers and for memory which was allocated by the driver with vmalloc_user.
>
> This does not make any sense whatsoever.  vmalloc_user returns a kernel
> address, it just does a GFP_USER instead of GFP_KERNEL allocation, which
> is just accounting differences.
>
> > > > Mapping vmalloc memory is needed for Gaudi ASIC.
> > >
> > > How does that ASIC pass in the vmalloc memory?  I don't fully understand
> > > the code, but it seems like the addresses are fed from ioctl, which means
> > > they only come from userspace.
> >
> > The user pointers are indeed fed from ioctl for DMA purpose, but as I wrote above the vmalloc memory is allocated by the driver with vmalloc_user which will be mmapped later on in order to create a shared buffer between the driver and the userspace process.
>
> Again, you can't pass pointers obtained from vmalloc* to userspace.  You
> can map the underlying pages into user pagetables, but is_vmalloc_addr
> won't know that.  I think you guys need to read up on virtual memory 101
> first and then come back and actually explain what you are trying to do.

Hi Christoph,
I think the confusion here is because this is only a pre-requisite
patch, which doesn't show the full code of how we use vmalloc_user.
So I want to describe the full flow here and please tell me if and
what we are doing wrong.

We first allocate, using vmalloc_user, a certain memory block that
will be used by the ASIC and the user (ASIC is producer, user is
consumer).
After we use vmalloc_user, we map the *kernel* pointer we got from the
vmalloc_user() to the ASIC MMU. We reuse our driver's generic code
path to map host memory to ASIC MMU and that's why we need the patch
above. The user does NOT send us the pointer. He doesn't have this
pointer. It is internal to the kernel driver. To do this reuse, we
added a call to the is_vmalloc_addr(), so the function will know if it
is called to work on user pointers, or on vmalloc *kernel* pointers.

What the user does get is an opaque handle. Later he calls mmap() with
our FD and this handle. In the driver, we do remap_vmalloc_range() on
the *kernel* pointer we retrieved using the opaque handle we got from
the user, and return a valid user-space process address that points to
this memory block, so the user can access it.

AFAIK from my GPU days, this is totally valid. The user does NOT see
kernel pointers. There is total separation between kernel and user
pointers. User get pointers only by calling mmap(). We NEVER return a
pointer to a user in an IOCTL. Only through mmap.
Please tell me if you see a problem here. If you need more details,
I'll happy to provide them.

Side note:
The reason the driver allocates the memory block and not the user
itself (like we did so far in all our code), is because this block is
mapped to our ASIC internal MMU using a priviliged ASID. I don't want
to give the user a uapi that will enable him to request to map memory
to the ASIC using a priviliged ASID. Therefore, the kernel driver does
it internally.

Thanks,
Oded

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 1/2] habanalabs: support vmalloc memory mapping
  2019-10-11  9:19       ` Oded Gabbay
@ 2019-10-11  9:26         ` Christoph Hellwig
  2019-10-11  9:27           ` Oded Gabbay
  0 siblings, 1 reply; 8+ messages in thread
From: Christoph Hellwig @ 2019-10-11  9:26 UTC (permalink / raw)
  To: Oded Gabbay; +Cc: Christoph Hellwig, Omer Shpigelman, linux-kernel

On Fri, Oct 11, 2019 at 12:19:36PM +0300, Oded Gabbay wrote:
> We first allocate, using vmalloc_user, a certain memory block that
> will be used by the ASIC and the user (ASIC is producer, user is
> consumer).
> After we use vmalloc_user, we map the *kernel* pointer we got from the
> vmalloc_user() to the ASIC MMU. We reuse our driver's generic code
> path to map host memory to ASIC MMU and that's why we need the patch
> above. The user does NOT send us the pointer. He doesn't have this
> pointer. It is internal to the kernel driver. To do this reuse, we
> added a call to the is_vmalloc_addr(), so the function will know if it
> is called to work on user pointers, or on vmalloc *kernel* pointers.

But the function can't decided that.  As I said before you can't just
take a value that possibly contains user pointers and call
is_vmalloc_addr on it, as kernel and user address can overlap on
various architectures.

You need to restructure your code to keep the kernel and user pointer
code paths entirely separate.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 1/2] habanalabs: support vmalloc memory mapping
  2019-10-11  9:26         ` Christoph Hellwig
@ 2019-10-11  9:27           ` Oded Gabbay
  0 siblings, 0 replies; 8+ messages in thread
From: Oded Gabbay @ 2019-10-11  9:27 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Omer Shpigelman, linux-kernel

On Fri, Oct 11, 2019 at 12:26 PM Christoph Hellwig <hch@infradead.org> wrote:
>
> On Fri, Oct 11, 2019 at 12:19:36PM +0300, Oded Gabbay wrote:
> > We first allocate, using vmalloc_user, a certain memory block that
> > will be used by the ASIC and the user (ASIC is producer, user is
> > consumer).
> > After we use vmalloc_user, we map the *kernel* pointer we got from the
> > vmalloc_user() to the ASIC MMU. We reuse our driver's generic code
> > path to map host memory to ASIC MMU and that's why we need the patch
> > above. The user does NOT send us the pointer. He doesn't have this
> > pointer. It is internal to the kernel driver. To do this reuse, we
> > added a call to the is_vmalloc_addr(), so the function will know if it
> > is called to work on user pointers, or on vmalloc *kernel* pointers.
>
> But the function can't decided that.  As I said before you can't just
> take a value that possibly contains user pointers and call
> is_vmalloc_addr on it, as kernel and user address can overlap on
> various architectures.
>
> You need to restructure your code to keep the kernel and user pointer
> code paths entirely separate.

ah, ok. I didn't know that.
Now I understand your point.
We will do that, thanks for the review and help.

Oded

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2019-10-11  9:28 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-10-10 14:06 [PATCH 1/2] habanalabs: support vmalloc memory mapping Omer Shpigelman
2019-10-10 14:06 ` [PATCH 2/2] habanalabs: handle large memory on MMU Omer Shpigelman
2019-10-10 14:09 ` [PATCH 1/2] habanalabs: support vmalloc memory mapping Christoph Hellwig
2019-10-10 19:54   ` Omer Shpigelman
2019-10-11  8:10     ` Christoph Hellwig
2019-10-11  9:19       ` Oded Gabbay
2019-10-11  9:26         ` Christoph Hellwig
2019-10-11  9:27           ` Oded Gabbay

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).