* [PATCH 0/8] Fixing DMA mask issues in habanalabs driver
@ 2019-06-11  5:50 Oded Gabbay
  2019-06-11  5:50 ` [PATCH 1/8] habanalabs: initialize device CPU queues after MMU init Oded Gabbay
                   ` (7 more replies)
  0 siblings, 8 replies; 13+ messages in thread
From: Oded Gabbay @ 2019-06-11  5:50 UTC
  To: linux-kernel; +Cc: gregkh

This patch-set changes the way the Goya internal CPU accesses memory on the
Host machine. This is needed to eliminate the non-standard way in which the
driver has used the PCI DMA set mask kernel API so far.

The DMA set mask should be called only once at the start of the driver.
This is because changing the DMA mask to a new value after allocations
were made using a previous mask value might cause the previous allocations
to become inaccessible (usually if an IOMMU is present).

The driver did that because of a limitation in Goya's internal CPU. The
limitation was that the internal CPU can only access 40-bit addresses,
while the entire ASIC can access 50-bit addresses. Therefore, the driver
set the DMA mask to 39 bits, allocated memory for the internal CPU on the
host and then changed the DMA mask to 48 bits.

This patch-set eliminates the double DMA set by using Goya's MMU to
overcome the limitation. The driver now sets the DMA mask only once to
48 bits and allocates a single DMA region of 2MB for the internal CPU. It
then maps that region in Goya's MMU to a device virtual address below 40 bits.
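
As a rough illustration of the address arithmetic only (not driver code),
the following user-space sketch checks whether a 2MB region fits under a
given address width. VA_CPU_ACCESSIBLE_MEM_ADDR is the value defined in
patch 4/8; addr_mask() and range_fits() are hypothetical helper names
introduced here, and the sample host address used when trying it out is
arbitrary.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Device virtual address taken from patch 4/8 of this series. */
#define VA_CPU_ACCESSIBLE_MEM_ADDR	0x8000000000ull
#define CPU_ACCESSIBLE_MEM_SIZE		0x200000ull	/* 2MB */

/* Hypothetical helper: mask with the low 'bits' bits set (1..64). */
static inline uint64_t addr_mask(unsigned int bits)
{
	assert(bits >= 1 && bits <= 64);
	return bits == 64 ? ~0ull : (1ull << bits) - 1;
}

/*
 * Hypothetical helper: true if every byte of [addr, addr + size) can be
 * addressed with 'bits' address bits.
 */
static inline bool range_fits(uint64_t addr, uint64_t size, unsigned int bits)
{
	uint64_t last = addr + size - 1;

	/* Reject empty ranges and ranges that wrap around 64 bits. */
	if (size == 0 || last < addr)
		return false;

	return (last & ~addr_mask(bits)) == 0;
}
```

With these helpers, the device virtual address range fits in 40 bits,
while a host bus address handed out under a 48-bit mask may not, which
is why the region has to be remapped rather than used directly.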

In addition, this patch-set enables the use of a 64-bit DMA mask on POWER9
systems. On POWER9, the DMA mask can be set ONLY to 32-bit or 64-bit. To use
64-bit, the device must set bit 59 to 1 in all its outbound transactions.
This is achieved by setting a special configuration in Goya's PCIe
controller. The configuration must be done only on POWER9 machines, as it
will make the device non-functional on other architectures
(e.g. x86-64, ARM).
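
To make the bit-59 requirement concrete, here is a small user-space
sketch of what the outbound address would look like. It is only an
illustration: in the real device this is done by the PCIe controller
configuration, not by software per address. power9_outbound_addr() is a
hypothetical name, and the check that the host address does not already
use bit 59 or above is an assumption made for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Bit position quoted in the cover letter. */
#define POWER9_DMA_BIT	59

/*
 * Hypothetical helper: the address as it would appear in an outbound
 * transaction once bit 59 is forced to 1.
 */
static inline uint64_t power9_outbound_addr(uint64_t dma_addr)
{
	/* Assumption: the host DMA address itself stays below bit 59. */
	assert((dma_addr >> POWER9_DMA_BIT) == 0);

	return dma_addr | (1ull << POWER9_DMA_BIT);
}
```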

Thanks,
Oded

Oded Gabbay (8):
  habanalabs: initialize device CPU queues after MMU init
  habanalabs: de-couple MMU and VM module initialization
  habanalabs: initialize MMU context for driver
  habanalabs: add MMU mappings for Goya CPU
  habanalabs: set Goya CPU to use ASIC MMU
  habanalabs: remove DMA mask hack for Goya
  habanalabs: add WARN in case of bad MMU mapping
  habanalabs: enable 64-bit DMA mask in POWER9

 drivers/misc/habanalabs/asid.c           |   2 +-
 drivers/misc/habanalabs/context.c        |   7 +
 drivers/misc/habanalabs/debugfs.c        |   7 +-
 drivers/misc/habanalabs/device.c         |  45 +++--
 drivers/misc/habanalabs/goya/goya.c      | 234 +++++++++++++++++------
 drivers/misc/habanalabs/goya/goyaP.h     |  12 +-
 drivers/misc/habanalabs/habanalabs.h     |   9 +-
 drivers/misc/habanalabs/habanalabs_drv.c |   7 +
 drivers/misc/habanalabs/memory.c         |  13 +-
 drivers/misc/habanalabs/mmu.c            |  20 +-
 drivers/misc/habanalabs/pci.c            |   7 +-
 11 files changed, 259 insertions(+), 104 deletions(-)

-- 
2.17.1



* [PATCH 1/8] habanalabs: initialize device CPU queues after MMU init
  2019-06-11  5:50 [PATCH 0/8] Fixing DMA mask issues in habanalabs driver Oded Gabbay
@ 2019-06-11  5:50 ` Oded Gabbay
  2019-06-11  5:50 ` [PATCH 2/8] habanalabs: de-couple MMU and VM module initialization Oded Gabbay
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 13+ messages in thread
From: Oded Gabbay @ 2019-06-11  5:50 UTC
  To: linux-kernel; +Cc: gregkh

This patch changes the order of H/W IP initializations. The MMU needs to
be initialized before the device CPU queues, because the CPU will go
through the ASIC MMU in order to reach the host memory (where the queues
are located).

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
---
 drivers/misc/habanalabs/asid.c      |  2 +-
 drivers/misc/habanalabs/device.c    | 22 +++++-----
 drivers/misc/habanalabs/goya/goya.c | 64 +++++++++++++----------------
 3 files changed, 40 insertions(+), 48 deletions(-)

diff --git a/drivers/misc/habanalabs/asid.c b/drivers/misc/habanalabs/asid.c
index f54e7971a762..2c01461701a3 100644
--- a/drivers/misc/habanalabs/asid.c
+++ b/drivers/misc/habanalabs/asid.c
@@ -18,7 +18,7 @@ int hl_asid_init(struct hl_device *hdev)
 
 	mutex_init(&hdev->asid_mutex);
 
-	/* ASID 0 is reserved for KMD */
+	/* ASID 0 is reserved for KMD and device CPU */
 	set_bit(0, hdev->asid_bitmap);
 
 	return 0;
diff --git a/drivers/misc/habanalabs/device.c b/drivers/misc/habanalabs/device.c
index cca4af29daf7..4df8ef88ce2d 100644
--- a/drivers/misc/habanalabs/device.c
+++ b/drivers/misc/habanalabs/device.c
@@ -326,7 +326,15 @@ static int device_late_init(struct hl_device *hdev)
 {
 	int rc;
 
-	INIT_DELAYED_WORK(&hdev->work_freq, set_freq_to_low_job);
+	if (hdev->asic_funcs->late_init) {
+		rc = hdev->asic_funcs->late_init(hdev);
+		if (rc) {
+			dev_err(hdev->dev,
+				"failed late initialization for the H/W\n");
+			return rc;
+		}
+	}
+
 	hdev->high_pll = hdev->asic_prop.high_pll;
 
 	/* force setting to low frequency */
@@ -337,17 +345,9 @@ static int device_late_init(struct hl_device *hdev)
 	else
 		hdev->asic_funcs->set_pll_profile(hdev, PLL_LAST);
 
-	if (hdev->asic_funcs->late_init) {
-		rc = hdev->asic_funcs->late_init(hdev);
-		if (rc) {
-			dev_err(hdev->dev,
-				"failed late initialization for the H/W\n");
-			return rc;
-		}
-	}
-
+	INIT_DELAYED_WORK(&hdev->work_freq, set_freq_to_low_job);
 	schedule_delayed_work(&hdev->work_freq,
-			usecs_to_jiffies(HL_PLL_LOW_JOB_FREQ_USEC));
+	usecs_to_jiffies(HL_PLL_LOW_JOB_FREQ_USEC));
 
 	if (hdev->heartbeat) {
 		INIT_DELAYED_WORK(&hdev->work_heartbeat, hl_device_heartbeat);
diff --git a/drivers/misc/habanalabs/goya/goya.c b/drivers/misc/habanalabs/goya/goya.c
index 81c1d576783f..106074466dca 100644
--- a/drivers/misc/habanalabs/goya/goya.c
+++ b/drivers/misc/habanalabs/goya/goya.c
@@ -539,9 +539,32 @@ int goya_late_init(struct hl_device *hdev)
 	struct asic_fixed_properties *prop = &hdev->asic_prop;
 	int rc;
 
+	goya_fetch_psoc_frequency(hdev);
+
+	rc = goya_mmu_clear_pgt_range(hdev);
+	if (rc) {
+		dev_err(hdev->dev,
+			"Failed to clear MMU page tables range %d\n", rc);
+		return rc;
+	}
+
+	rc = goya_mmu_set_dram_default_page(hdev);
+	if (rc) {
+		dev_err(hdev->dev, "Failed to set DRAM default page %d\n", rc);
+		return rc;
+	}
+
+	rc = goya_init_cpu_queues(hdev);
+	if (rc)
+		return rc;
+
+	rc = goya_test_cpu_queue(hdev);
+	if (rc)
+		return rc;
+
 	rc = goya_armcp_info_get(hdev);
 	if (rc) {
-		dev_err(hdev->dev, "Failed to get armcp info\n");
+		dev_err(hdev->dev, "Failed to get armcp info %d\n", rc);
 		return rc;
 	}
 
@@ -553,33 +576,15 @@ int goya_late_init(struct hl_device *hdev)
 
 	rc = hl_fw_send_pci_access_msg(hdev, ARMCP_PACKET_ENABLE_PCI_ACCESS);
 	if (rc) {
-		dev_err(hdev->dev, "Failed to enable PCI access from CPU\n");
+		dev_err(hdev->dev,
+			"Failed to enable PCI access from CPU %d\n", rc);
 		return rc;
 	}
 
 	WREG32(mmGIC_DISTRIBUTOR__5_GICD_SETSPI_NSR,
 			GOYA_ASYNC_EVENT_ID_INTS_REGISTER);
 
-	goya_fetch_psoc_frequency(hdev);
-
-	rc = goya_mmu_clear_pgt_range(hdev);
-	if (rc) {
-		dev_err(hdev->dev, "Failed to clear MMU page tables range\n");
-		goto disable_pci_access;
-	}
-
-	rc = goya_mmu_set_dram_default_page(hdev);
-	if (rc) {
-		dev_err(hdev->dev, "Failed to set DRAM default page\n");
-		goto disable_pci_access;
-	}
-
 	return 0;
-
-disable_pci_access:
-	hl_fw_send_pci_access_msg(hdev, ARMCP_PACKET_DISABLE_PCI_ACCESS);
-
-	return rc;
 }
 
 /*
@@ -1000,7 +1005,7 @@ int goya_init_cpu_queues(struct hl_device *hdev)
 
 	if (err) {
 		dev_err(hdev->dev,
-			"Failed to communicate with ARM CPU (ArmCP timeout)\n");
+			"Failed to setup communication with device CPU\n");
 		return -EIO;
 	}
 
@@ -2465,13 +2470,6 @@ static int goya_hw_init(struct hl_device *hdev)
 	if (rc)
 		goto disable_queues;
 
-	rc = goya_init_cpu_queues(hdev);
-	if (rc) {
-		dev_err(hdev->dev, "failed to initialize CPU H/W queues %d\n",
-			rc);
-		goto disable_msix;
-	}
-
 	/*
 	 * Check if we managed to set the DMA mask to more then 32 bits. If so,
 	 * let's try to increase it again because in Goya we set the initial
@@ -2481,7 +2479,7 @@ static int goya_hw_init(struct hl_device *hdev)
 	if (hdev->dma_mask > 32) {
 		rc = hl_pci_set_dma_mask(hdev, 48);
 		if (rc)
-			goto disable_pci_access;
+			goto disable_msix;
 	}
 
 	/* Perform read from the device to flush all MSI-X configuration */
@@ -2489,8 +2487,6 @@ static int goya_hw_init(struct hl_device *hdev)
 
 	return 0;
 
-disable_pci_access:
-	hl_fw_send_pci_access_msg(hdev, ARMCP_PACKET_DISABLE_PCI_ACCESS);
 disable_msix:
 	goya_disable_msix(hdev);
 disable_queues:
@@ -2972,10 +2968,6 @@ int goya_test_queues(struct hl_device *hdev)
 			ret_val = -EINVAL;
 	}
 
-	rc = goya_test_cpu_queue(hdev);
-	if (rc)
-		ret_val = -EINVAL;
-
 	return ret_val;
 }
 
-- 
2.17.1



* [PATCH 2/8] habanalabs: de-couple MMU and VM module initialization
  2019-06-11  5:50 [PATCH 0/8] Fixing DMA mask issues in habanalabs driver Oded Gabbay
  2019-06-11  5:50 ` [PATCH 1/8] habanalabs: initialize device CPU queues after MMU init Oded Gabbay
@ 2019-06-11  5:50 ` Oded Gabbay
  2019-06-11  5:50 ` [PATCH 3/8] habanalabs: initialize MMU context for driver Oded Gabbay
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 13+ messages in thread
From: Oded Gabbay @ 2019-06-11  5:50 UTC
  To: linux-kernel; +Cc: gregkh

This patch initializes the MMU S/W structures before the VM S/W
structures, instead of doing that as part of the VM S/W initialization.

This is done because we need to configure some MMU mappings for the kernel
context, before the VM is initialized. The VM initialization can't be
moved earlier because it depends on the size of the DRAM, which is
retrieved from the device CPU. Communication with the device CPU will
require the MMU mappings to be configured and hence the de-coupling.

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
---
 drivers/misc/habanalabs/device.c | 23 ++++++++++++++++++++---
 drivers/misc/habanalabs/memory.c | 13 +------------
 drivers/misc/habanalabs/mmu.c    |  6 +-----
 3 files changed, 22 insertions(+), 20 deletions(-)

diff --git a/drivers/misc/habanalabs/device.c b/drivers/misc/habanalabs/device.c
index 4df8ef88ce2d..0c4894dd9c02 100644
--- a/drivers/misc/habanalabs/device.c
+++ b/drivers/misc/habanalabs/device.c
@@ -745,6 +745,7 @@ int hl_device_reset(struct hl_device *hdev, bool hard_reset,
 
 	if (hard_reset) {
 		hl_vm_fini(hdev);
+		hl_mmu_fini(hdev);
 		hl_eq_reset(hdev, &hdev->event_queue);
 	}
 
@@ -772,6 +773,13 @@ int hl_device_reset(struct hl_device *hdev, bool hard_reset,
 			goto out_err;
 		}
 
+		rc = hl_mmu_init(hdev);
+		if (rc) {
+			dev_err(hdev->dev,
+				"Failed to initialize MMU S/W after hard reset\n");
+			goto out_err;
+		}
+
 		/* Allocate the kernel context */
 		hdev->kernel_ctx = kzalloc(sizeof(*hdev->kernel_ctx),
 						GFP_KERNEL);
@@ -943,11 +951,18 @@ int hl_device_init(struct hl_device *hdev, struct class *hclass)
 		goto cq_fini;
 	}
 
+	/* MMU S/W must be initialized before kernel context is created */
+	rc = hl_mmu_init(hdev);
+	if (rc) {
+		dev_err(hdev->dev, "Failed to initialize MMU S/W structures\n");
+		goto eq_fini;
+	}
+
 	/* Allocate the kernel context */
 	hdev->kernel_ctx = kzalloc(sizeof(*hdev->kernel_ctx), GFP_KERNEL);
 	if (!hdev->kernel_ctx) {
 		rc = -ENOMEM;
-		goto eq_fini;
+		goto mmu_fini;
 	}
 
 	hdev->user_ctx = NULL;
@@ -995,8 +1010,6 @@ int hl_device_init(struct hl_device *hdev, struct class *hclass)
 		goto out_disabled;
 	}
 
-	/* After test_queues, KMD can start sending messages to device CPU */
-
 	rc = device_late_init(hdev);
 	if (rc) {
 		dev_err(hdev->dev, "Failed late initialization\n");
@@ -1042,6 +1055,8 @@ int hl_device_init(struct hl_device *hdev, struct class *hclass)
 			"kernel ctx is still alive on initialization failure\n");
 free_ctx:
 	kfree(hdev->kernel_ctx);
+mmu_fini:
+	hl_mmu_fini(hdev);
 eq_fini:
 	hl_eq_fini(hdev, &hdev->event_queue);
 cq_fini:
@@ -1146,6 +1161,8 @@ void hl_device_fini(struct hl_device *hdev)
 
 	hl_vm_fini(hdev);
 
+	hl_mmu_fini(hdev);
+
 	hl_eq_fini(hdev, &hdev->event_queue);
 
 	for (i = 0 ; i < hdev->asic_prop.completion_queues_count ; i++)
diff --git a/drivers/misc/habanalabs/memory.c b/drivers/misc/habanalabs/memory.c
index 693877e37fd8..42d237cae1dc 100644
--- a/drivers/misc/habanalabs/memory.c
+++ b/drivers/misc/habanalabs/memory.c
@@ -1657,17 +1657,10 @@ int hl_vm_init(struct hl_device *hdev)
 	struct hl_vm *vm = &hdev->vm;
 	int rc;
 
-	rc = hl_mmu_init(hdev);
-	if (rc) {
-		dev_err(hdev->dev, "Failed to init MMU\n");
-		return rc;
-	}
-
 	vm->dram_pg_pool = gen_pool_create(__ffs(prop->dram_page_size), -1);
 	if (!vm->dram_pg_pool) {
 		dev_err(hdev->dev, "Failed to create dram page pool\n");
-		rc = -ENOMEM;
-		goto pool_create_err;
+		return -ENOMEM;
 	}
 
 	kref_init(&vm->dram_pg_pool_refcount);
@@ -1693,8 +1686,6 @@ int hl_vm_init(struct hl_device *hdev)
 
 pool_add_err:
 	gen_pool_destroy(vm->dram_pg_pool);
-pool_create_err:
-	hl_mmu_fini(hdev);
 
 	return rc;
 }
@@ -1724,7 +1715,5 @@ void hl_vm_fini(struct hl_device *hdev)
 		dev_warn(hdev->dev, "dram_pg_pool was not destroyed on %s\n",
 				__func__);
 
-	hl_mmu_fini(hdev);
-
 	vm->init_done = false;
 }
diff --git a/drivers/misc/habanalabs/mmu.c b/drivers/misc/habanalabs/mmu.c
index 10aee3141444..87968f32e718 100644
--- a/drivers/misc/habanalabs/mmu.c
+++ b/drivers/misc/habanalabs/mmu.c
@@ -385,12 +385,8 @@ static void dram_default_mapping_fini(struct hl_ctx *ctx)
  * @hdev: habanalabs device structure.
  *
  * This function does the following:
- * - Allocate max_asid zeroed hop0 pgts so no mapping is available.
- * - Enable MMU in H/W.
- * - Invalidate the MMU cache.
  * - Create a pool of pages for pgt_infos.
- *
- * This function depends on DMA QMAN to be working!
+ * - Create a shadow table for pgt
  *
  * Return: 0 for success, non-zero for failure.
  */
-- 
2.17.1



* [PATCH 3/8] habanalabs: initialize MMU context for driver
  2019-06-11  5:50 [PATCH 0/8] Fixing DMA mask issues in habanalabs driver Oded Gabbay
  2019-06-11  5:50 ` [PATCH 1/8] habanalabs: initialize device CPU queues after MMU init Oded Gabbay
  2019-06-11  5:50 ` [PATCH 2/8] habanalabs: de-couple MMU and VM module initialization Oded Gabbay
@ 2019-06-11  5:50 ` Oded Gabbay
  2019-06-11  5:50 ` [PATCH 4/8] habanalabs: add MMU mappings for Goya CPU Oded Gabbay
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 13+ messages in thread
From: Oded Gabbay @ 2019-06-11  5:50 UTC
  To: linux-kernel; +Cc: gregkh

This patch initializes the MMU structures for the kernel context. This is
needed before we can configure mappings for the kernel context.

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
---
 drivers/misc/habanalabs/context.c |  7 +++++++
 drivers/misc/habanalabs/mmu.c     | 10 ++++++----
 2 files changed, 13 insertions(+), 4 deletions(-)

diff --git a/drivers/misc/habanalabs/context.c b/drivers/misc/habanalabs/context.c
index 280f4625e313..8682590e3f6e 100644
--- a/drivers/misc/habanalabs/context.c
+++ b/drivers/misc/habanalabs/context.c
@@ -36,6 +36,8 @@ static void hl_ctx_fini(struct hl_ctx *ctx)
 
 		hl_vm_ctx_fini(ctx);
 		hl_asid_free(hdev, ctx->asid);
+	} else {
+		hl_mmu_ctx_fini(ctx);
 	}
 }
 
@@ -119,6 +121,11 @@ int hl_ctx_init(struct hl_device *hdev, struct hl_ctx *ctx, bool is_kernel_ctx)
 
 	if (is_kernel_ctx) {
 		ctx->asid = HL_KERNEL_ASID_ID; /* KMD gets ASID 0 */
+		rc = hl_mmu_ctx_init(ctx);
+		if (rc) {
+			dev_err(hdev->dev, "Failed to init mmu ctx module\n");
+			goto mem_ctx_err;
+		}
 	} else {
 		ctx->asid = hl_asid_alloc(hdev);
 		if (!ctx->asid) {
diff --git a/drivers/misc/habanalabs/mmu.c b/drivers/misc/habanalabs/mmu.c
index 87968f32e718..a80162c5c373 100644
--- a/drivers/misc/habanalabs/mmu.c
+++ b/drivers/misc/habanalabs/mmu.c
@@ -241,8 +241,9 @@ static int dram_default_mapping_init(struct hl_ctx *ctx)
 		hop2_pte_addr, hop3_pte_addr, pte_val;
 	int rc, i, j, hop3_allocated = 0;
 
-	if (!hdev->dram_supports_virtual_memory ||
-			!hdev->dram_default_page_mapping)
+	if ((!hdev->dram_supports_virtual_memory) ||
+			(!hdev->dram_default_page_mapping) ||
+			(ctx->asid == HL_KERNEL_ASID_ID))
 		return 0;
 
 	num_of_hop3 = prop->dram_size_for_default_page_mapping;
@@ -340,8 +341,9 @@ static void dram_default_mapping_fini(struct hl_ctx *ctx)
 		hop2_pte_addr, hop3_pte_addr;
 	int i, j;
 
-	if (!hdev->dram_supports_virtual_memory ||
-			!hdev->dram_default_page_mapping)
+	if ((!hdev->dram_supports_virtual_memory) ||
+			(!hdev->dram_default_page_mapping) ||
+			(ctx->asid == HL_KERNEL_ASID_ID))
 		return;
 
 	num_of_hop3 = prop->dram_size_for_default_page_mapping;
-- 
2.17.1



* [PATCH 4/8] habanalabs: add MMU mappings for Goya CPU
  2019-06-11  5:50 [PATCH 0/8] Fixing DMA mask issues in habanalabs driver Oded Gabbay
                   ` (2 preceding siblings ...)
  2019-06-11  5:50 ` [PATCH 3/8] habanalabs: initialize MMU context for driver Oded Gabbay
@ 2019-06-11  5:50 ` Oded Gabbay
  2019-06-11  5:50 ` [PATCH 5/8] habanalabs: set Goya CPU to use ASIC MMU Oded Gabbay
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 13+ messages in thread
From: Oded Gabbay @ 2019-06-11  5:50 UTC
  To: linux-kernel; +Cc: gregkh

This patch adds the necessary MMU mappings for the Goya CPU to access the
device DRAM and the host memory.

The first 256MB of the device DRAM is being mapped. That's where the F/W
is running.

The 2MB area located on the host memory for the purpose of communication
between the driver and the device CPU is also being mapped.
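
For a feel of how many MMU mappings this amounts to, here is a
stand-alone sketch of the arithmetic. It mirrors the alignment test used
in the diff (one 2MB page when the host area is 2MB-aligned, 4KB pages
otherwise), but the helper names and FW_DRAM_SIZE are made up for this
example, and it assumes the host allocation is at least 4KB-aligned.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE_4KB	0x1000ull
#define PAGE_SIZE_2MB	0x200000ull
#define FW_DRAM_SIZE	(256ull << 20)	/* 256MB, per the text above */

/* Number of 2MB mappings that cover the F/W area of the device DRAM. */
static inline uint64_t fw_mapping_count(void)
{
	return FW_DRAM_SIZE / PAGE_SIZE_2MB;
}

/* One 2MB page if the host area is 2MB-aligned, 4KB pages otherwise. */
static inline uint64_t host_area_page_size(uint64_t dma_addr)
{
	return (dma_addr & (PAGE_SIZE_2MB - 1)) ? PAGE_SIZE_4KB : PAGE_SIZE_2MB;
}

/* Number of mappings needed to cover the 2MB host area. */
static inline uint64_t host_area_mapping_count(uint64_t dma_addr)
{
	/* Assumption: the DMA allocation is at least 4KB-aligned. */
	assert((dma_addr & (PAGE_SIZE_4KB - 1)) == 0);

	return PAGE_SIZE_2MB / host_area_page_size(dma_addr);
}
```

So the DRAM part takes 128 mappings, and the host area takes either a
single mapping or 512 of them, depending on where the allocation landed.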

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
---
 drivers/misc/habanalabs/debugfs.c    |   7 +-
 drivers/misc/habanalabs/goya/goya.c  | 126 +++++++++++++++++++++++++--
 drivers/misc/habanalabs/goya/goyaP.h |  12 ++-
 drivers/misc/habanalabs/habanalabs.h |   6 +-
 4 files changed, 137 insertions(+), 14 deletions(-)

diff --git a/drivers/misc/habanalabs/debugfs.c b/drivers/misc/habanalabs/debugfs.c
index ba418aaa404c..886f8ea82499 100644
--- a/drivers/misc/habanalabs/debugfs.c
+++ b/drivers/misc/habanalabs/debugfs.c
@@ -355,7 +355,7 @@ static int mmu_show(struct seq_file *s, void *data)
 	struct hl_debugfs_entry *entry = s->private;
 	struct hl_dbg_device_entry *dev_entry = entry->dev_entry;
 	struct hl_device *hdev = dev_entry->hdev;
-	struct hl_ctx *ctx = hdev->user_ctx;
+	struct hl_ctx *ctx;
 
 	u64 hop0_addr = 0, hop0_pte_addr = 0, hop0_pte = 0,
 		hop1_addr = 0, hop1_pte_addr = 0, hop1_pte = 0,
@@ -367,6 +367,11 @@ static int mmu_show(struct seq_file *s, void *data)
 	if (!hdev->mmu_enable)
 		return 0;
 
+	if (dev_entry->mmu_asid == HL_KERNEL_ASID_ID)
+		ctx = hdev->kernel_ctx;
+	else
+		ctx = hdev->user_ctx;
+
 	if (!ctx) {
 		dev_err(hdev->dev, "no ctx available\n");
 		return 0;
diff --git a/drivers/misc/habanalabs/goya/goya.c b/drivers/misc/habanalabs/goya/goya.c
index 106074466dca..4e41f2669e6d 100644
--- a/drivers/misc/habanalabs/goya/goya.c
+++ b/drivers/misc/habanalabs/goya/goya.c
@@ -297,6 +297,11 @@ static u32 goya_all_events[] = {
 	GOYA_ASYNC_EVENT_ID_DMA_BM_CH4
 };
 
+static int goya_mmu_clear_pgt_range(struct hl_device *hdev);
+static int goya_mmu_set_dram_default_page(struct hl_device *hdev);
+static int goya_mmu_add_mappings_for_device_cpu(struct hl_device *hdev);
+static void goya_mmu_prepare(struct hl_device *hdev, u32 asid);
+
 void goya_get_fixed_properties(struct hl_device *hdev)
 {
 	struct asic_fixed_properties *prop = &hdev->asic_prop;
@@ -554,6 +559,10 @@ int goya_late_init(struct hl_device *hdev)
 		return rc;
 	}
 
+	rc = goya_mmu_add_mappings_for_device_cpu(hdev);
+	if (rc)
+		return rc;
+
 	rc = goya_init_cpu_queues(hdev);
 	if (rc)
 		return rc;
@@ -2065,10 +2074,12 @@ static void goya_halt_engines(struct hl_device *hdev, bool hard_reset)
 	goya_disable_external_queues(hdev);
 	goya_disable_internal_queues(hdev);
 
-	if (hard_reset)
+	if (hard_reset) {
 		goya_disable_msix(hdev);
-	else
+		goya_mmu_remove_device_cpu_mappings(hdev);
+	} else {
 		goya_sync_irqs(hdev);
+	}
 }
 
 /*
@@ -4584,7 +4595,7 @@ int goya_context_switch(struct hl_device *hdev, u32 asid)
 	return 0;
 }
 
-int goya_mmu_clear_pgt_range(struct hl_device *hdev)
+static int goya_mmu_clear_pgt_range(struct hl_device *hdev)
 {
 	struct asic_fixed_properties *prop = &hdev->asic_prop;
 	struct goya_device *goya = hdev->asic_specific;
@@ -4598,7 +4609,7 @@ int goya_mmu_clear_pgt_range(struct hl_device *hdev)
 	return goya_memset_device_memory(hdev, addr, size, 0, true);
 }
 
-int goya_mmu_set_dram_default_page(struct hl_device *hdev)
+static int goya_mmu_set_dram_default_page(struct hl_device *hdev)
 {
 	struct goya_device *goya = hdev->asic_specific;
 	u64 addr = hdev->asic_prop.mmu_dram_default_page_addr;
@@ -4611,7 +4622,112 @@ int goya_mmu_set_dram_default_page(struct hl_device *hdev)
 	return goya_memset_device_memory(hdev, addr, size, val, true);
 }
 
-void goya_mmu_prepare(struct hl_device *hdev, u32 asid)
+static int goya_mmu_add_mappings_for_device_cpu(struct hl_device *hdev)
+{
+	struct asic_fixed_properties *prop = &hdev->asic_prop;
+	struct goya_device *goya = hdev->asic_specific;
+	s64 off, cpu_off;
+	int rc;
+
+	if (!(goya->hw_cap_initialized & HW_CAP_MMU))
+		return 0;
+
+	for (off = 0 ; off < CPU_FW_IMAGE_SIZE ; off += PAGE_SIZE_2MB) {
+		rc = hl_mmu_map(hdev->kernel_ctx, prop->dram_base_address + off,
+				prop->dram_base_address + off, PAGE_SIZE_2MB);
+		if (rc) {
+			dev_err(hdev->dev, "Map failed for address 0x%llx\n",
+				prop->dram_base_address + off);
+			goto unmap;
+		}
+	}
+
+	if (!(hdev->cpu_accessible_dma_address & (PAGE_SIZE_2MB - 1))) {
+		rc = hl_mmu_map(hdev->kernel_ctx, VA_CPU_ACCESSIBLE_MEM_ADDR,
+			hdev->cpu_accessible_dma_address, PAGE_SIZE_2MB);
+
+		if (rc) {
+			dev_err(hdev->dev,
+				"Map failed for CPU accessible memory\n");
+			off -= PAGE_SIZE_2MB;
+			goto unmap;
+		}
+	} else {
+		for (cpu_off = 0 ; cpu_off < SZ_2M ; cpu_off += PAGE_SIZE_4KB) {
+			rc = hl_mmu_map(hdev->kernel_ctx,
+				VA_CPU_ACCESSIBLE_MEM_ADDR + cpu_off,
+				hdev->cpu_accessible_dma_address + cpu_off,
+				PAGE_SIZE_4KB);
+			if (rc) {
+				dev_err(hdev->dev,
+					"Map failed for CPU accessible memory\n");
+				cpu_off -= PAGE_SIZE_4KB;
+				goto unmap_cpu;
+			}
+		}
+	}
+
+	goya->device_cpu_mmu_mappings_done = true;
+
+	return 0;
+
+unmap_cpu:
+	for (; cpu_off >= 0 ; cpu_off -= PAGE_SIZE_4KB)
+		if (hl_mmu_unmap(hdev->kernel_ctx,
+				VA_CPU_ACCESSIBLE_MEM_ADDR + cpu_off,
+				PAGE_SIZE_4KB))
+			dev_warn_ratelimited(hdev->dev,
+				"failed to unmap address 0x%llx\n",
+				VA_CPU_ACCESSIBLE_MEM_ADDR + cpu_off);
+unmap:
+	for (; off >= 0 ; off -= PAGE_SIZE_2MB)
+		if (hl_mmu_unmap(hdev->kernel_ctx,
+				prop->dram_base_address + off, PAGE_SIZE_2MB))
+			dev_warn_ratelimited(hdev->dev,
+				"failed to unmap address 0x%llx\n",
+				prop->dram_base_address + off);
+
+	return rc;
+}
+
+void goya_mmu_remove_device_cpu_mappings(struct hl_device *hdev)
+{
+	struct asic_fixed_properties *prop = &hdev->asic_prop;
+	struct goya_device *goya = hdev->asic_specific;
+	u32 off, cpu_off;
+
+	if (!(goya->hw_cap_initialized & HW_CAP_MMU))
+		return;
+
+	if (!goya->device_cpu_mmu_mappings_done)
+		return;
+
+	if (!(hdev->cpu_accessible_dma_address & (PAGE_SIZE_2MB - 1))) {
+		if (hl_mmu_unmap(hdev->kernel_ctx, VA_CPU_ACCESSIBLE_MEM_ADDR,
+				PAGE_SIZE_2MB))
+			dev_warn(hdev->dev,
+				"Failed to unmap CPU accessible memory\n");
+	} else {
+		for (cpu_off = 0 ; cpu_off < SZ_2M ; cpu_off += PAGE_SIZE_4KB)
+			if (hl_mmu_unmap(hdev->kernel_ctx,
+					VA_CPU_ACCESSIBLE_MEM_ADDR + cpu_off,
+					PAGE_SIZE_4KB))
+				dev_warn_ratelimited(hdev->dev,
+					"failed to unmap address 0x%llx\n",
+					VA_CPU_ACCESSIBLE_MEM_ADDR + cpu_off);
+	}
+
+	for (off = 0 ; off < CPU_FW_IMAGE_SIZE ; off += PAGE_SIZE_2MB)
+		if (hl_mmu_unmap(hdev->kernel_ctx,
+				prop->dram_base_address + off, PAGE_SIZE_2MB))
+			dev_warn_ratelimited(hdev->dev,
+					"Failed to unmap address 0x%llx\n",
+					prop->dram_base_address + off);
+
+	goya->device_cpu_mmu_mappings_done = false;
+}
+
+static void goya_mmu_prepare(struct hl_device *hdev, u32 asid)
 {
 	struct goya_device *goya = hdev->asic_specific;
 	int i;
diff --git a/drivers/misc/habanalabs/goya/goyaP.h b/drivers/misc/habanalabs/goya/goyaP.h
index 066b1d306977..f8c611883dc1 100644
--- a/drivers/misc/habanalabs/goya/goyaP.h
+++ b/drivers/misc/habanalabs/goya/goyaP.h
@@ -126,6 +126,12 @@
 #define VA_DDR_SPACE_SIZE	(VA_DDR_SPACE_END - \
 					VA_DDR_SPACE_START)	/* 128GB */
 
+#if (HL_CPU_ACCESSIBLE_MEM_SIZE != SZ_2M)
+#error "HL_CPU_ACCESSIBLE_MEM_SIZE must be exactly 2MB to enable MMU mapping"
+#endif
+
+#define VA_CPU_ACCESSIBLE_MEM_ADDR	0x8000000000ull
+
 #define DMA_MAX_TRANSFER_SIZE	U32_MAX
 
 #define HW_CAP_PLL		0x00000001
@@ -157,6 +163,7 @@ struct goya_device {
 	u64		ddr_bar_cur_addr;
 	u32		events_stat[GOYA_ASYNC_EVENT_ID_SIZE];
 	u32		hw_cap_initialized;
+	u8		device_cpu_mmu_mappings_done;
 };
 
 void goya_get_fixed_properties(struct hl_device *hdev);
@@ -204,10 +211,6 @@ int goya_armcp_info_get(struct hl_device *hdev);
 int goya_debug_coresight(struct hl_device *hdev, void *data);
 void goya_halt_coresight(struct hl_device *hdev);
 
-void goya_mmu_prepare(struct hl_device *hdev, u32 asid);
-int goya_mmu_clear_pgt_range(struct hl_device *hdev);
-int goya_mmu_set_dram_default_page(struct hl_device *hdev);
-
 int goya_suspend(struct hl_device *hdev);
 int goya_resume(struct hl_device *hdev);
 
@@ -225,5 +228,6 @@ void *goya_cpu_accessible_dma_pool_alloc(struct hl_device *hdev, size_t size,
 					dma_addr_t *dma_handle);
 void goya_cpu_accessible_dma_pool_free(struct hl_device *hdev, size_t size,
 					void *vaddr);
+void goya_mmu_remove_device_cpu_mappings(struct hl_device *hdev);
 
 #endif /* GOYAP_H_ */
diff --git a/drivers/misc/habanalabs/habanalabs.h b/drivers/misc/habanalabs/habanalabs.h
index 0462b7727da7..5e4a631b3d88 100644
--- a/drivers/misc/habanalabs/habanalabs.h
+++ b/drivers/misc/habanalabs/habanalabs.h
@@ -320,10 +320,8 @@ struct hl_cs_job;
 #define HL_EQ_LENGTH			64
 #define HL_EQ_SIZE_IN_BYTES		(HL_EQ_LENGTH * HL_EQ_ENTRY_SIZE)
 
-/* KMD <-> ArmCP shared memory size (EQ + PQ + 2MB for packets) */
-#define HL_CPU_ACCESSIBLE_MEM_SIZE	(HL_EQ_SIZE_IN_BYTES + \
-					 HL_QUEUE_SIZE_IN_BYTES + \
-					 SZ_2M)
+/* KMD <-> ArmCP shared memory size */
+#define HL_CPU_ACCESSIBLE_MEM_SIZE	SZ_2M
 
 /**
  * struct hl_hw_queue - describes a H/W transport queue.
-- 
2.17.1



* [PATCH 5/8] habanalabs: set Goya CPU to use ASIC MMU
  2019-06-11  5:50 [PATCH 0/8] Fixing DMA mask issues in habanalabs driver Oded Gabbay
                   ` (3 preceding siblings ...)
  2019-06-11  5:50 ` [PATCH 4/8] habanalabs: add MMU mappings for Goya CPU Oded Gabbay
@ 2019-06-11  5:50 ` Oded Gabbay
  2019-06-11  5:50 ` [PATCH 6/8] habanalabs: remove DMA mask hack for Goya Oded Gabbay
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 13+ messages in thread
From: Oded Gabbay @ 2019-06-11  5:50 UTC
  To: linux-kernel; +Cc: gregkh

This patch configures the Goya CPU to actually go through the MMU for
translation. The configuration is done after the configuration of the
relevant MMU mappings.

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
---
 drivers/misc/habanalabs/goya/goya.c | 23 ++++++++++++++++++++---
 1 file changed, 20 insertions(+), 3 deletions(-)

diff --git a/drivers/misc/habanalabs/goya/goya.c b/drivers/misc/habanalabs/goya/goya.c
index 4e41f2669e6d..9f1f47770afa 100644
--- a/drivers/misc/habanalabs/goya/goya.c
+++ b/drivers/misc/habanalabs/goya/goya.c
@@ -986,9 +986,9 @@ int goya_init_cpu_queues(struct hl_device *hdev)
 	WREG32(mmPSOC_GLOBAL_CONF_SCRATCHPAD_3, upper_32_bits(eq->bus_address));
 
 	WREG32(mmPSOC_GLOBAL_CONF_SCRATCHPAD_8,
-			lower_32_bits(hdev->cpu_accessible_dma_address));
+			lower_32_bits(VA_CPU_ACCESSIBLE_MEM_ADDR));
 	WREG32(mmPSOC_GLOBAL_CONF_SCRATCHPAD_9,
-			upper_32_bits(hdev->cpu_accessible_dma_address));
+			upper_32_bits(VA_CPU_ACCESSIBLE_MEM_ADDR));
 
 	WREG32(mmPSOC_GLOBAL_CONF_SCRATCHPAD_5, HL_QUEUE_SIZE_IN_BYTES);
 	WREG32(mmPSOC_GLOBAL_CONF_SCRATCHPAD_4, HL_EQ_SIZE_IN_BYTES);
@@ -3011,7 +3011,13 @@ static void goya_dma_pool_free(struct hl_device *hdev, void *vaddr,
 void *goya_cpu_accessible_dma_pool_alloc(struct hl_device *hdev, size_t size,
 					dma_addr_t *dma_handle)
 {
-	return hl_fw_cpu_accessible_dma_pool_alloc(hdev, size, dma_handle);
+	void *vaddr;
+
+	vaddr = hl_fw_cpu_accessible_dma_pool_alloc(hdev, size, dma_handle);
+	*dma_handle = (*dma_handle) - hdev->cpu_accessible_dma_address +
+			VA_CPU_ACCESSIBLE_MEM_ADDR;
+
+	return vaddr;
 }
 
 void goya_cpu_accessible_dma_pool_free(struct hl_device *hdev, size_t size,
@@ -4667,6 +4673,14 @@ static int goya_mmu_add_mappings_for_device_cpu(struct hl_device *hdev)
 		}
 	}
 
+	goya_mmu_prepare_reg(hdev, mmCPU_IF_ARUSER_OVR, HL_KERNEL_ASID_ID);
+	goya_mmu_prepare_reg(hdev, mmCPU_IF_AWUSER_OVR, HL_KERNEL_ASID_ID);
+	WREG32(mmCPU_IF_ARUSER_OVR_EN, 0x7FF);
+	WREG32(mmCPU_IF_AWUSER_OVR_EN, 0x7FF);
+
+	/* Make sure configuration is flushed to device */
+	RREG32(mmCPU_IF_AWUSER_OVR_EN);
+
 	goya->device_cpu_mmu_mappings_done = true;
 
 	return 0;
@@ -4702,6 +4716,9 @@ void goya_mmu_remove_device_cpu_mappings(struct hl_device *hdev)
 	if (!goya->device_cpu_mmu_mappings_done)
 		return;
 
+	WREG32(mmCPU_IF_ARUSER_OVR_EN, 0);
+	WREG32(mmCPU_IF_AWUSER_OVR_EN, 0);
+
 	if (!(hdev->cpu_accessible_dma_address & (PAGE_SIZE_2MB - 1))) {
 		if (hl_mmu_unmap(hdev->kernel_ctx, VA_CPU_ACCESSIBLE_MEM_ADDR,
 				PAGE_SIZE_2MB))
-- 
2.17.1



* [PATCH 6/8] habanalabs: remove DMA mask hack for Goya
  2019-06-11  5:50 [PATCH 0/8] Fixing DMA mask issues in habanalabs driver Oded Gabbay
                   ` (4 preceding siblings ...)
  2019-06-11  5:50 ` [PATCH 5/8] habanalabs: set Goya CPU to use ASIC MMU Oded Gabbay
@ 2019-06-11  5:50 ` Oded Gabbay
  2019-06-11  5:50 ` [PATCH 7/8] habanalabs: add WARN in case of bad MMU mapping Oded Gabbay
  2019-06-11  5:50 ` [PATCH 8/8] habanalabs: enable 64-bit DMA mask in POWER9 Oded Gabbay
  7 siblings, 0 replies; 13+ messages in thread
From: Oded Gabbay @ 2019-06-11  5:50 UTC
  To: linux-kernel; +Cc: gregkh

This patch removes the non-standard DMA mask setting for Goya. Now that
the device CPU goes through the MMU, we are not limited to allocating the
CPU accessible memory area in the address space of under 39 bits.
Therefore, we don't need to set the DMA mask twice during
initialization, a practice that does not work on the POWER architecture.

The patch sets the DMA mask to 48 bits once during the initialization. The
address of the CPU accessible memory area is mapped in the MMU and the
matching VA is given to the device CPU.
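
The relation between a host bus address inside that area and the VA the
device CPU sees can be sketched in user space as below. This follows the
offset arithmetic used in patch 5/8; bus_to_device_va() is a hypothetical
name for this example, and the bounds checks are an addition of the
sketch rather than something the driver is known to do.

```c
#include <assert.h>
#include <stdint.h>

#define VA_CPU_ACCESSIBLE_MEM_ADDR	0x8000000000ull	/* from patch 4/8 */
#define CPU_ACCESSIBLE_MEM_SIZE		0x200000ull	/* 2MB */

/*
 * Hypothetical helper: convert a host bus address inside the CPU
 * accessible area into the device VA that is handed to the device CPU.
 */
static inline uint64_t bus_to_device_va(uint64_t bus_addr, uint64_t area_base)
{
	/* The address must lie inside the 2MB area. */
	assert(bus_addr >= area_base);
	assert(bus_addr - area_base < CPU_ACCESSIBLE_MEM_SIZE);

	return bus_addr - area_base + VA_CPU_ACCESSIBLE_MEM_ADDR;
}
```

The offset inside the area is preserved, so the device CPU never needs
to know where the area actually sits in host bus address space.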

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
---
 drivers/misc/habanalabs/goya/goya.c | 19 ++++---------------
 1 file changed, 4 insertions(+), 15 deletions(-)

diff --git a/drivers/misc/habanalabs/goya/goya.c b/drivers/misc/habanalabs/goya/goya.c
index 9f1f47770afa..e8b3a31d211f 100644
--- a/drivers/misc/habanalabs/goya/goya.c
+++ b/drivers/misc/habanalabs/goya/goya.c
@@ -472,7 +472,7 @@ static int goya_early_init(struct hl_device *hdev)
 
 	prop->dram_pci_bar_size = pci_resource_len(pdev, DDR_BAR_ID);
 
-	rc = hl_pci_init(hdev, 39);
+	rc = hl_pci_init(hdev, 48);
 	if (rc)
 		return rc;
 
@@ -669,6 +669,9 @@ static int goya_sw_init(struct hl_device *hdev)
 		goto free_dma_pool;
 	}
 
+	dev_dbg(hdev->dev, "cpu accessible memory at bus address 0x%llx\n",
+		hdev->cpu_accessible_dma_address);
+
 	hdev->cpu_accessible_dma_pool = gen_pool_create(ilog2(32), -1);
 	if (!hdev->cpu_accessible_dma_pool) {
 		dev_err(hdev->dev,
@@ -2481,25 +2484,11 @@ static int goya_hw_init(struct hl_device *hdev)
 	if (rc)
 		goto disable_queues;
 
-	/*
-	 * Check if we managed to set the DMA mask to more then 32 bits. If so,
-	 * let's try to increase it again because in Goya we set the initial
-	 * dma mask to less then 39 bits so that the allocation of the memory
-	 * area for the device's cpu will be under 39 bits
-	 */
-	if (hdev->dma_mask > 32) {
-		rc = hl_pci_set_dma_mask(hdev, 48);
-		if (rc)
-			goto disable_msix;
-	}
-
 	/* Perform read from the device to flush all MSI-X configuration */
 	val = RREG32(mmPCIE_DBI_DEVICE_ID_VENDOR_ID_REG);
 
 	return 0;
 
-disable_msix:
-	goya_disable_msix(hdev);
 disable_queues:
 	goya_disable_internal_queues(hdev);
 	goya_disable_external_queues(hdev);
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 7/8] habanalabs: add WARN in case of bad MMU mapping
  2019-06-11  5:50 [PATCH 0/8] Fixing DMA mask issues in habanalabs driver Oded Gabbay
                   ` (5 preceding siblings ...)
  2019-06-11  5:50 ` [PATCH 6/8] habanalabs: remove DMA mask hack for Goya Oded Gabbay
@ 2019-06-11  5:50 ` Oded Gabbay
  2019-06-11  5:50 ` [PATCH 8/8] habanalabs: enable 64-bit DMA mask in POWER9 Oded Gabbay
  7 siblings, 0 replies; 13+ messages in thread
From: Oded Gabbay @ 2019-06-11  5:50 UTC (permalink / raw)
  To: linux-kernel; +Cc: gregkh

This patch checks whether an MMU mapping is erroneous because the physical
address being mapped is NOT divisible by the page size.

If that happens, the H/W will issue a transaction that is translated to a
wrong address, because the low bits of the address (the remainder of
address / page size) are not taken into account.

Because the physical address is handled by the driver, a WARN is suitable
here: it implies a bug in the driver code itself and not a user bug.
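
As an illustration only (not part of the patch), the condition tested by
the WARN can be sketched in userspace C. Both helpers are hypothetical
names and assume the page size is a power of two, as the mask expression
in the patch does.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if phys_addr is a multiple of page_size (a power of two). */
static bool phys_addr_is_aligned(uint64_t phys_addr, uint32_t page_size)
{
	return (phys_addr & ((uint64_t)page_size - 1)) == 0;
}

/* The remainder of phys_addr / page_size, i.e. the bits that are not taken. */
static uint64_t dropped_offset(uint64_t phys_addr, uint32_t page_size)
{
	return phys_addr & ((uint64_t)page_size - 1);
}
```

For example, with a 4KB page (0x1000), address 0x200010 is misaligned and
its low 0x10 bytes are the part that would be lost in translation.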

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
---
 drivers/misc/habanalabs/mmu.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/misc/habanalabs/mmu.c b/drivers/misc/habanalabs/mmu.c
index a80162c5c373..176c315836f1 100644
--- a/drivers/misc/habanalabs/mmu.c
+++ b/drivers/misc/habanalabs/mmu.c
@@ -913,6 +913,10 @@ int hl_mmu_map(struct hl_ctx *ctx, u64 virt_addr, u64 phys_addr, u32 page_size)
 		return -EFAULT;
 	}
 
+	WARN_ONCE((phys_addr & (real_page_size - 1)),
+		"Mapping 0x%llx with page size of 0x%x is erroneous! Address must be divisible by page size",
+		phys_addr, real_page_size);
+
 	npages = page_size / real_page_size;
 	real_virt_addr = virt_addr;
 	real_phys_addr = phys_addr;
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 8/8] habanalabs: enable 64-bit DMA mask in POWER9
  2019-06-11  5:50 [PATCH 0/8] Fixing DMA mask issues in habanalabs driver Oded Gabbay
                   ` (6 preceding siblings ...)
  2019-06-11  5:50 ` [PATCH 7/8] habanalabs: add WARN in case of bad MMU mapping Oded Gabbay
@ 2019-06-11  5:50 ` Oded Gabbay
  2019-06-11  7:59   ` Greg KH
  2019-06-11 15:12   ` Christoph Hellwig
  7 siblings, 2 replies; 13+ messages in thread
From: Oded Gabbay @ 2019-06-11  5:50 UTC (permalink / raw)
  To: linux-kernel; +Cc: gregkh

This patch enables driver support for a 64-bit DMA mask when running on
a POWER9 machine.

POWER9 supports either a 32-bit or a 64-bit DMA mask, whereas our ASICs
support a 48-bit DMA mask. To support 64-bit, the driver needs to apply a
special configuration to the ASIC's PCIe controller.
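
As an illustration only (not part of the patch): the cover letter states
that the device must set bit 59 in all outbound transactions, and the diff
below writes 0x08000000 to iATU offset 0x018. Assuming that register holds
the upper 32 bits of an address (an assumption, not something stated in
this thread), the value is simply bit 59 seen from the upper word. The
helper names here are hypothetical.

```c
#include <stdint.h>

#define POWER9_DMA_BIT 59 /* bit number named in the cover letter */

/* Upper 32 bits of a 64-bit value that has only bit 59 set. */
static uint32_t bit59_upper_word(void)
{
	return (uint32_t)((1ULL << POWER9_DMA_BIT) >> 32);
}

/* Hypothetical helper: an outbound address with bit 59 forced to 1. */
static uint64_t with_bit59(uint64_t addr)
{
	return addr | (1ULL << POWER9_DMA_BIT);
}
```

Bit 59 of the full address is bit 27 of the upper word, so
bit59_upper_word() evaluates to 0x08000000.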

The special configuration is activated via a kernel module parameter
because:

1. It should affect all the habanalabs ASICs in the machine.

2. pci_set_dma_mask() is a generic Linux kernel call, so the driver
   can't tell why it got an error when it tried to set the DMA mask to 48
   bits. Upon such a failure, the driver must fall back to setting the mask
   to 32 bits.

3. There is no standard way to differentiate at runtime between POWER9 and
   other architectures.

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
---
 drivers/misc/habanalabs/goya/goya.c      | 6 +++++-
 drivers/misc/habanalabs/habanalabs.h     | 3 +++
 drivers/misc/habanalabs/habanalabs_drv.c | 7 +++++++
 drivers/misc/habanalabs/pci.c            | 7 ++++++-
 4 files changed, 21 insertions(+), 2 deletions(-)

diff --git a/drivers/misc/habanalabs/goya/goya.c b/drivers/misc/habanalabs/goya/goya.c
index e8b3a31d211f..eb6cd1ee06f2 100644
--- a/drivers/misc/habanalabs/goya/goya.c
+++ b/drivers/misc/habanalabs/goya/goya.c
@@ -472,7 +472,11 @@ static int goya_early_init(struct hl_device *hdev)
 
 	prop->dram_pci_bar_size = pci_resource_len(pdev, DDR_BAR_ID);
 
-	rc = hl_pci_init(hdev, 48);
+	if (hdev->power9_64bit_dma_enable)
+		rc = hl_pci_init(hdev, 64);
+	else
+		rc = hl_pci_init(hdev, 48);
+
 	if (rc)
 		return rc;
 
diff --git a/drivers/misc/habanalabs/habanalabs.h b/drivers/misc/habanalabs/habanalabs.h
index 5e4a631b3d88..b6fa2df0b2d6 100644
--- a/drivers/misc/habanalabs/habanalabs.h
+++ b/drivers/misc/habanalabs/habanalabs.h
@@ -1208,6 +1208,8 @@ struct hl_device_reset_work {
  * @dma_mask: the dma mask that was set for this device
  * @in_debug: is device under debug. This, together with fd_open_cnt, enforces
  *            that only a single user is configuring the debug infrastructure.
+ * @power9_64bit_dma_enable: true to enable 64-bit DMA mask support. Relevant
+ *                           only to POWER9 machines.
  */
 struct hl_device {
 	struct pci_dev			*pdev;
@@ -1281,6 +1283,7 @@ struct hl_device {
 	u8				device_cpu_disabled;
 	u8				dma_mask;
 	u8				in_debug;
+	u8				power9_64bit_dma_enable;
 
 	/* Parameters for bring-up */
 	u8				mmu_enable;
diff --git a/drivers/misc/habanalabs/habanalabs_drv.c b/drivers/misc/habanalabs/habanalabs_drv.c
index 6f6dbe93f1df..9ca2d9d4f3fe 100644
--- a/drivers/misc/habanalabs/habanalabs_drv.c
+++ b/drivers/misc/habanalabs/habanalabs_drv.c
@@ -28,6 +28,7 @@ static DEFINE_MUTEX(hl_devs_idr_lock);
 
 static int timeout_locked = 5;
 static int reset_on_lockup = 1;
+static int power9_64bit_dma_enable;
 
 module_param(timeout_locked, int, 0444);
 MODULE_PARM_DESC(timeout_locked,
@@ -37,6 +38,10 @@ module_param(reset_on_lockup, int, 0444);
 MODULE_PARM_DESC(reset_on_lockup,
 	"Do device reset on lockup (0 = no, 1 = yes, default yes)");
 
+module_param(power9_64bit_dma_enable, int, 0444);
+MODULE_PARM_DESC(power9_64bit_dma_enable,
+	"Enable 64-bit DMA mask. Should be set only in POWER9 machine (0 = no, 1 = yes, default no)");
+
 #define PCI_VENDOR_ID_HABANALABS	0x1da3
 
 #define PCI_IDS_GOYA			0x0001
@@ -223,6 +228,8 @@ int create_hdev(struct hl_device **dev, struct pci_dev *pdev,
 
 	hdev->major = hl_major;
 	hdev->reset_on_lockup = reset_on_lockup;
+	hdev->power9_64bit_dma_enable = power9_64bit_dma_enable;
+
 	hdev->pldm = 0;
 
 	set_driver_behavior_per_device(hdev);
diff --git a/drivers/misc/habanalabs/pci.c b/drivers/misc/habanalabs/pci.c
index c98d88c7a5c6..15954bf419fa 100644
--- a/drivers/misc/habanalabs/pci.c
+++ b/drivers/misc/habanalabs/pci.c
@@ -283,7 +283,12 @@ int hl_pci_init_iatu(struct hl_device *hdev, u64 sram_base_address,
 				upper_32_bits(host_phys_base_address));
 	rc |= hl_pci_iatu_write(hdev, 0x010, lower_32_bits(host_phys_end_addr));
 	rc |= hl_pci_iatu_write(hdev, 0x014, 0);
-	rc |= hl_pci_iatu_write(hdev, 0x018, 0);
+
+	if ((hdev->power9_64bit_dma_enable) && (hdev->dma_mask == 64))
+		rc |= hl_pci_iatu_write(hdev, 0x018, 0x08000000);
+	else
+		rc |= hl_pci_iatu_write(hdev, 0x018, 0);
+
 	rc |= hl_pci_iatu_write(hdev, 0x020, upper_32_bits(host_phys_end_addr));
 	/* Increase region size */
 	rc |= hl_pci_iatu_write(hdev, 0x000, 0x00002000);
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH 8/8] habanalabs: enable 64-bit DMA mask in POWER9
  2019-06-11  5:50 ` [PATCH 8/8] habanalabs: enable 64-bit DMA mask in POWER9 Oded Gabbay
@ 2019-06-11  7:59   ` Greg KH
  2019-06-11  8:08     ` Oded Gabbay
  2019-06-11 15:12   ` Christoph Hellwig
  1 sibling, 1 reply; 13+ messages in thread
From: Greg KH @ 2019-06-11  7:59 UTC (permalink / raw)
  To: Oded Gabbay; +Cc: linux-kernel

On Tue, Jun 11, 2019 at 08:50:45AM +0300, Oded Gabbay wrote:
> --- a/drivers/misc/habanalabs/habanalabs_drv.c
> +++ b/drivers/misc/habanalabs/habanalabs_drv.c
> @@ -28,6 +28,7 @@ static DEFINE_MUTEX(hl_devs_idr_lock);
>  
>  static int timeout_locked = 5;
>  static int reset_on_lockup = 1;
> +static int power9_64bit_dma_enable;
>  
>  module_param(timeout_locked, int, 0444);
>  MODULE_PARM_DESC(timeout_locked,
> @@ -37,6 +38,10 @@ module_param(reset_on_lockup, int, 0444);
>  MODULE_PARM_DESC(reset_on_lockup,
>  	"Do device reset on lockup (0 = no, 1 = yes, default yes)");
>  
> +module_param(power9_64bit_dma_enable, int, 0444);
> +MODULE_PARM_DESC(power9_64bit_dma_enable,
> +	"Enable 64-bit DMA mask. Should be set only in POWER9 machine (0 = no, 1 = yes, default no)");
> +
>  #define PCI_VENDOR_ID_HABANALABS	0x1da3
>  
>  #define PCI_IDS_GOYA			0x0001


This is not the 1990's, please do not use module parameters.  Yeah, you
have a bunch of them already, but do not add additional ones that can be
easily determined at runtime, like this one.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 8/8] habanalabs: enable 64-bit DMA mask in POWER9
  2019-06-11  7:59   ` Greg KH
@ 2019-06-11  8:08     ` Oded Gabbay
  2019-06-11  8:47       ` Oded Gabbay
  0 siblings, 1 reply; 13+ messages in thread
From: Oded Gabbay @ 2019-06-11  8:08 UTC (permalink / raw)
  To: Greg KH; +Cc: Linux-Kernel@Vger. Kernel. Org

On Tue, Jun 11, 2019 at 10:59 AM Greg KH <gregkh@linuxfoundation.org> wrote:
>
> On Tue, Jun 11, 2019 at 08:50:45AM +0300, Oded Gabbay wrote:
> > --- a/drivers/misc/habanalabs/habanalabs_drv.c
> > +++ b/drivers/misc/habanalabs/habanalabs_drv.c
> > @@ -28,6 +28,7 @@ static DEFINE_MUTEX(hl_devs_idr_lock);
> >
> >  static int timeout_locked = 5;
> >  static int reset_on_lockup = 1;
> > +static int power9_64bit_dma_enable;
> >
> >  module_param(timeout_locked, int, 0444);
> >  MODULE_PARM_DESC(timeout_locked,
> > @@ -37,6 +38,10 @@ module_param(reset_on_lockup, int, 0444);
> >  MODULE_PARM_DESC(reset_on_lockup,
> >       "Do device reset on lockup (0 = no, 1 = yes, default yes)");
> >
> > +module_param(power9_64bit_dma_enable, int, 0444);
> > +MODULE_PARM_DESC(power9_64bit_dma_enable,
> > +     "Enable 64-bit DMA mask. Should be set only in POWER9 machine (0 = no, 1 = yes, default no)");
> > +
> >  #define PCI_VENDOR_ID_HABANALABS     0x1da3
> >
> >  #define PCI_IDS_GOYA                 0x0001
>
>
> This is not the 1990's, please do not use module parameters.  Yeah, you
> have a bunch of them already, but do not add additional ones that can be
> easily determined at runtime, like this one.
>
> thanks,
>
> greg k-h

Hi Greg,
I would love to do this at runtime, and that was my intent all along
until I hit a wall on *how* to find out at runtime whether I'm running
on POWER9 with PHB4 or not.
I searched the kernel code and consulted a couple of people, but I
didn't find any way of doing this at runtime.
If you know of a way, please share it with me.

The fact of the matter is, I have two different configurations of *my*
device's PCIe controller. One is suitable only for POWER9 with PHB4 and
the other suits all the other architectures/systems (that we have
tested so far). So I have to know which system I'm running on and, as I
said, I didn't find a kernel API which can help me do that.

Thanks,
Oded

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 8/8] habanalabs: enable 64-bit DMA mask in POWER9
  2019-06-11  8:08     ` Oded Gabbay
@ 2019-06-11  8:47       ` Oded Gabbay
  0 siblings, 0 replies; 13+ messages in thread
From: Oded Gabbay @ 2019-06-11  8:47 UTC (permalink / raw)
  To: Greg KH; +Cc: Linux-Kernel@Vger. Kernel. Org

On Tue, Jun 11, 2019 at 11:08 AM Oded Gabbay <oded.gabbay@gmail.com> wrote:
>
> On Tue, Jun 11, 2019 at 10:59 AM Greg KH <gregkh@linuxfoundation.org> wrote:
> >
> > On Tue, Jun 11, 2019 at 08:50:45AM +0300, Oded Gabbay wrote:
> > > --- a/drivers/misc/habanalabs/habanalabs_drv.c
> > > +++ b/drivers/misc/habanalabs/habanalabs_drv.c
> > > @@ -28,6 +28,7 @@ static DEFINE_MUTEX(hl_devs_idr_lock);
> > >
> > >  static int timeout_locked = 5;
> > >  static int reset_on_lockup = 1;
> > > +static int power9_64bit_dma_enable;
> > >
> > >  module_param(timeout_locked, int, 0444);
> > >  MODULE_PARM_DESC(timeout_locked,
> > > @@ -37,6 +38,10 @@ module_param(reset_on_lockup, int, 0444);
> > >  MODULE_PARM_DESC(reset_on_lockup,
> > >       "Do device reset on lockup (0 = no, 1 = yes, default yes)");
> > >
> > > +module_param(power9_64bit_dma_enable, int, 0444);
> > > +MODULE_PARM_DESC(power9_64bit_dma_enable,
> > > +     "Enable 64-bit DMA mask. Should be set only in POWER9 machine (0 = no, 1 = yes, default no)");
> > > +
> > >  #define PCI_VENDOR_ID_HABANALABS     0x1da3
> > >
> > >  #define PCI_IDS_GOYA                 0x0001
> >
> >
> > This is not the 1990's, please do not use module parameters.  Yeah, you
> > have a bunch of them already, but do not add additional ones that can be
> > easily determined at runtime, like this one.
> >
> > thanks,
> >
> > greg k-h
>
> Hi Greg,
> I would love to do this at runtime, and that was my intent all along
> until I hit a wall on *how* to find out at runtime whether I'm running
> on POWER9 with PHB4 or not.
> I searched the kernel code and consulted a couple of people, but I
> didn't find any way of doing this at runtime.
> If you know of a way, please share it with me.
>
> The fact of the matter is, I have two different configurations of *my*
> device's PCIe controller. One is suitable only for POWER9 with PHB4 and
> the other suits all the other architectures/systems (that we have
> tested so far). So I have to know which system I'm running on and, as I
> said, I didn't find a kernel API which can help me do that.
>
> Thanks,
> Oded

btw, even the powernv code determines the PHB model by reading the
device-tree file. They don't even read it from the controller.

Having said that, it occurred to me that I may be able to determine
this by the PCI ID of the parent bus of my device. It has a unique PCI
ID so hopefully that will be enough.
I will check and update here.
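
As an illustration only, the parent-ID idea can be sketched in userspace
C. Everything here is hypothetical: struct fake_pci_dev merely stands in
for the vendor/device fields of the kernel's struct pci_dev, and
DEVICE_ID_PHB4_DUMMY is a placeholder, not the real PHB4 device ID. In
the kernel the upstream bridge would presumably be reached through
pdev->bus->self.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define VENDOR_ID_IBM        0x1014 /* IBM's PCI vendor ID */
#define DEVICE_ID_PHB4_DUMMY 0xFFFF /* placeholder, NOT the real PHB4 ID */

/* Hypothetical stand-in for the fields of struct pci_dev used here. */
struct fake_pci_dev {
	uint16_t vendor;
	uint16_t device;
	const struct fake_pci_dev *parent; /* upstream bridge, NULL at the root */
};

/* True if the device sits directly below a bridge with the expected IDs. */
static bool parent_is_phb4(const struct fake_pci_dev *dev)
{
	const struct fake_pci_dev *p = dev ? dev->parent : NULL;

	return p && p->vendor == VENDOR_ID_IBM &&
	       p->device == DEVICE_ID_PHB4_DUMMY;
}
```

A check of this kind would replace the module parameter: the driver would
choose the 64-bit configuration only when the check returns true.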

Thanks,
Oded

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 8/8] habanalabs: enable 64-bit DMA mask in POWER9
  2019-06-11  5:50 ` [PATCH 8/8] habanalabs: enable 64-bit DMA mask in POWER9 Oded Gabbay
  2019-06-11  7:59   ` Greg KH
@ 2019-06-11 15:12   ` Christoph Hellwig
  1 sibling, 0 replies; 13+ messages in thread
From: Christoph Hellwig @ 2019-06-11 15:12 UTC (permalink / raw)
  To: Oded Gabbay; +Cc: linux-kernel, gregkh

On Tue, Jun 11, 2019 at 08:50:45AM +0300, Oded Gabbay wrote:
> 2. pci_set_dma_mask() is a generic Linux kernel call, so the driver
>    can't tell why it got an error when it tried to set the DMA mask to 48
>    bits. Upon such a failure, the driver must fall back to setting the mask
>    to 32 bits.

In the current kernel, pci_set_dma_mask() only fails if the DMA mask is
too small to be supportable at all.  So you very obviously did not
actually test this against mainline.

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2019-06-11 15:12 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-06-11  5:50 [PATCH 0/8] Fixing DMA mask issues in habanalabs driver Oded Gabbay
2019-06-11  5:50 ` [PATCH 1/8] habanalabs: initialize device CPU queues after MMU init Oded Gabbay
2019-06-11  5:50 ` [PATCH 2/8] habanalabs: de-couple MMU and VM module initialization Oded Gabbay
2019-06-11  5:50 ` [PATCH 3/8] habanalabs: initialize MMU context for driver Oded Gabbay
2019-06-11  5:50 ` [PATCH 4/8] habanalabs: add MMU mappings for Goya CPU Oded Gabbay
2019-06-11  5:50 ` [PATCH 5/8] habanalabs: set Goya CPU to use ASIC MMU Oded Gabbay
2019-06-11  5:50 ` [PATCH 6/8] habanalabs: remove DMA mask hack for Goya Oded Gabbay
2019-06-11  5:50 ` [PATCH 7/8] habanalabs: add WARN in case of bad MMU mapping Oded Gabbay
2019-06-11  5:50 ` [PATCH 8/8] habanalabs: enable 64-bit DMA mask in POWER9 Oded Gabbay
2019-06-11  7:59   ` Greg KH
2019-06-11  8:08     ` Oded Gabbay
2019-06-11  8:47       ` Oded Gabbay
2019-06-11 15:12   ` Christoph Hellwig
