amd-gfx.lists.freedesktop.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 1/8] drm/amdgpu: fix off by one in amdgpu_vm_handle_fault
@ 2020-11-03 13:58 Christian König
  2020-11-03 13:58 ` [PATCH 2/8] drm/amdgpu: drop leading zeros from the gmc9 fault address Christian König
                   ` (6 more replies)
  0 siblings, 7 replies; 12+ messages in thread
From: Christian König @ 2020-11-03 13:58 UTC (permalink / raw)
  To: amd-gfx

The value is inclusive, not exclusive.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index dc0bc550e42b..fdbe7d4e8b8b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -3351,7 +3351,7 @@ bool amdgpu_vm_handle_fault(struct amdgpu_device *adev, unsigned int pasid,
 	}
 
 	r = amdgpu_vm_bo_update_mapping(adev, adev, vm, true, false, NULL, addr,
-					addr + 1, flags, value, NULL, NULL,
+					addr, flags, value, NULL, NULL,
 					NULL);
 	if (r)
 		goto error_unlock;
-- 
2.25.1

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 2/8] drm/amdgpu: drop leading zeros from the gmc9 fault address
  2020-11-03 13:58 [PATCH 1/8] drm/amdgpu: fix off by one in amdgpu_vm_handle_fault Christian König
@ 2020-11-03 13:58 ` Christian König
  2020-11-12  6:18   ` Chauhan, Madhav
  2020-11-03 13:58 ` [PATCH 3/8] drm/amdgpu: cleanup gmc_v10_0_process_interrupt a bit Christian König
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 12+ messages in thread
From: Christian König @ 2020-11-03 13:58 UTC (permalink / raw)
  To: amd-gfx

The address space is only 48bit, not 64bit. And the VMHUBs work with
sign extended addresses.

Signed-off-by: Christian König <christian.koenig@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
index 0c3421d587e8..e86ef0c36596 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
@@ -557,7 +557,7 @@ static int gmc_v9_0_process_interrupt(struct amdgpu_device *adev,
 		entry->src_id, entry->ring_id, entry->vmid,
 		entry->pasid, task_info.process_name, task_info.tgid,
 		task_info.task_name, task_info.pid);
-	dev_err(adev->dev, "  in page starting at address 0x%016llx from client %d\n",
+	dev_err(adev->dev, "  in page starting at address 0x%012llx from client %d\n",
 		addr, entry->client_id);
 
 	if (amdgpu_sriov_vf(adev))
-- 
2.25.1

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 3/8] drm/amdgpu: cleanup gmc_v10_0_process_interrupt a bit
  2020-11-03 13:58 [PATCH 1/8] drm/amdgpu: fix off by one in amdgpu_vm_handle_fault Christian König
  2020-11-03 13:58 ` [PATCH 2/8] drm/amdgpu: drop leading zeros from the gmc9 fault address Christian König
@ 2020-11-03 13:58 ` Christian König
  2020-11-12  6:26   ` Chauhan, Madhav
  2020-11-03 13:58 ` [PATCH 4/8] drm/amdgpu: add infrastructure for soft IH ring Christian König
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 12+ messages in thread
From: Christian König @ 2020-11-03 13:58 UTC (permalink / raw)
  To: amd-gfx

Return early in case of a ratelimit and don't print leading zeros for
the address.

Signed-off-by: Christian König <christian.koenig@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c | 38 ++++++++++++++------------
 1 file changed, 20 insertions(+), 18 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
index d9399324be47..cffc3ca8fcde 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
@@ -95,6 +95,7 @@ static int gmc_v10_0_process_interrupt(struct amdgpu_device *adev,
 				       struct amdgpu_iv_entry *entry)
 {
 	struct amdgpu_vmhub *hub = &adev->vmhub[entry->vmid_src];
+	struct amdgpu_task_info task_info;
 	uint32_t status = 0;
 	u64 addr;
 
@@ -115,24 +116,25 @@ static int gmc_v10_0_process_interrupt(struct amdgpu_device *adev,
 		WREG32_P(hub->vm_l2_pro_fault_cntl, 1, ~1);
 	}
 
-	if (printk_ratelimit()) {
-		struct amdgpu_task_info task_info;
-
-		memset(&task_info, 0, sizeof(struct amdgpu_task_info));
-		amdgpu_vm_get_task_info(adev, entry->pasid, &task_info);
-
-		dev_err(adev->dev,
-			"[%s] page fault (src_id:%u ring:%u vmid:%u pasid:%u, "
-			"for process %s pid %d thread %s pid %d)\n",
-			entry->vmid_src ? "mmhub" : "gfxhub",
-			entry->src_id, entry->ring_id, entry->vmid,
-			entry->pasid, task_info.process_name, task_info.tgid,
-			task_info.task_name, task_info.pid);
-		dev_err(adev->dev, "  in page starting at address 0x%016llx from client %d\n",
-			addr, entry->client_id);
-		if (!amdgpu_sriov_vf(adev))
-			hub->vmhub_funcs->print_l2_protection_fault_status(adev, status);
-	}
+	if (!printk_ratelimit())
+		return 0;
+
+	memset(&task_info, 0, sizeof(struct amdgpu_task_info));
+	amdgpu_vm_get_task_info(adev, entry->pasid, &task_info);
+
+	dev_err(adev->dev,
+		"[%s] page fault (src_id:%u ring:%u vmid:%u pasid:%u, "
+		"for process %s pid %d thread %s pid %d)\n",
+		entry->vmid_src ? "mmhub" : "gfxhub",
+		entry->src_id, entry->ring_id, entry->vmid,
+		entry->pasid, task_info.process_name, task_info.tgid,
+		task_info.task_name, task_info.pid);
+	dev_err(adev->dev, "  in page starting at address 0x%012llx from client %d\n",
+		addr, entry->client_id);
+
+	if (!amdgpu_sriov_vf(adev))
+		hub->vmhub_funcs->print_l2_protection_fault_status(adev,
+								   status);
 
 	return 0;
 }
-- 
2.25.1

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 4/8] drm/amdgpu: add infrastructure for soft IH ring
  2020-11-03 13:58 [PATCH 1/8] drm/amdgpu: fix off by one in amdgpu_vm_handle_fault Christian König
  2020-11-03 13:58 ` [PATCH 2/8] drm/amdgpu: drop leading zeros from the gmc9 fault address Christian König
  2020-11-03 13:58 ` [PATCH 3/8] drm/amdgpu: cleanup gmc_v10_0_process_interrupt a bit Christian König
@ 2020-11-03 13:58 ` Christian König
  2020-11-03 16:00   ` Felix Kuehling
  2020-11-03 13:58 ` [PATCH 5/8] drm/amdgpu: enabled software IH ring for Vega Christian König
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 12+ messages in thread
From: Christian König @ 2020-11-03 13:58 UTC (permalink / raw)
  To: amd-gfx

Add a soft IH ring implementation similar to the hardware IH1/2.

This can be used if the hardware delegation of interrupts to IH1/2
doesn't work for some reason.

Signed-off-by: Christian König <christian.koenig@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ih.c  | 29 ++++++++++++++++++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_ih.h  |  2 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c | 35 +++++++++++++++++++++++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_irq.h |  8 ++++--
 4 files changed, 72 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.c
index 111a301ce878..dcd9b4a8e20b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.c
@@ -131,6 +131,35 @@ void amdgpu_ih_ring_fini(struct amdgpu_device *adev, struct amdgpu_ih_ring *ih)
 	}
 }
 
+/**
+ * amdgpu_ih_ring_write - write IV to the ring buffer
+ *
+ * @ih: ih ring to write to
+ * @iv: the iv to write
+ * @num_dw: size of the iv in dw
+ *
+ * Writes an IV to the ring buffer using the CPU and increment the wptr.
+ * Used for testing and delegating IVs to a software ring.
+ */
+void amdgpu_ih_ring_write(struct amdgpu_ih_ring *ih, const uint32_t *iv,
+			  unsigned int num_dw)
+{
+	uint32_t wptr = le32_to_cpu(*ih->wptr_cpu) >> 2;
+	unsigned int i;
+
+	for (i = 0; i < num_dw; ++i)
+	        ih->ring[wptr++] = cpu_to_le32(iv[i]);
+
+	wptr <<= 2;
+	wptr &= ih->ptr_mask;
+
+	/* Only commit the new wptr if we don't overflow */
+	if (wptr != READ_ONCE(ih->rptr)) {
+		wmb();
+		WRITE_ONCE(*ih->wptr_cpu, cpu_to_le32(wptr));
+	}
+}
+
 /**
  * amdgpu_ih_process - interrupt handler
  *
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.h
index 4e0bb645176d..3c9cfe7eecff 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.h
@@ -72,6 +72,8 @@ struct amdgpu_ih_funcs {
 int amdgpu_ih_ring_init(struct amdgpu_device *adev, struct amdgpu_ih_ring *ih,
 			unsigned ring_size, bool use_bus_addr);
 void amdgpu_ih_ring_fini(struct amdgpu_device *adev, struct amdgpu_ih_ring *ih);
+void amdgpu_ih_ring_write(struct amdgpu_ih_ring *ih, const uint32_t *iv,
+			  unsigned int num_dw);
 int amdgpu_ih_process(struct amdgpu_device *adev, struct amdgpu_ih_ring *ih);
 
 #endif
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
index 300ac73b4738..bea57e8e793f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
@@ -206,6 +206,21 @@ static void amdgpu_irq_handle_ih2(struct work_struct *work)
 	amdgpu_ih_process(adev, &adev->irq.ih2);
 }
 
+/**
+ * amdgpu_irq_handle_ih_soft - kick of processing for ih_soft
+ *
+ * @work: work structure in struct amdgpu_irq
+ *
+ * Kick of processing IH soft ring.
+ */
+static void amdgpu_irq_handle_ih_soft(struct work_struct *work)
+{
+	struct amdgpu_device *adev = container_of(work, struct amdgpu_device,
+						  irq.ih_soft_work);
+
+	amdgpu_ih_process(adev, &adev->irq.ih_soft);
+}
+
 /**
  * amdgpu_msi_ok - check whether MSI functionality is enabled
  *
@@ -281,6 +296,7 @@ int amdgpu_irq_init(struct amdgpu_device *adev)
 
 	INIT_WORK(&adev->irq.ih1_work, amdgpu_irq_handle_ih1);
 	INIT_WORK(&adev->irq.ih2_work, amdgpu_irq_handle_ih2);
+	INIT_WORK(&adev->irq.ih_soft_work, amdgpu_irq_handle_ih_soft);
 
 	adev->irq.installed = true;
 	/* Use vector 0 for MSI-X */
@@ -413,6 +429,7 @@ void amdgpu_irq_dispatch(struct amdgpu_device *adev,
 	bool handled = false;
 	int r;
 
+	entry.ih = ih;
 	entry.iv_entry = (const uint32_t *)&ih->ring[ring_index];
 	amdgpu_ih_decode_iv(adev, &entry);
 
@@ -450,6 +467,24 @@ void amdgpu_irq_dispatch(struct amdgpu_device *adev,
 		amdgpu_amdkfd_interrupt(adev, entry.iv_entry);
 }
 
+/**
+ * amdgpu_irq_delegate - delegate IV to soft IH ring
+ *
+ * @adev: amdgpu device pointer
+ * @entry: IV entry
+ * @num_dw: size of IV
+ *
+ * Delegate the IV to the soft IH ring and schedule processing of it. Used
+ * if the hardware delegation to IH1 or IH2 doesn't work for some reason.
+ */
+void amdgpu_irq_delegate(struct amdgpu_device *adev,
+			 struct amdgpu_iv_entry *entry,
+			 unsigned int num_dw)
+{
+	amdgpu_ih_ring_write(&adev->irq.ih_soft, entry->iv_entry, num_dw);
+	schedule_work(&adev->irq.ih_soft_work);
+}
+
 /**
  * amdgpu_irq_update - update hardware interrupt state
  *
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.h
index c718e94a55c9..ac527e5deae6 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.h
@@ -44,6 +44,7 @@ enum amdgpu_interrupt_state {
 };
 
 struct amdgpu_iv_entry {
+	struct amdgpu_ih_ring *ih;
 	unsigned client_id;
 	unsigned src_id;
 	unsigned ring_id;
@@ -88,9 +89,9 @@ struct amdgpu_irq {
 	bool				msi_enabled; /* msi enabled */
 
 	/* interrupt rings */
-	struct amdgpu_ih_ring		ih, ih1, ih2;
+	struct amdgpu_ih_ring		ih, ih1, ih2, ih_soft;
 	const struct amdgpu_ih_funcs    *ih_funcs;
-	struct work_struct		ih1_work, ih2_work;
+	struct work_struct		ih1_work, ih2_work, ih_soft_work;
 	struct amdgpu_irq_src		self_irq;
 
 	/* gen irq stuff */
@@ -109,6 +110,9 @@ int amdgpu_irq_add_id(struct amdgpu_device *adev,
 		      struct amdgpu_irq_src *source);
 void amdgpu_irq_dispatch(struct amdgpu_device *adev,
 			 struct amdgpu_ih_ring *ih);
+void amdgpu_irq_delegate(struct amdgpu_device *adev,
+			 struct amdgpu_iv_entry *entry,
+			 unsigned int num_dw);
 int amdgpu_irq_update(struct amdgpu_device *adev, struct amdgpu_irq_src *src,
 		      unsigned type);
 int amdgpu_irq_get(struct amdgpu_device *adev, struct amdgpu_irq_src *src,
-- 
2.25.1

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 5/8] drm/amdgpu: enabled software IH ring for Vega
  2020-11-03 13:58 [PATCH 1/8] drm/amdgpu: fix off by one in amdgpu_vm_handle_fault Christian König
                   ` (2 preceding siblings ...)
  2020-11-03 13:58 ` [PATCH 4/8] drm/amdgpu: add infrastructure for soft IH ring Christian König
@ 2020-11-03 13:58 ` Christian König
  2020-11-03 13:58 ` [PATCH 6/8] drm/amdgpu: make sure retry faults are handled in a work item on Vega Christian König
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 12+ messages in thread
From: Christian König @ 2020-11-03 13:58 UTC (permalink / raw)
  To: amd-gfx

Seems like we won't get the hardware IH1/2 rings on Vega20 working.

Signed-off-by: Christian König <christian.koenig@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/vega10_ih.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/vega10_ih.c b/drivers/gpu/drm/amd/amdgpu/vega10_ih.c
index 407c6093c2ec..cef61dd46a37 100644
--- a/drivers/gpu/drm/amd/amdgpu/vega10_ih.c
+++ b/drivers/gpu/drm/amd/amdgpu/vega10_ih.c
@@ -91,6 +91,9 @@ static void vega10_ih_enable_interrupts(struct amdgpu_device *adev)
 		}
 		adev->irq.ih2.enabled = true;
 	}
+
+	if (adev->irq.ih_soft.ring_size)
+		adev->irq.ih_soft.enabled = true;
 }
 
 /**
@@ -606,6 +609,10 @@ static int vega10_ih_sw_init(void *handle)
 	adev->irq.ih2.use_doorbell = true;
 	adev->irq.ih2.doorbell_index = (adev->doorbell_index.ih + 2) << 1;
 
+	r = amdgpu_ih_ring_init(adev, &adev->irq.ih_soft, PAGE_SIZE, true);
+	if (r)
+		return r;
+
 	r = amdgpu_irq_init(adev);
 
 	return r;
-- 
2.25.1

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 6/8] drm/amdgpu: make sure retry faults are handled in a work item on Vega
  2020-11-03 13:58 [PATCH 1/8] drm/amdgpu: fix off by one in amdgpu_vm_handle_fault Christian König
                   ` (3 preceding siblings ...)
  2020-11-03 13:58 ` [PATCH 5/8] drm/amdgpu: enabled software IH ring for Vega Christian König
@ 2020-11-03 13:58 ` Christian König
  2020-11-03 13:58 ` [PATCH 7/8] drm/amdgpu: enabled software IH ring for Navi Christian König
  2020-11-03 13:58 ` [PATCH 8/8] drm/amdgpu: implement retry fault handling " Christian König
  6 siblings, 0 replies; 12+ messages in thread
From: Christian König @ 2020-11-03 13:58 UTC (permalink / raw)
  To: amd-gfx

Looks like we can't enabled the IH1/IH2 feature for Vega20, make sure
retry faults are handled on a separate ring anyway.

Signed-off-by: Christian König <christian.koenig@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 31 ++++++++++++++++++++-------
 1 file changed, 23 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
index e86ef0c36596..f8b4b1ca9473 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
@@ -524,14 +524,29 @@ static int gmc_v9_0_process_interrupt(struct amdgpu_device *adev,
 	addr = (u64)entry->src_data[0] << 12;
 	addr |= ((u64)entry->src_data[1] & 0xf) << 44;
 
-	if (retry_fault && amdgpu_gmc_filter_faults(adev, addr, entry->pasid,
-						    entry->timestamp))
-		return 1; /* This also prevents sending it to KFD */
-
-	/* If it's the first fault for this address, process it normally */
-	if (retry_fault && !in_interrupt() &&
-	    amdgpu_vm_handle_fault(adev, entry->pasid, addr))
-		return 1; /* This also prevents sending it to KFD */
+	if (retry_fault) {
+		/* Returning 1 here also prevents sending the IV to the KFD */
+
+		/* Process it onyl if it's the first fault for this address */
+		if (entry->ih != &adev->irq.ih_soft &&
+		    amdgpu_gmc_filter_faults(adev, addr, entry->pasid,
+					     entry->timestamp))
+			return 1;
+
+		/* Delegate it to a different ring if the hardware hasn't
+		 * already done it.
+		 */
+		if (in_interrupt()) {
+			amdgpu_irq_delegate(adev, entry, 8);
+			return 1;
+		}
+
+		/* Try to handle the recoverable page faults by filling page
+		 * tables
+		 */
+		if (amdgpu_vm_handle_fault(adev, entry->pasid, addr))
+			return 1;
+	}
 
 	if (!printk_ratelimit())
 		return 0;
-- 
2.25.1

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 7/8] drm/amdgpu: enabled software IH ring for Navi
  2020-11-03 13:58 [PATCH 1/8] drm/amdgpu: fix off by one in amdgpu_vm_handle_fault Christian König
                   ` (4 preceding siblings ...)
  2020-11-03 13:58 ` [PATCH 6/8] drm/amdgpu: make sure retry faults are handled in a work item on Vega Christian König
@ 2020-11-03 13:58 ` Christian König
  2020-11-03 13:58 ` [PATCH 8/8] drm/amdgpu: implement retry fault handling " Christian König
  6 siblings, 0 replies; 12+ messages in thread
From: Christian König @ 2020-11-03 13:58 UTC (permalink / raw)
  To: amd-gfx

Felix pointed out that we need this for Navi as well.

Signed-off-by: Christian König <christian.koenig@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/navi10_ih.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/navi10_ih.c b/drivers/gpu/drm/amd/amdgpu/navi10_ih.c
index 837769fcb35b..ce0a02a4b5d9 100644
--- a/drivers/gpu/drm/amd/amdgpu/navi10_ih.c
+++ b/drivers/gpu/drm/amd/amdgpu/navi10_ih.c
@@ -136,6 +136,9 @@ static void navi10_ih_enable_interrupts(struct amdgpu_device *adev)
 		}
 		adev->irq.ih2.enabled = true;
 	}
+
+	if (adev->irq.ih_soft.ring_size)
+		adev->irq.ih_soft.enabled = true;
 }
 
 /**
@@ -695,6 +698,10 @@ static int navi10_ih_sw_init(void *handle)
 					(adev->doorbell_index.ih + 2) << 1;
 	}
 
+	r = amdgpu_ih_ring_init(adev, &adev->irq.ih_soft, PAGE_SIZE, true);
+	if (r)
+		return r;
+
 	r = amdgpu_irq_init(adev);
 
 	return r;
-- 
2.25.1

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 8/8] drm/amdgpu: implement retry fault handling for Navi
  2020-11-03 13:58 [PATCH 1/8] drm/amdgpu: fix off by one in amdgpu_vm_handle_fault Christian König
                   ` (5 preceding siblings ...)
  2020-11-03 13:58 ` [PATCH 7/8] drm/amdgpu: enabled software IH ring for Navi Christian König
@ 2020-11-03 13:58 ` Christian König
  6 siblings, 0 replies; 12+ messages in thread
From: Christian König @ 2020-11-03 13:58 UTC (permalink / raw)
  To: amd-gfx

Same as gmc9, basically filter the fault, reroute or handle it.

Signed-off-by: Christian König <christian.koenig@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c | 25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
index cffc3ca8fcde..4f6e44e21691 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
@@ -94,6 +94,7 @@ static int gmc_v10_0_process_interrupt(struct amdgpu_device *adev,
 				       struct amdgpu_irq_src *source,
 				       struct amdgpu_iv_entry *entry)
 {
+	bool retry_fault = !!(entry->src_data[1] & 0x80);
 	struct amdgpu_vmhub *hub = &adev->vmhub[entry->vmid_src];
 	struct amdgpu_task_info task_info;
 	uint32_t status = 0;
@@ -102,6 +103,30 @@ static int gmc_v10_0_process_interrupt(struct amdgpu_device *adev,
 	addr = (u64)entry->src_data[0] << 12;
 	addr |= ((u64)entry->src_data[1] & 0xf) << 44;
 
+	if (retry_fault) {
+		/* Returning 1 here also prevents sending the IV to the KFD */
+
+		/* Process it onyl if it's the first fault for this address */
+		if (entry->ih != &adev->irq.ih_soft &&
+		    amdgpu_gmc_filter_faults(adev, addr, entry->pasid,
+					     entry->timestamp))
+			return 1;
+
+		/* Delegate it to a different ring if the hardware hasn't
+		 * already done it.
+		 */
+		if (in_interrupt()) {
+			amdgpu_irq_delegate(adev, entry, 8);
+			return 1;
+		}
+
+		/* Try to handle the recoverable page faults by filling page
+		 * tables
+		 */
+		if (amdgpu_vm_handle_fault(adev, entry->pasid, addr))
+			return 1;
+	}
+
 	if (!amdgpu_sriov_vf(adev)) {
 		/*
 		 * Issue a dummy read to wait for the status register to
-- 
2.25.1

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH 4/8] drm/amdgpu: add infrastructure for soft IH ring
  2020-11-03 13:58 ` [PATCH 4/8] drm/amdgpu: add infrastructure for soft IH ring Christian König
@ 2020-11-03 16:00   ` Felix Kuehling
  2020-11-03 16:05     ` Christian König
  0 siblings, 1 reply; 12+ messages in thread
From: Felix Kuehling @ 2020-11-03 16:00 UTC (permalink / raw)
  To: amd-gfx, Koenig, Christian

Am 2020-11-03 um 8:58 a.m. schrieb Christian König:
> Add a soft IH ring implementation similar to the hardware IH1/2.
>
> This can be used if the hardware delegation of interrupts to IH1/2
> doesn't work for some reason.
>
> Signed-off-by: Christian König <christian.koenig@amd.com>
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ih.c  | 29 ++++++++++++++++++++
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ih.h  |  2 ++
>  drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c | 35 +++++++++++++++++++++++++
>  drivers/gpu/drm/amd/amdgpu/amdgpu_irq.h |  8 ++++--
>  4 files changed, 72 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.c
> index 111a301ce878..dcd9b4a8e20b 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.c
> @@ -131,6 +131,35 @@ void amdgpu_ih_ring_fini(struct amdgpu_device *adev, struct amdgpu_ih_ring *ih)
>  	}
>  }
>  
> +/**
> + * amdgpu_ih_ring_write - write IV to the ring buffer
> + *
> + * @ih: ih ring to write to
> + * @iv: the iv to write
> + * @num_dw: size of the iv in dw
> + *
> + * Writes an IV to the ring buffer using the CPU and increment the wptr.
> + * Used for testing and delegating IVs to a software ring.
> + */
> +void amdgpu_ih_ring_write(struct amdgpu_ih_ring *ih, const uint32_t *iv,
> +			  unsigned int num_dw)
> +{
> +	uint32_t wptr = le32_to_cpu(*ih->wptr_cpu) >> 2;
> +	unsigned int i;
> +
> +	for (i = 0; i < num_dw; ++i)
> +	        ih->ring[wptr++] = cpu_to_le32(iv[i]);
> +
> +	wptr <<= 2;
> +	wptr &= ih->ptr_mask;
> +
> +	/* Only commit the new wptr if we don't overflow */
> +	if (wptr != READ_ONCE(ih->rptr)) {
> +		wmb();
> +		WRITE_ONCE(*ih->wptr_cpu, cpu_to_le32(wptr));
> +	}

If you return a status here (interrupt delegated or dropped?), you can
make the schedule_work call conditional below and avoid unnecessary
scheduling.


> +}
> +
>  /**
>   * amdgpu_ih_process - interrupt handler
>   *
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.h
> index 4e0bb645176d..3c9cfe7eecff 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.h
> @@ -72,6 +72,8 @@ struct amdgpu_ih_funcs {
>  int amdgpu_ih_ring_init(struct amdgpu_device *adev, struct amdgpu_ih_ring *ih,
>  			unsigned ring_size, bool use_bus_addr);
>  void amdgpu_ih_ring_fini(struct amdgpu_device *adev, struct amdgpu_ih_ring *ih);
> +void amdgpu_ih_ring_write(struct amdgpu_ih_ring *ih, const uint32_t *iv,
> +			  unsigned int num_dw);
>  int amdgpu_ih_process(struct amdgpu_device *adev, struct amdgpu_ih_ring *ih);
>  
>  #endif
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
> index 300ac73b4738..bea57e8e793f 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
> @@ -206,6 +206,21 @@ static void amdgpu_irq_handle_ih2(struct work_struct *work)
>  	amdgpu_ih_process(adev, &adev->irq.ih2);
>  }
>  
> +/**
> + * amdgpu_irq_handle_ih_soft - kick of processing for ih_soft
> + *
> + * @work: work structure in struct amdgpu_irq
> + *
> + * Kick of processing IH soft ring.
> + */
> +static void amdgpu_irq_handle_ih_soft(struct work_struct *work)
> +{
> +	struct amdgpu_device *adev = container_of(work, struct amdgpu_device,
> +						  irq.ih_soft_work);
> +
> +	amdgpu_ih_process(adev, &adev->irq.ih_soft);
> +}
> +
>  /**
>   * amdgpu_msi_ok - check whether MSI functionality is enabled
>   *
> @@ -281,6 +296,7 @@ int amdgpu_irq_init(struct amdgpu_device *adev)
>  
>  	INIT_WORK(&adev->irq.ih1_work, amdgpu_irq_handle_ih1);
>  	INIT_WORK(&adev->irq.ih2_work, amdgpu_irq_handle_ih2);
> +	INIT_WORK(&adev->irq.ih_soft_work, amdgpu_irq_handle_ih_soft);
>  
>  	adev->irq.installed = true;
>  	/* Use vector 0 for MSI-X */
> @@ -413,6 +429,7 @@ void amdgpu_irq_dispatch(struct amdgpu_device *adev,
>  	bool handled = false;
>  	int r;
>  
> +	entry.ih = ih;
>  	entry.iv_entry = (const uint32_t *)&ih->ring[ring_index];
>  	amdgpu_ih_decode_iv(adev, &entry);
>  
> @@ -450,6 +467,24 @@ void amdgpu_irq_dispatch(struct amdgpu_device *adev,
>  		amdgpu_amdkfd_interrupt(adev, entry.iv_entry);
>  }
>  
> +/**
> + * amdgpu_irq_delegate - delegate IV to soft IH ring
> + *
> + * @adev: amdgpu device pointer
> + * @entry: IV entry
> + * @num_dw: size of IV
> + *
> + * Delegate the IV to the soft IH ring and schedule processing of it. Used
> + * if the hardware delegation to IH1 or IH2 doesn't work for some reason.
> + */
> +void amdgpu_irq_delegate(struct amdgpu_device *adev,
> +			 struct amdgpu_iv_entry *entry,
> +			 unsigned int num_dw)
> +{
> +	amdgpu_ih_ring_write(&adev->irq.ih_soft, entry->iv_entry, num_dw);
> +	schedule_work(&adev->irq.ih_soft_work);

I'd make this conditional, only if amdgpu_ih_ring_write actually wrote
something. When the soft ring is overflowing you don't need to schedule.
I'm not sure if this makes any practical difference. When the ring is
overflowing, the worker is already scheduled or running, so
schedule_work should in theory do nothing. It just may be less efficient
at figuring that out.

Also, should this work be scheduled on the system_highpri_wq?

Other than that, the series is

Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>


> +}
> +
>  /**
>   * amdgpu_irq_update - update hardware interrupt state
>   *
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.h
> index c718e94a55c9..ac527e5deae6 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.h
> @@ -44,6 +44,7 @@ enum amdgpu_interrupt_state {
>  };
>  
>  struct amdgpu_iv_entry {
> +	struct amdgpu_ih_ring *ih;
>  	unsigned client_id;
>  	unsigned src_id;
>  	unsigned ring_id;
> @@ -88,9 +89,9 @@ struct amdgpu_irq {
>  	bool				msi_enabled; /* msi enabled */
>  
>  	/* interrupt rings */
> -	struct amdgpu_ih_ring		ih, ih1, ih2;
> +	struct amdgpu_ih_ring		ih, ih1, ih2, ih_soft;
>  	const struct amdgpu_ih_funcs    *ih_funcs;
> -	struct work_struct		ih1_work, ih2_work;
> +	struct work_struct		ih1_work, ih2_work, ih_soft_work;
>  	struct amdgpu_irq_src		self_irq;
>  
>  	/* gen irq stuff */
> @@ -109,6 +110,9 @@ int amdgpu_irq_add_id(struct amdgpu_device *adev,
>  		      struct amdgpu_irq_src *source);
>  void amdgpu_irq_dispatch(struct amdgpu_device *adev,
>  			 struct amdgpu_ih_ring *ih);
> +void amdgpu_irq_delegate(struct amdgpu_device *adev,
> +			 struct amdgpu_iv_entry *entry,
> +			 unsigned int num_dw);
>  int amdgpu_irq_update(struct amdgpu_device *adev, struct amdgpu_irq_src *src,
>  		      unsigned type);
>  int amdgpu_irq_get(struct amdgpu_device *adev, struct amdgpu_irq_src *src,
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 4/8] drm/amdgpu: add infrastructure for soft IH ring
  2020-11-03 16:00   ` Felix Kuehling
@ 2020-11-03 16:05     ` Christian König
  0 siblings, 0 replies; 12+ messages in thread
From: Christian König @ 2020-11-03 16:05 UTC (permalink / raw)
  To: Felix Kuehling, amd-gfx

Am 03.11.20 um 17:00 schrieb Felix Kuehling:
> Am 2020-11-03 um 8:58 a.m. schrieb Christian König:
>> Add a soft IH ring implementation similar to the hardware IH1/2.
>>
>> This can be used if the hardware delegation of interrupts to IH1/2
>> doesn't work for some reason.
>>
>> Signed-off-by: Christian König <christian.koenig@amd.com>
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_ih.c  | 29 ++++++++++++++++++++
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_ih.h  |  2 ++
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c | 35 +++++++++++++++++++++++++
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_irq.h |  8 ++++--
>>   4 files changed, 72 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.c
>> index 111a301ce878..dcd9b4a8e20b 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.c
>> @@ -131,6 +131,35 @@ void amdgpu_ih_ring_fini(struct amdgpu_device *adev, struct amdgpu_ih_ring *ih)
>>   	}
>>   }
>>   
>> +/**
>> + * amdgpu_ih_ring_write - write IV to the ring buffer
>> + *
>> + * @ih: ih ring to write to
>> + * @iv: the iv to write
>> + * @num_dw: size of the iv in dw
>> + *
>> + * Writes an IV to the ring buffer using the CPU and increment the wptr.
>> + * Used for testing and delegating IVs to a software ring.
>> + */
>> +void amdgpu_ih_ring_write(struct amdgpu_ih_ring *ih, const uint32_t *iv,
>> +			  unsigned int num_dw)
>> +{
>> +	uint32_t wptr = le32_to_cpu(*ih->wptr_cpu) >> 2;
>> +	unsigned int i;
>> +
>> +	for (i = 0; i < num_dw; ++i)
>> +	        ih->ring[wptr++] = cpu_to_le32(iv[i]);
>> +
>> +	wptr <<= 2;
>> +	wptr &= ih->ptr_mask;
>> +
>> +	/* Only commit the new wptr if we don't overflow */
>> +	if (wptr != READ_ONCE(ih->rptr)) {
>> +		wmb();
>> +		WRITE_ONCE(*ih->wptr_cpu, cpu_to_le32(wptr));
>> +	}
> If you return a status here (interrupt delegated or dropped?), you can
> make the schedule_work call conditional below and avoid unnecessary
> scheduling.
>
>
>> +}
>> +
>>   /**
>>    * amdgpu_ih_process - interrupt handler
>>    *
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.h
>> index 4e0bb645176d..3c9cfe7eecff 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.h
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.h
>> @@ -72,6 +72,8 @@ struct amdgpu_ih_funcs {
>>   int amdgpu_ih_ring_init(struct amdgpu_device *adev, struct amdgpu_ih_ring *ih,
>>   			unsigned ring_size, bool use_bus_addr);
>>   void amdgpu_ih_ring_fini(struct amdgpu_device *adev, struct amdgpu_ih_ring *ih);
>> +void amdgpu_ih_ring_write(struct amdgpu_ih_ring *ih, const uint32_t *iv,
>> +			  unsigned int num_dw);
>>   int amdgpu_ih_process(struct amdgpu_device *adev, struct amdgpu_ih_ring *ih);
>>   
>>   #endif
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
>> index 300ac73b4738..bea57e8e793f 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
>> @@ -206,6 +206,21 @@ static void amdgpu_irq_handle_ih2(struct work_struct *work)
>>   	amdgpu_ih_process(adev, &adev->irq.ih2);
>>   }
>>   
>> +/**
>> + * amdgpu_irq_handle_ih_soft - kick of processing for ih_soft
>> + *
>> + * @work: work structure in struct amdgpu_irq
>> + *
>> + * Kick of processing IH soft ring.
>> + */
>> +static void amdgpu_irq_handle_ih_soft(struct work_struct *work)
>> +{
>> +	struct amdgpu_device *adev = container_of(work, struct amdgpu_device,
>> +						  irq.ih_soft_work);
>> +
>> +	amdgpu_ih_process(adev, &adev->irq.ih_soft);
>> +}
>> +
>>   /**
>>    * amdgpu_msi_ok - check whether MSI functionality is enabled
>>    *
>> @@ -281,6 +296,7 @@ int amdgpu_irq_init(struct amdgpu_device *adev)
>>   
>>   	INIT_WORK(&adev->irq.ih1_work, amdgpu_irq_handle_ih1);
>>   	INIT_WORK(&adev->irq.ih2_work, amdgpu_irq_handle_ih2);
>> +	INIT_WORK(&adev->irq.ih_soft_work, amdgpu_irq_handle_ih_soft);
>>   
>>   	adev->irq.installed = true;
>>   	/* Use vector 0 for MSI-X */
>> @@ -413,6 +429,7 @@ void amdgpu_irq_dispatch(struct amdgpu_device *adev,
>>   	bool handled = false;
>>   	int r;
>>   
>> +	entry.ih = ih;
>>   	entry.iv_entry = (const uint32_t *)&ih->ring[ring_index];
>>   	amdgpu_ih_decode_iv(adev, &entry);
>>   
>> @@ -450,6 +467,24 @@ void amdgpu_irq_dispatch(struct amdgpu_device *adev,
>>   		amdgpu_amdkfd_interrupt(adev, entry.iv_entry);
>>   }
>>   
>> +/**
>> + * amdgpu_irq_delegate - delegate IV to soft IH ring
>> + *
>> + * @adev: amdgpu device pointer
>> + * @entry: IV entry
>> + * @num_dw: size of IV
>> + *
>> + * Delegate the IV to the soft IH ring and schedule processing of it. Used
>> + * if the hardware delegation to IH1 or IH2 doesn't work for some reason.
>> + */
>> +void amdgpu_irq_delegate(struct amdgpu_device *adev,
>> +			 struct amdgpu_iv_entry *entry,
>> +			 unsigned int num_dw)
>> +{
>> +	amdgpu_ih_ring_write(&adev->irq.ih_soft, entry->iv_entry, num_dw);
>> +	schedule_work(&adev->irq.ih_soft_work);
> I'd make this conditional, only if amdgpu_ih_ring_write actually wrote
> something. When the soft ring is overflowing you don't need to schedule.
> I'm not sure if this makes any practical difference. When the ring is
> overflowing, the worker is already scheduled or running, so
> schedule_work should in theory do nothing. It just may be less efficient
> at figuring that out.

Yeah, I considered that as well.

But then came to the conclusion that when the ring is full scheduling 
the work item again is probably a good idea.

Not 100% sure, but it might make a difference if the irq load balance 
decides to shift the driver to a different CPU.

> Also, should this work be scheduled on the system_highpri_wq?

Good question, I'm torn apart on that.

On the one hand we want to have it handled as fast as possible.

On the other hand we have the potential of clogging a whole CPU with the 
work.

>
> Other than that, the series is
>
> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>

Thanks,
Christian.

>
>
>> +}
>> +
>>   /**
>>    * amdgpu_irq_update - update hardware interrupt state
>>    *
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.h
>> index c718e94a55c9..ac527e5deae6 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.h
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.h
>> @@ -44,6 +44,7 @@ enum amdgpu_interrupt_state {
>>   };
>>   
>>   struct amdgpu_iv_entry {
>> +	struct amdgpu_ih_ring *ih;
>>   	unsigned client_id;
>>   	unsigned src_id;
>>   	unsigned ring_id;
>> @@ -88,9 +89,9 @@ struct amdgpu_irq {
>>   	bool				msi_enabled; /* msi enabled */
>>   
>>   	/* interrupt rings */
>> -	struct amdgpu_ih_ring		ih, ih1, ih2;
>> +	struct amdgpu_ih_ring		ih, ih1, ih2, ih_soft;
>>   	const struct amdgpu_ih_funcs    *ih_funcs;
>> -	struct work_struct		ih1_work, ih2_work;
>> +	struct work_struct		ih1_work, ih2_work, ih_soft_work;
>>   	struct amdgpu_irq_src		self_irq;
>>   
>>   	/* gen irq stuff */
>> @@ -109,6 +110,9 @@ int amdgpu_irq_add_id(struct amdgpu_device *adev,
>>   		      struct amdgpu_irq_src *source);
>>   void amdgpu_irq_dispatch(struct amdgpu_device *adev,
>>   			 struct amdgpu_ih_ring *ih);
>> +void amdgpu_irq_delegate(struct amdgpu_device *adev,
>> +			 struct amdgpu_iv_entry *entry,
>> +			 unsigned int num_dw);
>>   int amdgpu_irq_update(struct amdgpu_device *adev, struct amdgpu_irq_src *src,
>>   		      unsigned type);
>>   int amdgpu_irq_get(struct amdgpu_device *adev, struct amdgpu_irq_src *src,

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 12+ messages in thread

* RE: [PATCH 2/8] drm/amdgpu: drop leading zeros from the gmc9 fault address
  2020-11-03 13:58 ` [PATCH 2/8] drm/amdgpu: drop leading zeros from the gmc9 fault address Christian König
@ 2020-11-12  6:18   ` Chauhan, Madhav
  0 siblings, 0 replies; 12+ messages in thread
From: Chauhan, Madhav @ 2020-11-12  6:18 UTC (permalink / raw)
  To: Christian König, amd-gfx

[AMD Public Use]

-----Original Message-----
From: amd-gfx <amd-gfx-bounces@lists.freedesktop.org> On Behalf Of Christian König
Sent: Tuesday, November 3, 2020 7:29 PM
To: amd-gfx@lists.freedesktop.org
Subject: [PATCH 2/8] drm/amdgpu: drop leading zeros from the gmc9 fault address

The address space is only 48bit, not 64bit. And the VMHUBs work with sign extended addresses.

Looks fine to me, Reviewed-by: Madhav Chauhan <madhav.chauhan@amd.com>

Regards,
Madhav

Signed-off-by: Christian König <christian.koenig@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
index 0c3421d587e8..e86ef0c36596 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
@@ -557,7 +557,7 @@ static int gmc_v9_0_process_interrupt(struct amdgpu_device *adev,
 		entry->src_id, entry->ring_id, entry->vmid,
 		entry->pasid, task_info.process_name, task_info.tgid,
 		task_info.task_name, task_info.pid);
-	dev_err(adev->dev, "  in page starting at address 0x%016llx from client %d\n",
+	dev_err(adev->dev, "  in page starting at address 0x%012llx from 
+client %d\n",
 		addr, entry->client_id);
 
 	if (amdgpu_sriov_vf(adev))
--
2.25.1

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.freedesktop.org%2Fmailman%2Flistinfo%2Famd-gfx&amp;data=04%7C01%7Cmadhav.chauhan%40amd.com%7C84c6c5a2c35f48bef42708d88000938a%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637400087280302611%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=atlWsalQb3Ig0MrKVJezjjsxg5IZLynGiHfDFlWKwu0%3D&amp;reserved=0
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* RE: [PATCH 3/8] drm/amdgpu: cleanup gmc_v10_0_process_interrupt a bit
  2020-11-03 13:58 ` [PATCH 3/8] drm/amdgpu: cleanup gmc_v10_0_process_interrupt a bit Christian König
@ 2020-11-12  6:26   ` Chauhan, Madhav
  0 siblings, 0 replies; 12+ messages in thread
From: Chauhan, Madhav @ 2020-11-12  6:26 UTC (permalink / raw)
  To: Christian König, amd-gfx

[AMD Public Use]

-----Original Message-----
From: amd-gfx <amd-gfx-bounces@lists.freedesktop.org> On Behalf Of Christian König
Sent: Tuesday, November 3, 2020 7:29 PM
To: amd-gfx@lists.freedesktop.org
Subject: [PATCH 3/8] drm/amdgpu: cleanup gmc_v10_0_process_interrupt a bit

Return early in case of a ratelimit and don't print leading zeros for the address.

Looks fine to me: Reviewed-by: Madhav Chauhan <madhav.chauhan@amd.com>

Regards,
Madhav

Signed-off-by: Christian König <christian.koenig@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c | 38 ++++++++++++++------------
 1 file changed, 20 insertions(+), 18 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
index d9399324be47..cffc3ca8fcde 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
@@ -95,6 +95,7 @@ static int gmc_v10_0_process_interrupt(struct amdgpu_device *adev,
 				       struct amdgpu_iv_entry *entry)  {
 	struct amdgpu_vmhub *hub = &adev->vmhub[entry->vmid_src];
+	struct amdgpu_task_info task_info;
 	uint32_t status = 0;
 	u64 addr;
 
@@ -115,24 +116,25 @@ static int gmc_v10_0_process_interrupt(struct amdgpu_device *adev,
 		WREG32_P(hub->vm_l2_pro_fault_cntl, 1, ~1);
 	}
 
-	if (printk_ratelimit()) {
-		struct amdgpu_task_info task_info;
-
-		memset(&task_info, 0, sizeof(struct amdgpu_task_info));
-		amdgpu_vm_get_task_info(adev, entry->pasid, &task_info);
-
-		dev_err(adev->dev,
-			"[%s] page fault (src_id:%u ring:%u vmid:%u pasid:%u, "
-			"for process %s pid %d thread %s pid %d)\n",
-			entry->vmid_src ? "mmhub" : "gfxhub",
-			entry->src_id, entry->ring_id, entry->vmid,
-			entry->pasid, task_info.process_name, task_info.tgid,
-			task_info.task_name, task_info.pid);
-		dev_err(adev->dev, "  in page starting at address 0x%016llx from client %d\n",
-			addr, entry->client_id);
-		if (!amdgpu_sriov_vf(adev))
-			hub->vmhub_funcs->print_l2_protection_fault_status(adev, status);
-	}
+	if (!printk_ratelimit())
+		return 0;
+
+	memset(&task_info, 0, sizeof(struct amdgpu_task_info));
+	amdgpu_vm_get_task_info(adev, entry->pasid, &task_info);
+
+	dev_err(adev->dev,
+		"[%s] page fault (src_id:%u ring:%u vmid:%u pasid:%u, "
+		"for process %s pid %d thread %s pid %d)\n",
+		entry->vmid_src ? "mmhub" : "gfxhub",
+		entry->src_id, entry->ring_id, entry->vmid,
+		entry->pasid, task_info.process_name, task_info.tgid,
+		task_info.task_name, task_info.pid);
+	dev_err(adev->dev, "  in page starting at address 0x%012llx from client %d\n",
+		addr, entry->client_id);
+
+	if (!amdgpu_sriov_vf(adev))
+		hub->vmhub_funcs->print_l2_protection_fault_status(adev,
+								   status);
 
 	return 0;
 }
--
2.25.1

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.freedesktop.org%2Fmailman%2Flistinfo%2Famd-gfx&amp;data=04%7C01%7Cmadhav.chauhan%40amd.com%7Cf1eeafb1173c41100e6e08d880009473%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637400087287148200%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=NWf8Sa9ygSAhuVWismqhKqSguIRBTTc3AGcEi7DFgTA%3D&amp;reserved=0
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply related	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2020-11-12  6:26 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-11-03 13:58 [PATCH 1/8] drm/amdgpu: fix off by one in amdgpu_vm_handle_fault Christian König
2020-11-03 13:58 ` [PATCH 2/8] drm/amdgpu: drop leading zeros from the gmc9 fault address Christian König
2020-11-12  6:18   ` Chauhan, Madhav
2020-11-03 13:58 ` [PATCH 3/8] drm/amdgpu: cleanup gmc_v10_0_process_interrupt a bit Christian König
2020-11-12  6:26   ` Chauhan, Madhav
2020-11-03 13:58 ` [PATCH 4/8] drm/amdgpu: add infrastructure for soft IH ring Christian König
2020-11-03 16:00   ` Felix Kuehling
2020-11-03 16:05     ` Christian König
2020-11-03 13:58 ` [PATCH 5/8] drm/amdgpu: enabled software IH ring for Vega Christian König
2020-11-03 13:58 ` [PATCH 6/8] drm/amdgpu: make sure retry faults are handled in a work item on Vega Christian König
2020-11-03 13:58 ` [PATCH 7/8] drm/amdgpu: enabled software IH ring for Navi Christian König
2020-11-03 13:58 ` [PATCH 8/8] drm/amdgpu: implement retry fault handling " Christian König

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).