* Init timing changes for SRIOV VF
@ 2017-10-23 10:03 Pixel Ding
       [not found] ` <1508753012-2196-1-git-send-email-Pixel.Ding-5C7GfCeVMHo@public.gmane.org>
  0 siblings, 1 reply; 28+ messages in thread
From: Pixel Ding @ 2017-10-23 10:03 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
  Cc: Gary.Sun-5C7GfCeVMHo, Bingley.Li-5C7GfCeVMHo

This is the second patch series merged or reimplemented from the SRIOV
branch. It reduces the time consumed during driver init.

In exclusive mode, one VF occupies the hardware while the other VFs
wait until that VF releases exclusive mode. The time a VF may spend in
exclusive mode is limited, to avoid starvation making the service
unavailable. If the driver doesn't finish its exclusive-mode init in
time, the VF is reset and driver init fails.

The goal of this series is to reduce the time spent in exclusive mode
as much as possible, and to retry init when it fails due to a timeout.

Please help review this series.

[PATCH 1/7] drm/amdgpu: release VF exclusive accessing after hw_init
[PATCH 2/7] drm/amdgpu: add init_log param to control logs in exclusive mode
[PATCH 3/7] drm/amdgpu: avoid soft lockup when waiting for RLC serdes
[PATCH 4/7] drm/amdgpu/virt: add function to check MMIO accessing
[PATCH 5/7] drm/amdgpu: don't disable MSI for GPU virtual function
[PATCH 6/7] drm/amdgpu/virt: add wait_reset virt ops
[PATCH 7/7] drm/amdgpu: retry init if it fails due to exclusive mode
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


* [PATCH 1/7] drm/amdgpu: release VF exclusive accessing after hw_init
       [not found] ` <1508753012-2196-1-git-send-email-Pixel.Ding-5C7GfCeVMHo@public.gmane.org>
@ 2017-10-23 10:03   ` Pixel Ding
       [not found]     ` <1508753012-2196-2-git-send-email-Pixel.Ding-5C7GfCeVMHo@public.gmane.org>
  2017-10-23 10:03   ` [PATCH 2/7] drm/amdgpu: add init_log param to control logs in exclusive mode Pixel Ding
                     ` (5 subsequent siblings)
  6 siblings, 1 reply; 28+ messages in thread
From: Pixel Ding @ 2017-10-23 10:03 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
  Cc: Gary.Sun-5C7GfCeVMHo, Pixel Ding, Bingley.Li-5C7GfCeVMHo

The subsequent operations don't need exclusive access to the hardware.

Signed-off-by: Pixel Ding <Pixel.Ding@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 5 +++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c    | 3 ---
 2 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 99acf29..286ba3c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -1716,6 +1716,11 @@ static int amdgpu_init(struct amdgpu_device *adev)
 		adev->ip_blocks[i].status.hw = true;
 	}
 
+	if (amdgpu_sriov_vf(adev)) {
+		DRM_INFO("rel_init\n");
+		amdgpu_virt_release_full_gpu(adev, true);
+	}
+
 	return 0;
 }
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
index 4a9f749..f2eb7ac 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
@@ -171,9 +171,6 @@ int amdgpu_driver_load_kms(struct drm_device *dev, unsigned long flags)
 		pm_runtime_put_autosuspend(dev->dev);
 	}
 
-	if (amdgpu_sriov_vf(adev))
-		amdgpu_virt_release_full_gpu(adev, true);
-
 out:
 	if (r) {
 		/* balance pm_runtime_get_sync in amdgpu_driver_unload_kms */
-- 
2.9.5



* [PATCH 2/7] drm/amdgpu: add init_log param to control logs in exclusive mode
       [not found] ` <1508753012-2196-1-git-send-email-Pixel.Ding-5C7GfCeVMHo@public.gmane.org>
  2017-10-23 10:03   ` [PATCH 1/7] drm/amdgpu: release VF exclusive accessing after hw_init Pixel Ding
@ 2017-10-23 10:03   ` Pixel Ding
       [not found]     ` <1508753012-2196-3-git-send-email-Pixel.Ding-5C7GfCeVMHo@public.gmane.org>
  2017-10-23 10:03   ` [PATCH 3/7] drm/amdgpu: avoid soft lockup when waiting for RLC serdes Pixel Ding
                     ` (4 subsequent siblings)
  6 siblings, 1 reply; 28+ messages in thread
From: Pixel Ding @ 2017-10-23 10:03 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
  Cc: Gary.Sun-5C7GfCeVMHo, pding, Bingley.Li-5C7GfCeVMHo

From: pding <Pixel.Ding@amd.com>

When this VF stays in exclusive mode for too long, other VFs are
impacted.

Redundant init messages can cause an exclusive-mode timeout when they
are redirected, and redirecting the guest log to a virtual serial port
is a normal use case for cloud services.

Introduce an init_log parameter to control logging during exclusive
mode. The default behavior is unchanged. With this change, exclusive-mode
time decreases by 200ms when log redirection is enabled.

Signed-off-by: pding <Pixel.Ding@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h           | 13 +++++++++++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_atombios.c  |  8 ++++----
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c    | 20 +++++++++-----------
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c       |  4 ++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c     |  6 +++---
 drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c      |  4 ++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c       |  4 ++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c    | 10 +++++-----
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c       |  8 ++++----
 drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c       |  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c       |  4 ++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c        |  6 +++---
 drivers/gpu/drm/amd/amdgpu/atom.c             |  2 +-
 drivers/gpu/drm/amd/amdgpu/cik_sdma.c         |  2 +-
 drivers/gpu/drm/amd/amdgpu/gfx_v6_0.c         |  2 +-
 drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c         |  2 +-
 drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c         |  2 +-
 drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c         |  2 +-
 drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c         |  6 +++---
 drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c         |  6 +++---
 drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c         |  6 +++---
 drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c         |  6 +++---
 drivers/gpu/drm/amd/amdgpu/sdma_v2_4.c        |  2 +-
 drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c        |  2 +-
 drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c        |  2 +-
 drivers/gpu/drm/amd/amdgpu/si_dma.c           |  2 +-
 drivers/gpu/drm/amd/amdgpu/uvd_v4_2.c         |  2 +-
 drivers/gpu/drm/amd/amdgpu/uvd_v5_0.c         |  2 +-
 drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c         |  4 ++--
 drivers/gpu/drm/amd/amdgpu/uvd_v7_0.c         |  4 ++--
 drivers/gpu/drm/amd/powerplay/amd_powerplay.c |  3 ++-
 31 files changed, 82 insertions(+), 66 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 774edc1..f08bb9c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -126,6 +126,7 @@ extern int amdgpu_param_buf_per_se;
 extern int amdgpu_job_hang_limit;
 extern int amdgpu_lbpw;
 extern int amdgpu_compute_multipipe;
+extern int amdgpu_init_log;
 
 #ifdef CONFIG_DRM_AMDGPU_SI
 extern int amdgpu_si_support;
@@ -134,6 +135,18 @@ extern int amdgpu_si_support;
 extern int amdgpu_cik_support;
 #endif
 
+#define INIT_INFO(fmt, ...)					\
+	do {							\
+		if (amdgpu_init_log)				\
+			DRM_INFO(fmt, ##__VA_ARGS__);		\
+	} while (0)						\
+
+#define INIT_DEV_INFO(dev, fmt, ...)				\
+	do {							\
+		if (amdgpu_init_log)				\
+			dev_info(dev, fmt, ##__VA_ARGS__);	\
+	} while (0)						\
+
 #define AMDGPU_DEFAULT_GTT_SIZE_MB		3072ULL /* 3GB by default */
 #define AMDGPU_WAIT_IDLE_TIMEOUT_IN_MS	        3000
 #define AMDGPU_MAX_USEC_TIMEOUT			100000	/* 100 ms */
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_atombios.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_atombios.c
index f66d33e..5ff786a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_atombios.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_atombios.c
@@ -690,12 +690,12 @@ int amdgpu_atombios_get_clock_info(struct amdgpu_device *adev)
 			le32_to_cpu(firmware_info->info_21.ulDefaultDispEngineClkFreq);
 		/* set a reasonable default for DP */
 		if (adev->clock.default_dispclk < 53900) {
-			DRM_INFO("Changing default dispclk from %dMhz to 600Mhz\n",
-				 adev->clock.default_dispclk / 100);
+			INIT_INFO("Changing default dispclk from %dMhz to 600Mhz\n",
+				  adev->clock.default_dispclk / 100);
 			adev->clock.default_dispclk = 60000;
 		} else if (adev->clock.default_dispclk <= 60000) {
-			DRM_INFO("Changing default dispclk from %dMhz to 625Mhz\n",
-				 adev->clock.default_dispclk / 100);
+			INIT_INFO("Changing default dispclk from %dMhz to 625Mhz\n",
+				  adev->clock.default_dispclk / 100);
 			adev->clock.default_dispclk = 62500;
 		}
 		adev->clock.dp_extclk =
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 286ba3c..3458d46 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -613,9 +613,9 @@ void amdgpu_vram_location(struct amdgpu_device *adev, struct amdgpu_mc *mc, u64
 	mc->vram_end = mc->vram_start + mc->mc_vram_size - 1;
 	if (limit && limit < mc->real_vram_size)
 		mc->real_vram_size = limit;
-	dev_info(adev->dev, "VRAM: %lluM 0x%016llX - 0x%016llX (%lluM used)\n",
-			mc->mc_vram_size >> 20, mc->vram_start,
-			mc->vram_end, mc->real_vram_size >> 20);
+	INIT_DEV_INFO(adev->dev, "VRAM: %lluM 0x%016llX - 0x%016llX (%lluM used)\n",
+		      mc->mc_vram_size >> 20, mc->vram_start,
+		      mc->vram_end, mc->real_vram_size >> 20);
 }
 
 /**
@@ -650,8 +650,8 @@ void amdgpu_gart_location(struct amdgpu_device *adev, struct amdgpu_mc *mc)
 		mc->gart_start = mc->vram_end + 1;
 	}
 	mc->gart_end = mc->gart_start + mc->gart_size - 1;
-	dev_info(adev->dev, "GTT: %lluM 0x%016llX - 0x%016llX\n",
-			mc->gart_size >> 20, mc->gart_start, mc->gart_end);
+	INIT_DEV_INFO(adev->dev, "GTT: %lluM 0x%016llX - 0x%016llX\n",
+		      mc->gart_size >> 20, mc->gart_start, mc->gart_end);
 }
 
 /*
@@ -1029,7 +1029,7 @@ static int amdgpu_atombios_init(struct amdgpu_device *adev)
 		atom_card_info->ioreg_read = cail_ioreg_read;
 		atom_card_info->ioreg_write = cail_ioreg_write;
 	} else {
-		DRM_INFO("PCI I/O BAR is not found. Using MMIO to access ATOM BIOS\n");
+		INIT_INFO("PCI I/O BAR is not found. Using MMIO to access ATOM BIOS\n");
 		atom_card_info->ioreg_read = cail_reg_read;
 		atom_card_info->ioreg_write = cail_reg_write;
 	}
@@ -1716,10 +1716,8 @@ static int amdgpu_init(struct amdgpu_device *adev)
 		adev->ip_blocks[i].status.hw = true;
 	}
 
-	if (amdgpu_sriov_vf(adev)) {
-		DRM_INFO("rel_init\n");
+	if (amdgpu_sriov_vf(adev))
 		amdgpu_virt_release_full_gpu(adev, true);
-	}
 
 	return 0;
 }
@@ -2264,14 +2262,14 @@ int amdgpu_device_init(struct amdgpu_device *adev,
 			r = -EINVAL;
 			goto failed;
 		}
-		DRM_INFO("GPU posting now...\n");
+		INIT_INFO("GPU posting now...\n");
 		r = amdgpu_atom_asic_init(adev->mode_info.atom_context);
 		if (r) {
 			dev_err(adev->dev, "gpu post error!\n");
 			goto failed;
 		}
 	} else {
-		DRM_INFO("GPU post is not needed\n");
+		INIT_INFO("GPU post is not needed\n");
 	}
 
 	if (adev->is_atom_fw) {
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index c2f414f..6230adc 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -128,6 +128,7 @@ int amdgpu_param_buf_per_se = 0;
 int amdgpu_job_hang_limit = 0;
 int amdgpu_lbpw = -1;
 int amdgpu_compute_multipipe = -1;
+int amdgpu_init_log = 1;
 
 MODULE_PARM_DESC(vramlimit, "Restrict VRAM for testing, in megabytes");
 module_param_named(vramlimit, amdgpu_vram_limit, int, 0600);
@@ -280,6 +281,9 @@ module_param_named(lbpw, amdgpu_lbpw, int, 0444);
 MODULE_PARM_DESC(compute_multipipe, "Force compute queues to be spread across pipes (1 = enable, 0 = disable, -1 = auto)");
 module_param_named(compute_multipipe, amdgpu_compute_multipipe, int, 0444);
 
+MODULE_PARM_DESC(init_log, "log output during initialization (1 = enable, 0 = disable)");
+module_param_named(init_log, amdgpu_init_log, int, 0444);
+
 #ifdef CONFIG_DRM_AMDGPU_SI
 
 #if defined(CONFIG_DRM_RADEON) || defined(CONFIG_DRM_RADEON_MODULE)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
index fb9f88ef..6c4c50f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
@@ -390,9 +390,9 @@ int amdgpu_fence_driver_start_ring(struct amdgpu_ring *ring,
 	ring->fence_drv.irq_type = irq_type;
 	ring->fence_drv.initialized = true;
 
-	dev_info(adev->dev, "fence driver on ring %d use gpu addr 0x%016llx, "
-		 "cpu addr 0x%p\n", ring->idx,
-		 ring->fence_drv.gpu_addr, ring->fence_drv.cpu_addr);
+	INIT_DEV_INFO(adev->dev, "fence driver on ring %d use gpu addr 0x%016llx, "
+		      "cpu addr 0x%p\n", ring->idx,
+		      ring->fence_drv.gpu_addr, ring->fence_drv.cpu_addr);
 	return 0;
 }
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c
index f437008..7ce8105 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c
@@ -370,8 +370,8 @@ int amdgpu_gart_init(struct amdgpu_device *adev)
 	/* Compute table size */
 	adev->gart.num_cpu_pages = adev->mc.gart_size / PAGE_SIZE;
 	adev->gart.num_gpu_pages = adev->mc.gart_size / AMDGPU_GPU_PAGE_SIZE;
-	DRM_INFO("GART: num cpu pages %u, num gpu pages %u\n",
-		 adev->gart.num_cpu_pages, adev->gart.num_gpu_pages);
+	INIT_INFO("GART: num cpu pages %u, num gpu pages %u\n",
+		  adev->gart.num_cpu_pages, adev->gart.num_gpu_pages);
 
 #ifdef CONFIG_DRM_AMDGPU_GART_DEBUGFS
 	/* Allocate pages table */
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
index 47c5ce9..c2d8255 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
@@ -232,7 +232,7 @@ int amdgpu_irq_init(struct amdgpu_device *adev)
 		int ret = pci_enable_msi(adev->pdev);
 		if (!ret) {
 			adev->irq.msi_enabled = true;
-			dev_info(adev->dev, "amdgpu: using MSI.\n");
+			INIT_DEV_INFO(adev->dev, "amdgpu: using MSI.\n");
 		}
 	}
 
@@ -262,7 +262,7 @@ int amdgpu_irq_init(struct amdgpu_device *adev)
 		return r;
 	}
 
-	DRM_INFO("amdgpu: irq initialized.\n");
+	INIT_INFO("amdgpu: irq initialized.\n");
 	return 0;
 }
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index 8b4ed8a..d86805a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -778,11 +778,11 @@ int amdgpu_bo_init(struct amdgpu_device *adev)
 	/* Add an MTRR for the VRAM */
 	adev->mc.vram_mtrr = arch_phys_wc_add(adev->mc.aper_base,
 					      adev->mc.aper_size);
-	DRM_INFO("Detected VRAM RAM=%lluM, BAR=%lluM\n",
-		adev->mc.mc_vram_size >> 20,
-		(unsigned long long)adev->mc.aper_size >> 20);
-	DRM_INFO("RAM width %dbits %s\n",
-		 adev->mc.vram_width, amdgpu_vram_names[adev->mc.vram_type]);
+	INIT_INFO("Detected VRAM RAM=%lluM, BAR=%lluM\n",
+		  adev->mc.mc_vram_size >> 20,
+		  (unsigned long long)adev->mc.aper_size >> 20);
+	INIT_INFO("RAM width %dbits %s\n",
+		  adev->mc.vram_width, amdgpu_vram_names[adev->mc.vram_type]);
 	return amdgpu_ttm_init(adev);
 }
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index dcdfb8d..95a50c3 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -1348,8 +1348,8 @@ int amdgpu_ttm_init(struct amdgpu_device *adev)
 				    NULL, NULL);
 	if (r)
 		return r;
-	DRM_INFO("amdgpu: %uM of VRAM memory ready\n",
-		 (unsigned) (adev->mc.real_vram_size / (1024 * 1024)));
+	INIT_INFO("amdgpu: %uM of VRAM memory ready\n",
+		  (unsigned) (adev->mc.real_vram_size / (1024 * 1024)));
 
 	if (amdgpu_gtt_size == -1)
 		gtt_size = max((AMDGPU_DEFAULT_GTT_SIZE_MB << 20),
@@ -1361,8 +1361,8 @@ int amdgpu_ttm_init(struct amdgpu_device *adev)
 		DRM_ERROR("Failed initializing GTT heap.\n");
 		return r;
 	}
-	DRM_INFO("amdgpu: %uM of GTT memory ready.\n",
-		 (unsigned)(gtt_size / (1024 * 1024)));
+	INIT_INFO("amdgpu: %uM of GTT memory ready.\n",
+		  (unsigned)(gtt_size / (1024 * 1024)));
 
 	adev->gds.mem.total_size = adev->gds.mem.total_size << AMDGPU_GDS_SHIFT;
 	adev->gds.mem.gfx_partition_size = adev->gds.mem.gfx_partition_size << AMDGPU_GDS_SHIFT;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
index b46280c..940f666 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
@@ -954,7 +954,7 @@ int amdgpu_vce_ring_test_ring(struct amdgpu_ring *ring)
 	}
 
 	if (i < timeout) {
-		DRM_INFO("ring test on %d succeeded in %d usecs\n",
+		INIT_INFO("ring test on %d succeeded in %d usecs\n",
 			 ring->idx, i);
 	} else {
 		DRM_ERROR("amdgpu: ring %d test failed\n",
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
index 041e012..5ca7d4e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
@@ -261,7 +261,7 @@ int amdgpu_vcn_dec_ring_test_ring(struct amdgpu_ring *ring)
 	}
 
 	if (i < adev->usec_timeout) {
-		DRM_INFO("ring test on %d succeeded in %d usecs\n",
+		INIT_INFO("ring test on %d succeeded in %d usecs\n",
 			 ring->idx, i);
 	} else {
 		DRM_ERROR("amdgpu: ring %d test failed (0x%08X)\n",
@@ -500,7 +500,7 @@ int amdgpu_vcn_enc_ring_test_ring(struct amdgpu_ring *ring)
 	}
 
 	if (i < adev->usec_timeout) {
-		DRM_INFO("ring test on %d succeeded in %d usecs\n",
+		INIT_INFO("ring test on %d succeeded in %d usecs\n",
 			 ring->idx, i);
 	} else {
 		DRM_ERROR("amdgpu: ring %d test failed\n",
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index ef8b7a9..908779b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -2594,9 +2594,9 @@ void amdgpu_vm_adjust_size(struct amdgpu_device *adev, uint64_t vm_size,
 
 	amdgpu_vm_set_fragment_size(adev, fragment_size_default);
 
-	DRM_INFO("vm size is %llu GB, block size is %u-bit, fragment size is %u-bit\n",
-		adev->vm_manager.vm_size, adev->vm_manager.block_size,
-		adev->vm_manager.fragment_size);
+	INIT_INFO("vm size is %llu GB, block size is %u-bit, fragment size is %u-bit\n",
+		  adev->vm_manager.vm_size, adev->vm_manager.block_size,
+		  adev->vm_manager.fragment_size);
 }
 
 /**
diff --git a/drivers/gpu/drm/amd/amdgpu/atom.c b/drivers/gpu/drm/amd/amdgpu/atom.c
index 69500a8..bfb308e 100644
--- a/drivers/gpu/drm/amd/amdgpu/atom.c
+++ b/drivers/gpu/drm/amd/amdgpu/atom.c
@@ -1344,7 +1344,7 @@ struct atom_context *amdgpu_atom_parse(struct card_info *card, void *bios)
 
 	str = CSTR(idx);
 	if (*str != '\0') {
-		pr_info("ATOM BIOS: %s\n", str);
+		INIT_INFO("ATOM BIOS: %s\n", str);
 		strlcpy(ctx->vbios_version, str, sizeof(ctx->vbios_version));
 	}
 
diff --git a/drivers/gpu/drm/amd/amdgpu/cik_sdma.c b/drivers/gpu/drm/amd/amdgpu/cik_sdma.c
index 60cecd1..18268c9 100644
--- a/drivers/gpu/drm/amd/amdgpu/cik_sdma.c
+++ b/drivers/gpu/drm/amd/amdgpu/cik_sdma.c
@@ -657,7 +657,7 @@ static int cik_sdma_ring_test_ring(struct amdgpu_ring *ring)
 	}
 
 	if (i < adev->usec_timeout) {
-		DRM_INFO("ring test on %d succeeded in %d usecs\n", ring->idx, i);
+		INIT_INFO("ring test on %d succeeded in %d usecs\n", ring->idx, i);
 	} else {
 		DRM_ERROR("amdgpu: ring %d test failed (0x%08X)\n",
 			  ring->idx, tmp);
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v6_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v6_0.c
index dbbe986..a4771a6 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v6_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v6_0.c
@@ -1798,7 +1798,7 @@ static int gfx_v6_0_ring_test_ring(struct amdgpu_ring *ring)
 		DRM_UDELAY(1);
 	}
 	if (i < adev->usec_timeout) {
-		DRM_INFO("ring test on %d succeeded in %d usecs\n", ring->idx, i);
+		INIT_INFO("ring test on %d succeeded in %d usecs\n", ring->idx, i);
 	} else {
 		DRM_ERROR("amdgpu: ring %d test failed (scratch(0x%04X)=0x%08X)\n",
 			  ring->idx, scratch, tmp);
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c
index 0086876..7de5d68 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c
@@ -2069,7 +2069,7 @@ static int gfx_v7_0_ring_test_ring(struct amdgpu_ring *ring)
 		DRM_UDELAY(1);
 	}
 	if (i < adev->usec_timeout) {
-		DRM_INFO("ring test on %d succeeded in %d usecs\n", ring->idx, i);
+		INIT_INFO("ring test on %d succeeded in %d usecs\n", ring->idx, i);
 	} else {
 		DRM_ERROR("amdgpu: ring %d test failed (scratch(0x%04X)=0x%08X)\n",
 			  ring->idx, scratch, tmp);
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
index b8002ac..fd38cb1 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
@@ -804,7 +804,7 @@ static int gfx_v8_0_ring_test_ring(struct amdgpu_ring *ring)
 		DRM_UDELAY(1);
 	}
 	if (i < adev->usec_timeout) {
-		DRM_INFO("ring test on %d succeeded in %d usecs\n",
+		INIT_INFO("ring test on %d succeeded in %d usecs\n",
 			 ring->idx, i);
 	} else {
 		DRM_ERROR("amdgpu: ring %d test failed (scratch(0x%04X)=0x%08X)\n",
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
index 8738b13..50124ab 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
@@ -318,7 +318,7 @@ static int gfx_v9_0_ring_test_ring(struct amdgpu_ring *ring)
 		DRM_UDELAY(1);
 	}
 	if (i < adev->usec_timeout) {
-		DRM_INFO("ring test on %d succeeded in %d usecs\n",
+		INIT_INFO("ring test on %d succeeded in %d usecs\n",
 			 ring->idx, i);
 	} else {
 		DRM_ERROR("amdgpu: ring %d test failed (scratch(0x%04X)=0x%08X)\n",
diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c
index f4603a7..3b5f6bb 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c
@@ -560,9 +560,9 @@ static int gmc_v6_0_gart_enable(struct amdgpu_device *adev)
 		gmc_v6_0_set_fault_enable_default(adev, true);
 
 	gmc_v6_0_gart_flush_gpu_tlb(adev, 0);
-	dev_info(adev->dev, "PCIE GART of %uM enabled (table at 0x%016llX).\n",
-		 (unsigned)(adev->mc.gart_size >> 20),
-		 (unsigned long long)adev->gart.table_addr);
+	INIT_DEV_INFO(adev->dev, "PCIE GART of %uM enabled (table at 0x%016llX).\n",
+		      (unsigned)(adev->mc.gart_size >> 20),
+		      (unsigned long long)adev->gart.table_addr);
 	adev->gart.ready = true;
 	return 0;
 }
diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c
index b0528ca..23bf504 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c
@@ -674,9 +674,9 @@ static int gmc_v7_0_gart_enable(struct amdgpu_device *adev)
 	}
 
 	gmc_v7_0_gart_flush_gpu_tlb(adev, 0);
-	DRM_INFO("PCIE GART of %uM enabled (table at 0x%016llX).\n",
-		 (unsigned)(adev->mc.gart_size >> 20),
-		 (unsigned long long)adev->gart.table_addr);
+	INIT_INFO("PCIE GART of %uM enabled (table at 0x%016llX).\n",
+		  (unsigned)(adev->mc.gart_size >> 20),
+		  (unsigned long long)adev->gart.table_addr);
 	adev->gart.ready = true;
 	return 0;
 }
diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c
index f368cfe..84ae01b 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c
@@ -890,9 +890,9 @@ static int gmc_v8_0_gart_enable(struct amdgpu_device *adev)
 		gmc_v8_0_set_fault_enable_default(adev, true);
 
 	gmc_v8_0_gart_flush_gpu_tlb(adev, 0);
-	DRM_INFO("PCIE GART of %uM enabled (table at 0x%016llX).\n",
-		 (unsigned)(adev->mc.gart_size >> 20),
-		 (unsigned long long)adev->gart.table_addr);
+	INIT_INFO("PCIE GART of %uM enabled (table at 0x%016llX).\n",
+		  (unsigned)(adev->mc.gart_size >> 20),
+		  (unsigned long long)adev->gart.table_addr);
 	adev->gart.ready = true;
 	return 0;
 }
diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
index 6216993..3dac031 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
@@ -756,9 +756,9 @@ static int gmc_v9_0_gart_enable(struct amdgpu_device *adev)
 	mmhub_v1_0_set_fault_enable_default(adev, value);
 	gmc_v9_0_gart_flush_gpu_tlb(adev, 0);
 
-	DRM_INFO("PCIE GART of %uM enabled (table at 0x%016llX).\n",
-		 (unsigned)(adev->mc.gart_size >> 20),
-		 (unsigned long long)adev->gart.table_addr);
+	INIT_INFO("PCIE GART of %uM enabled (table at 0x%016llX).\n",
+		  (unsigned)(adev->mc.gart_size >> 20),
+		  (unsigned long long)adev->gart.table_addr);
 	adev->gart.ready = true;
 	return 0;
 }
diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v2_4.c b/drivers/gpu/drm/amd/amdgpu/sdma_v2_4.c
index 67f375b..8fd1451 100644
--- a/drivers/gpu/drm/amd/amdgpu/sdma_v2_4.c
+++ b/drivers/gpu/drm/amd/amdgpu/sdma_v2_4.c
@@ -633,7 +633,7 @@ static int sdma_v2_4_ring_test_ring(struct amdgpu_ring *ring)
 	}
 
 	if (i < adev->usec_timeout) {
-		DRM_INFO("ring test on %d succeeded in %d usecs\n", ring->idx, i);
+		INIT_INFO("ring test on %d succeeded in %d usecs\n", ring->idx, i);
 	} else {
 		DRM_ERROR("amdgpu: ring %d test failed (0x%08X)\n",
 			  ring->idx, tmp);
diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c b/drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c
index 6d06f8e..540b42a 100644
--- a/drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c
@@ -893,7 +893,7 @@ static int sdma_v3_0_ring_test_ring(struct amdgpu_ring *ring)
 	}
 
 	if (i < adev->usec_timeout) {
-		DRM_INFO("ring test on %d succeeded in %d usecs\n", ring->idx, i);
+		INIT_INFO("ring test on %d succeeded in %d usecs\n", ring->idx, i);
 	} else {
 		DRM_ERROR("amdgpu: ring %d test failed (0x%08X)\n",
 			  ring->idx, tmp);
diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
index 46009db..bf8e06a 100644
--- a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
@@ -919,7 +919,7 @@ static int sdma_v4_0_ring_test_ring(struct amdgpu_ring *ring)
 	}
 
 	if (i < adev->usec_timeout) {
-		DRM_INFO("ring test on %d succeeded in %d usecs\n", ring->idx, i);
+		INIT_INFO("ring test on %d succeeded in %d usecs\n", ring->idx, i);
 	} else {
 		DRM_ERROR("amdgpu: ring %d test failed (0x%08X)\n",
 			  ring->idx, tmp);
diff --git a/drivers/gpu/drm/amd/amdgpu/si_dma.c b/drivers/gpu/drm/amd/amdgpu/si_dma.c
index 3fa2fbf..c356338 100644
--- a/drivers/gpu/drm/amd/amdgpu/si_dma.c
+++ b/drivers/gpu/drm/amd/amdgpu/si_dma.c
@@ -252,7 +252,7 @@ static int si_dma_ring_test_ring(struct amdgpu_ring *ring)
 	}
 
 	if (i < adev->usec_timeout) {
-		DRM_INFO("ring test on %d succeeded in %d usecs\n", ring->idx, i);
+		INIT_INFO("ring test on %d succeeded in %d usecs\n", ring->idx, i);
 	} else {
 		DRM_ERROR("amdgpu: ring %d test failed (0x%08X)\n",
 			  ring->idx, tmp);
diff --git a/drivers/gpu/drm/amd/amdgpu/uvd_v4_2.c b/drivers/gpu/drm/amd/amdgpu/uvd_v4_2.c
index 8ab0f78..c31f4d7 100644
--- a/drivers/gpu/drm/amd/amdgpu/uvd_v4_2.c
+++ b/drivers/gpu/drm/amd/amdgpu/uvd_v4_2.c
@@ -521,7 +521,7 @@ static int uvd_v4_2_ring_test_ring(struct amdgpu_ring *ring)
 	}
 
 	if (i < adev->usec_timeout) {
-		DRM_INFO("ring test on %d succeeded in %d usecs\n",
+		INIT_INFO("ring test on %d succeeded in %d usecs\n",
 			 ring->idx, i);
 	} else {
 		DRM_ERROR("amdgpu: ring %d test failed (0x%08X)\n",
diff --git a/drivers/gpu/drm/amd/amdgpu/uvd_v5_0.c b/drivers/gpu/drm/amd/amdgpu/uvd_v5_0.c
index bb6d46e..a62804b 100644
--- a/drivers/gpu/drm/amd/amdgpu/uvd_v5_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/uvd_v5_0.c
@@ -536,7 +536,7 @@ static int uvd_v5_0_ring_test_ring(struct amdgpu_ring *ring)
 	}
 
 	if (i < adev->usec_timeout) {
-		DRM_INFO("ring test on %d succeeded in %d usecs\n",
+		INIT_INFO("ring test on %d succeeded in %d usecs\n",
 			 ring->idx, i);
 	} else {
 		DRM_ERROR("amdgpu: ring %d test failed (0x%08X)\n",
diff --git a/drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c b/drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c
index 71299c6..1f8093a 100644
--- a/drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c
@@ -184,7 +184,7 @@ static int uvd_v6_0_enc_ring_test_ring(struct amdgpu_ring *ring)
 	}
 
 	if (i < adev->usec_timeout) {
-		DRM_INFO("ring test on %d succeeded in %d usecs\n",
+		INIT_INFO("ring test on %d succeeded in %d usecs\n",
 			 ring->idx, i);
 	} else {
 		DRM_ERROR("amdgpu: ring %d test failed\n",
@@ -1010,7 +1010,7 @@ static int uvd_v6_0_ring_test_ring(struct amdgpu_ring *ring)
 	}
 
 	if (i < adev->usec_timeout) {
-		DRM_INFO("ring test on %d succeeded in %d usecs\n",
+		INIT_INFO("ring test on %d succeeded in %d usecs\n",
 			 ring->idx, i);
 	} else {
 		DRM_ERROR("amdgpu: ring %d test failed (0x%08X)\n",
diff --git a/drivers/gpu/drm/amd/amdgpu/uvd_v7_0.c b/drivers/gpu/drm/amd/amdgpu/uvd_v7_0.c
index b8ed8fa..3d73542 100644
--- a/drivers/gpu/drm/amd/amdgpu/uvd_v7_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/uvd_v7_0.c
@@ -184,7 +184,7 @@ static int uvd_v7_0_enc_ring_test_ring(struct amdgpu_ring *ring)
 	}
 
 	if (i < adev->usec_timeout) {
-		DRM_INFO("ring test on %d succeeded in %d usecs\n",
+		INIT_INFO("ring test on %d succeeded in %d usecs\n",
 			 ring->idx, i);
 	} else {
 		DRM_ERROR("amdgpu: ring %d test failed\n",
@@ -1198,7 +1198,7 @@ static int uvd_v7_0_ring_test_ring(struct amdgpu_ring *ring)
 	}
 
 	if (i < adev->usec_timeout) {
-		DRM_INFO("ring test on %d succeeded in %d usecs\n",
+		INIT_INFO("ring test on %d succeeded in %d usecs\n",
 			 ring->idx, i);
 	} else {
 		DRM_ERROR("amdgpu: ring %d test failed (0x%08X)\n",
diff --git a/drivers/gpu/drm/amd/powerplay/amd_powerplay.c b/drivers/gpu/drm/amd/powerplay/amd_powerplay.c
index 6b0cf8e..4de4d63 100644
--- a/drivers/gpu/drm/amd/powerplay/amd_powerplay.c
+++ b/drivers/gpu/drm/amd/powerplay/amd_powerplay.c
@@ -29,6 +29,7 @@
 #include "amd_powerplay.h"
 #include "pp_instance.h"
 #include "power_state.h"
+#include "amdgpu.h"
 
 #define PP_DPM_DISABLED 0xCCCC
 
@@ -119,7 +120,7 @@ static int pp_sw_init(void *handle)
 
 		ret = hwmgr->smumgr_funcs->smu_init(hwmgr);
 
-		pr_info("amdgpu: powerplay sw initialized\n");
+		INIT_INFO("amdgpu: powerplay sw initialized\n");
 	}
 	return ret;
 }
-- 
2.9.5



* [PATCH 3/7] drm/amdgpu: avoid soft lockup when waiting for RLC serdes
       [not found] ` <1508753012-2196-1-git-send-email-Pixel.Ding-5C7GfCeVMHo@public.gmane.org>
  2017-10-23 10:03   ` [PATCH 1/7] drm/amdgpu: release VF exclusive accessing after hw_init Pixel Ding
  2017-10-23 10:03   ` [PATCH 2/7] drm/amdgpu: add init_log param to control logs in exclusive mode Pixel Ding
@ 2017-10-23 10:03   ` Pixel Ding
       [not found]     ` <1508753012-2196-4-git-send-email-Pixel.Ding-5C7GfCeVMHo@public.gmane.org>
  2017-10-23 10:03   ` [PATCH 4/7] drm/amdgpu/virt: add function to check MMIO accessing Pixel Ding
                     ` (3 subsequent siblings)
  6 siblings, 1 reply; 28+ messages in thread
From: Pixel Ding @ 2017-10-23 10:03 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
  Cc: Gary.Sun-5C7GfCeVMHo, pding, Bingley.Li-5C7GfCeVMHo

From: pding <Pixel.Ding@amd.com>

Normally, if one wait times out, all the remaining waits time out too.
Release the lock and return immediately when a timeout happens.

Signed-off-by: pding <Pixel.Ding@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c | 6 ++++++
 drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 6 ++++++
 2 files changed, 12 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
index fd38cb1..61753a3 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
@@ -3842,6 +3842,12 @@ static void gfx_v8_0_wait_for_rlc_serdes(struct amdgpu_device *adev)
 					break;
 				udelay(1);
 			}
+			if (k == adev->usec_timeout) {
+				mutex_unlock(&adev->grbm_idx_mutex);
+				DRM_INFO("Timeout wait for RLC serdes %u,%u\n",
+					 i, j);
+				return;
+			}
 		}
 	}
 	gfx_v8_0_select_se_sh(adev, 0xffffffff, 0xffffffff, 0xffffffff);
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
index 50124ab..a2c67f6 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
@@ -1628,6 +1628,12 @@ static void gfx_v9_0_wait_for_rlc_serdes(struct amdgpu_device *adev)
 					break;
 				udelay(1);
 			}
+			if (k == adev->usec_timeout) {
+				mutex_unlock(&adev->grbm_idx_mutex);
+				DRM_INFO("Timeout wait for RLC serdes %u,%u\n",
+					 i, j);
+				return;
+			}
 		}
 	}
 	gfx_v9_0_select_se_sh(adev, 0xffffffff, 0xffffffff, 0xffffffff);
-- 
2.9.5



* [PATCH 4/7] drm/amdgpu/virt: add function to check MMIO accessing
       [not found] ` <1508753012-2196-1-git-send-email-Pixel.Ding-5C7GfCeVMHo@public.gmane.org>
                     ` (2 preceding siblings ...)
  2017-10-23 10:03   ` [PATCH 3/7] drm/amdgpu: avoid soft lockup when waiting for RLC serdes Pixel Ding
@ 2017-10-23 10:03   ` Pixel Ding
       [not found]     ` <1508753012-2196-5-git-send-email-Pixel.Ding-5C7GfCeVMHo@public.gmane.org>
  2017-10-23 10:03   ` [PATCH 5/7] drm/amdgpu: don't disable MSI for GPU virtual function Pixel Ding
                     ` (2 subsequent siblings)
  6 siblings, 1 reply; 28+ messages in thread
From: Pixel Ding @ 2017-10-23 10:03 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
  Cc: Gary.Sun-5C7GfCeVMHo, pding, Bingley.Li-5C7GfCeVMHo

From: pding <Pixel.Ding@amd.com>

MMIO space can be blocked on a virtualised device. Add this function
to check whether MMIO is blocked.

Todo: a reliable method is needed, such as communicating with the
hypervisor.

Signed-off-by: pding <Pixel.Ding@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c | 5 +++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h | 1 +
 2 files changed, 6 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
index e97f80f..33dac7e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
@@ -24,6 +24,11 @@
 #include "amdgpu.h"
 #define MAX_KIQ_REG_WAIT	100000000 /* in usecs */
 
+bool amdgpu_virt_mmio_blocked(struct amdgpu_device *adev)
+{
+	return RREG32_NO_KIQ(0xc040) == 0xffffffff;
+}
+
 int amdgpu_allocate_static_csa(struct amdgpu_device *adev)
 {
 	int r;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h
index b89d37f..81efb9d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h
@@ -276,6 +276,7 @@ static inline bool is_virtual_machine(void)
 }
 
 struct amdgpu_vm;
+bool amdgpu_virt_mmio_blocked(struct amdgpu_device *adev);
 int amdgpu_allocate_static_csa(struct amdgpu_device *adev);
 int amdgpu_map_static_csa(struct amdgpu_device *adev, struct amdgpu_vm *vm,
 			  struct amdgpu_bo_va **bo_va);
-- 
2.9.5



* [PATCH 5/7] drm/amdgpu: don't disable MSI for GPU virtual function
       [not found] ` <1508753012-2196-1-git-send-email-Pixel.Ding-5C7GfCeVMHo@public.gmane.org>
                     ` (3 preceding siblings ...)
  2017-10-23 10:03   ` [PATCH 4/7] drm/amdgpu/virt: add function to check MMIO accessing Pixel Ding
@ 2017-10-23 10:03   ` Pixel Ding
       [not found]     ` <1508753012-2196-6-git-send-email-Pixel.Ding-5C7GfCeVMHo@public.gmane.org>
  2017-10-23 10:03   ` [PATCH 6/7] drm/amdgpu/virt: add wait_reset virt ops Pixel Ding
  2017-10-23 10:03   ` [PATCH 7/7] drm/amdgpu: retry init if it fails due to exclusive mode timeout Pixel Ding
  6 siblings, 1 reply; 28+ messages in thread
From: Pixel Ding @ 2017-10-23 10:03 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
  Cc: Gary.Sun-5C7GfCeVMHo, pding, Bingley.Li-5C7GfCeVMHo

From: pding <Pixel.Ding@amd.com>

After calling pci_disable_msi() and pci_enable_msi(), the VF can't
receive interrupts anymore. This may introduce problems when reloading
the module or retrying init.

Signed-off-by: pding <Pixel.Ding@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
index c2d8255..a3314b5 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
@@ -229,8 +229,7 @@ int amdgpu_irq_init(struct amdgpu_device *adev)
 	adev->irq.msi_enabled = false;
 
 	if (amdgpu_msi_ok(adev)) {
-		int ret = pci_enable_msi(adev->pdev);
-		if (!ret) {
+		if (adev->pdev->msi_enabled || !pci_enable_msi(adev->pdev)) {
 			adev->irq.msi_enabled = true;
 			INIT_DEV_INFO(adev->dev, "amdgpu: using MSI.\n");
 		}
@@ -280,7 +279,7 @@ void amdgpu_irq_fini(struct amdgpu_device *adev)
 	if (adev->irq.installed) {
 		drm_irq_uninstall(adev->ddev);
 		adev->irq.installed = false;
-		if (adev->irq.msi_enabled)
+		if (adev->irq.msi_enabled && !amdgpu_sriov_vf(adev))
 			pci_disable_msi(adev->pdev);
 		flush_work(&adev->hotplug_work);
 		cancel_work_sync(&adev->reset_work);
-- 
2.9.5



* [PATCH 6/7] drm/amdgpu/virt: add wait_reset virt ops
       [not found] ` <1508753012-2196-1-git-send-email-Pixel.Ding-5C7GfCeVMHo@public.gmane.org>
                     ` (4 preceding siblings ...)
  2017-10-23 10:03   ` [PATCH 5/7] drm/amdgpu: don't disable MSI for GPU virtual function Pixel Ding
@ 2017-10-23 10:03   ` Pixel Ding
       [not found]     ` <1508753012-2196-7-git-send-email-Pixel.Ding-5C7GfCeVMHo@public.gmane.org>
  2017-10-23 10:03   ` [PATCH 7/7] drm/amdgpu: retry init if it fails due to exclusive mode timeout Pixel Ding
  6 siblings, 1 reply; 28+ messages in thread
From: Pixel Ding @ 2017-10-23 10:03 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
  Cc: Gary.Sun-5C7GfCeVMHo, pding, Bingley.Li-5C7GfCeVMHo

From: pding <Pixel.Ding@amd.com>

The driver can use this interface to check whether a function-level
reset has completed in the hypervisor.

Signed-off-by: pding <Pixel.Ding@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c | 16 ++++++++++++++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h |  2 ++
 drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c    |  1 +
 drivers/gpu/drm/amd/amdgpu/mxgpu_vi.c    |  6 ++++++
 4 files changed, 25 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
index 33dac7e..6a4a901 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
@@ -231,6 +231,22 @@ int amdgpu_virt_reset_gpu(struct amdgpu_device *adev)
 }
 
 /**
+ * amdgpu_virt_wait_reset() - wait for GPU reset to complete
+ * @adev:	amdgpu device.
+ * Wait for the GPU reset to complete.
+ * Return: Zero on success, error code otherwise.
+ */
+int amdgpu_virt_wait_reset(struct amdgpu_device *adev)
+{
+	struct amdgpu_virt *virt = &adev->virt;
+
+	if (!virt->ops || !virt->ops->wait_reset)
+		return -EINVAL;
+
+	return virt->ops->wait_reset(adev);
+}
+
+/**
  * amdgpu_virt_alloc_mm_table() - alloc memory for mm table
  * @amdgpu:	amdgpu device.
  * MM table is used by UVD and VCE for its initialization
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h
index 81efb9d..d149aca 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h
@@ -55,6 +55,7 @@ struct amdgpu_virt_ops {
 	int (*req_full_gpu)(struct amdgpu_device *adev, bool init);
 	int (*rel_full_gpu)(struct amdgpu_device *adev, bool init);
 	int (*reset_gpu)(struct amdgpu_device *adev);
+	int (*wait_reset)(struct amdgpu_device *adev);
 	void (*trans_msg)(struct amdgpu_device *adev, u32 req, u32 data1, u32 data2, u32 data3);
 };
 
@@ -286,6 +287,7 @@ void amdgpu_virt_kiq_wreg(struct amdgpu_device *adev, uint32_t reg, uint32_t v);
 int amdgpu_virt_request_full_gpu(struct amdgpu_device *adev, bool init);
 int amdgpu_virt_release_full_gpu(struct amdgpu_device *adev, bool init);
 int amdgpu_virt_reset_gpu(struct amdgpu_device *adev);
+int amdgpu_virt_wait_reset(struct amdgpu_device *adev);
 int amdgpu_sriov_gpu_reset(struct amdgpu_device *adev, struct amdgpu_job *job);
 int amdgpu_virt_alloc_mm_table(struct amdgpu_device *adev);
 void amdgpu_virt_free_mm_table(struct amdgpu_device *adev);
diff --git a/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c b/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c
index b4906d2..f91aab3 100644
--- a/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c
+++ b/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c
@@ -353,5 +353,6 @@ const struct amdgpu_virt_ops xgpu_ai_virt_ops = {
 	.req_full_gpu	= xgpu_ai_request_full_gpu_access,
 	.rel_full_gpu	= xgpu_ai_release_full_gpu_access,
 	.reset_gpu = xgpu_ai_request_reset,
+	.wait_reset = NULL,
 	.trans_msg = xgpu_ai_mailbox_trans_msg,
 };
diff --git a/drivers/gpu/drm/amd/amdgpu/mxgpu_vi.c b/drivers/gpu/drm/amd/amdgpu/mxgpu_vi.c
index c25a831..27b03c7 100644
--- a/drivers/gpu/drm/amd/amdgpu/mxgpu_vi.c
+++ b/drivers/gpu/drm/amd/amdgpu/mxgpu_vi.c
@@ -458,6 +458,11 @@ static int xgpu_vi_request_reset(struct amdgpu_device *adev)
 	return xgpu_vi_send_access_requests(adev, IDH_REQ_GPU_RESET_ACCESS);
 }
 
+static int xgpu_vi_wait_reset_cmpl(struct amdgpu_device *adev)
+{
+	return xgpu_vi_poll_msg(adev, IDH_FLR_NOTIFICATION_CMPL);
+}
+
 static int xgpu_vi_request_full_gpu_access(struct amdgpu_device *adev,
 					   bool init)
 {
@@ -613,5 +618,6 @@ const struct amdgpu_virt_ops xgpu_vi_virt_ops = {
 	.req_full_gpu		= xgpu_vi_request_full_gpu_access,
 	.rel_full_gpu		= xgpu_vi_release_full_gpu_access,
 	.reset_gpu		= xgpu_vi_request_reset,
+	.wait_reset             = xgpu_vi_wait_reset_cmpl,
 	.trans_msg		= NULL, /* Does not need to trans VF errors to host. */
 };
-- 
2.9.5



* [PATCH 7/7] drm/amdgpu: retry init if it fails due to exclusive mode timeout
       [not found] ` <1508753012-2196-1-git-send-email-Pixel.Ding-5C7GfCeVMHo@public.gmane.org>
                     ` (5 preceding siblings ...)
  2017-10-23 10:03   ` [PATCH 6/7] drm/amdgpu/virt: add wait_reset virt ops Pixel Ding
@ 2017-10-23 10:03   ` Pixel Ding
       [not found]     ` <1508753012-2196-8-git-send-email-Pixel.Ding-5C7GfCeVMHo@public.gmane.org>
  6 siblings, 1 reply; 28+ messages in thread
From: Pixel Ding @ 2017-10-23 10:03 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
  Cc: Gary.Sun-5C7GfCeVMHo, pding, Bingley.Li-5C7GfCeVMHo

From: pding <Pixel.Ding@amd.com>

Exclusive mode has a real-time limit in practice, e.g. init must be
done within 300ms. Timeouts are easily observed when running many
VFs/VMs on a single host under heavy CPU workload.

If init fails due to an exclusive mode timeout, try it again.

Signed-off-by: pding <Pixel.Ding@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 10 ++++++++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c    | 15 +++++++++++++--
 2 files changed, 23 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 3458d46..1935f5a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -2306,6 +2306,15 @@ int amdgpu_device_init(struct amdgpu_device *adev,
 
 	r = amdgpu_init(adev);
 	if (r) {
+		/* failed in exclusive mode due to timeout */
+		if (amdgpu_sriov_vf(adev) &&
+		    !amdgpu_sriov_runtime(adev) &&
+		    amdgpu_virt_mmio_blocked(adev) &&
+		    !amdgpu_virt_wait_reset(adev)) {
+			dev_err(adev->dev, "VF exclusive mode timeout\n");
+			r = -EAGAIN;
+			goto failed;
+		}
 		dev_err(adev->dev, "amdgpu_init failed\n");
 		amdgpu_vf_error_put(adev, AMDGIM_ERROR_VF_AMDGPU_INIT_FAIL, 0, 0);
 		amdgpu_fini(adev);
@@ -2393,6 +2402,7 @@ int amdgpu_device_init(struct amdgpu_device *adev,
 	amdgpu_vf_error_trans_all(adev);
 	if (runtime)
 		vga_switcheroo_fini_domain_pm_ops(adev->dev);
+
 	return r;
 }
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
index f2eb7ac..fdc240a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
@@ -86,7 +86,7 @@ void amdgpu_driver_unload_kms(struct drm_device *dev)
 int amdgpu_driver_load_kms(struct drm_device *dev, unsigned long flags)
 {
 	struct amdgpu_device *adev;
-	int r, acpi_status;
+	int r, acpi_status, retry = 0;
 
 #ifdef CONFIG_DRM_AMDGPU_SI
 	if (!amdgpu_si_support) {
@@ -122,6 +122,7 @@ int amdgpu_driver_load_kms(struct drm_device *dev, unsigned long flags)
 		}
 	}
 #endif
+retry_init:
 
 	adev = kzalloc(sizeof(struct amdgpu_device), GFP_KERNEL);
 	if (adev == NULL) {
@@ -144,7 +145,17 @@ int amdgpu_driver_load_kms(struct drm_device *dev, unsigned long flags)
 	 * VRAM allocation
 	 */
 	r = amdgpu_device_init(adev, dev, dev->pdev, flags);
-	if (r) {
+	if (++retry != 3 && r == -EAGAIN) {
+		adev->virt.caps &= ~AMDGPU_SRIOV_CAPS_RUNTIME;
+		adev->virt.ops = NULL;
+		amdgpu_device_fini(adev);
+		kfree(adev);
+		dev->dev_private = NULL;
+		msleep(5000);
+		dev_err(&dev->pdev->dev, "retry init %d\n", retry);
+		amdgpu_init_log = 0;
+		goto retry_init;
+	} else if (r) {
 		dev_err(&dev->pdev->dev, "Fatal error during GPU init\n");
 		goto out;
 	}
-- 
2.9.5



* RE: [PATCH 5/7] drm/amdgpu: don't disable MSI for GPU virtual function
       [not found]     ` <1508753012-2196-6-git-send-email-Pixel.Ding-5C7GfCeVMHo@public.gmane.org>
@ 2017-10-23 10:57       ` Liu, Monk
       [not found]         ` <BLUPR12MB04494B5ECC666B264207A24A84460-7LeqcoF/hwpTIQvHjXdJlwdYzm3356FpvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
  0 siblings, 1 reply; 28+ messages in thread
From: Liu, Monk @ 2017-10-23 10:57 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
  Cc: Sun, Gary, Ding, Pixel, Li, Bingley

Please check commit "5248e3d9"; your issue should already be fixed by that patch. Please verify.

-----Original Message-----
From: amd-gfx [mailto:amd-gfx-bounces@lists.freedesktop.org] On Behalf Of Pixel Ding
Sent: 2017年10月23日 18:04
To: amd-gfx@lists.freedesktop.org
Cc: Sun, Gary <Gary.Sun@amd.com>; Ding, Pixel <Pixel.Ding@amd.com>; Li, Bingley <Bingley.Li@amd.com>
Subject: [PATCH 5/7] drm/amdgpu: don't disable MSI for GPU virtual function

From: pding <Pixel.Ding@amd.com>

After calling pci_disable_msi() and pci_enable_msi(), the VF can't receive interrupts anymore. This may introduce problems when reloading the module or retrying init.

Signed-off-by: pding <Pixel.Ding@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
index c2d8255..a3314b5 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
@@ -229,8 +229,7 @@ int amdgpu_irq_init(struct amdgpu_device *adev)
 	adev->irq.msi_enabled = false;
 
 	if (amdgpu_msi_ok(adev)) {
-		int ret = pci_enable_msi(adev->pdev);
-		if (!ret) {
+		if (adev->pdev->msi_enabled || !pci_enable_msi(adev->pdev)) {
 			adev->irq.msi_enabled = true;
 			INIT_DEV_INFO(adev->dev, "amdgpu: using MSI.\n");
 		}
@@ -280,7 +279,7 @@ void amdgpu_irq_fini(struct amdgpu_device *adev)
 	if (adev->irq.installed) {
 		drm_irq_uninstall(adev->ddev);
 		adev->irq.installed = false;
-		if (adev->irq.msi_enabled)
+		if (adev->irq.msi_enabled && !amdgpu_sriov_vf(adev))
 			pci_disable_msi(adev->pdev);
 		flush_work(&adev->hotplug_work);
 		cancel_work_sync(&adev->reset_work);
--
2.9.5



* RE: [PATCH 6/7] drm/amdgpu/virt: add wait_reset virt ops
       [not found]     ` <1508753012-2196-7-git-send-email-Pixel.Ding-5C7GfCeVMHo@public.gmane.org>
@ 2017-10-23 11:01       ` Liu, Monk
       [not found]         ` <BLUPR12MB0449ECFA70529779A41D856D84460-7LeqcoF/hwpTIQvHjXdJlwdYzm3356FpvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
  2017-10-23 16:28       ` Deucher, Alexander
  1 sibling, 1 reply; 28+ messages in thread
From: Liu, Monk @ 2017-10-23 11:01 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
  Cc: Sun, Gary, Ding, Pixel, Li, Bingley

I don't see this as a necessary patch; the driver already has an implementation to check whether VF FLR is completed, see "xgpu_ai/vi_mailbox_flr_work()".

The driver won't do a GPU reset until this function receives the NOTIFICATION_CMPL message.

Do you have any particular reason to add this wait_reset? If so, please send out the patch that uses this interface.

BR Monk

-----Original Message-----
From: amd-gfx [mailto:amd-gfx-bounces@lists.freedesktop.org] On Behalf Of Pixel Ding
Sent: 2017年10月23日 18:04
To: amd-gfx@lists.freedesktop.org
Cc: Sun, Gary <Gary.Sun@amd.com>; Ding, Pixel <Pixel.Ding@amd.com>; Li, Bingley <Bingley.Li@amd.com>
Subject: [PATCH 6/7] drm/amdgpu/virt: add wait_reset virt ops

From: pding <Pixel.Ding@amd.com>

The driver can use this interface to check whether a function-level reset has completed in the hypervisor.

Signed-off-by: pding <Pixel.Ding@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c | 16 ++++++++++++++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h |  2 ++
 drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c    |  1 +
 drivers/gpu/drm/amd/amdgpu/mxgpu_vi.c    |  6 ++++++
 4 files changed, 25 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
index 33dac7e..6a4a901 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
@@ -231,6 +231,22 @@ int amdgpu_virt_reset_gpu(struct amdgpu_device *adev)
 }
 
 /**
+ * amdgpu_virt_wait_reset() - wait for reset gpu completed
+ * @amdgpu:	amdgpu device.
+ * Wait for GPU reset completed.
+ * Return: Zero if reset success, otherwise will return error.
+ */
+int amdgpu_virt_wait_reset(struct amdgpu_device *adev)
+{
+	struct amdgpu_virt *virt = &adev->virt;
+
+	if (!virt->ops || !virt->ops->wait_reset)
+		return -EINVAL;
+
+	return virt->ops->wait_reset(adev);
+}
+
+/**
  * amdgpu_virt_alloc_mm_table() - alloc memory for mm table
  * @amdgpu:	amdgpu device.
  * MM table is used by UVD and VCE for its initialization
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h
index 81efb9d..d149aca 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h
@@ -55,6 +55,7 @@ struct amdgpu_virt_ops {
 	int (*req_full_gpu)(struct amdgpu_device *adev, bool init);
 	int (*rel_full_gpu)(struct amdgpu_device *adev, bool init);
 	int (*reset_gpu)(struct amdgpu_device *adev);
+	int (*wait_reset)(struct amdgpu_device *adev);
 	void (*trans_msg)(struct amdgpu_device *adev, u32 req, u32 data1, u32 data2, u32 data3);
 };
 
@@ -286,6 +287,7 @@ void amdgpu_virt_kiq_wreg(struct amdgpu_device *adev, uint32_t reg, uint32_t v);
 int amdgpu_virt_request_full_gpu(struct amdgpu_device *adev, bool init);
 int amdgpu_virt_release_full_gpu(struct amdgpu_device *adev, bool init);
 int amdgpu_virt_reset_gpu(struct amdgpu_device *adev);
+int amdgpu_virt_wait_reset(struct amdgpu_device *adev);
 int amdgpu_sriov_gpu_reset(struct amdgpu_device *adev, struct amdgpu_job *job);
 int amdgpu_virt_alloc_mm_table(struct amdgpu_device *adev);
 void amdgpu_virt_free_mm_table(struct amdgpu_device *adev);
diff --git a/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c b/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c
index b4906d2..f91aab3 100644
--- a/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c
+++ b/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c
@@ -353,5 +353,6 @@ const struct amdgpu_virt_ops xgpu_ai_virt_ops = {
 	.req_full_gpu	= xgpu_ai_request_full_gpu_access,
 	.rel_full_gpu	= xgpu_ai_release_full_gpu_access,
 	.reset_gpu = xgpu_ai_request_reset,
+	.wait_reset = NULL,
 	.trans_msg = xgpu_ai_mailbox_trans_msg,
 };
diff --git a/drivers/gpu/drm/amd/amdgpu/mxgpu_vi.c b/drivers/gpu/drm/amd/amdgpu/mxgpu_vi.c
index c25a831..27b03c7 100644
--- a/drivers/gpu/drm/amd/amdgpu/mxgpu_vi.c
+++ b/drivers/gpu/drm/amd/amdgpu/mxgpu_vi.c
@@ -458,6 +458,11 @@ static int xgpu_vi_request_reset(struct amdgpu_device *adev)
 	return xgpu_vi_send_access_requests(adev, IDH_REQ_GPU_RESET_ACCESS);
 }
 
+static int xgpu_vi_wait_reset_cmpl(struct amdgpu_device *adev)
+{
+	return xgpu_vi_poll_msg(adev, IDH_FLR_NOTIFICATION_CMPL);
+}
+
 static int xgpu_vi_request_full_gpu_access(struct amdgpu_device *adev,
 					   bool init)
 {
@@ -613,5 +618,6 @@ const struct amdgpu_virt_ops xgpu_vi_virt_ops = {
 	.req_full_gpu		= xgpu_vi_request_full_gpu_access,
 	.rel_full_gpu		= xgpu_vi_release_full_gpu_access,
 	.reset_gpu		= xgpu_vi_request_reset,
+	.wait_reset             = xgpu_vi_wait_reset_cmpl,
 	.trans_msg		= NULL, /* Does not need to trans VF errors to host. */
 };
--
2.9.5



* Re: [PATCH 7/7] drm/amdgpu: retry init if it fails due to exclusive mode timeout
       [not found]     ` <1508753012-2196-8-git-send-email-Pixel.Ding-5C7GfCeVMHo@public.gmane.org>
@ 2017-10-23 15:44       ` Andres Rodriguez
       [not found]         ` <04e77adc-085b-6bea-d08d-73a0980bcdc4-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  0 siblings, 1 reply; 28+ messages in thread
From: Andres Rodriguez @ 2017-10-23 15:44 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW



On 2017-10-23 06:03 AM, Pixel Ding wrote:
> From: pding <Pixel.Ding@amd.com>
> 
> The exclusive mode has real-time limitation in reality, such like being
> done in 300ms. It's easy observed if running many VF/VMs in single host
> with heavy CPU workload.
> 
> If we find the init fails due to exclusive mode timeout, try it again.
> 
> Signed-off-by: pding <Pixel.Ding@amd.com>
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 10 ++++++++++
>   drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c    | 15 +++++++++++++--
>   2 files changed, 23 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index 3458d46..1935f5a 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -2306,6 +2306,15 @@ int amdgpu_device_init(struct amdgpu_device *adev,
>   
>   	r = amdgpu_init(adev);
>   	if (r) {
> +		/* failed in exclusive mode due to timeout */
> +		if (amdgpu_sriov_vf(adev) &&
> +		    !amdgpu_sriov_runtime(adev) &&
> +		    amdgpu_virt_mmio_blocked(adev) &&
> +		    !amdgpu_virt_wait_reset(adev)) {
> +			dev_err(adev->dev, "VF exclusive mode timeout\n");
> +			r = -EAGAIN;
> +			goto failed;
> +		}
>   		dev_err(adev->dev, "amdgpu_init failed\n");
>   		amdgpu_vf_error_put(adev, AMDGIM_ERROR_VF_AMDGPU_INIT_FAIL, 0, 0);
>   		amdgpu_fini(adev);
> @@ -2393,6 +2402,7 @@ int amdgpu_device_init(struct amdgpu_device *adev,
>   	amdgpu_vf_error_trans_all(adev);
>   	if (runtime)
>   		vga_switcheroo_fini_domain_pm_ops(adev->dev);
> +
>   	return r;
>   }
>   
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
> index f2eb7ac..fdc240a 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
> @@ -86,7 +86,7 @@ void amdgpu_driver_unload_kms(struct drm_device *dev)
>   int amdgpu_driver_load_kms(struct drm_device *dev, unsigned long flags)
>   {
>   	struct amdgpu_device *adev;
> -	int r, acpi_status;
> +	int r, acpi_status, retry = 0;
>   
>   #ifdef CONFIG_DRM_AMDGPU_SI
>   	if (!amdgpu_si_support) {
> @@ -122,6 +122,7 @@ int amdgpu_driver_load_kms(struct drm_device *dev, unsigned long flags)
>   		}
>   	}
>   #endif
> +retry_init:
>   
>   	adev = kzalloc(sizeof(struct amdgpu_device), GFP_KERNEL);
>   	if (adev == NULL) {
> @@ -144,7 +145,17 @@ int amdgpu_driver_load_kms(struct drm_device *dev, unsigned long flags)
>   	 * VRAM allocation
>   	 */
>   	r = amdgpu_device_init(adev, dev, dev->pdev, flags);
> -	if (r) {
> +	if (++retry != 3 && r == -EAGAIN) {

Minor nitpick here. Might want to rewrite the condition so that it 
evaluates to false for most values of retry (currently it evaluates to 
false only for one value of retry).

E.g. if (++retry >= 3 ...)

Or

int retry = 3;
...
if (--retry >= 0 ...)

> +		adev->virt.caps &= ~AMDGPU_SRIOV_CAPS_RUNTIME;
> +		adev->virt.ops = NULL;
> +		amdgpu_device_fini(adev);
> +		kfree(adev);
> +		dev->dev_private = NULL;
> +		msleep(5000);
> +		dev_err(&dev->pdev->dev, "retry init %d\n", retry);
> +		amdgpu_init_log = 0;
> +		goto retry_init;
> +	} else if (r) {
>   		dev_err(&dev->pdev->dev, "Fatal error during GPU init\n");
>   		goto out;
>   	}
> 


* RE: [PATCH 3/7] drm/amdgpu: avoid soft lockup when waiting for RLC serdes
       [not found]     ` <1508753012-2196-4-git-send-email-Pixel.Ding-5C7GfCeVMHo@public.gmane.org>
@ 2017-10-23 16:22       ` Deucher, Alexander
  0 siblings, 0 replies; 28+ messages in thread
From: Deucher, Alexander @ 2017-10-23 16:22 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
  Cc: Sun, Gary, Ding, Pixel, Li, Bingley



> -----Original Message-----
> From: amd-gfx [mailto:amd-gfx-bounces@lists.freedesktop.org] On Behalf
> Of Pixel Ding
> Sent: Monday, October 23, 2017 6:03 AM
> To: amd-gfx@lists.freedesktop.org
> Cc: Sun, Gary; Ding, Pixel; Li, Bingley
> Subject: [PATCH 3/7] drm/amdgpu: avoid soft lockup when waiting for RLC
> serdes
> 
> From: pding <Pixel.Ding@amd.com>
> 
> Normally all waiting get timeout if there's one.
> Release the lock and return immediately when timeout happens.
> 
> Signed-off-by: pding <Pixel.Ding@amd.com>
> ---
>  drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c | 6 ++++++
>  drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 6 ++++++
>  2 files changed, 12 insertions(+)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
> b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
> index fd38cb1..61753a3 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
> @@ -3842,6 +3842,12 @@ static void gfx_v8_0_wait_for_rlc_serdes(struct
> amdgpu_device *adev)
>  					break;
>  				udelay(1);
>  			}
> +			if (k == adev->usec_timeout) {

Please reset the se_sh to broadcast here as well.
gfx_v8_0_select_se_sh(adev, 0xffffffff, 0xffffffff, 0xffffffff);

> +				mutex_unlock(&adev->grbm_idx_mutex);
> +				DRM_INFO("Timeout wait for RLC serdes
> %u,%u\n",
> +					 i, j);
> +				return;
> +			}
>  		}
>  	}
>  	gfx_v8_0_select_se_sh(adev, 0xffffffff, 0xffffffff, 0xffffffff);
> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> index 50124ab..a2c67f6 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> @@ -1628,6 +1628,12 @@ static void gfx_v9_0_wait_for_rlc_serdes(struct
> amdgpu_device *adev)
>  					break;
>  				udelay(1);
>  			}
> +			if (k == adev->usec_timeout) {

Same here.

> +				mutex_unlock(&adev->grbm_idx_mutex);
> +				DRM_INFO("Timeout wait for RLC serdes
> %u,%u\n",
> +					 i, j);
> +				return;
> +			}
>  		}
>  	}
>  	gfx_v9_0_select_se_sh(adev, 0xffffffff, 0xffffffff, 0xffffffff);
> --
> 2.9.5
> 


* RE: [PATCH 1/7] drm/amdgpu: release VF exclusive accessing after hw_init
       [not found]     ` <1508753012-2196-2-git-send-email-Pixel.Ding-5C7GfCeVMHo@public.gmane.org>
@ 2017-10-23 16:23       ` Deucher, Alexander
       [not found]         ` <BN6PR12MB16528E993E985FB6B21F7022F7460-/b2+HYfkarQqUD6E6FAiowdYzm3356FpvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
  0 siblings, 1 reply; 28+ messages in thread
From: Deucher, Alexander @ 2017-10-23 16:23 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
  Cc: Sun, Gary, Ding, Pixel, Li, Bingley

> -----Original Message-----
> From: amd-gfx [mailto:amd-gfx-bounces@lists.freedesktop.org] On Behalf
> Of Pixel Ding
> Sent: Monday, October 23, 2017 6:03 AM
> To: amd-gfx@lists.freedesktop.org
> Cc: Sun, Gary; Ding, Pixel; Li, Bingley
> Subject: [PATCH 1/7] drm/amdgpu: release VF exclusive accessing after
> hw_init
> 
> The subsequent operations don't need exclusive access to the hardware.
> 
> Signed-off-by: Pixel Ding <Pixel.Ding@amd.com>

Acked-by: Alex Deucher <alexander.deucher@amd.com>

> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 5 +++++
>  drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c    | 3 ---
>  2 files changed, 5 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index 99acf29..286ba3c 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -1716,6 +1716,11 @@ static int amdgpu_init(struct amdgpu_device
> *adev)
>  		adev->ip_blocks[i].status.hw = true;
>  	}
> 
> +	if (amdgpu_sriov_vf(adev)) {
> +		DRM_INFO("rel_init\n");
> +		amdgpu_virt_release_full_gpu(adev, true);
> +	}
> +
>  	return 0;
>  }
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
> index 4a9f749..f2eb7ac 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
> @@ -171,9 +171,6 @@ int amdgpu_driver_load_kms(struct drm_device
> *dev, unsigned long flags)
>  		pm_runtime_put_autosuspend(dev->dev);
>  	}
> 
> -	if (amdgpu_sriov_vf(adev))
> -		amdgpu_virt_release_full_gpu(adev, true);
> -
>  out:
>  	if (r) {
>  		/* balance pm_runtime_get_sync in
> amdgpu_driver_unload_kms */
> --
> 2.9.5
> 
> _______________________________________________
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 28+ messages in thread

* RE: [PATCH 6/7] drm/amdgpu/virt: add wait_reset virt ops
       [not found]     ` <1508753012-2196-7-git-send-email-Pixel.Ding-5C7GfCeVMHo@public.gmane.org>
  2017-10-23 11:01       ` Liu, Monk
@ 2017-10-23 16:28       ` Deucher, Alexander
  1 sibling, 0 replies; 28+ messages in thread
From: Deucher, Alexander @ 2017-10-23 16:28 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
  Cc: Sun, Gary, Ding, Pixel, Li, Bingley

> -----Original Message-----
> From: amd-gfx [mailto:amd-gfx-bounces@lists.freedesktop.org] On Behalf
> Of Pixel Ding
> Sent: Monday, October 23, 2017 6:04 AM
> To: amd-gfx@lists.freedesktop.org
> Cc: Sun, Gary; Ding, Pixel; Li, Bingley
> Subject: [PATCH 6/7] drm/amdgpu/virt: add wait_reset virt ops
> 
> From: pding <Pixel.Ding@amd.com>
> 
> The driver can use this interface to check whether a function-level
> reset has completed in the hypervisor.
> 

I'd suggest splitting this patch in two: one patch to add the new callback, and one to add the VI implementation.

Alex

> Signed-off-by: pding <Pixel.Ding@amd.com>
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c | 16 ++++++++++++++++
>  drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h |  2 ++
>  drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c    |  1 +
>  drivers/gpu/drm/amd/amdgpu/mxgpu_vi.c    |  6 ++++++
>  4 files changed, 25 insertions(+)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
> index 33dac7e..6a4a901 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
> @@ -231,6 +231,22 @@ int amdgpu_virt_reset_gpu(struct amdgpu_device
> *adev)
>  }
> 
>  /**
> + * amdgpu_virt_wait_reset() - wait for GPU reset to complete
> + * @adev:	amdgpu device.
> + * Wait for the GPU reset to complete.
> + * Return: Zero on success, negative error code on failure.
> + */
> +int amdgpu_virt_wait_reset(struct amdgpu_device *adev)
> +{
> +	struct amdgpu_virt *virt = &adev->virt;
> +
> +	if (!virt->ops || !virt->ops->wait_reset)
> +		return -EINVAL;
> +
> +	return virt->ops->wait_reset(adev);
> +}
> +
> +/**
>   * amdgpu_virt_alloc_mm_table() - alloc memory for mm table
>   * @amdgpu:	amdgpu device.
>   * MM table is used by UVD and VCE for its initialization
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h
> index 81efb9d..d149aca 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h
> @@ -55,6 +55,7 @@ struct amdgpu_virt_ops {
>  	int (*req_full_gpu)(struct amdgpu_device *adev, bool init);
>  	int (*rel_full_gpu)(struct amdgpu_device *adev, bool init);
>  	int (*reset_gpu)(struct amdgpu_device *adev);
> +	int (*wait_reset)(struct amdgpu_device *adev);
>  	void (*trans_msg)(struct amdgpu_device *adev, u32 req, u32 data1,
> u32 data2, u32 data3);
>  };
> 
> @@ -286,6 +287,7 @@ void amdgpu_virt_kiq_wreg(struct amdgpu_device
> *adev, uint32_t reg, uint32_t v);
>  int amdgpu_virt_request_full_gpu(struct amdgpu_device *adev, bool init);
>  int amdgpu_virt_release_full_gpu(struct amdgpu_device *adev, bool init);
>  int amdgpu_virt_reset_gpu(struct amdgpu_device *adev);
> +int amdgpu_virt_wait_reset(struct amdgpu_device *adev);
>  int amdgpu_sriov_gpu_reset(struct amdgpu_device *adev, struct
> amdgpu_job *job);
>  int amdgpu_virt_alloc_mm_table(struct amdgpu_device *adev);
>  void amdgpu_virt_free_mm_table(struct amdgpu_device *adev);
> diff --git a/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c
> b/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c
> index b4906d2..f91aab3 100644
> --- a/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c
> +++ b/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c
> @@ -353,5 +353,6 @@ const struct amdgpu_virt_ops xgpu_ai_virt_ops = {
>  	.req_full_gpu	= xgpu_ai_request_full_gpu_access,
>  	.rel_full_gpu	= xgpu_ai_release_full_gpu_access,
>  	.reset_gpu = xgpu_ai_request_reset,
> +	.wait_reset = NULL,
>  	.trans_msg = xgpu_ai_mailbox_trans_msg,
>  };
> diff --git a/drivers/gpu/drm/amd/amdgpu/mxgpu_vi.c
> b/drivers/gpu/drm/amd/amdgpu/mxgpu_vi.c
> index c25a831..27b03c7 100644
> --- a/drivers/gpu/drm/amd/amdgpu/mxgpu_vi.c
> +++ b/drivers/gpu/drm/amd/amdgpu/mxgpu_vi.c
> @@ -458,6 +458,11 @@ static int xgpu_vi_request_reset(struct
> amdgpu_device *adev)
>  	return xgpu_vi_send_access_requests(adev,
> IDH_REQ_GPU_RESET_ACCESS);
>  }
> 
> +static int xgpu_vi_wait_reset_cmpl(struct amdgpu_device *adev)
> +{
> +	return xgpu_vi_poll_msg(adev, IDH_FLR_NOTIFICATION_CMPL);
> +}
> +
>  static int xgpu_vi_request_full_gpu_access(struct amdgpu_device *adev,
>  					   bool init)
>  {
> @@ -613,5 +618,6 @@ const struct amdgpu_virt_ops xgpu_vi_virt_ops = {
>  	.req_full_gpu		= xgpu_vi_request_full_gpu_access,
>  	.rel_full_gpu		= xgpu_vi_release_full_gpu_access,
>  	.reset_gpu		= xgpu_vi_request_reset,
> +	.wait_reset             = xgpu_vi_wait_reset_cmpl,
>  	.trans_msg		= NULL, /* Does not need to trans VF errors
> to host. */
>  };
> --
> 2.9.5
> 

^ permalink raw reply	[flat|nested] 28+ messages in thread

* RE: [PATCH 4/7] drm/amdgpu/virt: add function to check MMIO accessing
       [not found]     ` <1508753012-2196-5-git-send-email-Pixel.Ding-5C7GfCeVMHo@public.gmane.org>
@ 2017-10-23 16:33       ` Deucher, Alexander
       [not found]         ` <BN6PR12MB165261DE3E8A751BD8250C77F7460-/b2+HYfkarQqUD6E6FAiowdYzm3356FpvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
  0 siblings, 1 reply; 28+ messages in thread
From: Deucher, Alexander @ 2017-10-23 16:33 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
  Cc: Sun, Gary, Ding, Pixel, Li, Bingley

> -----Original Message-----
> From: amd-gfx [mailto:amd-gfx-bounces@lists.freedesktop.org] On Behalf
> Of Pixel Ding
> Sent: Monday, October 23, 2017 6:03 AM
> To: amd-gfx@lists.freedesktop.org
> Cc: Sun, Gary; Ding, Pixel; Li, Bingley
> Subject: [PATCH 4/7] drm/amdgpu/virt: add function to check MMIO
> accessing
> 
> From: pding <Pixel.Ding@amd.com>
> 
> MMIO space can be blocked on a virtualized device. Add this function
> to check whether MMIO is blocked.
> 
> Todo: this needs a reliable method, such as communication with the
> hypervisor.
> 
> Signed-off-by: pding <Pixel.Ding@amd.com>
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c | 5 +++++
>  drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h | 1 +
>  2 files changed, 6 insertions(+)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
> index e97f80f..33dac7e 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
> @@ -24,6 +24,11 @@
>  #include "amdgpu.h"
>  #define MAX_KIQ_REG_WAIT	100000000 /* in usecs */
> 
> +bool amdgpu_virt_mmio_blocked(struct amdgpu_device *adev)
> +{
> +	return RREG32_NO_KIQ(0xc040) == 0xffffffff;
> +}

Is this safe?  Won't accessing non-instanced registers cause a problem?  It's probably also worth commenting what register this is, and adding a note for future ASICs in case the register map changes.

Alex

> +
>  int amdgpu_allocate_static_csa(struct amdgpu_device *adev)
>  {
>  	int r;
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h
> index b89d37f..81efb9d 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h
> @@ -276,6 +276,7 @@ static inline bool is_virtual_machine(void)
>  }
> 
>  struct amdgpu_vm;
> +bool amdgpu_virt_mmio_blocked(struct amdgpu_device *adev);
>  int amdgpu_allocate_static_csa(struct amdgpu_device *adev);
>  int amdgpu_map_static_csa(struct amdgpu_device *adev, struct
> amdgpu_vm *vm,
>  			  struct amdgpu_bo_va **bo_va);
> --
> 2.9.5
> 

^ permalink raw reply	[flat|nested] 28+ messages in thread

* RE: [PATCH 2/7] drm/amdgpu: add init_log param to control logs in exclusive mode
       [not found]     ` <1508753012-2196-3-git-send-email-Pixel.Ding-5C7GfCeVMHo@public.gmane.org>
@ 2017-10-23 16:34       ` Deucher, Alexander
  0 siblings, 0 replies; 28+ messages in thread
From: Deucher, Alexander @ 2017-10-23 16:34 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
  Cc: Sun, Gary, Ding, Pixel, Li, Bingley

> -----Original Message-----
> From: amd-gfx [mailto:amd-gfx-bounces@lists.freedesktop.org] On Behalf
> Of Pixel Ding
> Sent: Monday, October 23, 2017 6:03 AM
> To: amd-gfx@lists.freedesktop.org
> Cc: Sun, Gary; Ding, Pixel; Li, Bingley
> Subject: [PATCH 2/7] drm/amdgpu: add init_log param to control logs in
> exclusive mode
> 
> From: pding <Pixel.Ding@amd.com>
> 
> When this VF stays in exclusive mode for too long, other VFs are
> impacted.
> 
> The redundant messages cause an exclusive mode timeout when they're
> redirected; redirecting the guest log to a virtual serial port is a
> normal use case for cloud services.
> 
> Introduce an init_log param to control logging during exclusive mode.
> The default behavior is not changed. With this change, exclusive time
> decreases by 200ms when log redirection is enabled.

Let's not add another module parameter.  It's getting to be a mess.  I'd prefer to just make some of these debug only or bare metal only.  I think a few others can be dropped.  See additional comments below.

> 
> Signed-off-by: pding <Pixel.Ding@amd.com>
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu.h           | 13 +++++++++++++
>  drivers/gpu/drm/amd/amdgpu/amdgpu_atombios.c  |  8 ++++----
>  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c    | 20 +++++++++----------
> -
>  drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c       |  4 ++++
>  drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c     |  6 +++---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c      |  4 ++--
>  drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c       |  4 ++--
>  drivers/gpu/drm/amd/amdgpu/amdgpu_object.c    | 10 +++++-----
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c       |  8 ++++----
>  drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c       |  2 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c       |  4 ++--
>  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c        |  6 +++---
>  drivers/gpu/drm/amd/amdgpu/atom.c             |  2 +-
>  drivers/gpu/drm/amd/amdgpu/cik_sdma.c         |  2 +-
>  drivers/gpu/drm/amd/amdgpu/gfx_v6_0.c         |  2 +-
>  drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c         |  2 +-
>  drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c         |  2 +-
>  drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c         |  2 +-
>  drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c         |  6 +++---
>  drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c         |  6 +++---
>  drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c         |  6 +++---
>  drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c         |  6 +++---
>  drivers/gpu/drm/amd/amdgpu/sdma_v2_4.c        |  2 +-
>  drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c        |  2 +-
>  drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c        |  2 +-
>  drivers/gpu/drm/amd/amdgpu/si_dma.c           |  2 +-
>  drivers/gpu/drm/amd/amdgpu/uvd_v4_2.c         |  2 +-
>  drivers/gpu/drm/amd/amdgpu/uvd_v5_0.c         |  2 +-
>  drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c         |  4 ++--
>  drivers/gpu/drm/amd/amdgpu/uvd_v7_0.c         |  4 ++--
>  drivers/gpu/drm/amd/powerplay/amd_powerplay.c |  3 ++-
>  31 files changed, 82 insertions(+), 66 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> index 774edc1..f08bb9c 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> @@ -126,6 +126,7 @@ extern int amdgpu_param_buf_per_se;
>  extern int amdgpu_job_hang_limit;
>  extern int amdgpu_lbpw;
>  extern int amdgpu_compute_multipipe;
> +extern int amdgpu_init_log;
> 
>  #ifdef CONFIG_DRM_AMDGPU_SI
>  extern int amdgpu_si_support;
> @@ -134,6 +135,18 @@ extern int amdgpu_si_support;
>  extern int amdgpu_cik_support;
>  #endif
> 
> +#define INIT_INFO(fmt, ...)					\
> +	do {							\
> +		if (amdgpu_init_log)				\
> +			DRM_INFO(fmt, ##__VA_ARGS__);		\
> +	} while (0)						\
> +
> +#define INIT_DEV_INFO(dev, fmt, ...)				\
> +	do {							\
> +		if (amdgpu_init_log)				\
> +			dev_info(dev, fmt, ##__VA_ARGS__);	\
> +	} while (0)						\
> +

Rather than using a module parameter here, just disable it for SR-IOV.

>  #define AMDGPU_DEFAULT_GTT_SIZE_MB		3072ULL /* 3GB by
> default */
>  #define AMDGPU_WAIT_IDLE_TIMEOUT_IN_MS	        3000
>  #define AMDGPU_MAX_USEC_TIMEOUT			100000	/* 100
> ms */
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_atombios.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_atombios.c
> index f66d33e..5ff786a 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_atombios.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_atombios.c
> @@ -690,12 +690,12 @@ int amdgpu_atombios_get_clock_info(struct
> amdgpu_device *adev)
>  			le32_to_cpu(firmware_info-
> >info_21.ulDefaultDispEngineClkFreq);
>  		/* set a reasonable default for DP */
>  		if (adev->clock.default_dispclk < 53900) {
> -			DRM_INFO("Changing default dispclk from %dMhz to
> 600Mhz\n",
> -				 adev->clock.default_dispclk / 100);
> +			INIT_INFO("Changing default dispclk from %dMhz to
> 600Mhz\n",
> +				  adev->clock.default_dispclk / 100);

You can use DRM_DEBUG here.

>  			adev->clock.default_dispclk = 60000;
>  		} else if (adev->clock.default_dispclk <= 60000) {
> -			DRM_INFO("Changing default dispclk from %dMhz to
> 625Mhz\n",
> -				 adev->clock.default_dispclk / 100);
> +			INIT_INFO("Changing default dispclk from %dMhz to
> 625Mhz\n",
> +				  adev->clock.default_dispclk / 100);
>  			adev->clock.default_dispclk = 62500;
>  		}

Same here.

>  		adev->clock.dp_extclk =
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index 286ba3c..3458d46 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -613,9 +613,9 @@ void amdgpu_vram_location(struct amdgpu_device
> *adev, struct amdgpu_mc *mc, u64
>  	mc->vram_end = mc->vram_start + mc->mc_vram_size - 1;
>  	if (limit && limit < mc->real_vram_size)
>  		mc->real_vram_size = limit;
> -	dev_info(adev->dev, "VRAM: %lluM 0x%016llX - 0x%016llX (%lluM
> used)\n",
> -			mc->mc_vram_size >> 20, mc->vram_start,
> -			mc->vram_end, mc->real_vram_size >> 20);
> +	INIT_DEV_INFO(adev->dev, "VRAM: %lluM 0x%016llX - 0x%016llX
> (%lluM used)\n",
> +		      mc->mc_vram_size >> 20, mc->vram_start,
> +		      mc->vram_end, mc->real_vram_size >> 20);

This one is useful.  I'd suggest either dropping this hunk or making it bare metal only.

>  }
> 
>  /**
> @@ -650,8 +650,8 @@ void amdgpu_gart_location(struct amdgpu_device
> *adev, struct amdgpu_mc *mc)
>  		mc->gart_start = mc->vram_end + 1;
>  	}
>  	mc->gart_end = mc->gart_start + mc->gart_size - 1;
> -	dev_info(adev->dev, "GTT: %lluM 0x%016llX - 0x%016llX\n",
> -			mc->gart_size >> 20, mc->gart_start, mc->gart_end);
> +	INIT_DEV_INFO(adev->dev, "GTT: %lluM 0x%016llX - 0x%016llX\n",
> +		      mc->gart_size >> 20, mc->gart_start, mc->gart_end);

Same here.

>  }
> 
>  /*
> @@ -1029,7 +1029,7 @@ static int amdgpu_atombios_init(struct
> amdgpu_device *adev)
>  		atom_card_info->ioreg_read = cail_ioreg_read;
>  		atom_card_info->ioreg_write = cail_ioreg_write;
>  	} else {
> -		DRM_INFO("PCI I/O BAR is not found. Using MMIO to access
> ATOM BIOS\n");
> +		INIT_INFO("PCI I/O BAR is not found. Using MMIO to access
> ATOM BIOS\n");

This can be changed to DRM_DEBUG.

>  		atom_card_info->ioreg_read = cail_reg_read;
>  		atom_card_info->ioreg_write = cail_reg_write;
>  	}
> @@ -1716,10 +1716,8 @@ static int amdgpu_init(struct amdgpu_device
> *adev)
>  		adev->ip_blocks[i].status.hw = true;
>  	}
> 
> -	if (amdgpu_sriov_vf(adev)) {
> -		DRM_INFO("rel_init\n");
> +	if (amdgpu_sriov_vf(adev))
>  		amdgpu_virt_release_full_gpu(adev, true);
> -	}
> 
>  	return 0;
>  }
> @@ -2264,14 +2262,14 @@ int amdgpu_device_init(struct amdgpu_device
> *adev,
>  			r = -EINVAL;
>  			goto failed;
>  		}
> -		DRM_INFO("GPU posting now...\n");
> +		INIT_INFO("GPU posting now...\n");

This is useful.  I'd suggest either dropping this hunk or making it bare metal only.

>  		r = amdgpu_atom_asic_init(adev-
> >mode_info.atom_context);
>  		if (r) {
>  			dev_err(adev->dev, "gpu post error!\n");
>  			goto failed;
>  		}
>  	} else {
> -		DRM_INFO("GPU post is not needed\n");
> +		INIT_INFO("GPU post is not needed\n");

This can be dropped.

>  	}
> 
>  	if (adev->is_atom_fw) {
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> index c2f414f..6230adc 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> @@ -128,6 +128,7 @@ int amdgpu_param_buf_per_se = 0;
>  int amdgpu_job_hang_limit = 0;
>  int amdgpu_lbpw = -1;
>  int amdgpu_compute_multipipe = -1;
> +int amdgpu_init_log = 1;
> 
>  MODULE_PARM_DESC(vramlimit, "Restrict VRAM for testing, in
> megabytes");
>  module_param_named(vramlimit, amdgpu_vram_limit, int, 0600);
> @@ -280,6 +281,9 @@ module_param_named(lbpw, amdgpu_lbpw, int,
> 0444);
>  MODULE_PARM_DESC(compute_multipipe, "Force compute queues to be
> spread across pipes (1 = enable, 0 = disable, -1 = auto)");
>  module_param_named(compute_multipipe, amdgpu_compute_multipipe,
> int, 0444);
> 
> +MODULE_PARM_DESC(init_log, "log output during initialization (1 = enable,
> 0 = disable)");
> +module_param_named(init_log, amdgpu_init_log, int, 0444);
> +
>  #ifdef CONFIG_DRM_AMDGPU_SI
> 
>  #if defined(CONFIG_DRM_RADEON) ||
> defined(CONFIG_DRM_RADEON_MODULE)
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> index fb9f88ef..6c4c50f 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> @@ -390,9 +390,9 @@ int amdgpu_fence_driver_start_ring(struct
> amdgpu_ring *ring,
>  	ring->fence_drv.irq_type = irq_type;
>  	ring->fence_drv.initialized = true;
> 
> -	dev_info(adev->dev, "fence driver on ring %d use gpu addr
> 0x%016llx, "
> -		 "cpu addr 0x%p\n", ring->idx,
> -		 ring->fence_drv.gpu_addr, ring->fence_drv.cpu_addr);
> +	INIT_DEV_INFO(adev->dev, "fence driver on ring %d use gpu addr
> 0x%016llx, "
> +		      "cpu addr 0x%p\n", ring->idx,
> +		      ring->fence_drv.gpu_addr, ring->fence_drv.cpu_addr);

This could be switched to DRM_DEBUG.

>  	return 0;
>  }
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c
> index f437008..7ce8105 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c
> @@ -370,8 +370,8 @@ int amdgpu_gart_init(struct amdgpu_device *adev)
>  	/* Compute table size */
>  	adev->gart.num_cpu_pages = adev->mc.gart_size / PAGE_SIZE;
>  	adev->gart.num_gpu_pages = adev->mc.gart_size /
> AMDGPU_GPU_PAGE_SIZE;
> -	DRM_INFO("GART: num cpu pages %u, num gpu pages %u\n",
> -		 adev->gart.num_cpu_pages, adev->gart.num_gpu_pages);
> +	INIT_INFO("GART: num cpu pages %u, num gpu pages %u\n",
> +		  adev->gart.num_cpu_pages, adev->gart.num_gpu_pages);

This is also useful.  I'd suggest either dropping this hunk or making it bare metal only.

> 
>  #ifdef CONFIG_DRM_AMDGPU_GART_DEBUGFS
>  	/* Allocate pages table */
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
> index 47c5ce9..c2d8255 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
> @@ -232,7 +232,7 @@ int amdgpu_irq_init(struct amdgpu_device *adev)
>  		int ret = pci_enable_msi(adev->pdev);
>  		if (!ret) {
>  			adev->irq.msi_enabled = true;
> -			dev_info(adev->dev, "amdgpu: using MSI.\n");
> +			INIT_DEV_INFO(adev->dev, "amdgpu: using
> MSI.\n");

Make this debug only.

>  		}
>  	}
> 
> @@ -262,7 +262,7 @@ int amdgpu_irq_init(struct amdgpu_device *adev)
>  		return r;
>  	}
> 
> -	DRM_INFO("amdgpu: irq initialized.\n");
> +	INIT_INFO("amdgpu: irq initialized.\n");

Use DRM_DEBUG.

>  	return 0;
>  }
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> index 8b4ed8a..d86805a 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> @@ -778,11 +778,11 @@ int amdgpu_bo_init(struct amdgpu_device *adev)
>  	/* Add an MTRR for the VRAM */
>  	adev->mc.vram_mtrr = arch_phys_wc_add(adev->mc.aper_base,
>  					      adev->mc.aper_size);
> -	DRM_INFO("Detected VRAM RAM=%lluM, BAR=%lluM\n",
> -		adev->mc.mc_vram_size >> 20,
> -		(unsigned long long)adev->mc.aper_size >> 20);
> -	DRM_INFO("RAM width %dbits %s\n",
> -		 adev->mc.vram_width, amdgpu_vram_names[adev-
> >mc.vram_type]);
> +	INIT_INFO("Detected VRAM RAM=%lluM, BAR=%lluM\n",
> +		  adev->mc.mc_vram_size >> 20,
> +		  (unsigned long long)adev->mc.aper_size >> 20);
> +	INIT_INFO("RAM width %dbits %s\n",
> +		  adev->mc.vram_width, amdgpu_vram_names[adev-
> >mc.vram_type]);

This is also useful.  I'd suggest either dropping this hunk or making it bare metal only.

>  	return amdgpu_ttm_init(adev);
>  }
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> index dcdfb8d..95a50c3 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> @@ -1348,8 +1348,8 @@ int amdgpu_ttm_init(struct amdgpu_device *adev)
>  				    NULL, NULL);
>  	if (r)
>  		return r;
> -	DRM_INFO("amdgpu: %uM of VRAM memory ready\n",
> -		 (unsigned) (adev->mc.real_vram_size / (1024 * 1024)));
> +	INIT_INFO("amdgpu: %uM of VRAM memory ready\n",
> +		  (unsigned) (adev->mc.real_vram_size / (1024 * 1024)));
> 

This is also useful.  I'd suggest either dropping this hunk or making it bare metal only.

>  	if (amdgpu_gtt_size == -1)
>  		gtt_size = max((AMDGPU_DEFAULT_GTT_SIZE_MB << 20),
> @@ -1361,8 +1361,8 @@ int amdgpu_ttm_init(struct amdgpu_device *adev)
>  		DRM_ERROR("Failed initializing GTT heap.\n");
>  		return r;
>  	}
> -	DRM_INFO("amdgpu: %uM of GTT memory ready.\n",
> -		 (unsigned)(gtt_size / (1024 * 1024)));
> +	INIT_INFO("amdgpu: %uM of GTT memory ready.\n",
> +		  (unsigned)(gtt_size / (1024 * 1024)));
> 

This is also useful.  I'd suggest either dropping this hunk or making it bare metal only.

>  	adev->gds.mem.total_size = adev->gds.mem.total_size <<
> AMDGPU_GDS_SHIFT;
>  	adev->gds.mem.gfx_partition_size = adev-
> >gds.mem.gfx_partition_size << AMDGPU_GDS_SHIFT;
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
> index b46280c..940f666 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
> @@ -954,7 +954,7 @@ int amdgpu_vce_ring_test_ring(struct amdgpu_ring
> *ring)
>  	}
> 
>  	if (i < timeout) {
> -		DRM_INFO("ring test on %d succeeded in %d usecs\n",
> +		INIT_INFO("ring test on %d succeeded in %d usecs\n",
>  			 ring->idx, i);

DRM_DEBUG.

>  	} else {
>  		DRM_ERROR("amdgpu: ring %d test failed\n",
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
> index 041e012..5ca7d4e 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
> @@ -261,7 +261,7 @@ int amdgpu_vcn_dec_ring_test_ring(struct
> amdgpu_ring *ring)
>  	}
> 
>  	if (i < adev->usec_timeout) {
> -		DRM_INFO("ring test on %d succeeded in %d usecs\n",
> +		INIT_INFO("ring test on %d succeeded in %d usecs\n",
>  			 ring->idx, i);

DRM_DEBUG.

>  	} else {
>  		DRM_ERROR("amdgpu: ring %d test failed (0x%08X)\n",
> @@ -500,7 +500,7 @@ int amdgpu_vcn_enc_ring_test_ring(struct
> amdgpu_ring *ring)
>  	}
> 
>  	if (i < adev->usec_timeout) {
> -		DRM_INFO("ring test on %d succeeded in %d usecs\n",
> +		INIT_INFO("ring test on %d succeeded in %d usecs\n",

DRM_DEBUG.

>  			 ring->idx, i);
>  	} else {
>  		DRM_ERROR("amdgpu: ring %d test failed\n",
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> index ef8b7a9..908779b 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> @@ -2594,9 +2594,9 @@ void amdgpu_vm_adjust_size(struct
> amdgpu_device *adev, uint64_t vm_size,
> 
>  	amdgpu_vm_set_fragment_size(adev, fragment_size_default);
> 
> -	DRM_INFO("vm size is %llu GB, block size is %u-bit, fragment size is
> %u-bit\n",
> -		adev->vm_manager.vm_size, adev-
> >vm_manager.block_size,
> -		adev->vm_manager.fragment_size);
> +	INIT_INFO("vm size is %llu GB, block size is %u-bit, fragment size is
> %u-bit\n",
> +		  adev->vm_manager.vm_size, adev-
> >vm_manager.block_size,
> +		  adev->vm_manager.fragment_size);

This is also useful.  I'd suggest either dropping this hunk or making it bare metal only.

>  }
> 
>  /**
> diff --git a/drivers/gpu/drm/amd/amdgpu/atom.c
> b/drivers/gpu/drm/amd/amdgpu/atom.c
> index 69500a8..bfb308e 100644
> --- a/drivers/gpu/drm/amd/amdgpu/atom.c
> +++ b/drivers/gpu/drm/amd/amdgpu/atom.c
> @@ -1344,7 +1344,7 @@ struct atom_context *amdgpu_atom_parse(struct
> card_info *card, void *bios)
> 
>  	str = CSTR(idx);
>  	if (*str != '\0') {
> -		pr_info("ATOM BIOS: %s\n", str);
> +		INIT_INFO("ATOM BIOS: %s\n", str);

This is also useful.  I'd suggest either dropping this hunk or making it bare metal only.

>  		strlcpy(ctx->vbios_version, str, sizeof(ctx->vbios_version));
>  	}
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/cik_sdma.c
> b/drivers/gpu/drm/amd/amdgpu/cik_sdma.c
> index 60cecd1..18268c9 100644
> --- a/drivers/gpu/drm/amd/amdgpu/cik_sdma.c
> +++ b/drivers/gpu/drm/amd/amdgpu/cik_sdma.c
> @@ -657,7 +657,7 @@ static int cik_sdma_ring_test_ring(struct amdgpu_ring
> *ring)
>  	}
> 
>  	if (i < adev->usec_timeout) {
> -		DRM_INFO("ring test on %d succeeded in %d usecs\n", ring-
> >idx, i);
> +		INIT_INFO("ring test on %d succeeded in %d usecs\n", ring-
> >idx, i);

DRM_DEBUG

>  	} else {
>  		DRM_ERROR("amdgpu: ring %d test failed (0x%08X)\n",
>  			  ring->idx, tmp);
> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v6_0.c
> b/drivers/gpu/drm/amd/amdgpu/gfx_v6_0.c
> index dbbe986..a4771a6 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v6_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v6_0.c
> @@ -1798,7 +1798,7 @@ static int gfx_v6_0_ring_test_ring(struct
> amdgpu_ring *ring)
>  		DRM_UDELAY(1);
>  	}
>  	if (i < adev->usec_timeout) {
> -		DRM_INFO("ring test on %d succeeded in %d usecs\n", ring-
> >idx, i);
> +		INIT_INFO("ring test on %d succeeded in %d usecs\n", ring-
> >idx, i);

DRM_DEBUG

>  	} else {
>  		DRM_ERROR("amdgpu: ring %d test failed
> (scratch(0x%04X)=0x%08X)\n",
>  			  ring->idx, scratch, tmp);
> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c
> b/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c
> index 0086876..7de5d68 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c
> @@ -2069,7 +2069,7 @@ static int gfx_v7_0_ring_test_ring(struct
> amdgpu_ring *ring)
>  		DRM_UDELAY(1);
>  	}
>  	if (i < adev->usec_timeout) {
> -		DRM_INFO("ring test on %d succeeded in %d usecs\n", ring-
> >idx, i);
> +		INIT_INFO("ring test on %d succeeded in %d usecs\n", ring-
> >idx, i);

DRM_DEBUG

>  	} else {
>  		DRM_ERROR("amdgpu: ring %d test failed
> (scratch(0x%04X)=0x%08X)\n",
>  			  ring->idx, scratch, tmp);
> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
> b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
> index b8002ac..fd38cb1 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
> @@ -804,7 +804,7 @@ static int gfx_v8_0_ring_test_ring(struct amdgpu_ring
> *ring)
>  		DRM_UDELAY(1);
>  	}
>  	if (i < adev->usec_timeout) {
> -		DRM_INFO("ring test on %d succeeded in %d usecs\n",
> +		INIT_INFO("ring test on %d succeeded in %d usecs\n",
>  			 ring->idx, i);

DRM_DEBUG.

>  	} else {
>  		DRM_ERROR("amdgpu: ring %d test failed
> (scratch(0x%04X)=0x%08X)\n",
> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> index 8738b13..50124ab 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> @@ -318,7 +318,7 @@ static int gfx_v9_0_ring_test_ring(struct amdgpu_ring
> *ring)
>  		DRM_UDELAY(1);
>  	}
>  	if (i < adev->usec_timeout) {
> -		DRM_INFO("ring test on %d succeeded in %d usecs\n",
> +		INIT_INFO("ring test on %d succeeded in %d usecs\n",
>  			 ring->idx, i);

DRM_DEBUG.

>  	} else {
>  		DRM_ERROR("amdgpu: ring %d test failed
> (scratch(0x%04X)=0x%08X)\n",
> diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c
> b/drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c
> index f4603a7..3b5f6bb 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c
> @@ -560,9 +560,9 @@ static int gmc_v6_0_gart_enable(struct
> amdgpu_device *adev)
>  		gmc_v6_0_set_fault_enable_default(adev, true);
> 
>  	gmc_v6_0_gart_flush_gpu_tlb(adev, 0);
> -	dev_info(adev->dev, "PCIE GART of %uM enabled (table at
> 0x%016llX).\n",
> -		 (unsigned)(adev->mc.gart_size >> 20),
> -		 (unsigned long long)adev->gart.table_addr);
> +	INIT_DEV_INFO(adev->dev, "PCIE GART of %uM enabled (table at
> 0x%016llX).\n",
> +		      (unsigned)(adev->mc.gart_size >> 20),
> +		      (unsigned long long)adev->gart.table_addr);

This is also useful.  I'd suggest either dropping this hunk or making it bare metal only.

>  	adev->gart.ready = true;
>  	return 0;
>  }
> diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c
> b/drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c
> index b0528ca..23bf504 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c
> @@ -674,9 +674,9 @@ static int gmc_v7_0_gart_enable(struct
> amdgpu_device *adev)
>  	}
> 
>  	gmc_v7_0_gart_flush_gpu_tlb(adev, 0);
> -	DRM_INFO("PCIE GART of %uM enabled (table at 0x%016llX).\n",
> -		 (unsigned)(adev->mc.gart_size >> 20),
> -		 (unsigned long long)adev->gart.table_addr);
> +	INIT_INFO("PCIE GART of %uM enabled (table at 0x%016llX).\n",
> +		  (unsigned)(adev->mc.gart_size >> 20),
> +		  (unsigned long long)adev->gart.table_addr);

This is also useful.  I'd suggest either dropping this hunk or making it bare metal only.

>  	adev->gart.ready = true;
>  	return 0;
>  }
> diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c
> b/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c
> index f368cfe..84ae01b 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c
> @@ -890,9 +890,9 @@ static int gmc_v8_0_gart_enable(struct
> amdgpu_device *adev)
>  		gmc_v8_0_set_fault_enable_default(adev, true);
> 
>  	gmc_v8_0_gart_flush_gpu_tlb(adev, 0);
> -	DRM_INFO("PCIE GART of %uM enabled (table at 0x%016llX).\n",
> -		 (unsigned)(adev->mc.gart_size >> 20),
> -		 (unsigned long long)adev->gart.table_addr);
> +	INIT_INFO("PCIE GART of %uM enabled (table at 0x%016llX).\n",
> +		  (unsigned)(adev->mc.gart_size >> 20),
> +		  (unsigned long long)adev->gart.table_addr);

This is also useful.  I'd suggest either dropping this hunk or making it bare metal only.

>  	adev->gart.ready = true;
>  	return 0;
>  }
> diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
> b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
> index 6216993..3dac031 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
> @@ -756,9 +756,9 @@ static int gmc_v9_0_gart_enable(struct
> amdgpu_device *adev)
>  	mmhub_v1_0_set_fault_enable_default(adev, value);
>  	gmc_v9_0_gart_flush_gpu_tlb(adev, 0);
> 
> -	DRM_INFO("PCIE GART of %uM enabled (table at 0x%016llX).\n",
> -		 (unsigned)(adev->mc.gart_size >> 20),
> -		 (unsigned long long)adev->gart.table_addr);
> +	INIT_INFO("PCIE GART of %uM enabled (table at 0x%016llX).\n",
> +		  (unsigned)(adev->mc.gart_size >> 20),
> +		  (unsigned long long)adev->gart.table_addr);

This is also useful.  I'd suggest either dropping this hunk or making it bare metal only.

>  	adev->gart.ready = true;
>  	return 0;
>  }
> diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v2_4.c
> b/drivers/gpu/drm/amd/amdgpu/sdma_v2_4.c
> index 67f375b..8fd1451 100644
> --- a/drivers/gpu/drm/amd/amdgpu/sdma_v2_4.c
> +++ b/drivers/gpu/drm/amd/amdgpu/sdma_v2_4.c
> @@ -633,7 +633,7 @@ static int sdma_v2_4_ring_test_ring(struct
> amdgpu_ring *ring)
>  	}
> 
>  	if (i < adev->usec_timeout) {
> -		DRM_INFO("ring test on %d succeeded in %d usecs\n", ring-
> >idx, i);
> +		INIT_INFO("ring test on %d succeeded in %d usecs\n", ring-
> >idx, i);

DRM_DEBUG.

>  	} else {
>  		DRM_ERROR("amdgpu: ring %d test failed (0x%08X)\n",
>  			  ring->idx, tmp);
> diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c
> b/drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c
> index 6d06f8e..540b42a 100644
> --- a/drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c
> @@ -893,7 +893,7 @@ static int sdma_v3_0_ring_test_ring(struct
> amdgpu_ring *ring)
>  	}
> 
>  	if (i < adev->usec_timeout) {
> -		DRM_INFO("ring test on %d succeeded in %d usecs\n", ring-
> >idx, i);
> +		INIT_INFO("ring test on %d succeeded in %d usecs\n", ring-
> >idx, i);

DRM_DEBUG.

>  	} else {
>  		DRM_ERROR("amdgpu: ring %d test failed (0x%08X)\n",
>  			  ring->idx, tmp);
> diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
> b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
> index 46009db..bf8e06a 100644
> --- a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
> @@ -919,7 +919,7 @@ static int sdma_v4_0_ring_test_ring(struct
> amdgpu_ring *ring)
>  	}
> 
>  	if (i < adev->usec_timeout) {
> -		DRM_INFO("ring test on %d succeeded in %d usecs\n", ring-
> >idx, i);
> +		INIT_INFO("ring test on %d succeeded in %d usecs\n", ring-
> >idx, i);

DRM_DEBUG.

>  	} else {
>  		DRM_ERROR("amdgpu: ring %d test failed (0x%08X)\n",
>  			  ring->idx, tmp);
> diff --git a/drivers/gpu/drm/amd/amdgpu/si_dma.c
> b/drivers/gpu/drm/amd/amdgpu/si_dma.c
> index 3fa2fbf..c356338 100644
> --- a/drivers/gpu/drm/amd/amdgpu/si_dma.c
> +++ b/drivers/gpu/drm/amd/amdgpu/si_dma.c
> @@ -252,7 +252,7 @@ static int si_dma_ring_test_ring(struct amdgpu_ring
> *ring)
>  	}
> 
>  	if (i < adev->usec_timeout) {
> -		DRM_INFO("ring test on %d succeeded in %d usecs\n", ring-
> >idx, i);
> +		INIT_INFO("ring test on %d succeeded in %d usecs\n", ring-
> >idx, i);

DRM_DEBUG.

>  	} else {
>  		DRM_ERROR("amdgpu: ring %d test failed (0x%08X)\n",
>  			  ring->idx, tmp);
> diff --git a/drivers/gpu/drm/amd/amdgpu/uvd_v4_2.c
> b/drivers/gpu/drm/amd/amdgpu/uvd_v4_2.c
> index 8ab0f78..c31f4d7 100644
> --- a/drivers/gpu/drm/amd/amdgpu/uvd_v4_2.c
> +++ b/drivers/gpu/drm/amd/amdgpu/uvd_v4_2.c
> @@ -521,7 +521,7 @@ static int uvd_v4_2_ring_test_ring(struct
> amdgpu_ring *ring)
>  	}
> 
>  	if (i < adev->usec_timeout) {
> -		DRM_INFO("ring test on %d succeeded in %d usecs\n",
> +		INIT_INFO("ring test on %d succeeded in %d usecs\n",
>  			 ring->idx, i);

DRM_DEBUG.

>  	} else {
>  		DRM_ERROR("amdgpu: ring %d test failed (0x%08X)\n",
> diff --git a/drivers/gpu/drm/amd/amdgpu/uvd_v5_0.c
> b/drivers/gpu/drm/amd/amdgpu/uvd_v5_0.c
> index bb6d46e..a62804b 100644
> --- a/drivers/gpu/drm/amd/amdgpu/uvd_v5_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/uvd_v5_0.c
> @@ -536,7 +536,7 @@ static int uvd_v5_0_ring_test_ring(struct
> amdgpu_ring *ring)
>  	}
> 
>  	if (i < adev->usec_timeout) {
> -		DRM_INFO("ring test on %d succeeded in %d usecs\n",
> +		INIT_INFO("ring test on %d succeeded in %d usecs\n",
>  			 ring->idx, i);

DRM_DEBUG.

>  	} else {
>  		DRM_ERROR("amdgpu: ring %d test failed (0x%08X)\n",
> diff --git a/drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c
> b/drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c
> index 71299c6..1f8093a 100644
> --- a/drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c
> @@ -184,7 +184,7 @@ static int uvd_v6_0_enc_ring_test_ring(struct
> amdgpu_ring *ring)
>  	}
> 
>  	if (i < adev->usec_timeout) {
> -		DRM_INFO("ring test on %d succeeded in %d usecs\n",
> +		INIT_INFO("ring test on %d succeeded in %d usecs\n",
>  			 ring->idx, i);

DRM_DEBUG.

>  	} else {
>  		DRM_ERROR("amdgpu: ring %d test failed\n",
> @@ -1010,7 +1010,7 @@ static int uvd_v6_0_ring_test_ring(struct
> amdgpu_ring *ring)
>  	}
> 
>  	if (i < adev->usec_timeout) {
> -		DRM_INFO("ring test on %d succeeded in %d usecs\n",
> +		INIT_INFO("ring test on %d succeeded in %d usecs\n",
>  			 ring->idx, i);

DRM_DEBUG.

>  	} else {
>  		DRM_ERROR("amdgpu: ring %d test failed (0x%08X)\n",
> diff --git a/drivers/gpu/drm/amd/amdgpu/uvd_v7_0.c
> b/drivers/gpu/drm/amd/amdgpu/uvd_v7_0.c
> index b8ed8fa..3d73542 100644
> --- a/drivers/gpu/drm/amd/amdgpu/uvd_v7_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/uvd_v7_0.c
> @@ -184,7 +184,7 @@ static int uvd_v7_0_enc_ring_test_ring(struct
> amdgpu_ring *ring)
>  	}
> 
>  	if (i < adev->usec_timeout) {
> -		DRM_INFO("ring test on %d succeeded in %d usecs\n",
> +		INIT_INFO("ring test on %d succeeded in %d usecs\n",
>  			 ring->idx, i);

DRM_DEBUG.

>  	} else {
>  		DRM_ERROR("amdgpu: ring %d test failed\n",
> @@ -1198,7 +1198,7 @@ static int uvd_v7_0_ring_test_ring(struct
> amdgpu_ring *ring)
>  	}
> 
>  	if (i < adev->usec_timeout) {
> -		DRM_INFO("ring test on %d succeeded in %d usecs\n",
> +		INIT_INFO("ring test on %d succeeded in %d usecs\n",
>  			 ring->idx, i);

DRM_DEBUG.

>  	} else {
>  		DRM_ERROR("amdgpu: ring %d test failed (0x%08X)\n",
> diff --git a/drivers/gpu/drm/amd/powerplay/amd_powerplay.c
> b/drivers/gpu/drm/amd/powerplay/amd_powerplay.c
> index 6b0cf8e..4de4d63 100644
> --- a/drivers/gpu/drm/amd/powerplay/amd_powerplay.c
> +++ b/drivers/gpu/drm/amd/powerplay/amd_powerplay.c
> @@ -29,6 +29,7 @@
>  #include "amd_powerplay.h"
>  #include "pp_instance.h"
>  #include "power_state.h"
> +#include "amdgpu.h"
> 
>  #define PP_DPM_DISABLED 0xCCCC
> 
> @@ -119,7 +120,7 @@ static int pp_sw_init(void *handle)
> 
>  		ret = hwmgr->smumgr_funcs->smu_init(hwmgr);
> 
> -		pr_info("amdgpu: powerplay sw initialized\n");
> +		INIT_INFO("amdgpu: powerplay sw initialized\n");

Debug.

>  	}
>  	return ret;
>  }
> --
> 2.9.5
> 
> _______________________________________________
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 5/7] drm/amdgpu: don't disable MSI for GPU virtual function
       [not found]         ` <BLUPR12MB04494B5ECC666B264207A24A84460-7LeqcoF/hwpTIQvHjXdJlwdYzm3356FpvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
@ 2017-10-24  1:31           ` Ding, Pixel
       [not found]             ` <2237B231-2EE3-4DDE-9C55-71FF3F589D40-5C7GfCeVMHo@public.gmane.org>
  0 siblings, 1 reply; 28+ messages in thread
From: Ding, Pixel @ 2017-10-24  1:31 UTC (permalink / raw)
  To: Liu, Monk, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, Xiao, Jack
  Cc: Sun, Gary, Li, Bingley

Tested with 5248e3d9; however, the issue is still reproduced in the reinit case.

+Jack,
Any comment on bypassing MSI enable/disable for reinit?
— 
Sincerely Yours,
Pixel







On 23/10/2017, 6:57 PM, "Liu, Monk" <Monk.Liu@amd.com> wrote:

>Please check commit "5248e3d9"; your issue should already be fixed by that patch. Please verify.
>
>-----Original Message-----
>From: amd-gfx [mailto:amd-gfx-bounces@lists.freedesktop.org] On Behalf Of Pixel Ding
>Sent: 2017年10月23日 18:04
>To: amd-gfx@lists.freedesktop.org
>Cc: Sun, Gary <Gary.Sun@amd.com>; Ding, Pixel <Pixel.Ding@amd.com>; Li, Bingley <Bingley.Li@amd.com>
>Subject: [PATCH 5/7] drm/amdgpu: don't disable MSI for GPU virtual function
>
>From: pding <Pixel.Ding@amd.com>
>
>After calling pci_disable_msi() and pci_enable_msi(), VF can't receive interrupt anymore. This may introduce problems in module reloading or retrying init.
>
>Signed-off-by: pding <Pixel.Ding@amd.com>
>---
> drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c | 5 ++---
> 1 file changed, 2 insertions(+), 3 deletions(-)
>
>diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
>index c2d8255..a3314b5 100644
>--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
>+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
>@@ -229,8 +229,7 @@ int amdgpu_irq_init(struct amdgpu_device *adev)
> 	adev->irq.msi_enabled = false;
> 
> 	if (amdgpu_msi_ok(adev)) {
>-		int ret = pci_enable_msi(adev->pdev);
>-		if (!ret) {
>+		if (adev->pdev->msi_enabled || !pci_enable_msi(adev->pdev)) {
> 			adev->irq.msi_enabled = true;
> 			INIT_DEV_INFO(adev->dev, "amdgpu: using MSI.\n");
> 		}
>@@ -280,7 +279,7 @@ void amdgpu_irq_fini(struct amdgpu_device *adev)
> 	if (adev->irq.installed) {
> 		drm_irq_uninstall(adev->ddev);
> 		adev->irq.installed = false;
>-		if (adev->irq.msi_enabled)
>+		if (adev->irq.msi_enabled && !amdgpu_sriov_vf(adev))
> 			pci_disable_msi(adev->pdev);
> 		flush_work(&adev->hotplug_work);
> 		cancel_work_sync(&adev->reset_work);
>--
>2.9.5
>

* Re: [PATCH 6/7] drm/amdgpu/virt: add wait_reset virt ops
       [not found]         ` <BLUPR12MB0449ECFA70529779A41D856D84460-7LeqcoF/hwpTIQvHjXdJlwdYzm3356FpvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
@ 2017-10-24  1:33           ` Ding, Pixel
       [not found]             ` <CC2BFABD-8E5A-4E7F-B435-3CA25B1D46C4-5C7GfCeVMHo@public.gmane.org>
  0 siblings, 1 reply; 28+ messages in thread
From: Ding, Pixel @ 2017-10-24  1:33 UTC (permalink / raw)
  To: Liu, Monk, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
  Cc: Sun, Gary, Li, Bingley

This is for retry init.

If the driver fails before late_init, the IRQ handler is not yet registered, so the FLR completion interrupt can't be received. We need a way to know whether the FLR is done at this point. I think we may also be able to leverage this interface to handle some special cases in the future. Any concerns about it?
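For reference, the dispatch this patch adds can be exercised in isolation. The following is a standalone sketch mirroring the quoted amdgpu_virt_wait_reset() (the structs are trimmed down and stub_wait_reset stands in for xgpu_vi_wait_reset_cmpl; both are hypothetical test scaffolding, not driver code):

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Trimmed stand-ins for the amdgpu structures touched by the patch. */
struct amdgpu_device;

struct amdgpu_virt_ops {
	int (*wait_reset)(struct amdgpu_device *adev);
};

struct amdgpu_virt {
	const struct amdgpu_virt_ops *ops;
};

struct amdgpu_device {
	struct amdgpu_virt virt;
};

/* Same dispatch as the patch: -EINVAL when no op is wired up
 * (as on AI, where .wait_reset = NULL), else defer to the backend. */
static int amdgpu_virt_wait_reset(struct amdgpu_device *adev)
{
	struct amdgpu_virt *virt = &adev->virt;

	if (!virt->ops || !virt->ops->wait_reset)
		return -EINVAL;

	return virt->ops->wait_reset(adev);
}

/* Stub backend: pretend IDH_FLR_NOTIFICATION_CMPL already arrived. */
static int stub_wait_reset(struct amdgpu_device *adev)
{
	(void)adev;
	return 0;
}
```

This makes the retry-init contract explicit: chips without a wait_reset backend report -EINVAL rather than pretending the FLR completed.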
— 
Sincerely Yours,
Pixel








On 23/10/2017, 7:01 PM, "Liu, Monk" <Monk.Liu@amd.com> wrote:

>I don't see this as a necessary patch; the driver already has an implementation to check whether VF FLR is completed or not, see "xgpu_ai/vi_mailbox_flr_work()".
>
>The driver won't do a gpu reset until this function receives the NOTIFICATION_CMPL message.
>
>Do you have any particular reason to add this wait_reset? If so, please send out the patch that uses this interface.
>
>BR Monk
>
>-----Original Message-----
>From: amd-gfx [mailto:amd-gfx-bounces@lists.freedesktop.org] On Behalf Of Pixel Ding
>Sent: 2017年10月23日 18:04
>To: amd-gfx@lists.freedesktop.org
>Cc: Sun, Gary <Gary.Sun@amd.com>; Ding, Pixel <Pixel.Ding@amd.com>; Li, Bingley <Bingley.Li@amd.com>
>Subject: [PATCH 6/7] drm/amdgpu/virt: add wait_reset virt ops
>
>From: pding <Pixel.Ding@amd.com>
>
>Driver can use this interface to check if there's a function level reset done in hypervisor.
>
>Signed-off-by: pding <Pixel.Ding@amd.com>
>---
> drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c | 16 ++++++++++++++++  drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h |  2 ++
> drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c    |  1 +
> drivers/gpu/drm/amd/amdgpu/mxgpu_vi.c    |  6 ++++++
> 4 files changed, 25 insertions(+)
>
>diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
>index 33dac7e..6a4a901 100644
>--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
>+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
>@@ -231,6 +231,22 @@ int amdgpu_virt_reset_gpu(struct amdgpu_device *adev)  }
> 
> /**
>+ * amdgpu_virt_wait_reset() - wait for reset gpu completed
>+ * @amdgpu:	amdgpu device.
>+ * Wait for GPU reset completed.
>+ * Return: Zero if reset success, otherwise will return error.
>+ */
>+int amdgpu_virt_wait_reset(struct amdgpu_device *adev) {
>+	struct amdgpu_virt *virt = &adev->virt;
>+
>+	if (!virt->ops || !virt->ops->wait_reset)
>+		return -EINVAL;
>+
>+	return virt->ops->wait_reset(adev);
>+}
>+
>+/**
>  * amdgpu_virt_alloc_mm_table() - alloc memory for mm table
>  * @amdgpu:	amdgpu device.
>  * MM table is used by UVD and VCE for its initialization diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h
>index 81efb9d..d149aca 100644
>--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h
>+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h
>@@ -55,6 +55,7 @@ struct amdgpu_virt_ops {
> 	int (*req_full_gpu)(struct amdgpu_device *adev, bool init);
> 	int (*rel_full_gpu)(struct amdgpu_device *adev, bool init);
> 	int (*reset_gpu)(struct amdgpu_device *adev);
>+	int (*wait_reset)(struct amdgpu_device *adev);
> 	void (*trans_msg)(struct amdgpu_device *adev, u32 req, u32 data1, u32 data2, u32 data3);  };
> 
>@@ -286,6 +287,7 @@ void amdgpu_virt_kiq_wreg(struct amdgpu_device *adev, uint32_t reg, uint32_t v);  int amdgpu_virt_request_full_gpu(struct amdgpu_device *adev, bool init);  int amdgpu_virt_release_full_gpu(struct amdgpu_device *adev, bool init);  int amdgpu_virt_reset_gpu(struct amdgpu_device *adev);
>+int amdgpu_virt_wait_reset(struct amdgpu_device *adev);
> int amdgpu_sriov_gpu_reset(struct amdgpu_device *adev, struct amdgpu_job *job);  int amdgpu_virt_alloc_mm_table(struct amdgpu_device *adev);  void amdgpu_virt_free_mm_table(struct amdgpu_device *adev); diff --git a/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c b/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c
>index b4906d2..f91aab3 100644
>--- a/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c
>+++ b/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c
>@@ -353,5 +353,6 @@ const struct amdgpu_virt_ops xgpu_ai_virt_ops = {
> 	.req_full_gpu	= xgpu_ai_request_full_gpu_access,
> 	.rel_full_gpu	= xgpu_ai_release_full_gpu_access,
> 	.reset_gpu = xgpu_ai_request_reset,
>+	.wait_reset = NULL,
> 	.trans_msg = xgpu_ai_mailbox_trans_msg,  }; diff --git a/drivers/gpu/drm/amd/amdgpu/mxgpu_vi.c b/drivers/gpu/drm/amd/amdgpu/mxgpu_vi.c
>index c25a831..27b03c7 100644
>--- a/drivers/gpu/drm/amd/amdgpu/mxgpu_vi.c
>+++ b/drivers/gpu/drm/amd/amdgpu/mxgpu_vi.c
>@@ -458,6 +458,11 @@ static int xgpu_vi_request_reset(struct amdgpu_device *adev)
> 	return xgpu_vi_send_access_requests(adev, IDH_REQ_GPU_RESET_ACCESS);  }
> 
>+static int xgpu_vi_wait_reset_cmpl(struct amdgpu_device *adev) {
>+	return xgpu_vi_poll_msg(adev, IDH_FLR_NOTIFICATION_CMPL); }
>+
> static int xgpu_vi_request_full_gpu_access(struct amdgpu_device *adev,
> 					   bool init)
> {
>@@ -613,5 +618,6 @@ const struct amdgpu_virt_ops xgpu_vi_virt_ops = {
> 	.req_full_gpu		= xgpu_vi_request_full_gpu_access,
> 	.rel_full_gpu		= xgpu_vi_release_full_gpu_access,
> 	.reset_gpu		= xgpu_vi_request_reset,
>+	.wait_reset             = xgpu_vi_wait_reset_cmpl,
> 	.trans_msg		= NULL, /* Does not need to trans VF errors to host. */
> };
>--
>2.9.5
>

* Re: [PATCH 4/7] drm/amdgpu/virt: add function to check MMIO accessing
       [not found]         ` <BN6PR12MB165261DE3E8A751BD8250C77F7460-/b2+HYfkarQqUD6E6FAiowdYzm3356FpvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
@ 2017-10-24  1:42           ` Ding, Pixel
       [not found]             ` <F836DB5B-E4E7-4593-A838-FB852B50510B-5C7GfCeVMHo@public.gmane.org>
  0 siblings, 1 reply; 28+ messages in thread
From: Ding, Pixel @ 2017-10-24  1:42 UTC (permalink / raw)
  To: Deucher, Alexander, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
  Cc: Sun, Gary, Li, Bingley

It works with the current MMIO blocking policy, where all pages are blocked except the one containing the mailbox registers. But it's admittedly not a good approach; as the comment says, we should add a host/guest interaction for the MMIO blocking check in the future.

For now I can add more comments here; is that OK?
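The heuristic itself is small enough to show in isolation. This sketch mirrors the quoted amdgpu_virt_mmio_blocked(): when the host fences a VF's register BAR, MMIO reads come back as all ones, just like reading a powered-off PCI device. The rreg32_no_kiq() stub and the mmio_blocked flag are hypothetical test scaffolding; 0xc040 is the probe register from the patch:

```c
#include <stdbool.h>
#include <stdint.h>

/* Simulated host-side fencing state (not real hardware access). */
static uint32_t mmio_blocked;

static uint32_t rreg32_no_kiq(uint32_t reg)
{
	(void)reg;
	/* blocked BAR reads return 0xffffffff on PCI */
	return mmio_blocked ? 0xffffffffu : 0x0u;
}

static bool amdgpu_virt_mmio_blocked_sketch(void)
{
	/* Same heuristic as the patch: an all-ones readback on a register
	 * that can never legitimately be all ones means MMIO is fenced. */
	return rreg32_no_kiq(0xc040) == 0xffffffff;
}
```

As the commit message already concedes, this is a stopgap until a real hypervisor/guest handshake exists.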
— 
Sincerely Yours,
Pixel







On 24/10/2017, 12:33 AM, "Deucher, Alexander" <Alexander.Deucher@amd.com> wrote:

>> -----Original Message-----
>> From: amd-gfx [mailto:amd-gfx-bounces@lists.freedesktop.org] On Behalf
>> Of Pixel Ding
>> Sent: Monday, October 23, 2017 6:03 AM
>> To: amd-gfx@lists.freedesktop.org
>> Cc: Sun, Gary; Ding, Pixel; Li, Bingley
>> Subject: [PATCH 4/7] drm/amdgpu/virt: add function to check MMIO
>> accessing
>> 
>> From: pding <Pixel.Ding@amd.com>
>> 
>> MMIO space can be blocked on virtualised device. Add this function
>> to check if MMIO is blocked or not.
>> 
>> Todo: need a reliable method such like communation with hypervisor.
>> 
>> Signed-off-by: pding <Pixel.Ding@amd.com>
>> ---
>>  drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c | 5 +++++
>>  drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h | 1 +
>>  2 files changed, 6 insertions(+)
>> 
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
>> index e97f80f..33dac7e 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
>> @@ -24,6 +24,11 @@
>>  #include "amdgpu.h"
>>  #define MAX_KIQ_REG_WAIT	100000000 /* in usecs */
>> 
>> +bool amdgpu_virt_mmio_blocked(struct amdgpu_device *adev)
>> +{
>> +	return RREG32_NO_KIQ(0xc040) == 0xffffffff;
>> +}
>
>Is this safe?  Won't accessing non-instanced registers cause a problem?  Probably also worth commenting what register this is, and adding a note for future asics in case the register map changes.
>
>Alex
>
>> +
>>  int amdgpu_allocate_static_csa(struct amdgpu_device *adev)
>>  {
>>  	int r;
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h
>> index b89d37f..81efb9d 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h
>> @@ -276,6 +276,7 @@ static inline bool is_virtual_machine(void)
>>  }
>> 
>>  struct amdgpu_vm;
>> +bool amdgpu_virt_mmio_blocked(struct amdgpu_device *adev);
>>  int amdgpu_allocate_static_csa(struct amdgpu_device *adev);
>>  int amdgpu_map_static_csa(struct amdgpu_device *adev, struct
>> amdgpu_vm *vm,
>>  			  struct amdgpu_bo_va **bo_va);
>> --
>> 2.9.5
>> 

* Re: [PATCH 5/7] drm/amdgpu: don't disable MSI for GPU virtual function
       [not found]             ` <2237B231-2EE3-4DDE-9C55-71FF3F589D40-5C7GfCeVMHo@public.gmane.org>
@ 2017-10-24  2:04               ` Ding, Pixel
  2017-10-24  3:03               ` Liu, Monk
  1 sibling, 0 replies; 28+ messages in thread
From: Ding, Pixel @ 2017-10-24  2:04 UTC (permalink / raw)
  To: Liu, Monk, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, Xiao, Jack
  Cc: Sun, Gary, Li, Bingley

Sorry for the misunderstanding. I will test later with a similar fix instead of 5248e3d9 directly.

—
Sincerely Yours,
Pixel







On 24/10/2017, 9:31 AM, "Ding, Pixel" <Pixel.Ding@amd.com> wrote:

>Tested with 5248e3d9, however issue is still reproduced in reinit case.
>
>+Jack,
>To bypass MSI enable/disable for reinit, any comment?
>— 
>Sincerely Yours,
>Pixel
>
>
>
>
>
>
>
>On 23/10/2017, 6:57 PM, "Liu, Monk" <Monk.Liu@amd.com> wrote:
>
>>Please check commit "5248e3d9", your issue should already be fixed by that patch, please verify
>>
>>-----Original Message-----
>>From: amd-gfx [mailto:amd-gfx-bounces@lists.freedesktop.org] On Behalf Of Pixel Ding
>>Sent: 2017年10月23日 18:04
>>To: amd-gfx@lists.freedesktop.org
>>Cc: Sun, Gary <Gary.Sun@amd.com>; Ding, Pixel <Pixel.Ding@amd.com>; Li, Bingley <Bingley.Li@amd.com>
>>Subject: [PATCH 5/7] drm/amdgpu: don't disable MSI for GPU virtual function
>>
>>From: pding <Pixel.Ding@amd.com>
>>
>>After calling pci_disable_msi() and pci_enable_msi(), VF can't receive interrupt anymore. This may introduce problems in module reloading or retrying init.
>>
>>Signed-off-by: pding <Pixel.Ding@amd.com>
>>---
>> drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c | 5 ++---
>> 1 file changed, 2 insertions(+), 3 deletions(-)
>>
>>diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
>>index c2d8255..a3314b5 100644
>>--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
>>+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
>>@@ -229,8 +229,7 @@ int amdgpu_irq_init(struct amdgpu_device *adev)
>> 	adev->irq.msi_enabled = false;
>> 
>> 	if (amdgpu_msi_ok(adev)) {
>>-		int ret = pci_enable_msi(adev->pdev);
>>-		if (!ret) {
>>+		if (adev->pdev->msi_enabled || !pci_enable_msi(adev->pdev)) {
>> 			adev->irq.msi_enabled = true;
>> 			INIT_DEV_INFO(adev->dev, "amdgpu: using MSI.\n");
>> 		}
>>@@ -280,7 +279,7 @@ void amdgpu_irq_fini(struct amdgpu_device *adev)
>> 	if (adev->irq.installed) {
>> 		drm_irq_uninstall(adev->ddev);
>> 		adev->irq.installed = false;
>>-		if (adev->irq.msi_enabled)
>>+		if (adev->irq.msi_enabled && !amdgpu_sriov_vf(adev))
>> 			pci_disable_msi(adev->pdev);
>> 		flush_work(&adev->hotplug_work);
>> 		cancel_work_sync(&adev->reset_work);
>>--
>>2.9.5
>>

* RE: [PATCH 6/7] drm/amdgpu/virt: add wait_reset virt ops
       [not found]             ` <CC2BFABD-8E5A-4E7F-B435-3CA25B1D46C4-5C7GfCeVMHo@public.gmane.org>
@ 2017-10-24  3:00               ` Liu, Monk
  0 siblings, 0 replies; 28+ messages in thread
From: Liu, Monk @ 2017-10-24  3:00 UTC (permalink / raw)
  To: Ding, Pixel, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
  Cc: Sun, Gary, Li, Bingley

Okay, I missed that part.

Feel free to add my RB.

-----Original Message-----
From: Ding, Pixel 
Sent: 2017年10月24日 9:33
To: Liu, Monk <Monk.Liu@amd.com>; amd-gfx@lists.freedesktop.org
Cc: Sun, Gary <Gary.Sun@amd.com>; Li, Bingley <Bingley.Li@amd.com>
Subject: Re: [PATCH 6/7] drm/amdgpu/virt: add wait_reset virt ops

This is for retry init.

If the driver fails before late_init, the IRQ handler is not registered then. We need to know if the FLR is done at this point. I think maybe we also can leverage this interface to handle some special cases in future. Any concern about it?
— 
Sincerely Yours,
Pixel








On 23/10/2017, 7:01 PM, "Liu, Monk" <Monk.Liu@amd.com> wrote:

>I don't see this as a necessary patch; the driver already has an implementation to check whether VF FLR is completed or not, see "xgpu_ai/vi_mailbox_flr_work()".
>
>The driver won't do a gpu reset until this function receives the NOTIFICATION_CMPL message.
>
>Do you have any particular reason to add this wait_reset? If so, please send out the patch that uses this interface.
>
>BR Monk
>
>-----Original Message-----
>From: amd-gfx [mailto:amd-gfx-bounces@lists.freedesktop.org] On Behalf Of Pixel Ding
>Sent: 2017年10月23日 18:04
>To: amd-gfx@lists.freedesktop.org
>Cc: Sun, Gary <Gary.Sun@amd.com>; Ding, Pixel <Pixel.Ding@amd.com>; Li, Bingley <Bingley.Li@amd.com>
>Subject: [PATCH 6/7] drm/amdgpu/virt: add wait_reset virt ops
>
>From: pding <Pixel.Ding@amd.com>
>
>Driver can use this interface to check if there's a function level reset done in hypervisor.
>
>Signed-off-by: pding <Pixel.Ding@amd.com>
>---
> drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c | 16 ++++++++++++++++  drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h |  2 ++
> drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c    |  1 +
> drivers/gpu/drm/amd/amdgpu/mxgpu_vi.c    |  6 ++++++
> 4 files changed, 25 insertions(+)
>
>diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
>index 33dac7e..6a4a901 100644
>--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
>+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
>@@ -231,6 +231,22 @@ int amdgpu_virt_reset_gpu(struct amdgpu_device *adev)  }
> 
> /**
>+ * amdgpu_virt_wait_reset() - wait for reset gpu completed
>+ * @amdgpu:	amdgpu device.
>+ * Wait for GPU reset completed.
>+ * Return: Zero if reset success, otherwise will return error.
>+ */
>+int amdgpu_virt_wait_reset(struct amdgpu_device *adev) {
>+	struct amdgpu_virt *virt = &adev->virt;
>+
>+	if (!virt->ops || !virt->ops->wait_reset)
>+		return -EINVAL;
>+
>+	return virt->ops->wait_reset(adev);
>+}
>+
>+/**
>  * amdgpu_virt_alloc_mm_table() - alloc memory for mm table
>  * @amdgpu:	amdgpu device.
>  * MM table is used by UVD and VCE for its initialization diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h
>index 81efb9d..d149aca 100644
>--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h
>+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h
>@@ -55,6 +55,7 @@ struct amdgpu_virt_ops {
> 	int (*req_full_gpu)(struct amdgpu_device *adev, bool init);
> 	int (*rel_full_gpu)(struct amdgpu_device *adev, bool init);
> 	int (*reset_gpu)(struct amdgpu_device *adev);
>+	int (*wait_reset)(struct amdgpu_device *adev);
> 	void (*trans_msg)(struct amdgpu_device *adev, u32 req, u32 data1, u32 data2, u32 data3);  };
> 
>@@ -286,6 +287,7 @@ void amdgpu_virt_kiq_wreg(struct amdgpu_device *adev, uint32_t reg, uint32_t v);  int amdgpu_virt_request_full_gpu(struct amdgpu_device *adev, bool init);  int amdgpu_virt_release_full_gpu(struct amdgpu_device *adev, bool init);  int amdgpu_virt_reset_gpu(struct amdgpu_device *adev);
>+int amdgpu_virt_wait_reset(struct amdgpu_device *adev);
> int amdgpu_sriov_gpu_reset(struct amdgpu_device *adev, struct amdgpu_job *job);  int amdgpu_virt_alloc_mm_table(struct amdgpu_device *adev);  void amdgpu_virt_free_mm_table(struct amdgpu_device *adev); diff --git a/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c b/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c
>index b4906d2..f91aab3 100644
>--- a/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c
>+++ b/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c
>@@ -353,5 +353,6 @@ const struct amdgpu_virt_ops xgpu_ai_virt_ops = {
> 	.req_full_gpu	= xgpu_ai_request_full_gpu_access,
> 	.rel_full_gpu	= xgpu_ai_release_full_gpu_access,
> 	.reset_gpu = xgpu_ai_request_reset,
>+	.wait_reset = NULL,
> 	.trans_msg = xgpu_ai_mailbox_trans_msg,  }; diff --git a/drivers/gpu/drm/amd/amdgpu/mxgpu_vi.c b/drivers/gpu/drm/amd/amdgpu/mxgpu_vi.c
>index c25a831..27b03c7 100644
>--- a/drivers/gpu/drm/amd/amdgpu/mxgpu_vi.c
>+++ b/drivers/gpu/drm/amd/amdgpu/mxgpu_vi.c
>@@ -458,6 +458,11 @@ static int xgpu_vi_request_reset(struct amdgpu_device *adev)
> 	return xgpu_vi_send_access_requests(adev, IDH_REQ_GPU_RESET_ACCESS);  }
> 
>+static int xgpu_vi_wait_reset_cmpl(struct amdgpu_device *adev) {
>+	return xgpu_vi_poll_msg(adev, IDH_FLR_NOTIFICATION_CMPL); }
>+
> static int xgpu_vi_request_full_gpu_access(struct amdgpu_device *adev,
> 					   bool init)
> {
>@@ -613,5 +618,6 @@ const struct amdgpu_virt_ops xgpu_vi_virt_ops = {
> 	.req_full_gpu		= xgpu_vi_request_full_gpu_access,
> 	.rel_full_gpu		= xgpu_vi_release_full_gpu_access,
> 	.reset_gpu		= xgpu_vi_request_reset,
>+	.wait_reset             = xgpu_vi_wait_reset_cmpl,
> 	.trans_msg		= NULL, /* Does not need to trans VF errors to host. */
> };
>--
>2.9.5
>

* RE: [PATCH 5/7] drm/amdgpu: don't disable MSI for GPU virtual function
       [not found]             ` <2237B231-2EE3-4DDE-9C55-71FF3F589D40-5C7GfCeVMHo@public.gmane.org>
  2017-10-24  2:04               ` Ding, Pixel
@ 2017-10-24  3:03               ` Liu, Monk
       [not found]                 ` <SN1PR12MB04627B3E054A9BF1D3408CAA84470-z7L1TMIYDg7VaWpRXmIMygdYzm3356FpvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
  1 sibling, 1 reply; 28+ messages in thread
From: Liu, Monk @ 2017-10-24  3:03 UTC (permalink / raw)
  To: Ding, Pixel, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, Xiao, Jack
  Cc: Sun, Gary, Li, Bingley

Can you try calling pci_disable_device() once you find that init has failed, and make sure it is called before re-init?


-----Original Message-----
From: Ding, Pixel 
Sent: 2017年10月24日 9:31
To: Liu, Monk <Monk.Liu@amd.com>; amd-gfx@lists.freedesktop.org; Xiao, Jack <Jack.Xiao@amd.com>
Cc: Sun, Gary <Gary.Sun@amd.com>; Li, Bingley <Bingley.Li@amd.com>
Subject: Re: [PATCH 5/7] drm/amdgpu: don't disable MSI for GPU virtual function

Tested with 5248e3d9, however issue is still reproduced in reinit case.

+Jack,
To bypass MSI enable/disable for reinit, any comment?
— 
Sincerely Yours,
Pixel







On 23/10/2017, 6:57 PM, "Liu, Monk" <Monk.Liu@amd.com> wrote:

>Please check commit "5248e3d9", your issue should already be fixed by that patch, please verify
>
>-----Original Message-----
>From: amd-gfx [mailto:amd-gfx-bounces@lists.freedesktop.org] On Behalf Of Pixel Ding
>Sent: 2017年10月23日 18:04
>To: amd-gfx@lists.freedesktop.org
>Cc: Sun, Gary <Gary.Sun@amd.com>; Ding, Pixel <Pixel.Ding@amd.com>; Li, Bingley <Bingley.Li@amd.com>
>Subject: [PATCH 5/7] drm/amdgpu: don't disable MSI for GPU virtual function
>
>From: pding <Pixel.Ding@amd.com>
>
>After calling pci_disable_msi() and pci_enable_msi(), VF can't receive interrupt anymore. This may introduce problems in module reloading or retrying init.
>
>Signed-off-by: pding <Pixel.Ding@amd.com>
>---
> drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c | 5 ++---
> 1 file changed, 2 insertions(+), 3 deletions(-)
>
>diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
>index c2d8255..a3314b5 100644
>--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
>+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
>@@ -229,8 +229,7 @@ int amdgpu_irq_init(struct amdgpu_device *adev)
> 	adev->irq.msi_enabled = false;
> 
> 	if (amdgpu_msi_ok(adev)) {
>-		int ret = pci_enable_msi(adev->pdev);
>-		if (!ret) {
>+		if (adev->pdev->msi_enabled || !pci_enable_msi(adev->pdev)) {
> 			adev->irq.msi_enabled = true;
> 			INIT_DEV_INFO(adev->dev, "amdgpu: using MSI.\n");
> 		}
>@@ -280,7 +279,7 @@ void amdgpu_irq_fini(struct amdgpu_device *adev)
> 	if (adev->irq.installed) {
> 		drm_irq_uninstall(adev->ddev);
> 		adev->irq.installed = false;
>-		if (adev->irq.msi_enabled)
>+		if (adev->irq.msi_enabled && !amdgpu_sriov_vf(adev))
> 			pci_disable_msi(adev->pdev);
> 		flush_work(&adev->hotplug_work);
> 		cancel_work_sync(&adev->reset_work);
>--
>2.9.5
>

* Re: [PATCH 5/7] drm/amdgpu: don't disable MSI for GPU virtual function
       [not found]                 ` <SN1PR12MB04627B3E054A9BF1D3408CAA84470-z7L1TMIYDg7VaWpRXmIMygdYzm3356FpvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
@ 2017-10-24  7:52                   ` Ding, Pixel
       [not found]                     ` <E41D1A06-AB17-4BF4-B98A-CF0B79ABD814-5C7GfCeVMHo@public.gmane.org>
  0 siblings, 1 reply; 28+ messages in thread
From: Ding, Pixel @ 2017-10-24  7:52 UTC (permalink / raw)
  To: Liu, Monk, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, Xiao, Jack
  Cc: Sun, Gary, Li, Bingley

Tried it, but it still fails. Meanwhile, I also think it's not good to disable the device here, since we only redo amdgpu_device_init for the exclusive-mode timeout.

— 
Sincerely Yours,
Pixel








On 24/10/2017, 11:03 AM, "Liu, Monk" <Monk.Liu@amd.com> wrote:

>Can you try calling pci_disable_device() once you find that init failed, and make sure it is called before re-init?
>
>
>-----Original Message-----
>From: Ding, Pixel 
>Sent: 2017年10月24日 9:31
>To: Liu, Monk <Monk.Liu@amd.com>; amd-gfx@lists.freedesktop.org; Xiao, Jack <Jack.Xiao@amd.com>
>Cc: Sun, Gary <Gary.Sun@amd.com>; Li, Bingley <Bingley.Li@amd.com>
>Subject: Re: [PATCH 5/7] drm/amdgpu: don't disable MSI for GPU virtual function
>
>Tested with 5248e3d9; however, the issue is still reproduced in the reinit case.
>
>+Jack,
>To bypass MSI enable/disable for reinit, any comment?
>— 
>Sincerely Yours,
>Pixel
>
>
>
>
>
>
>
>On 23/10/2017, 6:57 PM, "Liu, Monk" <Monk.Liu@amd.com> wrote:
>
>>Please check commit "5248e3d9"; your issue should already be fixed by that patch. Please verify.
>>
>>-----Original Message-----
>>From: amd-gfx [mailto:amd-gfx-bounces@lists.freedesktop.org] On Behalf Of Pixel Ding
>>Sent: 2017年10月23日 18:04
>>To: amd-gfx@lists.freedesktop.org
>>Cc: Sun, Gary <Gary.Sun@amd.com>; Ding, Pixel <Pixel.Ding@amd.com>; Li, Bingley <Bingley.Li@amd.com>
>>Subject: [PATCH 5/7] drm/amdgpu: don't disable MSI for GPU virtual function
>>
>>From: pding <Pixel.Ding@amd.com>
>>
>>After calling pci_disable_msi() and pci_enable_msi(), the VF can't receive interrupts anymore. This may introduce problems when reloading the module or retrying init.
>>
>>Signed-off-by: pding <Pixel.Ding@amd.com>
>>---
>> drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c | 5 ++---
>> 1 file changed, 2 insertions(+), 3 deletions(-)
>>
>>diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
>>index c2d8255..a3314b5 100644
>>--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
>>+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
>>@@ -229,8 +229,7 @@ int amdgpu_irq_init(struct amdgpu_device *adev)
>> 	adev->irq.msi_enabled = false;
>> 
>> 	if (amdgpu_msi_ok(adev)) {
>>-		int ret = pci_enable_msi(adev->pdev);
>>-		if (!ret) {
>>+		if (adev->pdev->msi_enabled || !pci_enable_msi(adev->pdev)) {
>> 			adev->irq.msi_enabled = true;
>> 			INIT_DEV_INFO(adev->dev, "amdgpu: using MSI.\n");
>> 		}
>>@@ -280,7 +279,7 @@ void amdgpu_irq_fini(struct amdgpu_device *adev)
>> 	if (adev->irq.installed) {
>> 		drm_irq_uninstall(adev->ddev);
>> 		adev->irq.installed = false;
>>-		if (adev->irq.msi_enabled)
>>+		if (adev->irq.msi_enabled && !amdgpu_sriov_vf(adev))
>> 			pci_disable_msi(adev->pdev);
>> 		flush_work(&adev->hotplug_work);
>> 		cancel_work_sync(&adev->reset_work);
>>--
>>2.9.5
>>
>>_______________________________________________
>>amd-gfx mailing list
>>amd-gfx@lists.freedesktop.org
>>https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 28+ messages in thread

* RE: [PATCH 5/7] drm/amdgpu: don't disable MSI for GPU virtual function
       [not found]                     ` <E41D1A06-AB17-4BF4-B98A-CF0B79ABD814-5C7GfCeVMHo@public.gmane.org>
@ 2017-10-24  7:56                       ` Liu, Monk
  0 siblings, 0 replies; 28+ messages in thread
From: Liu, Monk @ 2017-10-24  7:56 UTC (permalink / raw)
  To: Ding, Pixel, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, Xiao, Jack
  Cc: Sun, Gary, Li, Bingley

One thing I'm concerned about is whether your patch will break the KMD reload functionality, but since the latest staging is already broken on reload, you won't get a chance to verify your patch.

If you have the bandwidth, you can try applying your patch on our private customer branch and see if the driver reloading still works or not.

Acked-by: Monk Liu <monk.liu@amd.com>

BR Monk
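
For reference, the MSI handling this patch establishes can be modeled as a small standalone sketch. All types and helpers below are illustrative stand-ins, not the real PCI/amdgpu API:

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Illustrative model of patch 5 (not the real amdgpu/pci code):
 * on a retried init, a VF reuses the MSI state left over from the
 * previous attempt, and teardown on a VF deliberately skips the
 * disable so that re-enabling is never needed.
 */
struct fake_dev {
	bool is_sriov_vf;
	bool msi_enabled;	/* models pdev->msi_enabled */
};

/* Mirrors the amdgpu_irq_init() hunk: reuse MSI if it is already on. */
static bool irq_init_msi(struct fake_dev *d, bool enable_succeeds)
{
	if (d->msi_enabled)		/* left enabled by a prior attempt */
		return true;
	if (enable_succeeds) {		/* models !pci_enable_msi() */
		d->msi_enabled = true;
		return true;
	}
	return false;
}

/* Mirrors the amdgpu_irq_fini() hunk: only a non-VF disables MSI. */
static void irq_fini_msi(struct fake_dev *d)
{
	if (d->msi_enabled && !d->is_sriov_vf)
		d->msi_enabled = false;
}
```

On a VF, a failed first init followed by fini leaves `msi_enabled` set, so the retried init takes the first branch and never executes the problematic disable/enable pair again.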

-----Original Message-----
From: Ding, Pixel 
Sent: 2017年10月24日 15:52
To: Liu, Monk <Monk.Liu@amd.com>; amd-gfx@lists.freedesktop.org; Xiao, Jack <Jack.Xiao@amd.com>
Cc: Sun, Gary <Gary.Sun@amd.com>; Li, Bingley <Bingley.Li@amd.com>
Subject: Re: [PATCH 5/7] drm/amdgpu: don't disable MSI for GPU virtual function

Tried it, but it still fails. Meanwhile, I also think it's not good to disable the device here, since we only redo amdgpu_device_init for the exclusive-mode timeout.

— 
Sincerely Yours,
Pixel








On 24/10/2017, 11:03 AM, "Liu, Monk" <Monk.Liu@amd.com> wrote:

>Can you try calling pci_disable_device() once you find that init failed, and make sure it is called before re-init?
>
>
>-----Original Message-----
>From: Ding, Pixel 
>Sent: 2017年10月24日 9:31
>To: Liu, Monk <Monk.Liu@amd.com>; amd-gfx@lists.freedesktop.org; Xiao, Jack <Jack.Xiao@amd.com>
>Cc: Sun, Gary <Gary.Sun@amd.com>; Li, Bingley <Bingley.Li@amd.com>
>Subject: Re: [PATCH 5/7] drm/amdgpu: don't disable MSI for GPU virtual function
>
>Tested with 5248e3d9; however, the issue is still reproduced in the reinit case.
>
>+Jack,
>To bypass MSI enable/disable for reinit, any comment?
>— 
>Sincerely Yours,
>Pixel
>
>
>
>
>
>
>
>On 23/10/2017, 6:57 PM, "Liu, Monk" <Monk.Liu@amd.com> wrote:
>
>>Please check commit "5248e3d9"; your issue should already be fixed by that patch. Please verify.
>>
>>-----Original Message-----
>>From: amd-gfx [mailto:amd-gfx-bounces@lists.freedesktop.org] On Behalf Of Pixel Ding
>>Sent: 2017年10月23日 18:04
>>To: amd-gfx@lists.freedesktop.org
>>Cc: Sun, Gary <Gary.Sun@amd.com>; Ding, Pixel <Pixel.Ding@amd.com>; Li, Bingley <Bingley.Li@amd.com>
>>Subject: [PATCH 5/7] drm/amdgpu: don't disable MSI for GPU virtual function
>>
>>From: pding <Pixel.Ding@amd.com>
>>
>>After calling pci_disable_msi() and pci_enable_msi(), the VF can't receive interrupts anymore. This may introduce problems when reloading the module or retrying init.
>>
>>Signed-off-by: pding <Pixel.Ding@amd.com>
>>---
>> drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c | 5 ++---
>> 1 file changed, 2 insertions(+), 3 deletions(-)
>>
>>diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
>>index c2d8255..a3314b5 100644
>>--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
>>+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
>>@@ -229,8 +229,7 @@ int amdgpu_irq_init(struct amdgpu_device *adev)
>> 	adev->irq.msi_enabled = false;
>> 
>> 	if (amdgpu_msi_ok(adev)) {
>>-		int ret = pci_enable_msi(adev->pdev);
>>-		if (!ret) {
>>+		if (adev->pdev->msi_enabled || !pci_enable_msi(adev->pdev)) {
>> 			adev->irq.msi_enabled = true;
>> 			INIT_DEV_INFO(adev->dev, "amdgpu: using MSI.\n");
>> 		}
>>@@ -280,7 +279,7 @@ void amdgpu_irq_fini(struct amdgpu_device *adev)
>> 	if (adev->irq.installed) {
>> 		drm_irq_uninstall(adev->ddev);
>> 		adev->irq.installed = false;
>>-		if (adev->irq.msi_enabled)
>>+		if (adev->irq.msi_enabled && !amdgpu_sriov_vf(adev))
>> 			pci_disable_msi(adev->pdev);
>> 		flush_work(&adev->hotplug_work);
>> 		cancel_work_sync(&adev->reset_work);
>>--
>>2.9.5
>>
>>_______________________________________________
>>amd-gfx mailing list
>>amd-gfx@lists.freedesktop.org
>>https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 28+ messages in thread

* RE: [PATCH 4/7] drm/amdgpu/virt: add function to check MMIO accessing
       [not found]             ` <F836DB5B-E4E7-4593-A838-FB852B50510B-5C7GfCeVMHo@public.gmane.org>
@ 2017-10-24 20:42               ` Deucher, Alexander
  0 siblings, 0 replies; 28+ messages in thread
From: Deucher, Alexander @ 2017-10-24 20:42 UTC (permalink / raw)
  To: Ding, Pixel, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
  Cc: Sun, Gary, Li, Bingley

> -----Original Message-----
> From: Ding, Pixel
> Sent: Monday, October 23, 2017 9:43 PM
> To: Deucher, Alexander; amd-gfx@lists.freedesktop.org
> Cc: Sun, Gary; Li, Bingley
> Subject: Re: [PATCH 4/7] drm/amdgpu/virt: add function to check MMIO
> accessing
> 
> It works with the current MMIO blocking policy, where all pages are blocked
> except the one containing the mailbox registers. But it's actually not a good
> approach; as commented, we should add an interaction between host and guest
> for the MMIO blocking check in the future.
> 
> For now I can add more comments here, is that OK?

Please add a comment that mentions what register this is.  With that fixed the patch is:
Acked-by: Alex Deucher <alexander.deucher@amd.com>
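
The check under discussion reduces to an all-ones probe; here is a minimal standalone model (illustrative only — the real code reads register 0xc040 via RREG32_NO_KIQ):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * When the host blocks a VF's MMIO pages, reads come back as all-ones,
 * so probing a register that never legitimately reads 0xffffffff tells
 * "blocked" apart from "accessible". Illustrative sketch of patch 4's
 * idea, not the driver code itself.
 */
static bool mmio_blocked(uint32_t probe_read)
{
	return probe_read == 0xffffffffu;
}
```

As the thread notes, this is fragile if the register map changes across ASICs; a hypervisor handshake would be the reliable replacement.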

> —
> Sincerely Yours,
> Pixel
> 
> 
> 
> 
> 
> 
> 
> On 24/10/2017, 12:33 AM, "Deucher, Alexander"
> <Alexander.Deucher@amd.com> wrote:
> 
> >> -----Original Message-----
> >> From: amd-gfx [mailto:amd-gfx-bounces@lists.freedesktop.org] On
> Behalf
> >> Of Pixel Ding
> >> Sent: Monday, October 23, 2017 6:03 AM
> >> To: amd-gfx@lists.freedesktop.org
> >> Cc: Sun, Gary; Ding, Pixel; Li, Bingley
> >> Subject: [PATCH 4/7] drm/amdgpu/virt: add function to check MMIO
> >> accessing
> >>
> >> From: pding <Pixel.Ding@amd.com>
> >>
> >> MMIO space can be blocked on a virtualised device. Add this function
> >> to check if MMIO is blocked or not.
> >>
> >> Todo: need a reliable method, such as communication with the hypervisor.
> >>
> >> Signed-off-by: pding <Pixel.Ding@amd.com>
> >> ---
> >>  drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c | 5 +++++
> >>  drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h | 1 +
> >>  2 files changed, 6 insertions(+)
> >>
> >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
> >> b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
> >> index e97f80f..33dac7e 100644
> >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
> >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
> >> @@ -24,6 +24,11 @@
> >>  #include "amdgpu.h"
> >>  #define MAX_KIQ_REG_WAIT	100000000 /* in usecs */
> >>
> >> +bool amdgpu_virt_mmio_blocked(struct amdgpu_device *adev)
> >> +{
> >> +	return RREG32_NO_KIQ(0xc040) == 0xffffffff;
> >> +}
> >
> >Is this safe?  Won't accessing non-instanced registers cause a problem?
> Probably also worth commenting that register this is and a note for future
> asics in case the register map changes.
> >
> >Alex
> >
> >> +
> >>  int amdgpu_allocate_static_csa(struct amdgpu_device *adev)
> >>  {
> >>  	int r;
> >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h
> >> b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h
> >> index b89d37f..81efb9d 100644
> >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h
> >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h
> >> @@ -276,6 +276,7 @@ static inline bool is_virtual_machine(void)
> >>  }
> >>
> >>  struct amdgpu_vm;
> >> +bool amdgpu_virt_mmio_blocked(struct amdgpu_device *adev);
> >>  int amdgpu_allocate_static_csa(struct amdgpu_device *adev);
> >>  int amdgpu_map_static_csa(struct amdgpu_device *adev, struct
> >> amdgpu_vm *vm,
> >>  			  struct amdgpu_bo_va **bo_va);
> >> --
> >> 2.9.5
> >>
> >> _______________________________________________
> >> amd-gfx mailing list
> >> amd-gfx@lists.freedesktop.org
> >> https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/7] drm/amdgpu: release VF exclusive accessing after hw_init
       [not found]         ` <BN6PR12MB16528E993E985FB6B21F7022F7460-/b2+HYfkarQqUD6E6FAiowdYzm3356FpvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
@ 2017-10-25  1:11           ` Ding, Pixel
       [not found]             ` <C93A21B6-251A-47B5-BC30-C199564D46CA-5C7GfCeVMHo@public.gmane.org>
  0 siblings, 1 reply; 28+ messages in thread
From: Ding, Pixel @ 2017-10-25  1:11 UTC (permalink / raw)
  To: Deucher, Alexander, Liu, Monk, Wang, Daniel(Xiaowei)
  Cc: Sun, Gary, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, Li, Bingley

Hi all,

I notice that this change will break KFD init on a VF. KFD also loads HQDs, which needs to be done in exclusive mode; otherwise, KIQ register access causes LOAD VF to fail at that time. However, if we release exclusive mode after KFD init, the retry-init logic becomes quite complicated to handle all the failures, and the time spent in exclusive mode is too long. Any ideas?

For now I would drop this change upstream.


— 
Sincerely Yours,
Pixel








On 24/10/2017, 12:23 AM, "Deucher, Alexander" <Alexander.Deucher@amd.com> wrote:

>> -----Original Message-----
>> From: amd-gfx [mailto:amd-gfx-bounces@lists.freedesktop.org] On Behalf
>> Of Pixel Ding
>> Sent: Monday, October 23, 2017 6:03 AM
>> To: amd-gfx@lists.freedesktop.org
>> Cc: Sun, Gary; Ding, Pixel; Li, Bingley
>> Subject: [PATCH 1/7] drm/amdgpu: release VF exclusive accessing after
>> hw_init
>> 
>> The subsequent operations don't need exclusive accessing hardware.
>> 
>> Signed-off-by: Pixel Ding <Pixel.Ding@amd.com>
>
>Acked-by: Alex Deucher <alexander.deucher@amd.com>
>
>> ---
>>  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 5 +++++
>>  drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c    | 3 ---
>>  2 files changed, 5 insertions(+), 3 deletions(-)
>> 
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> index 99acf29..286ba3c 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> @@ -1716,6 +1716,11 @@ static int amdgpu_init(struct amdgpu_device
>> *adev)
>>  		adev->ip_blocks[i].status.hw = true;
>>  	}
>> 
>> +	if (amdgpu_sriov_vf(adev)) {
>> +		DRM_INFO("rel_init\n");
>> +		amdgpu_virt_release_full_gpu(adev, true);
>> +	}
>> +
>>  	return 0;
>>  }
>> 
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
>> index 4a9f749..f2eb7ac 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
>> @@ -171,9 +171,6 @@ int amdgpu_driver_load_kms(struct drm_device
>> *dev, unsigned long flags)
>>  		pm_runtime_put_autosuspend(dev->dev);
>>  	}
>> 
>> -	if (amdgpu_sriov_vf(adev))
>> -		amdgpu_virt_release_full_gpu(adev, true);
>> -
>>  out:
>>  	if (r) {
>>  		/* balance pm_runtime_get_sync in
>> amdgpu_driver_unload_kms */
>> --
>> 2.9.5
>> 
>> _______________________________________________
>> amd-gfx mailing list
>> amd-gfx@lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 7/7] drm/amdgpu: retry init if it fails due to exclusive mode timeout
       [not found]         ` <04e77adc-085b-6bea-d08d-73a0980bcdc4-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2017-10-25  1:38           ` Ding, Pixel
  0 siblings, 0 replies; 28+ messages in thread
From: Ding, Pixel @ 2017-10-25  1:38 UTC (permalink / raw)
  To: Andres Rodriguez, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

Thanks Andres, revised in v2.
— 
Sincerely Yours,
Pixel








On 23/10/2017, 11:44 PM, "amd-gfx on behalf of Andres Rodriguez" <amd-gfx-bounces@lists.freedesktop.org on behalf of andresx7@gmail.com> wrote:

>
>
>On 2017-10-23 06:03 AM, Pixel Ding wrote:
>> From: pding <Pixel.Ding@amd.com>
>> 
>> The exclusive mode has real-time limitation in reality, such like being
>> done in 300ms. It's easy observed if running many VF/VMs in single host
>> with heavy CPU workload.
>> 
>> If we find the init fails due to exclusive mode timeout, try it again.
>> 
>> Signed-off-by: pding <Pixel.Ding@amd.com>
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 10 ++++++++++
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c    | 15 +++++++++++++--
>>   2 files changed, 23 insertions(+), 2 deletions(-)
>> 
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> index 3458d46..1935f5a 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> @@ -2306,6 +2306,15 @@ int amdgpu_device_init(struct amdgpu_device *adev,
>>   
>>   	r = amdgpu_init(adev);
>>   	if (r) {
>> +		/* failed in exclusive mode due to timeout */
>> +		if (amdgpu_sriov_vf(adev) &&
>> +		    !amdgpu_sriov_runtime(adev) &&
>> +		    amdgpu_virt_mmio_blocked(adev) &&
>> +		    !amdgpu_virt_wait_reset(adev)) {
>> +			dev_err(adev->dev, "VF exclusive mode timeout\n");
>> +			r = -EAGAIN;
>> +			goto failed;
>> +		}
>>   		dev_err(adev->dev, "amdgpu_init failed\n");
>>   		amdgpu_vf_error_put(adev, AMDGIM_ERROR_VF_AMDGPU_INIT_FAIL, 0, 0);
>>   		amdgpu_fini(adev);
>> @@ -2393,6 +2402,7 @@ int amdgpu_device_init(struct amdgpu_device *adev,
>>   	amdgpu_vf_error_trans_all(adev);
>>   	if (runtime)
>>   		vga_switcheroo_fini_domain_pm_ops(adev->dev);
>> +
>>   	return r;
>>   }
>>   
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
>> index f2eb7ac..fdc240a 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
>> @@ -86,7 +86,7 @@ void amdgpu_driver_unload_kms(struct drm_device *dev)
>>   int amdgpu_driver_load_kms(struct drm_device *dev, unsigned long flags)
>>   {
>>   	struct amdgpu_device *adev;
>> -	int r, acpi_status;
>> +	int r, acpi_status, retry = 0;
>>   
>>   #ifdef CONFIG_DRM_AMDGPU_SI
>>   	if (!amdgpu_si_support) {
>> @@ -122,6 +122,7 @@ int amdgpu_driver_load_kms(struct drm_device *dev, unsigned long flags)
>>   		}
>>   	}
>>   #endif
>> +retry_init:
>>   
>>   	adev = kzalloc(sizeof(struct amdgpu_device), GFP_KERNEL);
>>   	if (adev == NULL) {
>> @@ -144,7 +145,17 @@ int amdgpu_driver_load_kms(struct drm_device *dev, unsigned long flags)
>>   	 * VRAM allocation
>>   	 */
>>   	r = amdgpu_device_init(adev, dev, dev->pdev, flags);
>> -	if (r) {
>> +	if (++retry != 3 && r == -EAGAIN) {
>
>Minor nitpick here. Might want to rewrite the condition so that it 
>evaluates to false for most values of retry (currently it evaluates to 
>false only for one value of retry).
>
>E.g. if (++retry >= 3 ...)
>
>Or
>
>int retry = 3;
>...
>if (--retry >= 0 ...)
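
The counting-down variant suggested above can be sketched as a standalone loop (outside the quote; all names here are illustrative, not the actual amdgpu code):

```c
#include <assert.h>

/*
 * Sketch of the bounded-retry pattern: count down from a fixed budget
 * so the guard reads "retries remaining" and is false for every
 * exhausted value, not just one. EAGAIN_SIM stands in for -EAGAIN.
 */
#define INIT_MAX_RETRIES 3
#define EAGAIN_SIM 11

static int attempts_used;

/* Hypothetical init that fails with -EAGAIN_SIM the first `failures` times. */
static int device_init_sim(int failures)
{
	attempts_used++;
	return (attempts_used <= failures) ? -EAGAIN_SIM : 0;
}

/* Returns 0 on success, or the last error once the budget is exhausted. */
static int load_with_retries(int failures)
{
	int retry = INIT_MAX_RETRIES;
	int r;

	attempts_used = 0;
	do {
		r = device_init_sim(failures);
	} while (r == -EAGAIN_SIM && --retry > 0);

	return r;
}
```

With this shape, any number of consecutive -EAGAIN results eventually stops the loop, which the `++retry != 3` form does not guarantee.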
>
>> +		adev->virt.caps &= ~AMDGPU_SRIOV_CAPS_RUNTIME;
>> +		adev->virt.ops = NULL;
>> +		amdgpu_device_fini(adev);
>> +		kfree(adev);
>> +		dev->dev_private = NULL;
>> +		msleep(5000);
>> +		dev_err(&dev->pdev->dev, "retry init %d\n", retry);
>> +		amdgpu_init_log = 0;
>> +		goto retry_init;
>> +	} else if (r) {
>>   		dev_err(&dev->pdev->dev, "Fatal error during GPU init\n");
>>   		goto out;
>>   	}
>> 
>_______________________________________________
>amd-gfx mailing list
>amd-gfx@lists.freedesktop.org
>https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 28+ messages in thread
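
The four-way condition in the amdgpu_device_init() hunk above can be modeled as a standalone predicate (names illustrative; the real code calls the amdgpu_virt_* helpers):

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Models the patch-7 check: an init failure is treated as an exclusive-
 * mode timeout only when the device is a VF, is not in runtime mode,
 * MMIO reads back as all-ones (blocked), and waiting for the host's
 * reset also fails. Pure-logic sketch, not the driver code.
 */
static bool is_exclusive_timeout(bool is_vf, bool runtime,
				 bool mmio_blocked, bool wait_reset_ok)
{
	return is_vf && !runtime && mmio_blocked && !wait_reset_ok;
}
```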

* RE: [PATCH 1/7] drm/amdgpu: release VF exclusive accessing after hw_init
       [not found]             ` <C93A21B6-251A-47B5-BC30-C199564D46CA-5C7GfCeVMHo@public.gmane.org>
@ 2017-10-25  1:39               ` Li, Bingley
  0 siblings, 0 replies; 28+ messages in thread
From: Li, Bingley @ 2017-10-25  1:39 UTC (permalink / raw)
  To: Ding, Pixel, Deucher, Alexander, Liu, Monk, Wang, Daniel(Xiaowei),
	Liu, Shaoyun
  Cc: Sun, Gary, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

Hi Shaoyun,

Can you comment on this?

Regards,
Bingley

-----Original Message-----
From: Ding, Pixel 
Sent: Wednesday, October 25, 2017 9:11 AM
To: Deucher, Alexander <Alexander.Deucher@amd.com>; Liu, Monk <Monk.Liu@amd.com>; Wang, Daniel(Xiaowei) <Daniel.Wang2@amd.com>
Cc: Sun, Gary <Gary.Sun@amd.com>; Li, Bingley <Bingley.Li@amd.com>; amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH 1/7] drm/amdgpu: release VF exclusive accessing after hw_init

Hi all,

I notice that this change will break KFD init on a VF. KFD also loads HQDs, which needs to be done in exclusive mode; otherwise, KIQ register access causes LOAD VF to fail at that time. However, if we release exclusive mode after KFD init, the retry-init logic becomes quite complicated to handle all the failures, and the time spent in exclusive mode is too long. Any ideas?

For now I would drop this change upstream.


— 
Sincerely Yours,
Pixel








On 24/10/2017, 12:23 AM, "Deucher, Alexander" <Alexander.Deucher@amd.com> wrote:

>> -----Original Message-----
>> From: amd-gfx [mailto:amd-gfx-bounces@lists.freedesktop.org] On Behalf
>> Of Pixel Ding
>> Sent: Monday, October 23, 2017 6:03 AM
>> To: amd-gfx@lists.freedesktop.org
>> Cc: Sun, Gary; Ding, Pixel; Li, Bingley
>> Subject: [PATCH 1/7] drm/amdgpu: release VF exclusive accessing after
>> hw_init
>> 
>> The subsequent operations don't need exclusive accessing hardware.
>> 
>> Signed-off-by: Pixel Ding <Pixel.Ding@amd.com>
>
>Acked-by: Alex Deucher <alexander.deucher@amd.com>
>
>> ---
>>  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 5 +++++
>>  drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c    | 3 ---
>>  2 files changed, 5 insertions(+), 3 deletions(-)
>> 
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> index 99acf29..286ba3c 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> @@ -1716,6 +1716,11 @@ static int amdgpu_init(struct amdgpu_device
>> *adev)
>>  		adev->ip_blocks[i].status.hw = true;
>>  	}
>> 
>> +	if (amdgpu_sriov_vf(adev)) {
>> +		DRM_INFO("rel_init\n");
>> +		amdgpu_virt_release_full_gpu(adev, true);
>> +	}
>> +
>>  	return 0;
>>  }
>> 
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
>> index 4a9f749..f2eb7ac 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
>> @@ -171,9 +171,6 @@ int amdgpu_driver_load_kms(struct drm_device
>> *dev, unsigned long flags)
>>  		pm_runtime_put_autosuspend(dev->dev);
>>  	}
>> 
>> -	if (amdgpu_sriov_vf(adev))
>> -		amdgpu_virt_release_full_gpu(adev, true);
>> -
>>  out:
>>  	if (r) {
>>  		/* balance pm_runtime_get_sync in
>> amdgpu_driver_unload_kms */
>> --
>> 2.9.5
>> 
>> _______________________________________________
>> amd-gfx mailing list
>> amd-gfx@lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2017-10-25  1:39 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-10-23 10:03 Init timing changes for SRIOV VF Pixel Ding
     [not found] ` <1508753012-2196-1-git-send-email-Pixel.Ding-5C7GfCeVMHo@public.gmane.org>
2017-10-23 10:03   ` [PATCH 1/7] drm/amdgpu: release VF exclusive accessing after hw_init Pixel Ding
     [not found]     ` <1508753012-2196-2-git-send-email-Pixel.Ding-5C7GfCeVMHo@public.gmane.org>
2017-10-23 16:23       ` Deucher, Alexander
     [not found]         ` <BN6PR12MB16528E993E985FB6B21F7022F7460-/b2+HYfkarQqUD6E6FAiowdYzm3356FpvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
2017-10-25  1:11           ` Ding, Pixel
     [not found]             ` <C93A21B6-251A-47B5-BC30-C199564D46CA-5C7GfCeVMHo@public.gmane.org>
2017-10-25  1:39               ` Li, Bingley
2017-10-23 10:03   ` [PATCH 2/7] drm/amdgpu: add init_log param to control logs in exclusive mode Pixel Ding
     [not found]     ` <1508753012-2196-3-git-send-email-Pixel.Ding-5C7GfCeVMHo@public.gmane.org>
2017-10-23 16:34       ` Deucher, Alexander
2017-10-23 10:03   ` [PATCH 3/7] drm/amdgpu: avoid soft lockup when waiting for RLC serdes Pixel Ding
     [not found]     ` <1508753012-2196-4-git-send-email-Pixel.Ding-5C7GfCeVMHo@public.gmane.org>
2017-10-23 16:22       ` Deucher, Alexander
2017-10-23 10:03   ` [PATCH 4/7] drm/amdgpu/virt: add function to check MMIO accessing Pixel Ding
     [not found]     ` <1508753012-2196-5-git-send-email-Pixel.Ding-5C7GfCeVMHo@public.gmane.org>
2017-10-23 16:33       ` Deucher, Alexander
     [not found]         ` <BN6PR12MB165261DE3E8A751BD8250C77F7460-/b2+HYfkarQqUD6E6FAiowdYzm3356FpvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
2017-10-24  1:42           ` Ding, Pixel
     [not found]             ` <F836DB5B-E4E7-4593-A838-FB852B50510B-5C7GfCeVMHo@public.gmane.org>
2017-10-24 20:42               ` Deucher, Alexander
2017-10-23 10:03   ` [PATCH 5/7] drm/amdgpu: don't disable MSI for GPU virtual function Pixel Ding
     [not found]     ` <1508753012-2196-6-git-send-email-Pixel.Ding-5C7GfCeVMHo@public.gmane.org>
2017-10-23 10:57       ` Liu, Monk
     [not found]         ` <BLUPR12MB04494B5ECC666B264207A24A84460-7LeqcoF/hwpTIQvHjXdJlwdYzm3356FpvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
2017-10-24  1:31           ` Ding, Pixel
     [not found]             ` <2237B231-2EE3-4DDE-9C55-71FF3F589D40-5C7GfCeVMHo@public.gmane.org>
2017-10-24  2:04               ` Ding, Pixel
2017-10-24  3:03               ` Liu, Monk
     [not found]                 ` <SN1PR12MB04627B3E054A9BF1D3408CAA84470-z7L1TMIYDg7VaWpRXmIMygdYzm3356FpvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
2017-10-24  7:52                   ` Ding, Pixel
     [not found]                     ` <E41D1A06-AB17-4BF4-B98A-CF0B79ABD814-5C7GfCeVMHo@public.gmane.org>
2017-10-24  7:56                       ` Liu, Monk
2017-10-23 10:03   ` [PATCH 6/7] drm/amdgpu/virt: add wait_reset virt ops Pixel Ding
     [not found]     ` <1508753012-2196-7-git-send-email-Pixel.Ding-5C7GfCeVMHo@public.gmane.org>
2017-10-23 11:01       ` Liu, Monk
     [not found]         ` <BLUPR12MB0449ECFA70529779A41D856D84460-7LeqcoF/hwpTIQvHjXdJlwdYzm3356FpvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
2017-10-24  1:33           ` Ding, Pixel
     [not found]             ` <CC2BFABD-8E5A-4E7F-B435-3CA25B1D46C4-5C7GfCeVMHo@public.gmane.org>
2017-10-24  3:00               ` Liu, Monk
2017-10-23 16:28       ` Deucher, Alexander
2017-10-23 10:03   ` [PATCH 7/7] drm/amdgpu: retry init if it fails due to exclusive mode timeout Pixel Ding
     [not found]     ` <1508753012-2196-8-git-send-email-Pixel.Ding-5C7GfCeVMHo@public.gmane.org>
2017-10-23 15:44       ` Andres Rodriguez
     [not found]         ` <04e77adc-085b-6bea-d08d-73a0980bcdc4-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2017-10-25  1:38           ` Ding, Pixel

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.