* [PATCH v3 0/5] drm/amdgpu: Add new reset option and rework coredump
@ 2023-07-14 16:11 ` André Almeida
0 siblings, 0 replies; 21+ messages in thread
From: André Almeida @ 2023-07-14 16:11 UTC (permalink / raw)
To: dri-devel, amd-gfx, linux-kernel
Cc: kernel-dev, alexander.deucher, christian.koenig,
pierre-eric.pelloux-prayer, 'Marek Olšák',
Samuel Pitoiset, Bas Nieuwenhuizen, Timur Kristóf,
michel.daenzer, André Almeida
Hi,
The goal of this patchset is to improve debugging device resets on amdgpu.
The first patch creates a new module parameter to disable soft recoveries,
ensuring every recovery go through the full device reset, making easier to
generate resets from userspace tools like [0] and [1]. This is important to
validate how the stack behaves on resets, from end-to-end.
The last patches are a rework to alloc devcoredump dynamically and to move it to
a better source file.
I have dropped the patches that add more information to devcoredump for now,
until I figure out a better way to do so, like storing the IB address in the
fence.
Thanks,
André
[0] https://gitlab.freedesktop.org/andrealmeid/gpu-timeout
[1] https://github.com/andrealmeid/vulkan-triangle-v1
Changelog:
v2: https://lore.kernel.org/dri-devel/20230713213242.680944-1-andrealmeid@igalia.com/
- Drop the IB and ring patch
- Drop patch that limited information from kernel threads
- Add patch to move coredump to amdgpu_reset
v1: https://lore.kernel.org/dri-devel/20230711213501.526237-1-andrealmeid@igalia.com/
- Drop "Mark contexts guilty for causing soft recoveries" patch
- Use GFP_NOWAIT for devcoredump allocation
André Almeida (5):
drm/amdgpu: Create a module param to disable soft recovery
drm/amdgpu: Allocate coredump memory in a nonblocking way
drm/amdgpu: Rework coredump to use memory dynamically
drm/amdgpu: Move coredump code to amdgpu_reset file
drm/amdgpu: Create version number for coredumps
drivers/gpu/drm/amd/amdgpu/amdgpu.h | 6 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 67 +-----------------
drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 9 +++
drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c | 79 ++++++++++++++++++++++
drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h | 14 ++++
drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c | 6 +-
6 files changed, 111 insertions(+), 70 deletions(-)
--
2.41.0
^ permalink raw reply [flat|nested] 21+ messages in thread
* [PATCH v3 0/5] drm/amdgpu: Add new reset option and rework coredump
@ 2023-07-14 16:11 ` André Almeida
0 siblings, 0 replies; 21+ messages in thread
From: André Almeida @ 2023-07-14 16:11 UTC (permalink / raw)
To: dri-devel, amd-gfx, linux-kernel
Cc: pierre-eric.pelloux-prayer, André Almeida,
'Marek Olšák',
Timur Kristóf, michel.daenzer, Samuel Pitoiset, kernel-dev,
alexander.deucher, christian.koenig
Hi,
The goal of this patchset is to improve debugging device resets on amdgpu.
The first patch creates a new module parameter to disable soft recoveries,
ensuring every recovery go through the full device reset, making easier to
generate resets from userspace tools like [0] and [1]. This is important to
validate how the stack behaves on resets, from end-to-end.
The last patches are a rework to alloc devcoredump dynamically and to move it to
a better source file.
I have dropped the patches that add more information to devcoredump for now,
until I figure out a better way to do so, like storing the IB address in the
fence.
Thanks,
André
[0] https://gitlab.freedesktop.org/andrealmeid/gpu-timeout
[1] https://github.com/andrealmeid/vulkan-triangle-v1
Changelog:
v2: https://lore.kernel.org/dri-devel/20230713213242.680944-1-andrealmeid@igalia.com/
- Drop the IB and ring patch
- Drop patch that limited information from kernel threads
- Add patch to move coredump to amdgpu_reset
v1: https://lore.kernel.org/dri-devel/20230711213501.526237-1-andrealmeid@igalia.com/
- Drop "Mark contexts guilty for causing soft recoveries" patch
- Use GFP_NOWAIT for devcoredump allocation
André Almeida (5):
drm/amdgpu: Create a module param to disable soft recovery
drm/amdgpu: Allocate coredump memory in a nonblocking way
drm/amdgpu: Rework coredump to use memory dynamically
drm/amdgpu: Move coredump code to amdgpu_reset file
drm/amdgpu: Create version number for coredumps
drivers/gpu/drm/amd/amdgpu/amdgpu.h | 6 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 67 +-----------------
drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 9 +++
drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c | 79 ++++++++++++++++++++++
drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h | 14 ++++
drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c | 6 +-
6 files changed, 111 insertions(+), 70 deletions(-)
--
2.41.0
^ permalink raw reply [flat|nested] 21+ messages in thread
* [PATCH v3 0/5] drm/amdgpu: Add new reset option and rework coredump
@ 2023-07-14 16:11 ` André Almeida
0 siblings, 0 replies; 21+ messages in thread
From: André Almeida @ 2023-07-14 16:11 UTC (permalink / raw)
To: dri-devel, amd-gfx, linux-kernel
Cc: pierre-eric.pelloux-prayer, André Almeida,
'Marek Olšák',
Timur Kristóf, michel.daenzer, Samuel Pitoiset, kernel-dev,
Bas Nieuwenhuizen, alexander.deucher, christian.koenig
Hi,
The goal of this patchset is to improve debugging device resets on amdgpu.
The first patch creates a new module parameter to disable soft recoveries,
ensuring every recovery go through the full device reset, making easier to
generate resets from userspace tools like [0] and [1]. This is important to
validate how the stack behaves on resets, from end-to-end.
The last patches are a rework to alloc devcoredump dynamically and to move it to
a better source file.
I have dropped the patches that add more information to devcoredump for now,
until I figure out a better way to do so, like storing the IB address in the
fence.
Thanks,
André
[0] https://gitlab.freedesktop.org/andrealmeid/gpu-timeout
[1] https://github.com/andrealmeid/vulkan-triangle-v1
Changelog:
v2: https://lore.kernel.org/dri-devel/20230713213242.680944-1-andrealmeid@igalia.com/
- Drop the IB and ring patch
- Drop patch that limited information from kernel threads
- Add patch to move coredump to amdgpu_reset
v1: https://lore.kernel.org/dri-devel/20230711213501.526237-1-andrealmeid@igalia.com/
- Drop "Mark contexts guilty for causing soft recoveries" patch
- Use GFP_NOWAIT for devcoredump allocation
André Almeida (5):
drm/amdgpu: Create a module param to disable soft recovery
drm/amdgpu: Allocate coredump memory in a nonblocking way
drm/amdgpu: Rework coredump to use memory dynamically
drm/amdgpu: Move coredump code to amdgpu_reset file
drm/amdgpu: Create version number for coredumps
drivers/gpu/drm/amd/amdgpu/amdgpu.h | 6 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 67 +-----------------
drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 9 +++
drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c | 79 ++++++++++++++++++++++
drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h | 14 ++++
drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c | 6 +-
6 files changed, 111 insertions(+), 70 deletions(-)
--
2.41.0
^ permalink raw reply [flat|nested] 21+ messages in thread
* [PATCH v3 1/5] drm/amdgpu: Create a module param to disable soft recovery
2023-07-14 16:11 ` André Almeida
(?)
@ 2023-07-14 16:11 ` André Almeida
-1 siblings, 0 replies; 21+ messages in thread
From: André Almeida @ 2023-07-14 16:11 UTC (permalink / raw)
To: dri-devel, amd-gfx, linux-kernel
Cc: kernel-dev, alexander.deucher, christian.koenig,
pierre-eric.pelloux-prayer, 'Marek Olšák',
Samuel Pitoiset, Bas Nieuwenhuizen, Timur Kristóf,
michel.daenzer, André Almeida
Create a module parameter to disable soft recoveries on amdgpu, making
every recovery go through the device reset path. This option makes
easier to force device resets for testing and debugging purposes.
Signed-off-by: André Almeida <andrealmeid@igalia.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu.h | 1 +
drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 9 +++++++++
drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c | 6 +++++-
3 files changed, 15 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index a84bd4a0c421..dbe062a087c5 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -189,6 +189,7 @@ extern uint amdgpu_force_long_training;
extern int amdgpu_lbpw;
extern int amdgpu_compute_multipipe;
extern int amdgpu_gpu_recovery;
+extern bool amdgpu_soft_recovery;
extern int amdgpu_emu_mode;
extern uint amdgpu_smu_memory_pool_size;
extern int amdgpu_smu_pptable_id;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index 3b711babd4e2..7c69f3169aa6 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -163,6 +163,7 @@ uint amdgpu_force_long_training;
int amdgpu_lbpw = -1;
int amdgpu_compute_multipipe = -1;
int amdgpu_gpu_recovery = -1; /* auto */
+bool amdgpu_soft_recovery = true;
int amdgpu_emu_mode;
uint amdgpu_smu_memory_pool_size;
int amdgpu_smu_pptable_id = -1;
@@ -540,6 +541,14 @@ module_param_named(compute_multipipe, amdgpu_compute_multipipe, int, 0444);
MODULE_PARM_DESC(gpu_recovery, "Enable GPU recovery mechanism, (1 = enable, 0 = disable, -1 = auto)");
module_param_named(gpu_recovery, amdgpu_gpu_recovery, int, 0444);
+/**
+ * DOC: gpu_soft_recovery (bool)
+ * Set true to allow the driver to try soft recoveries if a job get stuck. Set
+ * to false to always force a GPU reset during recovery.
+ */
+MODULE_PARM_DESC(gpu_soft_recovery, "Enable GPU soft recovery mechanism (default: true)");
+module_param_named(gpu_soft_recovery, amdgpu_soft_recovery, bool, 0644);
+
/**
* DOC: emu_mode (int)
* Set value 1 to enable emulation mode. This is only needed when running on an emulator. The default is 0 (disabled).
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
index 80d6e132e409..40678d9fb17e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
@@ -434,8 +434,12 @@ bool amdgpu_ring_soft_recovery(struct amdgpu_ring *ring, unsigned int vmid,
struct dma_fence *fence)
{
unsigned long flags;
+ ktime_t deadline;
- ktime_t deadline = ktime_add_us(ktime_get(), 10000);
+ if (!amdgpu_soft_recovery)
+ return false;
+
+ deadline = ktime_add_us(ktime_get(), 10000);
if (amdgpu_sriov_vf(ring->adev) || !ring->funcs->soft_recovery || !fence)
return false;
--
2.41.0
^ permalink raw reply related [flat|nested] 21+ messages in thread
* [PATCH v3 1/5] drm/amdgpu: Create a module param to disable soft recovery
@ 2023-07-14 16:11 ` André Almeida
0 siblings, 0 replies; 21+ messages in thread
From: André Almeida @ 2023-07-14 16:11 UTC (permalink / raw)
To: dri-devel, amd-gfx, linux-kernel
Cc: pierre-eric.pelloux-prayer, André Almeida,
'Marek Olšák',
Timur Kristóf, michel.daenzer, Samuel Pitoiset, kernel-dev,
alexander.deucher, christian.koenig
Create a module parameter to disable soft recoveries on amdgpu, making
every recovery go through the device reset path. This option makes
easier to force device resets for testing and debugging purposes.
Signed-off-by: André Almeida <andrealmeid@igalia.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu.h | 1 +
drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 9 +++++++++
drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c | 6 +++++-
3 files changed, 15 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index a84bd4a0c421..dbe062a087c5 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -189,6 +189,7 @@ extern uint amdgpu_force_long_training;
extern int amdgpu_lbpw;
extern int amdgpu_compute_multipipe;
extern int amdgpu_gpu_recovery;
+extern bool amdgpu_soft_recovery;
extern int amdgpu_emu_mode;
extern uint amdgpu_smu_memory_pool_size;
extern int amdgpu_smu_pptable_id;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index 3b711babd4e2..7c69f3169aa6 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -163,6 +163,7 @@ uint amdgpu_force_long_training;
int amdgpu_lbpw = -1;
int amdgpu_compute_multipipe = -1;
int amdgpu_gpu_recovery = -1; /* auto */
+bool amdgpu_soft_recovery = true;
int amdgpu_emu_mode;
uint amdgpu_smu_memory_pool_size;
int amdgpu_smu_pptable_id = -1;
@@ -540,6 +541,14 @@ module_param_named(compute_multipipe, amdgpu_compute_multipipe, int, 0444);
MODULE_PARM_DESC(gpu_recovery, "Enable GPU recovery mechanism, (1 = enable, 0 = disable, -1 = auto)");
module_param_named(gpu_recovery, amdgpu_gpu_recovery, int, 0444);
+/**
+ * DOC: gpu_soft_recovery (bool)
+ * Set true to allow the driver to try soft recoveries if a job get stuck. Set
+ * to false to always force a GPU reset during recovery.
+ */
+MODULE_PARM_DESC(gpu_soft_recovery, "Enable GPU soft recovery mechanism (default: true)");
+module_param_named(gpu_soft_recovery, amdgpu_soft_recovery, bool, 0644);
+
/**
* DOC: emu_mode (int)
* Set value 1 to enable emulation mode. This is only needed when running on an emulator. The default is 0 (disabled).
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
index 80d6e132e409..40678d9fb17e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
@@ -434,8 +434,12 @@ bool amdgpu_ring_soft_recovery(struct amdgpu_ring *ring, unsigned int vmid,
struct dma_fence *fence)
{
unsigned long flags;
+ ktime_t deadline;
- ktime_t deadline = ktime_add_us(ktime_get(), 10000);
+ if (!amdgpu_soft_recovery)
+ return false;
+
+ deadline = ktime_add_us(ktime_get(), 10000);
if (amdgpu_sriov_vf(ring->adev) || !ring->funcs->soft_recovery || !fence)
return false;
--
2.41.0
^ permalink raw reply related [flat|nested] 21+ messages in thread
* [PATCH v3 1/5] drm/amdgpu: Create a module param to disable soft recovery
@ 2023-07-14 16:11 ` André Almeida
0 siblings, 0 replies; 21+ messages in thread
From: André Almeida @ 2023-07-14 16:11 UTC (permalink / raw)
To: dri-devel, amd-gfx, linux-kernel
Cc: pierre-eric.pelloux-prayer, André Almeida,
'Marek Olšák',
Timur Kristóf, michel.daenzer, Samuel Pitoiset, kernel-dev,
Bas Nieuwenhuizen, alexander.deucher, christian.koenig
Create a module parameter to disable soft recoveries on amdgpu, making
every recovery go through the device reset path. This option makes
easier to force device resets for testing and debugging purposes.
Signed-off-by: André Almeida <andrealmeid@igalia.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu.h | 1 +
drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 9 +++++++++
drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c | 6 +++++-
3 files changed, 15 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index a84bd4a0c421..dbe062a087c5 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -189,6 +189,7 @@ extern uint amdgpu_force_long_training;
extern int amdgpu_lbpw;
extern int amdgpu_compute_multipipe;
extern int amdgpu_gpu_recovery;
+extern bool amdgpu_soft_recovery;
extern int amdgpu_emu_mode;
extern uint amdgpu_smu_memory_pool_size;
extern int amdgpu_smu_pptable_id;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index 3b711babd4e2..7c69f3169aa6 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -163,6 +163,7 @@ uint amdgpu_force_long_training;
int amdgpu_lbpw = -1;
int amdgpu_compute_multipipe = -1;
int amdgpu_gpu_recovery = -1; /* auto */
+bool amdgpu_soft_recovery = true;
int amdgpu_emu_mode;
uint amdgpu_smu_memory_pool_size;
int amdgpu_smu_pptable_id = -1;
@@ -540,6 +541,14 @@ module_param_named(compute_multipipe, amdgpu_compute_multipipe, int, 0444);
MODULE_PARM_DESC(gpu_recovery, "Enable GPU recovery mechanism, (1 = enable, 0 = disable, -1 = auto)");
module_param_named(gpu_recovery, amdgpu_gpu_recovery, int, 0444);
+/**
+ * DOC: gpu_soft_recovery (bool)
+ * Set true to allow the driver to try soft recoveries if a job get stuck. Set
+ * to false to always force a GPU reset during recovery.
+ */
+MODULE_PARM_DESC(gpu_soft_recovery, "Enable GPU soft recovery mechanism (default: true)");
+module_param_named(gpu_soft_recovery, amdgpu_soft_recovery, bool, 0644);
+
/**
* DOC: emu_mode (int)
* Set value 1 to enable emulation mode. This is only needed when running on an emulator. The default is 0 (disabled).
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
index 80d6e132e409..40678d9fb17e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
@@ -434,8 +434,12 @@ bool amdgpu_ring_soft_recovery(struct amdgpu_ring *ring, unsigned int vmid,
struct dma_fence *fence)
{
unsigned long flags;
+ ktime_t deadline;
- ktime_t deadline = ktime_add_us(ktime_get(), 10000);
+ if (!amdgpu_soft_recovery)
+ return false;
+
+ deadline = ktime_add_us(ktime_get(), 10000);
if (amdgpu_sriov_vf(ring->adev) || !ring->funcs->soft_recovery || !fence)
return false;
--
2.41.0
^ permalink raw reply related [flat|nested] 21+ messages in thread
* [PATCH v3 2/5] drm/amdgpu: Allocate coredump memory in a nonblocking way
2023-07-14 16:11 ` André Almeida
(?)
@ 2023-07-14 16:11 ` André Almeida
-1 siblings, 0 replies; 21+ messages in thread
From: André Almeida @ 2023-07-14 16:11 UTC (permalink / raw)
To: dri-devel, amd-gfx, linux-kernel
Cc: kernel-dev, alexander.deucher, christian.koenig,
pierre-eric.pelloux-prayer, 'Marek Olšák',
Samuel Pitoiset, Bas Nieuwenhuizen, Timur Kristóf,
michel.daenzer, André Almeida
During a GPU reset, a normal memory reclaim could block to reclaim
memory. Giving that coredump is a best effort mechanism, it shouldn't
disturb the reset path. Change its memory allocation flag to a
nonblocking one.
Signed-off-by: André Almeida <andrealmeid@igalia.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index e25f085ee886..a824f844a984 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -5011,7 +5011,7 @@ static void amdgpu_reset_capture_coredumpm(struct amdgpu_device *adev)
struct drm_device *dev = adev_to_drm(adev);
ktime_get_ts64(&adev->reset_time);
- dev_coredumpm(dev->dev, THIS_MODULE, adev, 0, GFP_KERNEL,
+ dev_coredumpm(dev->dev, THIS_MODULE, adev, 0, GFP_NOWAIT,
amdgpu_devcoredump_read, amdgpu_devcoredump_free);
}
#endif
--
2.41.0
^ permalink raw reply related [flat|nested] 21+ messages in thread
* [PATCH v3 2/5] drm/amdgpu: Allocate coredump memory in a nonblocking way
@ 2023-07-14 16:11 ` André Almeida
0 siblings, 0 replies; 21+ messages in thread
From: André Almeida @ 2023-07-14 16:11 UTC (permalink / raw)
To: dri-devel, amd-gfx, linux-kernel
Cc: pierre-eric.pelloux-prayer, André Almeida,
'Marek Olšák',
Timur Kristóf, michel.daenzer, Samuel Pitoiset, kernel-dev,
alexander.deucher, christian.koenig
During a GPU reset, a normal memory reclaim could block to reclaim
memory. Giving that coredump is a best effort mechanism, it shouldn't
disturb the reset path. Change its memory allocation flag to a
nonblocking one.
Signed-off-by: André Almeida <andrealmeid@igalia.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index e25f085ee886..a824f844a984 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -5011,7 +5011,7 @@ static void amdgpu_reset_capture_coredumpm(struct amdgpu_device *adev)
struct drm_device *dev = adev_to_drm(adev);
ktime_get_ts64(&adev->reset_time);
- dev_coredumpm(dev->dev, THIS_MODULE, adev, 0, GFP_KERNEL,
+ dev_coredumpm(dev->dev, THIS_MODULE, adev, 0, GFP_NOWAIT,
amdgpu_devcoredump_read, amdgpu_devcoredump_free);
}
#endif
--
2.41.0
^ permalink raw reply related [flat|nested] 21+ messages in thread
* [PATCH v3 2/5] drm/amdgpu: Allocate coredump memory in a nonblocking way
@ 2023-07-14 16:11 ` André Almeida
0 siblings, 0 replies; 21+ messages in thread
From: André Almeida @ 2023-07-14 16:11 UTC (permalink / raw)
To: dri-devel, amd-gfx, linux-kernel
Cc: pierre-eric.pelloux-prayer, André Almeida,
'Marek Olšák',
Timur Kristóf, michel.daenzer, Samuel Pitoiset, kernel-dev,
Bas Nieuwenhuizen, alexander.deucher, christian.koenig
During a GPU reset, a normal memory reclaim could block to reclaim
memory. Giving that coredump is a best effort mechanism, it shouldn't
disturb the reset path. Change its memory allocation flag to a
nonblocking one.
Signed-off-by: André Almeida <andrealmeid@igalia.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index e25f085ee886..a824f844a984 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -5011,7 +5011,7 @@ static void amdgpu_reset_capture_coredumpm(struct amdgpu_device *adev)
struct drm_device *dev = adev_to_drm(adev);
ktime_get_ts64(&adev->reset_time);
- dev_coredumpm(dev->dev, THIS_MODULE, adev, 0, GFP_KERNEL,
+ dev_coredumpm(dev->dev, THIS_MODULE, adev, 0, GFP_NOWAIT,
amdgpu_devcoredump_read, amdgpu_devcoredump_free);
}
#endif
--
2.41.0
^ permalink raw reply related [flat|nested] 21+ messages in thread
* [PATCH v3 3/5] drm/amdgpu: Rework coredump to use memory dynamically
2023-07-14 16:11 ` André Almeida
(?)
@ 2023-07-14 16:11 ` André Almeida
-1 siblings, 0 replies; 21+ messages in thread
From: André Almeida @ 2023-07-14 16:11 UTC (permalink / raw)
To: dri-devel, amd-gfx, linux-kernel
Cc: kernel-dev, alexander.deucher, christian.koenig,
pierre-eric.pelloux-prayer, 'Marek Olšák',
Samuel Pitoiset, Bas Nieuwenhuizen, Timur Kristóf,
michel.daenzer, André Almeida
Instead of storing coredump information inside amdgpu_device struct,
move if to a proper separated struct and allocate it dynamically. This
will make it easier to further expand the logged information.
Signed-off-by: André Almeida <andrealmeid@igalia.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu.h | 14 +++--
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 65 ++++++++++++++--------
2 files changed, 51 insertions(+), 28 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index dbe062a087c5..e1cc83a89d46 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -1068,11 +1068,6 @@ struct amdgpu_device {
uint32_t *reset_dump_reg_list;
uint32_t *reset_dump_reg_value;
int num_regs;
-#ifdef CONFIG_DEV_COREDUMP
- struct amdgpu_task_info reset_task_info;
- bool reset_vram_lost;
- struct timespec64 reset_time;
-#endif
bool scpm_enabled;
uint32_t scpm_status;
@@ -1085,6 +1080,15 @@ struct amdgpu_device {
uint32_t aid_mask;
};
+#ifdef CONFIG_DEV_COREDUMP
+struct amdgpu_coredump_info {
+ struct amdgpu_device *adev;
+ struct amdgpu_task_info reset_task_info;
+ struct timespec64 reset_time;
+ bool reset_vram_lost;
+};
+#endif
+
static inline struct amdgpu_device *drm_to_adev(struct drm_device *ddev)
{
return container_of(ddev, struct amdgpu_device, ddev);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index a824f844a984..e80670420586 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -4963,12 +4963,17 @@ static int amdgpu_reset_reg_dumps(struct amdgpu_device *adev)
return 0;
}
-#ifdef CONFIG_DEV_COREDUMP
+#ifndef CONFIG_DEV_COREDUMP
+static void amdgpu_coredump(struct amdgpu_device *adev, bool vram_lost,
+ struct amdgpu_reset_context *reset_context)
+{
+}
+#else
static ssize_t amdgpu_devcoredump_read(char *buffer, loff_t offset,
size_t count, void *data, size_t datalen)
{
struct drm_printer p;
- struct amdgpu_device *adev = data;
+ struct amdgpu_coredump_info *coredump = data;
struct drm_print_iterator iter;
int i;
@@ -4982,21 +4987,21 @@ static ssize_t amdgpu_devcoredump_read(char *buffer, loff_t offset,
drm_printf(&p, "**** AMDGPU Device Coredump ****\n");
drm_printf(&p, "kernel: " UTS_RELEASE "\n");
drm_printf(&p, "module: " KBUILD_MODNAME "\n");
- drm_printf(&p, "time: %lld.%09ld\n", adev->reset_time.tv_sec, adev->reset_time.tv_nsec);
- if (adev->reset_task_info.pid)
+ drm_printf(&p, "time: %lld.%09ld\n", coredump->reset_time.tv_sec, coredump->reset_time.tv_nsec);
+ if (coredump->reset_task_info.pid)
drm_printf(&p, "process_name: %s PID: %d\n",
- adev->reset_task_info.process_name,
- adev->reset_task_info.pid);
+ coredump->reset_task_info.process_name,
+ coredump->reset_task_info.pid);
- if (adev->reset_vram_lost)
+ if (coredump->reset_vram_lost)
drm_printf(&p, "VRAM is lost due to GPU reset!\n");
- if (adev->num_regs) {
+ if (coredump->adev->num_regs) {
drm_printf(&p, "AMDGPU register dumps:\nOffset: Value:\n");
- for (i = 0; i < adev->num_regs; i++)
+ for (i = 0; i < coredump->adev->num_regs; i++)
drm_printf(&p, "0x%08x: 0x%08x\n",
- adev->reset_dump_reg_list[i],
- adev->reset_dump_reg_value[i]);
+ coredump->adev->reset_dump_reg_list[i],
+ coredump->adev->reset_dump_reg_value[i]);
}
return count - iter.remain;
@@ -5004,14 +5009,34 @@ static ssize_t amdgpu_devcoredump_read(char *buffer, loff_t offset,
static void amdgpu_devcoredump_free(void *data)
{
+ kfree(data);
}
-static void amdgpu_reset_capture_coredumpm(struct amdgpu_device *adev)
+static void amdgpu_coredump(struct amdgpu_device *adev, bool vram_lost,
+ struct amdgpu_reset_context *reset_context)
{
+ struct amdgpu_coredump_info *coredump;
struct drm_device *dev = adev_to_drm(adev);
- ktime_get_ts64(&adev->reset_time);
- dev_coredumpm(dev->dev, THIS_MODULE, adev, 0, GFP_NOWAIT,
+ coredump = kmalloc(sizeof(*coredump), GFP_NOWAIT);
+
+ if (!coredump) {
+ DRM_ERROR("%s: failed to allocate memory for coredump\n", __func__);
+ return;
+ }
+
+ memset(coredump, 0, sizeof(*coredump));
+
+ coredump->reset_vram_lost = vram_lost;
+
+ if (reset_context->job && reset_context->job->vm)
+ coredump->reset_task_info = reset_context->job->vm->task_info;
+
+ coredump->adev = adev;
+
+ ktime_get_ts64(&coredump->reset_time);
+
+ dev_coredumpm(dev->dev, THIS_MODULE, coredump, 0, GFP_NOWAIT,
amdgpu_devcoredump_read, amdgpu_devcoredump_free);
}
#endif
@@ -5119,15 +5144,9 @@ int amdgpu_do_asic_reset(struct list_head *device_list_handle,
goto out;
vram_lost = amdgpu_device_check_vram_lost(tmp_adev);
-#ifdef CONFIG_DEV_COREDUMP
- tmp_adev->reset_vram_lost = vram_lost;
- memset(&tmp_adev->reset_task_info, 0,
- sizeof(tmp_adev->reset_task_info));
- if (reset_context->job && reset_context->job->vm)
- tmp_adev->reset_task_info =
- reset_context->job->vm->task_info;
- amdgpu_reset_capture_coredumpm(tmp_adev);
-#endif
+
+ amdgpu_coredump(tmp_adev, vram_lost, reset_context);
+
if (vram_lost) {
DRM_INFO("VRAM is lost due to GPU reset!\n");
amdgpu_inc_vram_lost(tmp_adev);
--
2.41.0
^ permalink raw reply related [flat|nested] 21+ messages in thread
* [PATCH v3 3/5] drm/amdgpu: Rework coredump to use memory dynamically
@ 2023-07-14 16:11 ` André Almeida
0 siblings, 0 replies; 21+ messages in thread
From: André Almeida @ 2023-07-14 16:11 UTC (permalink / raw)
To: dri-devel, amd-gfx, linux-kernel
Cc: pierre-eric.pelloux-prayer, André Almeida,
'Marek Olšák',
Timur Kristóf, michel.daenzer, Samuel Pitoiset, kernel-dev,
alexander.deucher, christian.koenig
Instead of storing coredump information inside amdgpu_device struct,
move if to a proper separated struct and allocate it dynamically. This
will make it easier to further expand the logged information.
Signed-off-by: André Almeida <andrealmeid@igalia.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu.h | 14 +++--
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 65 ++++++++++++++--------
2 files changed, 51 insertions(+), 28 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index dbe062a087c5..e1cc83a89d46 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -1068,11 +1068,6 @@ struct amdgpu_device {
uint32_t *reset_dump_reg_list;
uint32_t *reset_dump_reg_value;
int num_regs;
-#ifdef CONFIG_DEV_COREDUMP
- struct amdgpu_task_info reset_task_info;
- bool reset_vram_lost;
- struct timespec64 reset_time;
-#endif
bool scpm_enabled;
uint32_t scpm_status;
@@ -1085,6 +1080,15 @@ struct amdgpu_device {
uint32_t aid_mask;
};
+#ifdef CONFIG_DEV_COREDUMP
+struct amdgpu_coredump_info {
+ struct amdgpu_device *adev;
+ struct amdgpu_task_info reset_task_info;
+ struct timespec64 reset_time;
+ bool reset_vram_lost;
+};
+#endif
+
static inline struct amdgpu_device *drm_to_adev(struct drm_device *ddev)
{
return container_of(ddev, struct amdgpu_device, ddev);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index a824f844a984..e80670420586 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -4963,12 +4963,17 @@ static int amdgpu_reset_reg_dumps(struct amdgpu_device *adev)
return 0;
}
-#ifdef CONFIG_DEV_COREDUMP
+#ifndef CONFIG_DEV_COREDUMP
+static void amdgpu_coredump(struct amdgpu_device *adev, bool vram_lost,
+ struct amdgpu_reset_context *reset_context)
+{
+}
+#else
static ssize_t amdgpu_devcoredump_read(char *buffer, loff_t offset,
size_t count, void *data, size_t datalen)
{
struct drm_printer p;
- struct amdgpu_device *adev = data;
+ struct amdgpu_coredump_info *coredump = data;
struct drm_print_iterator iter;
int i;
@@ -4982,21 +4987,21 @@ static ssize_t amdgpu_devcoredump_read(char *buffer, loff_t offset,
drm_printf(&p, "**** AMDGPU Device Coredump ****\n");
drm_printf(&p, "kernel: " UTS_RELEASE "\n");
drm_printf(&p, "module: " KBUILD_MODNAME "\n");
- drm_printf(&p, "time: %lld.%09ld\n", adev->reset_time.tv_sec, adev->reset_time.tv_nsec);
- if (adev->reset_task_info.pid)
+ drm_printf(&p, "time: %lld.%09ld\n", coredump->reset_time.tv_sec, coredump->reset_time.tv_nsec);
+ if (coredump->reset_task_info.pid)
drm_printf(&p, "process_name: %s PID: %d\n",
- adev->reset_task_info.process_name,
- adev->reset_task_info.pid);
+ coredump->reset_task_info.process_name,
+ coredump->reset_task_info.pid);
- if (adev->reset_vram_lost)
+ if (coredump->reset_vram_lost)
drm_printf(&p, "VRAM is lost due to GPU reset!\n");
- if (adev->num_regs) {
+ if (coredump->adev->num_regs) {
drm_printf(&p, "AMDGPU register dumps:\nOffset: Value:\n");
- for (i = 0; i < adev->num_regs; i++)
+ for (i = 0; i < coredump->adev->num_regs; i++)
drm_printf(&p, "0x%08x: 0x%08x\n",
- adev->reset_dump_reg_list[i],
- adev->reset_dump_reg_value[i]);
+ coredump->adev->reset_dump_reg_list[i],
+ coredump->adev->reset_dump_reg_value[i]);
}
return count - iter.remain;
@@ -5004,14 +5009,34 @@ static ssize_t amdgpu_devcoredump_read(char *buffer, loff_t offset,
static void amdgpu_devcoredump_free(void *data)
{
+ kfree(data);
}
-static void amdgpu_reset_capture_coredumpm(struct amdgpu_device *adev)
+static void amdgpu_coredump(struct amdgpu_device *adev, bool vram_lost,
+ struct amdgpu_reset_context *reset_context)
{
+ struct amdgpu_coredump_info *coredump;
struct drm_device *dev = adev_to_drm(adev);
- ktime_get_ts64(&adev->reset_time);
- dev_coredumpm(dev->dev, THIS_MODULE, adev, 0, GFP_NOWAIT,
+ coredump = kmalloc(sizeof(*coredump), GFP_NOWAIT);
+
+ if (!coredump) {
+ DRM_ERROR("%s: failed to allocate memory for coredump\n", __func__);
+ return;
+ }
+
+ memset(coredump, 0, sizeof(*coredump));
+
+ coredump->reset_vram_lost = vram_lost;
+
+ if (reset_context->job && reset_context->job->vm)
+ coredump->reset_task_info = reset_context->job->vm->task_info;
+
+ coredump->adev = adev;
+
+ ktime_get_ts64(&coredump->reset_time);
+
+ dev_coredumpm(dev->dev, THIS_MODULE, coredump, 0, GFP_NOWAIT,
amdgpu_devcoredump_read, amdgpu_devcoredump_free);
}
#endif
@@ -5119,15 +5144,9 @@ int amdgpu_do_asic_reset(struct list_head *device_list_handle,
goto out;
vram_lost = amdgpu_device_check_vram_lost(tmp_adev);
-#ifdef CONFIG_DEV_COREDUMP
- tmp_adev->reset_vram_lost = vram_lost;
- memset(&tmp_adev->reset_task_info, 0,
- sizeof(tmp_adev->reset_task_info));
- if (reset_context->job && reset_context->job->vm)
- tmp_adev->reset_task_info =
- reset_context->job->vm->task_info;
- amdgpu_reset_capture_coredumpm(tmp_adev);
-#endif
+
+ amdgpu_coredump(tmp_adev, vram_lost, reset_context);
+
if (vram_lost) {
DRM_INFO("VRAM is lost due to GPU reset!\n");
amdgpu_inc_vram_lost(tmp_adev);
--
2.41.0
^ permalink raw reply related [flat|nested] 21+ messages in thread
* [PATCH v3 3/5] drm/amdgpu: Rework coredump to use memory dynamically
@ 2023-07-14 16:11 ` André Almeida
0 siblings, 0 replies; 21+ messages in thread
From: André Almeida @ 2023-07-14 16:11 UTC (permalink / raw)
To: dri-devel, amd-gfx, linux-kernel
Cc: pierre-eric.pelloux-prayer, André Almeida,
'Marek Olšák',
Timur Kristóf, michel.daenzer, Samuel Pitoiset, kernel-dev,
Bas Nieuwenhuizen, alexander.deucher, christian.koenig
Instead of storing coredump information inside amdgpu_device struct,
move if to a proper separated struct and allocate it dynamically. This
will make it easier to further expand the logged information.
Signed-off-by: André Almeida <andrealmeid@igalia.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu.h | 14 +++--
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 65 ++++++++++++++--------
2 files changed, 51 insertions(+), 28 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index dbe062a087c5..e1cc83a89d46 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -1068,11 +1068,6 @@ struct amdgpu_device {
uint32_t *reset_dump_reg_list;
uint32_t *reset_dump_reg_value;
int num_regs;
-#ifdef CONFIG_DEV_COREDUMP
- struct amdgpu_task_info reset_task_info;
- bool reset_vram_lost;
- struct timespec64 reset_time;
-#endif
bool scpm_enabled;
uint32_t scpm_status;
@@ -1085,6 +1080,15 @@ struct amdgpu_device {
uint32_t aid_mask;
};
+#ifdef CONFIG_DEV_COREDUMP
+struct amdgpu_coredump_info {
+ struct amdgpu_device *adev;
+ struct amdgpu_task_info reset_task_info;
+ struct timespec64 reset_time;
+ bool reset_vram_lost;
+};
+#endif
+
static inline struct amdgpu_device *drm_to_adev(struct drm_device *ddev)
{
return container_of(ddev, struct amdgpu_device, ddev);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index a824f844a984..e80670420586 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -4963,12 +4963,17 @@ static int amdgpu_reset_reg_dumps(struct amdgpu_device *adev)
return 0;
}
-#ifdef CONFIG_DEV_COREDUMP
+#ifndef CONFIG_DEV_COREDUMP
+static void amdgpu_coredump(struct amdgpu_device *adev, bool vram_lost,
+ struct amdgpu_reset_context *reset_context)
+{
+}
+#else
static ssize_t amdgpu_devcoredump_read(char *buffer, loff_t offset,
size_t count, void *data, size_t datalen)
{
struct drm_printer p;
- struct amdgpu_device *adev = data;
+ struct amdgpu_coredump_info *coredump = data;
struct drm_print_iterator iter;
int i;
@@ -4982,21 +4987,21 @@ static ssize_t amdgpu_devcoredump_read(char *buffer, loff_t offset,
drm_printf(&p, "**** AMDGPU Device Coredump ****\n");
drm_printf(&p, "kernel: " UTS_RELEASE "\n");
drm_printf(&p, "module: " KBUILD_MODNAME "\n");
- drm_printf(&p, "time: %lld.%09ld\n", adev->reset_time.tv_sec, adev->reset_time.tv_nsec);
- if (adev->reset_task_info.pid)
+ drm_printf(&p, "time: %lld.%09ld\n", coredump->reset_time.tv_sec, coredump->reset_time.tv_nsec);
+ if (coredump->reset_task_info.pid)
drm_printf(&p, "process_name: %s PID: %d\n",
- adev->reset_task_info.process_name,
- adev->reset_task_info.pid);
+ coredump->reset_task_info.process_name,
+ coredump->reset_task_info.pid);
- if (adev->reset_vram_lost)
+ if (coredump->reset_vram_lost)
drm_printf(&p, "VRAM is lost due to GPU reset!\n");
- if (adev->num_regs) {
+ if (coredump->adev->num_regs) {
drm_printf(&p, "AMDGPU register dumps:\nOffset: Value:\n");
- for (i = 0; i < adev->num_regs; i++)
+ for (i = 0; i < coredump->adev->num_regs; i++)
drm_printf(&p, "0x%08x: 0x%08x\n",
- adev->reset_dump_reg_list[i],
- adev->reset_dump_reg_value[i]);
+ coredump->adev->reset_dump_reg_list[i],
+ coredump->adev->reset_dump_reg_value[i]);
}
return count - iter.remain;
@@ -5004,14 +5009,34 @@ static ssize_t amdgpu_devcoredump_read(char *buffer, loff_t offset,
static void amdgpu_devcoredump_free(void *data)
{
+ kfree(data);
}
-static void amdgpu_reset_capture_coredumpm(struct amdgpu_device *adev)
+static void amdgpu_coredump(struct amdgpu_device *adev, bool vram_lost,
+ struct amdgpu_reset_context *reset_context)
{
+ struct amdgpu_coredump_info *coredump;
struct drm_device *dev = adev_to_drm(adev);
- ktime_get_ts64(&adev->reset_time);
- dev_coredumpm(dev->dev, THIS_MODULE, adev, 0, GFP_NOWAIT,
+ coredump = kmalloc(sizeof(*coredump), GFP_NOWAIT);
+
+ if (!coredump) {
+ DRM_ERROR("%s: failed to allocate memory for coredump\n", __func__);
+ return;
+ }
+
+ memset(coredump, 0, sizeof(*coredump));
+
+ coredump->reset_vram_lost = vram_lost;
+
+ if (reset_context->job && reset_context->job->vm)
+ coredump->reset_task_info = reset_context->job->vm->task_info;
+
+ coredump->adev = adev;
+
+ ktime_get_ts64(&coredump->reset_time);
+
+ dev_coredumpm(dev->dev, THIS_MODULE, coredump, 0, GFP_NOWAIT,
amdgpu_devcoredump_read, amdgpu_devcoredump_free);
}
#endif
@@ -5119,15 +5144,9 @@ int amdgpu_do_asic_reset(struct list_head *device_list_handle,
goto out;
vram_lost = amdgpu_device_check_vram_lost(tmp_adev);
-#ifdef CONFIG_DEV_COREDUMP
- tmp_adev->reset_vram_lost = vram_lost;
- memset(&tmp_adev->reset_task_info, 0,
- sizeof(tmp_adev->reset_task_info));
- if (reset_context->job && reset_context->job->vm)
- tmp_adev->reset_task_info =
- reset_context->job->vm->task_info;
- amdgpu_reset_capture_coredumpm(tmp_adev);
-#endif
+
+ amdgpu_coredump(tmp_adev, vram_lost, reset_context);
+
if (vram_lost) {
DRM_INFO("VRAM is lost due to GPU reset!\n");
amdgpu_inc_vram_lost(tmp_adev);
--
2.41.0
^ permalink raw reply related [flat|nested] 21+ messages in thread
* [PATCH v3 4/5] drm/amdgpu: Move coredump code to amdgpu_reset file
2023-07-14 16:11 ` André Almeida
(?)
@ 2023-07-14 16:11 ` André Almeida
-1 siblings, 0 replies; 21+ messages in thread
From: André Almeida @ 2023-07-14 16:11 UTC (permalink / raw)
To: dri-devel, amd-gfx, linux-kernel
Cc: kernel-dev, alexander.deucher, christian.koenig,
pierre-eric.pelloux-prayer, 'Marek Olšák',
Samuel Pitoiset, Bas Nieuwenhuizen, Timur Kristóf,
michel.daenzer, André Almeida
Giving that we use codedump just for device resets, move it's functions
and structs to a more semantic file, the amdgpu_reset.{c, h}.
Signed-off-by: André Almeida <andrealmeid@igalia.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu.h | 9 ---
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 80 ----------------------
drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c | 78 +++++++++++++++++++++
drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h | 11 +++
4 files changed, 89 insertions(+), 89 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index e1cc83a89d46..1e76cb38a554 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -1080,15 +1080,6 @@ struct amdgpu_device {
uint32_t aid_mask;
};
-#ifdef CONFIG_DEV_COREDUMP
-struct amdgpu_coredump_info {
- struct amdgpu_device *adev;
- struct amdgpu_task_info reset_task_info;
- struct timespec64 reset_time;
- bool reset_vram_lost;
-};
-#endif
-
static inline struct amdgpu_device *drm_to_adev(struct drm_device *ddev)
{
return container_of(ddev, struct amdgpu_device, ddev);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index e80670420586..e84d499aaf4a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -32,8 +32,6 @@
#include <linux/slab.h>
#include <linux/iommu.h>
#include <linux/pci.h>
-#include <linux/devcoredump.h>
-#include <generated/utsrelease.h>
#include <linux/pci-p2pdma.h>
#include <linux/apple-gmux.h>
@@ -4963,84 +4961,6 @@ static int amdgpu_reset_reg_dumps(struct amdgpu_device *adev)
return 0;
}
-#ifndef CONFIG_DEV_COREDUMP
-static void amdgpu_coredump(struct amdgpu_device *adev, bool vram_lost,
- struct amdgpu_reset_context *reset_context)
-{
-}
-#else
-static ssize_t amdgpu_devcoredump_read(char *buffer, loff_t offset,
- size_t count, void *data, size_t datalen)
-{
- struct drm_printer p;
- struct amdgpu_coredump_info *coredump = data;
- struct drm_print_iterator iter;
- int i;
-
- iter.data = buffer;
- iter.offset = 0;
- iter.start = offset;
- iter.remain = count;
-
- p = drm_coredump_printer(&iter);
-
- drm_printf(&p, "**** AMDGPU Device Coredump ****\n");
- drm_printf(&p, "kernel: " UTS_RELEASE "\n");
- drm_printf(&p, "module: " KBUILD_MODNAME "\n");
- drm_printf(&p, "time: %lld.%09ld\n", coredump->reset_time.tv_sec, coredump->reset_time.tv_nsec);
- if (coredump->reset_task_info.pid)
- drm_printf(&p, "process_name: %s PID: %d\n",
- coredump->reset_task_info.process_name,
- coredump->reset_task_info.pid);
-
- if (coredump->reset_vram_lost)
- drm_printf(&p, "VRAM is lost due to GPU reset!\n");
- if (coredump->adev->num_regs) {
- drm_printf(&p, "AMDGPU register dumps:\nOffset: Value:\n");
-
- for (i = 0; i < coredump->adev->num_regs; i++)
- drm_printf(&p, "0x%08x: 0x%08x\n",
- coredump->adev->reset_dump_reg_list[i],
- coredump->adev->reset_dump_reg_value[i]);
- }
-
- return count - iter.remain;
-}
-
-static void amdgpu_devcoredump_free(void *data)
-{
- kfree(data);
-}
-
-static void amdgpu_coredump(struct amdgpu_device *adev, bool vram_lost,
- struct amdgpu_reset_context *reset_context)
-{
- struct amdgpu_coredump_info *coredump;
- struct drm_device *dev = adev_to_drm(adev);
-
- coredump = kmalloc(sizeof(*coredump), GFP_NOWAIT);
-
- if (!coredump) {
- DRM_ERROR("%s: failed to allocate memory for coredump\n", __func__);
- return;
- }
-
- memset(coredump, 0, sizeof(*coredump));
-
- coredump->reset_vram_lost = vram_lost;
-
- if (reset_context->job && reset_context->job->vm)
- coredump->reset_task_info = reset_context->job->vm->task_info;
-
- coredump->adev = adev;
-
- ktime_get_ts64(&coredump->reset_time);
-
- dev_coredumpm(dev->dev, THIS_MODULE, coredump, 0, GFP_NOWAIT,
- amdgpu_devcoredump_read, amdgpu_devcoredump_free);
-}
-#endif
-
int amdgpu_do_asic_reset(struct list_head *device_list_handle,
struct amdgpu_reset_context *reset_context)
{
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c
index eec41ad30406..081cdf3bc267 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c
@@ -21,6 +21,9 @@
*
*/
+#include <linux/devcoredump.h>
+#include <generated/utsrelease.h>
+
#include "amdgpu_reset.h"
#include "aldebaran.h"
#include "sienna_cichlid.h"
@@ -167,5 +170,80 @@ void amdgpu_device_unlock_reset_domain(struct amdgpu_reset_domain *reset_domain)
up_write(&reset_domain->sem);
}
+#ifndef CONFIG_DEV_COREDUMP
+void amdgpu_coredump(struct amdgpu_device *adev, bool vram_lost,
+ struct amdgpu_reset_context *reset_context)
+{
+}
+#else
+static ssize_t amdgpu_devcoredump_read(char *buffer, loff_t offset,
+ size_t count, void *data, size_t datalen)
+{
+ struct drm_printer p;
+ struct amdgpu_coredump_info *coredump = data;
+ struct drm_print_iterator iter;
+ int i;
+
+ iter.data = buffer;
+ iter.offset = 0;
+ iter.start = offset;
+ iter.remain = count;
+
+ p = drm_coredump_printer(&iter);
+
+ drm_printf(&p, "**** AMDGPU Device Coredump ****\n");
+ drm_printf(&p, "kernel: " UTS_RELEASE "\n");
+ drm_printf(&p, "module: " KBUILD_MODNAME "\n");
+ drm_printf(&p, "time: %lld.%09ld\n", coredump->reset_time.tv_sec, coredump->reset_time.tv_nsec);
+ if (coredump->reset_task_info.pid)
+ drm_printf(&p, "process_name: %s PID: %d\n",
+ coredump->reset_task_info.process_name,
+ coredump->reset_task_info.pid);
+
+ if (coredump->reset_vram_lost)
+ drm_printf(&p, "VRAM is lost due to GPU reset!\n");
+ if (coredump->adev->num_regs) {
+ drm_printf(&p, "AMDGPU register dumps:\nOffset: Value:\n");
+
+ for (i = 0; i < coredump->adev->num_regs; i++)
+ drm_printf(&p, "0x%08x: 0x%08x\n",
+ coredump->adev->reset_dump_reg_list[i],
+ coredump->adev->reset_dump_reg_value[i]);
+ }
+
+ return count - iter.remain;
+}
+
+static void amdgpu_devcoredump_free(void *data)
+{
+ kfree(data);
+}
+
+void amdgpu_coredump(struct amdgpu_device *adev, bool vram_lost,
+ struct amdgpu_reset_context *reset_context)
+{
+ struct amdgpu_coredump_info *coredump;
+ struct drm_device *dev = adev_to_drm(adev);
+
+ coredump = kmalloc(sizeof(*coredump), GFP_NOWAIT);
+
+ if (!coredump) {
+ DRM_ERROR("%s: failed to allocate memory for coredump\n", __func__);
+ return;
+ }
+
+ memset(coredump, 0, sizeof(*coredump));
+
+ coredump->reset_vram_lost = vram_lost;
+
+ if (reset_context->job && reset_context->job->vm)
+ coredump->reset_task_info = reset_context->job->vm->task_info;
+ coredump->adev = adev;
+ ktime_get_ts64(&coredump->reset_time);
+
+ dev_coredumpm(dev->dev, THIS_MODULE, coredump, 0, GFP_NOWAIT,
+ amdgpu_devcoredump_read, amdgpu_devcoredump_free);
+}
+#endif
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h
index f4a501ff87d9..362954521721 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h
@@ -87,6 +87,15 @@ struct amdgpu_reset_domain {
atomic_t reset_res;
};
+#ifdef CONFIG_DEV_COREDUMP
+struct amdgpu_coredump_info {
+ struct amdgpu_device *adev;
+ struct amdgpu_task_info reset_task_info;
+ struct timespec64 reset_time;
+ bool reset_vram_lost;
+};
+#endif
+
int amdgpu_reset_init(struct amdgpu_device *adev);
int amdgpu_reset_fini(struct amdgpu_device *adev);
@@ -126,4 +135,6 @@ void amdgpu_device_lock_reset_domain(struct amdgpu_reset_domain *reset_domain);
void amdgpu_device_unlock_reset_domain(struct amdgpu_reset_domain *reset_domain);
+void amdgpu_coredump(struct amdgpu_device *adev, bool vram_lost,
+ struct amdgpu_reset_context *reset_context);
#endif
--
2.41.0
^ permalink raw reply related [flat|nested] 21+ messages in thread
* [PATCH v3 4/5] drm/amdgpu: Move coredump code to amdgpu_reset file
@ 2023-07-14 16:11 ` André Almeida
0 siblings, 0 replies; 21+ messages in thread
From: André Almeida @ 2023-07-14 16:11 UTC (permalink / raw)
To: dri-devel, amd-gfx, linux-kernel
Cc: pierre-eric.pelloux-prayer, André Almeida,
'Marek Olšák',
Timur Kristóf, michel.daenzer, Samuel Pitoiset, kernel-dev,
alexander.deucher, christian.koenig
Giving that we use codedump just for device resets, move it's functions
and structs to a more semantic file, the amdgpu_reset.{c, h}.
Signed-off-by: André Almeida <andrealmeid@igalia.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu.h | 9 ---
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 80 ----------------------
drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c | 78 +++++++++++++++++++++
drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h | 11 +++
4 files changed, 89 insertions(+), 89 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index e1cc83a89d46..1e76cb38a554 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -1080,15 +1080,6 @@ struct amdgpu_device {
uint32_t aid_mask;
};
-#ifdef CONFIG_DEV_COREDUMP
-struct amdgpu_coredump_info {
- struct amdgpu_device *adev;
- struct amdgpu_task_info reset_task_info;
- struct timespec64 reset_time;
- bool reset_vram_lost;
-};
-#endif
-
static inline struct amdgpu_device *drm_to_adev(struct drm_device *ddev)
{
return container_of(ddev, struct amdgpu_device, ddev);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index e80670420586..e84d499aaf4a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -32,8 +32,6 @@
#include <linux/slab.h>
#include <linux/iommu.h>
#include <linux/pci.h>
-#include <linux/devcoredump.h>
-#include <generated/utsrelease.h>
#include <linux/pci-p2pdma.h>
#include <linux/apple-gmux.h>
@@ -4963,84 +4961,6 @@ static int amdgpu_reset_reg_dumps(struct amdgpu_device *adev)
return 0;
}
-#ifndef CONFIG_DEV_COREDUMP
-static void amdgpu_coredump(struct amdgpu_device *adev, bool vram_lost,
- struct amdgpu_reset_context *reset_context)
-{
-}
-#else
-static ssize_t amdgpu_devcoredump_read(char *buffer, loff_t offset,
- size_t count, void *data, size_t datalen)
-{
- struct drm_printer p;
- struct amdgpu_coredump_info *coredump = data;
- struct drm_print_iterator iter;
- int i;
-
- iter.data = buffer;
- iter.offset = 0;
- iter.start = offset;
- iter.remain = count;
-
- p = drm_coredump_printer(&iter);
-
- drm_printf(&p, "**** AMDGPU Device Coredump ****\n");
- drm_printf(&p, "kernel: " UTS_RELEASE "\n");
- drm_printf(&p, "module: " KBUILD_MODNAME "\n");
- drm_printf(&p, "time: %lld.%09ld\n", coredump->reset_time.tv_sec, coredump->reset_time.tv_nsec);
- if (coredump->reset_task_info.pid)
- drm_printf(&p, "process_name: %s PID: %d\n",
- coredump->reset_task_info.process_name,
- coredump->reset_task_info.pid);
-
- if (coredump->reset_vram_lost)
- drm_printf(&p, "VRAM is lost due to GPU reset!\n");
- if (coredump->adev->num_regs) {
- drm_printf(&p, "AMDGPU register dumps:\nOffset: Value:\n");
-
- for (i = 0; i < coredump->adev->num_regs; i++)
- drm_printf(&p, "0x%08x: 0x%08x\n",
- coredump->adev->reset_dump_reg_list[i],
- coredump->adev->reset_dump_reg_value[i]);
- }
-
- return count - iter.remain;
-}
-
-static void amdgpu_devcoredump_free(void *data)
-{
- kfree(data);
-}
-
-static void amdgpu_coredump(struct amdgpu_device *adev, bool vram_lost,
- struct amdgpu_reset_context *reset_context)
-{
- struct amdgpu_coredump_info *coredump;
- struct drm_device *dev = adev_to_drm(adev);
-
- coredump = kmalloc(sizeof(*coredump), GFP_NOWAIT);
-
- if (!coredump) {
- DRM_ERROR("%s: failed to allocate memory for coredump\n", __func__);
- return;
- }
-
- memset(coredump, 0, sizeof(*coredump));
-
- coredump->reset_vram_lost = vram_lost;
-
- if (reset_context->job && reset_context->job->vm)
- coredump->reset_task_info = reset_context->job->vm->task_info;
-
- coredump->adev = adev;
-
- ktime_get_ts64(&coredump->reset_time);
-
- dev_coredumpm(dev->dev, THIS_MODULE, coredump, 0, GFP_NOWAIT,
- amdgpu_devcoredump_read, amdgpu_devcoredump_free);
-}
-#endif
-
int amdgpu_do_asic_reset(struct list_head *device_list_handle,
struct amdgpu_reset_context *reset_context)
{
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c
index eec41ad30406..081cdf3bc267 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c
@@ -21,6 +21,9 @@
*
*/
+#include <linux/devcoredump.h>
+#include <generated/utsrelease.h>
+
#include "amdgpu_reset.h"
#include "aldebaran.h"
#include "sienna_cichlid.h"
@@ -167,5 +170,80 @@ void amdgpu_device_unlock_reset_domain(struct amdgpu_reset_domain *reset_domain)
up_write(&reset_domain->sem);
}
+#ifndef CONFIG_DEV_COREDUMP
+void amdgpu_coredump(struct amdgpu_device *adev, bool vram_lost,
+ struct amdgpu_reset_context *reset_context)
+{
+}
+#else
+static ssize_t amdgpu_devcoredump_read(char *buffer, loff_t offset,
+ size_t count, void *data, size_t datalen)
+{
+ struct drm_printer p;
+ struct amdgpu_coredump_info *coredump = data;
+ struct drm_print_iterator iter;
+ int i;
+
+ iter.data = buffer;
+ iter.offset = 0;
+ iter.start = offset;
+ iter.remain = count;
+
+ p = drm_coredump_printer(&iter);
+
+ drm_printf(&p, "**** AMDGPU Device Coredump ****\n");
+ drm_printf(&p, "kernel: " UTS_RELEASE "\n");
+ drm_printf(&p, "module: " KBUILD_MODNAME "\n");
+ drm_printf(&p, "time: %lld.%09ld\n", coredump->reset_time.tv_sec, coredump->reset_time.tv_nsec);
+ if (coredump->reset_task_info.pid)
+ drm_printf(&p, "process_name: %s PID: %d\n",
+ coredump->reset_task_info.process_name,
+ coredump->reset_task_info.pid);
+
+ if (coredump->reset_vram_lost)
+ drm_printf(&p, "VRAM is lost due to GPU reset!\n");
+ if (coredump->adev->num_regs) {
+ drm_printf(&p, "AMDGPU register dumps:\nOffset: Value:\n");
+
+ for (i = 0; i < coredump->adev->num_regs; i++)
+ drm_printf(&p, "0x%08x: 0x%08x\n",
+ coredump->adev->reset_dump_reg_list[i],
+ coredump->adev->reset_dump_reg_value[i]);
+ }
+
+ return count - iter.remain;
+}
+
+static void amdgpu_devcoredump_free(void *data)
+{
+ kfree(data);
+}
+
+void amdgpu_coredump(struct amdgpu_device *adev, bool vram_lost,
+ struct amdgpu_reset_context *reset_context)
+{
+ struct amdgpu_coredump_info *coredump;
+ struct drm_device *dev = adev_to_drm(adev);
+
+ coredump = kmalloc(sizeof(*coredump), GFP_NOWAIT);
+
+ if (!coredump) {
+ DRM_ERROR("%s: failed to allocate memory for coredump\n", __func__);
+ return;
+ }
+
+ memset(coredump, 0, sizeof(*coredump));
+
+ coredump->reset_vram_lost = vram_lost;
+
+ if (reset_context->job && reset_context->job->vm)
+ coredump->reset_task_info = reset_context->job->vm->task_info;
+ coredump->adev = adev;
+ ktime_get_ts64(&coredump->reset_time);
+
+ dev_coredumpm(dev->dev, THIS_MODULE, coredump, 0, GFP_NOWAIT,
+ amdgpu_devcoredump_read, amdgpu_devcoredump_free);
+}
+#endif
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h
index f4a501ff87d9..362954521721 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h
@@ -87,6 +87,15 @@ struct amdgpu_reset_domain {
atomic_t reset_res;
};
+#ifdef CONFIG_DEV_COREDUMP
+struct amdgpu_coredump_info {
+ struct amdgpu_device *adev;
+ struct amdgpu_task_info reset_task_info;
+ struct timespec64 reset_time;
+ bool reset_vram_lost;
+};
+#endif
+
int amdgpu_reset_init(struct amdgpu_device *adev);
int amdgpu_reset_fini(struct amdgpu_device *adev);
@@ -126,4 +135,6 @@ void amdgpu_device_lock_reset_domain(struct amdgpu_reset_domain *reset_domain);
void amdgpu_device_unlock_reset_domain(struct amdgpu_reset_domain *reset_domain);
+void amdgpu_coredump(struct amdgpu_device *adev, bool vram_lost,
+ struct amdgpu_reset_context *reset_context);
#endif
--
2.41.0
^ permalink raw reply related [flat|nested] 21+ messages in thread
* [PATCH v3 4/5] drm/amdgpu: Move coredump code to amdgpu_reset file
@ 2023-07-14 16:11 ` André Almeida
0 siblings, 0 replies; 21+ messages in thread
From: André Almeida @ 2023-07-14 16:11 UTC (permalink / raw)
To: dri-devel, amd-gfx, linux-kernel
Cc: pierre-eric.pelloux-prayer, André Almeida,
'Marek Olšák',
Timur Kristóf, michel.daenzer, Samuel Pitoiset, kernel-dev,
Bas Nieuwenhuizen, alexander.deucher, christian.koenig
Giving that we use codedump just for device resets, move it's functions
and structs to a more semantic file, the amdgpu_reset.{c, h}.
Signed-off-by: André Almeida <andrealmeid@igalia.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu.h | 9 ---
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 80 ----------------------
drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c | 78 +++++++++++++++++++++
drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h | 11 +++
4 files changed, 89 insertions(+), 89 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index e1cc83a89d46..1e76cb38a554 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -1080,15 +1080,6 @@ struct amdgpu_device {
uint32_t aid_mask;
};
-#ifdef CONFIG_DEV_COREDUMP
-struct amdgpu_coredump_info {
- struct amdgpu_device *adev;
- struct amdgpu_task_info reset_task_info;
- struct timespec64 reset_time;
- bool reset_vram_lost;
-};
-#endif
-
static inline struct amdgpu_device *drm_to_adev(struct drm_device *ddev)
{
return container_of(ddev, struct amdgpu_device, ddev);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index e80670420586..e84d499aaf4a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -32,8 +32,6 @@
#include <linux/slab.h>
#include <linux/iommu.h>
#include <linux/pci.h>
-#include <linux/devcoredump.h>
-#include <generated/utsrelease.h>
#include <linux/pci-p2pdma.h>
#include <linux/apple-gmux.h>
@@ -4963,84 +4961,6 @@ static int amdgpu_reset_reg_dumps(struct amdgpu_device *adev)
return 0;
}
-#ifndef CONFIG_DEV_COREDUMP
-static void amdgpu_coredump(struct amdgpu_device *adev, bool vram_lost,
- struct amdgpu_reset_context *reset_context)
-{
-}
-#else
-static ssize_t amdgpu_devcoredump_read(char *buffer, loff_t offset,
- size_t count, void *data, size_t datalen)
-{
- struct drm_printer p;
- struct amdgpu_coredump_info *coredump = data;
- struct drm_print_iterator iter;
- int i;
-
- iter.data = buffer;
- iter.offset = 0;
- iter.start = offset;
- iter.remain = count;
-
- p = drm_coredump_printer(&iter);
-
- drm_printf(&p, "**** AMDGPU Device Coredump ****\n");
- drm_printf(&p, "kernel: " UTS_RELEASE "\n");
- drm_printf(&p, "module: " KBUILD_MODNAME "\n");
- drm_printf(&p, "time: %lld.%09ld\n", coredump->reset_time.tv_sec, coredump->reset_time.tv_nsec);
- if (coredump->reset_task_info.pid)
- drm_printf(&p, "process_name: %s PID: %d\n",
- coredump->reset_task_info.process_name,
- coredump->reset_task_info.pid);
-
- if (coredump->reset_vram_lost)
- drm_printf(&p, "VRAM is lost due to GPU reset!\n");
- if (coredump->adev->num_regs) {
- drm_printf(&p, "AMDGPU register dumps:\nOffset: Value:\n");
-
- for (i = 0; i < coredump->adev->num_regs; i++)
- drm_printf(&p, "0x%08x: 0x%08x\n",
- coredump->adev->reset_dump_reg_list[i],
- coredump->adev->reset_dump_reg_value[i]);
- }
-
- return count - iter.remain;
-}
-
-static void amdgpu_devcoredump_free(void *data)
-{
- kfree(data);
-}
-
-static void amdgpu_coredump(struct amdgpu_device *adev, bool vram_lost,
- struct amdgpu_reset_context *reset_context)
-{
- struct amdgpu_coredump_info *coredump;
- struct drm_device *dev = adev_to_drm(adev);
-
- coredump = kmalloc(sizeof(*coredump), GFP_NOWAIT);
-
- if (!coredump) {
- DRM_ERROR("%s: failed to allocate memory for coredump\n", __func__);
- return;
- }
-
- memset(coredump, 0, sizeof(*coredump));
-
- coredump->reset_vram_lost = vram_lost;
-
- if (reset_context->job && reset_context->job->vm)
- coredump->reset_task_info = reset_context->job->vm->task_info;
-
- coredump->adev = adev;
-
- ktime_get_ts64(&coredump->reset_time);
-
- dev_coredumpm(dev->dev, THIS_MODULE, coredump, 0, GFP_NOWAIT,
- amdgpu_devcoredump_read, amdgpu_devcoredump_free);
-}
-#endif
-
int amdgpu_do_asic_reset(struct list_head *device_list_handle,
struct amdgpu_reset_context *reset_context)
{
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c
index eec41ad30406..081cdf3bc267 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c
@@ -21,6 +21,9 @@
*
*/
+#include <linux/devcoredump.h>
+#include <generated/utsrelease.h>
+
#include "amdgpu_reset.h"
#include "aldebaran.h"
#include "sienna_cichlid.h"
@@ -167,5 +170,80 @@ void amdgpu_device_unlock_reset_domain(struct amdgpu_reset_domain *reset_domain)
up_write(&reset_domain->sem);
}
+#ifndef CONFIG_DEV_COREDUMP
+void amdgpu_coredump(struct amdgpu_device *adev, bool vram_lost,
+ struct amdgpu_reset_context *reset_context)
+{
+}
+#else
+static ssize_t amdgpu_devcoredump_read(char *buffer, loff_t offset,
+ size_t count, void *data, size_t datalen)
+{
+ struct drm_printer p;
+ struct amdgpu_coredump_info *coredump = data;
+ struct drm_print_iterator iter;
+ int i;
+
+ iter.data = buffer;
+ iter.offset = 0;
+ iter.start = offset;
+ iter.remain = count;
+
+ p = drm_coredump_printer(&iter);
+
+ drm_printf(&p, "**** AMDGPU Device Coredump ****\n");
+ drm_printf(&p, "kernel: " UTS_RELEASE "\n");
+ drm_printf(&p, "module: " KBUILD_MODNAME "\n");
+ drm_printf(&p, "time: %lld.%09ld\n", coredump->reset_time.tv_sec, coredump->reset_time.tv_nsec);
+ if (coredump->reset_task_info.pid)
+ drm_printf(&p, "process_name: %s PID: %d\n",
+ coredump->reset_task_info.process_name,
+ coredump->reset_task_info.pid);
+
+ if (coredump->reset_vram_lost)
+ drm_printf(&p, "VRAM is lost due to GPU reset!\n");
+ if (coredump->adev->num_regs) {
+ drm_printf(&p, "AMDGPU register dumps:\nOffset: Value:\n");
+
+ for (i = 0; i < coredump->adev->num_regs; i++)
+ drm_printf(&p, "0x%08x: 0x%08x\n",
+ coredump->adev->reset_dump_reg_list[i],
+ coredump->adev->reset_dump_reg_value[i]);
+ }
+
+ return count - iter.remain;
+}
+
+static void amdgpu_devcoredump_free(void *data)
+{
+ kfree(data);
+}
+
+void amdgpu_coredump(struct amdgpu_device *adev, bool vram_lost,
+ struct amdgpu_reset_context *reset_context)
+{
+ struct amdgpu_coredump_info *coredump;
+ struct drm_device *dev = adev_to_drm(adev);
+
+ coredump = kmalloc(sizeof(*coredump), GFP_NOWAIT);
+
+ if (!coredump) {
+ DRM_ERROR("%s: failed to allocate memory for coredump\n", __func__);
+ return;
+ }
+
+ memset(coredump, 0, sizeof(*coredump));
+
+ coredump->reset_vram_lost = vram_lost;
+
+ if (reset_context->job && reset_context->job->vm)
+ coredump->reset_task_info = reset_context->job->vm->task_info;
+ coredump->adev = adev;
+ ktime_get_ts64(&coredump->reset_time);
+
+ dev_coredumpm(dev->dev, THIS_MODULE, coredump, 0, GFP_NOWAIT,
+ amdgpu_devcoredump_read, amdgpu_devcoredump_free);
+}
+#endif
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h
index f4a501ff87d9..362954521721 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h
@@ -87,6 +87,15 @@ struct amdgpu_reset_domain {
atomic_t reset_res;
};
+#ifdef CONFIG_DEV_COREDUMP
+struct amdgpu_coredump_info {
+ struct amdgpu_device *adev;
+ struct amdgpu_task_info reset_task_info;
+ struct timespec64 reset_time;
+ bool reset_vram_lost;
+};
+#endif
+
int amdgpu_reset_init(struct amdgpu_device *adev);
int amdgpu_reset_fini(struct amdgpu_device *adev);
@@ -126,4 +135,6 @@ void amdgpu_device_lock_reset_domain(struct amdgpu_reset_domain *reset_domain);
void amdgpu_device_unlock_reset_domain(struct amdgpu_reset_domain *reset_domain);
+void amdgpu_coredump(struct amdgpu_device *adev, bool vram_lost,
+ struct amdgpu_reset_context *reset_context);
#endif
--
2.41.0
^ permalink raw reply related [flat|nested] 21+ messages in thread
* [PATCH v3 5/5] drm/amdgpu: Create version number for coredumps
2023-07-14 16:11 ` André Almeida
(?)
@ 2023-07-14 16:11 ` André Almeida
-1 siblings, 0 replies; 21+ messages in thread
From: André Almeida @ 2023-07-14 16:11 UTC (permalink / raw)
To: dri-devel, amd-gfx, linux-kernel
Cc: kernel-dev, alexander.deucher, christian.koenig,
pierre-eric.pelloux-prayer, 'Marek Olšák',
Samuel Pitoiset, Bas Nieuwenhuizen, Timur Kristóf,
michel.daenzer, André Almeida
Even if there's nothing currently parsing amdgpu's coredump files, if
we eventually have such tools they will be glad to find a version field
to properly read the file.
Create a version number to be displayed on top of coredump file, to be
incremented when the file format or content get changed.
Signed-off-by: André Almeida <andrealmeid@igalia.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c | 1 +
drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h | 3 +++
2 files changed, 4 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c
index 081cdf3bc267..dab385f9dd80 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c
@@ -192,6 +192,7 @@ static ssize_t amdgpu_devcoredump_read(char *buffer, loff_t offset,
p = drm_coredump_printer(&iter);
drm_printf(&p, "**** AMDGPU Device Coredump ****\n");
+ drm_printf(&p, "version: " AMDGPU_COREDUMP_VERSION "\n");
drm_printf(&p, "kernel: " UTS_RELEASE "\n");
drm_printf(&p, "module: " KBUILD_MODNAME "\n");
drm_printf(&p, "time: %lld.%09ld\n", coredump->reset_time.tv_sec, coredump->reset_time.tv_nsec);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h
index 362954521721..7b6767ca8127 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h
@@ -88,6 +88,9 @@ struct amdgpu_reset_domain {
};
#ifdef CONFIG_DEV_COREDUMP
+
+#define AMDGPU_COREDUMP_VERSION "1"
+
struct amdgpu_coredump_info {
struct amdgpu_device *adev;
struct amdgpu_task_info reset_task_info;
--
2.41.0
^ permalink raw reply related [flat|nested] 21+ messages in thread
* [PATCH v3 5/5] drm/amdgpu: Create version number for coredumps
@ 2023-07-14 16:11 ` André Almeida
0 siblings, 0 replies; 21+ messages in thread
From: André Almeida @ 2023-07-14 16:11 UTC (permalink / raw)
To: dri-devel, amd-gfx, linux-kernel
Cc: pierre-eric.pelloux-prayer, André Almeida,
'Marek Olšák',
Timur Kristóf, michel.daenzer, Samuel Pitoiset, kernel-dev,
alexander.deucher, christian.koenig
Even if there's nothing currently parsing amdgpu's coredump files, if
we eventually have such tools they will be glad to find a version field
to properly read the file.
Create a version number to be displayed on top of coredump file, to be
incremented when the file format or content get changed.
Signed-off-by: André Almeida <andrealmeid@igalia.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c | 1 +
drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h | 3 +++
2 files changed, 4 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c
index 081cdf3bc267..dab385f9dd80 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c
@@ -192,6 +192,7 @@ static ssize_t amdgpu_devcoredump_read(char *buffer, loff_t offset,
p = drm_coredump_printer(&iter);
drm_printf(&p, "**** AMDGPU Device Coredump ****\n");
+ drm_printf(&p, "version: " AMDGPU_COREDUMP_VERSION "\n");
drm_printf(&p, "kernel: " UTS_RELEASE "\n");
drm_printf(&p, "module: " KBUILD_MODNAME "\n");
drm_printf(&p, "time: %lld.%09ld\n", coredump->reset_time.tv_sec, coredump->reset_time.tv_nsec);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h
index 362954521721..7b6767ca8127 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h
@@ -88,6 +88,9 @@ struct amdgpu_reset_domain {
};
#ifdef CONFIG_DEV_COREDUMP
+
+#define AMDGPU_COREDUMP_VERSION "1"
+
struct amdgpu_coredump_info {
struct amdgpu_device *adev;
struct amdgpu_task_info reset_task_info;
--
2.41.0
^ permalink raw reply related [flat|nested] 21+ messages in thread
* [PATCH v3 5/5] drm/amdgpu: Create version number for coredumps
@ 2023-07-14 16:11 ` André Almeida
0 siblings, 0 replies; 21+ messages in thread
From: André Almeida @ 2023-07-14 16:11 UTC (permalink / raw)
To: dri-devel, amd-gfx, linux-kernel
Cc: pierre-eric.pelloux-prayer, André Almeida,
'Marek Olšák',
Timur Kristóf, michel.daenzer, Samuel Pitoiset, kernel-dev,
Bas Nieuwenhuizen, alexander.deucher, christian.koenig
Even if there's nothing currently parsing amdgpu's coredump files, if
we eventually have such tools they will be glad to find a version field
to properly read the file.
Create a version number to be displayed on top of coredump file, to be
incremented when the file format or content get changed.
Signed-off-by: André Almeida <andrealmeid@igalia.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c | 1 +
drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h | 3 +++
2 files changed, 4 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c
index 081cdf3bc267..dab385f9dd80 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c
@@ -192,6 +192,7 @@ static ssize_t amdgpu_devcoredump_read(char *buffer, loff_t offset,
p = drm_coredump_printer(&iter);
drm_printf(&p, "**** AMDGPU Device Coredump ****\n");
+ drm_printf(&p, "version: " AMDGPU_COREDUMP_VERSION "\n");
drm_printf(&p, "kernel: " UTS_RELEASE "\n");
drm_printf(&p, "module: " KBUILD_MODNAME "\n");
drm_printf(&p, "time: %lld.%09ld\n", coredump->reset_time.tv_sec, coredump->reset_time.tv_nsec);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h
index 362954521721..7b6767ca8127 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h
@@ -88,6 +88,9 @@ struct amdgpu_reset_domain {
};
#ifdef CONFIG_DEV_COREDUMP
+
+#define AMDGPU_COREDUMP_VERSION "1"
+
struct amdgpu_coredump_info {
struct amdgpu_device *adev;
struct amdgpu_task_info reset_task_info;
--
2.41.0
^ permalink raw reply related [flat|nested] 21+ messages in thread
* Re: [PATCH v3 0/5] drm/amdgpu: Add new reset option and rework coredump
2023-07-14 16:11 ` André Almeida
(?)
@ 2023-07-28 14:45 ` André Almeida
-1 siblings, 0 replies; 21+ messages in thread
From: André Almeida @ 2023-07-28 14:45 UTC (permalink / raw)
To: christian.koenig
Cc: kernel-dev, alexander.deucher, amd-gfx,
pierre-eric.pelloux-prayer, 'Marek Olšák',
linux-kernel, Samuel Pitoiset, Bas Nieuwenhuizen,
Timur Kristóf, michel.daenzer, dri-devel
Hi Christian, gently ping here
Em 14/07/2023 13:11, André Almeida escreveu:
> Hi,
>
> The goal of this patchset is to improve debugging device resets on amdgpu.
>
> The first patch creates a new module parameter to disable soft recoveries,
> ensuring every recovery go through the full device reset, making easier to
> generate resets from userspace tools like [0] and [1]. This is important to
> validate how the stack behaves on resets, from end-to-end.
>
> The last patches are a rework to alloc devcoredump dynamically and to move it to
> a better source file.
>
> I have dropped the patches that add more information to devcoredump for now,
> until I figure out a better way to do so, like storing the IB address in the
> fence.
>
> Thanks,
> André
>
> [0] https://gitlab.freedesktop.org/andrealmeid/gpu-timeout
> [1] https://github.com/andrealmeid/vulkan-triangle-v1
>
> Changelog:
>
> v2: https://lore.kernel.org/dri-devel/20230713213242.680944-1-andrealmeid@igalia.com/
> - Drop the IB and ring patch
> - Drop patch that limited information from kernel threads
> - Add patch to move coredump to amdgpu_reset
>
> v1: https://lore.kernel.org/dri-devel/20230711213501.526237-1-andrealmeid@igalia.com/
> - Drop "Mark contexts guilty for causing soft recoveries" patch
> - Use GFP_NOWAIT for devcoredump allocation
>
> André Almeida (5):
> drm/amdgpu: Create a module param to disable soft recovery
> drm/amdgpu: Allocate coredump memory in a nonblocking way
> drm/amdgpu: Rework coredump to use memory dynamically
> drm/amdgpu: Move coredump code to amdgpu_reset file
> drm/amdgpu: Create version number for coredumps
>
> drivers/gpu/drm/amd/amdgpu/amdgpu.h | 6 +-
> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 67 +-----------------
> drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 9 +++
> drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c | 79 ++++++++++++++++++++++
> drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h | 14 ++++
> drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c | 6 +-
> 6 files changed, 111 insertions(+), 70 deletions(-)
>
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v3 0/5] drm/amdgpu: Add new reset option and rework coredump
@ 2023-07-28 14:45 ` André Almeida
0 siblings, 0 replies; 21+ messages in thread
From: André Almeida @ 2023-07-28 14:45 UTC (permalink / raw)
To: christian.koenig
Cc: pierre-eric.pelloux-prayer, 'Marek Olšák',
Timur Kristóf, dri-devel, michel.daenzer, linux-kernel,
amd-gfx, Samuel Pitoiset, kernel-dev, alexander.deucher
Hi Christian, gently ping here
Em 14/07/2023 13:11, André Almeida escreveu:
> Hi,
>
> The goal of this patchset is to improve debugging device resets on amdgpu.
>
> The first patch creates a new module parameter to disable soft recoveries,
> ensuring every recovery go through the full device reset, making easier to
> generate resets from userspace tools like [0] and [1]. This is important to
> validate how the stack behaves on resets, from end-to-end.
>
> The last patches are a rework to alloc devcoredump dynamically and to move it to
> a better source file.
>
> I have dropped the patches that add more information to devcoredump for now,
> until I figure out a better way to do so, like storing the IB address in the
> fence.
>
> Thanks,
> André
>
> [0] https://gitlab.freedesktop.org/andrealmeid/gpu-timeout
> [1] https://github.com/andrealmeid/vulkan-triangle-v1
>
> Changelog:
>
> v2: https://lore.kernel.org/dri-devel/20230713213242.680944-1-andrealmeid@igalia.com/
> - Drop the IB and ring patch
> - Drop patch that limited information from kernel threads
> - Add patch to move coredump to amdgpu_reset
>
> v1: https://lore.kernel.org/dri-devel/20230711213501.526237-1-andrealmeid@igalia.com/
> - Drop "Mark contexts guilty for causing soft recoveries" patch
> - Use GFP_NOWAIT for devcoredump allocation
>
> André Almeida (5):
> drm/amdgpu: Create a module param to disable soft recovery
> drm/amdgpu: Allocate coredump memory in a nonblocking way
> drm/amdgpu: Rework coredump to use memory dynamically
> drm/amdgpu: Move coredump code to amdgpu_reset file
> drm/amdgpu: Create version number for coredumps
>
> drivers/gpu/drm/amd/amdgpu/amdgpu.h | 6 +-
> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 67 +-----------------
> drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 9 +++
> drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c | 79 ++++++++++++++++++++++
> drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h | 14 ++++
> drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c | 6 +-
> 6 files changed, 111 insertions(+), 70 deletions(-)
>
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v3 0/5] drm/amdgpu: Add new reset option and rework coredump
@ 2023-07-28 14:45 ` André Almeida
0 siblings, 0 replies; 21+ messages in thread
From: André Almeida @ 2023-07-28 14:45 UTC (permalink / raw)
To: christian.koenig
Cc: pierre-eric.pelloux-prayer, 'Marek Olšák',
Timur Kristóf, dri-devel, michel.daenzer, linux-kernel,
amd-gfx, Samuel Pitoiset, kernel-dev, Bas Nieuwenhuizen,
alexander.deucher
Hi Christian, gently ping here
Em 14/07/2023 13:11, André Almeida escreveu:
> Hi,
>
> The goal of this patchset is to improve debugging device resets on amdgpu.
>
> The first patch creates a new module parameter to disable soft recoveries,
> ensuring every recovery go through the full device reset, making easier to
> generate resets from userspace tools like [0] and [1]. This is important to
> validate how the stack behaves on resets, from end-to-end.
>
> The last patches are a rework to alloc devcoredump dynamically and to move it to
> a better source file.
>
> I have dropped the patches that add more information to devcoredump for now,
> until I figure out a better way to do so, like storing the IB address in the
> fence.
>
> Thanks,
> André
>
> [0] https://gitlab.freedesktop.org/andrealmeid/gpu-timeout
> [1] https://github.com/andrealmeid/vulkan-triangle-v1
>
> Changelog:
>
> v2: https://lore.kernel.org/dri-devel/20230713213242.680944-1-andrealmeid@igalia.com/
> - Drop the IB and ring patch
> - Drop patch that limited information from kernel threads
> - Add patch to move coredump to amdgpu_reset
>
> v1: https://lore.kernel.org/dri-devel/20230711213501.526237-1-andrealmeid@igalia.com/
> - Drop "Mark contexts guilty for causing soft recoveries" patch
> - Use GFP_NOWAIT for devcoredump allocation
>
> André Almeida (5):
> drm/amdgpu: Create a module param to disable soft recovery
> drm/amdgpu: Allocate coredump memory in a nonblocking way
> drm/amdgpu: Rework coredump to use memory dynamically
> drm/amdgpu: Move coredump code to amdgpu_reset file
> drm/amdgpu: Create version number for coredumps
>
> drivers/gpu/drm/amd/amdgpu/amdgpu.h | 6 +-
> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 67 +-----------------
> drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 9 +++
> drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c | 79 ++++++++++++++++++++++
> drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h | 14 ++++
> drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c | 6 +-
> 6 files changed, 111 insertions(+), 70 deletions(-)
>
^ permalink raw reply [flat|nested] 21+ messages in thread
end of thread, other threads:[~2023-07-28 14:46 UTC | newest]
Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-07-14 16:11 [PATCH v3 0/5] drm/amdgpu: Add new reset option and rework coredump André Almeida
2023-07-14 16:11 ` André Almeida
2023-07-14 16:11 ` André Almeida
2023-07-14 16:11 ` [PATCH v3 1/5] drm/amdgpu: Create a module param to disable soft recovery André Almeida
2023-07-14 16:11 ` André Almeida
2023-07-14 16:11 ` André Almeida
2023-07-14 16:11 ` [PATCH v3 2/5] drm/amdgpu: Allocate coredump memory in a nonblocking way André Almeida
2023-07-14 16:11 ` André Almeida
2023-07-14 16:11 ` André Almeida
2023-07-14 16:11 ` [PATCH v3 3/5] drm/amdgpu: Rework coredump to use memory dynamically André Almeida
2023-07-14 16:11 ` André Almeida
2023-07-14 16:11 ` André Almeida
2023-07-14 16:11 ` [PATCH v3 4/5] drm/amdgpu: Move coredump code to amdgpu_reset file André Almeida
2023-07-14 16:11 ` André Almeida
2023-07-14 16:11 ` André Almeida
2023-07-14 16:11 ` [PATCH v3 5/5] drm/amdgpu: Create version number for coredumps André Almeida
2023-07-14 16:11 ` André Almeida
2023-07-14 16:11 ` André Almeida
2023-07-28 14:45 ` [PATCH v3 0/5] drm/amdgpu: Add new reset option and rework coredump André Almeida
2023-07-28 14:45 ` André Almeida
2023-07-28 14:45 ` André Almeida
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.