* Re: [PATCH] drm/radeon: signal all fences after lockup to avoid endless waiting in GEM_WAIT
@ 2013-10-02 14:50 Christian König
  2013-10-03  0:45 ` Marek Olšák
  0 siblings, 1 reply; 13+ messages in thread
From: Christian König @ 2013-10-02 14:50 UTC (permalink / raw)
  To: Marek Olšák; +Cc: dri-devel

Possible, but I would rather guess that this doesn't work because the IB test runs into a deadlock, so the GPU reset never fully completes.

Can you reproduce the problem?

If you want to make GPU resets more reliable, I would rather suggest removing the ring lock dependency.
Then we should try to give all the fence wait functions a (reliable) timeout and move reset handling up a layer into the ioctl functions. But for this you need to rip out the old PM code first.

Christian.

Marek Olšák <maraeo@gmail.com> wrote:

>I'm afraid signalling the fences with an IB test is not reliable.
>
>Marek
>
>On Wed, Oct 2, 2013 at 3:52 PM, Christian König <deathsimple@vodafone.de> wrote:
>> NAK, after recovering from a lockup the first thing we do is signalling all remaining fences with an IB test.
>>
>> If we don't recover we indeed signal all fences manually.
>>
>> Signalling all fences regardless of the outcome of the reset creates problems with both types of partial resets.
>>
>> Christian.
>>
>> Marek Olšák <maraeo@gmail.com> wrote:
>>
>>>From: Marek Olšák <marek.olsak@amd.com>
>>>
>>>After a lockup, fences are not signalled sometimes, causing
>>>the GEM_WAIT_IDLE ioctl to never return, which sometimes results
>>>in an X server freeze.
>>>
>>>This fixes only one of many deadlocks which can occur during a lockup.
>>>
>>>Signed-off-by: Marek Olšák <marek.olsak@amd.com>
>>>---
>>> drivers/gpu/drm/radeon/radeon_device.c | 5 +++++
>>> 1 file changed, 5 insertions(+)
>>>
>>>diff --git a/drivers/gpu/drm/radeon/radeon_device.c b/drivers/gpu/drm/radeon/radeon_device.c
>>>index 841d0e0..7b97baa 100644
>>>--- a/drivers/gpu/drm/radeon/radeon_device.c
>>>+++ b/drivers/gpu/drm/radeon/radeon_device.c
>>>@@ -1552,6 +1552,11 @@ int radeon_gpu_reset(struct radeon_device *rdev)
>>>       radeon_save_bios_scratch_regs(rdev);
>>>       /* block TTM */
>>>       resched = ttm_bo_lock_delayed_workqueue(&rdev->mman.bdev);
>>>+
>>>+      mutex_lock(&rdev->ring_lock);
>>>+      radeon_fence_driver_force_completion(rdev);
>>>+      mutex_unlock(&rdev->ring_lock);
>>>+
>>>       radeon_pm_suspend(rdev);
>>>       radeon_suspend(rdev);
>>>
>>>--
>>>1.8.1.2
>>>
>>>_______________________________________________
>>>dri-devel mailing list
>>>dri-devel@lists.freedesktop.org
>>>http://lists.freedesktop.org/mailman/listinfo/dri-devel


* Re: [PATCH] drm/radeon: signal all fences after lockup to avoid endless waiting in GEM_WAIT
  2013-10-02 14:50 [PATCH] drm/radeon: signal all fences after lockup to avoid endless waiting in GEM_WAIT Christian König
@ 2013-10-03  0:45 ` Marek Olšák
  2013-10-07 11:08   ` Christian König
  0 siblings, 1 reply; 13+ messages in thread
From: Marek Olšák @ 2013-10-03  0:45 UTC (permalink / raw)
  To: Christian König; +Cc: dri-devel

First of all, I can't complain about the reliability of the hardware
GPU reset. It's mostly the kernel driver that happens to run into a
deadlock at the same time.

Regarding the issue with fences, the problem is that the GPU reset
completes successfully according to dmesg, but X doesn't respond. I
can move the cursor on the screen, but I can't do anything else and
the UI is frozen. gdb says that X is stuck in GEM_WAIT_IDLE. I can
easily reproduce this, because it's the most common reason why a GPU
lockup leads to frozen X. The GPU actually recovers, but X is hung. I
can't tell whether the fences are just not signalled or whether there
is actually a real CPU deadlock I can't see.

This patch makes the problem go away and GPU resets are successful
(except for extreme cases, see below). With a small enough lockup
timeout, the lockups are just a minor annoyance and I thought I could
get through a piglit run just with a few tens or hundreds of GPU
resets...

A different type of deadlock showed up, though it needs a lot of
concurrently-running apps like piglit. What happened is that the
kernel driver was stuck/deadlocked in radeon_cs_ioctl, presumably due
to a GPU hang, while holding the exclusive lock, and another
thread wanting to do the GPU reset was unable to acquire it.

That said, I will use the patch locally, because it helps a lot. I got
a few lockups while writing this email and I'm glad I didn't have to
reboot.

Marek

On Wed, Oct 2, 2013 at 4:50 PM, Christian König <deathsimple@vodafone.de> wrote:
> Possible, but I would rather guess that this doesn't work because the IB test runs into a deadlock situation and so the GPU reset never fully completes.
>
> Can you reproduce the problem?
>
> If you want to make GPU resets more reliable I would rather suggest to remove the ring lock dependency.
> Then we should try to give all the fence wait functions a (reliable) timeout and move reset handling a layer up into the ioctl functions. But for this you need to rip out the old PM code first.
>
> Christian.
>
> Marek Olšák <maraeo@gmail.com> wrote:
>
>>I'm afraid signalling the fences with an IB test is not reliable.
>>
>>Marek
>>
>>On Wed, Oct 2, 2013 at 3:52 PM, Christian König <deathsimple@vodafone.de> wrote:
>>> NAK, after recovering from a lockup the first thing we do is signalling all remaining fences with an IB test.
>>>
>>> If we don't recover we indeed signal all fences manually.
>>>
>>> Signalling all fences regardless of the outcome of the reset creates problems with both types of partial resets.
>>>
>>> Christian.
>>>
>>> Marek Olšák <maraeo@gmail.com> wrote:
>>>
>>>>From: Marek Olšák <marek.olsak@amd.com>
>>>>
>>>>After a lockup, fences are not signalled sometimes, causing
>>>>the GEM_WAIT_IDLE ioctl to never return, which sometimes results
>>>>in an X server freeze.
>>>>
>>>>This fixes only one of many deadlocks which can occur during a lockup.
>>>>
>>>>Signed-off-by: Marek Olšák <marek.olsak@amd.com>
>>>>---
>>>> drivers/gpu/drm/radeon/radeon_device.c | 5 +++++
>>>> 1 file changed, 5 insertions(+)
>>>>
>>>>diff --git a/drivers/gpu/drm/radeon/radeon_device.c b/drivers/gpu/drm/radeon/radeon_device.c
>>>>index 841d0e0..7b97baa 100644
>>>>--- a/drivers/gpu/drm/radeon/radeon_device.c
>>>>+++ b/drivers/gpu/drm/radeon/radeon_device.c
>>>>@@ -1552,6 +1552,11 @@ int radeon_gpu_reset(struct radeon_device *rdev)
>>>>       radeon_save_bios_scratch_regs(rdev);
>>>>       /* block TTM */
>>>>       resched = ttm_bo_lock_delayed_workqueue(&rdev->mman.bdev);
>>>>+
>>>>+      mutex_lock(&rdev->ring_lock);
>>>>+      radeon_fence_driver_force_completion(rdev);
>>>>+      mutex_unlock(&rdev->ring_lock);
>>>>+
>>>>       radeon_pm_suspend(rdev);
>>>>       radeon_suspend(rdev);
>>>>
>>>>--
>>>>1.8.1.2
>>>>


* Re: [PATCH] drm/radeon: signal all fences after lockup to avoid endless waiting in GEM_WAIT
  2013-10-03  0:45 ` Marek Olšák
@ 2013-10-07 11:08   ` Christian König
  2013-10-08 16:21     ` Christian König
  0 siblings, 1 reply; 13+ messages in thread
From: Christian König @ 2013-10-07 11:08 UTC (permalink / raw)
  To: Marek Olšák; +Cc: dri-devel

> First of all, I can't complain about the reliability of the hardware
> GPU reset. It's mostly the kernel driver that happens to run into a
> deadlock at the same time.

Alex and I spent quite some time on making this reliable again after 
activating more rings and adding VM support. The main problem is that I 
couldn't figure out where the CPU deadlock comes from, because I couldn't 
reliably reproduce the issue.

What is the content of /proc/<pid of X server>/task/*/stack and 
/sys/kernel/debug/dri/0/radeon_fence_info when the X server is stuck in 
the deadlock situation?

I'm pretty sure that we nearly always have a problem when two threads 
are waiting for fences and one of them detects that we have a lockup 
while the other one keeps holding the exclusive lock. Signaling all 
fences might work around that problem, but it would probably be better 
to fix the underlying issue.

Going to take a deeper look into it.

Christian.

On 03.10.2013 02:45, Marek Olšák wrote:
> First of all, I can't complain about the reliability of the hardware
> GPU reset. It's mostly the kernel driver that happens to run into a
> deadlock at the same time.
>
> Regarding the issue with fences, the problem is that the GPU reset
> completes successfully according to dmesg, but X doesn't respond. I
> can move the cursor on the screen, but I can't do anything else and
> the UI is frozen. gdb says that X is stuck in GEM_WAIT_IDLE. I can
> easily reproduce this, because it's the most common reason why a GPU
> lockup leads to frozen X. The GPU actually recovers, but X is hung. I
> can't tell whether the fences are just not signalled or whether there
> is actually a real CPU deadlock I can't see.
>
> This patch makes the problem go away and GPU resets are successful
> (except for extreme cases, see below). With a small enough lockup
> timeout, the lockups are just a minor annoyance and I thought I could
> get through a piglit run just with a few tens or hundreds of GPU
> resets...
>
> A different type of deadlock showed up, though it needs a lot of
> concurrently-running apps like piglit. What happened is that the
> kernel driver was stuck/deadlocked in radeon_cs_ioctl presumably due
> to a GPU hang while holding onto the exclusive lock, and another
> thread wanting to do the GPU reset was unable to acquire the lock.
>
> That said, I will use the patch locally, because it helps a lot. I got
> a few lockups while writing this email and I'm glad I didn't have to
> reboot.
>
> Marek
>
> On Wed, Oct 2, 2013 at 4:50 PM, Christian König <deathsimple@vodafone.de> wrote:
>> Possible, but I would rather guess that this doesn't work because the IB test runs into a deadlock situation and so the GPU reset never fully completes.
>>
>> Can you reproduce the problem?
>>
>> If you want to make GPU resets more reliable I would rather suggest to remove the ring lock dependency.
>> Then we should try to give all the fence wait functions a (reliable) timeout and move reset handling a layer up into the ioctl functions. But for this you need to rip out the old PM code first.
>>
>> Christian.
>>
>> Marek Olšák <maraeo@gmail.com> wrote:
>>
>>> I'm afraid signalling the fences with an IB test is not reliable.
>>>
>>> Marek
>>>
>>> On Wed, Oct 2, 2013 at 3:52 PM, Christian König <deathsimple@vodafone.de> wrote:
>>>> NAK, after recovering from a lockup the first thing we do is signalling all remaining fences with an IB test.
>>>>
>>>> If we don't recover we indeed signal all fences manually.
>>>>
>>>> Signalling all fences regardless of the outcome of the reset creates problems with both types of partial resets.
>>>>
>>>> Christian.
>>>>
>>>> Marek Olšák <maraeo@gmail.com> wrote:
>>>>
>>>>> From: Marek Olšák <marek.olsak@amd.com>
>>>>>
>>>>> After a lockup, fences are not signalled sometimes, causing
>>>>> the GEM_WAIT_IDLE ioctl to never return, which sometimes results
>>>>> in an X server freeze.
>>>>>
>>>>> This fixes only one of many deadlocks which can occur during a lockup.
>>>>>
>>>>> Signed-off-by: Marek Olšák <marek.olsak@amd.com>
>>>>> ---
>>>>> drivers/gpu/drm/radeon/radeon_device.c | 5 +++++
>>>>> 1 file changed, 5 insertions(+)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/radeon/radeon_device.c b/drivers/gpu/drm/radeon/radeon_device.c
>>>>> index 841d0e0..7b97baa 100644
>>>>> --- a/drivers/gpu/drm/radeon/radeon_device.c
>>>>> +++ b/drivers/gpu/drm/radeon/radeon_device.c
>>>>> @@ -1552,6 +1552,11 @@ int radeon_gpu_reset(struct radeon_device *rdev)
>>>>>        radeon_save_bios_scratch_regs(rdev);
>>>>>        /* block TTM */
>>>>>        resched = ttm_bo_lock_delayed_workqueue(&rdev->mman.bdev);
>>>>> +
>>>>> +      mutex_lock(&rdev->ring_lock);
>>>>> +      radeon_fence_driver_force_completion(rdev);
>>>>> +      mutex_unlock(&rdev->ring_lock);
>>>>> +
>>>>>        radeon_pm_suspend(rdev);
>>>>>        radeon_suspend(rdev);
>>>>>
>>>>> --
>>>>> 1.8.1.2
>>>>>


* Re: [PATCH] drm/radeon: signal all fences after lockup to avoid endless waiting in GEM_WAIT
  2013-10-07 11:08   ` Christian König
@ 2013-10-08 16:21     ` Christian König
  2013-10-09 10:36       ` Marek Olšák
  0 siblings, 1 reply; 13+ messages in thread
From: Christian König @ 2013-10-08 16:21 UTC (permalink / raw)
  To: Marek Olšák; +Cc: dri-devel

[-- Attachment #1: Type: text/plain, Size: 5858 bytes --]

Hi Marek,

please try the attached patch as a replacement for your signal-all-fences 
patch. I'm not 100% sure it fixes all issues, but it's at 
least a start.

Thanks,
Christian.

On 07.10.2013 13:08, Christian König wrote:
>> First of all, I can't complain about the reliability of the hardware
>> GPU reset. It's mostly the kernel driver that happens to run into a
>> deadlock at the same time.
>
> Alex and I spend quite some time on making this reliable again after 
> activating more rings and adding VM support. The main problem is that 
> I couldn't figure out where the CPU deadlock comes from, cause I 
> couldn't reliable reproduce the issue.
>
> What is the content of /proc/<pid of X server>/task/*/stack and 
> sys/kernel/debug/dri/0/radeon_fence_info when the X server is stuck in 
> the deadlock situation?
>
> I'm pretty sure that we nearly always have a problem when two threads 
> are waiting for fences on one of them detects that we have a lockup 
> while the other one keeps holding the exclusive lock. Signaling all 
> fences might work around that problem, but it probably would be better 
> to fix the underlying issue.
>
> Going to take a deeper look into it.
>
> Christian.
>
> On 03.10.2013 02:45, Marek Olšák wrote:
>> First of all, I can't complain about the reliability of the hardware
>> GPU reset. It's mostly the kernel driver that happens to run into a
>> deadlock at the same time.
>>
>> Regarding the issue with fences, the problem is that the GPU reset
>> completes successfully according to dmesg, but X doesn't respond. I
>> can move the cursor on the screen, but I can't do anything else and
>> the UI is frozen. gdb says that X is stuck in GEM_WAIT_IDLE. I can
>> easily reproduce this, because it's the most common reason why a GPU
>> lockup leads to frozen X. The GPU actually recovers, but X is hung. I
>> can't tell whether the fences are just not signalled or whether there
>> is actually a real CPU deadlock I can't see.
>>
>> This patch makes the problem go away and GPU resets are successful
>> (except for extreme cases, see below). With a small enough lockup
>> timeout, the lockups are just a minor annoyance and I thought I could
>> get through a piglit run just with a few tens or hundreds of GPU
>> resets...
>>
>> A different type of deadlock showed up, though it needs a lot of
>> concurrently-running apps like piglit. What happened is that the
>> kernel driver was stuck/deadlocked in radeon_cs_ioctl presumably due
>> to a GPU hang while holding onto the exclusive lock, and another
>> thread wanting to do the GPU reset was unable to acquire the lock.
>>
>> That said, I will use the patch locally, because it helps a lot. I got
>> a few lockups while writing this email and I'm glad I didn't have to
>> reboot.
>>
>> Marek
>>
>> On Wed, Oct 2, 2013 at 4:50 PM, Christian König 
>> <deathsimple@vodafone.de> wrote:
>>> Possible, but I would rather guess that this doesn't work because 
>>> the IB test runs into a deadlock situation and so the GPU reset 
>>> never fully completes.
>>>
>>> Can you reproduce the problem?
>>>
>>> If you want to make GPU resets more reliable I would rather suggest 
>>> to remove the ring lock dependency.
>>> Then we should try to give all the fence wait functions a (reliable) 
>>> timeout and move reset handling a layer up into the ioctl functions. 
>>> But for this you need to rip out the old PM code first.
>>>
>>> Christian.
>>>
>>> Marek Olšák <maraeo@gmail.com> wrote:
>>>
>>>> I'm afraid signalling the fences with an IB test is not reliable.
>>>>
>>>> Marek
>>>>
>>>> On Wed, Oct 2, 2013 at 3:52 PM, Christian König 
>>>> <deathsimple@vodafone.de> wrote:
>>>>> NAK, after recovering from a lockup the first thing we do is 
>>>>> signalling all remaining fences with an IB test.
>>>>>
>>>>> If we don't recover we indeed signal all fences manually.
>>>>>
>>>>> Signalling all fences regardless of the outcome of the reset 
>>>>> creates problems with both types of partial resets.
>>>>>
>>>>> Christian.
>>>>>
>>>>> Marek Olšák <maraeo@gmail.com> wrote:
>>>>>
>>>>>> From: Marek Olšák <marek.olsak@amd.com>
>>>>>>
>>>>>> After a lockup, fences are not signalled sometimes, causing
>>>>>> the GEM_WAIT_IDLE ioctl to never return, which sometimes results
>>>>>> in an X server freeze.
>>>>>>
>>>>>> This fixes only one of many deadlocks which can occur during a 
>>>>>> lockup.
>>>>>>
>>>>>> Signed-off-by: Marek Olšák <marek.olsak@amd.com>
>>>>>> ---
>>>>>> drivers/gpu/drm/radeon/radeon_device.c | 5 +++++
>>>>>> 1 file changed, 5 insertions(+)
>>>>>>
>>>>>> diff --git a/drivers/gpu/drm/radeon/radeon_device.c 
>>>>>> b/drivers/gpu/drm/radeon/radeon_device.c
>>>>>> index 841d0e0..7b97baa 100644
>>>>>> --- a/drivers/gpu/drm/radeon/radeon_device.c
>>>>>> +++ b/drivers/gpu/drm/radeon/radeon_device.c
>>>>>> @@ -1552,6 +1552,11 @@ int radeon_gpu_reset(struct radeon_device 
>>>>>> *rdev)
>>>>>>        radeon_save_bios_scratch_regs(rdev);
>>>>>>        /* block TTM */
>>>>>>        resched = ttm_bo_lock_delayed_workqueue(&rdev->mman.bdev);
>>>>>> +
>>>>>> +      mutex_lock(&rdev->ring_lock);
>>>>>> +      radeon_fence_driver_force_completion(rdev);
>>>>>> +      mutex_unlock(&rdev->ring_lock);
>>>>>> +
>>>>>>        radeon_pm_suspend(rdev);
>>>>>>        radeon_suspend(rdev);
>>>>>>
>>>>>> -- 
>>>>>> 1.8.1.2
>>>>>>


[-- Attachment #2: 0001-drm-radeon-rework-and-fix-reset-detection.patch --]
[-- Type: text/x-diff; name="0001-drm-radeon-rework-and-fix-reset-detection.patch", Size: 17841 bytes --]

From b2a36ace270eb5649743f42f3c559cfdff7f41a2 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Christian=20K=C3=B6nig?= <christian.koenig@amd.com>
Date: Tue, 8 Oct 2013 18:02:38 +0200
Subject: [PATCH] drm/radeon: rework and fix reset detection
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Stop fiddling with jiffies, always wait for RADEON_FENCE_JIFFIES_TIMEOUT.
Consolidate the two wait sequence implementations into just one function.
Activate all waiters and remember if the reset was already done instead of
trying to reset from only one thread.

Signed-off-by: Christian König <christian.koenig@amd.com>
---
 drivers/gpu/drm/radeon/radeon.h        |   2 +-
 drivers/gpu/drm/radeon/radeon_device.c |   7 +
 drivers/gpu/drm/radeon/radeon_fence.c  | 347 +++++++++++----------------------
 3 files changed, 126 insertions(+), 230 deletions(-)

diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h
index a400ac1..0201c6e 100644
--- a/drivers/gpu/drm/radeon/radeon.h
+++ b/drivers/gpu/drm/radeon/radeon.h
@@ -327,7 +327,6 @@ struct radeon_fence_driver {
 	/* sync_seq is protected by ring emission lock */
 	uint64_t			sync_seq[RADEON_NUM_RINGS];
 	atomic64_t			last_seq;
-	unsigned long			last_activity;
 	bool				initialized;
 };
 
@@ -2170,6 +2169,7 @@ struct radeon_device {
 	bool				need_dma32;
 	bool				accel_working;
 	bool				fastfb_working; /* IGP feature*/
+	bool				needs_reset;
 	struct radeon_surface_reg surface_regs[RADEON_GEM_MAX_SURFACES];
 	const struct firmware *me_fw;	/* all family ME firmware */
 	const struct firmware *pfp_fw;	/* r6/700 PFP firmware */
diff --git a/drivers/gpu/drm/radeon/radeon_device.c b/drivers/gpu/drm/radeon/radeon_device.c
index 841d0e0..0eb9365 100644
--- a/drivers/gpu/drm/radeon/radeon_device.c
+++ b/drivers/gpu/drm/radeon/radeon_device.c
@@ -1549,6 +1549,12 @@ int radeon_gpu_reset(struct radeon_device *rdev)
 	int resched;
 
 	down_write(&rdev->exclusive_lock);
+
+	if (!rdev->needs_reset) {
+		up_write(&rdev->exclusive_lock);
+		return 0;
+	}
+
 	radeon_save_bios_scratch_regs(rdev);
 	/* block TTM */
 	resched = ttm_bo_lock_delayed_workqueue(&rdev->mman.bdev);
@@ -1607,6 +1613,7 @@ retry:
 		dev_info(rdev->dev, "GPU reset failed\n");
 	}
 
+	rdev->needs_reset = false;
 	up_write(&rdev->exclusive_lock);
 	return r;
 }
diff --git a/drivers/gpu/drm/radeon/radeon_fence.c b/drivers/gpu/drm/radeon/radeon_fence.c
index ddb8f8e..b8f68b2 100644
--- a/drivers/gpu/drm/radeon/radeon_fence.c
+++ b/drivers/gpu/drm/radeon/radeon_fence.c
@@ -190,10 +190,8 @@ void radeon_fence_process(struct radeon_device *rdev, int ring)
 		}
 	} while (atomic64_xchg(&rdev->fence_drv[ring].last_seq, seq) > seq);
 
-	if (wake) {
-		rdev->fence_drv[ring].last_activity = jiffies;
+	if (wake)
 		wake_up_all(&rdev->fence_queue);
-	}
 }
 
 /**
@@ -212,13 +210,13 @@ static void radeon_fence_destroy(struct kref *kref)
 }
 
 /**
- * radeon_fence_seq_signaled - check if a fence sequeuce number has signaled
+ * radeon_fence_seq_signaled - check if a fence sequence number has signaled
  *
  * @rdev: radeon device pointer
  * @seq: sequence number
  * @ring: ring index the fence is associated with
  *
- * Check if the last singled fence sequnce number is >= the requested
+ * Check if the last signaled fence sequnce number is >= the requested
  * sequence number (all asics).
  * Returns true if the fence has signaled (current fence value
  * is >= requested value) or false if it has not (current fence
@@ -263,113 +261,131 @@ bool radeon_fence_signaled(struct radeon_fence *fence)
 }
 
 /**
- * radeon_fence_wait_seq - wait for a specific sequence number
+ * radeon_fence_any_seq_signaled - check if any sequence number is signaled
  *
  * @rdev: radeon device pointer
- * @target_seq: sequence number we want to wait for
- * @ring: ring index the fence is associated with
+ * @seq: sequence numbers
+ *
+ * Check if the last signaled fence sequnce number is >= the requested
+ * sequence number (all asics).
+ * Returns true if any has signaled (current value is >= requested value)
+ * or false if it has not. Helper function for radeon_fence_wait_seq.
+ */
+static bool radeon_fence_any_seq_signaled(struct radeon_device *rdev, u64 *seq)
+{
+	unsigned i;
+
+	for (i = 0; i < RADEON_NUM_RINGS; ++i) {
+		if (seq[i] && radeon_fence_seq_signaled(rdev, seq[i], i))
+			return true;
+	}
+	return false;
+}
+
+/**
+ * radeon_fence_wait_seq - wait for a specific sequence numbers
+ *
+ * @rdev: radeon device pointer
+ * @target_seq: sequence number(s) we want to wait for
  * @intr: use interruptable sleep
  * @lock_ring: whether the ring should be locked or not
  *
- * Wait for the requested sequence number to be written (all asics).
+ * Wait for the requested sequence number(s) to be written by any ring
+ * (all asics).  Sequnce number array is indexed by ring id.
  * @intr selects whether to use interruptable (true) or non-interruptable
  * (false) sleep when waiting for the sequence number.  Helper function
- * for radeon_fence_wait(), et al.
+ * for radeon_fence_wait_*().
  * Returns 0 if the sequence number has passed, error for all other cases.
- * -EDEADLK is returned when a GPU lockup has been detected and the ring is
- * marked as not ready so no further jobs get scheduled until a successful
- * reset.
+ * -EDEADLK is returned when a GPU lockup has been detected.
  */
-static int radeon_fence_wait_seq(struct radeon_device *rdev, u64 target_seq,
-				 unsigned ring, bool intr, bool lock_ring)
+static int radeon_fence_wait_seq(struct radeon_device *rdev, u64 *target_seq,
+				 bool intr, bool lock_ring)
 {
-	unsigned long timeout, last_activity;
-	uint64_t seq;
-	unsigned i;
+	uint64_t last_seq[RADEON_NUM_RINGS];
 	bool signaled;
-	int r;
+	int i, r;
+
+	while (!radeon_fence_any_seq_signaled(rdev, target_seq)) {
+
+		/* Save current sequence values, used to check for GPU lockups */
+		for (i = 0; i < RADEON_NUM_RINGS; ++i) {
+			if (!target_seq[i])
+				continue;
 
-	while (target_seq > atomic64_read(&rdev->fence_drv[ring].last_seq)) {
-		if (!rdev->ring[ring].ready) {
-			return -EBUSY;
+			last_seq[i] = atomic64_read(&rdev->fence_drv[i].last_seq);
+			trace_radeon_fence_wait_begin(rdev->ddev, target_seq[i]);
+			radeon_irq_kms_sw_irq_get(rdev, i);
 		}
 
-		timeout = jiffies - RADEON_FENCE_JIFFIES_TIMEOUT;
-		if (time_after(rdev->fence_drv[ring].last_activity, timeout)) {
-			/* the normal case, timeout is somewhere before last_activity */
-			timeout = rdev->fence_drv[ring].last_activity - timeout;
+		if (intr) {
+			r = wait_event_interruptible_timeout(rdev->fence_queue, (
+				(signaled = radeon_fence_any_seq_signaled(rdev, target_seq))
+				 || rdev->needs_reset), RADEON_FENCE_JIFFIES_TIMEOUT);
 		} else {
-			/* either jiffies wrapped around, or no fence was signaled in the last 500ms
-			 * anyway we will just wait for the minimum amount and then check for a lockup
-			 */
-			timeout = 1;
+			r = wait_event_timeout(rdev->fence_queue, (
+				(signaled = radeon_fence_any_seq_signaled(rdev, target_seq))
+				 || rdev->needs_reset), RADEON_FENCE_JIFFIES_TIMEOUT);
 		}
-		seq = atomic64_read(&rdev->fence_drv[ring].last_seq);
-		/* Save current last activity valuee, used to check for GPU lockups */
-		last_activity = rdev->fence_drv[ring].last_activity;
 
-		trace_radeon_fence_wait_begin(rdev->ddev, seq);
-		radeon_irq_kms_sw_irq_get(rdev, ring);
-		if (intr) {
-			r = wait_event_interruptible_timeout(rdev->fence_queue,
-				(signaled = radeon_fence_seq_signaled(rdev, target_seq, ring)),
-				timeout);
-                } else {
-			r = wait_event_timeout(rdev->fence_queue,
-				(signaled = radeon_fence_seq_signaled(rdev, target_seq, ring)),
-				timeout);
+		for (i = 0; i < RADEON_NUM_RINGS; ++i) {
+			if (!target_seq[i])
+				continue;
+
+			radeon_irq_kms_sw_irq_put(rdev, i);
+			trace_radeon_fence_wait_end(rdev->ddev, target_seq[i]);
 		}
-		radeon_irq_kms_sw_irq_put(rdev, ring);
-		if (unlikely(r < 0)) {
+
+		if (unlikely(r < 0))
 			return r;
-		}
-		trace_radeon_fence_wait_end(rdev->ddev, seq);
 
 		if (unlikely(!signaled)) {
+			if (rdev->needs_reset)
+				return -EDEADLK;
+
 			/* we were interrupted for some reason and fence
 			 * isn't signaled yet, resume waiting */
-			if (r) {
+			if (r)
 				continue;
+
+			for (i = 0; i < RADEON_NUM_RINGS; ++i) {
+				if (!target_seq[i])
+					continue;
+
+				if (last_seq[i] != atomic64_read(&rdev->fence_drv[i].last_seq))
+					break;
 			}
 
-			/* check if sequence value has changed since last_activity */
-			if (seq != atomic64_read(&rdev->fence_drv[ring].last_seq)) {
+			if (i != RADEON_NUM_RINGS)
 				continue;
-			}
 
-			if (lock_ring) {
+			if (lock_ring)
 				mutex_lock(&rdev->ring_lock);
-			}
 
-			/* test if somebody else has already decided that this is a lockup */
-			if (last_activity != rdev->fence_drv[ring].last_activity) {
-				if (lock_ring) {
-					mutex_unlock(&rdev->ring_lock);
-				}
-				continue;
+			for (i = 0; i < RADEON_NUM_RINGS; ++i) {
+				if (!target_seq[i])
+					continue;
+
+				if (radeon_ring_is_lockup(rdev, i, &rdev->ring[i]))
+					break;
 			}
 
-			if (radeon_ring_is_lockup(rdev, ring, &rdev->ring[ring])) {
+			if (i < RADEON_NUM_RINGS) {
 				/* good news we believe it's a lockup */
-				dev_warn(rdev->dev, "GPU lockup (waiting for 0x%016llx last fence id 0x%016llx)\n",
-					 target_seq, seq);
-
-				/* change last activity so nobody else think there is a lockup */
-				for (i = 0; i < RADEON_NUM_RINGS; ++i) {
-					rdev->fence_drv[i].last_activity = jiffies;
-				}
-
-				/* mark the ring as not ready any more */
-				rdev->ring[ring].ready = false;
-				if (lock_ring) {
+				dev_warn(rdev->dev, "GPU lockup (waiting for "
+					 "0x%016llx last fence id 0x%016llx on"
+					 " ring %d)\n",
+					 target_seq[i], last_seq[i], i);
+
+				/* remember that we need an reset */
+				rdev->needs_reset = true;
+				if (lock_ring)
 					mutex_unlock(&rdev->ring_lock);
-				}
+				wake_up_all(&rdev->fence_queue);
 				return -EDEADLK;
 			}
 
-			if (lock_ring) {
+			if (lock_ring)
 				mutex_unlock(&rdev->ring_lock);
-			}
 		}
 	}
 	return 0;
@@ -388,6 +404,7 @@ static int radeon_fence_wait_seq(struct radeon_device *rdev, u64 target_seq,
  */
 int radeon_fence_wait(struct radeon_fence *fence, bool intr)
 {
+	uint64_t seq[RADEON_NUM_RINGS] = {};
 	int r;
 
 	if (fence == NULL) {
@@ -395,147 +412,15 @@ int radeon_fence_wait(struct radeon_fence *fence, bool intr)
 		return -EINVAL;
 	}
 
-	r = radeon_fence_wait_seq(fence->rdev, fence->seq,
-				  fence->ring, intr, true);
-	if (r) {
-		return r;
-	}
-	fence->seq = RADEON_FENCE_SIGNALED_SEQ;
-	return 0;
-}
-
-static bool radeon_fence_any_seq_signaled(struct radeon_device *rdev, u64 *seq)
-{
-	unsigned i;
-
-	for (i = 0; i < RADEON_NUM_RINGS; ++i) {
-		if (seq[i] && radeon_fence_seq_signaled(rdev, seq[i], i)) {
-			return true;
-		}
-	}
-	return false;
-}
-
-/**
- * radeon_fence_wait_any_seq - wait for a sequence number on any ring
- *
- * @rdev: radeon device pointer
- * @target_seq: sequence number(s) we want to wait for
- * @intr: use interruptable sleep
- *
- * Wait for the requested sequence number(s) to be written by any ring
- * (all asics).  Sequnce number array is indexed by ring id.
- * @intr selects whether to use interruptable (true) or non-interruptable
- * (false) sleep when waiting for the sequence number.  Helper function
- * for radeon_fence_wait_any(), et al.
- * Returns 0 if the sequence number has passed, error for all other cases.
- */
-static int radeon_fence_wait_any_seq(struct radeon_device *rdev,
-				     u64 *target_seq, bool intr)
-{
-	unsigned long timeout, last_activity, tmp;
-	unsigned i, ring = RADEON_NUM_RINGS;
-	bool signaled;
-	int r;
-
-	for (i = 0, last_activity = 0; i < RADEON_NUM_RINGS; ++i) {
-		if (!target_seq[i]) {
-			continue;
-		}
-
-		/* use the most recent one as indicator */
-		if (time_after(rdev->fence_drv[i].last_activity, last_activity)) {
-			last_activity = rdev->fence_drv[i].last_activity;
-		}
-
-		/* For lockup detection just pick the lowest ring we are
-		 * actively waiting for
-		 */
-		if (i < ring) {
-			ring = i;
-		}
-	}
-
-	/* nothing to wait for ? */
-	if (ring == RADEON_NUM_RINGS) {
-		return -ENOENT;
-	}
-
-	while (!radeon_fence_any_seq_signaled(rdev, target_seq)) {
-		timeout = jiffies - RADEON_FENCE_JIFFIES_TIMEOUT;
-		if (time_after(last_activity, timeout)) {
-			/* the normal case, timeout is somewhere before last_activity */
-			timeout = last_activity - timeout;
-		} else {
-			/* either jiffies wrapped around, or no fence was signaled in the last 500ms
-			 * anyway we will just wait for the minimum amount and then check for a lockup
-			 */
-			timeout = 1;
-		}
+	seq[fence->ring] = fence->seq;
+	if (seq[fence->ring] == RADEON_FENCE_SIGNALED_SEQ)
+		return 0;
 
-		trace_radeon_fence_wait_begin(rdev->ddev, target_seq[ring]);
-		for (i = 0; i < RADEON_NUM_RINGS; ++i) {
-			if (target_seq[i]) {
-				radeon_irq_kms_sw_irq_get(rdev, i);
-			}
-		}
-		if (intr) {
-			r = wait_event_interruptible_timeout(rdev->fence_queue,
-				(signaled = radeon_fence_any_seq_signaled(rdev, target_seq)),
-				timeout);
-		} else {
-			r = wait_event_timeout(rdev->fence_queue,
-				(signaled = radeon_fence_any_seq_signaled(rdev, target_seq)),
-				timeout);
-		}
-		for (i = 0; i < RADEON_NUM_RINGS; ++i) {
-			if (target_seq[i]) {
-				radeon_irq_kms_sw_irq_put(rdev, i);
-			}
-		}
-		if (unlikely(r < 0)) {
-			return r;
-		}
-		trace_radeon_fence_wait_end(rdev->ddev, target_seq[ring]);
-
-		if (unlikely(!signaled)) {
-			/* we were interrupted for some reason and fence
-			 * isn't signaled yet, resume waiting */
-			if (r) {
-				continue;
-			}
-
-			mutex_lock(&rdev->ring_lock);
-			for (i = 0, tmp = 0; i < RADEON_NUM_RINGS; ++i) {
-				if (time_after(rdev->fence_drv[i].last_activity, tmp)) {
-					tmp = rdev->fence_drv[i].last_activity;
-				}
-			}
-			/* test if somebody else has already decided that this is a lockup */
-			if (last_activity != tmp) {
-				last_activity = tmp;
-				mutex_unlock(&rdev->ring_lock);
-				continue;
-			}
-
-			if (radeon_ring_is_lockup(rdev, ring, &rdev->ring[ring])) {
-				/* good news we believe it's a lockup */
-				dev_warn(rdev->dev, "GPU lockup (waiting for 0x%016llx)\n",
-					 target_seq[ring]);
-
-				/* change last activity so nobody else think there is a lockup */
-				for (i = 0; i < RADEON_NUM_RINGS; ++i) {
-					rdev->fence_drv[i].last_activity = jiffies;
-				}
+	r = radeon_fence_wait_seq(fence->rdev, seq, intr, true);
+	if (r)
+		return r;
 
-				/* mark the ring as not ready any more */
-				rdev->ring[ring].ready = false;
-				mutex_unlock(&rdev->ring_lock);
-				return -EDEADLK;
-			}
-			mutex_unlock(&rdev->ring_lock);
-		}
-	}
+	fence->seq = RADEON_FENCE_SIGNALED_SEQ;
 	return 0;
 }
 
@@ -557,7 +442,7 @@ int radeon_fence_wait_any(struct radeon_device *rdev,
 			  bool intr)
 {
 	uint64_t seq[RADEON_NUM_RINGS];
-	unsigned i;
+	unsigned i, num_rings = 0;
 	int r;
 
 	for (i = 0; i < RADEON_NUM_RINGS; ++i) {
@@ -567,15 +452,19 @@ int radeon_fence_wait_any(struct radeon_device *rdev,
 			continue;
 		}
 
-		if (fences[i]->seq == RADEON_FENCE_SIGNALED_SEQ) {
-			/* something was allready signaled */
-			return 0;
-		}
-
 		seq[i] = fences[i]->seq;
+		++num_rings;
+
+		/* test if something was already signaled */
+		if (seq[i] == RADEON_FENCE_SIGNALED_SEQ)
+			return 0;
 	}
 
-	r = radeon_fence_wait_any_seq(rdev, seq, intr);
+	/* nothing to wait for ? */
+	if (num_rings == 0)
+		return -ENOENT;
+
+	r = radeon_fence_wait_seq(rdev, seq, intr, true);
 	if (r) {
 		return r;
 	}
@@ -594,15 +483,15 @@ int radeon_fence_wait_any(struct radeon_device *rdev,
  */
 int radeon_fence_wait_next_locked(struct radeon_device *rdev, int ring)
 {
-	uint64_t seq;
+	uint64_t seq[RADEON_NUM_RINGS] = {};
 
-	seq = atomic64_read(&rdev->fence_drv[ring].last_seq) + 1ULL;
-	if (seq >= rdev->fence_drv[ring].sync_seq[ring]) {
+	seq[ring] = atomic64_read(&rdev->fence_drv[ring].last_seq) + 1ULL;
+	if (seq[ring] >= rdev->fence_drv[ring].sync_seq[ring]) {
 		/* nothing to wait for, last_seq is
 		   already the last emited fence */
 		return -ENOENT;
 	}
-	return radeon_fence_wait_seq(rdev, seq, ring, false, false);
+	return radeon_fence_wait_seq(rdev, seq, false, false);
 }
 
 /**
@@ -617,14 +506,15 @@ int radeon_fence_wait_next_locked(struct radeon_device *rdev, int ring)
  */
 int radeon_fence_wait_empty_locked(struct radeon_device *rdev, int ring)
 {
-	uint64_t seq = rdev->fence_drv[ring].sync_seq[ring];
+	uint64_t seq[RADEON_NUM_RINGS] = {};
 	int r;
 
-	r = radeon_fence_wait_seq(rdev, seq, ring, false, false);
+	seq[ring] = rdev->fence_drv[ring].sync_seq[ring];
+	r = radeon_fence_wait_seq(rdev, seq, false, false);
 	if (r) {
-		if (r == -EDEADLK) {
+		if (r == -EDEADLK)
 			return -EDEADLK;
-		}
+
 		dev_err(rdev->dev, "error waiting for ring[%d] to become idle (%d)\n",
 			ring, r);
 	}
@@ -826,7 +716,6 @@ static void radeon_fence_driver_init_ring(struct radeon_device *rdev, int ring)
 	for (i = 0; i < RADEON_NUM_RINGS; ++i)
 		rdev->fence_drv[ring].sync_seq[i] = 0;
 	atomic64_set(&rdev->fence_drv[ring].last_seq, 0);
-	rdev->fence_drv[ring].last_activity = jiffies;
 	rdev->fence_drv[ring].initialized = false;
 }
 
-- 
1.8.1.2


[-- Attachment #3: Type: text/plain, Size: 159 bytes --]

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH] drm/radeon: signal all fences after lockup to avoid endless waiting in GEM_WAIT
  2013-10-08 16:21     ` Christian König
@ 2013-10-09 10:36       ` Marek Olšák
  2013-10-09 11:09         ` Christian König
  0 siblings, 1 reply; 13+ messages in thread
From: Marek Olšák @ 2013-10-09 10:36 UTC (permalink / raw)
  To: Christian König; +Cc: dri-devel

[-- Attachment #1: Type: text/plain, Size: 6260 bytes --]

I'm afraid your patch sometimes causes the GPU reset to fail, which
had never happened before IIRC.

The dmesg log from the failure is attached.

Marek

On Tue, Oct 8, 2013 at 6:21 PM, Christian König <deathsimple@vodafone.de> wrote:
> Hi Marek,
>
> please try the attached patch as a replacement for your signaling all fences
> patch. I'm not 100% sure if it fixes all issues, but it's at least a start.
>
> Thanks,
> Christian.
>
> Am 07.10.2013 13:08, schrieb Christian König:
>
>>> First of all, I can't complain about the reliability of the hardware
>>> GPU reset. It's mostly the kernel driver that happens to run into a
>>> deadlock at the same time.
>>
>>
>> Alex and I spent quite some time on making this reliable again after
>> activating more rings and adding VM support. The main problem is that I
>> couldn't figure out where the CPU deadlock comes from, because I couldn't
>> reliably reproduce the issue.
>>
>> What is the content of /proc/<pid of X server>/task/*/stack and
>> sys/kernel/debug/dri/0/radeon_fence_info when the X server is stuck in the
>> deadlock situation?
>>
>> I'm pretty sure that we nearly always have a problem when two threads are
>> waiting for fences and one of them detects that we have a lockup while the
>> other one keeps holding the exclusive lock. Signaling all fences might work
>> around that problem, but it probably would be better to fix the underlying
>> issue.
>>
>> Going to take a deeper look into it.
>>
>> Christian.
>>
>> Am 03.10.2013 02:45, schrieb Marek Olšák:
>>>
>>> First of all, I can't complain about the reliability of the hardware
>>> GPU reset. It's mostly the kernel driver that happens to run into a
>>> deadlock at the same time.
>>>
>>> Regarding the issue with fences, the problem is that the GPU reset
>>> completes successfully according to dmesg, but X doesn't respond. I
>>> can move the cursor on the screen, but I can't do anything else and
>>> the UI is frozen. gdb says that X is stuck in GEM_WAIT_IDLE. I can
>>> easily reproduce this, because it's the most common reason why a GPU
>>> lockup leads to frozen X. The GPU actually recovers, but X is hung. I
>>> can't tell whether the fences are just not signalled or whether there
>>> is actually a real CPU deadlock I can't see.
>>>
>>> This patch makes the problem go away and GPU resets are successful
>>> (except for extreme cases, see below). With a small enough lockup
>>> timeout, the lockups are just a minor annoyance and I thought I could
>>> get through a piglit run just with a few tens or hundreds of GPU
>>> resets...
>>>
>>> A different type of deadlock showed up, though it needs a lot of
>>> concurrently-running apps like piglit. What happened is that the
>>> kernel driver was stuck/deadlocked in radeon_cs_ioctl presumably due
>>> to a GPU hang while holding onto the exclusive lock, and another
>>> thread wanting to do the GPU reset was unable to acquire the lock.
>>>
>>> That said, I will use the patch locally, because it helps a lot. I got
>>> a few lockups while writing this email and I'm glad I didn't have to
>>> reboot.
>>>
>>> Marek
>>>
>>> On Wed, Oct 2, 2013 at 4:50 PM, Christian König <deathsimple@vodafone.de>
>>> wrote:
>>>>
>>>> Possible, but I would rather guess that this doesn't work because the IB
>>>> test runs into a deadlock situation and so the GPU reset never fully
>>>> completes.
>>>>
>>>> Can you reproduce the problem?
>>>>
>>>> If you want to make GPU resets more reliable I would rather suggest to
>>>> remove the ring lock dependency.
>>>> Then we should try to give all the fence wait functions a (reliable)
>>>> timeout and move reset handling a layer up into the ioctl functions. But for
>>>> this you need to rip out the old PM code first.
>>>>
>>>> Christian.
>>>>
>>>> Marek Olšák <maraeo@gmail.com> schrieb:
>>>>
>>>>> I'm afraid signalling the fences with an IB test is not reliable.
>>>>>
>>>>> Marek
>>>>>
>>>>> On Wed, Oct 2, 2013 at 3:52 PM, Christian König
>>>>> <deathsimple@vodafone.de> wrote:
>>>>>>
>>>>>> NAK, after recovering from a lockup the first thing we do is
>>>>>> signalling all remaining fences with an IB test.
>>>>>>
>>>>>> If we don't recover we indeed signal all fences manually.
>>>>>>
>>>>>> Signalling all fences regardless of the outcome of the reset creates
>>>>>> problems with both types of partial resets.
>>>>>>
>>>>>> Christian.
>>>>>>
>>>>>> Marek Olšák <maraeo@gmail.com> schrieb:
>>>>>>
>>>>>>> From: Marek Olšák <marek.olsak@amd.com>
>>>>>>>
>>>>>>> After a lockup, fences are not signalled sometimes, causing
>>>>>>> the GEM_WAIT_IDLE ioctl to never return, which sometimes results
>>>>>>> in an X server freeze.
>>>>>>>
>>>>>>> This fixes only one of many deadlocks which can occur during a
>>>>>>> lockup.
>>>>>>>
>>>>>>> Signed-off-by: Marek Olšák <marek.olsak@amd.com>
>>>>>>> ---
>>>>>>> drivers/gpu/drm/radeon/radeon_device.c | 5 +++++
>>>>>>> 1 file changed, 5 insertions(+)
>>>>>>>
>>>>>>> diff --git a/drivers/gpu/drm/radeon/radeon_device.c
>>>>>>> b/drivers/gpu/drm/radeon/radeon_device.c
>>>>>>> index 841d0e0..7b97baa 100644
>>>>>>> --- a/drivers/gpu/drm/radeon/radeon_device.c
>>>>>>> +++ b/drivers/gpu/drm/radeon/radeon_device.c
>>>>>>> @@ -1552,6 +1552,11 @@ int radeon_gpu_reset(struct radeon_device
>>>>>>> *rdev)
>>>>>>>        radeon_save_bios_scratch_regs(rdev);
>>>>>>>        /* block TTM */
>>>>>>>        resched = ttm_bo_lock_delayed_workqueue(&rdev->mman.bdev);
>>>>>>> +
>>>>>>> +      mutex_lock(&rdev->ring_lock);
>>>>>>> +      radeon_fence_driver_force_completion(rdev);
>>>>>>> +      mutex_unlock(&rdev->ring_lock);
>>>>>>> +
>>>>>>>        radeon_pm_suspend(rdev);
>>>>>>>        radeon_suspend(rdev);
>>>>>>>
>>>>>>> --
>>>>>>> 1.8.1.2
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> dri-devel mailing list
>>>>>>> dri-devel@lists.freedesktop.org
>>>>>>> http://lists.freedesktop.org/mailman/listinfo/dri-devel
>>
>>
>> _______________________________________________
>> dri-devel mailing list
>> dri-devel@lists.freedesktop.org
>> http://lists.freedesktop.org/mailman/listinfo/dri-devel
>
>

[-- Attachment #2: log.txt --]
[-- Type: text/plain, Size: 10852 bytes --]

[  104.836967] radeon 0000:01:00.0: GPU lockup CP stall for more than 1000msec
[  104.836974] radeon 0000:01:00.0: GPU lockup (waiting for 0x000000000000fcee last fence id 0x000000000000fcd8 on ring 0)
[  105.001288] [drm] Disabling audio 0 support
[  105.001289] [drm] Disabling audio 1 support
[  105.001290] [drm] Disabling audio 2 support
[  105.001291] [drm] Disabling audio 3 support
[  105.001292] [drm] Disabling audio 4 support
[  105.001292] [drm] Disabling audio 5 support
[  105.001293] [drm] Disabling audio 6 support
[  105.001496] radeon 0000:01:00.0: sa_manager is not empty, clearing anyway
[  105.008773] radeon 0000:01:00.0: Saved 15990 dwords of commands on ring 0.
[  105.008776] radeon 0000:01:00.0: Saved 121 dwords of commands on ring 2.
[  105.008787] radeon 0000:01:00.0: GPU softreset: 0x00000008
[  105.008788] radeon 0000:01:00.0:   GRBM_STATUS=0xA0003028
[  105.008790] radeon 0000:01:00.0:   GRBM_STATUS2=0x70000008
[  105.008791] radeon 0000:01:00.0:   GRBM_STATUS_SE0=0x00000006
[  105.008792] radeon 0000:01:00.0:   GRBM_STATUS_SE1=0x00000006
[  105.008794] radeon 0000:01:00.0:   GRBM_STATUS_SE2=0x00000006
[  105.008795] radeon 0000:01:00.0:   GRBM_STATUS_SE3=0x00000006
[  105.008796] radeon 0000:01:00.0:   SRBM_STATUS=0x20000A40
[  105.008797] radeon 0000:01:00.0:   SRBM_STATUS2=0x00000000
[  105.008799] radeon 0000:01:00.0:   SDMA0_STATUS_REG   = 0x46CEE557
[  105.008801] radeon 0000:01:00.0:   SDMA1_STATUS_REG   = 0x46CEED57
[  105.008802] radeon 0000:01:00.0:   CP_STAT = 0x84010200
[  105.008803] radeon 0000:01:00.0:   CP_STALLED_STAT1 = 0x00000c00
[  105.008805] radeon 0000:01:00.0:   CP_STALLED_STAT2 = 0x00010000
[  105.008806] radeon 0000:01:00.0:   CP_STALLED_STAT3 = 0x00000400
[  105.008807] radeon 0000:01:00.0:   CP_CPF_BUSY_STAT = 0x48440002
[  105.008809] radeon 0000:01:00.0:   CP_CPF_STALLED_STAT1 = 0x00000001
[  105.008810] radeon 0000:01:00.0:   CP_CPF_STATUS = 0x80008023
[  105.008811] radeon 0000:01:00.0:   CP_CPC_BUSY_STAT = 0x00000408
[  105.008813] radeon 0000:01:00.0:   CP_CPC_STALLED_STAT1 = 0x00002100
[  105.008814] radeon 0000:01:00.0:   CP_CPC_STATUS = 0xa0000041
[  105.008816] radeon 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x00000000
[  105.008817] radeon 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x00000000
[  105.018070] radeon 0000:01:00.0: GRBM_SOFT_RESET=0x00010001
[  105.018122] radeon 0000:01:00.0: SRBM_SOFT_RESET=0x00000100
[  105.019267] radeon 0000:01:00.0:   GRBM_STATUS=0xA0003028
[  105.019268] radeon 0000:01:00.0:   GRBM_STATUS2=0x30000008
[  105.019269] radeon 0000:01:00.0:   GRBM_STATUS_SE0=0x00000006
[  105.019271] radeon 0000:01:00.0:   GRBM_STATUS_SE1=0x00000006
[  105.019272] radeon 0000:01:00.0:   GRBM_STATUS_SE2=0x00000006
[  105.019273] radeon 0000:01:00.0:   GRBM_STATUS_SE3=0x00000006
[  105.019275] radeon 0000:01:00.0:   SRBM_STATUS=0x20000040
[  105.019276] radeon 0000:01:00.0:   SRBM_STATUS2=0x00000000
[  105.019277] radeon 0000:01:00.0:   SDMA0_STATUS_REG   = 0x46CEE557
[  105.019279] radeon 0000:01:00.0:   SDMA1_STATUS_REG   = 0x46CEED57
[  105.019280] radeon 0000:01:00.0:   CP_STAT = 0x00000000
[  105.019281] radeon 0000:01:00.0:   CP_STALLED_STAT1 = 0x00000000
[  105.019283] radeon 0000:01:00.0:   CP_STALLED_STAT2 = 0x00000000
[  105.019284] radeon 0000:01:00.0:   CP_STALLED_STAT3 = 0x00000000
[  105.019285] radeon 0000:01:00.0:   CP_CPF_BUSY_STAT = 0x48440000
[  105.019287] radeon 0000:01:00.0:   CP_CPF_STALLED_STAT1 = 0x00000000
[  105.019288] radeon 0000:01:00.0:   CP_CPF_STATUS = 0x80008003
[  105.019289] radeon 0000:01:00.0:   CP_CPC_BUSY_STAT = 0x00000408
[  105.019291] radeon 0000:01:00.0:   CP_CPC_STALLED_STAT1 = 0x00000000
[  105.019292] radeon 0000:01:00.0:   CP_CPC_STATUS = 0xa0000041
[  105.019299] radeon 0000:01:00.0: GPU reset succeeded, trying to resume
[  105.036864] [drm] probing gen 2 caps for device 8086:151 = 261ad03/e
[  105.036867] [drm] PCIE gen 3 link speeds already enabled
[  105.039262] [drm] PCIE GART of 1024M enabled (table at 0x0000000000277000).
[  105.039340] radeon 0000:01:00.0: WB enabled
[  105.039344] radeon 0000:01:00.0: fence driver on ring 0 use gpu addr 0x0000000040000c00 and cpu addr 0xffc7cc00
[  105.039346] radeon 0000:01:00.0: fence driver on ring 1 use gpu addr 0x0000000040000c04 and cpu addr 0xffc7cc04
[  105.039347] radeon 0000:01:00.0: fence driver on ring 2 use gpu addr 0x0000000040000c08 and cpu addr 0xffc7cc08
[  105.039348] radeon 0000:01:00.0: fence driver on ring 3 use gpu addr 0x0000000040000c0c and cpu addr 0xffc7cc0c
[  105.039349] radeon 0000:01:00.0: fence driver on ring 4 use gpu addr 0x0000000040000c10 and cpu addr 0xffc7cc10
[  105.039727] radeon 0000:01:00.0: fence driver on ring 5 use gpu addr 0x0000000000076c98 and cpu addr 0xf8bb6c98
[  105.041638] [drm] ring test on 0 succeeded in 4 usecs
[  105.316804] [drm:cik_ring_test] *ERROR* radeon: ring 1 test failed (scratch(0x30118)=0xCAFEDEAD)
[  105.316952] [drm] ring test on 3 succeeded in 2 usecs
[  105.316962] [drm] ring test on 4 succeeded in 2 usecs
[  105.362635] [drm] ring test on 5 succeeded in 2 usecs
[  105.382493] [drm] UVD initialized successfully.
[  105.383105] [drm] Enabling audio 0 support
[  105.383106] [drm] Enabling audio 1 support
[  105.383106] [drm] Enabling audio 2 support
[  105.383107] [drm] Enabling audio 3 support
[  105.383108] [drm] Enabling audio 4 support
[  105.383109] [drm] Enabling audio 5 support
[  105.383109] [drm] Enabling audio 6 support
[  105.383294] [drm:cik_ib_test] *ERROR* radeon: fence wait failed (-35).
[  105.383296] [drm:radeon_ib_ring_tests] *ERROR* radeon: failed testing IB on GFX ring (-35).
[  105.383297] radeon 0000:01:00.0: ib ring test failed (-35).
[  105.383298] [drm] Disabling audio 0 support
[  105.383299] [drm] Disabling audio 1 support
[  105.383299] [drm] Disabling audio 2 support
[  105.383300] [drm] Disabling audio 3 support
[  105.383301] [drm] Disabling audio 4 support
[  105.383302] [drm] Disabling audio 5 support
[  105.383302] [drm] Disabling audio 6 support
[  105.390445] radeon 0000:01:00.0: GPU softreset: 0x00000008
[  105.390446] radeon 0000:01:00.0:   GRBM_STATUS=0xA0003028
[  105.390447] radeon 0000:01:00.0:   GRBM_STATUS2=0x70000008
[  105.390449] radeon 0000:01:00.0:   GRBM_STATUS_SE0=0x00000006
[  105.390450] radeon 0000:01:00.0:   GRBM_STATUS_SE1=0x00000006
[  105.390451] radeon 0000:01:00.0:   GRBM_STATUS_SE2=0x00000006
[  105.390453] radeon 0000:01:00.0:   GRBM_STATUS_SE3=0x00000006
[  105.390454] radeon 0000:01:00.0:   SRBM_STATUS=0x20000040
[  105.390455] radeon 0000:01:00.0:   SRBM_STATUS2=0x00000000
[  105.390457] radeon 0000:01:00.0:   SDMA0_STATUS_REG   = 0x46CEED57
[  105.390458] radeon 0000:01:00.0:   SDMA1_STATUS_REG   = 0x46CEED57
[  105.390460] radeon 0000:01:00.0:   CP_STAT = 0x80010200
[  105.390461] radeon 0000:01:00.0:   CP_STALLED_STAT1 = 0x00000c00
[  105.390462] radeon 0000:01:00.0:   CP_STALLED_STAT2 = 0x00010000
[  105.390464] radeon 0000:01:00.0:   CP_STALLED_STAT3 = 0x00000000
[  105.390465] radeon 0000:01:00.0:   CP_CPF_BUSY_STAT = 0x48440002
[  105.390466] radeon 0000:01:00.0:   CP_CPF_STALLED_STAT1 = 0x00000001
[  105.390468] radeon 0000:01:00.0:   CP_CPF_STATUS = 0x80008023
[  105.390469] radeon 0000:01:00.0:   CP_CPC_BUSY_STAT = 0x00000408
[  105.390471] radeon 0000:01:00.0:   CP_CPC_STALLED_STAT1 = 0x00002100
[  105.390472] radeon 0000:01:00.0:   CP_CPC_STATUS = 0xa0000041
[  105.390473] radeon 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x00000000
[  105.390475] radeon 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x00000000
[  105.390583] radeon 0000:01:00.0: GRBM_SOFT_RESET=0x00010001
[  105.390635] radeon 0000:01:00.0: SRBM_SOFT_RESET=0x00000100
[  105.391779] radeon 0000:01:00.0:   GRBM_STATUS=0xA0003028
[  105.391781] radeon 0000:01:00.0:   GRBM_STATUS2=0x30000008
[  105.391782] radeon 0000:01:00.0:   GRBM_STATUS_SE0=0x00000006
[  105.391783] radeon 0000:01:00.0:   GRBM_STATUS_SE1=0x00000006
[  105.391785] radeon 0000:01:00.0:   GRBM_STATUS_SE2=0x00000006
[  105.391786] radeon 0000:01:00.0:   GRBM_STATUS_SE3=0x00000006
[  105.391787] radeon 0000:01:00.0:   SRBM_STATUS=0x20000040
[  105.391788] radeon 0000:01:00.0:   SRBM_STATUS2=0x00000000
[  105.391790] radeon 0000:01:00.0:   SDMA0_STATUS_REG   = 0x46CEED57
[  105.391791] radeon 0000:01:00.0:   SDMA1_STATUS_REG   = 0x46CEED57
[  105.391793] radeon 0000:01:00.0:   CP_STAT = 0x00000000
[  105.391794] radeon 0000:01:00.0:   CP_STALLED_STAT1 = 0x00000000
[  105.391795] radeon 0000:01:00.0:   CP_STALLED_STAT2 = 0x00000000
[  105.391797] radeon 0000:01:00.0:   CP_STALLED_STAT3 = 0x00000000
[  105.391798] radeon 0000:01:00.0:   CP_CPF_BUSY_STAT = 0x48440000
[  105.391799] radeon 0000:01:00.0:   CP_CPF_STALLED_STAT1 = 0x00000000
[  105.391801] radeon 0000:01:00.0:   CP_CPF_STATUS = 0x80008003
[  105.391802] radeon 0000:01:00.0:   CP_CPC_BUSY_STAT = 0x00000408
[  105.391803] radeon 0000:01:00.0:   CP_CPC_STALLED_STAT1 = 0x00000000
[  105.391805] radeon 0000:01:00.0:   CP_CPC_STATUS = 0xa0000041
[  105.391811] radeon 0000:01:00.0: GPU reset succeeded, trying to resume
[  105.394327] [drm] probing gen 2 caps for device 8086:151 = 261ad03/e
[  105.394329] [drm] PCIE gen 3 link speeds already enabled
[  105.396699] [drm] PCIE GART of 1024M enabled (table at 0x0000000000277000).
[  105.396772] radeon 0000:01:00.0: WB enabled
[  105.396775] radeon 0000:01:00.0: fence driver on ring 0 use gpu addr 0x0000000040000c00 and cpu addr 0xffc7cc00
[  105.396776] radeon 0000:01:00.0: fence driver on ring 1 use gpu addr 0x0000000040000c04 and cpu addr 0xffc7cc04
[  105.396778] radeon 0000:01:00.0: fence driver on ring 2 use gpu addr 0x0000000040000c08 and cpu addr 0xffc7cc08
[  105.396779] radeon 0000:01:00.0: fence driver on ring 3 use gpu addr 0x0000000040000c0c and cpu addr 0xffc7cc0c
[  105.396780] radeon 0000:01:00.0: fence driver on ring 4 use gpu addr 0x0000000040000c10 and cpu addr 0xffc7cc10
[  105.397157] radeon 0000:01:00.0: fence driver on ring 5 use gpu addr 0x0000000000076c98 and cpu addr 0xf8bb6c98
[  105.399061] [drm:cik_ring_test] *ERROR* radeon: cp failed to get scratch reg (-22).
[  105.399062] [drm:cik_resume] *ERROR* cik startup failed on resume
[  105.399193] [drm:cik_sdma_ib_test] *ERROR* radeon: fence wait failed (-35).
[  105.399194] [drm:radeon_ib_ring_tests] *ERROR* radeon: failed testing IB on ring 3 (-35).
[  105.399325] [drm:cik_sdma_ib_test] *ERROR* radeon: fence wait failed (-35).
[  105.399326] [drm:radeon_ib_ring_tests] *ERROR* radeon: failed testing IB on ring 4 (-35).
[  105.536423] [drm:ci_dpm_enable] *ERROR* ci_start_dpm failed
[  105.536424] [drm:radeon_pm_resume_dpm] *ERROR* radeon: dpm resume failed


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] drm/radeon: signal all fences after lockup to avoid endless waiting in GEM_WAIT
  2013-10-09 10:36       ` Marek Olšák
@ 2013-10-09 11:09         ` Christian König
  2013-10-09 12:04           ` Marek Olšák
  0 siblings, 1 reply; 13+ messages in thread
From: Christian König @ 2013-10-09 11:09 UTC (permalink / raw)
  To: Marek Olšák; +Cc: dri-devel

Mhm, that doesn't look like anything related, but more like the reset of
the compute ring didn't work.

How often does that happen? And do you still get the problem where X 
waits for a fence that never comes back?

Christian.

Am 09.10.2013 12:36, schrieb Marek Olšák:
> I'm afraid your patch sometimes causes the GPU reset to fail, which
> had never happened before IIRC.
>
> The dmesg log from the failure is attached.
>
> Marek
>
> On Tue, Oct 8, 2013 at 6:21 PM, Christian König <deathsimple@vodafone.de> wrote:
>> Hi Marek,
>>
>> please try the attached patch as a replacement for your signaling all fences
>> patch. I'm not 100% sure if it fixes all issues, but it's at least a start.
>>
>> Thanks,
>> Christian.
>>
>> Am 07.10.2013 13:08, schrieb Christian König:
>>
>>>> First of all, I can't complain about the reliability of the hardware
>>>> GPU reset. It's mostly the kernel driver that happens to run into a
>>>> deadlock at the same time.
>>>
>>> Alex and I spent quite some time on making this reliable again after
>>> activating more rings and adding VM support. The main problem is that I
>>> couldn't figure out where the CPU deadlock comes from, because I couldn't
>>> reliably reproduce the issue.
>>>
>>> What is the content of /proc/<pid of X server>/task/*/stack and
>>> sys/kernel/debug/dri/0/radeon_fence_info when the X server is stuck in the
>>> deadlock situation?
>>>
>>> I'm pretty sure that we nearly always have a problem when two threads are
>>> waiting for fences and one of them detects that we have a lockup while the
>>> other one keeps holding the exclusive lock. Signaling all fences might work
>>> around that problem, but it probably would be better to fix the underlying
>>> issue.
>>>
>>> Going to take a deeper look into it.
>>>
>>> Christian.
>>>
>>> Am 03.10.2013 02:45, schrieb Marek Olšák:
>>>> First of all, I can't complain about the reliability of the hardware
>>>> GPU reset. It's mostly the kernel driver that happens to run into a
>>>> deadlock at the same time.
>>>>
>>>> Regarding the issue with fences, the problem is that the GPU reset
>>>> completes successfully according to dmesg, but X doesn't respond. I
>>>> can move the cursor on the screen, but I can't do anything else and
>>>> the UI is frozen. gdb says that X is stuck in GEM_WAIT_IDLE. I can
>>>> easily reproduce this, because it's the most common reason why a GPU
>>>> lockup leads to frozen X. The GPU actually recovers, but X is hung. I
>>>> can't tell whether the fences are just not signalled or whether there
>>>> is actually a real CPU deadlock I can't see.
>>>>
>>>> This patch makes the problem go away and GPU resets are successful
>>>> (except for extreme cases, see below). With a small enough lockup
>>>> timeout, the lockups are just a minor annoyance and I thought I could
>>>> get through a piglit run just with a few tens or hundreds of GPU
>>>> resets...
>>>>
>>>> A different type of deadlock showed up, though it needs a lot of
>>>> concurrently-running apps like piglit. What happened is that the
>>>> kernel driver was stuck/deadlocked in radeon_cs_ioctl presumably due
>>>> to a GPU hang while holding onto the exclusive lock, and another
>>>> thread wanting to do the GPU reset was unable to acquire the lock.
>>>>
>>>> That said, I will use the patch locally, because it helps a lot. I got
>>>> a few lockups while writing this email and I'm glad I didn't have to
>>>> reboot.
>>>>
>>>> Marek
>>>>
>>>> On Wed, Oct 2, 2013 at 4:50 PM, Christian König <deathsimple@vodafone.de>
>>>> wrote:
>>>>> Possible, but I would rather guess that this doesn't work because the IB
>>>>> test runs into a deadlock situation and so the GPU reset never fully
>>>>> completes.
>>>>>
>>>>> Can you reproduce the problem?
>>>>>
>>>>> If you want to make GPU resets more reliable I would rather suggest to
>>>>> remove the ring lock dependency.
>>>>> Then we should try to give all the fence wait functions a (reliable)
>>>>> timeout and move reset handling a layer up into the ioctl functions. But for
>>>>> this you need to rip out the old PM code first.
>>>>>
>>>>> Christian.
>>>>>
>>>>> Marek Olšák <maraeo@gmail.com> schrieb:
>>>>>
>>>>>> I'm afraid signalling the fences with an IB test is not reliable.
>>>>>>
>>>>>> Marek
>>>>>>
>>>>>> On Wed, Oct 2, 2013 at 3:52 PM, Christian König
>>>>>> <deathsimple@vodafone.de> wrote:
>>>>>>> NAK, after recovering from a lockup the first thing we do is
>>>>>>> signalling all remaining fences with an IB test.
>>>>>>>
>>>>>>> If we don't recover we indeed signal all fences manually.
>>>>>>>
>>>>>>> Signalling all fences regardless of the outcome of the reset creates
>>>>>>> problems with both types of partial resets.
>>>>>>>
>>>>>>> Christian.
>>>>>>>
>>>>>>> Marek Olšák <maraeo@gmail.com> schrieb:
>>>>>>>
>>>>>>>> From: Marek Olšák <marek.olsak@amd.com>
>>>>>>>>
>>>>>>>> After a lockup, fences are not signalled sometimes, causing
>>>>>>>> the GEM_WAIT_IDLE ioctl to never return, which sometimes results
>>>>>>>> in an X server freeze.
>>>>>>>>
>>>>>>>> This fixes only one of many deadlocks which can occur during a
>>>>>>>> lockup.
>>>>>>>>
>>>>>>>> Signed-off-by: Marek Olšák <marek.olsak@amd.com>
>>>>>>>> ---
>>>>>>>> drivers/gpu/drm/radeon/radeon_device.c | 5 +++++
>>>>>>>> 1 file changed, 5 insertions(+)
>>>>>>>>
>>>>>>>> diff --git a/drivers/gpu/drm/radeon/radeon_device.c
>>>>>>>> b/drivers/gpu/drm/radeon/radeon_device.c
>>>>>>>> index 841d0e0..7b97baa 100644
>>>>>>>> --- a/drivers/gpu/drm/radeon/radeon_device.c
>>>>>>>> +++ b/drivers/gpu/drm/radeon/radeon_device.c
>>>>>>>> @@ -1552,6 +1552,11 @@ int radeon_gpu_reset(struct radeon_device
>>>>>>>> *rdev)
>>>>>>>>         radeon_save_bios_scratch_regs(rdev);
>>>>>>>>         /* block TTM */
>>>>>>>>         resched = ttm_bo_lock_delayed_workqueue(&rdev->mman.bdev);
>>>>>>>> +
>>>>>>>> +      mutex_lock(&rdev->ring_lock);
>>>>>>>> +      radeon_fence_driver_force_completion(rdev);
>>>>>>>> +      mutex_unlock(&rdev->ring_lock);
>>>>>>>> +
>>>>>>>>         radeon_pm_suspend(rdev);
>>>>>>>>         radeon_suspend(rdev);
>>>>>>>>
>>>>>>>> --
>>>>>>>> 1.8.1.2
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> dri-devel mailing list
>>>>>>>> dri-devel@lists.freedesktop.org
>>>>>>>> http://lists.freedesktop.org/mailman/listinfo/dri-devel
>>>
>>> _______________________________________________
>>> dri-devel mailing list
>>> dri-devel@lists.freedesktop.org
>>> http://lists.freedesktop.org/mailman/listinfo/dri-devel
>>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] drm/radeon: signal all fences after lockup to avoid endless waiting in GEM_WAIT
  2013-10-09 11:09         ` Christian König
@ 2013-10-09 12:04           ` Marek Olšák
  2013-10-13 12:47             ` Christian König
  0 siblings, 1 reply; 13+ messages in thread
From: Marek Olšák @ 2013-10-09 12:04 UTC (permalink / raw)
  To: Christian König; +Cc: dri-devel

The ring test of the first compute ring always fails, and it shouldn't
affect the GPU reset in any way.

I can't tell if the deadlock issue is fixed, because the GPU reset
usually fails with your patch. It always succeeded without your patch.

Marek

On Wed, Oct 9, 2013 at 1:09 PM, Christian König <deathsimple@vodafone.de> wrote:
> Mhm, that doesn't look like anything related, but more like the reset of the
> compute ring didn't work.
>
> How often does that happen? And do you still get the problem where X waits
> for a fence that never comes back?
>
> Christian.
>
> Am 09.10.2013 12:36, schrieb Marek Olšák:
>
>> I'm afraid your patch sometimes causes the GPU reset to fail, which
>> had never happened before IIRC.
>>
>> The dmesg log from the failure is attached.
>>
>> Marek
>>
>> On Tue, Oct 8, 2013 at 6:21 PM, Christian König <deathsimple@vodafone.de>
>> wrote:
>>>
>>> Hi Marek,
>>>
>>> please try the attached patch as a replacement for your signaling all
>>> fences
>>> patch. I'm not 100% sure if it fixes all issues, but it's at least a
>>> start.
>>>
>>> Thanks,
>>> Christian.
>>>
>>> Am 07.10.2013 13:08, schrieb Christian König:
>>>
>>>>> First of all, I can't complain about the reliability of the hardware
>>>>> GPU reset. It's mostly the kernel driver that happens to run into a
>>>>> deadlock at the same time.
>>>>
>>>>
>>>> Alex and I spent quite some time on making this reliable again after
>>>> activating more rings and adding VM support. The main problem is that I
>>>> couldn't figure out where the CPU deadlock comes from, because I couldn't
>>>> reliably reproduce the issue.
>>>>
>>>> What is the content of /proc/<pid of X server>/task/*/stack and
>>>> sys/kernel/debug/dri/0/radeon_fence_info when the X server is stuck in
>>>> the
>>>> deadlock situation?
>>>>
>>>> I'm pretty sure that we nearly always have a problem when two threads
>>>> are
>>>> waiting for fences and one of them detects that we have a lockup while
>>>> the
>>>> other one keeps holding the exclusive lock. Signaling all fences might
>>>> work
>>>> around that problem, but it probably would be better to fix the
>>>> underlying
>>>> issue.
>>>>
>>>> Going to take a deeper look into it.
>>>>
>>>> Christian.
>>>>
>>>> Am 03.10.2013 02:45, schrieb Marek Olšák:
>>>>>
>>>>> First of all, I can't complain about the reliability of the hardware
>>>>> GPU reset. It's mostly the kernel driver that happens to run into a
>>>>> deadlock at the same time.
>>>>>
>>>>> Regarding the issue with fences, the problem is that the GPU reset
>>>>> completes successfully according to dmesg, but X doesn't respond. I
>>>>> can move the cursor on the screen, but I can't do anything else and
>>>>> the UI is frozen. gdb says that X is stuck in GEM_WAIT_IDLE. I can
>>>>> easily reproduce this, because it's the most common reason why a GPU
>>>>> lockup leads to frozen X. The GPU actually recovers, but X is hung. I
>>>>> can't tell whether the fences are just not signalled or whether there
>>>>> is actually a real CPU deadlock I can't see.
>>>>>
>>>>> This patch makes the problem go away and GPU resets are successful
>>>>> (except for extreme cases, see below). With a small enough lockup
>>>>> timeout, the lockups are just a minor annoyance and I thought I could
>>>>> get through a piglit run just with a few tens or hundreds of GPU
>>>>> resets...
>>>>>
>>>>> A different type of deadlock showed up, though it needs a lot of
>>>>> concurrently-running apps like piglit. What happened is that the
>>>>> kernel driver was stuck/deadlocked in radeon_cs_ioctl presumably due
>>>>> to a GPU hang while holding onto the exclusive lock, and another
>>>>> thread wanting to do the GPU reset was unable to acquire the lock.
>>>>>
>>>>> That said, I will use the patch locally, because it helps a lot. I got
>>>>> a few lockups while writing this email and I'm glad I didn't have to
>>>>> reboot.
>>>>>
>>>>> Marek
>>>>>
>>>>> On Wed, Oct 2, 2013 at 4:50 PM, Christian König
>>>>> <deathsimple@vodafone.de>
>>>>> wrote:
>>>>>>
>>>>>> Possible, but I would rather guess that this doesn't work because the
>>>>>> IB
>>>>>> test runs into a deadlock situation and so the GPU reset never fully
>>>>>> completes.
>>>>>>
>>>>>> Can you reproduce the problem?
>>>>>>
>>>>>> If you want to make GPU resets more reliable I would rather suggest to
>>>>>> remove the ring lock dependency.
>>>>>> Then we should try to give all the fence wait functions a (reliable)
>>>>>> timeout and move reset handling a layer up into the ioctl functions.
>>>>>> But for
>>>>>> this you need to rip out the old PM code first.
>>>>>>
>>>>>> Christian.
>>>>>>
>>>>>> Marek Olšák <maraeo@gmail.com> schrieb:
>>>>>>
>>>>>>> I'm afraid signalling the fences with an IB test is not reliable.
>>>>>>>
>>>>>>> Marek
>>>>>>>
>>>>>>> On Wed, Oct 2, 2013 at 3:52 PM, Christian König
>>>>>>> <deathsimple@vodafone.de> wrote:
>>>>>>>>
>>>>>>>> NAK, after recovering from a lockup the first thing we do is
>>>>>>>> signalling all remaining fences with an IB test.
>>>>>>>>
>>>>>>>> If we don't recover we indeed signal all fences manually.
>>>>>>>>
>>>>>>>> Signalling all fences regardless of the outcome of the reset creates
>>>>>>>> problems with both types of partial resets.
>>>>>>>>
>>>>>>>> Christian.
>>>>>>>>
>>>>>>>> Marek Olšák <maraeo@gmail.com> schrieb:
>>>>>>>>
>>>>>>>>> From: Marek Olšák <marek.olsak@amd.com>
>>>>>>>>>
>>>>>>>>> After a lockup, fences are not signalled sometimes, causing
>>>>>>>>> the GEM_WAIT_IDLE ioctl to never return, which sometimes results
>>>>>>>>> in an X server freeze.
>>>>>>>>>
>>>>>>>>> This fixes only one of many deadlocks which can occur during a
>>>>>>>>> lockup.
>>>>>>>>>
>>>>>>>>> Signed-off-by: Marek Olšák <marek.olsak@amd.com>
>>>>>>>>> ---
>>>>>>>>> drivers/gpu/drm/radeon/radeon_device.c | 5 +++++
>>>>>>>>> 1 file changed, 5 insertions(+)
>>>>>>>>>
>>>>>>>>> diff --git a/drivers/gpu/drm/radeon/radeon_device.c
>>>>>>>>> b/drivers/gpu/drm/radeon/radeon_device.c
>>>>>>>>> index 841d0e0..7b97baa 100644
>>>>>>>>> --- a/drivers/gpu/drm/radeon/radeon_device.c
>>>>>>>>> +++ b/drivers/gpu/drm/radeon/radeon_device.c
>>>>>>>>> @@ -1552,6 +1552,11 @@ int radeon_gpu_reset(struct radeon_device
>>>>>>>>> *rdev)
>>>>>>>>>         radeon_save_bios_scratch_regs(rdev);
>>>>>>>>>         /* block TTM */
>>>>>>>>>         resched = ttm_bo_lock_delayed_workqueue(&rdev->mman.bdev);
>>>>>>>>> +
>>>>>>>>> +      mutex_lock(&rdev->ring_lock);
>>>>>>>>> +      radeon_fence_driver_force_completion(rdev);
>>>>>>>>> +      mutex_unlock(&rdev->ring_lock);
>>>>>>>>> +
>>>>>>>>>         radeon_pm_suspend(rdev);
>>>>>>>>>         radeon_suspend(rdev);
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> 1.8.1.2
>>>>>>>>>
>>>>>>>>> _______________________________________________
>>>>>>>>> dri-devel mailing list
>>>>>>>>> dri-devel@lists.freedesktop.org
>>>>>>>>> http://lists.freedesktop.org/mailman/listinfo/dri-devel
>>>>
>>>>
>>>> _______________________________________________
>>>> dri-devel mailing list
>>>> dri-devel@lists.freedesktop.org
>>>> http://lists.freedesktop.org/mailman/listinfo/dri-devel
>>>
>>>
>
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] drm/radeon: signal all fences after lockup to avoid endless waiting in GEM_WAIT
  2013-10-09 12:04           ` Marek Olšák
@ 2013-10-13 12:47             ` Christian König
  2013-10-13 20:16               ` Marek Olšák
  0 siblings, 1 reply; 13+ messages in thread
From: Christian König @ 2013-10-13 12:47 UTC (permalink / raw)
  To: Marek Olšák; +Cc: dri-devel

[-- Attachment #1: Type: text/plain, Size: 7787 bytes --]

I've figured out what was wrong with the patch. We need to reset the
"needs_reset" flag earlier, otherwise the IB test might think we are in
a lockup and abort the reset after waiting for the minimum timeout period.

Please try the attached patch instead.

Thanks,
Christian.
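
The flag ordering being fixed here can be modelled in a few lines of plain C. This is a standalone sketch, not driver code: `needs_reset` stands in for `rdev->needs_reset`, the lock and the IB test are only mimicked, and -EDEADLK is modelled as -1. The point is that clearing the flag before hardware re-init means a fence wait issued by the IB test afterwards no longer mistakes the pending reset for a fresh lockup.

```c
#include <assert.h>
#include <stdbool.h>

/* Standalone model of the v2 flag ordering; not actual radeon code. */
static bool needs_reset;

/* A fence wait that runs while needs_reset is still set bails out
 * with -EDEADLK (modelled here as -1); otherwise it succeeds. */
static int model_fence_wait(void)
{
    return needs_reset ? -1 : 0;
}

static void model_mark_lockup(void)
{
    needs_reset = true; /* a waiter detected a lockup */
}

static int model_gpu_reset(void)
{
    if (!needs_reset)
        return 0;        /* someone else already handled the reset */

    needs_reset = false; /* v2: clear *before* hardware re-init ... */

    /* ... so the IB test's fence wait, which runs after re-init,
     * sees needs_reset == false and can complete normally. */
    return model_fence_wait();
}
```

With the v1 ordering (flag cleared only after everything else), the modelled IB-test wait would return -1 and the reset would be aborted after the minimum timeout period.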

Am 09.10.2013 14:04, schrieb Marek Olšák:
> The ring test of the first compute ring always fails and it shouldn't
> affect the GPU reset in any way.
>
> I can't tell if the deadlock issue is fixed, because the GPU reset
> usually fails with your patch. It always succeeded without your patch.
>
> Marek
>
> On Wed, Oct 9, 2013 at 1:09 PM, Christian König <deathsimple@vodafone.de> wrote:
>> Mhm, that doesn't look like anything related, but more like the reset of the
>> compute ring didn't work.
>>
>> How often does that happen? And do you still get the problem where X waits
>> for a fence that never comes back?
>>
>> Christian.
>>
>> Am 09.10.2013 12:36, schrieb Marek Olšák:
>>
>>> I'm afraid your patch sometimes causes the GPU reset to fail, which
>>> had never happened before IIRC.
>>>
>>> The dmesg log from the failure is attached.
>>>
>>> Marek
>>>
>>> On Tue, Oct 8, 2013 at 6:21 PM, Christian König <deathsimple@vodafone.de>
>>> wrote:
>>>> Hi Marek,
>>>>
>>>> please try the attached patch as a replacement for your signaling all
>>>> fences
>>>> patch. I'm not 100% sure if it fixes all issues, but it's at least a
>>>> start.
>>>>
>>>> Thanks,
>>>> Christian.
>>>>
>>>> Am 07.10.2013 13:08, schrieb Christian König:
>>>>
>>>>>> First of all, I can't complain about the reliability of the hardware
>>>>>> GPU reset. It's mostly the kernel driver that happens to run into a
>>>>>> deadlock at the same time.
>>>>>
>>>>> Alex and I spent quite some time on making this reliable again after
>>>>> activating more rings and adding VM support. The main problem is that I
>>>>> couldn't figure out where the CPU deadlock comes from, because I
>>>>> couldn't reliably reproduce the issue.
>>>>>
>>>>> What is the content of /proc/<pid of X server>/task/*/stack and
>>>>> /sys/kernel/debug/dri/0/radeon_fence_info when the X server is stuck
>>>>> in the deadlock situation?
>>>>>
>>>>> I'm pretty sure that we nearly always have a problem when two threads
>>>>> are waiting for fences and one of them detects that we have a lockup
>>>>> while the other one keeps holding the exclusive lock. Signaling all
>>>>> fences might work around that problem, but it probably would be better
>>>>> to fix the underlying issue.
>>>>>
>>>>> Going to take a deeper look into it.
>>>>>
>>>>> Christian.
>>>>>
>>>>> Am 03.10.2013 02:45, schrieb Marek Olšák:
>>>>>> First of all, I can't complain about the reliability of the hardware
>>>>>> GPU reset. It's mostly the kernel driver that happens to run into a
>>>>>> deadlock at the same time.
>>>>>>
>>>>>> Regarding the issue with fences, the problem is that the GPU reset
>>>>>> completes successfully according to dmesg, but X doesn't respond. I
>>>>>> can move the cursor on the screen, but I can't do anything else and
>>>>>> the UI is frozen. gdb says that X is stuck in GEM_WAIT_IDLE. I can
>>>>>> easily reproduce this, because it's the most common reason why a GPU
>>>>>> lockup leads to frozen X. The GPU actually recovers, but X is hung. I
>>>>>> can't tell whether the fences are just not signalled or whether there
>>>>>> is actually a real CPU deadlock I can't see.
>>>>>>
>>>>>> This patch makes the problem go away and GPU resets are successful
>>>>>> (except for extreme cases, see below). With a small enough lockup
>>>>>> timeout, the lockups are just a minor annoyance and I thought I could
>>>>>> get through a piglit run just with a few tens or hundreds of GPU
>>>>>> resets...
>>>>>>
>>>>>> A different type of deadlock showed up, though it needs a lot of
>>>>>> concurrently-running apps like piglit. What happened is that the
>>>>>> kernel driver was stuck/deadlocked in radeon_cs_ioctl presumably due
>>>>>> to a GPU hang while holding onto the exclusive lock, and another
>>>>>> thread wanting to do the GPU reset was unable to acquire the lock.
>>>>>>
>>>>>> That said, I will use the patch locally, because it helps a lot. I got
>>>>>> a few lockups while writing this email and I'm glad I didn't have to
>>>>>> reboot.
>>>>>>
>>>>>> Marek
>>>>>>
>>>>>> On Wed, Oct 2, 2013 at 4:50 PM, Christian König
>>>>>> <deathsimple@vodafone.de>
>>>>>> wrote:
>>>>>>> Possible, but I would rather guess that this doesn't work because the
>>>>>>> IB
>>>>>>> test runs into a deadlock situation and so the GPU reset never fully
>>>>>>> completes.
>>>>>>>
>>>>>>> Can you reproduce the problem?
>>>>>>>
>>>>>>> If you want to make GPU resets more reliable I would rather suggest to
>>>>>>> remove the ring lock dependency.
>>>>>>> Then we should try to give all the fence wait functions a (reliable)
>>>>>>> timeout and move reset handling a layer up into the ioctl functions.
>>>>>>> But for
>>>>>>> this you need to rip out the old PM code first.
>>>>>>>
>>>>>>> Christian.
>>>>>>>
>>>>>>> Marek Olšák <maraeo@gmail.com> schrieb:
>>>>>>>
>>>>>>>> I'm afraid signalling the fences with an IB test is not reliable.
>>>>>>>>
>>>>>>>> Marek
>>>>>>>>
>>>>>>>> On Wed, Oct 2, 2013 at 3:52 PM, Christian König
>>>>>>>> <deathsimple@vodafone.de> wrote:
>>>>>>>>> NAK, after recovering from a lockup the first thing we do is
>>>>>>>>> signalling all remaining fences with an IB test.
>>>>>>>>>
>>>>>>>>> If we don't recover we indeed signal all fences manually.
>>>>>>>>>
>>>>>>>>> Signalling all fences regardless of the outcome of the reset creates
>>>>>>>>> problems with both types of partial resets.
>>>>>>>>>
>>>>>>>>> Christian.
>>>>>>>>>
>>>>>>>>> Marek Olšák <maraeo@gmail.com> schrieb:
>>>>>>>>>
>>>>>>>>>> From: Marek Olšák <marek.olsak@amd.com>
>>>>>>>>>>
>>>>>>>>>> After a lockup, fences are not signalled sometimes, causing
>>>>>>>>>> the GEM_WAIT_IDLE ioctl to never return, which sometimes results
>>>>>>>>>> in an X server freeze.
>>>>>>>>>>
>>>>>>>>>> This fixes only one of many deadlocks which can occur during a
>>>>>>>>>> lockup.
>>>>>>>>>>
>>>>>>>>>> Signed-off-by: Marek Olšák <marek.olsak@amd.com>
>>>>>>>>>> ---
>>>>>>>>>> drivers/gpu/drm/radeon/radeon_device.c | 5 +++++
>>>>>>>>>> 1 file changed, 5 insertions(+)
>>>>>>>>>>
>>>>>>>>>> diff --git a/drivers/gpu/drm/radeon/radeon_device.c
>>>>>>>>>> b/drivers/gpu/drm/radeon/radeon_device.c
>>>>>>>>>> index 841d0e0..7b97baa 100644
>>>>>>>>>> --- a/drivers/gpu/drm/radeon/radeon_device.c
>>>>>>>>>> +++ b/drivers/gpu/drm/radeon/radeon_device.c
>>>>>>>>>> @@ -1552,6 +1552,11 @@ int radeon_gpu_reset(struct radeon_device
>>>>>>>>>> *rdev)
>>>>>>>>>>          radeon_save_bios_scratch_regs(rdev);
>>>>>>>>>>          /* block TTM */
>>>>>>>>>>          resched = ttm_bo_lock_delayed_workqueue(&rdev->mman.bdev);
>>>>>>>>>> +
>>>>>>>>>> +      mutex_lock(&rdev->ring_lock);
>>>>>>>>>> +      radeon_fence_driver_force_completion(rdev);
>>>>>>>>>> +      mutex_unlock(&rdev->ring_lock);
>>>>>>>>>> +
>>>>>>>>>>          radeon_pm_suspend(rdev);
>>>>>>>>>>          radeon_suspend(rdev);
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> 1.8.1.2
>>>>>>>>>>
>>>>>>>>>> _______________________________________________
>>>>>>>>>> dri-devel mailing list
>>>>>>>>>> dri-devel@lists.freedesktop.org
>>>>>>>>>> http://lists.freedesktop.org/mailman/listinfo/dri-devel
>>>>>
>>>>> _______________________________________________
>>>>> dri-devel mailing list
>>>>> dri-devel@lists.freedesktop.org
>>>>> http://lists.freedesktop.org/mailman/listinfo/dri-devel
>>>>


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0001-drm-radeon-rework-and-fix-reset-detection-v2.patch --]
[-- Type: text/x-diff; name="0001-drm-radeon-rework-and-fix-reset-detection-v2.patch", Size: 17769 bytes --]

From bdcb7536f8a1b0607c37760cf441888a6fc170c4 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Christian=20K=C3=B6nig?= <christian.koenig@amd.com>
Date: Tue, 8 Oct 2013 18:02:38 +0200
Subject: [PATCH] drm/radeon: rework and fix reset detection v2
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Stop fiddling with jiffies, always wait for RADEON_FENCE_JIFFIES_TIMEOUT.
Consolidate the two wait sequence implementations into just one function.
Activate all waiters and remember if the reset was already done instead of
trying to reset from only one thread.

v2: clear reset flag earlier to avoid timeout in IB test

Signed-off-by: Christian König <christian.koenig@amd.com>
---
 drivers/gpu/drm/radeon/radeon.h        |   2 +-
 drivers/gpu/drm/radeon/radeon_device.c |   8 +
 drivers/gpu/drm/radeon/radeon_fence.c  | 347 +++++++++++----------------------
 3 files changed, 127 insertions(+), 230 deletions(-)

diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h
index a400ac1..0201c6e 100644
--- a/drivers/gpu/drm/radeon/radeon.h
+++ b/drivers/gpu/drm/radeon/radeon.h
@@ -327,7 +327,6 @@ struct radeon_fence_driver {
 	/* sync_seq is protected by ring emission lock */
 	uint64_t			sync_seq[RADEON_NUM_RINGS];
 	atomic64_t			last_seq;
-	unsigned long			last_activity;
 	bool				initialized;
 };
 
@@ -2170,6 +2169,7 @@ struct radeon_device {
 	bool				need_dma32;
 	bool				accel_working;
 	bool				fastfb_working; /* IGP feature*/
+	bool				needs_reset;
 	struct radeon_surface_reg surface_regs[RADEON_GEM_MAX_SURFACES];
 	const struct firmware *me_fw;	/* all family ME firmware */
 	const struct firmware *pfp_fw;	/* r6/700 PFP firmware */
diff --git a/drivers/gpu/drm/radeon/radeon_device.c b/drivers/gpu/drm/radeon/radeon_device.c
index 841d0e0..3f35f21 100644
--- a/drivers/gpu/drm/radeon/radeon_device.c
+++ b/drivers/gpu/drm/radeon/radeon_device.c
@@ -1549,6 +1549,14 @@ int radeon_gpu_reset(struct radeon_device *rdev)
 	int resched;
 
 	down_write(&rdev->exclusive_lock);
+
+	if (!rdev->needs_reset) {
+		up_write(&rdev->exclusive_lock);
+		return 0;
+	}
+
+	rdev->needs_reset = false;
+
 	radeon_save_bios_scratch_regs(rdev);
 	/* block TTM */
 	resched = ttm_bo_lock_delayed_workqueue(&rdev->mman.bdev);
diff --git a/drivers/gpu/drm/radeon/radeon_fence.c b/drivers/gpu/drm/radeon/radeon_fence.c
index ddb8f8e..b8f68b2 100644
--- a/drivers/gpu/drm/radeon/radeon_fence.c
+++ b/drivers/gpu/drm/radeon/radeon_fence.c
@@ -190,10 +190,8 @@ void radeon_fence_process(struct radeon_device *rdev, int ring)
 		}
 	} while (atomic64_xchg(&rdev->fence_drv[ring].last_seq, seq) > seq);
 
-	if (wake) {
-		rdev->fence_drv[ring].last_activity = jiffies;
+	if (wake)
 		wake_up_all(&rdev->fence_queue);
-	}
 }
 
 /**
@@ -212,13 +210,13 @@ static void radeon_fence_destroy(struct kref *kref)
 }
 
 /**
- * radeon_fence_seq_signaled - check if a fence sequeuce number has signaled
+ * radeon_fence_seq_signaled - check if a fence sequence number has signaled
  *
  * @rdev: radeon device pointer
  * @seq: sequence number
  * @ring: ring index the fence is associated with
  *
- * Check if the last singled fence sequnce number is >= the requested
+ * Check if the last signaled fence sequnce number is >= the requested
  * sequence number (all asics).
  * Returns true if the fence has signaled (current fence value
  * is >= requested value) or false if it has not (current fence
@@ -263,113 +261,131 @@ bool radeon_fence_signaled(struct radeon_fence *fence)
 }
 
 /**
- * radeon_fence_wait_seq - wait for a specific sequence number
+ * radeon_fence_any_seq_signaled - check if any sequence number is signaled
  *
  * @rdev: radeon device pointer
- * @target_seq: sequence number we want to wait for
- * @ring: ring index the fence is associated with
+ * @seq: sequence numbers
+ *
+ * Check if the last signaled fence sequnce number is >= the requested
+ * sequence number (all asics).
+ * Returns true if any has signaled (current value is >= requested value)
+ * or false if it has not. Helper function for radeon_fence_wait_seq.
+ */
+static bool radeon_fence_any_seq_signaled(struct radeon_device *rdev, u64 *seq)
+{
+	unsigned i;
+
+	for (i = 0; i < RADEON_NUM_RINGS; ++i) {
+		if (seq[i] && radeon_fence_seq_signaled(rdev, seq[i], i))
+			return true;
+	}
+	return false;
+}
+
+/**
+ * radeon_fence_wait_seq - wait for a specific sequence numbers
+ *
+ * @rdev: radeon device pointer
+ * @target_seq: sequence number(s) we want to wait for
  * @intr: use interruptable sleep
  * @lock_ring: whether the ring should be locked or not
  *
- * Wait for the requested sequence number to be written (all asics).
+ * Wait for the requested sequence number(s) to be written by any ring
+ * (all asics).  Sequnce number array is indexed by ring id.
  * @intr selects whether to use interruptable (true) or non-interruptable
  * (false) sleep when waiting for the sequence number.  Helper function
- * for radeon_fence_wait(), et al.
+ * for radeon_fence_wait_*().
  * Returns 0 if the sequence number has passed, error for all other cases.
- * -EDEADLK is returned when a GPU lockup has been detected and the ring is
- * marked as not ready so no further jobs get scheduled until a successful
- * reset.
+ * -EDEADLK is returned when a GPU lockup has been detected.
  */
-static int radeon_fence_wait_seq(struct radeon_device *rdev, u64 target_seq,
-				 unsigned ring, bool intr, bool lock_ring)
+static int radeon_fence_wait_seq(struct radeon_device *rdev, u64 *target_seq,
+				 bool intr, bool lock_ring)
 {
-	unsigned long timeout, last_activity;
-	uint64_t seq;
-	unsigned i;
+	uint64_t last_seq[RADEON_NUM_RINGS];
 	bool signaled;
-	int r;
+	int i, r;
+
+	while (!radeon_fence_any_seq_signaled(rdev, target_seq)) {
+
+		/* Save current sequence values, used to check for GPU lockups */
+		for (i = 0; i < RADEON_NUM_RINGS; ++i) {
+			if (!target_seq[i])
+				continue;
 
-	while (target_seq > atomic64_read(&rdev->fence_drv[ring].last_seq)) {
-		if (!rdev->ring[ring].ready) {
-			return -EBUSY;
+			last_seq[i] = atomic64_read(&rdev->fence_drv[i].last_seq);
+			trace_radeon_fence_wait_begin(rdev->ddev, target_seq[i]);
+			radeon_irq_kms_sw_irq_get(rdev, i);
 		}
 
-		timeout = jiffies - RADEON_FENCE_JIFFIES_TIMEOUT;
-		if (time_after(rdev->fence_drv[ring].last_activity, timeout)) {
-			/* the normal case, timeout is somewhere before last_activity */
-			timeout = rdev->fence_drv[ring].last_activity - timeout;
+		if (intr) {
+			r = wait_event_interruptible_timeout(rdev->fence_queue, (
+				(signaled = radeon_fence_any_seq_signaled(rdev, target_seq))
+				 || rdev->needs_reset), RADEON_FENCE_JIFFIES_TIMEOUT);
 		} else {
-			/* either jiffies wrapped around, or no fence was signaled in the last 500ms
-			 * anyway we will just wait for the minimum amount and then check for a lockup
-			 */
-			timeout = 1;
+			r = wait_event_timeout(rdev->fence_queue, (
+				(signaled = radeon_fence_any_seq_signaled(rdev, target_seq))
+				 || rdev->needs_reset), RADEON_FENCE_JIFFIES_TIMEOUT);
 		}
-		seq = atomic64_read(&rdev->fence_drv[ring].last_seq);
-		/* Save current last activity valuee, used to check for GPU lockups */
-		last_activity = rdev->fence_drv[ring].last_activity;
 
-		trace_radeon_fence_wait_begin(rdev->ddev, seq);
-		radeon_irq_kms_sw_irq_get(rdev, ring);
-		if (intr) {
-			r = wait_event_interruptible_timeout(rdev->fence_queue,
-				(signaled = radeon_fence_seq_signaled(rdev, target_seq, ring)),
-				timeout);
-                } else {
-			r = wait_event_timeout(rdev->fence_queue,
-				(signaled = radeon_fence_seq_signaled(rdev, target_seq, ring)),
-				timeout);
+		for (i = 0; i < RADEON_NUM_RINGS; ++i) {
+			if (!target_seq[i])
+				continue;
+
+			radeon_irq_kms_sw_irq_put(rdev, i);
+			trace_radeon_fence_wait_end(rdev->ddev, target_seq[i]);
 		}
-		radeon_irq_kms_sw_irq_put(rdev, ring);
-		if (unlikely(r < 0)) {
+
+		if (unlikely(r < 0))
 			return r;
-		}
-		trace_radeon_fence_wait_end(rdev->ddev, seq);
 
 		if (unlikely(!signaled)) {
+			if (rdev->needs_reset)
+				return -EDEADLK;
+
 			/* we were interrupted for some reason and fence
 			 * isn't signaled yet, resume waiting */
-			if (r) {
+			if (r)
 				continue;
+
+			for (i = 0; i < RADEON_NUM_RINGS; ++i) {
+				if (!target_seq[i])
+					continue;
+
+				if (last_seq[i] != atomic64_read(&rdev->fence_drv[i].last_seq))
+					break;
 			}
 
-			/* check if sequence value has changed since last_activity */
-			if (seq != atomic64_read(&rdev->fence_drv[ring].last_seq)) {
+			if (i != RADEON_NUM_RINGS)
 				continue;
-			}
 
-			if (lock_ring) {
+			if (lock_ring)
 				mutex_lock(&rdev->ring_lock);
-			}
 
-			/* test if somebody else has already decided that this is a lockup */
-			if (last_activity != rdev->fence_drv[ring].last_activity) {
-				if (lock_ring) {
-					mutex_unlock(&rdev->ring_lock);
-				}
-				continue;
+			for (i = 0; i < RADEON_NUM_RINGS; ++i) {
+				if (!target_seq[i])
+					continue;
+
+				if (radeon_ring_is_lockup(rdev, i, &rdev->ring[i]))
+					break;
 			}
 
-			if (radeon_ring_is_lockup(rdev, ring, &rdev->ring[ring])) {
+			if (i < RADEON_NUM_RINGS) {
 				/* good news we believe it's a lockup */
-				dev_warn(rdev->dev, "GPU lockup (waiting for 0x%016llx last fence id 0x%016llx)\n",
-					 target_seq, seq);
-
-				/* change last activity so nobody else think there is a lockup */
-				for (i = 0; i < RADEON_NUM_RINGS; ++i) {
-					rdev->fence_drv[i].last_activity = jiffies;
-				}
-
-				/* mark the ring as not ready any more */
-				rdev->ring[ring].ready = false;
-				if (lock_ring) {
+				dev_warn(rdev->dev, "GPU lockup (waiting for "
+					 "0x%016llx last fence id 0x%016llx on"
+					 " ring %d)\n",
+					 target_seq[i], last_seq[i], i);
+
+				/* remember that we need an reset */
+				rdev->needs_reset = true;
+				if (lock_ring)
 					mutex_unlock(&rdev->ring_lock);
-				}
+				wake_up_all(&rdev->fence_queue);
 				return -EDEADLK;
 			}
 
-			if (lock_ring) {
+			if (lock_ring)
 				mutex_unlock(&rdev->ring_lock);
-			}
 		}
 	}
 	return 0;
@@ -388,6 +404,7 @@ static int radeon_fence_wait_seq(struct radeon_device *rdev, u64 target_seq,
  */
 int radeon_fence_wait(struct radeon_fence *fence, bool intr)
 {
+	uint64_t seq[RADEON_NUM_RINGS] = {};
 	int r;
 
 	if (fence == NULL) {
@@ -395,147 +412,15 @@ int radeon_fence_wait(struct radeon_fence *fence, bool intr)
 		return -EINVAL;
 	}
 
-	r = radeon_fence_wait_seq(fence->rdev, fence->seq,
-				  fence->ring, intr, true);
-	if (r) {
-		return r;
-	}
-	fence->seq = RADEON_FENCE_SIGNALED_SEQ;
-	return 0;
-}
-
-static bool radeon_fence_any_seq_signaled(struct radeon_device *rdev, u64 *seq)
-{
-	unsigned i;
-
-	for (i = 0; i < RADEON_NUM_RINGS; ++i) {
-		if (seq[i] && radeon_fence_seq_signaled(rdev, seq[i], i)) {
-			return true;
-		}
-	}
-	return false;
-}
-
-/**
- * radeon_fence_wait_any_seq - wait for a sequence number on any ring
- *
- * @rdev: radeon device pointer
- * @target_seq: sequence number(s) we want to wait for
- * @intr: use interruptable sleep
- *
- * Wait for the requested sequence number(s) to be written by any ring
- * (all asics).  Sequnce number array is indexed by ring id.
- * @intr selects whether to use interruptable (true) or non-interruptable
- * (false) sleep when waiting for the sequence number.  Helper function
- * for radeon_fence_wait_any(), et al.
- * Returns 0 if the sequence number has passed, error for all other cases.
- */
-static int radeon_fence_wait_any_seq(struct radeon_device *rdev,
-				     u64 *target_seq, bool intr)
-{
-	unsigned long timeout, last_activity, tmp;
-	unsigned i, ring = RADEON_NUM_RINGS;
-	bool signaled;
-	int r;
-
-	for (i = 0, last_activity = 0; i < RADEON_NUM_RINGS; ++i) {
-		if (!target_seq[i]) {
-			continue;
-		}
-
-		/* use the most recent one as indicator */
-		if (time_after(rdev->fence_drv[i].last_activity, last_activity)) {
-			last_activity = rdev->fence_drv[i].last_activity;
-		}
-
-		/* For lockup detection just pick the lowest ring we are
-		 * actively waiting for
-		 */
-		if (i < ring) {
-			ring = i;
-		}
-	}
-
-	/* nothing to wait for ? */
-	if (ring == RADEON_NUM_RINGS) {
-		return -ENOENT;
-	}
-
-	while (!radeon_fence_any_seq_signaled(rdev, target_seq)) {
-		timeout = jiffies - RADEON_FENCE_JIFFIES_TIMEOUT;
-		if (time_after(last_activity, timeout)) {
-			/* the normal case, timeout is somewhere before last_activity */
-			timeout = last_activity - timeout;
-		} else {
-			/* either jiffies wrapped around, or no fence was signaled in the last 500ms
-			 * anyway we will just wait for the minimum amount and then check for a lockup
-			 */
-			timeout = 1;
-		}
+	seq[fence->ring] = fence->seq;
+	if (seq[fence->ring] == RADEON_FENCE_SIGNALED_SEQ)
+		return 0;
 
-		trace_radeon_fence_wait_begin(rdev->ddev, target_seq[ring]);
-		for (i = 0; i < RADEON_NUM_RINGS; ++i) {
-			if (target_seq[i]) {
-				radeon_irq_kms_sw_irq_get(rdev, i);
-			}
-		}
-		if (intr) {
-			r = wait_event_interruptible_timeout(rdev->fence_queue,
-				(signaled = radeon_fence_any_seq_signaled(rdev, target_seq)),
-				timeout);
-		} else {
-			r = wait_event_timeout(rdev->fence_queue,
-				(signaled = radeon_fence_any_seq_signaled(rdev, target_seq)),
-				timeout);
-		}
-		for (i = 0; i < RADEON_NUM_RINGS; ++i) {
-			if (target_seq[i]) {
-				radeon_irq_kms_sw_irq_put(rdev, i);
-			}
-		}
-		if (unlikely(r < 0)) {
-			return r;
-		}
-		trace_radeon_fence_wait_end(rdev->ddev, target_seq[ring]);
-
-		if (unlikely(!signaled)) {
-			/* we were interrupted for some reason and fence
-			 * isn't signaled yet, resume waiting */
-			if (r) {
-				continue;
-			}
-
-			mutex_lock(&rdev->ring_lock);
-			for (i = 0, tmp = 0; i < RADEON_NUM_RINGS; ++i) {
-				if (time_after(rdev->fence_drv[i].last_activity, tmp)) {
-					tmp = rdev->fence_drv[i].last_activity;
-				}
-			}
-			/* test if somebody else has already decided that this is a lockup */
-			if (last_activity != tmp) {
-				last_activity = tmp;
-				mutex_unlock(&rdev->ring_lock);
-				continue;
-			}
-
-			if (radeon_ring_is_lockup(rdev, ring, &rdev->ring[ring])) {
-				/* good news we believe it's a lockup */
-				dev_warn(rdev->dev, "GPU lockup (waiting for 0x%016llx)\n",
-					 target_seq[ring]);
-
-				/* change last activity so nobody else think there is a lockup */
-				for (i = 0; i < RADEON_NUM_RINGS; ++i) {
-					rdev->fence_drv[i].last_activity = jiffies;
-				}
+	r = radeon_fence_wait_seq(fence->rdev, seq, intr, true);
+	if (r)
+		return r;
 
-				/* mark the ring as not ready any more */
-				rdev->ring[ring].ready = false;
-				mutex_unlock(&rdev->ring_lock);
-				return -EDEADLK;
-			}
-			mutex_unlock(&rdev->ring_lock);
-		}
-	}
+	fence->seq = RADEON_FENCE_SIGNALED_SEQ;
 	return 0;
 }
 
@@ -557,7 +442,7 @@ int radeon_fence_wait_any(struct radeon_device *rdev,
 			  bool intr)
 {
 	uint64_t seq[RADEON_NUM_RINGS];
-	unsigned i;
+	unsigned i, num_rings = 0;
 	int r;
 
 	for (i = 0; i < RADEON_NUM_RINGS; ++i) {
@@ -567,15 +452,19 @@ int radeon_fence_wait_any(struct radeon_device *rdev,
 			continue;
 		}
 
-		if (fences[i]->seq == RADEON_FENCE_SIGNALED_SEQ) {
-			/* something was allready signaled */
-			return 0;
-		}
-
 		seq[i] = fences[i]->seq;
+		++num_rings;
+
+		/* test if something was allready signaled */
+		if (seq[i] == RADEON_FENCE_SIGNALED_SEQ)
+			return 0;
 	}
 
-	r = radeon_fence_wait_any_seq(rdev, seq, intr);
+	/* nothing to wait for ? */
+	if (num_rings == 0)
+		return -ENOENT;
+
+	r = radeon_fence_wait_seq(rdev, seq, intr, true);
 	if (r) {
 		return r;
 	}
@@ -594,15 +483,15 @@ int radeon_fence_wait_any(struct radeon_device *rdev,
  */
 int radeon_fence_wait_next_locked(struct radeon_device *rdev, int ring)
 {
-	uint64_t seq;
+	uint64_t seq[RADEON_NUM_RINGS] = {};
 
-	seq = atomic64_read(&rdev->fence_drv[ring].last_seq) + 1ULL;
-	if (seq >= rdev->fence_drv[ring].sync_seq[ring]) {
+	seq[ring] = atomic64_read(&rdev->fence_drv[ring].last_seq) + 1ULL;
+	if (seq[ring] >= rdev->fence_drv[ring].sync_seq[ring]) {
 		/* nothing to wait for, last_seq is
 		   already the last emited fence */
 		return -ENOENT;
 	}
-	return radeon_fence_wait_seq(rdev, seq, ring, false, false);
+	return radeon_fence_wait_seq(rdev, seq, false, false);
 }
 
 /**
@@ -617,14 +506,15 @@ int radeon_fence_wait_next_locked(struct radeon_device *rdev, int ring)
  */
 int radeon_fence_wait_empty_locked(struct radeon_device *rdev, int ring)
 {
-	uint64_t seq = rdev->fence_drv[ring].sync_seq[ring];
+	uint64_t seq[RADEON_NUM_RINGS] = {};
 	int r;
 
-	r = radeon_fence_wait_seq(rdev, seq, ring, false, false);
+	seq[ring] = rdev->fence_drv[ring].sync_seq[ring];
+	r = radeon_fence_wait_seq(rdev, seq, false, false);
 	if (r) {
-		if (r == -EDEADLK) {
+		if (r == -EDEADLK)
 			return -EDEADLK;
-		}
+
 		dev_err(rdev->dev, "error waiting for ring[%d] to become idle (%d)\n",
 			ring, r);
 	}
@@ -826,7 +716,6 @@ static void radeon_fence_driver_init_ring(struct radeon_device *rdev, int ring)
 	for (i = 0; i < RADEON_NUM_RINGS; ++i)
 		rdev->fence_drv[ring].sync_seq[i] = 0;
 	atomic64_set(&rdev->fence_drv[ring].last_seq, 0);
-	rdev->fence_drv[ring].last_activity = jiffies;
 	rdev->fence_drv[ring].initialized = false;
 }
 
-- 
1.8.1.2
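
The rework above replaces the two per-ring wait implementations with one loop built around the new radeon_fence_any_seq_signaled() helper. Its semantics can be exercised with a small userspace model; names and the zero-means-not-waited convention mirror the patch, but last_seq is a plain array here and the sketch is not driver code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_RINGS 8 /* stands in for RADEON_NUM_RINGS */

/* Userspace model of radeon_fence_any_seq_signaled(): last[i] plays the
 * role of fence_drv[i].last_seq. A zero entry in seq[] means that ring
 * is not being waited on; any waited-on ring whose last signaled
 * sequence has reached its target makes the whole condition true. */
static bool any_seq_signaled(const uint64_t *last, const uint64_t *seq)
{
    for (unsigned i = 0; i < NUM_RINGS; ++i) {
        if (seq[i] && last[i] >= seq[i])
            return true;
    }
    return false;
}
```

In the patch this predicate is evaluated both inside the wait_event condition (together with rdev->needs_reset) and at the top of the retry loop, so a wake_up_all() triggered by either a signaled fence or a detected lockup ends the wait.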


[-- Attachment #3: Type: text/plain, Size: 159 bytes --]

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH] drm/radeon: signal all fences after lockup to avoid endless waiting in GEM_WAIT
  2013-10-13 12:47             ` Christian König
@ 2013-10-13 20:16               ` Marek Olšák
  2013-10-14  9:19                 ` Christian König
  0 siblings, 1 reply; 13+ messages in thread
From: Marek Olšák @ 2013-10-13 20:16 UTC (permalink / raw)
  To: Christian König; +Cc: dri-devel

[-- Attachment #1: Type: text/plain, Size: 8534 bytes --]

This seems to be better. It can do about 3-5 resets correctly, then
resuming the GPU fails:

[  246.882780] [drm:cik_resume] *ERROR* cik startup failed on resume

and then the GPU is being reset again and again endlessly without success.

The dmesg of the endless resets is attached.

Marek

On Sun, Oct 13, 2013 at 2:47 PM, Christian König
<deathsimple@vodafone.de> wrote:
> I've figured out what was wrong with the patch. We need to reset the
> "needs_reset" flag earlier, otherwise the IB test might think we are in a
> lockup and abort the reset after waiting for the minimum timeout period.
>
> Please try the attached patch instead.
>
> Thanks,
> Christian.
>
> Am 09.10.2013 14:04, schrieb Marek Olšák:
>
>> The ring test of the first compute ring always fails and it shouldn't
>> affect the GPU reset in any way.
>>
>> I can't tell if the deadlock issue is fixed, because the GPU reset
>> usually fails with your patch. It always succeeded without your patch.
>>
>> Marek
>>
>> On Wed, Oct 9, 2013 at 1:09 PM, Christian König <deathsimple@vodafone.de>
>> wrote:
>>>
>>> Mhm, that doesn't look related; it seems more like the reset of the
>>> compute ring didn't work.
>>>
>>> How often does that happen? And do you still get the problem where X
>>> waits
>>> for a fence that never comes back?
>>>
>>> Christian.
>>>
>>> Am 09.10.2013 12:36, schrieb Marek Olšák:
>>>
>>>> I'm afraid your patch sometimes causes the GPU reset to fail, which
>>>> had never happened before IIRC.
>>>>
>>>> The dmesg log from the failure is attached.
>>>>
>>>> Marek
>>>>
>>>> On Tue, Oct 8, 2013 at 6:21 PM, Christian König
>>>> <deathsimple@vodafone.de>
>>>> wrote:
>>>>>
>>>>> Hi Marek,
>>>>>
>>>>> please try the attached patch as a replacement for your signaling all
>>>>> fences
>>>>> patch. I'm not 100% sure if it fixes all issues, but it's at least a
>>>>> start.
>>>>>
>>>>> Thanks,
>>>>> Christian.
>>>>>
>>>>> Am 07.10.2013 13:08, schrieb Christian König:
>>>>>
>>>>>>> First of all, I can't complain about the reliability of the hardware
>>>>>>> GPU reset. It's mostly the kernel driver that happens to run into a
>>>>>>> deadlock at the same time.
>>>>>>
>>>>>>
>>>>>> Alex and I spent quite some time making this reliable again after
>>>>>> activating more rings and adding VM support. The main problem is that
>>>>>> I couldn't figure out where the CPU deadlock comes from, because I
>>>>>> couldn't reliably reproduce the issue.
>>>>>>
>>>>>> What is the content of /proc/<pid of X server>/task/*/stack and
>>>>>> /sys/kernel/debug/dri/0/radeon_fence_info when the X server is stuck
>>>>>> in the deadlock situation?
>>>>>>
>>>>>> I'm pretty sure that we nearly always have a problem when two threads
>>>>>> are waiting for fences and one of them detects that we have a lockup
>>>>>> while the other one keeps holding the exclusive lock. Signalling all
>>>>>> fences might work around that problem, but it would probably be better
>>>>>> to fix the underlying issue.
>>>>>>
>>>>>> Going to take a deeper look into it.
>>>>>>
>>>>>> Christian.
>>>>>>
>>>>>> Am 03.10.2013 02:45, schrieb Marek Olšák:
>>>>>>>
>>>>>>> First of all, I can't complain about the reliability of the hardware
>>>>>>> GPU reset. It's mostly the kernel driver that happens to run into a
>>>>>>> deadlock at the same time.
>>>>>>>
>>>>>>> Regarding the issue with fences, the problem is that the GPU reset
>>>>>>> completes successfully according to dmesg, but X doesn't respond. I
>>>>>>> can move the cursor on the screen, but I can't do anything else and
>>>>>>> the UI is frozen. gdb says that X is stuck in GEM_WAIT_IDLE. I can
>>>>>>> easily reproduce this, because it's the most common reason why a GPU
>>>>>>> lockup leads to frozen X. The GPU actually recovers, but X is hung. I
>>>>>>> can't tell whether the fences are just not signalled or whether there
>>>>>>> is actually a real CPU deadlock I can't see.
>>>>>>>
>>>>>>> This patch makes the problem go away and GPU resets are successful
>>>>>>> (except for extreme cases, see below). With a small enough lockup
>>>>>>> timeout, the lockups are just a minor annoyance and I thought I could
>>>>>>> get through a piglit run just with a few tens or hundreds of GPU
>>>>>>> resets...
>>>>>>>
>>>>>>> A different type of deadlock showed up, though it needs a lot of
>>>>>>> concurrently-running apps like piglit. What happened is that the
>>>>>>> kernel driver was stuck/deadlocked in radeon_cs_ioctl presumably due
>>>>>>> to a GPU hang while holding onto the exclusive lock, and another
>>>>>>> thread wanting to do the GPU reset was unable to acquire the lock.
>>>>>>>
>>>>>>> That said, I will use the patch locally, because it helps a lot. I
>>>>>>> got
>>>>>>> a few lockups while writing this email and I'm glad I didn't have to
>>>>>>> reboot.
>>>>>>>
>>>>>>> Marek
>>>>>>>
>>>>>>> On Wed, Oct 2, 2013 at 4:50 PM, Christian König
>>>>>>> <deathsimple@vodafone.de>
>>>>>>> wrote:
>>>>>>>>
>>>>>>>> Possible, but I would rather guess that this doesn't work because
>>>>>>>> the
>>>>>>>> IB
>>>>>>>> test runs into a deadlock situation and so the GPU reset never fully
>>>>>>>> completes.
>>>>>>>>
>>>>>>>> Can you reproduce the problem?
>>>>>>>>
>>>>>>>> If you want to make GPU resets more reliable I would rather suggest
>>>>>>>> removing the ring lock dependency.
>>>>>>>> Then we should try to give all the fence wait functions a (reliable)
>>>>>>>> timeout and move reset handling a layer up into the ioctl functions.
>>>>>>>> But for
>>>>>>>> this you need to rip out the old PM code first.
>>>>>>>>
>>>>>>>> Christian.
>>>>>>>>
>>>>>>>> Marek Olšák <maraeo@gmail.com> schrieb:
>>>>>>>>
>>>>>>>>> I'm afraid signalling the fences with an IB test is not reliable.
>>>>>>>>>
>>>>>>>>> Marek
>>>>>>>>>
>>>>>>>>> On Wed, Oct 2, 2013 at 3:52 PM, Christian König
>>>>>>>>> <deathsimple@vodafone.de> wrote:
>>>>>>>>>>
>>>>>>>>>> NAK, after recovering from a lockup the first thing we do is
>>>>>>>>>> signalling all remaining fences with an IB test.
>>>>>>>>>>
>>>>>>>>>> If we don't recover we indeed signal all fences manually.
>>>>>>>>>>
>>>>>>>>>> Signalling all fences regardless of the outcome of the reset
>>>>>>>>>> creates
>>>>>>>>>> problems with both types of partial resets.
>>>>>>>>>>
>>>>>>>>>> Christian.
>>>>>>>>>>
>>>>>>>>>> Marek Olšák <maraeo@gmail.com> schrieb:
>>>>>>>>>>
>>>>>>>>>>> From: Marek Olšák <marek.olsak@amd.com>
>>>>>>>>>>>
>>>>>>>>>>> After a lockup, fences are not signalled sometimes, causing
>>>>>>>>>>> the GEM_WAIT_IDLE ioctl to never return, which sometimes results
>>>>>>>>>>> in an X server freeze.
>>>>>>>>>>>
>>>>>>>>>>> This fixes only one of many deadlocks which can occur during a
>>>>>>>>>>> lockup.
>>>>>>>>>>>
>>>>>>>>>>> Signed-off-by: Marek Olšák <marek.olsak@amd.com>
>>>>>>>>>>> ---
>>>>>>>>>>> drivers/gpu/drm/radeon/radeon_device.c | 5 +++++
>>>>>>>>>>> 1 file changed, 5 insertions(+)
>>>>>>>>>>>
>>>>>>>>>>> diff --git a/drivers/gpu/drm/radeon/radeon_device.c
>>>>>>>>>>> b/drivers/gpu/drm/radeon/radeon_device.c
>>>>>>>>>>> index 841d0e0..7b97baa 100644
>>>>>>>>>>> --- a/drivers/gpu/drm/radeon/radeon_device.c
>>>>>>>>>>> +++ b/drivers/gpu/drm/radeon/radeon_device.c
>>>>>>>>>>> @@ -1552,6 +1552,11 @@ int radeon_gpu_reset(struct radeon_device
>>>>>>>>>>> *rdev)
>>>>>>>>>>>          radeon_save_bios_scratch_regs(rdev);
>>>>>>>>>>>          /* block TTM */
>>>>>>>>>>>          resched =
>>>>>>>>>>> ttm_bo_lock_delayed_workqueue(&rdev->mman.bdev);
>>>>>>>>>>> +
>>>>>>>>>>> +      mutex_lock(&rdev->ring_lock);
>>>>>>>>>>> +      radeon_fence_driver_force_completion(rdev);
>>>>>>>>>>> +      mutex_unlock(&rdev->ring_lock);
>>>>>>>>>>> +
>>>>>>>>>>>          radeon_pm_suspend(rdev);
>>>>>>>>>>>          radeon_suspend(rdev);
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> 1.8.1.2
>>>>>>>>>>>
>>>>>>>>>>> _______________________________________________
>>>>>>>>>>> dri-devel mailing list
>>>>>>>>>>> dri-devel@lists.freedesktop.org
>>>>>>>>>>> http://lists.freedesktop.org/mailman/listinfo/dri-devel
>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> dri-devel mailing list
>>>>>> dri-devel@lists.freedesktop.org
>>>>>> http://lists.freedesktop.org/mailman/listinfo/dri-devel
>>>>>
>>>>>
>

[-- Attachment #2: dmesg --]
[-- Type: application/octet-stream, Size: 88646 bytes --]

[  246.851552] radeon 0000:01:00.0: Saved 131065 dwords of commands on ring 2.
[  246.851563] radeon 0000:01:00.0: GPU softreset: 0x00000008
[  246.851564] radeon 0000:01:00.0:   GRBM_STATUS=0xA0003028
[  246.851566] radeon 0000:01:00.0:   GRBM_STATUS2=0x70000008
[  246.851567] radeon 0000:01:00.0:   GRBM_STATUS_SE0=0x00000006
[  246.851568] radeon 0000:01:00.0:   GRBM_STATUS_SE1=0x00000006
[  246.851570] radeon 0000:01:00.0:   GRBM_STATUS_SE2=0x00000006
[  246.851571] radeon 0000:01:00.0:   GRBM_STATUS_SE3=0x00000006
[  246.851572] radeon 0000:01:00.0:   SRBM_STATUS=0x20000040
[  246.851573] radeon 0000:01:00.0:   SRBM_STATUS2=0x00000000
[  246.851575] radeon 0000:01:00.0:   SDMA0_STATUS_REG   = 0x46CEE557
[  246.851576] radeon 0000:01:00.0:   SDMA1_STATUS_REG   = 0x46CEE557
[  246.851578] radeon 0000:01:00.0:   CP_STAT = 0x84008200
[  246.851579] radeon 0000:01:00.0:   CP_STALLED_STAT1 = 0x00000c00
[  246.851581] radeon 0000:01:00.0:   CP_STALLED_STAT2 = 0x00000100
[  246.851582] radeon 0000:01:00.0:   CP_STALLED_STAT3 = 0x00000400
[  246.851583] radeon 0000:01:00.0:   CP_CPF_BUSY_STAT = 0x48440002
[  246.851585] radeon 0000:01:00.0:   CP_CPF_STALLED_STAT1 = 0x00000001
[  246.851586] radeon 0000:01:00.0:   CP_CPF_STATUS = 0x80008023
[  246.851587] radeon 0000:01:00.0:   CP_CPC_BUSY_STAT = 0x00000408
[  246.851589] radeon 0000:01:00.0:   CP_CPC_STALLED_STAT1 = 0x00000000
[  246.851590] radeon 0000:01:00.0:   CP_CPC_STATUS = 0xa0000041
[  246.851592] radeon 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x00000000
[  246.851593] radeon 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x00000000
[  246.859197] radeon 0000:01:00.0: GRBM_SOFT_RESET=0x00010001
[  246.859248] radeon 0000:01:00.0: SRBM_SOFT_RESET=0x00000100
[  246.860393] radeon 0000:01:00.0:   GRBM_STATUS=0xA0003028
[  246.860395] radeon 0000:01:00.0:   GRBM_STATUS2=0x30000008
[  246.860396] radeon 0000:01:00.0:   GRBM_STATUS_SE0=0x00000006
[  246.860397] radeon 0000:01:00.0:   GRBM_STATUS_SE1=0x00000006
[  246.860399] radeon 0000:01:00.0:   GRBM_STATUS_SE2=0x00000006
[  246.860400] radeon 0000:01:00.0:   GRBM_STATUS_SE3=0x00000006
[  246.860401] radeon 0000:01:00.0:   SRBM_STATUS=0x20000040
[  246.860402] radeon 0000:01:00.0:   SRBM_STATUS2=0x00000000
[  246.860404] radeon 0000:01:00.0:   SDMA0_STATUS_REG   = 0x46CEE557
[  246.860405] radeon 0000:01:00.0:   SDMA1_STATUS_REG   = 0x46CEE557
[  246.860407] radeon 0000:01:00.0:   CP_STAT = 0x00000000
[  246.860408] radeon 0000:01:00.0:   CP_STALLED_STAT1 = 0x00000000
[  246.860409] radeon 0000:01:00.0:   CP_STALLED_STAT2 = 0x00000000
[  246.860411] radeon 0000:01:00.0:   CP_STALLED_STAT3 = 0x00000000
[  246.860412] radeon 0000:01:00.0:   CP_CPF_BUSY_STAT = 0x48440000
[  246.860413] radeon 0000:01:00.0:   CP_CPF_STALLED_STAT1 = 0x00000000
[  246.860415] radeon 0000:01:00.0:   CP_CPF_STATUS = 0x80008003
[  246.860416] radeon 0000:01:00.0:   CP_CPC_BUSY_STAT = 0x00000408
[  246.860417] radeon 0000:01:00.0:   CP_CPC_STALLED_STAT1 = 0x00000000
[  246.860419] radeon 0000:01:00.0:   CP_CPC_STATUS = 0xa0000041
[  246.860425] radeon 0000:01:00.0: GPU reset succeeded, trying to resume
[  246.877962] [drm] probing gen 2 caps for device 8086:151 = 261ad03/e
[  246.877964] [drm] PCIE gen 3 link speeds already enabled
[  246.880358] [drm] PCIE GART of 1024M enabled (table at 0x0000000000277000).
[  246.880446] radeon 0000:01:00.0: WB enabled
[  246.880451] radeon 0000:01:00.0: fence driver on ring 0 use gpu addr 0x0000000040000c00 and cpu addr 0xffc78c00
[  246.880453] radeon 0000:01:00.0: fence driver on ring 1 use gpu addr 0x0000000040000c04 and cpu addr 0xffc78c04
[  246.880454] radeon 0000:01:00.0: fence driver on ring 2 use gpu addr 0x0000000040000c08 and cpu addr 0xffc78c08
[  246.880455] radeon 0000:01:00.0: fence driver on ring 3 use gpu addr 0x0000000040000c0c and cpu addr 0xffc78c0c
[  246.880457] radeon 0000:01:00.0: fence driver on ring 4 use gpu addr 0x0000000040000c10 and cpu addr 0xffc78c10
[  246.880835] radeon 0000:01:00.0: fence driver on ring 5 use gpu addr 0x0000000000076c98 and cpu addr 0xf8bb6c98
[  246.882779] [drm:cik_ring_test] *ERROR* radeon: cp failed to get scratch reg (-22).
[  246.882780] [drm:cik_resume] *ERROR* cik startup failed on resume
[  246.892649] [drm:cik_irq_process] *ERROR* Illegal register access in command stream
[  247.025237] [drm:ci_dpm_enable] *ERROR* ci_start_dpm failed
[  247.025239] [drm:radeon_pm_resume_dpm] *ERROR* radeon: dpm resume failed
[  248.027735] radeon 0000:01:00.0: GPU lockup CP stall for more than 1000msec
[  248.027738] radeon 0000:01:00.0: GPU lockup (waiting for 0x0000000000014850 last fence id 0x00000000000147f4 on ring 0)
[  248.027747] radeon 0000:01:00.0: GPU lockup CP stall for more than 1000msec
[  248.027750] radeon 0000:01:00.0: GPU lockup (waiting for 0x000000000001489c last fence id 0x00000000000147f4 on ring 0)
[  248.036601] radeon 0000:01:00.0: Saved 189590 dwords of commands on ring 0.
[  248.036738] radeon 0000:01:00.0: Saved 131065 dwords of commands on ring 2.
[  248.036748] radeon 0000:01:00.0: GPU softreset: 0x00000009
[  248.036749] radeon 0000:01:00.0:   GRBM_STATUS=0xA0083028
[  248.036751] radeon 0000:01:00.0:   GRBM_STATUS2=0x70000008
[  248.036752] radeon 0000:01:00.0:   GRBM_STATUS_SE0=0x00000006
[  248.036753] radeon 0000:01:00.0:   GRBM_STATUS_SE1=0x00000006
[  248.036755] radeon 0000:01:00.0:   GRBM_STATUS_SE2=0x00000006
[  248.036756] radeon 0000:01:00.0:   GRBM_STATUS_SE3=0x00000006
[  248.036757] radeon 0000:01:00.0:   SRBM_STATUS=0x20000A40
[  248.036759] radeon 0000:01:00.0:   SRBM_STATUS2=0x00000000
[  248.036760] radeon 0000:01:00.0:   SDMA0_STATUS_REG   = 0x46CEE557
[  248.036762] radeon 0000:01:00.0:   SDMA1_STATUS_REG   = 0x46CEE557
[  248.036763] radeon 0000:01:00.0:   CP_STAT = 0x84838600
[  248.036764] radeon 0000:01:00.0:   CP_STALLED_STAT1 = 0x20000c00
[  248.036766] radeon 0000:01:00.0:   CP_STALLED_STAT2 = 0x00010000
[  248.036767] radeon 0000:01:00.0:   CP_STALLED_STAT3 = 0x00000400
[  248.036768] radeon 0000:01:00.0:   CP_CPF_BUSY_STAT = 0x48440006
[  248.036770] radeon 0000:01:00.0:   CP_CPF_STALLED_STAT1 = 0x00000001
[  248.036771] radeon 0000:01:00.0:   CP_CPF_STATUS = 0x80008063
[  248.036772] radeon 0000:01:00.0:   CP_CPC_BUSY_STAT = 0x00000408
[  248.036774] radeon 0000:01:00.0:   CP_CPC_STALLED_STAT1 = 0x00000000
[  248.036775] radeon 0000:01:00.0:   CP_CPC_STATUS = 0xa0000041
[  248.036777] radeon 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x00000000
[  248.036778] radeon 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x00000000
[  248.047980] radeon 0000:01:00.0: GRBM_SOFT_RESET=0x00010001
[  248.048032] radeon 0000:01:00.0: SRBM_SOFT_RESET=0x00000100
[  248.049177] radeon 0000:01:00.0:   GRBM_STATUS=0xA0003028
[  248.049178] radeon 0000:01:00.0:   GRBM_STATUS2=0x30000008
[  248.049180] radeon 0000:01:00.0:   GRBM_STATUS_SE0=0x00000006
[  248.049181] radeon 0000:01:00.0:   GRBM_STATUS_SE1=0x00000006
[  248.049182] radeon 0000:01:00.0:   GRBM_STATUS_SE2=0x00000006
[  248.049183] radeon 0000:01:00.0:   GRBM_STATUS_SE3=0x00000006
[  248.049185] radeon 0000:01:00.0:   SRBM_STATUS=0x20000040
[  248.049186] radeon 0000:01:00.0:   SRBM_STATUS2=0x00000000
[  248.049187] radeon 0000:01:00.0:   SDMA0_STATUS_REG   = 0x46CEE557
[  248.049189] radeon 0000:01:00.0:   SDMA1_STATUS_REG   = 0x46CEE557
[  248.049190] radeon 0000:01:00.0:   CP_STAT = 0x00000000
[  248.049191] radeon 0000:01:00.0:   CP_STALLED_STAT1 = 0x00000000
[  248.049193] radeon 0000:01:00.0:   CP_STALLED_STAT2 = 0x00000000
[  248.049194] radeon 0000:01:00.0:   CP_STALLED_STAT3 = 0x00000000
[  248.049195] radeon 0000:01:00.0:   CP_CPF_BUSY_STAT = 0x48440000
[  248.049197] radeon 0000:01:00.0:   CP_CPF_STALLED_STAT1 = 0x00000000
[  248.049198] radeon 0000:01:00.0:   CP_CPF_STATUS = 0x80008003
[  248.049200] radeon 0000:01:00.0:   CP_CPC_BUSY_STAT = 0x00000408
[  248.049201] radeon 0000:01:00.0:   CP_CPC_STALLED_STAT1 = 0x00000000
[  248.049202] radeon 0000:01:00.0:   CP_CPC_STATUS = 0xa0000041
[  248.049209] radeon 0000:01:00.0: GPU reset succeeded, trying to resume
[  248.066749] [drm] probing gen 2 caps for device 8086:151 = 261ad03/e
[  248.066751] [drm] PCIE gen 3 link speeds already enabled
[  248.069148] [drm] PCIE GART of 1024M enabled (table at 0x0000000000277000).
[  248.069234] radeon 0000:01:00.0: WB enabled
[  248.069239] radeon 0000:01:00.0: fence driver on ring 0 use gpu addr 0x0000000040000c00 and cpu addr 0xffc78c00
[  248.069240] radeon 0000:01:00.0: fence driver on ring 1 use gpu addr 0x0000000040000c04 and cpu addr 0xffc78c04
[  248.069241] radeon 0000:01:00.0: fence driver on ring 2 use gpu addr 0x0000000040000c08 and cpu addr 0xffc78c08
[  248.069243] radeon 0000:01:00.0: fence driver on ring 3 use gpu addr 0x0000000040000c0c and cpu addr 0xffc78c0c
[  248.069244] radeon 0000:01:00.0: fence driver on ring 4 use gpu addr 0x0000000040000c10 and cpu addr 0xffc78c10
[  248.069622] radeon 0000:01:00.0: fence driver on ring 5 use gpu addr 0x0000000000076c98 and cpu addr 0xf8bb6c98
[  248.071565] [drm:cik_ring_test] *ERROR* radeon: cp failed to get scratch reg (-22).
[  248.071566] [drm:cik_resume] *ERROR* cik startup failed on resume
[  248.072176] radeon 0000:01:00.0: GPU fault detected: 146 0x04c2760c
[  248.072177] radeon 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x00000126
[  248.072178] radeon 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0207600C
[  248.072180] VM fault (0x0c, vmid 1) at page 294, read from 'CPF' (0x43504600) (118)
[  248.072183] radeon 0000:01:00.0: GPU fault detected: 146 0x04e2760c
[  248.072184] radeon 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x00000000
[  248.072185] radeon 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x00000000
[  248.072186] VM fault (0x00, vmid 0) at page 0, read from '' (0x00000000) (0)
[  248.209955] [drm:ci_dpm_enable] *ERROR* ci_start_dpm failed
[  248.209957] [drm:radeon_pm_resume_dpm] *ERROR* radeon: dpm resume failed
[  248.225925] radeon 0000:01:00.0: Saved 191158 dwords of commands on ring 0.
[  248.226062] radeon 0000:01:00.0: Saved 131065 dwords of commands on ring 2.
[  248.226073] radeon 0000:01:00.0: GPU softreset: 0x00000008
[  248.226075] radeon 0000:01:00.0:   GRBM_STATUS=0xA0003028
[  248.226076] radeon 0000:01:00.0:   GRBM_STATUS2=0x70000008
[  248.226077] radeon 0000:01:00.0:   GRBM_STATUS_SE0=0x00000006
[  248.226078] radeon 0000:01:00.0:   GRBM_STATUS_SE1=0x00000006
[  248.226080] radeon 0000:01:00.0:   GRBM_STATUS_SE2=0x00000006
[  248.226081] radeon 0000:01:00.0:   GRBM_STATUS_SE3=0x00000006
[  248.226082] radeon 0000:01:00.0:   SRBM_STATUS=0x20000A40
[  248.226084] radeon 0000:01:00.0:   SRBM_STATUS2=0x00000000
[  248.226085] radeon 0000:01:00.0:   SDMA0_STATUS_REG   = 0x46CEE557
[  248.226087] radeon 0000:01:00.0:   SDMA1_STATUS_REG   = 0x46CEE557
[  248.226088] radeon 0000:01:00.0:   CP_STAT = 0x84008200
[  248.226090] radeon 0000:01:00.0:   CP_STALLED_STAT1 = 0x00000c00
[  248.226091] radeon 0000:01:00.0:   CP_STALLED_STAT2 = 0x00000100
[  248.226092] radeon 0000:01:00.0:   CP_STALLED_STAT3 = 0x00000400
[  248.226094] radeon 0000:01:00.0:   CP_CPF_BUSY_STAT = 0x48440002
[  248.226095] radeon 0000:01:00.0:   CP_CPF_STALLED_STAT1 = 0x00000001
[  248.226096] radeon 0000:01:00.0:   CP_CPF_STATUS = 0x80008023
[  248.226098] radeon 0000:01:00.0:   CP_CPC_BUSY_STAT = 0x00000408
[  248.226099] radeon 0000:01:00.0:   CP_CPC_STALLED_STAT1 = 0x00000000
[  248.226100] radeon 0000:01:00.0:   CP_CPC_STATUS = 0xa0000041
[  248.226102] radeon 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x00000000
[  248.226103] radeon 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x00000000
[  248.232769] radeon 0000:01:00.0: GRBM_SOFT_RESET=0x00010001
[  248.232821] radeon 0000:01:00.0: SRBM_SOFT_RESET=0x00000100
[  248.233966] radeon 0000:01:00.0:   GRBM_STATUS=0xA0003028
[  248.233967] radeon 0000:01:00.0:   GRBM_STATUS2=0x30000008
[  248.233969] radeon 0000:01:00.0:   GRBM_STATUS_SE0=0x00000006
[  248.233970] radeon 0000:01:00.0:   GRBM_STATUS_SE1=0x00000006
[  248.233971] radeon 0000:01:00.0:   GRBM_STATUS_SE2=0x00000006
[  248.233973] radeon 0000:01:00.0:   GRBM_STATUS_SE3=0x00000006
[  248.233974] radeon 0000:01:00.0:   SRBM_STATUS=0x20000040
[  248.233975] radeon 0000:01:00.0:   SRBM_STATUS2=0x00000000
[  248.233977] radeon 0000:01:00.0:   SDMA0_STATUS_REG   = 0x46CEE557
[  248.233978] radeon 0000:01:00.0:   SDMA1_STATUS_REG   = 0x46CEE557
[  248.233980] radeon 0000:01:00.0:   CP_STAT = 0x00000000
[  248.233981] radeon 0000:01:00.0:   CP_STALLED_STAT1 = 0x00000000
[  248.233982] radeon 0000:01:00.0:   CP_STALLED_STAT2 = 0x00000000
[  248.233983] radeon 0000:01:00.0:   CP_STALLED_STAT3 = 0x00000000
[  248.233985] radeon 0000:01:00.0:   CP_CPF_BUSY_STAT = 0x48440000
[  248.233986] radeon 0000:01:00.0:   CP_CPF_STALLED_STAT1 = 0x00000000
[  248.233988] radeon 0000:01:00.0:   CP_CPF_STATUS = 0x80008003
[  248.233989] radeon 0000:01:00.0:   CP_CPC_BUSY_STAT = 0x00000408
[  248.233990] radeon 0000:01:00.0:   CP_CPC_STALLED_STAT1 = 0x00000000
[  248.233992] radeon 0000:01:00.0:   CP_CPC_STATUS = 0xa0000041
[  248.233998] radeon 0000:01:00.0: GPU reset succeeded, trying to resume
[  248.251535] [drm] probing gen 2 caps for device 8086:151 = 261ad03/e
[  248.251537] [drm] PCIE gen 3 link speeds already enabled
[  248.253928] [drm] PCIE GART of 1024M enabled (table at 0x0000000000277000).
[  248.254011] radeon 0000:01:00.0: WB enabled
[  248.254016] radeon 0000:01:00.0: fence driver on ring 0 use gpu addr 0x0000000040000c00 and cpu addr 0xffc78c00
[  248.254017] radeon 0000:01:00.0: fence driver on ring 1 use gpu addr 0x0000000040000c04 and cpu addr 0xffc78c04
[  248.254019] radeon 0000:01:00.0: fence driver on ring 2 use gpu addr 0x0000000040000c08 and cpu addr 0xffc78c08
[  248.254020] radeon 0000:01:00.0: fence driver on ring 3 use gpu addr 0x0000000040000c0c and cpu addr 0xffc78c0c
[  248.254021] radeon 0000:01:00.0: fence driver on ring 4 use gpu addr 0x0000000040000c10 and cpu addr 0xffc78c10
[  248.254399] radeon 0000:01:00.0: fence driver on ring 5 use gpu addr 0x0000000000076c98 and cpu addr 0xf8bb6c98
[  248.256318] [drm:cik_ring_test] *ERROR* radeon: cp failed to get scratch reg (-22).
[  248.256319] [drm:cik_resume] *ERROR* cik startup failed on resume
[  248.256922] radeon 0000:01:00.0: GPU fault detected: 146 0x0522760c
[  248.256923] radeon 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x00000129
[  248.256924] radeon 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0207600C
[  248.256925] VM fault (0x0c, vmid 1) at page 297, read from 'CPF' (0x43504600) (118)
[  248.394721] [drm:ci_dpm_enable] *ERROR* ci_start_dpm failed
[  248.394722] [drm:radeon_pm_resume_dpm] *ERROR* radeon: dpm resume failed
[  249.717467] radeon 0000:01:00.0: GPU lockup CP stall for more than 1000msec
[  249.717474] radeon 0000:01:00.0: GPU lockup (waiting for 0x0000000000014850 last fence id 0x00000000000147f4 on ring 0)
[  249.726395] radeon 0000:01:00.0: Saved 191574 dwords of commands on ring 0.
[  249.726530] radeon 0000:01:00.0: Saved 131065 dwords of commands on ring 2.
[  249.726541] radeon 0000:01:00.0: GPU softreset: 0x00000008
[  249.726542] radeon 0000:01:00.0:   GRBM_STATUS=0xA0003028
[  249.726544] radeon 0000:01:00.0:   GRBM_STATUS2=0x70000008
[  249.726545] radeon 0000:01:00.0:   GRBM_STATUS_SE0=0x00000006
[  249.726547] radeon 0000:01:00.0:   GRBM_STATUS_SE1=0x00000006
[  249.726548] radeon 0000:01:00.0:   GRBM_STATUS_SE2=0x00000006
[  249.726549] radeon 0000:01:00.0:   GRBM_STATUS_SE3=0x00000006
[  249.726550] radeon 0000:01:00.0:   SRBM_STATUS=0x20000A40
[  249.726552] radeon 0000:01:00.0:   SRBM_STATUS2=0x00000000
[  249.726553] radeon 0000:01:00.0:   SDMA0_STATUS_REG   = 0x46CEE557
[  249.726555] radeon 0000:01:00.0:   SDMA1_STATUS_REG   = 0x46CEE557
[  249.726556] radeon 0000:01:00.0:   CP_STAT = 0x84008200
[  249.726558] radeon 0000:01:00.0:   CP_STALLED_STAT1 = 0x00000c00
[  249.726559] radeon 0000:01:00.0:   CP_STALLED_STAT2 = 0x00000100
[  249.726560] radeon 0000:01:00.0:   CP_STALLED_STAT3 = 0x00000400
[  249.726562] radeon 0000:01:00.0:   CP_CPF_BUSY_STAT = 0x48440002
[  249.726563] radeon 0000:01:00.0:   CP_CPF_STALLED_STAT1 = 0x00000001
[  249.726564] radeon 0000:01:00.0:   CP_CPF_STATUS = 0x80008023
[  249.726566] radeon 0000:01:00.0:   CP_CPC_BUSY_STAT = 0x00000408
[  249.726567] radeon 0000:01:00.0:   CP_CPC_STALLED_STAT1 = 0x00000000
[  249.726568] radeon 0000:01:00.0:   CP_CPC_STATUS = 0xa0000041
[  249.726570] radeon 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x00000000
[  249.726571] radeon 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x00000000
[  249.734274] radeon 0000:01:00.0: GRBM_SOFT_RESET=0x00010001
[  249.734325] radeon 0000:01:00.0: SRBM_SOFT_RESET=0x00000100
[  249.735470] radeon 0000:01:00.0:   GRBM_STATUS=0xA0003028
[  249.735472] radeon 0000:01:00.0:   GRBM_STATUS2=0x30000008
[  249.735473] radeon 0000:01:00.0:   GRBM_STATUS_SE0=0x00000006
[  249.735474] radeon 0000:01:00.0:   GRBM_STATUS_SE1=0x00000006
[  249.735476] radeon 0000:01:00.0:   GRBM_STATUS_SE2=0x00000006
[  249.735477] radeon 0000:01:00.0:   GRBM_STATUS_SE3=0x00000006
[  249.735478] radeon 0000:01:00.0:   SRBM_STATUS=0x20000040
[  249.735479] radeon 0000:01:00.0:   SRBM_STATUS2=0x00000000
[  249.735481] radeon 0000:01:00.0:   SDMA0_STATUS_REG   = 0x46CEE557
[  249.735482] radeon 0000:01:00.0:   SDMA1_STATUS_REG   = 0x46CEE557
[  249.735484] radeon 0000:01:00.0:   CP_STAT = 0x00000000
[  249.735485] radeon 0000:01:00.0:   CP_STALLED_STAT1 = 0x00000000
[  249.735486] radeon 0000:01:00.0:   CP_STALLED_STAT2 = 0x00000000
[  249.735488] radeon 0000:01:00.0:   CP_STALLED_STAT3 = 0x00000000
[  249.735489] radeon 0000:01:00.0:   CP_CPF_BUSY_STAT = 0x48440000
[  249.735490] radeon 0000:01:00.0:   CP_CPF_STALLED_STAT1 = 0x00000000
[  249.735492] radeon 0000:01:00.0:   CP_CPF_STATUS = 0x80008003
[  249.735493] radeon 0000:01:00.0:   CP_CPC_BUSY_STAT = 0x00000408
[  249.735494] radeon 0000:01:00.0:   CP_CPC_STALLED_STAT1 = 0x00000000
[  249.735496] radeon 0000:01:00.0:   CP_CPC_STATUS = 0xa0000041
[  249.735502] radeon 0000:01:00.0: GPU reset succeeded, trying to resume
[  249.753040] [drm] probing gen 2 caps for device 8086:151 = 261ad03/e
[  249.753042] [drm] PCIE gen 3 link speeds already enabled
[  249.755534] [drm] PCIE GART of 1024M enabled (table at 0x0000000000277000).
[  249.755612] radeon 0000:01:00.0: WB enabled
[  249.755617] radeon 0000:01:00.0: fence driver on ring 0 use gpu addr 0x0000000040000c00 and cpu addr 0xffc78c00
[  249.755618] radeon 0000:01:00.0: fence driver on ring 1 use gpu addr 0x0000000040000c04 and cpu addr 0xffc78c04
[  249.755619] radeon 0000:01:00.0: fence driver on ring 2 use gpu addr 0x0000000040000c08 and cpu addr 0xffc78c08
[  249.755621] radeon 0000:01:00.0: fence driver on ring 3 use gpu addr 0x0000000040000c0c and cpu addr 0xffc78c0c
[  249.755622] radeon 0000:01:00.0: fence driver on ring 4 use gpu addr 0x0000000040000c10 and cpu addr 0xffc78c10
[  249.756030] radeon 0000:01:00.0: fence driver on ring 5 use gpu addr 0x0000000000076c98 and cpu addr 0xf8bb6c98
[  249.758005] [drm:cik_ring_test] *ERROR* radeon: cp failed to get scratch reg (-22).
[  249.758006] [drm:cik_resume] *ERROR* cik startup failed on resume
[  249.768382] [drm:cik_irq_process] *ERROR* Illegal register access in command stream
[  249.901066] [drm:ci_dpm_enable] *ERROR* ci_start_dpm failed
[  249.901067] [drm:radeon_pm_resume_dpm] *ERROR* radeon: dpm resume failed
[  250.902723] radeon 0000:01:00.0: GPU lockup CP stall for more than 1000msec
[  250.902729] radeon 0000:01:00.0: GPU lockup (waiting for 0x0000000000014851 last fence id 0x00000000000147f4 on ring 0)
[  250.902742] radeon 0000:01:00.0: GPU lockup CP stall for more than 1000msec
[  250.902748] radeon 0000:01:00.0: GPU lockup (waiting for 0x000000000001489c last fence id 0x00000000000147f4 on ring 0)
[  250.911377] radeon 0000:01:00.0: Saved 194294 dwords of commands on ring 0.
[  250.911513] radeon 0000:01:00.0: Saved 131065 dwords of commands on ring 2.
[  250.911524] radeon 0000:01:00.0: GPU softreset: 0x00000009
[  250.911525] radeon 0000:01:00.0:   GRBM_STATUS=0xA0083028
[  250.911527] radeon 0000:01:00.0:   GRBM_STATUS2=0x70000008
[  250.911528] radeon 0000:01:00.0:   GRBM_STATUS_SE0=0x00000006
[  250.911529] radeon 0000:01:00.0:   GRBM_STATUS_SE1=0x00000006
[  250.911531] radeon 0000:01:00.0:   GRBM_STATUS_SE2=0x00000006
[  250.911532] radeon 0000:01:00.0:   GRBM_STATUS_SE3=0x00000006
[  250.911533] radeon 0000:01:00.0:   SRBM_STATUS=0x20000040
[  250.911534] radeon 0000:01:00.0:   SRBM_STATUS2=0x00000000
[  250.911536] radeon 0000:01:00.0:   SDMA0_STATUS_REG   = 0x46CEE557
[  250.911537] radeon 0000:01:00.0:   SDMA1_STATUS_REG   = 0x46CEE557
[  250.911539] radeon 0000:01:00.0:   CP_STAT = 0x84838600
[  250.911540] radeon 0000:01:00.0:   CP_STALLED_STAT1 = 0x20000c00
[  250.911542] radeon 0000:01:00.0:   CP_STALLED_STAT2 = 0x00010000
[  250.911543] radeon 0000:01:00.0:   CP_STALLED_STAT3 = 0x00000400
[  250.911544] radeon 0000:01:00.0:   CP_CPF_BUSY_STAT = 0x48440006
[  250.911546] radeon 0000:01:00.0:   CP_CPF_STALLED_STAT1 = 0x00000001
[  250.911547] radeon 0000:01:00.0:   CP_CPF_STATUS = 0x80008063
[  250.911548] radeon 0000:01:00.0:   CP_CPC_BUSY_STAT = 0x00000408
[  250.911550] radeon 0000:01:00.0:   CP_CPC_STALLED_STAT1 = 0x00000000
[  250.911551] radeon 0000:01:00.0:   CP_CPC_STATUS = 0xa0000041
[  250.911553] radeon 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x00000000
[  250.911554] radeon 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x00000000
[  250.923060] radeon 0000:01:00.0: GRBM_SOFT_RESET=0x00010001
[  250.923112] radeon 0000:01:00.0: SRBM_SOFT_RESET=0x00000100
[  250.924257] radeon 0000:01:00.0:   GRBM_STATUS=0xA0003028
[  250.924258] radeon 0000:01:00.0:   GRBM_STATUS2=0x30000008
[  250.924259] radeon 0000:01:00.0:   GRBM_STATUS_SE0=0x00000006
[  250.924260] radeon 0000:01:00.0:   GRBM_STATUS_SE1=0x00000006
[  250.924262] radeon 0000:01:00.0:   GRBM_STATUS_SE2=0x00000006
[  250.924263] radeon 0000:01:00.0:   GRBM_STATUS_SE3=0x00000006
[  250.924264] radeon 0000:01:00.0:   SRBM_STATUS=0x20000040
[  250.924266] radeon 0000:01:00.0:   SRBM_STATUS2=0x00000000
[  250.924267] radeon 0000:01:00.0:   SDMA0_STATUS_REG   = 0x46CEE557
[  250.924268] radeon 0000:01:00.0:   SDMA1_STATUS_REG   = 0x46CEE557
[  250.924270] radeon 0000:01:00.0:   CP_STAT = 0x00000000
[  250.924271] radeon 0000:01:00.0:   CP_STALLED_STAT1 = 0x00000000
[  250.924272] radeon 0000:01:00.0:   CP_STALLED_STAT2 = 0x00000000
[  250.924274] radeon 0000:01:00.0:   CP_STALLED_STAT3 = 0x00000000
[  250.924275] radeon 0000:01:00.0:   CP_CPF_BUSY_STAT = 0x48440000
[  250.924276] radeon 0000:01:00.0:   CP_CPF_STALLED_STAT1 = 0x00000000
[  250.924278] radeon 0000:01:00.0:   CP_CPF_STATUS = 0x80008003
[  250.924279] radeon 0000:01:00.0:   CP_CPC_BUSY_STAT = 0x00000408
[  250.924280] radeon 0000:01:00.0:   CP_CPC_STALLED_STAT1 = 0x00000000
[  250.924282] radeon 0000:01:00.0:   CP_CPC_STATUS = 0xa0000041
[  250.924288] radeon 0000:01:00.0: GPU reset succeeded, trying to resume
[  250.941828] [drm] probing gen 2 caps for device 8086:151 = 261ad03/e
[  250.941830] [drm] PCIE gen 3 link speeds already enabled
[  250.944225] [drm] PCIE GART of 1024M enabled (table at 0x0000000000277000).
[  250.944302] radeon 0000:01:00.0: WB enabled
[  250.944307] radeon 0000:01:00.0: fence driver on ring 0 use gpu addr 0x0000000040000c00 and cpu addr 0xffc78c00
[  250.944309] radeon 0000:01:00.0: fence driver on ring 1 use gpu addr 0x0000000040000c04 and cpu addr 0xffc78c04
[  250.944310] radeon 0000:01:00.0: fence driver on ring 2 use gpu addr 0x0000000040000c08 and cpu addr 0xffc78c08
[  250.944311] radeon 0000:01:00.0: fence driver on ring 3 use gpu addr 0x0000000040000c0c and cpu addr 0xffc78c0c
[  250.944312] radeon 0000:01:00.0: fence driver on ring 4 use gpu addr 0x0000000040000c10 and cpu addr 0xffc78c10
[  250.944690] radeon 0000:01:00.0: fence driver on ring 5 use gpu addr 0x0000000000076c98 and cpu addr 0xf8bb6c98
[  250.946596] [drm:cik_ring_test] *ERROR* radeon: cp failed to get scratch reg (-22).
[  250.946597] [drm:cik_resume] *ERROR* cik startup failed on resume
[  250.947221] radeon 0000:01:00.0: GPU fault detected: 146 0x04c2760c
[  250.947222] radeon 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x00000126
[  250.947223] radeon 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0207600C
[  250.947225] VM fault (0x0c, vmid 1) at page 294, read from 'CPF' (0x43504600) (118)
[  250.947228] radeon 0000:01:00.0: GPU fault detected: 146 0x04e2760c
[  250.947229] radeon 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x00000000
[  250.947230] radeon 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x00000000
[  250.947231] VM fault (0x00, vmid 0) at page 0, read from '' (0x00000000) (0)
[  251.085146] [drm:ci_dpm_enable] *ERROR* ci_start_dpm failed
[  251.085147] [drm:radeon_pm_resume_dpm] *ERROR* radeon: dpm resume failed
[  251.101092] radeon 0000:01:00.0: Saved 195862 dwords of commands on ring 0.
[  251.101228] radeon 0000:01:00.0: Saved 131065 dwords of commands on ring 2.
[  251.101239] radeon 0000:01:00.0: GPU softreset: 0x00000008
[  251.101241] radeon 0000:01:00.0:   GRBM_STATUS=0xA0003028
[  251.101242] radeon 0000:01:00.0:   GRBM_STATUS2=0x70000008
[  251.101243] radeon 0000:01:00.0:   GRBM_STATUS_SE0=0x00000006
[  251.101245] radeon 0000:01:00.0:   GRBM_STATUS_SE1=0x00000006
[  251.101246] radeon 0000:01:00.0:   GRBM_STATUS_SE2=0x00000006
[  251.101247] radeon 0000:01:00.0:   GRBM_STATUS_SE3=0x00000006
[  251.101249] radeon 0000:01:00.0:   SRBM_STATUS=0x20000040
[  251.101250] radeon 0000:01:00.0:   SRBM_STATUS2=0x00000000
[  251.101251] radeon 0000:01:00.0:   SDMA0_STATUS_REG   = 0x46CEE557
[  251.101253] radeon 0000:01:00.0:   SDMA1_STATUS_REG   = 0x46CEE557
[  251.101254] radeon 0000:01:00.0:   CP_STAT = 0x84008200
[  251.101256] radeon 0000:01:00.0:   CP_STALLED_STAT1 = 0x00000c00
[  251.101257] radeon 0000:01:00.0:   CP_STALLED_STAT2 = 0x00000100
[  251.101258] radeon 0000:01:00.0:   CP_STALLED_STAT3 = 0x00000400
[  251.101260] radeon 0000:01:00.0:   CP_CPF_BUSY_STAT = 0x48440002
[  251.101261] radeon 0000:01:00.0:   CP_CPF_STALLED_STAT1 = 0x00000001
[  251.101262] radeon 0000:01:00.0:   CP_CPF_STATUS = 0x80008023
[  251.101264] radeon 0000:01:00.0:   CP_CPC_BUSY_STAT = 0x00000408
[  251.101265] radeon 0000:01:00.0:   CP_CPC_STALLED_STAT1 = 0x00000000
[  251.101266] radeon 0000:01:00.0:   CP_CPC_STATUS = 0xa0000041
[  251.101268] radeon 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x00000000
[  251.101269] radeon 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x00000000
[  251.107841] radeon 0000:01:00.0: GRBM_SOFT_RESET=0x00010001
[  251.107893] radeon 0000:01:00.0: SRBM_SOFT_RESET=0x00000100
[  251.109038] radeon 0000:01:00.0:   GRBM_STATUS=0xA0003028
[  251.109039] radeon 0000:01:00.0:   GRBM_STATUS2=0x30000008
[  251.109040] radeon 0000:01:00.0:   GRBM_STATUS_SE0=0x00000006
[  251.109042] radeon 0000:01:00.0:   GRBM_STATUS_SE1=0x00000006
[  251.109043] radeon 0000:01:00.0:   GRBM_STATUS_SE2=0x00000006
[  251.109044] radeon 0000:01:00.0:   GRBM_STATUS_SE3=0x00000006
[  251.109046] radeon 0000:01:00.0:   SRBM_STATUS=0x20000040
[  251.109047] radeon 0000:01:00.0:   SRBM_STATUS2=0x00000000
[  251.109048] radeon 0000:01:00.0:   SDMA0_STATUS_REG   = 0x46CEE557
[  251.109050] radeon 0000:01:00.0:   SDMA1_STATUS_REG   = 0x46CEE557
[  251.109051] radeon 0000:01:00.0:   CP_STAT = 0x00000000
[  251.109053] radeon 0000:01:00.0:   CP_STALLED_STAT1 = 0x00000000
[  251.109054] radeon 0000:01:00.0:   CP_STALLED_STAT2 = 0x00000000
[  251.109055] radeon 0000:01:00.0:   CP_STALLED_STAT3 = 0x00000000
[  251.109057] radeon 0000:01:00.0:   CP_CPF_BUSY_STAT = 0x48440000
[  251.109058] radeon 0000:01:00.0:   CP_CPF_STALLED_STAT1 = 0x00000000
[  251.109059] radeon 0000:01:00.0:   CP_CPF_STATUS = 0x80008003
[  251.109061] radeon 0000:01:00.0:   CP_CPC_BUSY_STAT = 0x00000408
[  251.109062] radeon 0000:01:00.0:   CP_CPC_STALLED_STAT1 = 0x00000000
[  251.109063] radeon 0000:01:00.0:   CP_CPC_STATUS = 0xa0000041
[  251.109070] radeon 0000:01:00.0: GPU reset succeeded, trying to resume
[  251.126606] [drm] probing gen 2 caps for device 8086:151 = 261ad03/e
[  251.126608] [drm] PCIE gen 3 link speeds already enabled
[  251.129001] [drm] PCIE GART of 1024M enabled (table at 0x0000000000277000).
[  251.129077] radeon 0000:01:00.0: WB enabled
[  251.129082] radeon 0000:01:00.0: fence driver on ring 0 use gpu addr 0x0000000040000c00 and cpu addr 0xffc78c00
[  251.129083] radeon 0000:01:00.0: fence driver on ring 1 use gpu addr 0x0000000040000c04 and cpu addr 0xffc78c04
[  251.129084] radeon 0000:01:00.0: fence driver on ring 2 use gpu addr 0x0000000040000c08 and cpu addr 0xffc78c08
[  251.129086] radeon 0000:01:00.0: fence driver on ring 3 use gpu addr 0x0000000040000c0c and cpu addr 0xffc78c0c
[  251.129087] radeon 0000:01:00.0: fence driver on ring 4 use gpu addr 0x0000000040000c10 and cpu addr 0xffc78c10
[  251.129465] radeon 0000:01:00.0: fence driver on ring 5 use gpu addr 0x0000000000076c98 and cpu addr 0xf8bb6c98
[  251.131384] [drm:cik_ring_test] *ERROR* radeon: cp failed to get scratch reg (-22).
[  251.131385] [drm:cik_resume] *ERROR* cik startup failed on resume
[  251.132001] radeon 0000:01:00.0: GPU fault detected: 146 0x0522760c
[  251.132002] radeon 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x00000129
[  251.132004] radeon 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0207600C
[  251.132005] VM fault (0x0c, vmid 1) at page 297, read from 'CPF' (0x43504600) (118)
[  251.269801] [drm:ci_dpm_enable] *ERROR* ci_start_dpm failed
[  251.269803] [drm:radeon_pm_resume_dpm] *ERROR* radeon: dpm resume failed
[  252.592512] radeon 0000:01:00.0: GPU lockup CP stall for more than 1000msec
[  252.592518] radeon 0000:01:00.0: GPU lockup (waiting for 0x0000000000014851 last fence id 0x00000000000147f4 on ring 0)
[  252.601477] radeon 0000:01:00.0: Saved 196278 dwords of commands on ring 0.
[  252.601612] radeon 0000:01:00.0: Saved 131065 dwords of commands on ring 2.
[  252.601623] radeon 0000:01:00.0: GPU softreset: 0x00000008
[  252.601625] radeon 0000:01:00.0:   GRBM_STATUS=0xA0003028
[  252.601626] radeon 0000:01:00.0:   GRBM_STATUS2=0x70000008
[  252.601627] radeon 0000:01:00.0:   GRBM_STATUS_SE0=0x00000006
[  252.601629] radeon 0000:01:00.0:   GRBM_STATUS_SE1=0x00000006
[  252.601630] radeon 0000:01:00.0:   GRBM_STATUS_SE2=0x00000006
[  252.601631] radeon 0000:01:00.0:   GRBM_STATUS_SE3=0x00000006
[  252.601632] radeon 0000:01:00.0:   SRBM_STATUS=0x20000040
[  252.601634] radeon 0000:01:00.0:   SRBM_STATUS2=0x00000000
[  252.601635] radeon 0000:01:00.0:   SDMA0_STATUS_REG   = 0x46CEE557
[  252.601637] radeon 0000:01:00.0:   SDMA1_STATUS_REG   = 0x46CEE557
[  252.601638] radeon 0000:01:00.0:   CP_STAT = 0x84008200
[  252.601640] radeon 0000:01:00.0:   CP_STALLED_STAT1 = 0x00000c00
[  252.601641] radeon 0000:01:00.0:   CP_STALLED_STAT2 = 0x00000100
[  252.601642] radeon 0000:01:00.0:   CP_STALLED_STAT3 = 0x00000400
[  252.601643] radeon 0000:01:00.0:   CP_CPF_BUSY_STAT = 0x48440002
[  252.601645] radeon 0000:01:00.0:   CP_CPF_STALLED_STAT1 = 0x00000001
[  252.601646] radeon 0000:01:00.0:   CP_CPF_STATUS = 0x80008023
[  252.601648] radeon 0000:01:00.0:   CP_CPC_BUSY_STAT = 0x00000408
[  252.601649] radeon 0000:01:00.0:   CP_CPC_STALLED_STAT1 = 0x00000000
[  252.601650] radeon 0000:01:00.0:   CP_CPC_STATUS = 0xa0000041
[  252.601652] radeon 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x00000000
[  252.601653] radeon 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x00000000
[  252.609343] radeon 0000:01:00.0: GRBM_SOFT_RESET=0x00010001
[  252.609395] radeon 0000:01:00.0: SRBM_SOFT_RESET=0x00000100
[  252.610540] radeon 0000:01:00.0:   GRBM_STATUS=0xA0003028
[  252.610541] radeon 0000:01:00.0:   GRBM_STATUS2=0x30000008
[  252.610543] radeon 0000:01:00.0:   GRBM_STATUS_SE0=0x00000006
[  252.610544] radeon 0000:01:00.0:   GRBM_STATUS_SE1=0x00000006
[  252.610545] radeon 0000:01:00.0:   GRBM_STATUS_SE2=0x00000006
[  252.610546] radeon 0000:01:00.0:   GRBM_STATUS_SE3=0x00000006
[  252.610548] radeon 0000:01:00.0:   SRBM_STATUS=0x20000040
[  252.610549] radeon 0000:01:00.0:   SRBM_STATUS2=0x00000000
[  252.610550] radeon 0000:01:00.0:   SDMA0_STATUS_REG   = 0x46CEE557
[  252.610552] radeon 0000:01:00.0:   SDMA1_STATUS_REG   = 0x46CEE557
[  252.610553] radeon 0000:01:00.0:   CP_STAT = 0x00000000
[  252.610555] radeon 0000:01:00.0:   CP_STALLED_STAT1 = 0x00000000
[  252.610556] radeon 0000:01:00.0:   CP_STALLED_STAT2 = 0x00000000
[  252.610557] radeon 0000:01:00.0:   CP_STALLED_STAT3 = 0x00000000
[  252.610559] radeon 0000:01:00.0:   CP_CPF_BUSY_STAT = 0x48440000
[  252.610560] radeon 0000:01:00.0:   CP_CPF_STALLED_STAT1 = 0x00000000
[  252.610561] radeon 0000:01:00.0:   CP_CPF_STATUS = 0x80008003
[  252.610563] radeon 0000:01:00.0:   CP_CPC_BUSY_STAT = 0x00000408
[  252.610564] radeon 0000:01:00.0:   CP_CPC_STALLED_STAT1 = 0x00000000
[  252.610565] radeon 0000:01:00.0:   CP_CPC_STATUS = 0xa0000041
[  252.610572] radeon 0000:01:00.0: GPU reset succeeded, trying to resume
[  252.628166] [drm] probing gen 2 caps for device 8086:151 = 261ad03/e
[  252.628168] [drm] PCIE gen 3 link speeds already enabled
[  252.630560] [drm] PCIE GART of 1024M enabled (table at 0x0000000000277000).
[  252.630638] radeon 0000:01:00.0: WB enabled
[  252.630643] radeon 0000:01:00.0: fence driver on ring 0 use gpu addr 0x0000000040000c00 and cpu addr 0xffc78c00
[  252.630644] radeon 0000:01:00.0: fence driver on ring 1 use gpu addr 0x0000000040000c04 and cpu addr 0xffc78c04
[  252.630646] radeon 0000:01:00.0: fence driver on ring 2 use gpu addr 0x0000000040000c08 and cpu addr 0xffc78c08
[  252.630647] radeon 0000:01:00.0: fence driver on ring 3 use gpu addr 0x0000000040000c0c and cpu addr 0xffc78c0c
[  252.630648] radeon 0000:01:00.0: fence driver on ring 4 use gpu addr 0x0000000040000c10 and cpu addr 0xffc78c10
[  252.631026] radeon 0000:01:00.0: fence driver on ring 5 use gpu addr 0x0000000000076c98 and cpu addr 0xf8bb6c98
[  252.632969] [drm:cik_ring_test] *ERROR* radeon: cp failed to get scratch reg (-22).
[  252.632970] [drm:cik_resume] *ERROR* cik startup failed on resume
[  252.643125] [drm:cik_irq_process] *ERROR* Illegal register access in command stream
[  252.775761] [drm:ci_dpm_enable] *ERROR* ci_start_dpm failed
[  252.775762] [drm:radeon_pm_resume_dpm] *ERROR* radeon: dpm resume failed
[  253.777783] radeon 0000:01:00.0: GPU lockup CP stall for more than 1000msec
[  253.777789] radeon 0000:01:00.0: GPU lockup (waiting for 0x000000000001489c last fence id 0x00000000000147f4 on ring 0)
[  253.777802] radeon 0000:01:00.0: GPU lockup CP stall for more than 1000msec
[  253.777808] radeon 0000:01:00.0: GPU lockup (waiting for 0x0000000000014850 last fence id 0x00000000000147f4 on ring 0)
[  253.786563] radeon 0000:01:00.0: Saved 198998 dwords of commands on ring 0.
[  253.786699] radeon 0000:01:00.0: Saved 131065 dwords of commands on ring 2.
[  253.786710] radeon 0000:01:00.0: GPU softreset: 0x00000009
[  253.786711] radeon 0000:01:00.0:   GRBM_STATUS=0xA0083028
[  253.786713] radeon 0000:01:00.0:   GRBM_STATUS2=0x70000008
[  253.786714] radeon 0000:01:00.0:   GRBM_STATUS_SE0=0x00000006
[  253.786715] radeon 0000:01:00.0:   GRBM_STATUS_SE1=0x00000006
[  253.786716] radeon 0000:01:00.0:   GRBM_STATUS_SE2=0x00000006
[  253.786718] radeon 0000:01:00.0:   GRBM_STATUS_SE3=0x00000006
[  253.786719] radeon 0000:01:00.0:   SRBM_STATUS=0x20000A40
[  253.786720] radeon 0000:01:00.0:   SRBM_STATUS2=0x00000000
[  253.786722] radeon 0000:01:00.0:   SDMA0_STATUS_REG   = 0x46CEE557
[  253.786723] radeon 0000:01:00.0:   SDMA1_STATUS_REG   = 0x46CEE557
[  253.786725] radeon 0000:01:00.0:   CP_STAT = 0x84838600
[  253.786726] radeon 0000:01:00.0:   CP_STALLED_STAT1 = 0x20000c00
[  253.786728] radeon 0000:01:00.0:   CP_STALLED_STAT2 = 0x00010000
[  253.786729] radeon 0000:01:00.0:   CP_STALLED_STAT3 = 0x00000400
[  253.786730] radeon 0000:01:00.0:   CP_CPF_BUSY_STAT = 0x48440006
[  253.786732] radeon 0000:01:00.0:   CP_CPF_STALLED_STAT1 = 0x00000001
[  253.786733] radeon 0000:01:00.0:   CP_CPF_STATUS = 0x80008063
[  253.786734] radeon 0000:01:00.0:   CP_CPC_BUSY_STAT = 0x00000408
[  253.786736] radeon 0000:01:00.0:   CP_CPC_STALLED_STAT1 = 0x00000000
[  253.786737] radeon 0000:01:00.0:   CP_CPC_STATUS = 0xa0000041
[  253.786738] radeon 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x00000000
[  253.786740] radeon 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x00000000
[  253.798099] radeon 0000:01:00.0: GRBM_SOFT_RESET=0x00010001
[  253.798151] radeon 0000:01:00.0: SRBM_SOFT_RESET=0x00000100
[  253.799296] radeon 0000:01:00.0:   GRBM_STATUS=0xA0003028
[  253.799298] radeon 0000:01:00.0:   GRBM_STATUS2=0x30000008
[  253.799299] radeon 0000:01:00.0:   GRBM_STATUS_SE0=0x00000006
[  253.799300] radeon 0000:01:00.0:   GRBM_STATUS_SE1=0x00000006
[  253.799301] radeon 0000:01:00.0:   GRBM_STATUS_SE2=0x00000006
[  253.799303] radeon 0000:01:00.0:   GRBM_STATUS_SE3=0x00000006
[  253.799304] radeon 0000:01:00.0:   SRBM_STATUS=0x20000040
[  253.799305] radeon 0000:01:00.0:   SRBM_STATUS2=0x00000000
[  253.799307] radeon 0000:01:00.0:   SDMA0_STATUS_REG   = 0x46CEE557
[  253.799308] radeon 0000:01:00.0:   SDMA1_STATUS_REG   = 0x46CEE557
[  253.799309] radeon 0000:01:00.0:   CP_STAT = 0x00000000
[  253.799311] radeon 0000:01:00.0:   CP_STALLED_STAT1 = 0x00000000
[  253.799312] radeon 0000:01:00.0:   CP_STALLED_STAT2 = 0x00000000
[  253.799313] radeon 0000:01:00.0:   CP_STALLED_STAT3 = 0x00000000
[  253.799315] radeon 0000:01:00.0:   CP_CPF_BUSY_STAT = 0x48440000
[  253.799316] radeon 0000:01:00.0:   CP_CPF_STALLED_STAT1 = 0x00000000
[  253.799317] radeon 0000:01:00.0:   CP_CPF_STATUS = 0x80008003
[  253.799319] radeon 0000:01:00.0:   CP_CPC_BUSY_STAT = 0x00000408
[  253.799320] radeon 0000:01:00.0:   CP_CPC_STALLED_STAT1 = 0x00000000
[  253.799321] radeon 0000:01:00.0:   CP_CPC_STATUS = 0xa0000041
[  253.799328] radeon 0000:01:00.0: GPU reset succeeded, trying to resume
[  253.816870] [drm] probing gen 2 caps for device 8086:151 = 261ad03/e
[  253.816872] [drm] PCIE gen 3 link speeds already enabled
[  253.819268] [drm] PCIE GART of 1024M enabled (table at 0x0000000000277000).
[  253.819347] radeon 0000:01:00.0: WB enabled
[  253.819352] radeon 0000:01:00.0: fence driver on ring 0 use gpu addr 0x0000000040000c00 and cpu addr 0xffc78c00
[  253.819353] radeon 0000:01:00.0: fence driver on ring 1 use gpu addr 0x0000000040000c04 and cpu addr 0xffc78c04
[  253.819354] radeon 0000:01:00.0: fence driver on ring 2 use gpu addr 0x0000000040000c08 and cpu addr 0xffc78c08
[  253.819356] radeon 0000:01:00.0: fence driver on ring 3 use gpu addr 0x0000000040000c0c and cpu addr 0xffc78c0c
[  253.819357] radeon 0000:01:00.0: fence driver on ring 4 use gpu addr 0x0000000040000c10 and cpu addr 0xffc78c10
[  253.819735] radeon 0000:01:00.0: fence driver on ring 5 use gpu addr 0x0000000000076c98 and cpu addr 0xf8bb6c98
[  253.821658] [drm:cik_ring_test] *ERROR* radeon: cp failed to get scratch reg (-22).
[  253.821659] [drm:cik_resume] *ERROR* cik startup failed on resume
[  253.822297] radeon 0000:01:00.0: GPU fault detected: 146 0x04c2760c
[  253.822298] radeon 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x00000126
[  253.822299] radeon 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0207600C
[  253.822301] VM fault (0x0c, vmid 1) at page 294, read from 'CPF' (0x43504600) (118)
[  253.822304] radeon 0000:01:00.0: GPU fault detected: 146 0x04e2760c
[  253.822305] radeon 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x00000000
[  253.822306] radeon 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x00000000
[  253.822307] VM fault (0x00, vmid 0) at page 0, read from '' (0x00000000) (0)
[  253.960442] [drm:ci_dpm_enable] *ERROR* ci_start_dpm failed
[  253.960444] [drm:radeon_pm_resume_dpm] *ERROR* radeon: dpm resume failed
[  253.976199] radeon 0000:01:00.0: Saved 200566 dwords of commands on ring 0.
[  253.976335] radeon 0000:01:00.0: Saved 131065 dwords of commands on ring 2.
[  253.976345] radeon 0000:01:00.0: GPU softreset: 0x00000008
[  253.976347] radeon 0000:01:00.0:   GRBM_STATUS=0xA0003028
[  253.976348] radeon 0000:01:00.0:   GRBM_STATUS2=0x70000008
[  253.976349] radeon 0000:01:00.0:   GRBM_STATUS_SE0=0x00000006
[  253.976351] radeon 0000:01:00.0:   GRBM_STATUS_SE1=0x00000006
[  253.976352] radeon 0000:01:00.0:   GRBM_STATUS_SE2=0x00000006
[  253.976353] radeon 0000:01:00.0:   GRBM_STATUS_SE3=0x00000006
[  253.976355] radeon 0000:01:00.0:   SRBM_STATUS=0x20000A40
[  253.976356] radeon 0000:01:00.0:   SRBM_STATUS2=0x00000000
[  253.976358] radeon 0000:01:00.0:   SDMA0_STATUS_REG   = 0x46CEE557
[  253.976359] radeon 0000:01:00.0:   SDMA1_STATUS_REG   = 0x46CEE557
[  253.976360] radeon 0000:01:00.0:   CP_STAT = 0x84008200
[  253.976362] radeon 0000:01:00.0:   CP_STALLED_STAT1 = 0x00000c00
[  253.976363] radeon 0000:01:00.0:   CP_STALLED_STAT2 = 0x00000100
[  253.976364] radeon 0000:01:00.0:   CP_STALLED_STAT3 = 0x00000400
[  253.976366] radeon 0000:01:00.0:   CP_CPF_BUSY_STAT = 0x48440002
[  253.976367] radeon 0000:01:00.0:   CP_CPF_STALLED_STAT1 = 0x00000001
[  253.976369] radeon 0000:01:00.0:   CP_CPF_STATUS = 0x80008023
[  253.976370] radeon 0000:01:00.0:   CP_CPC_BUSY_STAT = 0x00000408
[  253.976371] radeon 0000:01:00.0:   CP_CPC_STALLED_STAT1 = 0x00000000
[  253.976373] radeon 0000:01:00.0:   CP_CPC_STATUS = 0xa0000041
[  253.976374] radeon 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x00000000
[  253.976376] radeon 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x00000000
[  253.982872] radeon 0000:01:00.0: GRBM_SOFT_RESET=0x00010001
[  253.982924] radeon 0000:01:00.0: SRBM_SOFT_RESET=0x00000100
[  253.984069] radeon 0000:01:00.0:   GRBM_STATUS=0xA0003028
[  253.984070] radeon 0000:01:00.0:   GRBM_STATUS2=0x30000008
[  253.984072] radeon 0000:01:00.0:   GRBM_STATUS_SE0=0x00000006
[  253.984073] radeon 0000:01:00.0:   GRBM_STATUS_SE1=0x00000006
[  253.984074] radeon 0000:01:00.0:   GRBM_STATUS_SE2=0x00000006
[  253.984076] radeon 0000:01:00.0:   GRBM_STATUS_SE3=0x00000006
[  253.984077] radeon 0000:01:00.0:   SRBM_STATUS=0x20000040
[  253.984078] radeon 0000:01:00.0:   SRBM_STATUS2=0x00000000
[  253.984080] radeon 0000:01:00.0:   SDMA0_STATUS_REG   = 0x46CEE557
[  253.984081] radeon 0000:01:00.0:   SDMA1_STATUS_REG   = 0x46CEE557
[  253.984083] radeon 0000:01:00.0:   CP_STAT = 0x00000000
[  253.984084] radeon 0000:01:00.0:   CP_STALLED_STAT1 = 0x00000000
[  253.984085] radeon 0000:01:00.0:   CP_STALLED_STAT2 = 0x00000000
[  253.984087] radeon 0000:01:00.0:   CP_STALLED_STAT3 = 0x00000000
[  253.984088] radeon 0000:01:00.0:   CP_CPF_BUSY_STAT = 0x48440000
[  253.984089] radeon 0000:01:00.0:   CP_CPF_STALLED_STAT1 = 0x00000000
[  253.984091] radeon 0000:01:00.0:   CP_CPF_STATUS = 0x80008003
[  253.984092] radeon 0000:01:00.0:   CP_CPC_BUSY_STAT = 0x00000408
[  253.984093] radeon 0000:01:00.0:   CP_CPC_STALLED_STAT1 = 0x00000000
[  253.984095] radeon 0000:01:00.0:   CP_CPC_STATUS = 0xa0000041
[  253.984101] radeon 0000:01:00.0: GPU reset succeeded, trying to resume
[  254.001635] [drm] probing gen 2 caps for device 8086:151 = 261ad03/e
[  254.001637] [drm] PCIE gen 3 link speeds already enabled
[  254.004048] [drm] PCIE GART of 1024M enabled (table at 0x0000000000277000).
[  254.004126] radeon 0000:01:00.0: WB enabled
[  254.004131] radeon 0000:01:00.0: fence driver on ring 0 use gpu addr 0x0000000040000c00 and cpu addr 0xffc78c00
[  254.004132] radeon 0000:01:00.0: fence driver on ring 1 use gpu addr 0x0000000040000c04 and cpu addr 0xffc78c04
[  254.004133] radeon 0000:01:00.0: fence driver on ring 2 use gpu addr 0x0000000040000c08 and cpu addr 0xffc78c08
[  254.004134] radeon 0000:01:00.0: fence driver on ring 3 use gpu addr 0x0000000040000c0c and cpu addr 0xffc78c0c
[  254.004136] radeon 0000:01:00.0: fence driver on ring 4 use gpu addr 0x0000000040000c10 and cpu addr 0xffc78c10
[  254.004513] radeon 0000:01:00.0: fence driver on ring 5 use gpu addr 0x0000000000076c98 and cpu addr 0xf8bb6c98
[  254.006456] [drm:cik_ring_test] *ERROR* radeon: cp failed to get scratch reg (-22).
[  254.006458] [drm:cik_resume] *ERROR* cik startup failed on resume
[  254.007088] radeon 0000:01:00.0: GPU fault detected: 146 0x0522760c
[  254.007089] radeon 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x00000129
[  254.007090] radeon 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0207600C
[  254.007091] VM fault (0x0c, vmid 1) at page 297, read from 'CPF' (0x43504600) (118)
[  254.144812] [drm:ci_dpm_enable] *ERROR* ci_start_dpm failed
[  254.144814] [drm:radeon_pm_resume_dpm] *ERROR* radeon: dpm resume failed
[  255.467559] radeon 0000:01:00.0: GPU lockup CP stall for more than 1000msec
[  255.467562] radeon 0000:01:00.0: GPU lockup (waiting for 0x000000000001489c last fence id 0x00000000000147f4 on ring 0)
[  255.476522] radeon 0000:01:00.0: Saved 200982 dwords of commands on ring 0.
[  255.476658] radeon 0000:01:00.0: Saved 131065 dwords of commands on ring 2.
[  255.476669] radeon 0000:01:00.0: GPU softreset: 0x00000008
[  255.476670] radeon 0000:01:00.0:   GRBM_STATUS=0xA0003028
[  255.476672] radeon 0000:01:00.0:   GRBM_STATUS2=0x70000008
[  255.476673] radeon 0000:01:00.0:   GRBM_STATUS_SE0=0x00000006
[  255.476674] radeon 0000:01:00.0:   GRBM_STATUS_SE1=0x00000006
[  255.476676] radeon 0000:01:00.0:   GRBM_STATUS_SE2=0x00000006
[  255.476677] radeon 0000:01:00.0:   GRBM_STATUS_SE3=0x00000006
[  255.476678] radeon 0000:01:00.0:   SRBM_STATUS=0x20000040
[  255.476680] radeon 0000:01:00.0:   SRBM_STATUS2=0x00000000
[  255.476681] radeon 0000:01:00.0:   SDMA0_STATUS_REG   = 0x46CEE557
[  255.476683] radeon 0000:01:00.0:   SDMA1_STATUS_REG   = 0x46CEE557
[  255.476684] radeon 0000:01:00.0:   CP_STAT = 0x84008200
[  255.476685] radeon 0000:01:00.0:   CP_STALLED_STAT1 = 0x00000c00
[  255.476687] radeon 0000:01:00.0:   CP_STALLED_STAT2 = 0x00000100
[  255.476688] radeon 0000:01:00.0:   CP_STALLED_STAT3 = 0x00000400
[  255.476689] radeon 0000:01:00.0:   CP_CPF_BUSY_STAT = 0x48440002
[  255.476691] radeon 0000:01:00.0:   CP_CPF_STALLED_STAT1 = 0x00000001
[  255.476692] radeon 0000:01:00.0:   CP_CPF_STATUS = 0x80008023
[  255.476693] radeon 0000:01:00.0:   CP_CPC_BUSY_STAT = 0x00000408
[  255.476695] radeon 0000:01:00.0:   CP_CPC_STALLED_STAT1 = 0x00000000
[  255.476696] radeon 0000:01:00.0:   CP_CPC_STATUS = 0xa0000041
[  255.476698] radeon 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x00000000
[  255.476699] radeon 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x00000000
[  255.484376] radeon 0000:01:00.0: GRBM_SOFT_RESET=0x00010001
[  255.484428] radeon 0000:01:00.0: SRBM_SOFT_RESET=0x00000100
[  255.485573] radeon 0000:01:00.0:   GRBM_STATUS=0xA0003028
[  255.485574] radeon 0000:01:00.0:   GRBM_STATUS2=0x30000008
[  255.485575] radeon 0000:01:00.0:   GRBM_STATUS_SE0=0x00000006
[  255.485577] radeon 0000:01:00.0:   GRBM_STATUS_SE1=0x00000006
[  255.485578] radeon 0000:01:00.0:   GRBM_STATUS_SE2=0x00000006
[  255.485579] radeon 0000:01:00.0:   GRBM_STATUS_SE3=0x00000006
[  255.485580] radeon 0000:01:00.0:   SRBM_STATUS=0x20000040
[  255.485582] radeon 0000:01:00.0:   SRBM_STATUS2=0x00000000
[  255.485583] radeon 0000:01:00.0:   SDMA0_STATUS_REG   = 0x46CEE557
[  255.485585] radeon 0000:01:00.0:   SDMA1_STATUS_REG   = 0x46CEE557
[  255.485586] radeon 0000:01:00.0:   CP_STAT = 0x00000000
[  255.485587] radeon 0000:01:00.0:   CP_STALLED_STAT1 = 0x00000000
[  255.485589] radeon 0000:01:00.0:   CP_STALLED_STAT2 = 0x00000000
[  255.485590] radeon 0000:01:00.0:   CP_STALLED_STAT3 = 0x00000000
[  255.485591] radeon 0000:01:00.0:   CP_CPF_BUSY_STAT = 0x48440000
[  255.485593] radeon 0000:01:00.0:   CP_CPF_STALLED_STAT1 = 0x00000000
[  255.485594] radeon 0000:01:00.0:   CP_CPF_STATUS = 0x80008003
[  255.485595] radeon 0000:01:00.0:   CP_CPC_BUSY_STAT = 0x00000408
[  255.485597] radeon 0000:01:00.0:   CP_CPC_STALLED_STAT1 = 0x00000000
[  255.485598] radeon 0000:01:00.0:   CP_CPC_STATUS = 0xa0000041
[  255.485604] radeon 0000:01:00.0: GPU reset succeeded, trying to resume
[  255.503144] [drm] probing gen 2 caps for device 8086:151 = 261ad03/e
[  255.503146] [drm] PCIE gen 3 link speeds already enabled
[  255.505541] [drm] PCIE GART of 1024M enabled (table at 0x0000000000277000).
[  255.505621] radeon 0000:01:00.0: WB enabled
[  255.505626] radeon 0000:01:00.0: fence driver on ring 0 use gpu addr 0x0000000040000c00 and cpu addr 0xffc78c00
[  255.505627] radeon 0000:01:00.0: fence driver on ring 1 use gpu addr 0x0000000040000c04 and cpu addr 0xffc78c04
[  255.505628] radeon 0000:01:00.0: fence driver on ring 2 use gpu addr 0x0000000040000c08 and cpu addr 0xffc78c08
[  255.505629] radeon 0000:01:00.0: fence driver on ring 3 use gpu addr 0x0000000040000c0c and cpu addr 0xffc78c0c
[  255.505631] radeon 0000:01:00.0: fence driver on ring 4 use gpu addr 0x0000000040000c10 and cpu addr 0xffc78c10
[  255.506009] radeon 0000:01:00.0: fence driver on ring 5 use gpu addr 0x0000000000076c98 and cpu addr 0xf8bb6c98
[  255.507932] [drm:cik_ring_test] *ERROR* radeon: cp failed to get scratch reg (-22).
[  255.507933] [drm:cik_resume] *ERROR* cik startup failed on resume
[  255.518300] [drm:cik_irq_process] *ERROR* Illegal register access in command stream
[  255.647804] [drm:ci_dpm_enable] *ERROR* ci_start_dpm failed
[  255.647806] [drm:radeon_pm_resume_dpm] *ERROR* radeon: dpm resume failed
[  256.652832] radeon 0000:01:00.0: GPU lockup CP stall for more than 1000msec
[  256.652838] radeon 0000:01:00.0: GPU lockup (waiting for 0x0000000000014851 last fence id 0x00000000000147f4 on ring 0)
[  256.652850] radeon 0000:01:00.0: GPU lockup CP stall for more than 1000msec
[  256.652856] radeon 0000:01:00.0: GPU lockup (waiting for 0x0000000000014850 last fence id 0x00000000000147f4 on ring 0)
[  256.661507] radeon 0000:01:00.0: Saved 203702 dwords of commands on ring 0.
[  256.661642] radeon 0000:01:00.0: Saved 131065 dwords of commands on ring 2.
[  256.661652] radeon 0000:01:00.0: GPU softreset: 0x00000009
[  256.661654] radeon 0000:01:00.0:   GRBM_STATUS=0xA0083028
[  256.661655] radeon 0000:01:00.0:   GRBM_STATUS2=0x70000008
[  256.661657] radeon 0000:01:00.0:   GRBM_STATUS_SE0=0x00000006
[  256.661658] radeon 0000:01:00.0:   GRBM_STATUS_SE1=0x00000006
[  256.661659] radeon 0000:01:00.0:   GRBM_STATUS_SE2=0x00000006
[  256.661661] radeon 0000:01:00.0:   GRBM_STATUS_SE3=0x00000006
[  256.661662] radeon 0000:01:00.0:   SRBM_STATUS=0x20000A40
[  256.661663] radeon 0000:01:00.0:   SRBM_STATUS2=0x00000000
[  256.661665] radeon 0000:01:00.0:   SDMA0_STATUS_REG   = 0x46CEE557
[  256.661666] radeon 0000:01:00.0:   SDMA1_STATUS_REG   = 0x46CEE557
[  256.661668] radeon 0000:01:00.0:   CP_STAT = 0x80838600
[  256.661669] radeon 0000:01:00.0:   CP_STALLED_STAT1 = 0x20000c00
[  256.661670] radeon 0000:01:00.0:   CP_STALLED_STAT2 = 0x00010000
[  256.661672] radeon 0000:01:00.0:   CP_STALLED_STAT3 = 0x00000000
[  256.661673] radeon 0000:01:00.0:   CP_CPF_BUSY_STAT = 0x48440006
[  256.661674] radeon 0000:01:00.0:   CP_CPF_STALLED_STAT1 = 0x00000001
[  256.661676] radeon 0000:01:00.0:   CP_CPF_STATUS = 0x80008063
[  256.661677] radeon 0000:01:00.0:   CP_CPC_BUSY_STAT = 0x00000408
[  256.661678] radeon 0000:01:00.0:   CP_CPC_STALLED_STAT1 = 0x00000000
[  256.661680] radeon 0000:01:00.0:   CP_CPC_STATUS = 0xa0000041
[  256.661681] radeon 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x00000000
[  256.661683] radeon 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x00000000
[  256.673157] radeon 0000:01:00.0: GRBM_SOFT_RESET=0x00010001
[  256.673209] radeon 0000:01:00.0: SRBM_SOFT_RESET=0x00000100
[  256.674354] radeon 0000:01:00.0:   GRBM_STATUS=0xA0003028
[  256.674355] radeon 0000:01:00.0:   GRBM_STATUS2=0x30000008
[  256.674356] radeon 0000:01:00.0:   GRBM_STATUS_SE0=0x00000006
[  256.674358] radeon 0000:01:00.0:   GRBM_STATUS_SE1=0x00000006
[  256.674359] radeon 0000:01:00.0:   GRBM_STATUS_SE2=0x00000006
[  256.674360] radeon 0000:01:00.0:   GRBM_STATUS_SE3=0x00000006
[  256.674362] radeon 0000:01:00.0:   SRBM_STATUS=0x20000040
[  256.674363] radeon 0000:01:00.0:   SRBM_STATUS2=0x00000000
[  256.674364] radeon 0000:01:00.0:   SDMA0_STATUS_REG   = 0x46CEE557
[  256.674366] radeon 0000:01:00.0:   SDMA1_STATUS_REG   = 0x46CEE557
[  256.674367] radeon 0000:01:00.0:   CP_STAT = 0x00000000
[  256.674369] radeon 0000:01:00.0:   CP_STALLED_STAT1 = 0x00000000
[  256.674370] radeon 0000:01:00.0:   CP_STALLED_STAT2 = 0x00000000
[  256.674371] radeon 0000:01:00.0:   CP_STALLED_STAT3 = 0x00000000
[  256.674373] radeon 0000:01:00.0:   CP_CPF_BUSY_STAT = 0x48440000
[  256.674374] radeon 0000:01:00.0:   CP_CPF_STALLED_STAT1 = 0x00000000
[  256.674375] radeon 0000:01:00.0:   CP_CPF_STATUS = 0x80008003
[  256.674377] radeon 0000:01:00.0:   CP_CPC_BUSY_STAT = 0x00000408
[  256.674378] radeon 0000:01:00.0:   CP_CPC_STALLED_STAT1 = 0x00000000
[  256.674379] radeon 0000:01:00.0:   CP_CPC_STATUS = 0xa0000041
[  256.674386] radeon 0000:01:00.0: GPU reset succeeded, trying to resume
[  256.691920] [drm] probing gen 2 caps for device 8086:151 = 261ad03/e
[  256.691922] [drm] PCIE gen 3 link speeds already enabled
[  256.694314] [drm] PCIE GART of 1024M enabled (table at 0x0000000000277000).
[  256.694394] radeon 0000:01:00.0: WB enabled
[  256.694399] radeon 0000:01:00.0: fence driver on ring 0 use gpu addr 0x0000000040000c00 and cpu addr 0xffc78c00
[  256.694400] radeon 0000:01:00.0: fence driver on ring 1 use gpu addr 0x0000000040000c04 and cpu addr 0xffc78c04
[  256.694401] radeon 0000:01:00.0: fence driver on ring 2 use gpu addr 0x0000000040000c08 and cpu addr 0xffc78c08
[  256.694402] radeon 0000:01:00.0: fence driver on ring 3 use gpu addr 0x0000000040000c0c and cpu addr 0xffc78c0c
[  256.694404] radeon 0000:01:00.0: fence driver on ring 4 use gpu addr 0x0000000040000c10 and cpu addr 0xffc78c10
[  256.694782] radeon 0000:01:00.0: fence driver on ring 5 use gpu addr 0x0000000000076c98 and cpu addr 0xf8bb6c98
[  256.696701] [drm:cik_ring_test] *ERROR* radeon: cp failed to get scratch reg (-22).
[  256.696703] [drm:cik_resume] *ERROR* cik startup failed on resume
[  256.697353] radeon 0000:01:00.0: GPU fault detected: 146 0x04c2760c
[  256.697354] radeon 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x00000126
[  256.697355] radeon 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0207600C
[  256.697356] VM fault (0x0c, vmid 1) at page 294, read from 'CPF' (0x43504600) (118)
[  256.697360] radeon 0000:01:00.0: GPU fault detected: 146 0x04e2760c
[  256.697361] radeon 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x00000000
[  256.697362] radeon 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x00000000
[  256.697363] VM fault (0x00, vmid 0) at page 0, read from '' (0x00000000) (0)
[  256.835112] [drm:ci_dpm_enable] *ERROR* ci_start_dpm failed
[  256.835113] [drm:radeon_pm_resume_dpm] *ERROR* radeon: dpm resume failed
[  256.851156] radeon 0000:01:00.0: Saved 205270 dwords of commands on ring 0.
[  256.851292] radeon 0000:01:00.0: Saved 131065 dwords of commands on ring 2.
[  256.851303] radeon 0000:01:00.0: GPU softreset: 0x00000008
[  256.851304] radeon 0000:01:00.0:   GRBM_STATUS=0xA0003028
[  256.851306] radeon 0000:01:00.0:   GRBM_STATUS2=0x70000008
[  256.851307] radeon 0000:01:00.0:   GRBM_STATUS_SE0=0x00000006
[  256.851308] radeon 0000:01:00.0:   GRBM_STATUS_SE1=0x00000006
[  256.851310] radeon 0000:01:00.0:   GRBM_STATUS_SE2=0x00000006
[  256.851311] radeon 0000:01:00.0:   GRBM_STATUS_SE3=0x00000006
[  256.851312] radeon 0000:01:00.0:   SRBM_STATUS=0x20000A40
[  256.851314] radeon 0000:01:00.0:   SRBM_STATUS2=0x00000000
[  256.851315] radeon 0000:01:00.0:   SDMA0_STATUS_REG   = 0x46CEE557
[  256.851317] radeon 0000:01:00.0:   SDMA1_STATUS_REG   = 0x46CEE557
[  256.851318] radeon 0000:01:00.0:   CP_STAT = 0x84008200
[  256.851319] radeon 0000:01:00.0:   CP_STALLED_STAT1 = 0x00000c00
[  256.851321] radeon 0000:01:00.0:   CP_STALLED_STAT2 = 0x00000100
[  256.851322] radeon 0000:01:00.0:   CP_STALLED_STAT3 = 0x00000400
[  256.851323] radeon 0000:01:00.0:   CP_CPF_BUSY_STAT = 0x48440002
[  256.851325] radeon 0000:01:00.0:   CP_CPF_STALLED_STAT1 = 0x00000001
[  256.851326] radeon 0000:01:00.0:   CP_CPF_STATUS = 0x80008023
[  256.851327] radeon 0000:01:00.0:   CP_CPC_BUSY_STAT = 0x00000408
[  256.851329] radeon 0000:01:00.0:   CP_CPC_STALLED_STAT1 = 0x00000000
[  256.851330] radeon 0000:01:00.0:   CP_CPC_STATUS = 0xa0000041
[  256.851332] radeon 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x00000000
[  256.851333] radeon 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x00000000
[  256.857917] radeon 0000:01:00.0: GRBM_SOFT_RESET=0x00010001
[  256.857969] radeon 0000:01:00.0: SRBM_SOFT_RESET=0x00000100
[  256.859114] radeon 0000:01:00.0:   GRBM_STATUS=0xA0003028
[  256.859115] radeon 0000:01:00.0:   GRBM_STATUS2=0x30000008
[  256.859116] radeon 0000:01:00.0:   GRBM_STATUS_SE0=0x00000006
[  256.859118] radeon 0000:01:00.0:   GRBM_STATUS_SE1=0x00000006
[  256.859119] radeon 0000:01:00.0:   GRBM_STATUS_SE2=0x00000006
[  256.859120] radeon 0000:01:00.0:   GRBM_STATUS_SE3=0x00000006
[  256.859121] radeon 0000:01:00.0:   SRBM_STATUS=0x20000040
[  256.859123] radeon 0000:01:00.0:   SRBM_STATUS2=0x00000000
[  256.859124] radeon 0000:01:00.0:   SDMA0_STATUS_REG   = 0x46CEE557
[  256.859126] radeon 0000:01:00.0:   SDMA1_STATUS_REG   = 0x46CEE557
[  256.859127] radeon 0000:01:00.0:   CP_STAT = 0x00000000
[  256.859128] radeon 0000:01:00.0:   CP_STALLED_STAT1 = 0x00000000
[  256.859130] radeon 0000:01:00.0:   CP_STALLED_STAT2 = 0x00000000
[  256.859131] radeon 0000:01:00.0:   CP_STALLED_STAT3 = 0x00000000
[  256.859132] radeon 0000:01:00.0:   CP_CPF_BUSY_STAT = 0x48440000
[  256.859134] radeon 0000:01:00.0:   CP_CPF_STALLED_STAT1 = 0x00000000
[  256.859135] radeon 0000:01:00.0:   CP_CPF_STATUS = 0x80008003
[  256.859136] radeon 0000:01:00.0:   CP_CPC_BUSY_STAT = 0x00000408
[  256.859138] radeon 0000:01:00.0:   CP_CPC_STALLED_STAT1 = 0x00000000
[  256.859139] radeon 0000:01:00.0:   CP_CPC_STATUS = 0xa0000041
[  256.859146] radeon 0000:01:00.0: GPU reset succeeded, trying to resume
[  256.876718] [drm] probing gen 2 caps for device 8086:151 = 261ad03/e
[  256.876720] [drm] PCIE gen 3 link speeds already enabled
[  256.879110] [drm] PCIE GART of 1024M enabled (table at 0x0000000000277000).
[  256.879189] radeon 0000:01:00.0: WB enabled
[  256.879193] radeon 0000:01:00.0: fence driver on ring 0 use gpu addr 0x0000000040000c00 and cpu addr 0xffc78c00
[  256.879194] radeon 0000:01:00.0: fence driver on ring 1 use gpu addr 0x0000000040000c04 and cpu addr 0xffc78c04
[  256.879195] radeon 0000:01:00.0: fence driver on ring 2 use gpu addr 0x0000000040000c08 and cpu addr 0xffc78c08
[  256.879197] radeon 0000:01:00.0: fence driver on ring 3 use gpu addr 0x0000000040000c0c and cpu addr 0xffc78c0c
[  256.879198] radeon 0000:01:00.0: fence driver on ring 4 use gpu addr 0x0000000040000c10 and cpu addr 0xffc78c10
[  256.879575] radeon 0000:01:00.0: fence driver on ring 5 use gpu addr 0x0000000000076c98 and cpu addr 0xf8bb6c98
[  256.881518] [drm:cik_ring_test] *ERROR* radeon: cp failed to get scratch reg (-22).
[  256.881520] [drm:cik_resume] *ERROR* cik startup failed on resume
[  256.882163] radeon 0000:01:00.0: GPU fault detected: 146 0x0522760c
[  256.882164] radeon 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x00000129
[  256.882165] radeon 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0207600C
[  256.882167] VM fault (0x0c, vmid 1) at page 297, read from 'CPF' (0x43504600) (118)
[  257.019853] [drm:ci_dpm_enable] *ERROR* ci_start_dpm failed
[  257.019854] [drm:radeon_pm_resume_dpm] *ERROR* radeon: dpm resume failed
[  258.342620] radeon 0000:01:00.0: GPU lockup CP stall for more than 1000msec
[  258.342633] radeon 0000:01:00.0: GPU lockup (waiting for 0x0000000000014851 last fence id 0x00000000000147f4 on ring 0)
[  258.351600] radeon 0000:01:00.0: Saved 205686 dwords of commands on ring 0.
[  258.351735] radeon 0000:01:00.0: Saved 131065 dwords of commands on ring 2.
[  258.351746] radeon 0000:01:00.0: GPU softreset: 0x00000008
[  258.351747] radeon 0000:01:00.0:   GRBM_STATUS=0xA0003028
[  258.351749] radeon 0000:01:00.0:   GRBM_STATUS2=0x70000008
[  258.351750] radeon 0000:01:00.0:   GRBM_STATUS_SE0=0x00000006
[  258.351751] radeon 0000:01:00.0:   GRBM_STATUS_SE1=0x00000006
[  258.351752] radeon 0000:01:00.0:   GRBM_STATUS_SE2=0x00000006
[  258.351754] radeon 0000:01:00.0:   GRBM_STATUS_SE3=0x00000006
[  258.351755] radeon 0000:01:00.0:   SRBM_STATUS=0x20000A40
[  258.351756] radeon 0000:01:00.0:   SRBM_STATUS2=0x00000000
[  258.351758] radeon 0000:01:00.0:   SDMA0_STATUS_REG   = 0x46CEE557
[  258.351759] radeon 0000:01:00.0:   SDMA1_STATUS_REG   = 0x46CEE557
[  258.351761] radeon 0000:01:00.0:   CP_STAT = 0x84008200
[  258.351762] radeon 0000:01:00.0:   CP_STALLED_STAT1 = 0x00000c00
[  258.351763] radeon 0000:01:00.0:   CP_STALLED_STAT2 = 0x00000100
[  258.351765] radeon 0000:01:00.0:   CP_STALLED_STAT3 = 0x00000400
[  258.351766] radeon 0000:01:00.0:   CP_CPF_BUSY_STAT = 0x48440002
[  258.351767] radeon 0000:01:00.0:   CP_CPF_STALLED_STAT1 = 0x00000001
[  258.351769] radeon 0000:01:00.0:   CP_CPF_STATUS = 0x80008023
[  258.351770] radeon 0000:01:00.0:   CP_CPC_BUSY_STAT = 0x00000408
[  258.351771] radeon 0000:01:00.0:   CP_CPC_STALLED_STAT1 = 0x00000000
[  258.351773] radeon 0000:01:00.0:   CP_CPC_STATUS = 0xa0000041
[  258.351774] radeon 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x00000000
[  258.351776] radeon 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x00000000
[  258.359421] radeon 0000:01:00.0: GRBM_SOFT_RESET=0x00010001
[  258.359473] radeon 0000:01:00.0: SRBM_SOFT_RESET=0x00000100
[  258.360618] radeon 0000:01:00.0:   GRBM_STATUS=0xA0003028
[  258.360620] radeon 0000:01:00.0:   GRBM_STATUS2=0x30000008
[  258.360621] radeon 0000:01:00.0:   GRBM_STATUS_SE0=0x00000006
[  258.360622] radeon 0000:01:00.0:   GRBM_STATUS_SE1=0x00000006
[  258.360623] radeon 0000:01:00.0:   GRBM_STATUS_SE2=0x00000006
[  258.360625] radeon 0000:01:00.0:   GRBM_STATUS_SE3=0x00000006
[  258.360626] radeon 0000:01:00.0:   SRBM_STATUS=0x20000040
[  258.360627] radeon 0000:01:00.0:   SRBM_STATUS2=0x00000000
[  258.360629] radeon 0000:01:00.0:   SDMA0_STATUS_REG   = 0x46CEE557
[  258.360630] radeon 0000:01:00.0:   SDMA1_STATUS_REG   = 0x46CEE557
[  258.360631] radeon 0000:01:00.0:   CP_STAT = 0x00000000
[  258.360633] radeon 0000:01:00.0:   CP_STALLED_STAT1 = 0x00000000
[  258.360634] radeon 0000:01:00.0:   CP_STALLED_STAT2 = 0x00000000
[  258.360635] radeon 0000:01:00.0:   CP_STALLED_STAT3 = 0x00000000
[  258.360637] radeon 0000:01:00.0:   CP_CPF_BUSY_STAT = 0x48440000
[  258.360638] radeon 0000:01:00.0:   CP_CPF_STALLED_STAT1 = 0x00000000
[  258.360639] radeon 0000:01:00.0:   CP_CPF_STATUS = 0x80008003
[  258.360641] radeon 0000:01:00.0:   CP_CPC_BUSY_STAT = 0x00000408
[  258.360642] radeon 0000:01:00.0:   CP_CPC_STALLED_STAT1 = 0x00000000
[  258.360643] radeon 0000:01:00.0:   CP_CPC_STATUS = 0xa0000041
[  258.360650] radeon 0000:01:00.0: GPU reset succeeded, trying to resume
[  258.378185] [drm] probing gen 2 caps for device 8086:151 = 261ad03/e
[  258.378187] [drm] PCIE gen 3 link speeds already enabled
[  258.380580] [drm] PCIE GART of 1024M enabled (table at 0x0000000000277000).
[  258.380661] radeon 0000:01:00.0: WB enabled
[  258.380666] radeon 0000:01:00.0: fence driver on ring 0 use gpu addr 0x0000000040000c00 and cpu addr 0xffc78c00
[  258.380667] radeon 0000:01:00.0: fence driver on ring 1 use gpu addr 0x0000000040000c04 and cpu addr 0xffc78c04
[  258.380668] radeon 0000:01:00.0: fence driver on ring 2 use gpu addr 0x0000000040000c08 and cpu addr 0xffc78c08
[  258.380669] radeon 0000:01:00.0: fence driver on ring 3 use gpu addr 0x0000000040000c0c and cpu addr 0xffc78c0c
[  258.380671] radeon 0000:01:00.0: fence driver on ring 4 use gpu addr 0x0000000040000c10 and cpu addr 0xffc78c10
[  258.381048] radeon 0000:01:00.0: fence driver on ring 5 use gpu addr 0x0000000000076c98 and cpu addr 0xf8bb6c98
[  258.382967] [drm:cik_ring_test] *ERROR* radeon: cp failed to get scratch reg (-22).
[  258.382969] [drm:cik_resume] *ERROR* cik startup failed on resume
[  258.394079] [drm:cik_irq_process] *ERROR* Illegal register access in command stream
[  258.523297] [drm:ci_dpm_enable] *ERROR* ci_start_dpm failed
[  258.523299] [drm:radeon_pm_resume_dpm] *ERROR* radeon: dpm resume failed
[  259.527881] radeon 0000:01:00.0: GPU lockup CP stall for more than 1000msec
[  259.527887] radeon 0000:01:00.0: GPU lockup (waiting for 0x0000000000014850 last fence id 0x00000000000147f4 on ring 0)
[  259.527900] radeon 0000:01:00.0: GPU lockup CP stall for more than 1000msec
[  259.527907] radeon 0000:01:00.0: GPU lockup (waiting for 0x000000000001489c last fence id 0x00000000000147f4 on ring 0)
[  259.536539] radeon 0000:01:00.0: Saved 208406 dwords of commands on ring 0.
[  259.536674] radeon 0000:01:00.0: Saved 131065 dwords of commands on ring 2.
[  259.536685] radeon 0000:01:00.0: GPU softreset: 0x00000009
[  259.536686] radeon 0000:01:00.0:   GRBM_STATUS=0xA0083028
[  259.536688] radeon 0000:01:00.0:   GRBM_STATUS2=0x70000008
[  259.536689] radeon 0000:01:00.0:   GRBM_STATUS_SE0=0x00000006
[  259.536690] radeon 0000:01:00.0:   GRBM_STATUS_SE1=0x00000006
[  259.536691] radeon 0000:01:00.0:   GRBM_STATUS_SE2=0x00000006
[  259.536693] radeon 0000:01:00.0:   GRBM_STATUS_SE3=0x00000006
[  259.536694] radeon 0000:01:00.0:   SRBM_STATUS=0x20000040
[  259.536695] radeon 0000:01:00.0:   SRBM_STATUS2=0x00000000
[  259.536697] radeon 0000:01:00.0:   SDMA0_STATUS_REG   = 0x46CEE557
[  259.536698] radeon 0000:01:00.0:   SDMA1_STATUS_REG   = 0x46CEE557
[  259.536700] radeon 0000:01:00.0:   CP_STAT = 0x84838600
[  259.536701] radeon 0000:01:00.0:   CP_STALLED_STAT1 = 0x20000c00
[  259.536702] radeon 0000:01:00.0:   CP_STALLED_STAT2 = 0x00010000
[  259.536704] radeon 0000:01:00.0:   CP_STALLED_STAT3 = 0x00000400
[  259.536705] radeon 0000:01:00.0:   CP_CPF_BUSY_STAT = 0x48440006
[  259.536706] radeon 0000:01:00.0:   CP_CPF_STALLED_STAT1 = 0x00000001
[  259.536708] radeon 0000:01:00.0:   CP_CPF_STATUS = 0x80008063
[  259.536709] radeon 0000:01:00.0:   CP_CPC_BUSY_STAT = 0x00000408
[  259.536711] radeon 0000:01:00.0:   CP_CPC_STALLED_STAT1 = 0x00000000
[  259.536712] radeon 0000:01:00.0:   CP_CPC_STATUS = 0xa0000041
[  259.536713] radeon 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x00000000
[  259.536715] radeon 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x00000000
[  259.548198] radeon 0000:01:00.0: GRBM_SOFT_RESET=0x00010001
[  259.548250] radeon 0000:01:00.0: SRBM_SOFT_RESET=0x00000100
[  259.549395] radeon 0000:01:00.0:   GRBM_STATUS=0xA0003028
[  259.549397] radeon 0000:01:00.0:   GRBM_STATUS2=0x30000008
[  259.549398] radeon 0000:01:00.0:   GRBM_STATUS_SE0=0x00000006
[  259.549399] radeon 0000:01:00.0:   GRBM_STATUS_SE1=0x00000006
[  259.549401] radeon 0000:01:00.0:   GRBM_STATUS_SE2=0x00000006
[  259.549402] radeon 0000:01:00.0:   GRBM_STATUS_SE3=0x00000006
[  259.549403] radeon 0000:01:00.0:   SRBM_STATUS=0x20000040
[  259.549404] radeon 0000:01:00.0:   SRBM_STATUS2=0x00000000
[  259.549406] radeon 0000:01:00.0:   SDMA0_STATUS_REG   = 0x46CEE557
[  259.549407] radeon 0000:01:00.0:   SDMA1_STATUS_REG   = 0x46CEE557
[  259.549409] radeon 0000:01:00.0:   CP_STAT = 0x00000000
[  259.549410] radeon 0000:01:00.0:   CP_STALLED_STAT1 = 0x00000000
[  259.549411] radeon 0000:01:00.0:   CP_STALLED_STAT2 = 0x00000000
[  259.549413] radeon 0000:01:00.0:   CP_STALLED_STAT3 = 0x00000000
[  259.549414] radeon 0000:01:00.0:   CP_CPF_BUSY_STAT = 0x48440000
[  259.549415] radeon 0000:01:00.0:   CP_CPF_STALLED_STAT1 = 0x00000000
[  259.549417] radeon 0000:01:00.0:   CP_CPF_STATUS = 0x80008003
[  259.549418] radeon 0000:01:00.0:   CP_CPC_BUSY_STAT = 0x00000408
[  259.549419] radeon 0000:01:00.0:   CP_CPC_STALLED_STAT1 = 0x00000000
[  259.549421] radeon 0000:01:00.0:   CP_CPC_STATUS = 0xa0000041
[  259.549427] radeon 0000:01:00.0: GPU reset succeeded, trying to resume
[  259.566964] [drm] probing gen 2 caps for device 8086:151 = 261ad03/e
[  259.566966] [drm] PCIE gen 3 link speeds already enabled
[  259.569361] [drm] PCIE GART of 1024M enabled (table at 0x0000000000277000).
[  259.569443] radeon 0000:01:00.0: WB enabled
[  259.569447] radeon 0000:01:00.0: fence driver on ring 0 use gpu addr 0x0000000040000c00 and cpu addr 0xffc78c00
[  259.569448] radeon 0000:01:00.0: fence driver on ring 1 use gpu addr 0x0000000040000c04 and cpu addr 0xffc78c04
[  259.569450] radeon 0000:01:00.0: fence driver on ring 2 use gpu addr 0x0000000040000c08 and cpu addr 0xffc78c08
[  259.569451] radeon 0000:01:00.0: fence driver on ring 3 use gpu addr 0x0000000040000c0c and cpu addr 0xffc78c0c
[  259.569452] radeon 0000:01:00.0: fence driver on ring 4 use gpu addr 0x0000000040000c10 and cpu addr 0xffc78c10
[  259.569830] radeon 0000:01:00.0: fence driver on ring 5 use gpu addr 0x0000000000076c98 and cpu addr 0xf8bb6c98
[  259.571773] [drm:cik_ring_test] *ERROR* radeon: cp failed to get scratch reg (-22).
[  259.571774] [drm:cik_resume] *ERROR* cik startup failed on resume
[  259.572439] radeon 0000:01:00.0: GPU fault detected: 146 0x04c2760c
[  259.572440] radeon 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x00000126
[  259.572441] radeon 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0207600C
[  259.572443] VM fault (0x0c, vmid 1) at page 294, read from 'CPF' (0x43504600) (118)
[  259.572446] radeon 0000:01:00.0: GPU fault detected: 146 0x04e2760c
[  259.572447] radeon 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x00000000
[  259.572448] radeon 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x00000000
[  259.572449] VM fault (0x00, vmid 0) at page 0, read from '' (0x00000000) (0)
[  259.710197] [drm:ci_dpm_enable] *ERROR* ci_start_dpm failed
[  259.710199] [drm:radeon_pm_resume_dpm] *ERROR* radeon: dpm resume failed
[  259.726202] radeon 0000:01:00.0: Saved 209974 dwords of commands on ring 0.
[  259.726338] radeon 0000:01:00.0: Saved 131065 dwords of commands on ring 2.
[  259.726349] radeon 0000:01:00.0: GPU softreset: 0x00000008
[  259.726350] radeon 0000:01:00.0:   GRBM_STATUS=0xA0003028
[  259.726352] radeon 0000:01:00.0:   GRBM_STATUS2=0x70000008
[  259.726353] radeon 0000:01:00.0:   GRBM_STATUS_SE0=0x00000006
[  259.726354] radeon 0000:01:00.0:   GRBM_STATUS_SE1=0x00000006
[  259.726356] radeon 0000:01:00.0:   GRBM_STATUS_SE2=0x00000006
[  259.726357] radeon 0000:01:00.0:   GRBM_STATUS_SE3=0x00000006
[  259.726358] radeon 0000:01:00.0:   SRBM_STATUS=0x20000040
[  259.726360] radeon 0000:01:00.0:   SRBM_STATUS2=0x00000000
[  259.726361] radeon 0000:01:00.0:   SDMA0_STATUS_REG   = 0x46CEE557
[  259.726363] radeon 0000:01:00.0:   SDMA1_STATUS_REG   = 0x46CEE557
[  259.726364] radeon 0000:01:00.0:   CP_STAT = 0x84008200
[  259.726365] radeon 0000:01:00.0:   CP_STALLED_STAT1 = 0x00000c00
[  259.726367] radeon 0000:01:00.0:   CP_STALLED_STAT2 = 0x00000100
[  259.726368] radeon 0000:01:00.0:   CP_STALLED_STAT3 = 0x00000400
[  259.726369] radeon 0000:01:00.0:   CP_CPF_BUSY_STAT = 0x48440002
[  259.726371] radeon 0000:01:00.0:   CP_CPF_STALLED_STAT1 = 0x00000001
[  259.726372] radeon 0000:01:00.0:   CP_CPF_STATUS = 0x80008023
[  259.726373] radeon 0000:01:00.0:   CP_CPC_BUSY_STAT = 0x00000408
[  259.726375] radeon 0000:01:00.0:   CP_CPC_STALLED_STAT1 = 0x00000000
[  259.726376] radeon 0000:01:00.0:   CP_CPC_STATUS = 0xa0000041
[  259.726377] radeon 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x00000000
[  259.726379] radeon 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x00000000
[  259.732951] radeon 0000:01:00.0: GRBM_SOFT_RESET=0x00010001
[  259.733003] radeon 0000:01:00.0: SRBM_SOFT_RESET=0x00000100
[  259.734148] radeon 0000:01:00.0:   GRBM_STATUS=0xA0003028
[  259.734149] radeon 0000:01:00.0:   GRBM_STATUS2=0x30000008
[  259.734150] radeon 0000:01:00.0:   GRBM_STATUS_SE0=0x00000006
[  259.734152] radeon 0000:01:00.0:   GRBM_STATUS_SE1=0x00000006
[  259.734153] radeon 0000:01:00.0:   GRBM_STATUS_SE2=0x00000006
[  259.734154] radeon 0000:01:00.0:   GRBM_STATUS_SE3=0x00000006
[  259.734156] radeon 0000:01:00.0:   SRBM_STATUS=0x20000040
[  259.734157] radeon 0000:01:00.0:   SRBM_STATUS2=0x00000000
[  259.734158] radeon 0000:01:00.0:   SDMA0_STATUS_REG   = 0x46CEE557
[  259.734160] radeon 0000:01:00.0:   SDMA1_STATUS_REG   = 0x46CEE557
[  259.734161] radeon 0000:01:00.0:   CP_STAT = 0x00000000
[  259.734162] radeon 0000:01:00.0:   CP_STALLED_STAT1 = 0x00000000
[  259.734164] radeon 0000:01:00.0:   CP_STALLED_STAT2 = 0x00000000
[  259.734165] radeon 0000:01:00.0:   CP_STALLED_STAT3 = 0x00000000
[  259.734166] radeon 0000:01:00.0:   CP_CPF_BUSY_STAT = 0x48440000
[  259.734168] radeon 0000:01:00.0:   CP_CPF_STALLED_STAT1 = 0x00000000
[  259.734169] radeon 0000:01:00.0:   CP_CPF_STATUS = 0x80008003
[  259.734171] radeon 0000:01:00.0:   CP_CPC_BUSY_STAT = 0x00000408
[  259.734172] radeon 0000:01:00.0:   CP_CPC_STALLED_STAT1 = 0x00000000
[  259.734173] radeon 0000:01:00.0:   CP_CPC_STATUS = 0xa0000041
[  259.734180] radeon 0000:01:00.0: GPU reset succeeded, trying to resume
[  259.751720] [drm] probing gen 2 caps for device 8086:151 = 261ad03/e
[  259.751723] [drm] PCIE gen 3 link speeds already enabled
[  259.754112] [drm] PCIE GART of 1024M enabled (table at 0x0000000000277000).
[  259.754193] radeon 0000:01:00.0: WB enabled
[  259.754197] radeon 0000:01:00.0: fence driver on ring 0 use gpu addr 0x0000000040000c00 and cpu addr 0xffc78c00
[  259.754198] radeon 0000:01:00.0: fence driver on ring 1 use gpu addr 0x0000000040000c04 and cpu addr 0xffc78c04
[  259.754200] radeon 0000:01:00.0: fence driver on ring 2 use gpu addr 0x0000000040000c08 and cpu addr 0xffc78c08
[  259.754201] radeon 0000:01:00.0: fence driver on ring 3 use gpu addr 0x0000000040000c0c and cpu addr 0xffc78c0c
[  259.754202] radeon 0000:01:00.0: fence driver on ring 4 use gpu addr 0x0000000040000c10 and cpu addr 0xffc78c10
[  259.754580] radeon 0000:01:00.0: fence driver on ring 5 use gpu addr 0x0000000000076c98 and cpu addr 0xf8bb6c98
[  259.756503] [drm:cik_ring_test] *ERROR* radeon: cp failed to get scratch reg (-22).
[  259.756504] [drm:cik_resume] *ERROR* cik startup failed on resume
[  259.757163] radeon 0000:01:00.0: GPU fault detected: 146 0x0522760c
[  259.757164] radeon 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x00000129
[  259.757165] radeon 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0207600C
[  259.757167] VM fault (0x0c, vmid 1) at page 297, read from 'CPF' (0x43504600) (118)
[  259.895224] [drm:ci_dpm_enable] *ERROR* ci_start_dpm failed
[  259.895225] [drm:radeon_pm_resume_dpm] *ERROR* radeon: dpm resume failed
[  261.217691] radeon 0000:01:00.0: GPU lockup CP stall for more than 1000msec
[  261.217697] radeon 0000:01:00.0: GPU lockup (waiting for 0x0000000000014850 last fence id 0x00000000000147f4 on ring 0)
[  261.226753] radeon 0000:01:00.0: Saved 210390 dwords of commands on ring 0.
[  261.226888] radeon 0000:01:00.0: Saved 131065 dwords of commands on ring 2.
[  261.226898] radeon 0000:01:00.0: GPU softreset: 0x00000008
[  261.226899] radeon 0000:01:00.0:   GRBM_STATUS=0xA0003028
[  261.226901] radeon 0000:01:00.0:   GRBM_STATUS2=0x70000008
[  261.226902] radeon 0000:01:00.0:   GRBM_STATUS_SE0=0x00000006
[  261.226903] radeon 0000:01:00.0:   GRBM_STATUS_SE1=0x00000006
[  261.226905] radeon 0000:01:00.0:   GRBM_STATUS_SE2=0x00000006
[  261.226906] radeon 0000:01:00.0:   GRBM_STATUS_SE3=0x00000006
[  261.226907] radeon 0000:01:00.0:   SRBM_STATUS=0x20000040
[  261.226909] radeon 0000:01:00.0:   SRBM_STATUS2=0x00000000
[  261.226910] radeon 0000:01:00.0:   SDMA0_STATUS_REG   = 0x46CEE557
[  261.226912] radeon 0000:01:00.0:   SDMA1_STATUS_REG   = 0x46CEE557
[  261.226913] radeon 0000:01:00.0:   CP_STAT = 0x84008200
[  261.226914] radeon 0000:01:00.0:   CP_STALLED_STAT1 = 0x00000c00
[  261.226916] radeon 0000:01:00.0:   CP_STALLED_STAT2 = 0x00000100
[  261.226917] radeon 0000:01:00.0:   CP_STALLED_STAT3 = 0x00000400
[  261.226918] radeon 0000:01:00.0:   CP_CPF_BUSY_STAT = 0x48440002
[  261.226920] radeon 0000:01:00.0:   CP_CPF_STALLED_STAT1 = 0x00000001
[  261.226921] radeon 0000:01:00.0:   CP_CPF_STATUS = 0x80008023
[  261.226922] radeon 0000:01:00.0:   CP_CPC_BUSY_STAT = 0x00000408
[  261.226924] radeon 0000:01:00.0:   CP_CPC_STALLED_STAT1 = 0x00000000
[  261.226925] radeon 0000:01:00.0:   CP_CPC_STATUS = 0xa0000041
[  261.226927] radeon 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x00000000
[  261.226928] radeon 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x00000000
[  261.234467] radeon 0000:01:00.0: GRBM_SOFT_RESET=0x00010001
[  261.234519] radeon 0000:01:00.0: SRBM_SOFT_RESET=0x00000100
[  261.235664] radeon 0000:01:00.0:   GRBM_STATUS=0xA0003028
[  261.235665] radeon 0000:01:00.0:   GRBM_STATUS2=0x30000008
[  261.235667] radeon 0000:01:00.0:   GRBM_STATUS_SE0=0x00000006
[  261.235668] radeon 0000:01:00.0:   GRBM_STATUS_SE1=0x00000006
[  261.235669] radeon 0000:01:00.0:   GRBM_STATUS_SE2=0x00000006
[  261.235671] radeon 0000:01:00.0:   GRBM_STATUS_SE3=0x00000006
[  261.235672] radeon 0000:01:00.0:   SRBM_STATUS=0x20000040
[  261.235673] radeon 0000:01:00.0:   SRBM_STATUS2=0x00000000
[  261.235675] radeon 0000:01:00.0:   SDMA0_STATUS_REG   = 0x46CEE557
[  261.235676] radeon 0000:01:00.0:   SDMA1_STATUS_REG   = 0x46CEE557
[  261.235677] radeon 0000:01:00.0:   CP_STAT = 0x00000000
[  261.235679] radeon 0000:01:00.0:   CP_STALLED_STAT1 = 0x00000000
[  261.235680] radeon 0000:01:00.0:   CP_STALLED_STAT2 = 0x00000000
[  261.235681] radeon 0000:01:00.0:   CP_STALLED_STAT3 = 0x00000000
[  261.235683] radeon 0000:01:00.0:   CP_CPF_BUSY_STAT = 0x48440000
[  261.235684] radeon 0000:01:00.0:   CP_CPF_STALLED_STAT1 = 0x00000000
[  261.235685] radeon 0000:01:00.0:   CP_CPF_STATUS = 0x80008003
[  261.235687] radeon 0000:01:00.0:   CP_CPC_BUSY_STAT = 0x00000408
[  261.235688] radeon 0000:01:00.0:   CP_CPC_STALLED_STAT1 = 0x00000000
[  261.235689] radeon 0000:01:00.0:   CP_CPC_STATUS = 0xa0000041
[  261.235696] radeon 0000:01:00.0: GPU reset succeeded, trying to resume
[  261.253233] [drm] probing gen 2 caps for device 8086:151 = 261ad03/e
[  261.253235] [drm] PCIE gen 3 link speeds already enabled
[  261.255625] [drm] PCIE GART of 1024M enabled (table at 0x0000000000277000).
[  261.255707] radeon 0000:01:00.0: WB enabled
[  261.255711] radeon 0000:01:00.0: fence driver on ring 0 use gpu addr 0x0000000040000c00 and cpu addr 0xffc78c00
[  261.255712] radeon 0000:01:00.0: fence driver on ring 1 use gpu addr 0x0000000040000c04 and cpu addr 0xffc78c04
[  261.255713] radeon 0000:01:00.0: fence driver on ring 2 use gpu addr 0x0000000040000c08 and cpu addr 0xffc78c08
[  261.255715] radeon 0000:01:00.0: fence driver on ring 3 use gpu addr 0x0000000040000c0c and cpu addr 0xffc78c0c
[  261.255716] radeon 0000:01:00.0: fence driver on ring 4 use gpu addr 0x0000000040000c10 and cpu addr 0xffc78c10
[  261.256094] radeon 0000:01:00.0: fence driver on ring 5 use gpu addr 0x0000000000076c98 and cpu addr 0xf8bb6c98
[  261.258037] [drm:cik_ring_test] *ERROR* radeon: cp failed to get scratch reg (-22).
[  261.258038] [drm:cik_resume] *ERROR* cik startup failed on resume
[  261.269184] [drm:cik_irq_process] *ERROR* Illegal register access in command stream
[  261.402173] [drm:ci_dpm_enable] *ERROR* ci_start_dpm failed
[  261.402174] [drm:radeon_pm_resume_dpm] *ERROR* radeon: dpm resume failed
[  262.402971] radeon 0000:01:00.0: GPU lockup CP stall for more than 1000msec
[  262.402977] radeon 0000:01:00.0: GPU lockup (waiting for 0x000000000001489c last fence id 0x00000000000147f4 on ring 0)
[  262.403045] radeon 0000:01:00.0: GPU lockup CP stall for more than 1000msec
[  262.403049] radeon 0000:01:00.0: GPU lockup (waiting for 0x0000000000014851 last fence id 0x00000000000147f4 on ring 0)
[  262.412303] radeon 0000:01:00.0: Saved 213110 dwords of commands on ring 0.
[  262.412438] radeon 0000:01:00.0: Saved 131065 dwords of commands on ring 2.
[  262.412448] radeon 0000:01:00.0: GPU softreset: 0x00000009
[  262.412450] radeon 0000:01:00.0:   GRBM_STATUS=0xA0083028
[  262.412451] radeon 0000:01:00.0:   GRBM_STATUS2=0x70000008
[  262.412453] radeon 0000:01:00.0:   GRBM_STATUS_SE0=0x00000006
[  262.412454] radeon 0000:01:00.0:   GRBM_STATUS_SE1=0x00000006
[  262.412455] radeon 0000:01:00.0:   GRBM_STATUS_SE2=0x00000006
[  262.412456] radeon 0000:01:00.0:   GRBM_STATUS_SE3=0x00000006
[  262.412458] radeon 0000:01:00.0:   SRBM_STATUS=0x20000040
[  262.412459] radeon 0000:01:00.0:   SRBM_STATUS2=0x00000000
[  262.412461] radeon 0000:01:00.0:   SDMA0_STATUS_REG   = 0x46CEE557
[  262.412462] radeon 0000:01:00.0:   SDMA1_STATUS_REG   = 0x46CEE557
[  262.412463] radeon 0000:01:00.0:   CP_STAT = 0x84838600
[  262.412465] radeon 0000:01:00.0:   CP_STALLED_STAT1 = 0x20000c00
[  262.412466] radeon 0000:01:00.0:   CP_STALLED_STAT2 = 0x00010000
[  262.412467] radeon 0000:01:00.0:   CP_STALLED_STAT3 = 0x00000400
[  262.412469] radeon 0000:01:00.0:   CP_CPF_BUSY_STAT = 0x48440006
[  262.412470] radeon 0000:01:00.0:   CP_CPF_STALLED_STAT1 = 0x00000001
[  262.412472] radeon 0000:01:00.0:   CP_CPF_STATUS = 0x80008063
[  262.412473] radeon 0000:01:00.0:   CP_CPC_BUSY_STAT = 0x00000408
[  262.412474] radeon 0000:01:00.0:   CP_CPC_STALLED_STAT1 = 0x00000000
[  262.412476] radeon 0000:01:00.0:   CP_CPC_STATUS = 0xa0000041
[  262.412477] radeon 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x00000000
[  262.412479] radeon 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x00000000
[  262.427255] radeon 0000:01:00.0: GRBM_SOFT_RESET=0x00010001
[  262.427307] radeon 0000:01:00.0: SRBM_SOFT_RESET=0x00000100
[  262.428452] radeon 0000:01:00.0:   GRBM_STATUS=0xA0003028
[  262.428454] radeon 0000:01:00.0:   GRBM_STATUS2=0x30000008
[  262.428455] radeon 0000:01:00.0:   GRBM_STATUS_SE0=0x00000006
[  262.428456] radeon 0000:01:00.0:   GRBM_STATUS_SE1=0x00000006
[  262.428458] radeon 0000:01:00.0:   GRBM_STATUS_SE2=0x00000006
[  262.428459] radeon 0000:01:00.0:   GRBM_STATUS_SE3=0x00000006
[  262.428460] radeon 0000:01:00.0:   SRBM_STATUS=0x20000040
[  262.428461] radeon 0000:01:00.0:   SRBM_STATUS2=0x00000000
[  262.428463] radeon 0000:01:00.0:   SDMA0_STATUS_REG   = 0x46CEE557
[  262.428464] radeon 0000:01:00.0:   SDMA1_STATUS_REG   = 0x46CEE557
[  262.428466] radeon 0000:01:00.0:   CP_STAT = 0x00000000
[  262.428467] radeon 0000:01:00.0:   CP_STALLED_STAT1 = 0x00000000
[  262.428468] radeon 0000:01:00.0:   CP_STALLED_STAT2 = 0x00000000
[  262.428470] radeon 0000:01:00.0:   CP_STALLED_STAT3 = 0x00000000
[  262.428471] radeon 0000:01:00.0:   CP_CPF_BUSY_STAT = 0x48440000
[  262.428472] radeon 0000:01:00.0:   CP_CPF_STALLED_STAT1 = 0x00000000
[  262.428474] radeon 0000:01:00.0:   CP_CPF_STATUS = 0x80008003
[  262.428475] radeon 0000:01:00.0:   CP_CPC_BUSY_STAT = 0x00000408
[  262.428476] radeon 0000:01:00.0:   CP_CPC_STALLED_STAT1 = 0x00000000
[  262.428478] radeon 0000:01:00.0:   CP_CPC_STATUS = 0xa0000041
[  262.428484] radeon 0000:01:00.0: GPU reset succeeded, trying to resume
[  262.446027] [drm] probing gen 2 caps for device 8086:151 = 261ad03/e
[  262.446029] [drm] PCIE gen 3 link speeds already enabled
[  262.448420] [drm] PCIE GART of 1024M enabled (table at 0x0000000000277000).
[  262.448503] radeon 0000:01:00.0: WB enabled
[  262.448507] radeon 0000:01:00.0: fence driver on ring 0 use gpu addr 0x0000000040000c00 and cpu addr 0xffc78c00
[  262.448509] radeon 0000:01:00.0: fence driver on ring 1 use gpu addr 0x0000000040000c04 and cpu addr 0xffc78c04
[  262.448510] radeon 0000:01:00.0: fence driver on ring 2 use gpu addr 0x0000000040000c08 and cpu addr 0xffc78c08
[  262.448511] radeon 0000:01:00.0: fence driver on ring 3 use gpu addr 0x0000000040000c0c and cpu addr 0xffc78c0c
[  262.448512] radeon 0000:01:00.0: fence driver on ring 4 use gpu addr 0x0000000040000c10 and cpu addr 0xffc78c10
[  262.448890] radeon 0000:01:00.0: fence driver on ring 5 use gpu addr 0x0000000000076c98 and cpu addr 0xf8bb6c98
[  262.450813] [drm:cik_ring_test] *ERROR* radeon: cp failed to get scratch reg (-22).
[  262.450814] [drm:cik_resume] *ERROR* cik startup failed on resume
[  262.451495] radeon 0000:01:00.0: GPU fault detected: 146 0x04c2760c
[  262.451496] radeon 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x00000126
[  262.451497] radeon 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0207600C
[  262.451499] VM fault (0x0c, vmid 1) at page 294, read from 'CPF' (0x43504600) (118)
[  262.451502] radeon 0000:01:00.0: GPU fault detected: 146 0x04e2760c
[  262.451503] radeon 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x00000000
[  262.451504] radeon 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x00000000
[  262.451505] VM fault (0x00, vmid 0) at page 0, read from '' (0x00000000) (0)
[  262.589423] [drm:ci_dpm_enable] *ERROR* ci_start_dpm failed
[  262.589424] [drm:radeon_pm_resume_dpm] *ERROR* radeon: dpm resume failed
[  262.605408] radeon 0000:01:00.0: Saved 214678 dwords of commands on ring 0.
[  262.605544] radeon 0000:01:00.0: Saved 131065 dwords of commands on ring 2.
[  262.605555] radeon 0000:01:00.0: GPU softreset: 0x00000008
[  262.605556] radeon 0000:01:00.0:   GRBM_STATUS=0xA0003028
[  262.605558] radeon 0000:01:00.0:   GRBM_STATUS2=0x70000008
[  262.605559] radeon 0000:01:00.0:   GRBM_STATUS_SE0=0x00000006
[  262.605560] radeon 0000:01:00.0:   GRBM_STATUS_SE1=0x00000006
[  262.605562] radeon 0000:01:00.0:   GRBM_STATUS_SE2=0x00000006
[  262.605563] radeon 0000:01:00.0:   GRBM_STATUS_SE3=0x00000006
[  262.605564] radeon 0000:01:00.0:   SRBM_STATUS=0x20000040
[  262.605566] radeon 0000:01:00.0:   SRBM_STATUS2=0x00000000
[  262.605567] radeon 0000:01:00.0:   SDMA0_STATUS_REG   = 0x46CEE557
[  262.605569] radeon 0000:01:00.0:   SDMA1_STATUS_REG   = 0x46CEE557
[  262.605570] radeon 0000:01:00.0:   CP_STAT = 0x84008200
[  262.605571] radeon 0000:01:00.0:   CP_STALLED_STAT1 = 0x00000c00
[  262.605573] radeon 0000:01:00.0:   CP_STALLED_STAT2 = 0x00000100
[  262.605574] radeon 0000:01:00.0:   CP_STALLED_STAT3 = 0x00000400
[  262.605575] radeon 0000:01:00.0:   CP_CPF_BUSY_STAT = 0x48440002
[  262.605577] radeon 0000:01:00.0:   CP_CPF_STALLED_STAT1 = 0x00000001
[  262.605578] radeon 0000:01:00.0:   CP_CPF_STATUS = 0x80008023
[  262.605579] radeon 0000:01:00.0:   CP_CPC_BUSY_STAT = 0x00000408
[  262.605581] radeon 0000:01:00.0:   CP_CPC_STALLED_STAT1 = 0x00000000
[  262.605582] radeon 0000:01:00.0:   CP_CPC_STATUS = 0xa0000041
[  262.605584] radeon 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x00000000
[  262.605585] radeon 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x00000000
[  262.612017] radeon 0000:01:00.0: GRBM_SOFT_RESET=0x00010001
[  262.612068] radeon 0000:01:00.0: SRBM_SOFT_RESET=0x00000100
[  262.613214] radeon 0000:01:00.0:   GRBM_STATUS=0xA0003028
[  262.613215] radeon 0000:01:00.0:   GRBM_STATUS2=0x30000008
[  262.613216] radeon 0000:01:00.0:   GRBM_STATUS_SE0=0x00000006
[  262.613217] radeon 0000:01:00.0:   GRBM_STATUS_SE1=0x00000006
[  262.613219] radeon 0000:01:00.0:   GRBM_STATUS_SE2=0x00000006
[  262.613220] radeon 0000:01:00.0:   GRBM_STATUS_SE3=0x00000006
[  262.613221] radeon 0000:01:00.0:   SRBM_STATUS=0x20000040
[  262.613223] radeon 0000:01:00.0:   SRBM_STATUS2=0x00000000
[  262.613224] radeon 0000:01:00.0:   SDMA0_STATUS_REG   = 0x46CEE557
[  262.613226] radeon 0000:01:00.0:   SDMA1_STATUS_REG   = 0x46CEE557
[  262.613227] radeon 0000:01:00.0:   CP_STAT = 0x00000000
[  262.613228] radeon 0000:01:00.0:   CP_STALLED_STAT1 = 0x00000000
[  262.613230] radeon 0000:01:00.0:   CP_STALLED_STAT2 = 0x00000000
[  262.613231] radeon 0000:01:00.0:   CP_STALLED_STAT3 = 0x00000000
[  262.613232] radeon 0000:01:00.0:   CP_CPF_BUSY_STAT = 0x48440000
[  262.613234] radeon 0000:01:00.0:   CP_CPF_STALLED_STAT1 = 0x00000000
[  262.613235] radeon 0000:01:00.0:   CP_CPF_STATUS = 0x80008003
[  262.613236] radeon 0000:01:00.0:   CP_CPC_BUSY_STAT = 0x00000408
[  262.613238] radeon 0000:01:00.0:   CP_CPC_STALLED_STAT1 = 0x00000000
[  262.613239] radeon 0000:01:00.0:   CP_CPC_STATUS = 0xa0000041
[  262.613246] radeon 0000:01:00.0: GPU reset succeeded, trying to resume


_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] drm/radeon: signal all fences after lockup to avoid endless waiting in GEM_WAIT
  2013-10-13 20:16               ` Marek Olšák
@ 2013-10-14  9:19                 ` Christian König
  0 siblings, 0 replies; 13+ messages in thread
From: Christian König @ 2013-10-14  9:19 UTC (permalink / raw)
  To: Marek Olšák; +Cc: dri-devel


Ok, that one was easy to fix. Please apply the attached patch as well.

Going to send out both for inclusion in 3.12 in a minute.

Christian.

Am 13.10.2013 22:16, schrieb Marek Olšák:
> This seems to be better. It can do about 3-5 resets correctly, then
> the GPU resume fails:
>
> [  246.882780] [drm:cik_resume] *ERROR* cik startup failed on resume
>
> and then the GPU is reset again and again, endlessly and without success.
>
> The dmesg of the endless resets is attached.
>
> Marek
>
> On Sun, Oct 13, 2013 at 2:47 PM, Christian König
> <deathsimple@vodafone.de> wrote:
>> I've figured out what was wrong with the patch. We need to reset the
>> "needs_reset" flag earlier, otherwise the IB test might think we are in a
>> lockup and abort the reset after waiting for the minimum timeout period.
>>
>> Please try the attached patch instead.
>>
>> Thanks,
>> Christian.
>>
>> Am 09.10.2013 14:04, schrieb Marek Olšák:
>>
>>> The ring test of the first compute ring always fails and it shouldn't
>>> affect the GPU reset in any way.
>>>
>>> I can't tell if the deadlock issue is fixed, because the GPU reset
>>> usually fails with your patch. It always succeeded without your patch.
>>>
>>> Marek
>>>
>>> On Wed, Oct 9, 2013 at 1:09 PM, Christian König <deathsimple@vodafone.de>
>>> wrote:
>>>> Mhm, that doesn't look like anything related, but more like the reset
>>>> of the compute ring didn't work.
>>>>
>>>> How often does that happen? And do you still get the problem where X
>>>> waits
>>>> for a fence that never comes back?
>>>>
>>>> Christian.
>>>>
>>>> Am 09.10.2013 12:36, schrieb Marek Olšák:
>>>>
>>>>> I'm afraid your patch sometimes causes the GPU reset to fail, which
>>>>> had never happened before IIRC.
>>>>>
>>>>> The dmesg log from the failure is attached.
>>>>>
>>>>> Marek
>>>>>
>>>>> On Tue, Oct 8, 2013 at 6:21 PM, Christian König
>>>>> <deathsimple@vodafone.de>
>>>>> wrote:
>>>>>> Hi Marek,
>>>>>>
>>>>>> please try the attached patch as a replacement for your signaling all
>>>>>> fences
>>>>>> patch. I'm not 100% sure if it fixes all issues, but it's at least a
>>>>>> start.
>>>>>>
>>>>>> Thanks,
>>>>>> Christian.
>>>>>>
>>>>>> Am 07.10.2013 13:08, schrieb Christian König:
>>>>>>
>>>>>>>> First of all, I can't complain about the reliability of the hardware
>>>>>>>> GPU reset. It's mostly the kernel driver that happens to run into a
>>>>>>>> deadlock at the same time.
>>>>>>>
>>>>>>> Alex and I spent quite some time on making this reliable again after
>>>>>>> activating more rings and adding VM support. The main problem is that
>>>>>>> I couldn't figure out where the CPU deadlock comes from, because I
>>>>>>> couldn't reliably reproduce the issue.
>>>>>>>
>>>>>>> What is the content of /proc/<pid of X server>/task/*/stack and
>>>>>>> /sys/kernel/debug/dri/0/radeon_fence_info when the X server is stuck
>>>>>>> in the deadlock situation?
>>>>>>>
>>>>>>> I'm pretty sure that we nearly always have a problem when two threads
>>>>>>> are waiting for fences and one of them detects that we have a lockup
>>>>>>> while the other one keeps holding the exclusive lock. Signaling all
>>>>>>> fences might work around that problem, but it probably would be
>>>>>>> better to fix the underlying issue.
>>>>>>>
>>>>>>> Going to take a deeper look into it.
>>>>>>>
>>>>>>> Christian.
>>>>>>>
>>>>>>> Am 03.10.2013 02:45, schrieb Marek Olšák:
>>>>>>>> First of all, I can't complain about the reliability of the hardware
>>>>>>>> GPU reset. It's mostly the kernel driver that happens to run into a
>>>>>>>> deadlock at the same time.
>>>>>>>>
>>>>>>>> Regarding the issue with fences, the problem is that the GPU reset
>>>>>>>> completes successfully according to dmesg, but X doesn't respond. I
>>>>>>>> can move the cursor on the screen, but I can't do anything else and
>>>>>>>> the UI is frozen. gdb says that X is stuck in GEM_WAIT_IDLE. I can
>>>>>>>> easily reproduce this, because it's the most common reason why a GPU
>>>>>>>> lockup leads to frozen X. The GPU actually recovers, but X is hung. I
>>>>>>>> can't tell whether the fences are just not signalled or whether there
>>>>>>>> is actually a real CPU deadlock I can't see.
>>>>>>>>
>>>>>>>> This patch makes the problem go away and GPU resets are successful
>>>>>>>> (except for extreme cases, see below). With a small enough lockup
>>>>>>>> timeout, the lockups are just a minor annoyance and I thought I could
>>>>>>>> get through a piglit run just with a few tens or hundreds of GPU
>>>>>>>> resets...
>>>>>>>>
>>>>>>>> A different type of deadlock showed up, though it needs a lot of
>>>>>>>> concurrently-running apps like piglit. What happened is that the
>>>>>>>> kernel driver was stuck/deadlocked in radeon_cs_ioctl presumably due
>>>>>>>> to a GPU hang while holding onto the exclusive lock, and another
>>>>>>>> thread wanting to do the GPU reset was unable to acquire the lock.
>>>>>>>>
>>>>>>>> That said, I will use the patch locally, because it helps a lot. I
>>>>>>>> got
>>>>>>>> a few lockups while writing this email and I'm glad I didn't have to
>>>>>>>> reboot.
>>>>>>>>
>>>>>>>> Marek
>>>>>>>>
>>>>>>>> On Wed, Oct 2, 2013 at 4:50 PM, Christian König
>>>>>>>> <deathsimple@vodafone.de>
>>>>>>>> wrote:
>>>>>>>>> Possible, but I would rather guess that this doesn't work because
>>>>>>>>> the
>>>>>>>>> IB
>>>>>>>>> test runs into a deadlock situation and so the GPU reset never fully
>>>>>>>>> completes.
>>>>>>>>>
>>>>>>>>> Can you reproduce the problem?
>>>>>>>>>
>>>>>>>>> If you want to make GPU resets more reliable I would rather suggest
>>>>>>>>> to
>>>>>>>>> remove the ring lock dependency.
>>>>>>>>> Then we should try to give all the fence wait functions a (reliable)
>>>>>>>>> timeout and move reset handling a layer up into the ioctl functions.
>>>>>>>>> But for
>>>>>>>>> this you need to rip out the old PM code first.
>>>>>>>>>
>>>>>>>>> Christian.
>>>>>>>>>
>>>>>>>>> Marek Olšák <maraeo@gmail.com> schrieb:
>>>>>>>>>
>>>>>>>>>> I'm afraid signalling the fences with an IB test is not reliable.
>>>>>>>>>>
>>>>>>>>>> Marek
>>>>>>>>>>
>>>>>>>>>> On Wed, Oct 2, 2013 at 3:52 PM, Christian König
>>>>>>>>>> <deathsimple@vodafone.de> wrote:
>>>>>>>>>>> NAK, after recovering from a lockup the first thing we do is
>>>>>>>>>>> signalling all remaining fences with an IB test.
>>>>>>>>>>>
>>>>>>>>>>> If we don't recover we indeed signal all fences manually.
>>>>>>>>>>>
>>>>>>>>>>> Signalling all fences regardless of the outcome of the reset
>>>>>>>>>>> creates
>>>>>>>>>>> problems with both types of partial resets.
>>>>>>>>>>>
>>>>>>>>>>> Christian.
>>>>>>>>>>>
>>>>>>>>>>> Marek Olšák <maraeo@gmail.com> schrieb:
>>>>>>>>>>>
>>>>>>>>>>>> From: Marek Olšák <marek.olsak@amd.com>
>>>>>>>>>>>>
>>>>>>>>>>>> After a lockup, fences are not signalled sometimes, causing
>>>>>>>>>>>> the GEM_WAIT_IDLE ioctl to never return, which sometimes results
>>>>>>>>>>>> in an X server freeze.
>>>>>>>>>>>>
>>>>>>>>>>>> This fixes only one of many deadlocks which can occur during a
>>>>>>>>>>>> lockup.
>>>>>>>>>>>>
>>>>>>>>>>>> Signed-off-by: Marek Olšák <marek.olsak@amd.com>
>>>>>>>>>>>> ---
>>>>>>>>>>>> drivers/gpu/drm/radeon/radeon_device.c | 5 +++++
>>>>>>>>>>>> 1 file changed, 5 insertions(+)
>>>>>>>>>>>>
>>>>>>>>>>>> diff --git a/drivers/gpu/drm/radeon/radeon_device.c
>>>>>>>>>>>> b/drivers/gpu/drm/radeon/radeon_device.c
>>>>>>>>>>>> index 841d0e0..7b97baa 100644
>>>>>>>>>>>> --- a/drivers/gpu/drm/radeon/radeon_device.c
>>>>>>>>>>>> +++ b/drivers/gpu/drm/radeon/radeon_device.c
>>>>>>>>>>>> @@ -1552,6 +1552,11 @@ int radeon_gpu_reset(struct radeon_device
>>>>>>>>>>>> *rdev)
>>>>>>>>>>>>           radeon_save_bios_scratch_regs(rdev);
>>>>>>>>>>>>           /* block TTM */
>>>>>>>>>>>>           resched =
>>>>>>>>>>>> ttm_bo_lock_delayed_workqueue(&rdev->mman.bdev);
>>>>>>>>>>>> +
>>>>>>>>>>>> +      mutex_lock(&rdev->ring_lock);
>>>>>>>>>>>> +      radeon_fence_driver_force_completion(rdev);
>>>>>>>>>>>> +      mutex_unlock(&rdev->ring_lock);
>>>>>>>>>>>> +
>>>>>>>>>>>>           radeon_pm_suspend(rdev);
>>>>>>>>>>>>           radeon_suspend(rdev);
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> 1.8.1.2
>>>>>>>>>>>>
>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>> dri-devel mailing list
>>>>>>>>>>>> dri-devel@lists.freedesktop.org
>>>>>>>>>>>> http://lists.freedesktop.org/mailman/listinfo/dri-devel
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> dri-devel mailing list
>>>>>>> dri-devel@lists.freedesktop.org
>>>>>>> http://lists.freedesktop.org/mailman/listinfo/dri-devel
>>>>>>


[-- Attachment #2: 0001-drm-radeon-stop-the-leaks-in-cik_ib_test.patch --]
[-- Type: text/x-diff; name="0001-drm-radeon-stop-the-leaks-in-cik_ib_test.patch", Size: 1411 bytes --]

From 3b690a7b016dc63ef49ae7ac593c8f7f09f80d0d Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Christian=20K=C3=B6nig?= <christian.koenig@amd.com>
Date: Mon, 14 Oct 2013 11:11:28 +0200
Subject: [PATCH 1/2] drm/radeon: stop the leaks in cik_ib_test
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Stop leaking IB memory and scratch register space when the test fails.

Signed-off-by: Christian König <christian.koenig@amd.com>
---
 drivers/gpu/drm/radeon/cik.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/radeon/cik.c b/drivers/gpu/drm/radeon/cik.c
index b874ccd..8f393df 100644
--- a/drivers/gpu/drm/radeon/cik.c
+++ b/drivers/gpu/drm/radeon/cik.c
@@ -3182,6 +3182,7 @@ int cik_ib_test(struct radeon_device *rdev, struct radeon_ring *ring)
 	r = radeon_ib_get(rdev, ring->idx, &ib, NULL, 256);
 	if (r) {
 		DRM_ERROR("radeon: failed to get ib (%d).\n", r);
+		radeon_scratch_free(rdev, scratch);
 		return r;
 	}
 	ib.ptr[0] = PACKET3(PACKET3_SET_UCONFIG_REG, 1);
@@ -3198,6 +3199,8 @@ int cik_ib_test(struct radeon_device *rdev, struct radeon_ring *ring)
 	r = radeon_fence_wait(ib.fence, false);
 	if (r) {
 		DRM_ERROR("radeon: fence wait failed (%d).\n", r);
+		radeon_scratch_free(rdev, scratch);
+		radeon_ib_free(rdev, &ib);
 		return r;
 	}
 	for (i = 0; i < rdev->usec_timeout; i++) {
-- 
1.8.1.2



^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH] drm/radeon: signal all fences after lockup to avoid endless waiting in GEM_WAIT
  2013-10-02 13:52 Christian König
@ 2013-10-02 13:59 ` Marek Olšák
  0 siblings, 0 replies; 13+ messages in thread
From: Marek Olšák @ 2013-10-02 13:59 UTC (permalink / raw)
  To: Christian König; +Cc: dri-devel

I'm afraid signalling the fences with an IB test is not reliable.

Marek

On Wed, Oct 2, 2013 at 3:52 PM, Christian König <deathsimple@vodafone.de> wrote:
> NAK, after recovering from a lockup the first thing we do is signalling all remaining fences with an IB test.
>
> If we don't recover we indeed signal all fences manually.
>
> Signalling all fences regardless of the outcome of the reset creates problems with both types of partial resets.
>
> Christian.
>
> Marek Olšák <maraeo@gmail.com> schrieb:
>
>>From: Marek Olšák <marek.olsak@amd.com>
>>
>>After a lockup, fences are not signalled sometimes, causing
>>the GEM_WAIT_IDLE ioctl to never return, which sometimes results
>>in an X server freeze.
>>
>>This fixes only one of many deadlocks which can occur during a lockup.
>>
>>Signed-off-by: Marek Olšák <marek.olsak@amd.com>
>>---
>> drivers/gpu/drm/radeon/radeon_device.c | 5 +++++
>> 1 file changed, 5 insertions(+)
>>
>>diff --git a/drivers/gpu/drm/radeon/radeon_device.c b/drivers/gpu/drm/radeon/radeon_device.c
>>index 841d0e0..7b97baa 100644
>>--- a/drivers/gpu/drm/radeon/radeon_device.c
>>+++ b/drivers/gpu/drm/radeon/radeon_device.c
>>@@ -1552,6 +1552,11 @@ int radeon_gpu_reset(struct radeon_device *rdev)
>>       radeon_save_bios_scratch_regs(rdev);
>>       /* block TTM */
>>       resched = ttm_bo_lock_delayed_workqueue(&rdev->mman.bdev);
>>+
>>+      mutex_lock(&rdev->ring_lock);
>>+      radeon_fence_driver_force_completion(rdev);
>>+      mutex_unlock(&rdev->ring_lock);
>>+
>>       radeon_pm_suspend(rdev);
>>       radeon_suspend(rdev);
>>
>>--
>>1.8.1.2
>>
>>_______________________________________________
>>dri-devel mailing list
>>dri-devel@lists.freedesktop.org
>>http://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] drm/radeon: signal all fences after lockup to avoid endless waiting in GEM_WAIT
@ 2013-10-02 13:52 Christian König
  2013-10-02 13:59 ` Marek Olšák
  0 siblings, 1 reply; 13+ messages in thread
From: Christian König @ 2013-10-02 13:52 UTC (permalink / raw)
  To: Marek Olšák; +Cc: dri-devel

NAK, after recovering from a lockup the first thing we do is signalling all remaining fences with an IB test.

If we don't recover we indeed signal all fences manually.

Signalling all fences regardless of the outcome of the reset creates problems with both types of partial resets.

Christian.

Marek Olšák <maraeo@gmail.com> schrieb:

>From: Marek Olšák <marek.olsak@amd.com>
>
>After a lockup, fences are not signalled sometimes, causing
>the GEM_WAIT_IDLE ioctl to never return, which sometimes results
>in an X server freeze.
>
>This fixes only one of many deadlocks which can occur during a lockup.
>
>Signed-off-by: Marek Olšák <marek.olsak@amd.com>
>---
> drivers/gpu/drm/radeon/radeon_device.c | 5 +++++
> 1 file changed, 5 insertions(+)
>
>diff --git a/drivers/gpu/drm/radeon/radeon_device.c b/drivers/gpu/drm/radeon/radeon_device.c
>index 841d0e0..7b97baa 100644
>--- a/drivers/gpu/drm/radeon/radeon_device.c
>+++ b/drivers/gpu/drm/radeon/radeon_device.c
>@@ -1552,6 +1552,11 @@ int radeon_gpu_reset(struct radeon_device *rdev)
> 	radeon_save_bios_scratch_regs(rdev);
> 	/* block TTM */
> 	resched = ttm_bo_lock_delayed_workqueue(&rdev->mman.bdev);
>+
>+	mutex_lock(&rdev->ring_lock);
>+	radeon_fence_driver_force_completion(rdev);
>+	mutex_unlock(&rdev->ring_lock);
>+
> 	radeon_pm_suspend(rdev);
> 	radeon_suspend(rdev);
> 
>-- 
>1.8.1.2
>
>_______________________________________________
>dri-devel mailing list
>dri-devel@lists.freedesktop.org
>http://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH] drm/radeon: signal all fences after lockup to avoid endless waiting in GEM_WAIT
@ 2013-10-02 13:35 Marek Olšák
  0 siblings, 0 replies; 13+ messages in thread
From: Marek Olšák @ 2013-10-02 13:35 UTC (permalink / raw)
  To: dri-devel

From: Marek Olšák <marek.olsak@amd.com>

After a lockup, fences are not signalled sometimes, causing
the GEM_WAIT_IDLE ioctl to never return, which sometimes results
in an X server freeze.

This fixes only one of many deadlocks which can occur during a lockup.

Signed-off-by: Marek Olšák <marek.olsak@amd.com>
---
 drivers/gpu/drm/radeon/radeon_device.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/gpu/drm/radeon/radeon_device.c b/drivers/gpu/drm/radeon/radeon_device.c
index 841d0e0..7b97baa 100644
--- a/drivers/gpu/drm/radeon/radeon_device.c
+++ b/drivers/gpu/drm/radeon/radeon_device.c
@@ -1552,6 +1552,11 @@ int radeon_gpu_reset(struct radeon_device *rdev)
 	radeon_save_bios_scratch_regs(rdev);
 	/* block TTM */
 	resched = ttm_bo_lock_delayed_workqueue(&rdev->mman.bdev);
+
+	mutex_lock(&rdev->ring_lock);
+	radeon_fence_driver_force_completion(rdev);
+	mutex_unlock(&rdev->ring_lock);
+
 	radeon_pm_suspend(rdev);
 	radeon_suspend(rdev);
 
-- 
1.8.1.2


^ permalink raw reply related	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2013-10-14  9:19 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-10-02 14:50 [PATCH] drm/radeon: signal all fences after lockup to avoid endless waiting in GEM_WAIT Christian König
2013-10-03  0:45 ` Marek Olšák
2013-10-07 11:08   ` Christian König
2013-10-08 16:21     ` Christian König
2013-10-09 10:36       ` Marek Olšák
2013-10-09 11:09         ` Christian König
2013-10-09 12:04           ` Marek Olšák
2013-10-13 12:47             ` Christian König
2013-10-13 20:16               ` Marek Olšák
2013-10-14  9:19                 ` Christian König
  -- strict thread matches above, loose matches on Subject: below --
2013-10-02 13:52 Christian König
2013-10-02 13:59 ` Marek Olšák
2013-10-02 13:35 Marek Olšák
