* [PATCH 1/2] drm/amdgpu: Fix MMIO access page fault
@ 2021-09-17 11:30 Andrey Grodzovsky
  2021-09-17 11:30 ` [PATCH 2/2] drm/amdgpu: Fix resume failures when device is gone Andrey Grodzovsky
  2021-09-17 12:00 ` [PATCH 1/2] drm/amdgpu: Fix MMIO access page fault James Zhu
  0 siblings, 2 replies; 7+ messages in thread
From: Andrey Grodzovsky @ 2021-09-17 11:30 UTC (permalink / raw)
  To: amd-gfx; +Cc: alexander.deucher, Andrey Grodzovsky

Add more guards around MMIO access after device
unbind/unplug.

Bug:https://bugs.archlinux.org/task/72092?project=1&order=dateopened&sort=desc&pagenum=1
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/vcn_v2_0.c |  8 ++++++--
 drivers/gpu/drm/amd/amdgpu/vcn_v2_5.c | 17 +++++++++++------
 2 files changed, 17 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/vcn_v2_0.c b/drivers/gpu/drm/amd/amdgpu/vcn_v2_0.c
index e6e9ef50719e..a03c0fc8338f 100644
--- a/drivers/gpu/drm/amd/amdgpu/vcn_v2_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/vcn_v2_0.c
@@ -22,6 +22,7 @@
  */
 
 #include <linux/firmware.h>
+#include <drm/drm_drv.h>
 
 #include "amdgpu.h"
 #include "amdgpu_vcn.h"
@@ -194,11 +195,14 @@ static int vcn_v2_0_sw_init(void *handle)
  */
 static int vcn_v2_0_sw_fini(void *handle)
 {
-	int r;
+	int r, idx;
 	struct amdgpu_device *adev = (struct amdgpu_device *)handle;
 	volatile struct amdgpu_fw_shared *fw_shared = adev->vcn.inst->fw_shared_cpu_addr;
 
-	fw_shared->present_flag_0 = 0;
+	if (drm_dev_enter(&adev->ddev, &idx)) {
+		fw_shared->present_flag_0 = 0;
+		drm_dev_exit(idx);
+	}
 
 	amdgpu_virt_free_mm_table(adev);
 
diff --git a/drivers/gpu/drm/amd/amdgpu/vcn_v2_5.c b/drivers/gpu/drm/amd/amdgpu/vcn_v2_5.c
index 2e6b7913bf6c..1780ad1eacd6 100644
--- a/drivers/gpu/drm/amd/amdgpu/vcn_v2_5.c
+++ b/drivers/gpu/drm/amd/amdgpu/vcn_v2_5.c
@@ -22,6 +22,7 @@
  */
 
 #include <linux/firmware.h>
+#include <drm/drm_drv.h>
 
 #include "amdgpu.h"
 #include "amdgpu_vcn.h"
@@ -235,17 +236,21 @@ static int vcn_v2_5_sw_init(void *handle)
  */
 static int vcn_v2_5_sw_fini(void *handle)
 {
-	int i, r;
+	int i, r, idx;
 	struct amdgpu_device *adev = (struct amdgpu_device *)handle;
 	volatile struct amdgpu_fw_shared *fw_shared;
 
-	for (i = 0; i < adev->vcn.num_vcn_inst; i++) {
-		if (adev->vcn.harvest_config & (1 << i))
-			continue;
-		fw_shared = adev->vcn.inst[i].fw_shared_cpu_addr;
-		fw_shared->present_flag_0 = 0;
+	if (drm_dev_enter(&adev->ddev, &idx)) {
+		for (i = 0; i < adev->vcn.num_vcn_inst; i++) {
+			if (adev->vcn.harvest_config & (1 << i))
+				continue;
+			fw_shared = adev->vcn.inst[i].fw_shared_cpu_addr;
+			fw_shared->present_flag_0 = 0;
+		}
+		drm_dev_exit(idx);
 	}
 
+
 	if (amdgpu_sriov_vf(adev))
 		amdgpu_virt_free_mm_table(adev);
 
-- 
2.25.1
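
The guard pattern used in this patch can be modeled outside the kernel. The following is only a minimal userspace sketch, not the driver code: `mock_drm_device`, `mock_dev_enter()`, `mock_dev_exit()`, `mock_fw_shared` and `mock_sw_fini()` are hypothetical stand-ins introduced for illustration. The real `drm_dev_enter()`/`drm_dev_exit()` pair also takes and releases an SRCU read lock, which this sketch leaves out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct drm_device: models only the unplugged state. */
struct mock_drm_device {
	bool unplugged;
};

/* Hypothetical stand-in for the firmware-shared memory touched in sw_fini. */
struct mock_fw_shared {
	unsigned int present_flag_0;
};

/*
 * Models drm_dev_enter(): returns false once the device is marked unplugged,
 * so the caller must skip the guarded section. The real helper also enters
 * an SRCU read-side critical section; that part is omitted here.
 */
static bool mock_dev_enter(struct mock_drm_device *dev, int *idx)
{
	assert(dev != NULL && idx != NULL);
	if (dev->unplugged)
		return false;
	*idx = 0;
	return true;
}

/* Models drm_dev_exit(): closes the section opened by mock_dev_enter(). */
static void mock_dev_exit(int idx)
{
	(void)idx;
}

/*
 * Same shape as the patched sw_fini: the write to device-backed memory
 * happens only inside the enter/exit section. Returns true if the write
 * was performed, false if it was skipped because the device is gone.
 */
static bool mock_sw_fini(struct mock_drm_device *dev,
			 volatile struct mock_fw_shared *fw_shared)
{
	int idx;

	if (!mock_dev_enter(dev, &idx))
		return false;

	fw_shared->present_flag_0 = 0;
	mock_dev_exit(idx);
	return true;
}
```

Calling `mock_sw_fini()` on a device with `unplugged` set leaves `present_flag_0` untouched, which is the behavior the patch relies on to avoid the page fault; the rest of the teardown (freeing software state) is unaffected by the guard.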



* [PATCH 2/2] drm/amdgpu: Fix resume failures when device is gone
  2021-09-17 11:30 [PATCH 1/2] drm/amdgpu: Fix MMIO access page fault Andrey Grodzovsky
@ 2021-09-17 11:30 ` Andrey Grodzovsky
  2021-09-17 20:50   ` Andrey Grodzovsky
  2021-09-20 16:12   ` Alex Deucher
  2021-09-17 12:00 ` [PATCH 1/2] drm/amdgpu: Fix MMIO access page fault James Zhu
  1 sibling, 2 replies; 7+ messages in thread
From: Andrey Grodzovsky @ 2021-09-17 11:30 UTC (permalink / raw)
  To: amd-gfx; +Cc: alexander.deucher, Andrey Grodzovsky

Problem:
When the device is unplugged while suspended, all HW
programming during resume fails, leaving SW in a bad
state for the PCI remove handling that follows.
Because the device is first resumed and only later removed,
we cannot rely on drm_dev_enter/exit here.

Fix:
Reuse the flag we use for PCIe error recovery to avoid
accessing registers. This allows the PM resume sequence
to complete successfully and PCI remove to finish.

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index db21af5e84ed..04fb4e74fb20 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -1522,6 +1522,10 @@ static int amdgpu_pmops_resume(struct device *dev)
 	struct amdgpu_device *adev = drm_to_adev(drm_dev);
 	int r;
 
+	/* Avoid register access if the device is physically gone */
+	if (!pci_device_is_present(adev->pdev))
+		adev->no_hw_access = true;
+
 	r = amdgpu_device_resume(drm_dev, true);
 	if (amdgpu_acpi_is_s0ix_active(adev))
 		adev->in_s0ix = false;
-- 
2.25.1
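
The idea behind this fix can be sketched in userspace as well. This is only an illustrative model under stated assumptions, not the amdgpu implementation: `mock_adev`, `mock_wreg()`, `mock_device_resume()` and `mock_pmops_resume()` are hypothetical names, the `present` field stands in for the result of `pci_device_is_present()`, and the sketch assumes the register helpers consult the `no_hw_access` flag before touching hardware.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct amdgpu_device. */
struct mock_adev {
	bool present;           /* models what pci_device_is_present() would report */
	bool no_hw_access;      /* when set, register helpers skip the hardware */
	uint32_t reg;           /* a single fake register */
	unsigned int hw_writes; /* counts writes that actually reached "hardware" */
};

/* Register write helper: becomes a no-op once no_hw_access is set. */
static void mock_wreg(struct mock_adev *adev, uint32_t value)
{
	assert(adev != NULL);
	if (adev->no_hw_access)
		return;
	adev->reg = value;
	adev->hw_writes++;
}

/* Models the resume path: it programs hardware only through the helper. */
static int mock_device_resume(struct mock_adev *adev)
{
	mock_wreg(adev, 0x1);
	mock_wreg(adev, 0x2);
	return 0;
}

/*
 * Same shape as the patched amdgpu_pmops_resume(): check for device presence
 * first, set the flag if the device is gone, then run the normal resume
 * sequence so that software state still completes.
 */
static int mock_pmops_resume(struct mock_adev *adev)
{
	if (!adev->present)
		adev->no_hw_access = true;

	return mock_device_resume(adev);
}
```

With `present` false, the resume sequence still returns success but performs no register writes, so the PCI remove handling that follows finds consistent software state.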



* Re: [PATCH 1/2] drm/amdgpu: Fix MMIO access page fault
  2021-09-17 11:30 [PATCH 1/2] drm/amdgpu: Fix MMIO access page fault Andrey Grodzovsky
  2021-09-17 11:30 ` [PATCH 2/2] drm/amdgpu: Fix resume failures when device is gone Andrey Grodzovsky
@ 2021-09-17 12:00 ` James Zhu
  2021-09-17 12:04   ` James Zhu
  1 sibling, 1 reply; 7+ messages in thread
From: James Zhu @ 2021-09-17 12:00 UTC (permalink / raw)
  To: amd-gfx

Hi Andrey

Can you apply this improvement to vcn_v3_0_sw_init also?

With that added, this patch is Reviewed-by: James Zhu <James.Zhu@amd.com>

Thanks & Best Regards!

James

On 2021-09-17 7:30 a.m., Andrey Grodzovsky wrote:
> Add more guards to MMIO access post device
> unbind/unplug
>
> Bug:https://bugs.archlinux.org/task/72092?project=1&order=dateopened&sort=desc&pagenum=1
> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
> ---
>   drivers/gpu/drm/amd/amdgpu/vcn_v2_0.c |  8 ++++++--
>   drivers/gpu/drm/amd/amdgpu/vcn_v2_5.c | 17 +++++++++++------
>   2 files changed, 17 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/vcn_v2_0.c b/drivers/gpu/drm/amd/amdgpu/vcn_v2_0.c
> index e6e9ef50719e..a03c0fc8338f 100644
> --- a/drivers/gpu/drm/amd/amdgpu/vcn_v2_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/vcn_v2_0.c
> @@ -22,6 +22,7 @@
>    */
>   
>   #include <linux/firmware.h>
> +#include <drm/drm_drv.h>
>   
>   #include "amdgpu.h"
>   #include "amdgpu_vcn.h"
> @@ -194,11 +195,14 @@ static int vcn_v2_0_sw_init(void *handle)
>    */
>   static int vcn_v2_0_sw_fini(void *handle)
>   {
> -	int r;
> +	int r, idx;
>   	struct amdgpu_device *adev = (struct amdgpu_device *)handle;
>   	volatile struct amdgpu_fw_shared *fw_shared = adev->vcn.inst->fw_shared_cpu_addr;
>   
> -	fw_shared->present_flag_0 = 0;
> +	if (drm_dev_enter(&adev->ddev, &idx)) {
> +		fw_shared->present_flag_0 = 0;
> +		drm_dev_exit(idx);
> +	}
>   
>   	amdgpu_virt_free_mm_table(adev);
>   
> diff --git a/drivers/gpu/drm/amd/amdgpu/vcn_v2_5.c b/drivers/gpu/drm/amd/amdgpu/vcn_v2_5.c
> index 2e6b7913bf6c..1780ad1eacd6 100644
> --- a/drivers/gpu/drm/amd/amdgpu/vcn_v2_5.c
> +++ b/drivers/gpu/drm/amd/amdgpu/vcn_v2_5.c
> @@ -22,6 +22,7 @@
>    */
>   
>   #include <linux/firmware.h>
> +#include <drm/drm_drv.h>
>   
>   #include "amdgpu.h"
>   #include "amdgpu_vcn.h"
> @@ -235,17 +236,21 @@ static int vcn_v2_5_sw_init(void *handle)
>    */
>   static int vcn_v2_5_sw_fini(void *handle)
>   {
> -	int i, r;
> +	int i, r, idx;
>   	struct amdgpu_device *adev = (struct amdgpu_device *)handle;
>   	volatile struct amdgpu_fw_shared *fw_shared;
>   
> -	for (i = 0; i < adev->vcn.num_vcn_inst; i++) {
> -		if (adev->vcn.harvest_config & (1 << i))
> -			continue;
> -		fw_shared = adev->vcn.inst[i].fw_shared_cpu_addr;
> -		fw_shared->present_flag_0 = 0;
> +	if (drm_dev_enter(&adev->ddev, &idx)) {
> +		for (i = 0; i < adev->vcn.num_vcn_inst; i++) {
> +			if (adev->vcn.harvest_config & (1 << i))
> +				continue;
> +			fw_shared = adev->vcn.inst[i].fw_shared_cpu_addr;
> +			fw_shared->present_flag_0 = 0;
> +		}
> +		drm_dev_exit(idx);
>   	}
>   
> +
>   	if (amdgpu_sriov_vf(adev))
>   		amdgpu_virt_free_mm_table(adev);
>   


* Re: [PATCH 1/2] drm/amdgpu: Fix MMIO access page fault
  2021-09-17 12:00 ` [PATCH 1/2] drm/amdgpu: Fix MMIO access page fault James Zhu
@ 2021-09-17 12:04   ` James Zhu
  2021-09-17 14:06     ` Andrey Grodzovsky
  0 siblings, 1 reply; 7+ messages in thread
From: James Zhu @ 2021-09-17 12:04 UTC (permalink / raw)
  To: amd-gfx

typo. vcn_v3_0_sw_init   -->  vcn_v3_0_sw_fini

On 2021-09-17 8:00 a.m., James Zhu wrote:
> Hi Andrey
>
> Can you apply this improvement  on vcn_v3_0_sw_init also?
>
> With this adding, This patch is Reviewed-by: James Zhu 
> <James.Zhu@amd.com>
>
> Thanks & Best Regards!
>
> James
>
> On 2021-09-17 7:30 a.m., Andrey Grodzovsky wrote:
>> Add more guards to MMIO access post device
>> unbind/unplug
>>
>> Bug:https://bugs.archlinux.org/task/72092?project=1&order=dateopened&sort=desc&pagenum=1
>>
>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
>> ---
>>   drivers/gpu/drm/amd/amdgpu/vcn_v2_0.c |  8 ++++++--
>>   drivers/gpu/drm/amd/amdgpu/vcn_v2_5.c | 17 +++++++++++------
>>   2 files changed, 17 insertions(+), 8 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/vcn_v2_0.c 
>> b/drivers/gpu/drm/amd/amdgpu/vcn_v2_0.c
>> index e6e9ef50719e..a03c0fc8338f 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/vcn_v2_0.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/vcn_v2_0.c
>> @@ -22,6 +22,7 @@
>>    */
>>     #include <linux/firmware.h>
>> +#include <drm/drm_drv.h>
>>     #include "amdgpu.h"
>>   #include "amdgpu_vcn.h"
>> @@ -194,11 +195,14 @@ static int vcn_v2_0_sw_init(void *handle)
>>    */
>>   static int vcn_v2_0_sw_fini(void *handle)
>>   {
>> -    int r;
>> +    int r, idx;
>>       struct amdgpu_device *adev = (struct amdgpu_device *)handle;
>>       volatile struct amdgpu_fw_shared *fw_shared = 
>> adev->vcn.inst->fw_shared_cpu_addr;
>>   -    fw_shared->present_flag_0 = 0;
>> +    if (drm_dev_enter(&adev->ddev, &idx)) {
>> +        fw_shared->present_flag_0 = 0;
>> +        drm_dev_exit(idx);
>> +    }
>>         amdgpu_virt_free_mm_table(adev);
>>   diff --git a/drivers/gpu/drm/amd/amdgpu/vcn_v2_5.c 
>> b/drivers/gpu/drm/amd/amdgpu/vcn_v2_5.c
>> index 2e6b7913bf6c..1780ad1eacd6 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/vcn_v2_5.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/vcn_v2_5.c
>> @@ -22,6 +22,7 @@
>>    */
>>     #include <linux/firmware.h>
>> +#include <drm/drm_drv.h>
>>     #include "amdgpu.h"
>>   #include "amdgpu_vcn.h"
>> @@ -235,17 +236,21 @@ static int vcn_v2_5_sw_init(void *handle)
>>    */
>>   static int vcn_v2_5_sw_fini(void *handle)
>>   {
>> -    int i, r;
>> +    int i, r, idx;
>>       struct amdgpu_device *adev = (struct amdgpu_device *)handle;
>>       volatile struct amdgpu_fw_shared *fw_shared;
>>   -    for (i = 0; i < adev->vcn.num_vcn_inst; i++) {
>> -        if (adev->vcn.harvest_config & (1 << i))
>> -            continue;
>> -        fw_shared = adev->vcn.inst[i].fw_shared_cpu_addr;
>> -        fw_shared->present_flag_0 = 0;
>> +    if (drm_dev_enter(&adev->ddev, &idx)) {
>> +        for (i = 0; i < adev->vcn.num_vcn_inst; i++) {
>> +            if (adev->vcn.harvest_config & (1 << i))
>> +                continue;
>> +            fw_shared = adev->vcn.inst[i].fw_shared_cpu_addr;
>> +            fw_shared->present_flag_0 = 0;
>> +        }
>> +        drm_dev_exit(idx);
>>       }
>>   +
>>       if (amdgpu_sriov_vf(adev))
>>           amdgpu_virt_free_mm_table(adev);


* Re: [PATCH 1/2] drm/amdgpu: Fix MMIO access page fault
  2021-09-17 12:04   ` James Zhu
@ 2021-09-17 14:06     ` Andrey Grodzovsky
  0 siblings, 0 replies; 7+ messages in thread
From: Andrey Grodzovsky @ 2021-09-17 14:06 UTC (permalink / raw)
  To: James Zhu, amd-gfx

Note that it already has this protection.

Andrey

On 2021-09-17 8:04 a.m., James Zhu wrote:
> typo. vcn_v3_0_sw_init   -->  vcn_v3_0_sw_fini
>
> On 2021-09-17 8:00 a.m., James Zhu wrote:
>> Hi Andrey
>>
>> Can you apply this improvement  on vcn_v3_0_sw_init also?
>>
>> With this adding, This patch is Reviewed-by: James Zhu 
>> <James.Zhu@amd.com>
>>
>> Thanks & Best Regards!
>>
>> James
>>
>> On 2021-09-17 7:30 a.m., Andrey Grodzovsky wrote:
>>> Add more guards to MMIO access post device
>>> unbind/unplug
>>>
>>> Bug:https://bugs.archlinux.org/task/72092?project=1&order=dateopened&sort=desc&pagenum=1
>>>
>>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
>>> ---
>>>   drivers/gpu/drm/amd/amdgpu/vcn_v2_0.c |  8 ++++++--
>>>   drivers/gpu/drm/amd/amdgpu/vcn_v2_5.c | 17 +++++++++++------
>>>   2 files changed, 17 insertions(+), 8 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/vcn_v2_0.c 
>>> b/drivers/gpu/drm/amd/amdgpu/vcn_v2_0.c
>>> index e6e9ef50719e..a03c0fc8338f 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/vcn_v2_0.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/vcn_v2_0.c
>>> @@ -22,6 +22,7 @@
>>>    */
>>>     #include <linux/firmware.h>
>>> +#include <drm/drm_drv.h>
>>>     #include "amdgpu.h"
>>>   #include "amdgpu_vcn.h"
>>> @@ -194,11 +195,14 @@ static int vcn_v2_0_sw_init(void *handle)
>>>    */
>>>   static int vcn_v2_0_sw_fini(void *handle)
>>>   {
>>> -    int r;
>>> +    int r, idx;
>>>       struct amdgpu_device *adev = (struct amdgpu_device *)handle;
>>>       volatile struct amdgpu_fw_shared *fw_shared = 
>>> adev->vcn.inst->fw_shared_cpu_addr;
>>>   -    fw_shared->present_flag_0 = 0;
>>> +    if (drm_dev_enter(&adev->ddev, &idx)) {
>>> +        fw_shared->present_flag_0 = 0;
>>> +        drm_dev_exit(idx);
>>> +    }
>>>         amdgpu_virt_free_mm_table(adev);
>>>   diff --git a/drivers/gpu/drm/amd/amdgpu/vcn_v2_5.c 
>>> b/drivers/gpu/drm/amd/amdgpu/vcn_v2_5.c
>>> index 2e6b7913bf6c..1780ad1eacd6 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/vcn_v2_5.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/vcn_v2_5.c
>>> @@ -22,6 +22,7 @@
>>>    */
>>>     #include <linux/firmware.h>
>>> +#include <drm/drm_drv.h>
>>>     #include "amdgpu.h"
>>>   #include "amdgpu_vcn.h"
>>> @@ -235,17 +236,21 @@ static int vcn_v2_5_sw_init(void *handle)
>>>    */
>>>   static int vcn_v2_5_sw_fini(void *handle)
>>>   {
>>> -    int i, r;
>>> +    int i, r, idx;
>>>       struct amdgpu_device *adev = (struct amdgpu_device *)handle;
>>>       volatile struct amdgpu_fw_shared *fw_shared;
>>>   -    for (i = 0; i < adev->vcn.num_vcn_inst; i++) {
>>> -        if (adev->vcn.harvest_config & (1 << i))
>>> -            continue;
>>> -        fw_shared = adev->vcn.inst[i].fw_shared_cpu_addr;
>>> -        fw_shared->present_flag_0 = 0;
>>> +    if (drm_dev_enter(&adev->ddev, &idx)) {
>>> +        for (i = 0; i < adev->vcn.num_vcn_inst; i++) {
>>> +            if (adev->vcn.harvest_config & (1 << i))
>>> +                continue;
>>> +            fw_shared = adev->vcn.inst[i].fw_shared_cpu_addr;
>>> +            fw_shared->present_flag_0 = 0;
>>> +        }
>>> +        drm_dev_exit(idx);
>>>       }
>>>   +
>>>       if (amdgpu_sriov_vf(adev))
>>>           amdgpu_virt_free_mm_table(adev);


* Re: [PATCH 2/2] drm/amdgpu: Fix resume failures when device is gone
  2021-09-17 11:30 ` [PATCH 2/2] drm/amdgpu: Fix resume failures when device is gone Andrey Grodzovsky
@ 2021-09-17 20:50   ` Andrey Grodzovsky
  2021-09-20 16:12   ` Alex Deucher
  1 sibling, 0 replies; 7+ messages in thread
From: Andrey Grodzovsky @ 2021-09-17 20:50 UTC (permalink / raw)
  To: amd-gfx; +Cc: alexander.deucher

Ping

Andrey

On 2021-09-17 7:30 a.m., Andrey Grodzovsky wrote:
> Problem:
> When device goes into suspend and unplugged during it
> then all HW programming during resume fails leading
> to a bad SW during pci remove handling which follows.
> Because device is first resumed and only later removed
> we cannot rely on drm_dev_enter/exit here.
>
> Fix:
> Use a flag we use for PCIe error recovery to avoid
> accessing registres. This allows to successfully complete
> pm resume sequence and finish pci remove.
>
> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 4 ++++
>   1 file changed, 4 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> index db21af5e84ed..04fb4e74fb20 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> @@ -1522,6 +1522,10 @@ static int amdgpu_pmops_resume(struct device *dev)
>   	struct amdgpu_device *adev = drm_to_adev(drm_dev);
>   	int r;
>   
> +	/* Avoids registers access if device is physically gone */
> +	if (!pci_device_is_present(adev->pdev))
> +		adev->no_hw_access = true;
> +
>   	r = amdgpu_device_resume(drm_dev, true);
>   	if (amdgpu_acpi_is_s0ix_active(adev))
>   		adev->in_s0ix = false;


* Re: [PATCH 2/2] drm/amdgpu: Fix resume failures when device is gone
  2021-09-17 11:30 ` [PATCH 2/2] drm/amdgpu: Fix resume failures when device is gone Andrey Grodzovsky
  2021-09-17 20:50   ` Andrey Grodzovsky
@ 2021-09-20 16:12   ` Alex Deucher
  1 sibling, 0 replies; 7+ messages in thread
From: Alex Deucher @ 2021-09-20 16:12 UTC (permalink / raw)
  To: Andrey Grodzovsky; +Cc: amd-gfx list, Deucher, Alexander

Series is:
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>

On Fri, Sep 17, 2021 at 7:31 AM Andrey Grodzovsky
<andrey.grodzovsky@amd.com> wrote:
>
> Problem:
> When device goes into suspend and unplugged during it
> then all HW programming during resume fails leading
> to a bad SW during pci remove handling which follows.
> Because device is first resumed and only later removed
> we cannot rely on drm_dev_enter/exit here.
>
> Fix:
> Use a flag we use for PCIe error recovery to avoid
> accessing registres. This allows to successfully complete
> pm resume sequence and finish pci remove.
>
> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 4 ++++
>  1 file changed, 4 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> index db21af5e84ed..04fb4e74fb20 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> @@ -1522,6 +1522,10 @@ static int amdgpu_pmops_resume(struct device *dev)
>         struct amdgpu_device *adev = drm_to_adev(drm_dev);
>         int r;
>
> +       /* Avoids registers access if device is physically gone */
> +       if (!pci_device_is_present(adev->pdev))
> +               adev->no_hw_access = true;
> +
>         r = amdgpu_device_resume(drm_dev, true);
>         if (amdgpu_acpi_is_s0ix_active(adev))
>                 adev->in_s0ix = false;
> --
> 2.25.1
>

