amd-gfx.lists.freedesktop.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 1/2] drm/amdgpu: rework runtime pm enablement for BACO
@ 2020-06-24 19:05 Alex Deucher
  2020-06-24 19:05 ` [PATCH 2/2] drm/amdgpu: enable runtime pm on vega10 when noretry=0 Alex Deucher
  2020-06-25  0:16 ` [PATCH 1/2] drm/amdgpu: rework runtime pm enablement for BACO Bhardwaj, Rajneesh
  0 siblings, 2 replies; 6+ messages in thread
From: Alex Deucher @ 2020-06-24 19:05 UTC (permalink / raw)
  To: amd-gfx; +Cc: Alex Deucher

Add a switch statement to simplify asic checks.  Note
that BACO is not supported on APUs, so there is no
need to check them.

Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c | 34 ++++++++++++++++---------
 1 file changed, 22 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
index 4ec544783a45..0fec39eed164 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
@@ -167,19 +167,29 @@ int amdgpu_driver_load_kms(struct drm_device *dev, unsigned long flags)
 	}
 
 	if (amdgpu_device_supports_boco(dev) &&
-	    (amdgpu_runtime_pm != 0)) /* enable runpm by default for boco */
-		adev->runpm = true;
-	else if (amdgpu_device_supports_baco(dev) &&
-		 (amdgpu_runtime_pm != 0) &&
-		 (adev->asic_type >= CHIP_TOPAZ) &&
-		 (adev->asic_type != CHIP_VEGA10) &&
-		 (adev->asic_type != CHIP_VEGA20) &&
-		 (adev->asic_type != CHIP_SIENNA_CICHLID) &&
-		 (adev->asic_type != CHIP_ARCTURUS)) /* enable runpm on VI+ */
-		adev->runpm = true;
-	else if (amdgpu_device_supports_baco(dev) &&
-		 (amdgpu_runtime_pm > 0))  /* enable runpm if runpm=1 on CI */
+	    (amdgpu_runtime_pm != 0)) { /* enable runpm by default for boco */
 		adev->runpm = true;
+	} else if (amdgpu_device_supports_baco(dev) &&
+		   (amdgpu_runtime_pm != 0)) {
+		switch (adev->asic_type) {
+#ifdef CONFIG_DRM_AMDGPU_CIK
+		case CHIP_BONAIRE:
+		case CHIP_HAWAII:
+#endif
+		case CHIP_VEGA10:
+		case CHIP_VEGA20:
+		case CHIP_ARCTURUS:
+		case CHIP_SIENNA_CICHLID:
+			/* enable runpm if runpm=1 */
+			if (amdgpu_runtime_pm > 0)
+				adev->runpm = true;
+			break;
+		default:
+			/* enable runpm on VI+ */
+			adev->runpm = true;
+			break;
+		}
+	}
 
 	/* Call ACPI methods: require modeset init
 	 * but failure is not fatal
-- 
2.25.4

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH 2/2] drm/amdgpu: enable runtime pm on vega10 when noretry=0
  2020-06-24 19:05 [PATCH 1/2] drm/amdgpu: rework runtime pm enablement for BACO Alex Deucher
@ 2020-06-24 19:05 ` Alex Deucher
  2020-06-25  0:06   ` Bhardwaj, Rajneesh
  2020-06-25  0:16 ` [PATCH 1/2] drm/amdgpu: rework runtime pm enablement for BACO Bhardwaj, Rajneesh
  1 sibling, 1 reply; 6+ messages in thread
From: Alex Deucher @ 2020-06-24 19:05 UTC (permalink / raw)
  To: amd-gfx; +Cc: Alex Deucher

The failures with ROCm only happen with noretry=1, so
enable runtime pm when noretry=0 (the current default).

Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
index 0fec39eed164..341d072edd95 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
@@ -176,7 +176,6 @@ int amdgpu_driver_load_kms(struct drm_device *dev, unsigned long flags)
 		case CHIP_BONAIRE:
 		case CHIP_HAWAII:
 #endif
-		case CHIP_VEGA10:
 		case CHIP_VEGA20:
 		case CHIP_ARCTURUS:
 		case CHIP_SIENNA_CICHLID:
@@ -184,6 +183,11 @@ int amdgpu_driver_load_kms(struct drm_device *dev, unsigned long flags)
 			if (amdgpu_runtime_pm > 0)
 				adev->runpm = true;
 			break;
+		case CHIP_VEGA10:
+			/* turn runpm on if noretry=0 */
+			if (!amdgpu_noretry)
+				adev->runpm = true;
+			break;
 		default:
 			/* enable runpm on VI+ */
 			adev->runpm = true;
-- 
2.25.4

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH 2/2] drm/amdgpu: enable runtime pm on vega10 when noretry=0
  2020-06-24 19:05 ` [PATCH 2/2] drm/amdgpu: enable runtime pm on vega10 when noretry=0 Alex Deucher
@ 2020-06-25  0:06   ` Bhardwaj, Rajneesh
  2020-06-25  2:06     ` Felix Kuehling
  0 siblings, 1 reply; 6+ messages in thread
From: Bhardwaj, Rajneesh @ 2020-06-25  0:06 UTC (permalink / raw)
  To: amd-gfx; +Cc: Felix Kuehling


On 6/24/2020 3:05 PM, Alex Deucher wrote:
> [CAUTION: External Email]
>
> The failures with ROCm only happen with noretry=1, so
> enable runtime pm when noretry=0 (the current default).
>
> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c | 6 +++++-
>   1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
> index 0fec39eed164..341d072edd95 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
> @@ -176,7 +176,6 @@ int amdgpu_driver_load_kms(struct drm_device *dev, unsigned long flags)
>                  case CHIP_BONAIRE:
>                  case CHIP_HAWAII:
>   #endif
> -               case CHIP_VEGA10:
>                  case CHIP_VEGA20:
>                  case CHIP_ARCTURUS:
>                  case CHIP_SIENNA_CICHLID:
> @@ -184,6 +183,11 @@ int amdgpu_driver_load_kms(struct drm_device *dev, unsigned long flags)
>                          if (amdgpu_runtime_pm > 0)
>                                  adev->runpm = true;
>                          break;
> +               case CHIP_VEGA10:
> +                       /* turn runpm on if noretry=0 */
> +                       if (!amdgpu_noretry)
> +                               adev->runpm = true;
> +                       break;

Though it fixes the ROCm pytorch issue but aren't there any stability 
and performance optimization concerns as it will impact recoverable page 
faults?

I have no objection to this otherwise.

+ felix

Acked-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>


>                  default:
>                          /* enable runpm on VI+ */
>                          adev->runpm = true;
> --
> 2.25.4
>
> _______________________________________________
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.freedesktop.org%2Fmailman%2Flistinfo%2Famd-gfx&amp;data=02%7C01%7Crajneesh.bhardwaj%40amd.com%7Cc985ef0414bd41b48eb508d8187196ed%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637286223437553622&amp;sdata=wRJbu3%2F3zu%2BHZ3KA%2FZmyh1yhgATM2zONRr%2FvI5KsxrM%3D&amp;reserved=0
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 1/2] drm/amdgpu: rework runtime pm enablement for BACO
  2020-06-24 19:05 [PATCH 1/2] drm/amdgpu: rework runtime pm enablement for BACO Alex Deucher
  2020-06-24 19:05 ` [PATCH 2/2] drm/amdgpu: enable runtime pm on vega10 when noretry=0 Alex Deucher
@ 2020-06-25  0:16 ` Bhardwaj, Rajneesh
  2020-06-25 13:45   ` Alex Deucher
  1 sibling, 1 reply; 6+ messages in thread
From: Bhardwaj, Rajneesh @ 2020-06-25  0:16 UTC (permalink / raw)
  To: amd-gfx


On 6/24/2020 3:05 PM, Alex Deucher wrote:
> [CAUTION: External Email]
>
> Add a switch statement to simplify asic checks.  Note
> that BACO is not supported on APUs, so there is no
> need to check them.

why not base this on smu_context to really query the 
SMU_FEATURE_BACO_BIT and then base the below flow on that instead of 
nested logic vs case? I am not sure if there was any issue with 
smu_context earlier?

>
> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c | 34 ++++++++++++++++---------
>   1 file changed, 22 insertions(+), 12 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
> index 4ec544783a45..0fec39eed164 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
> @@ -167,19 +167,29 @@ int amdgpu_driver_load_kms(struct drm_device *dev, unsigned long flags)
>          }
>
>          if (amdgpu_device_supports_boco(dev) &&
> -           (amdgpu_runtime_pm != 0)) /* enable runpm by default for boco */
> -               adev->runpm = true;
> -       else if (amdgpu_device_supports_baco(dev) &&
> -                (amdgpu_runtime_pm != 0) &&
> -                (adev->asic_type >= CHIP_TOPAZ) &&
> -                (adev->asic_type != CHIP_VEGA10) &&
> -                (adev->asic_type != CHIP_VEGA20) &&
> -                (adev->asic_type != CHIP_SIENNA_CICHLID) &&
> -                (adev->asic_type != CHIP_ARCTURUS)) /* enable runpm on VI+ */
> -               adev->runpm = true;
> -       else if (amdgpu_device_supports_baco(dev) &&
> -                (amdgpu_runtime_pm > 0))  /* enable runpm if runpm=1 on CI */
> +           (amdgpu_runtime_pm != 0)) { /* enable runpm by default for boco */
>                  adev->runpm = true;
> +       } else if (amdgpu_device_supports_baco(dev) &&
> +                  (amdgpu_runtime_pm != 0)) {
> +               switch (adev->asic_type) {
> +#ifdef CONFIG_DRM_AMDGPU_CIK
> +               case CHIP_BONAIRE:
> +               case CHIP_HAWAII:
> +#endif
> +               case CHIP_VEGA10:
> +               case CHIP_VEGA20:
> +               case CHIP_ARCTURUS:
> +               case CHIP_SIENNA_CICHLID:
> +                       /* enable runpm if runpm=1 */
> +                       if (amdgpu_runtime_pm > 0)
> +                               adev->runpm = true;
> +                       break;
> +               default:
> +                       /* enable runpm on VI+ */
> +                       adev->runpm = true;
> +                       break;
> +               }
> +       }
>
>          /* Call ACPI methods: require modeset init
>           * but failure is not fatal
> --
> 2.25.4
>
> _______________________________________________
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.freedesktop.org%2Fmailman%2Flistinfo%2Famd-gfx&amp;data=02%7C01%7Crajneesh.bhardwaj%40amd.com%7Cd5d794bda2c44e0c902008d818719558%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637286223410087832&amp;sdata=yErSk5DyDDXPX8Y1cXp14QxX9pgwRlIj6%2FuIhNKYN%2Bk%3D&amp;reserved=0
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 2/2] drm/amdgpu: enable runtime pm on vega10 when noretry=0
  2020-06-25  0:06   ` Bhardwaj, Rajneesh
@ 2020-06-25  2:06     ` Felix Kuehling
  0 siblings, 0 replies; 6+ messages in thread
From: Felix Kuehling @ 2020-06-25  2:06 UTC (permalink / raw)
  To: Bhardwaj, Rajneesh, amd-gfx, Alex Deucher, Alex Deucher


Am 2020-06-24 um 8:06 p.m. schrieb Bhardwaj, Rajneesh:
>
> On 6/24/2020 3:05 PM, Alex Deucher wrote:
>> [CAUTION: External Email]
>>
>> The failures with ROCm only happen with noretry=1, so
>> enable runtime pm when noretry=0 (the current default).
>>
>> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c | 6 +++++-
>>   1 file changed, 5 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
>> index 0fec39eed164..341d072edd95 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
>> @@ -176,7 +176,6 @@ int amdgpu_driver_load_kms(struct drm_device
>> *dev, unsigned long flags)
>>                  case CHIP_BONAIRE:
>>                  case CHIP_HAWAII:
>>   #endif
>> -               case CHIP_VEGA10:
>>                  case CHIP_VEGA20:
>>                  case CHIP_ARCTURUS:
>>                  case CHIP_SIENNA_CICHLID:
>> @@ -184,6 +183,11 @@ int amdgpu_driver_load_kms(struct drm_device
>> *dev, unsigned long flags)
>>                          if (amdgpu_runtime_pm > 0)
>>                                  adev->runpm = true;
>>                          break;
>> +               case CHIP_VEGA10:
>> +                       /* turn runpm on if noretry=0 */
>> +                       if (!amdgpu_noretry)
>> +                               adev->runpm = true;
>> +                       break;
>
> Though it fixes the ROCm pytorch issue but aren't there any stability
> and performance optimization concerns as it will impact recoverable
> page faults?
>
> I have no objection to this otherwise.
>
> + felix

This only disables runtime PM in some corner case as a workaround until
we understand the root cause of the problem. The only impact should be
more power consumption while the GPU is idle. I don't expect a
performance or stability impact from this change.

The series is

Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>

Regards,
  Felix

>
> Acked-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
>
>
>>                  default:
>>                          /* enable runpm on VI+ */
>>                          adev->runpm = true;
>> -- 
>> 2.25.4
>>
>> _______________________________________________
>> amd-gfx mailing list
>> amd-gfx@lists.freedesktop.org
>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.freedesktop.org%2Fmailman%2Flistinfo%2Famd-gfx&amp;data=02%7C01%7Crajneesh.bhardwaj%40amd.com%7Cc985ef0414bd41b48eb508d8187196ed%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637286223437553622&amp;sdata=wRJbu3%2F3zu%2BHZ3KA%2FZmyh1yhgATM2zONRr%2FvI5KsxrM%3D&amp;reserved=0
>>
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 1/2] drm/amdgpu: rework runtime pm enablement for BACO
  2020-06-25  0:16 ` [PATCH 1/2] drm/amdgpu: rework runtime pm enablement for BACO Bhardwaj, Rajneesh
@ 2020-06-25 13:45   ` Alex Deucher
  0 siblings, 0 replies; 6+ messages in thread
From: Alex Deucher @ 2020-06-25 13:45 UTC (permalink / raw)
  To: Bhardwaj, Rajneesh; +Cc: amd-gfx list

On Wed, Jun 24, 2020 at 8:17 PM Bhardwaj, Rajneesh
<rajneesh.bhardwaj@amd.com> wrote:
>
>
> On 6/24/2020 3:05 PM, Alex Deucher wrote:
> > [CAUTION: External Email]
> >
> > Add a switch statement to simplify asic checks.  Note
> > that BACO is not supported on APUs, so there is no
> > need to check them.
>
> why not base this on smu_context to really query the
> SMU_FEATURE_BACO_BIT and then base the below flow on that instead of
> nested logic vs case? I am not sure if there was any issue with
> smu_context earlier?

We already check whether the asic supports BACO in
amdgpu_device_supports_baco().  The additional logic is just there so
selectively disable runtime pm on certain asics.  E.g., Arcturus and
vega20 support BACO, but had issues with ROCm IIRC (although, maybe it
is worth double checking those asics with noretry=0).  This is mostly
just prep work for patch 2/2 which adds a special case for vega10.

Alex

>
> >
> > Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
> > ---
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c | 34 ++++++++++++++++---------
> >   1 file changed, 22 insertions(+), 12 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
> > index 4ec544783a45..0fec39eed164 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
> > @@ -167,19 +167,29 @@ int amdgpu_driver_load_kms(struct drm_device *dev, unsigned long flags)
> >          }
> >
> >          if (amdgpu_device_supports_boco(dev) &&
> > -           (amdgpu_runtime_pm != 0)) /* enable runpm by default for boco */
> > -               adev->runpm = true;
> > -       else if (amdgpu_device_supports_baco(dev) &&
> > -                (amdgpu_runtime_pm != 0) &&
> > -                (adev->asic_type >= CHIP_TOPAZ) &&
> > -                (adev->asic_type != CHIP_VEGA10) &&
> > -                (adev->asic_type != CHIP_VEGA20) &&
> > -                (adev->asic_type != CHIP_SIENNA_CICHLID) &&
> > -                (adev->asic_type != CHIP_ARCTURUS)) /* enable runpm on VI+ */
> > -               adev->runpm = true;
> > -       else if (amdgpu_device_supports_baco(dev) &&
> > -                (amdgpu_runtime_pm > 0))  /* enable runpm if runpm=1 on CI */
> > +           (amdgpu_runtime_pm != 0)) { /* enable runpm by default for boco */
> >                  adev->runpm = true;
> > +       } else if (amdgpu_device_supports_baco(dev) &&
> > +                  (amdgpu_runtime_pm != 0)) {
> > +               switch (adev->asic_type) {
> > +#ifdef CONFIG_DRM_AMDGPU_CIK
> > +               case CHIP_BONAIRE:
> > +               case CHIP_HAWAII:
> > +#endif
> > +               case CHIP_VEGA10:
> > +               case CHIP_VEGA20:
> > +               case CHIP_ARCTURUS:
> > +               case CHIP_SIENNA_CICHLID:
> > +                       /* enable runpm if runpm=1 */
> > +                       if (amdgpu_runtime_pm > 0)
> > +                               adev->runpm = true;
> > +                       break;
> > +               default:
> > +                       /* enable runpm on VI+ */
> > +                       adev->runpm = true;
> > +                       break;
> > +               }
> > +       }
> >
> >          /* Call ACPI methods: require modeset init
> >           * but failure is not fatal
> > --
> > 2.25.4
> >
> > _______________________________________________
> > amd-gfx mailing list
> > amd-gfx@lists.freedesktop.org
> > https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.freedesktop.org%2Fmailman%2Flistinfo%2Famd-gfx&amp;data=02%7C01%7Crajneesh.bhardwaj%40amd.com%7Cd5d794bda2c44e0c902008d818719558%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637286223410087832&amp;sdata=yErSk5DyDDXPX8Y1cXp14QxX9pgwRlIj6%2FuIhNKYN%2Bk%3D&amp;reserved=0
> _______________________________________________
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2020-06-25 13:46 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-06-24 19:05 [PATCH 1/2] drm/amdgpu: rework runtime pm enablement for BACO Alex Deucher
2020-06-24 19:05 ` [PATCH 2/2] drm/amdgpu: enable runtime pm on vega10 when noretry=0 Alex Deucher
2020-06-25  0:06   ` Bhardwaj, Rajneesh
2020-06-25  2:06     ` Felix Kuehling
2020-06-25  0:16 ` [PATCH 1/2] drm/amdgpu: rework runtime pm enablement for BACO Bhardwaj, Rajneesh
2020-06-25 13:45   ` Alex Deucher

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).