AMD-GFX Archive on lore.kernel.org
* [PATCH] drm/amdgpu: Fix regression in adjusting power table/profile
@ 2020-07-29 23:11 Paweł Gronowski
  2020-07-30  2:53 ` Alex Deucher
  0 siblings, 1 reply; 9+ messages in thread
From: Paweł Gronowski @ 2020-07-29 23:11 UTC (permalink / raw)
  To: Alex Deucher; +Cc: amd-gfx

The regression was introduced in commit 38e0c89a19fd
("drm/amdgpu: Fix NULL dereference in dpm sysfs handlers"), which
made set_pp_od_clk_voltage and set_pp_power_profile_mode return
-EINVAL for previously valid input. The cause was an empty string
(a token starting at the \0 character) being passed to kstrtol().

Signed-off-by: Paweł Gronowski <me@woland.xyz>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)
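
(Illustration only, not part of the commit.) A minimal user-space sketch of
the parsing loop, to show where the empty string comes from. It assumes the
delimiter set is " \n" and that the input ends in a newline, as it does when
written with `echo`. strict_strtol() and parse_params() are hypothetical
stand-ins written for this sketch, not kernel functions. After the last
number, strsep() still returns one more token, an empty one, and a strict
parser rejects it.

```c
#define _DEFAULT_SOURCE		/* exposes strsep() in glibc */
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

#define MAX_PARAMS 8

/*
 * Hypothetical stand-in for the kernel's kstrtol(): rejects an empty
 * string and any trailing garbage. Only that strictness matters here.
 */
static int strict_strtol(const char *s, long *res)
{
	char *end;

	assert(s && res);
	if (*s == '\0')
		return -EINVAL;
	errno = 0;
	*res = strtol(s, &end, 0);
	if (errno || *end != '\0')
		return -EINVAL;
	return 0;
}

/*
 * Parse integers separated by spaces/newlines from a writable buffer.
 * patched == 0: loop condition as before this patch (stop on NULL only).
 * patched != 0: loop condition as in this patch (also stop on "").
 * Returns the number of values stored in parameter[], or -EINVAL.
 */
int parse_params(char *tmp_str, int patched, long *parameter)
{
	const char delimiter[] = " \n";	/* assumed delimiter set */
	int parameter_size = 0;
	char *sub_str;

	while ((sub_str = strsep(&tmp_str, delimiter)) != NULL) {
		if (patched && *sub_str == '\0')
			break;
		if (parameter_size >= MAX_PARAMS ||
		    strict_strtol(sub_str, &parameter[parameter_size]))
			return -EINVAL;
		parameter_size++;

		/*
		 * strsep() sets tmp_str to NULL after the last token. The
		 * sketch keeps this guard in both modes so that it never
		 * dereferences NULL in user space.
		 */
		if (!tmp_str)
			break;
		while (isspace((unsigned char)*tmp_str))
			tmp_str++;
	}
	return parameter_size;
}
```

With a writable buffer holding "0 305 750\n", parse_params(buf, 0, p)
returns -EINVAL because the empty token after the newline reaches the
parser, while parse_params(buf, 1, p) on a fresh copy returns 3 and fills
p with 0, 305 and 750. strsep() modifies the buffer, so each call needs its
own copy.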

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c
index ebb8a28ff002..cbf623ff03bd 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c
@@ -778,12 +778,14 @@ static ssize_t amdgpu_set_pp_od_clk_voltage(struct device *dev,
 		tmp_str++;
 	while (isspace(*++tmp_str));
 
-	while ((sub_str = strsep(&tmp_str, delimiter)) != NULL) {
+	while ((sub_str = strsep(&tmp_str, delimiter)) && *sub_str) {
 		ret = kstrtol(sub_str, 0, &parameter[parameter_size]);
 		if (ret)
 			return -EINVAL;
 		parameter_size++;
 
+		if (!tmp_str)
+			break;
 		while (isspace(*tmp_str))
 			tmp_str++;
 	}
@@ -1635,11 +1637,14 @@ static ssize_t amdgpu_set_pp_power_profile_mode(struct device *dev,
 			i++;
 		memcpy(buf_cpy, buf, count-i);
 		tmp_str = buf_cpy;
-		while ((sub_str = strsep(&tmp_str, delimiter)) != NULL) {
+		while ((sub_str = strsep(&tmp_str, delimiter)) && *sub_str) {
 			ret = kstrtol(sub_str, 0, &parameter[parameter_size]);
 			if (ret)
 				return -EINVAL;
 			parameter_size++;
+
+			if (!tmp_str)
+				break;
 			while (isspace(*tmp_str))
 				tmp_str++;
 		}
-- 
2.25.1

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

* Re: [PATCH] drm/amdgpu: Fix regression in adjusting power table/profile
  2020-07-29 23:11 [PATCH] drm/amdgpu: Fix regression in adjusting power table/profile Paweł Gronowski
@ 2020-07-30  2:53 ` Alex Deucher
  2020-07-30 14:49   ` Matt Coffin
  0 siblings, 1 reply; 9+ messages in thread
From: Alex Deucher @ 2020-07-30  2:53 UTC (permalink / raw)
  To: Paweł Gronowski; +Cc: Alex Deucher, amd-gfx list

On Wed, Jul 29, 2020 at 10:20 PM Paweł Gronowski <me@woland.xyz> wrote:
>
> The regression was introduced in commit 38e0c89a19fd
> ("drm/amdgpu: Fix NULL dereference in dpm sysfs handlers"), which
> made set_pp_od_clk_voltage and set_pp_power_profile_mode return
> -EINVAL for previously valid input. The cause was an empty string
> (a token starting at the \0 character) being passed to kstrtol().
>
> Signed-off-by: Paweł Gronowski <me@woland.xyz>

Applied.  Thanks!

Alex


* Re: [PATCH] drm/amdgpu: Fix regression in adjusting power table/profile
  2020-07-30  2:53 ` Alex Deucher
@ 2020-07-30 14:49   ` Matt Coffin
  2020-07-31  0:31     ` Paweł Gronowski
  0 siblings, 1 reply; 9+ messages in thread
From: Matt Coffin @ 2020-07-30 14:49 UTC (permalink / raw)
  To: Alex Deucher, Paweł Gronowski; +Cc: Alex Deucher, amd-gfx list

Hello all, I just did some testing with this applied, and while it no
longer returns -EINVAL, running `sudo sh -c 'echo "vc 2 2150 1195" >
/sys/class/drm/card1/device/pp_od_clk_voltage'` results in `sh` spiking
to, and staying at, 100% CPU usage, with nothing relevant from the
kernel in `dmesg`.

It appeared to work at least ONCE, but potentially not after.

This is not unique to Navi, and caused the problem on a POLARIS10 card
as well.

Sorry for the bad news, and thanks for any insight you may have,
Matt Coffin


* Re: [PATCH] drm/amdgpu: Fix regression in adjusting power table/profile
  2020-07-30 14:49   ` Matt Coffin
@ 2020-07-31  0:31     ` Paweł Gronowski
  2020-07-31  0:34       ` Matt Coffin
  0 siblings, 1 reply; 9+ messages in thread
From: Paweł Gronowski @ 2020-07-31  0:31 UTC (permalink / raw)
  To: Matt Coffin; +Cc: Alex Deucher, amd-gfx

Hello Matt,

Thank you for your testing. It seems that my GPU (RX 570) does not support the
vc setting, so I cannot reproduce the issue exactly. However, I did trace the
code path the test case takes, and it seems to pass correctly through the while
loop that parses the input and to fail only in amdgpu_dpm_odn_edit_dpm_table.
The 'parameter' array is populated the same way as in the original code. Since
amdgpu_dpm_odn_edit_dpm_table is reached, I think that your problem is
unfortunately caused by something else.


Paweł Gronowski


* Re: [PATCH] drm/amdgpu: Fix regression in adjusting power table/profile
  2020-07-31  0:31     ` Paweł Gronowski
@ 2020-07-31  0:34       ` Matt Coffin
  2020-07-31 13:34         ` Paweł Gronowski
  0 siblings, 1 reply; 9+ messages in thread
From: Matt Coffin @ 2020-07-31  0:34 UTC (permalink / raw)
  To: Paweł Gronowski; +Cc: Alex Deucher, amd-gfx

Hey Pawel,

I did confirm that this patch *introduced* the issue, both with a
bisect and by testing a revert of it.

Now, there are a lot of fragile pieces in the dpm handling, so it could be
this patch's interaction with something else that's causing it, and it
may well not be the fault of this code, but this is the patch that
introduced the issue.

I'll have some more time tomorrow to try to get down to root cause here,
so maybe I'll have more to offer then.

Thanks for taking a look,
Matt


* Re: [PATCH] drm/amdgpu: Fix regression in adjusting power table/profile
  2020-07-31  0:34       ` Matt Coffin
@ 2020-07-31 13:34         ` Paweł Gronowski
  2020-07-31 20:20           ` Paweł Gronowski
  0 siblings, 1 reply; 9+ messages in thread
From: Paweł Gronowski @ 2020-07-31 13:34 UTC (permalink / raw)
  To: Matt Coffin; +Cc: Alex Deucher, amd-gfx

Hey Matt,

I have just tested the amd-staging-drm-next branch 
(dd654c76d6e854afad716ded899e4404734aaa10) with my patches reverted
and I can reproduce your issue with:

  sudo sh -c 'echo "s 0 305 750" > /sys/class/drm/card0/device/pp_od_clk_voltage'

which makes sh hang at 100% CPU usage.

The issue does not happen on mainline (d8b9faec54ae4bc2fff68bcd0befa93ace8256ce),
either without or with my patches reapplied.
So the problem must be related to some commit that is present in
amd-staging-drm-next but not in mainline.


Paweł Gronowski


* Re: [PATCH] drm/amdgpu: Fix regression in adjusting power table/profile
  2020-07-31 13:34         ` Paweł Gronowski
@ 2020-07-31 20:20           ` Paweł Gronowski
  2020-07-31 21:25             ` Matt Coffin
  2020-07-31 21:39             ` Matt Coffin
  0 siblings, 2 replies; 9+ messages in thread
From: Paweł Gronowski @ 2020-07-31 20:20 UTC (permalink / raw)
  To: Matt Coffin; +Cc: Alex Deucher, amd-gfx

Hello again,

I just finished a bisect of amd-staging-drm-next and it looks like
the hang is first introduced in edad8312cbbf9a33c86873fc4093664f150dd5c1
("drm/amdgpu: fix system hang issue during GPU reset").

It is a bit tricky, because it is committed on top of my first faulty patch
7173949df45482 ("drm/amdgpu: Fix NULL dereference in dpm sysfs handlers"), so
that patch needs to be reverted to fix the premature -EINVAL.

Test case:
  sudo sh -c 'echo "s 0 305 750" > /sys/class/drm/card0/device/pp_od_clk_voltage'
Results:
  edad8312cbbf9a3 + revert 7173949df45482 = hang
  edad8312cbbf9a3~1 + revert 7173949df45482 = no hang

Could you confirm that you get the same results?

Thanks,
Paweł Gronowski



* Re: [PATCH] drm/amdgpu: Fix regression in adjusting power table/profile
  2020-07-31 20:20           ` Paweł Gronowski
@ 2020-07-31 21:25             ` Matt Coffin
  2020-07-31 21:39             ` Matt Coffin
  1 sibling, 0 replies; 9+ messages in thread
From: Matt Coffin @ 2020-07-31 21:25 UTC (permalink / raw)
  To: Paweł Gronowski; +Cc: Alex Deucher, amd-gfx

I actually *just* finished my bisect, and arrived at the same
conclusion. The hang appears to be introduced in
edad8312cbbf9a33c86873fc4093664f150dd5c1.

There are some conflicts with an automatic `git revert`, so I'm picking
through the changes now to fully understand what happened and come up
with a fix.

Thanks again for the help,
Matt

On 7/31/20 2:20 PM, Paweł Gronowski wrote:
> Hello again,
> 
> I just finished a bisect of amd-staging-drm-next and it looks like
> the hang is first introduced in edad8312cbbf9a33c86873fc4093664f150dd5c1
> ("drm/amdgpu: fix system hang issue during GPU reset").
> 
> It is a bit tricky, because it is committed on top of my first faulty patch
> 7173949df45482 ("drm/amdgpu: Fix NULL dereference in dpm sysfs handlers"), so
> that patch needs to be reverted to avoid the premature -EINVAL.
> 
> Test case:
>   sudo sh -c 'echo "s 0 305 750" > /sys/class/drm/card0/device/pp_od_clk_voltage'
> Results:
>   edad8312cbbf9a3 + revert 7173949df45482 = hang
>   edad8312cbbf9a3~1 + revert 7173949df45482 = no hang
> 
> Could you confirm that you get the same results?
> 
> Thanks,
> Paweł Gronowski
> 
> 
> On Fri, Jul 31, 2020 at 03:34:40PM +0200, Paweł Gronowski wrote:
>> Hey Matt,
>>
>> I have just tested the amd-staging-drm-next branch 
>> (dd654c76d6e854afad716ded899e4404734aaa10) with my patches reverted
>> and I can reproduce your issue with:
>>
>>   sudo sh -c 'echo "s 0 305 750" > /sys/class/drm/card0/device/pp_od_clk_voltage'
>>
>> This makes `sh` hang at 100% CPU usage.
>>
>> The issue does not happen on the mainline (d8b9faec54ae4bc2fff68bcd0befa93ace8256ce)
>> both without and with my patches reapplied.
>> So the problem must be related to some commit that is present in the
>> amd-staging-drm-next but not in the mainline.
>>
>>
>> Paweł Gronowski
>>
>> On Thu, Jul 30, 2020 at 06:34:14PM -0600, Matt Coffin wrote:
>>> Hey Pawel,
>>>
>>> I did confirm that this patch *introduced* the issue both with the
>>> bisect, and by testing reverting it.
>>>
>>> Now, there are a lot of fragile pieces in the dpm handling, so it could be
>>> this patch's interaction with something else that's causing it and it
>>> may well not be the fault of this code, but this is the patch that
>>> introduced the issue.
>>>
>>> I'll have some more time tomorrow to try to get down to root cause here,
>>> so maybe I'll have more to offer then.
>>>
>>> Thanks for taking a look,
>>> Matt
>>>
>>> On 7/30/20 6:31 PM, Paweł Gronowski wrote:
>>>> Hello Matt,
>>>>
>>>> Thank you for your testing. It seems that my GPU (RX 570) does not support the
>>>> vc setting, so I cannot exactly reproduce the issue. However, I did trace the
>>>> code path the test case takes, and it seems to pass correctly through the while
>>>> loop that parses the input and to fail only in amdgpu_dpm_odn_edit_dpm_table.
>>>> The 'parameter' array is populated the same way as in the original code. Since
>>>> amdgpu_dpm_odn_edit_dpm_table is reached, I think that your problem is
>>>> unfortunately caused by something else.
>>>>
>>>>
>>>> Paweł Gronowski
>>>>
>>>> On Thu, Jul 30, 2020 at 08:49:41AM -0600, Matt Coffin wrote:
>>>>> Hello all, I just did some testing with this applied, and while it no
>>>>> longer returns -EINVAL, running `sudo sh -c 'echo "vc 2 2150 1195" >
>>>>> /sys/class/drm/card1/device/pp_od_clk_voltage'` results in `sh` spiking
>>>>> to, and staying at, 100% CPU usage, with no relevant output in
>>>>> `dmesg` from the kernel.
>>>>>
>>>>> It appeared to work at least ONCE, but potentially not after.
>>>>>
>>>>> This is not unique to Navi, and caused the problem on a POLARIS10 card
>>>>> as well.
>>>>>
>>>>> Sorry for the bad news, and thanks for any insight you may have,
>>>>> Matt Coffin
>>>>>
>>>>> On 7/29/20 8:53 PM, Alex Deucher wrote:
>>>>>> On Wed, Jul 29, 2020 at 10:20 PM Paweł Gronowski <me@woland.xyz> wrote:
>>>>>>>
>>>>>>> Regression was introduced in commit 38e0c89a19fd
>>>>>>> ("drm/amdgpu: Fix NULL dereference in dpm sysfs handlers") which
>>>>>>> made the set_pp_od_clk_voltage and set_pp_power_profile_mode return
>>>>>>> -EINVAL for previously valid input. This was caused by an empty
>>>>>>> string (starting at the \0 character) being passed to the kstrtol.
>>>>>>>
>>>>>>> Signed-off-by: Paweł Gronowski <me@woland.xyz>
>>>>>>
>>>>>> Applied.  Thanks!
>>>>>>
>>>>>> Alex
>>>>>>
>>>>>>> ---
>>>>>>>  drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c | 9 +++++++--
>>>>>>>  1 file changed, 7 insertions(+), 2 deletions(-)
>>>>>>>
>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c
>>>>>>> index ebb8a28ff002..cbf623ff03bd 100644
>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c
>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c
>>>>>>> @@ -778,12 +778,14 @@ static ssize_t amdgpu_set_pp_od_clk_voltage(struct device *dev,
>>>>>>>                 tmp_str++;
>>>>>>>         while (isspace(*++tmp_str));
>>>>>>>
>>>>>>> -       while ((sub_str = strsep(&tmp_str, delimiter)) != NULL) {
>>>>>>> +       while ((sub_str = strsep(&tmp_str, delimiter)) && *sub_str) {
>>>>>>>                 ret = kstrtol(sub_str, 0, &parameter[parameter_size]);
>>>>>>>                 if (ret)
>>>>>>>                         return -EINVAL;
>>>>>>>                 parameter_size++;
>>>>>>>
>>>>>>> +               if (!tmp_str)
>>>>>>> +                       break;
>>>>>>>                 while (isspace(*tmp_str))
>>>>>>>                         tmp_str++;
>>>>>>>         }
>>>>>>> @@ -1635,11 +1637,14 @@ static ssize_t amdgpu_set_pp_power_profile_mode(struct device *dev,
>>>>>>>                         i++;
>>>>>>>                 memcpy(buf_cpy, buf, count-i);
>>>>>>>                 tmp_str = buf_cpy;
>>>>>>> -               while ((sub_str = strsep(&tmp_str, delimiter)) != NULL) {
>>>>>>> +               while ((sub_str = strsep(&tmp_str, delimiter)) && *sub_str) {
>>>>>>>                         ret = kstrtol(sub_str, 0, &parameter[parameter_size]);
>>>>>>>                         if (ret)
>>>>>>>                                 return -EINVAL;
>>>>>>>                         parameter_size++;
>>>>>>> +
>>>>>>> +                       if (!tmp_str)
>>>>>>> +                               break;
>>>>>>>                         while (isspace(*tmp_str))
>>>>>>>                                 tmp_str++;
>>>>>>>                 }
>>>>>>> --
>>>>>>> 2.25.1
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>
>>
>>
>>



* Re: [PATCH] drm/amdgpu: Fix regression in adjusting power table/profile
  2020-07-31 20:20           ` Paweł Gronowski
  2020-07-31 21:25             ` Matt Coffin
@ 2020-07-31 21:39             ` Matt Coffin
  1 sibling, 0 replies; 9+ messages in thread
From: Matt Coffin @ 2020-07-31 21:39 UTC (permalink / raw)
  To: Paweł Gronowski; +Cc: Alex Deucher, Dennis Li, amd-gfx, Hawking Zhang

My bisect resulted in the same conclusion, that the problem began with
edad8312cbbf9a33c86873fc4093664f150dd5c1.

That commit has a LOT of changes, and I'm having trouble working out which
ones are relevant. In case Hawking or Dennis have any insight into where to
look, I've added them to the CC list.

If you guys know why those GPU reset changes would have affected the
sysfs interfaces in this way, it could save me a bunch of investigation
time.

Thanks in advance,
Matt

On 7/31/20 2:20 PM, Paweł Gronowski wrote:
> Hello again,
> 
> I just finished a bisect of amd-staging-drm-next and it looks like
> the hang is first introduced in edad8312cbbf9a33c86873fc4093664f150dd5c1
> ("drm/amdgpu: fix system hang issue during GPU reset").
> 
> It is a bit tricky, because it is committed on top of my first faulty patch
> 7173949df45482 ("drm/amdgpu: Fix NULL dereference in dpm sysfs handlers"), so
> that patch needs to be reverted to avoid the premature -EINVAL.
> 
> Test case:
>   sudo sh -c 'echo "s 0 305 750" > /sys/class/drm/card0/device/pp_od_clk_voltage'
> Results:
>   edad8312cbbf9a3 + revert 7173949df45482 = hang
>   edad8312cbbf9a3~1 + revert 7173949df45482 = no hang
> 
> Could you confirm that you get the same results?
> 
> Thanks,
> Paweł Gronowski
> 

Thread overview: 9+ messages
2020-07-29 23:11 [PATCH] drm/amdgpu: Fix regression in adjusting power table/profile Paweł Gronowski
2020-07-30  2:53 ` Alex Deucher
2020-07-30 14:49   ` Matt Coffin
2020-07-31  0:31     ` Paweł Gronowski
2020-07-31  0:34       ` Matt Coffin
2020-07-31 13:34         ` Paweł Gronowski
2020-07-31 20:20           ` Paweł Gronowski
2020-07-31 21:25             ` Matt Coffin
2020-07-31 21:39             ` Matt Coffin
