From: Matt Coffin <mcoffin13@gmail.com>
To: "Paweł Gronowski" <me@woland.xyz>
Cc: Alex Deucher <alexander.deucher@amd.com>, amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH] drm/amdgpu: Fix regression in adjusting power table/profile
Date: Fri, 31 Jul 2020 15:25:16 -0600
Message-ID: <7cbe7df5-7798-148d-9b43-f99965704407@gmail.com>
In-Reply-To: <20200731202014.GA3750@tower>
I actually *just* finished my bisect, and arrived at the same
conclusion. The hang appears to be introduced in
edad8312cbbf9a33c86873fc4093664f150dd5c1.
There are some conflicts with an automatic `git revert`, so I'm picking
through the changes now to fully understand what happened and come up
with a fix.
Thanks again for the help,
Matt
On 7/31/20 2:20 PM, Paweł Gronowski wrote:
> Hello again,
>
> I just finished a bisect of amd-staging-drm-next and it looks like
> the hang is first introduced in edad8312cbbf9a33c86873fc4093664f150dd5c1
> ("drm/amdgpu: fix system hang issue during GPU reset").
>
> It is a bit tricky, because it is committed on top of my first faulty patch
> 7173949df45482 ("drm/amdgpu: Fix NULL dereference in dpm sysfs handlers"), so
> that patch needs to be reverted to get past the premature -EINVAL.
>
> Test case:
> sudo sh -c 'echo "s 0 305 750" > /sys/class/drm/card0/device/pp_od_clk_voltage'
> Results:
> edad8312cbbf9a3 + revert 7173949df45482 = hang
> edad8312cbbf9a3~1 + revert 7173949df45482 = no hang
>
> Could you confirm that you get the same results?
>
> Thanks,
> Paweł Gronowski
>
>
> On Fri, Jul 31, 2020 at 03:34:40PM +0200, Paweł Gronowski wrote:
>> Hey Matt,
>>
>> I have just tested the amd-staging-drm-next branch
>> (dd654c76d6e854afad716ded899e4404734aaa10) with my patches reverted
>> and I can reproduce your issue with:
>>
>> sudo sh -c 'echo "s 0 305 750" > /sys/class/drm/card0/device/pp_od_clk_voltage'
>>
>> Which makes the sh hang with 100% usage.
>>
>> The issue does not happen on the mainline (d8b9faec54ae4bc2fff68bcd0befa93ace8256ce)
>> both without and with my patches reapplied.
>> So the problem must be related to some commit that is present in the
>> amd-staging-drm-next but not in the mainline.
>>
>>
>> Paweł Gronowski
>>
>> On Thu, Jul 30, 2020 at 06:34:14PM -0600, Matt Coffin wrote:
>>> Hey Pawel,
>>>
>>> I did confirm that this patch *introduced* the issue both with the
>>> bisect, and by testing reverting it.
>>>
>>> Now, there's a lot of fragile pieces in the dpm handling, so it could be
>>> this patch's interaction with something else that's causing it and it
>>> may well not be the fault of this code, but this is the patch that
>>> introduced the issue.
>>>
>>> I'll have some more time tomorrow to try to get down to root cause here,
>>> so maybe I'll have more to offer then.
>>>
>>> Thanks for taking a look,
>>> Matt
>>>
>>> On 7/30/20 6:31 PM, Paweł Gronowski wrote:
>>>> Hello Matt,
>>>>
>>>> Thank you for your testing. It seems that my gpu (RX 570) does not support the
>>>> vc setting so I can not exactly reproduce the issue. However I did trace the
>>>> code path the test case takes and it seems to correctly pass through the while
>>>> loop that parses the input and fails only in amdgpu_dpm_odn_edit_dpm_table.
>>>> The 'parameter' array is populated the same way as the original code did. Since
>>>> the amdgpu_dpm_odn_edit_dpm_table is reached, I think that your problem is
>>>> unfortunately caused by something else.
>>>>
>>>>
>>>> Paweł Gronowski
>>>>
>>>> On Thu, Jul 30, 2020 at 08:49:41AM -0600, Matt Coffin wrote:
>>>>> Hello all, I just did some testing with this applied, and while it no
>>>>> longer returns -EINVAL, running `sudo sh -c 'echo "vc 2 2150 1195" >
>>>>> /sys/class/drm/card1/device/pp_od_clk_voltage'` results in `sh` spiking
>>>>> to, and staying at 100% CPU usage, with no indicating information in
>>>>> `dmesg` from the kernel.
>>>>>
>>>>> It appeared to work at least ONCE, but potentially not after.
>>>>>
>>>>> This is not unique to Navi, and caused the problem on a POLARIS10 card
>>>>> as well.
>>>>>
>>>>> Sorry for the bad news, and thanks for any insight you may have,
>>>>> Matt Coffin
>>>>>
>>>>> On 7/29/20 8:53 PM, Alex Deucher wrote:
>>>>>> On Wed, Jul 29, 2020 at 10:20 PM Paweł Gronowski <me@woland.xyz> wrote:
>>>>>>>
>>>>>>> Regression was introduced in commit 38e0c89a19fd
>>>>>>> ("drm/amdgpu: Fix NULL dereference in dpm sysfs handlers") which
>>>>>>> made the set_pp_od_clk_voltage and set_pp_power_profile_mode return
>>>>>>> -EINVAL for previously valid input. This was caused by an empty
>>>>>>> string (starting at the \0 character) being passed to the kstrtol.
>>>>>>>
>>>>>>> Signed-off-by: Paweł Gronowski <me@woland.xyz>
>>>>>>
>>>>>> Applied. Thanks!
>>>>>>
>>>>>> Alex
>>>>>>
>>>>>>> ---
>>>>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c | 9 +++++++--
>>>>>>> 1 file changed, 7 insertions(+), 2 deletions(-)
>>>>>>>
>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c
>>>>>>> index ebb8a28ff002..cbf623ff03bd 100644
>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c
>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c
>>>>>>> @@ -778,12 +778,14 @@ static ssize_t amdgpu_set_pp_od_clk_voltage(struct device *dev,
>>>>>>> tmp_str++;
>>>>>>> while (isspace(*++tmp_str));
>>>>>>>
>>>>>>> - while ((sub_str = strsep(&tmp_str, delimiter)) != NULL) {
>>>>>>> + while ((sub_str = strsep(&tmp_str, delimiter)) && *sub_str) {
>>>>>>> ret = kstrtol(sub_str, 0, &parameter[parameter_size]);
>>>>>>> if (ret)
>>>>>>> return -EINVAL;
>>>>>>> parameter_size++;
>>>>>>>
>>>>>>> + if (!tmp_str)
>>>>>>> + break;
>>>>>>> while (isspace(*tmp_str))
>>>>>>> tmp_str++;
>>>>>>> }
>>>>>>> @@ -1635,11 +1637,14 @@ static ssize_t amdgpu_set_pp_power_profile_mode(struct device *dev,
>>>>>>> i++;
>>>>>>> memcpy(buf_cpy, buf, count-i);
>>>>>>> tmp_str = buf_cpy;
>>>>>>> - while ((sub_str = strsep(&tmp_str, delimiter)) != NULL) {
>>>>>>> + while ((sub_str = strsep(&tmp_str, delimiter)) && *sub_str) {
>>>>>>> ret = kstrtol(sub_str, 0, &parameter[parameter_size]);
>>>>>>> if (ret)
>>>>>>> return -EINVAL;
>>>>>>> parameter_size++;
>>>>>>> +
>>>>>>> + if (!tmp_str)
>>>>>>> + break;
>>>>>>> while (isspace(*tmp_str))
>>>>>>> tmp_str++;
>>>>>>> }
>>>>>>> --
>>>>>>> 2.25.1
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> amd-gfx mailing list
>>>>>>> amd-gfx@lists.freedesktop.org
>>>>>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>
>>
>>
>>
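For anyone following along, here is a rough userspace sketch of the
tokenizing loop the quoted patch touches. It is not the driver code:
parse_long() is a hypothetical stand-in for kstrtol() built on strtol(),
parse_params() and its "patched" flag are made up for illustration, and
the " \n" delimiter set is my assumption about what the handlers use.
It only tries to show why a trailing newline makes strsep() hand back an
empty token, which the number parser then rejects.

```c
#define _DEFAULT_SOURCE /* for strsep() on glibc */
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for kstrtol(): rejects empty or partly numeric input. */
static int parse_long(const char *s, long *res)
{
	char *end;

	errno = 0;
	*res = strtol(s, &end, 0);
	if (end == s || *end != '\0' || errno)
		return -EINVAL;
	return 0;
}

/*
 * Tokenize "input" roughly the way the sysfs handlers do.
 * patched == 0: loop only ends when strsep() returns NULL.
 * patched != 0: loop also ends on an empty token, as in the quoted patch.
 * Returns the number of parsed values, or -EINVAL.
 * Input longer than the local buffer is silently truncated (demo only).
 */
int parse_params(const char *input, long *out, int max, int patched)
{
	const char delimiter[] = " \n"; /* assumed delimiter set */
	char buf[128];
	char *tmp_str = buf;
	char *sub_str;
	int n = 0;

	assert(input && out && max > 0);
	snprintf(buf, sizeof(buf), "%s", input);

	while ((sub_str = strsep(&tmp_str, delimiter)) != NULL) {
		if (patched && !*sub_str)
			break; /* empty token after the trailing newline */
		if (n >= max || parse_long(sub_str, &out[n]))
			return -EINVAL;
		n++;

		/*
		 * strsep() sets tmp_str to NULL once the string is consumed.
		 * The check is kept in both modes so this demo cannot crash;
		 * in the patch it is one of the added lines.
		 */
		if (!tmp_str)
			break;
		while (isspace((unsigned char)*tmp_str))
			tmp_str++;
	}
	return n;
}
```

With the input "0 305 750\n", parse_params(..., 0) returns -EINVAL
because the fourth token is the empty string after the newline, while
parse_params(..., 1) stops at that token and returns 3. Note that this
only covers the parsing step; it says nothing about the hang, which
happens later.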