AMD-GFX Archive on lore.kernel.org
From: "Paweł Gronowski" <me@woland.xyz>
To: Matt Coffin <mcoffin13@gmail.com>
Cc: Alex Deucher <alexander.deucher@amd.com>, amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH] drm/amdgpu: Fix regression in adjusting power table/profile
Date: Fri, 31 Jul 2020 22:20:14 +0200
Message-ID: <20200731202014.GA3750@tower>
In-Reply-To: <20200731133437.GA4878@tower>

Hello again,

I just finished a bisect of amd-staging-drm-next and it looks like
the hang is first introduced in edad8312cbbf9a33c86873fc4093664f150dd5c1
("drm/amdgpu: fix system hang issue during GPU reset").

It is a bit tricky, because it is committed on top of my first faulty patch
7173949df45482 ("drm/amdgpu: Fix NULL dereference in dpm sysfs handlers"), so
that patch needs to be reverted to avoid the premature -EINVAL.

Test case:
  sudo sh -c 'echo "s 0 305 750" > /sys/class/drm/card0/device/pp_od_clk_voltage'
Results:
  edad8312cbbf9a3 + revert 7173949df45482 = hang
  edad8312cbbf9a3~1 + revert 7173949df45482 = no hang

Could you confirm that you get the same results?

Thanks,
Paweł Gronowski


On Fri, Jul 31, 2020 at 03:34:40PM +0200, Paweł Gronowski wrote:
> Hey Matt,
> 
> I have just tested the amd-staging-drm-next branch 
> (dd654c76d6e854afad716ded899e4404734aaa10) with my patches reverted
> and I can reproduce your issue with:
> 
>   sudo sh -c 'echo "s 0 305 750" > /sys/class/drm/card0/device/pp_od_clk_voltage'
> 
> which makes sh hang at 100% CPU usage.
> 
> The issue does not happen on the mainline (d8b9faec54ae4bc2fff68bcd0befa93ace8256ce)
> both without and with my patches reapplied.
> So the problem must be related to some commit that is present in the
> amd-staging-drm-next but not in the mainline.
> 
> 
> Paweł Gronowski
> 
> On Thu, Jul 30, 2020 at 06:34:14PM -0600, Matt Coffin wrote:
> > Hey Pawel,
> > 
> > I did confirm that this patch *introduced* the issue both with the
> > bisect, and by testing reverting it.
> > 
> > Now, there are a lot of fragile pieces in the dpm handling, so it could be
> > this patch's interaction with something else that's causing it and it
> > may well not be the fault of this code, but this is the patch that
> > introduced the issue.
> > 
> > I'll have some more time tomorrow to try to get down to root cause here,
> > so maybe I'll have more to offer then.
> > 
> > Thanks for taking a look,
> > Matt
> > 
> > On 7/30/20 6:31 PM, Paweł Gronowski wrote:
> > > Hello Matt,
> > > 
> > > Thank you for your testing. It seems that my GPU (RX 570) does not support the
> > > vc setting, so I cannot exactly reproduce the issue. However, I did trace the
> > > code path the test case takes and it seems to correctly pass through the while
> > > loop that parses the input and fails only in amdgpu_dpm_odn_edit_dpm_table.
> > > The 'parameter' array is populated the same way as the original code did. Since
> > > the amdgpu_dpm_odn_edit_dpm_table is reached, I think that your problem is
> > > unfortunately caused by something else.
> > > 
> > > 
> > > Paweł Gronowski
> > > 
> > > On Thu, Jul 30, 2020 at 08:49:41AM -0600, Matt Coffin wrote:
> > >> Hello all, I just did some testing with this applied, and while it no
> > >> longer returns -EINVAL, running `sudo sh -c 'echo "vc 2 2150 1195" >
> > >> /sys/class/drm/card1/device/pp_od_clk_voltage'` results in `sh` spiking
> > >> to, and staying at 100% CPU usage, with no indicating information in
> > >> `dmesg` from the kernel.
> > >>
> > >> It appeared to work at least ONCE, but potentially not after.
> > >>
> > >> This is not unique to Navi, and caused the problem on a POLARIS10 card
> > >> as well.
> > >>
> > >> Sorry for the bad news, and thanks for any insight you may have,
> > >> Matt Coffin
> > >>
> > >> On 7/29/20 8:53 PM, Alex Deucher wrote:
> > >>> On Wed, Jul 29, 2020 at 10:20 PM Paweł Gronowski <me@woland.xyz> wrote:
> > >>>>
> > >>>> Regression was introduced in commit 38e0c89a19fd
> > >>>> ("drm/amdgpu: Fix NULL dereference in dpm sysfs handlers") which
> > >>>> made the set_pp_od_clk_voltage and set_pp_power_profile_mode return
> > >>>> -EINVAL for previously valid input. This was caused by an empty
> > >>>> string (starting at the \0 character) being passed to the kstrtol.
> > >>>>
> > >>>> Signed-off-by: Paweł Gronowski <me@woland.xyz>
> > >>>
> > >>> Applied.  Thanks!
> > >>>
> > >>> Alex
> > >>>
> > >>>> ---
> > >>>>  drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c | 9 +++++++--
> > >>>>  1 file changed, 7 insertions(+), 2 deletions(-)
> > >>>>
> > >>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c
> > >>>> index ebb8a28ff002..cbf623ff03bd 100644
> > >>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c
> > >>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c
> > >>>> @@ -778,12 +778,14 @@ static ssize_t amdgpu_set_pp_od_clk_voltage(struct device *dev,
> > >>>>                 tmp_str++;
> > >>>>         while (isspace(*++tmp_str));
> > >>>>
> > >>>> -       while ((sub_str = strsep(&tmp_str, delimiter)) != NULL) {
> > >>>> +       while ((sub_str = strsep(&tmp_str, delimiter)) && *sub_str) {
> > >>>>                 ret = kstrtol(sub_str, 0, &parameter[parameter_size]);
> > >>>>                 if (ret)
> > >>>>                         return -EINVAL;
> > >>>>                 parameter_size++;
> > >>>>
> > >>>> +               if (!tmp_str)
> > >>>> +                       break;
> > >>>>                 while (isspace(*tmp_str))
> > >>>>                         tmp_str++;
> > >>>>         }
> > >>>> @@ -1635,11 +1637,14 @@ static ssize_t amdgpu_set_pp_power_profile_mode(struct device *dev,
> > >>>>                         i++;
> > >>>>                 memcpy(buf_cpy, buf, count-i);
> > >>>>                 tmp_str = buf_cpy;
> > >>>> -               while ((sub_str = strsep(&tmp_str, delimiter)) != NULL) {
> > >>>> +               while ((sub_str = strsep(&tmp_str, delimiter)) && *sub_str) {
> > >>>>                         ret = kstrtol(sub_str, 0, &parameter[parameter_size]);
> > >>>>                         if (ret)
> > >>>>                                 return -EINVAL;
> > >>>>                         parameter_size++;
> > >>>> +
> > >>>> +                       if (!tmp_str)
> > >>>> +                               break;
> > >>>>                         while (isspace(*tmp_str))
> > >>>>                                 tmp_str++;
> > >>>>                 }
> > >>>> --
> > >>>> 2.25.1
> > >>>>
> > >>>> _______________________________________________
> > >>>> amd-gfx mailing list
> > >>>> amd-gfx@lists.freedesktop.org
> > >>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
> > >>>
> > >>
> > > 
> > > 
> > > 
> > 
> 
> 
> 
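
For readers following the quoted patch: its commit message says the
regression came from an empty string (starting at the \0 character) being
passed to kstrtol. The following is a minimal userspace sketch of that
behaviour, not the kernel code. parse_long() is a hypothetical stand-in for
kstrtol() built on strtol(), parse_params() and MAX_PARAMS are names invented
here, and the " \n" delimiter set is an assumption about what the driver uses.
The NULL check on tmp_str is kept in both modes so that the sketch itself has
no undefined behaviour.

```c
#define _DEFAULT_SOURCE		/* for strsep() on glibc */
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

#define MAX_PARAMS 8		/* hypothetical limit for this sketch */

/* Userspace stand-in for kstrtol(): rejects empty input and trailing junk. */
static int parse_long(const char *s, long *out)
{
	char *end;

	if (*s == '\0')
		return -EINVAL;
	errno = 0;
	*out = strtol(s, &end, 0);
	if (errno || *end != '\0')
		return -EINVAL;
	return 0;
}

/*
 * Parse whitespace-separated integers from buf, which is modified in place.
 * stop_on_empty == 0 approximates the loop condition before the patch,
 * stop_on_empty == 1 the patched condition (sub_str && *sub_str).
 * Returns the number of values parsed, or -EINVAL.
 */
static int parse_params(char *buf, int stop_on_empty, long *params)
{
	char *tmp_str = buf;
	char *sub_str;
	int n = 0;

	assert(buf && params);

	while ((sub_str = strsep(&tmp_str, " \n")) != NULL) {
		/* After a trailing delimiter, strsep() yields an empty token. */
		if (stop_on_empty && *sub_str == '\0')
			break;
		if (n == MAX_PARAMS || parse_long(sub_str, &params[n]))
			return -EINVAL;
		n++;

		/* strsep() sets tmp_str to NULL once the last token is consumed. */
		if (!tmp_str)
			break;
		while (isspace((unsigned char)*tmp_str))
			tmp_str++;
	}
	return n;
}
```

Under these assumptions, parsing "0 305 750\n" with stop_on_empty set to 0
fails with -EINVAL, because the fourth strsep() call returns the empty string
that follows the newline. With stop_on_empty set to 1, the same input yields
three values.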


Thread overview: 9+ messages
2020-07-29 23:11 Paweł Gronowski
2020-07-30  2:53 ` Alex Deucher
2020-07-30 14:49   ` Matt Coffin
2020-07-31  0:31     ` Paweł Gronowski
2020-07-31  0:34       ` Matt Coffin
2020-07-31 13:34         ` Paweł Gronowski
2020-07-31 20:20           ` Paweł Gronowski [this message]
2020-07-31 21:25             ` Matt Coffin
2020-07-31 21:39             ` Matt Coffin
