From mboxrd@z Thu Jan 1 00:00:00 1970
From: Eric Huang
Subject: Re: [PATCH] drm/amd/powerply: fix power reading on Fiji
Date: Fri, 30 Mar 2018 11:08:53 -0400
References: <1522351312-27566-1-git-send-email-JinHuiEric.Huang@amd.com>
List-Id: Discussion list for AMD gfx
To: "Zhu, Rex", "Deucher, Alexander", "amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org"

I reproduced the issue reported by the customer. While running an HSA test, I repeatedly read the power via AGT and via rocm-smi (the driver), with the power limit on a Fiji set to 175 W. The results from AGT are all below 175 W, but many of the results from the driver are above 175 W, and some are almost double that. So your test cases are not sufficient; you should also run some OCL and HSA tests.

I have also tested 100 ms and 150 ms, and some of the results are still wrong. 200 ms is good. It seems that more sampling gives a more accurate reading.

The theoretical period is quoted from the SMU team and the tools team. AGT uses a period of more than 1 sec. I don't know how long one cycle of DPM tasks takes, but is the sampling based on the DPM task cycle? We should ask the SMU team to confirm.

Regards,
Eric
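
The effect described above, where a longer sampling window gives steadier readings, can be illustrated with a toy model. The sketch below is not driver code: it uses an invented square-wave load (300 W for 100 ms, then 50 W for 100 ms, mean 175 W), and all function names are hypothetical. It only shows that a short averaging window can land entirely inside a burst and report far more than the long-run mean, while a window that spans a whole load period cannot. Whether this is what actually happens on Fiji is not confirmed in this thread.

```c
#include <assert.h>
#include <stddef.h>

/* Synthetic load, NOT measured data: 300 W for 100 ms, then 50 W for
 * 100 ms, repeating. One sample per millisecond, long-run mean 175 W. */
static double synthetic_power_w(size_t t_ms)
{
    return (t_ms % 200 < 100) ? 300.0 : 50.0;
}

/* Mean of the synthetic trace over [start_ms, start_ms + window_ms). */
double window_average_w(size_t start_ms, size_t window_ms)
{
    assert(window_ms > 0);

    double sum = 0.0;
    for (size_t i = 0; i < window_ms; i++)
        sum += synthetic_power_w(start_ms + i);
    return sum / (double)window_ms;
}

/* Largest reading a window of the given length can produce, found by
 * trying every start offset within one 200 ms load period. */
double worst_case_reading_w(size_t window_ms)
{
    double worst = 0.0;
    for (size_t start = 0; start < 200; start++) {
        double avg = window_average_w(start, window_ms);
        if (avg > worst)
            worst = avg;
    }
    return worst;
}
```

With this made-up trace, a 20 ms window can read as high as 300 W, a 150 ms window can still read above the 175 W mean, and a 200 ms window always reads exactly 175 W.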


On 03/30/2018 03:52 AM, Zhu, Rex wrote:

>> Power value is wrong reported by customer.

Hi Eric,

What is the wrong value the customer reported?

On my end, there is no big difference between 20 ms, 200 ms, and 2 s. I tested on Fiji/Tonga with the GPU idle and with fullscreen glxgears running.

Why do we need 50 ms?

How long does the SMU core take to complete one cycle of DPM tasks? I tested it, and it is less than 1 ms.

So when we delay 20 ms, the output is the average value of more than 20 samples.

Best Regards

Rex

 

 

From: amd-gfx [mailto:amd-gfx-bounces-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org] On Behalf Of Deucher, Alexander
Sent: Friday, March 30, 2018 4:00 AM
To: Huang, JinHuiEric; amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org
Subject: Re: [PATCH] drm/amd/powerply: fix power reading on Fiji

 

Fiji and Tonga, I presume. The current code seems to work fine on Tonga at least.

 

Alex


From: Huang, JinHuiEric
Sent: Thursday, March 29, 2018 3:58:42 PM
To: Deucher, Alexander; amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org
Subject: Re: [PATCH] drm/amd/powerply: fix power reading on Fiji

 

Right. This is only for Fiji. We should use PPSMC_MSG_GetCurrPkgPwr on Polaris.

 

Thanks,

Eric

 

On 2018-03-29 03:54 PM, Deucher, Alexander wrote:

Thanks. Patch is:

Acked-by: Alex Deucher <alexander.deucher-5C7GfCeVMHo@public.gmane.org>

Care to make a patch to use PPSMC_MSG_GetCurrPkgPwr on polaris boards so we don't have to worry about the delay on them?

 

Alex


From: Huang, JinHuiEric
Sent: Thursday, March 29, 2018 3:40:22 PM
To: Deucher, Alexander; amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org
Subject: Re: [PATCH] drm/amd/powerply: fix power reading on Fiji

 

This reading method is shared with the AGT tool only on Fiji, because the SMU FW doesn't support the PPSMC_MSG_GetCurrPkgPwr message on Fiji. Since Polaris10, however, PPSMC_MSG_GetCurrPkgPwr has been supported. We also use PPSMC_MSG_GetCurrPkgPwr on Vega, where the SMU FW controls the sampling period, so the driver does not need to care about it.

 

Eric
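
The split described in this message can be sketched as a small selection function. This is an illustration only, with hypothetical enum and function names rather than the driver's own; it models the arrangement discussed in the thread (status-log sampling where the firmware lacks PPSMC_MSG_GetCurrPkgPwr, the direct message from Polaris10 onward). Treating Tonga like Fiji is an assumption, since the thread only presumes it.

```c
#include <stdbool.h>

/* Hypothetical identifiers for illustration; not the driver's own enums. */
enum demo_asic {
    DEMO_ASIC_TONGA,
    DEMO_ASIC_FIJI,
    DEMO_ASIC_POLARIS10,
    DEMO_ASIC_VEGA10,
};

enum demo_power_method {
    DEMO_METHOD_PM_STATUS_LOG,    /* start log, wait, then sample */
    DEMO_METHOD_GET_CURR_PKG_PWR  /* one SMU message; firmware owns the period */
};

/* Pick the power-reading method for a given ASIC. */
enum demo_power_method demo_pick_power_method(enum demo_asic asic)
{
    switch (asic) {
    case DEMO_ASIC_POLARIS10:
    case DEMO_ASIC_VEGA10:
        return DEMO_METHOD_GET_CURR_PKG_PWR;
    case DEMO_ASIC_FIJI:
    case DEMO_ASIC_TONGA: /* assumption: presumed to behave like Fiji */
    default:
        return DEMO_METHOD_PM_STATUS_LOG;
    }
}

/* True when the driver itself has to choose the sampling delay. */
bool demo_driver_owns_sampling_delay(enum demo_asic asic)
{
    return demo_pick_power_method(asic) == DEMO_METHOD_PM_STATUS_LOG;
}
```

Under this model, only the status-log path leaves the choice of delay (20 ms, 200 ms, and so on) to the driver.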

 

On 2018-03-29 03:31 PM, Deucher, Alexander wrote:

Do you know what the sampling period is on vega?  We should try and be consistent.  How about making this selectable via hwmon:

power[1-*]_average_interval       Power use averaging interval.  A poll
                          notification is sent to this file if the
                          hardware changes the averaging interval.
                          Unit: milliseconds
                          RW
 
power[1-*]_average_interval_max   Maximum power use averaging interval
                          Unit: milliseconds
                          RO
 
power[1-*]_average_interval_min   Minimum power use averaging interval
                          Unit: milliseconds
                          RO

 

Then the user can select the interval they want.
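
If the interval were exposed through the hwmon attributes listed above, user space could clamp a requested value to the advertised range and write it back. The sketch below is a minimal illustration, not an existing amdgpu interface: this thread only proposes these files, and the helper names are hypothetical. The attribute names come from the hwmon listing quoted above; the directory they would live in (for example some /sys/class/hwmon/hwmonN/) varies per system and is an assumption.

```c
#include <stdio.h>

/* Read one integer from a sysfs-style attribute file.
 * Returns 0 on success, -1 on failure. */
int read_attr_long(const char *path, long *out)
{
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;
    int ok = fscanf(f, "%ld", out) == 1;
    fclose(f);
    return ok ? 0 : -1;
}

/* Write one integer to a sysfs-style attribute file.
 * Returns 0 on success, -1 on failure. */
int write_attr_long(const char *path, long value)
{
    FILE *f = fopen(path, "w");
    if (!f)
        return -1;
    int ok = fprintf(f, "%ld\n", value) > 0;
    if (fclose(f) != 0) /* write errors may only surface on close */
        ok = 0;
    return ok ? 0 : -1;
}

/* Clamp a requested interval (ms) to the range read from the
 * *_average_interval_min and *_average_interval_max attributes. */
long clamp_interval_ms(long requested, long min_ms, long max_ms)
{
    if (requested < min_ms)
        return min_ms;
    if (requested > max_ms)
        return max_ms;
    return requested;
}
```

For example, with a minimum of 50 ms and a maximum of 4000 ms (the bounds quoted in the patch comment), a request for 20 ms would be clamped to 50 ms before being written to power1_average_interval.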

 

Alex


From: amd-gfx <amd-gfx-bounces-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org> on behalf of Eric Huang <JinHuiEric.Huang-5C7GfCeVMHo@public.gmane.org>
Sent: Thursday, March 29, 2018 3:21:52 PM
To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org
Cc: Huang, JinHuiEric
Subject: [PATCH] drm/amd/powerply: fix power reading on Fiji

 

The customer reported a wrong power value. It is a regression caused by

commit a7c7bc4c0c47eaac77b8fa92f0672032df7f4254
Author: Rex Zhu <Rex.Zhu-5C7GfCeVMHo@public.gmane.org>
Date:   Mon Mar 27 15:32:59 2017 +0800

    drm/amd/powerplay: reduce sample period time

    for power readings.

    Signed-off-by: Rex Zhu <Rex.Zhu-5C7GfCeVMHo@public.gmane.org>
    Reviewed-by: Alex Deucher <alexander.deucher-5C7GfCeVMHo@public.gmane.org>
    Signed-off-by: Alex Deucher <alexander.deucher-5C7GfCeVMHo@public.gmane.org>

The theoretical sampling period is from 50ms to 4sec. The original 2sec
is long but correct, and 20ms is too short. Change it to a more
reasonable 200ms.

Signed-off-by: Eric Huang <JinHuiEric.Huang-5C7GfCeVMHo@public.gmane.org>
---
 drivers/gpu/drm/amd/powerplay/hwmgr/smu7_hwmgr.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/powerplay/hwmgr/smu7_hwmgr.c b/drivers/gpu/drm/amd/powerplay/hwmgr/smu7_hwmgr.c
index a03b7fe..7631d80 100644
--- a/drivers/gpu/drm/amd/powerplay/hwmgr/smu7_hwmgr.c
+++ b/drivers/gpu/drm/amd/powerplay/hwmgr/smu7_hwmgr.c
@@ -3377,7 +3377,8 @@ static int smu7_get_gpu_power(struct pp_hwmgr *hwmgr,
                         "Failed to start pm status log!",
                         return -1);
 
-       msleep_interruptible(20);
+       /* Sampling period from 50ms to 4sec */
+       msleep_interruptible(200);
 
         PP_ASSERT_WITH_CODE(!smum_send_msg_to_smc(hwmgr,
                         PPSMC_MSG_PmStatusLogSample),
--
2.7.4

_______________________________________________
amd-gfx mailing list
amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

 

 

