* k10temp: ZEN3 readings are broken
@ 2020-12-22  1:45 Gabriel C
  2020-12-22  3:58 ` Guenter Roeck
  0 siblings, 1 reply; 14+ messages in thread
From: Gabriel C @ 2020-12-22  1:45 UTC (permalink / raw)
  To: Guenter Roeck; +Cc: linux-hwmon, LKML, Wei Huang

Hello Guenter,

While trying to add ZEN3 support to the out-of-tree zenpower module, I found out
that the in-kernel k10temp driver is broken with ZEN3 (and even partially with ZEN2).

commit 55163a1c00fcb526e2aa9f7f952fb38d3543da5e added:

case 0x0 ... 0x1:       /* Zen3 */

However, this is wrong: for ZEN3 we need to look for model 0x21.
Could these be steppings?

PLANE0/1 are also wrong: Icore reads zero even after fixing
the model.

Looking at these (something is also missing for the 0x71 ZEN2 Ryzens),
they should be:

PLANE0  (ZEN_SVI_BASE + 0x10)
PLANE1  (ZEN_SVI_BASE + 0xc)

This is the same as for ZEN2 >= 0x71. Since this is not really
documented, and I only have confirmation of these numbers from
*somewhere* :-), I created only a demo patch.
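A minimal user-space sketch of the per-model SVI plane selection discussed here. It assumes ZEN_SVI_BASE is 0x0005A000, as defined in k10temp.c (verify against your tree); the struct and function names are hypothetical. The family 19h model 0x21 and family 17h model 0x71 offsets are the unconfirmed values proposed in this message, while the model 0x0/0x1 offsets are the ones already in the driver. Each model gets its own branch, so server parts can never pick up the desktop offsets.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: SVI telemetry base as defined in k10temp.c; check your tree. */
#define ZEN_SVI_BASE 0x0005A000u

/* Hypothetical holder for the two SVI telemetry register addresses. */
struct svi_planes {
	uint32_t plane0;	/* feeds svi_addr[0] (Vcore/Icore) in the driver */
	uint32_t plane1;	/* feeds svi_addr[1] (Vsoc/Isoc) in the driver */
};

/*
 * Fill *out with the SVI plane addresses for a (family, model) pair and
 * return true, or return false if the model is not listed.
 * The 0x21 and 0x71 offsets are the unconfirmed values proposed above.
 */
bool svi_planes_for(unsigned int family, unsigned int model,
		    struct svi_planes *out)
{
	if (family == 0x19 && (model == 0x0 || model == 0x1)) {
		/* ZEN3 server/TR: offsets already used by the driver */
		out->plane0 = ZEN_SVI_BASE + 0x14;
		out->plane1 = ZEN_SVI_BASE + 0x10;
		return true;
	}
	if ((family == 0x19 && model == 0x21) ||	/* ZEN3 Ryzen desktop */
	    (family == 0x17 && model == 0x71)) {	/* ZEN2 Ryzen desktop */
		out->plane0 = ZEN_SVI_BASE + 0x10;
		out->plane1 = ZEN_SVI_BASE + 0xc;
		return true;
	}
	return false;	/* unknown model: report no voltage/current */
}
```

For example, svi_planes_for(0x19, 0x21, &p) yields plane0 = 0x5A010 and plane1 = 0x5A00C, while an unlisted model returns false and would get no voltage/current attributes.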

I would like AMD people to really have a look at the driver and
confirm the changes, since getting information from *somewhere*
doesn't mean it is 100% correct. However, the driver
is working with these changes.

In any case, the model needs to be changed to 0x21, even if we leave the
other readings broken.

Here is my demo patch:

https://crazy.dev.frugalware.org/fix-ZEN2-ZEN3-test1.patch

There is also some discussion and testing of both drivers here:

https://github.com/ocerman/zenpower/issues/39


Best Regards,

Gabriel C


* Re: k10temp: ZEN3 readings are broken
  2020-12-22  1:45 k10temp: ZEN3 readings are broken Gabriel C
@ 2020-12-22  3:58 ` Guenter Roeck
  2020-12-22  4:33   ` Wei Huang
                     ` (2 more replies)
  0 siblings, 3 replies; 14+ messages in thread
From: Guenter Roeck @ 2020-12-22  3:58 UTC (permalink / raw)
  To: Gabriel C; +Cc: linux-hwmon, LKML, Wei Huang

Hi,

On 12/21/20 5:45 PM, Gabriel C wrote:
> Hello Guenter,
> 
> while trying to add ZEN3 support for zenpower out of tree modules, I find out
> the in-kernel k10temp driver is broken with ZEN3 ( and partially ZEN2 even ).
> 
> [ ... ]

Thanks for the information. However, since I do not have time to actively maintain
the driver, since each chip variant seems to use different addresses and scales,
and since the information about voltages and currents is unpublished by AMD,
I'll remove support for voltage/current readings from the upstream driver.
I plan to send the patch doing that to Linus shortly after the merge window
closes (or even before that).

Thanks,
Guenter


* Re: k10temp: ZEN3 readings are broken
  2020-12-22  3:58 ` Guenter Roeck
@ 2020-12-22  4:33   ` Wei Huang
  2020-12-22  5:09     ` Gabriel C
  2020-12-22  4:33   ` Gabriel C
  2020-12-23 10:41   ` Jan Engelhardt
  2 siblings, 1 reply; 14+ messages in thread
From: Wei Huang @ 2020-12-22  4:33 UTC (permalink / raw)
  To: Guenter Roeck, Gabriel C; +Cc: linux-hwmon, LKML



On 12/21/20 9:58 PM, Guenter Roeck wrote:
> Hi,
> 
> On 12/21/20 5:45 PM, Gabriel C wrote:
>> Hello Guenter,
>>
>> while trying to add ZEN3 support for zenpower out of tree modules, I find out
>> the in-kernel k10temp driver is broken with ZEN3 ( and partially ZEN2 even ).
>>
>> commit 55163a1c00fcb526e2aa9f7f952fb38d3543da5e added:
>>
>> case 0x0 ... 0x1:       /* Zen3 */
>>
>> however, this is wrong, we look for a model which is 0x21 for ZEN3,
>> these seem to
>> be steppings?

These are model numbers for server CPUs. I believe 0x21 is for desktop
CPUs. In other words, current upstream code doesn't support your CPUs.
You are welcome to add support for 0x21, but it is wrong to remove
support for 0x00/0x01.

>>
>> Also, PLANE0/1 are wrong too, Icore has zero readouts even when fixing
>> the model.
>>
>> Looking at these ( there is something missing for 0x71 ZEN2 Ryzens
>> also ) that should be:
>>
>> PLANE0  (ZEN_SVI_BASE + 0x10)
>> PLANE1  (ZEN_SVI_BASE + 0xc)

Same problem here with model 0x71. 0x31 is for server CPUs.

>>
>> [ ... ]
>>
>> There is my demo patch:
>>
>> https://crazy.dev.frugalware.org/fix-ZEN2-ZEN3-test1.patch

For family 19h, the patch should look like the following. But this might not
matter anymore, as suggested by Guenter below.

 /* F19h thermal registers through SMN */
 #define F19H_M01_SVI_TEL_PLANE0			(ZEN_SVI_BASE + 0x14)
 #define F19H_M01_SVI_TEL_PLANE1			(ZEN_SVI_BASE + 0x10)
+/* Zen3 Ryzen */
+#define F19H_M21H_SVI_TEL_PLANE0		(ZEN_SVI_BASE + 0x10)
+#define F19H_M21H_SVI_TEL_PLANE1		(ZEN_SVI_BASE + 0xc)

Then add the following change:

 		switch (boot_cpu_data.x86_model) {
 		case 0x0 ... 0x1:	/* Zen3 */
 			data->show_current = true;
 			data->svi_addr[0] = F19H_M01_SVI_TEL_PLANE0;
 			data->svi_addr[1] = F19H_M01_SVI_TEL_PLANE1;
 			data->cfactor[0] = F19H_M01H_CFACTOR_ICORE;
 			data->cfactor[1] = F19H_M01H_CFACTOR_ISOC;
 			k10temp_get_ccd_support(pdev, data, 8);
 			break;
+		case 0x21:	/* Zen3 */
+			data->show_current = true;
+			data->svi_addr[0] = F19H_M21H_SVI_TEL_PLANE0;
+			data->svi_addr[1] = F19H_M21H_SVI_TEL_PLANE1;
+			data->cfactor[0] = F19H_M01H_CFACTOR_ICORE;
+			data->cfactor[1] = F19H_M01H_CFACTOR_ISOC;
+			k10temp_get_ccd_support(pdev, data, 8);
+			break;

>> [ ... ]
>
> Thanks for the information. However, since I do not have time to actively maintain
> the driver, since each chip variant seems to use different addresses and scales,
> and since the information about voltages and currents is unpublished by AMD,
> I'll remove support for voltage/current readings from the upstream driver.
> I plan to send the patch doing that to Linus shortly after the commit window
> closes (or even before that).

I believe Guenter is talking about 
https://www.spinics.net/lists/linux-hwmon/msg10252.html.

> 
> Thanks,
> Guenter
> 


* Re: k10temp: ZEN3 readings are broken
  2020-12-22  3:58 ` Guenter Roeck
  2020-12-22  4:33   ` Wei Huang
@ 2020-12-22  4:33   ` Gabriel C
  2020-12-22  6:07     ` Guenter Roeck
  2020-12-22  6:16     ` Guenter Roeck
  2020-12-23 10:41   ` Jan Engelhardt
  2 siblings, 2 replies; 14+ messages in thread
From: Gabriel C @ 2020-12-22  4:33 UTC (permalink / raw)
  To: Guenter Roeck; +Cc: linux-hwmon, LKML, Wei Huang

On Tue, Dec 22, 2020 at 04:58, Guenter Roeck <linux@roeck-us.net> wrote:
>
> [ ... ]
>
> Thanks for the information. However, since I do not have time to actively maintain
> the driver, since each chip variant seems to use different addresses and scales,
> and since the information about voltages and currents is unpublished by AMD,
> I'll remove support for voltage/current readings from the upstream driver.
> I plan to send the patch doing that to Linus shortly after the commit window
> closes (or even before that).

Yes, I saw that commit, and it is a shame that AMD is unwilling to
support 'sensors' for its CPUs in 2020. I can understand why you can't
maintain that mess, but I don't understand AMD.

However, this is not only about the voltages. ZEN3 Ryzen desktop CPUs
have a model ID of 0x21, so while only 0x0 & 0x1 are matched now, we
only hit the else code path, which shows some weird temps and no CCD info.

See:

smpboot: CPU0: AMD Ryzen 7 5800X 8-Core Processor (family: 0x19, model: 0x21, stepping: 0x0)
smpboot: CPU0: AMD Ryzen 9 5900X 12-Core Processor (family: 0x19, model: 0x21, stepping: 0x0)

etc...
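The family and model numbers printed by smpboot are decoded from the CPUID leaf 1 EAX signature. A sketch of that decoding, following the standard x86 rule (as far as I know the same one the kernel applies: extended family only when the base family is 0xF, extended model for family 6 and above); the helper names are hypothetical:

```c
#include <stdint.h>

/* Family: bits 11:8; if that is 0xF, add the extended family (bits 27:20). */
unsigned int decode_family(uint32_t sig)
{
	unsigned int fam = (sig >> 8) & 0xf;

	if (fam == 0xf)
		fam += (sig >> 20) & 0xff;
	return fam;
}

/*
 * Model: bits 7:4, with the extended model (bits 19:16) supplying the
 * high nibble for family 6 and above.
 */
unsigned int decode_model(uint32_t sig)
{
	unsigned int model = (sig >> 4) & 0xf;

	if (decode_family(sig) >= 0x6)
		model += ((sig >> 16) & 0xf) << 4;
	return model;
}

/* Stepping: bits 3:0. */
unsigned int decode_stepping(uint32_t sig)
{
	return sig & 0xf;
}
```

For example, the illustrative signature 0x00A20F10 decodes to family 0x19, model 0x21, stepping 0, and 0x00870F10 decodes to family 0x17 (23), model 0x71 (113), the same numbers as the Ryzen 7 3700X /proc/cpuinfo line quoted later in this thread.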

So we need at least:

...
case 0x0 ... 0x1: /* ZEN3 SP3 ?!? */
case 0x21: /* ZEN3 Ryzen Desktop */
...

I believe 0x0 & 0x1 are not-yet-released EPYC/TR CPUs based on ZEN3.
At least that is what the weird amd_energy driver added support for, and since
it only supports fam 17h model 0x31, which is TR 3000 & SP3 Rome, I guess
fam 19h 0x1 is TR/SP3 ZEN3.

(BTW, off-topic: this amd_energy driver should be removed or depend on BROKEN,
since it works as root only and breaks the sensors command output.)

If that is the case, even if you remove the code, I think I
understand how the PLANEX registers work
and can at least help the out-of-tree driver with them.

Maybe one day AMD will get serious, who knows.

>
> Thanks,
> Guenter

Best Regards,

Gabriel C.


* Re: k10temp: ZEN3 readings are broken
  2020-12-22  4:33   ` Wei Huang
@ 2020-12-22  5:09     ` Gabriel C
  2020-12-22  6:08       ` Wei Huang
  0 siblings, 1 reply; 14+ messages in thread
From: Gabriel C @ 2020-12-22  5:09 UTC (permalink / raw)
  To: Wei Huang; +Cc: Guenter Roeck, linux-hwmon, LKML

On Tue, Dec 22, 2020 at 05:33, Wei Huang <wei.huang2@amd.com> wrote:
>
>
>
> On 12/21/20 9:58 PM, Guenter Roeck wrote:
> > Hi,
> >
> > On 12/21/20 5:45 PM, Gabriel C wrote:
> >> Hello Guenter,
> >>
> >> while trying to add ZEN3 support for zenpower out of tree modules, I find out
> >> the in-kernel k10temp driver is broken with ZEN3 ( and partially ZEN2 even ).
> >>
> >> commit 55163a1c00fcb526e2aa9f7f952fb38d3543da5e added:
> >>
> >> case 0x0 ... 0x1:       /* Zen3 */
> >>
> >> however, this is wrong, we look for a model which is 0x21 for ZEN3,
> >> these seem to
> >> be steppings?
>
> These are model numbers for server CPUs. I believe 0x21 is for desktop
> CPUs. In other words, current upstream code doesn't support your CPUs.
> You are welcomed to add support for 0x21, but it is wrong to remove
> support for 0x00/0x01.

I figured that out myself after seeing what was committed to the amd_energy driver.
It would have been better if you, as the author of the patch, had written a clearer
commit message to begin with.


commit 55163a1c00fcb526e2aa9f7f952fb38d3543da5e
Author: Wei Huang <wei.huang2@amd.com>
Date:   Mon Sep 14 15:07:15 2020 -0500

   hwmon: (k10temp) Add support for Zen3 CPUs
....

Which you didn't. That should read:

"Added support for NOT yet released SP3 ZEN3 CPU"

Right?

>
> >>
> >> Also, PLANE0/1 are wrong too, Icore has zero readouts even when fixing
> >> the model.
> >>
> >> Looking at these ( there is something missing for 0x71 ZEN2 Ryzens
> >> also ) that should be:
> >>
> >> PLANE0  (ZEN_SVI_BASE + 0x10)
> >> PLANE1  (ZEN_SVI_BASE + 0xc)
>
> Same problem here with model 0x71. 0x31 is for server CPUs.

Yes, that is why I split both in my 'guess what the eff this is about' patch.

0x31 is TR 3000 / SP3 ZEN2, while 0x71 is ZEN2 desktop.
>
> >>
> >> [ ... ]
> >>
> >> There is my demo patch:
> >>
> >> https://crazy.dev.frugalware.org/fix-ZEN2-ZEN3-test1.patch
>
> For family 19h, the patch should look like. But this might not matter
> anymore as suggested by Guenter below.
>
>   /* F19h thermal registers through SMN */
> #define F19H_M01_SVI_TEL_PLANE0                 (ZEN_SVI_BASE + 0x14)
> #define F19H_M01_SVI_TEL_PLANE1                 (ZEN_SVI_BASE + 0x10)
> +/* Zen3 Ryzen */
> +#define F19H_M21H_SVI_TEL_PLANE0               (ZEN_SVI_BASE + 0x10)
> +#define F19H_M21H_SVI_TEL_PLANE1               (ZEN_SVI_BASE + 0xc)
>
> Then add the following change:
>
>                 switch (boot_cpu_data.x86_model) {
>                 case 0x0 ... 0x1:       /* Zen3 */
>                         data->show_current = true;
>                         data->svi_addr[0] = F19H_M01_SVI_TEL_PLANE0;
>                         data->svi_addr[1] = F19H_M01_SVI_TEL_PLANE1;
>                         data->cfactor[0] = F19H_M01H_CFACTOR_ICORE;
>                         data->cfactor[1] = F19H_M01H_CFACTOR_ISOC;
>                         k10temp_get_ccd_support(pdev, data, 8);
> +               case 0x21:      /* Zen3 */
> +                       data->show_current = true;
> +                       data->svi_addr[0] = F19H_M21H_SVI_TEL_PLANE0;
> +                       data->svi_addr[1] = F19H_M21H_SVI_TEL_PLANE1;
> +                       data->cfactor[0] = F19H_M01H_CFACTOR_ICORE;
> +                       data->cfactor[1] = F19H_M01H_CFACTOR_ISOC;
> +                       k10temp_get_ccd_support(pdev, data, 8);
>
> >>

You are a really funny guy.
After _all_, these are YOUR company's CPUs, and you want me to fix them without docs?
Sure I can, but the confusion started with your wrong commit message.

Besides, is that how AMD operates now?
Let the customer pay thousands of euros for HW and then tell
him to fix or add driver support himself? Very interesting.

And yes, it matters even after removing these.

case 0x0 ... 0x1:       /* Zen3 SP3 ( NOT YET RELEASED ) */
case 0x21:      /* Zen3 Ryzen Desktop  */
   ....

Right?

> >> [ ... ]
> >>
> >
> > Thanks for the information. However, since I do not have time to actively maintain
> > the driver, since each chip variant seems to use different addresses and scales,
> > and since the information about voltages and currents is unpublished by AMD,
> > I'll remove support for voltage/current readings from the upstream driver.
> > I plan to send the patch doing that to Linus shortly after the commit window
> > closes (or even before that).
>
> I believe Guenter is talking about
> https://www.spinics.net/lists/linux-hwmon/msg10252.html.

I know, and don't get me started on your reply to that, because
it looks like you believe the Linux community or your Linux customers
are somewhat stupid.

Best Regards,

Gabriel C.


* Re: k10temp: ZEN3 readings are broken
  2020-12-22  4:33   ` Gabriel C
@ 2020-12-22  6:07     ` Guenter Roeck
  2020-12-22  6:16     ` Guenter Roeck
  1 sibling, 0 replies; 14+ messages in thread
From: Guenter Roeck @ 2020-12-22  6:07 UTC (permalink / raw)
  To: Gabriel C; +Cc: linux-hwmon, LKML, Wei Huang

On Tue, Dec 22, 2020 at 05:33:17AM +0100, Gabriel C wrote:
> 
> ( BTW off-topic this amd_energ driver should be removed or depend on BROKEN,
> since is working as root only and breaks the sensors command output )
> 

That is because of a security issue. It just needs to be reworked
to cache readings for a while to avoid that. Any volunteers?
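One way to read "cache readings for a while": refresh the hardware counter at most once per interval and hand every other reader the cached value, presumably limiting how finely an unprivileged reader can sample it. A minimal sketch with hypothetical names; it is not the amd_energy code, and the interval is an arbitrary caller choice:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical cache entry; not the amd_energy driver's data structures. */
struct cached_reading {
	uint64_t value;		/* last value read from the hardware counter */
	uint64_t stamp_ms;	/* when that value was read, in milliseconds */
	bool valid;		/* false until the first hardware read */
};

/* True if the caller should read the hardware again. */
bool cache_is_stale(const struct cached_reading *c, uint64_t now_ms,
		    uint64_t min_interval_ms)
{
	return !c->valid || now_ms - c->stamp_ms >= min_interval_ms;
}

/* Record a fresh hardware reading taken at now_ms. */
void cache_update(struct cached_reading *c, uint64_t value, uint64_t now_ms)
{
	c->value = value;
	c->stamp_ms = now_ms;
	c->valid = true;
}
```

A reader would do `if (cache_is_stale(&c, now, interval)) cache_update(&c, read_counter(), now);` and return `c.value`, where read_counter() is a hypothetical stand-in for the privileged hardware access. In the kernel the timestamp would come from jiffies or ktime and the update would need locking; both are left out of this sketch.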

Guenter


* Re: k10temp: ZEN3 readings are broken
  2020-12-22  5:09     ` Gabriel C
@ 2020-12-22  6:08       ` Wei Huang
  0 siblings, 0 replies; 14+ messages in thread
From: Wei Huang @ 2020-12-22  6:08 UTC (permalink / raw)
  To: Gabriel C; +Cc: Guenter Roeck, linux-hwmon, LKML



On 12/21/20 11:09 PM, Gabriel C wrote:
> On Tue, Dec 22, 2020 at 05:33, Wei Huang <wei.huang2@amd.com> wrote:
>>
>>
>>
>> On 12/21/20 9:58 PM, Guenter Roeck wrote:
>>> Hi,
>>>
>>> On 12/21/20 5:45 PM, Gabriel C wrote:
>>>> Hello Guenter,
>>>>
>>>> while trying to add ZEN3 support for zenpower out of tree modules, I find out
>>>> the in-kernel k10temp driver is broken with ZEN3 ( and partially ZEN2 even ).
>>>>
>>>> commit 55163a1c00fcb526e2aa9f7f952fb38d3543da5e added:
>>>>
>>>> case 0x0 ... 0x1:       /* Zen3 */
>>>>
>>>> however, this is wrong, we look for a model which is 0x21 for ZEN3,
>>>> these seem to
>>>> be steppings?
>>
>> These are model numbers for server CPUs. I believe 0x21 is for desktop
>> CPUs. In other words, current upstream code doesn't support your CPUs.
>> You are welcomed to add support for 0x21, but it is wrong to remove
>> support for 0x00/0x01.
> 
> I figured that myself after seeing what was committed to amd_energy driver.
> Would be better you as the author of the patch to have a better commit
> message to start with.
> 
> 
> commit 55163a1c00fcb526e2aa9f7f952fb38d3543da5e
> Author: Wei Huang <wei.huang2@amd.com>
> Date:   Mon Sep 14 15:07:15 2020 -0500
> 
>     hwmon: (k10temp) Add support for Zen3 CPUs
> ....
> 
> Which you didn't. That should read:
> 
> "Added support for NOT yet released SP3 ZEN3 CPU"
> 
> Right?

Yes. The subject line could have been clearer, with something like "Add
support for Zen3 Server and TR CPUs".

>
> [ ... ]
>
> You are a really funny guy.
> After _all_ these are YOUR Company CPUs, and want me to fix these without docs?
> Sure I can, but the confusion started with your wrong commit message.

Sorry for the confusion. The review comments above were merely meant to point
out that server parts won't be supported if 0x0..0x1 is removed. I do
appreciate the test results and bug report. The original commit
unfortunately doesn't work on your CPUs. It was indeed a misfire on my
side.


* Re: k10temp: ZEN3 readings are broken
  2020-12-22  4:33   ` Gabriel C
  2020-12-22  6:07     ` Guenter Roeck
@ 2020-12-22  6:16     ` Guenter Roeck
  2020-12-22 15:26       ` Gabriel C
  1 sibling, 1 reply; 14+ messages in thread
From: Guenter Roeck @ 2020-12-22  6:16 UTC (permalink / raw)
  To: Gabriel C; +Cc: linux-hwmon, LKML, Wei Huang

On Tue, Dec 22, 2020 at 05:33:17AM +0100, Gabriel C wrote:
[ ... ]
> At least is what the weird amd_energy driver added and since is only supporting
> fam 17h model 0x31 which is TR 3000 & SP3 Rome, I guess fam 19h 0x1 is
> TR/SP3 ZEN3.

The limited model support is because people nowadays are not willing to
accept that reported values may not always be perfect ... and the reported
energy for non-server parts is known to be not always perfect. Kind of an
odd situation: If we support non-server parts, we have people complain
that values are not perfect. If we only support server parts, we have
people complain that only server parts are supported. For us, that is
a lose-lose situation. I used to think that it is better to report
_something_, but the (sometimes loud) complaints about lack of perfection
taught me a lesson. So now my reaction is to drop support if I get
complaints about lack of perfection.

Guenter


* Re: k10temp: ZEN3 readings are broken
  2020-12-22  6:16     ` Guenter Roeck
@ 2020-12-22 15:26       ` Gabriel C
  2020-12-22 15:51         ` Guenter Roeck
  0 siblings, 1 reply; 14+ messages in thread
From: Gabriel C @ 2020-12-22 15:26 UTC (permalink / raw)
  To: Guenter Roeck; +Cc: linux-hwmon, LKML, Wei Huang

On Tue, Dec 22, 2020 at 07:16, Guenter Roeck <linux@roeck-us.net> wrote:
>
> On Tue, Dec 22, 2020 at 05:33:17AM +0100, Gabriel C wrote:
> [ ... ]
> > At least is what the weird amd_energy driver added and since is only supporting
> > fam 17h model 0x31 which is TR 3000 & SP3 Rome, I guess fam 19h 0x1 is
> > TR/SP3 ZEN3.
>
> The limited model support is because people nowadays are not willing to
> accept that reported values may not always be perfect ... and the reported
> energy for non-server parts is known to be not always perfect. Kind of an
> odd situation: If we support non-server parts, we have people complain
> that values are not perfect. If we only support server parts, we have
> people complain that only server parts are supported. For us, that is
> a lose-lose situation. I used to think that is is better to report
> _something_, but the (sometimes loud) complaints about lack of perfection
> teached me a lesson. So now my reaction is to drop support if I get
> complaints about lack of perfection.
>

I agree it is an odd situation with these modules, but having
something is better than nothing.
As for the amd_energy driver, yes, it is off by 2%-5% or so on some
platforms, but without that support in the kernel, regardless of the
module, we can never reach perfection or even get close to it.

For both the k10temp and amd_energy drivers, I suggest not dropping the
support but adding module options that are disabled by default, much like
many laptop platform drivers have for various reasons.

The amd_energy driver could have an any_ryzen option that is turned
off by default.
That way people can decide whether they want to use it even though it is
not 100% perfect, and can report back on which platforms the reporting is
accurate.
Waiting for AMD to tell us which model IDs they consider accurate is
like waiting for pigs to fly.

The same goes for the k10temp module: an experimental_voltage_report module
option would be fine for now, I think.
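A minimal sketch of the opt-in gating suggested here, written as plain user-space C. The parameter name is taken from the suggestion and is hypothetical (it is not an existing k10temp parameter); in a kernel module the flag would typically be a `bool` registered with module_param().

```c
#include <stdbool.h>

/*
 * Stand-in for an opt-in module parameter. In a kernel module this would
 * be roughly:
 *   static bool experimental_voltage_report;
 *   module_param(experimental_voltage_report, bool, 0444);
 */
bool experimental_voltage_report = false;	/* off by default */

/* Expose voltage/current only if the user opted in AND the model is known. */
bool should_report_voltage(bool model_has_known_offsets)
{
	return experimental_voltage_report && model_has_known_offsets;
}
```

Keeping the model check separate means that opting in still cannot enable readings on a model whose register offsets are unknown.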

I'm also sure owners of AMD HW will help out with optimizing and maintaining the code.

> Guenter

Best Regards,

Gabriel C.


* Re: k10temp: ZEN3 readings are broken
  2020-12-22 15:26       ` Gabriel C
@ 2020-12-22 15:51         ` Guenter Roeck
  0 siblings, 0 replies; 14+ messages in thread
From: Guenter Roeck @ 2020-12-22 15:51 UTC (permalink / raw)
  To: Gabriel C; +Cc: linux-hwmon, LKML, Wei Huang

On 12/22/20 7:26 AM, Gabriel C wrote:
> On Tue, Dec 22, 2020 at 07:16, Guenter Roeck <linux@roeck-us.net> wrote:
>>
>> On Tue, Dec 22, 2020 at 05:33:17AM +0100, Gabriel C wrote:
>> [ ... ]
>>> At least is what the weird amd_energy driver added and since is only supporting
>>> fam 17h model 0x31 which is TR 3000 & SP3 Rome, I guess fam 19h 0x1 is
>>> TR/SP3 ZEN3.
>>
>> The limited model support is because people nowadays are not willing to
>> accept that reported values may not always be perfect ... and the reported
>> energy for non-server parts is known to be not always perfect. Kind of an
>> odd situation: If we support non-server parts, we have people complain
>> that values are not perfect. If we only support server parts, we have
>> people complain that only server parts are supported. For us, that is
>> a lose-lose situation. I used to think that is is better to report
>> _something_, but the (sometimes loud) complaints about lack of perfection
>> teached me a lesson. So now my reaction is to drop support if I get
>> complaints about lack of perfection.
>>
> 
> I agree it is an odd situation with these modules, but having
> something is better than nothing.

That is your opinion, and it used to be mine as well. As I said, I have
learned from the feedback.

> As for the amd_energy driver, yes it is off on some platforms by 2%-5%
> or alike but without having
> that support in the kernel, regardless of the module, we cannot ever
> come to perfection or near it.
> 
> For both k10temp & amd_energy driver I suggest to not drop the support
>  but add kernel modules
> options disabled by default, much like a lot laptop platform drivers
> have for different reasons.
> 

That would just add complexity for little gain. The code would still have
to be maintained, and as experience (and the out-of-tree driver) has shown
this is a never ending story. Plus, it would still be inaccurate, leading
to complaints, module parameter or not.

> The amd_energy driver could have some any_ryzen option which turned
> off by default.
> That way people may decide if they want to use it even when not 100%
> perfect and can report
> back on platforms the reporting is accurate.
> Waiting for AMD to give us ID of what may be in their eyes accurate is
> like waiting for pigs to fly.
> 
> The k10temp module much like the same, some experimental_voltage_report module
> option will be fine for now, I think.
> 
> I'm also sure owner of AMD HW will help out optimizing and maintaining the code.
> 

Not really. My experience is that almost everyone will just complain.
It was a bad idea to add voltage/current reporting to the k10temp driver,
and it is time to revert it. If someone else wants to write (and maintain)
a separate amd_voltage or similar driver, I am all open to accept it.

Note that even you suggested _dropping_ the amd_energy driver instead of
fixing it. I'll take that as a qed.

Guenter


* Re: k10temp: ZEN3 readings are broken
  2020-12-22  3:58 ` Guenter Roeck
  2020-12-22  4:33   ` Wei Huang
  2020-12-22  4:33   ` Gabriel C
@ 2020-12-23 10:41   ` Jan Engelhardt
  2020-12-23 11:22     ` Gabriel C
                       ` (2 more replies)
  2 siblings, 3 replies; 14+ messages in thread
From: Jan Engelhardt @ 2020-12-23 10:41 UTC (permalink / raw)
  To: Guenter Roeck; +Cc: Gabriel C, linux-hwmon, LKML, Wei Huang


On Tuesday 2020-12-22 04:58, Guenter Roeck wrote:
>On 12/21/20 5:45 PM, Gabriel C wrote:
>> Hello Guenter,
>> 
>> while trying to add ZEN3 support for zenpower out of tree modules, I find out
>> the in-kernel k10temp driver is broken with ZEN3 ( and partially ZEN2 even ).
>
>[...] since I do not have time to actively maintain
>the driver, since each chip variant seems to use different addresses and scales,
>and since the information about voltages and currents is unpublished by AMD,
>I'll remove support for voltage/current readings from the upstream driver.

I support that decision.

/proc/cpuinfo::AMD Ryzen 7 3700X 8-Core Processor, fam 23 model 113 step 0

A synthetic load (perl -e '1 while 1') x 16 shows:
Adapter: PCI adapter
Vcore:        +1.28 V
Vsoc:         +1.02 V
Tctl:         +94.8°C
Tdie:         +94.8°C
Tccd1:        +94.8°C
Icore:       +76.00 A
Isoc:         +6.75 A

A BOINC workload on average:
k10temp-pci-00c3
Adapter: PCI adapter
Vcore:        +1.17 V  
Vsoc:         +1.02 V  
Tctl:         +94.9°C  
Tdie:         +94.9°C  
Tccd1:        +95.0°C  
Icore:       +88.00 A  
Isoc:         +8.00 A  

The BOINC workload, when it momentarily spikes:
Adapter: PCI adapter
Vcore:        +1.32 V  
Vsoc:         +1.02 V  
Tctl:         +94.1°C  
Tdie:         +94.1°C  
Tccd1:        +96.0°C  
Icore:       +105.00 A  
Isoc:         +7.75 A  

For a processor sold as a 65 W part, observing reported sensors as 
88 A x 1.17 V + 8 A x 1.02 V = 111.12 W just can't be. We are off by a 
factor of about 2.
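The sanity check above is just P = V x I summed over the two rails. Here is a minimal C sketch. It assumes the hwmon sysfs units: in*_input is in millivolts and curr*_input is in milliamperes, so mV x mA gives microwatts, the unit of power*_input. The struct and function names are hypothetical, and the values are typed in by hand rather than read from sysfs:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One voltage/current rail pair, e.g. Vcore/Icore or Vsoc/Isoc. */
struct rail {
	const char *name;
	int64_t millivolts;  /* hwmon in*_input unit */
	int64_t milliamps;   /* hwmon curr*_input unit */
};

/* Power of one rail in microwatts: 1 mV * 1 mA = 1 uW. */
static int64_t rail_power_uw(const struct rail *r)
{
	return r->millivolts * r->milliamps;
}

/* Sum over all rails, in microwatts (exact, no floating point). */
static int64_t total_power_uw(const struct rail *rails, size_t n)
{
	int64_t sum = 0;

	for (size_t i = 0; i < n; i++)
		sum += rail_power_uw(&rails[i]);
	return sum;
}

/*
 * Computed power as a percentage of the rated power, truncated:
 * power_uw * 100 / (rated_w * 1000000) == power_uw / (rated_w * 10000).
 */
static int64_t percent_of_rated(int64_t power_uw, int64_t rated_w)
{
	assert(rated_w > 0);
	return power_uw / (rated_w * 10000);
}
```

Fed the BOINC average above (1170 mV x 88000 mA and 1020 mV x 8000 mA), total_power_uw() returns 111120000 uW, i.e. 111.12 W. percent_of_rated() against 65 W returns 170, roughly 1.7 times the rated figure.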

* Re: k10temp: ZEN3 readings are broken
  2020-12-23 10:41   ` Jan Engelhardt
@ 2020-12-23 11:22     ` Gabriel C
  2020-12-23 11:27     ` Gabriel C
  2020-12-23 14:25     ` Guenter Roeck
  2 siblings, 0 replies; 14+ messages in thread
From: Gabriel C @ 2020-12-23 11:22 UTC
  To: Jan Engelhardt; +Cc: Guenter Roeck, linux-hwmon, LKML, Wei Huang

On Wed, Dec 23, 2020 at 11:41 AM Jan Engelhardt <jengelh@inai.de> wrote:
> [...]
>
> For a processor sold as a 65 W part, observing reported sensors as
> 88 A x 1.17 V + 8 A x 1.02 V = 111.12 W just can't be. We are off by a
> factor of about 2.

Yes, those readings are wrong, because the code is wrong:
ZEN2 desktop is mixed with ZEN2 Server/TR code.
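The /proc/cpuinfo line quoted above gives family and model in decimal, while the driver's case labels are hex. Family 23 is 0x17 and model 113 is 0x71, the ZEN2 desktop model. The C sketch below keeps desktop parts separate from server/Threadripper parts (model 0x31), as suggested here. The enum and function are hypothetical, not k10temp code. They deliberately carry no register offsets or scale factors, since those are exactly what AMD has not documented:

```c
#include <assert.h>

/* /proc/cpuinfo reports family and model in decimal; kernel code uses hex. */
static_assert(23 == 0x17, "cpuinfo 'fam 23' is family 17h");
static_assert(113 == 0x71, "cpuinfo 'model 113' is model 0x71");

enum zen2_variant {
	ZEN2_UNKNOWN,
	ZEN2_SERVER_TR,  /* model 0x31: EPYC Rome / Threadripper 3000 */
	ZEN2_DESKTOP,    /* model 0x71: Ryzen 3000 desktop, e.g. 3700X */
};

/* Hypothetical helper: classify a family 17h ZEN2 part by model number. */
static enum zen2_variant classify_zen2(unsigned int family, unsigned int model)
{
	if (family != 0x17)
		return ZEN2_UNKNOWN;

	switch (model) {
	case 0x31:
		return ZEN2_SERVER_TR;
	case 0x71:
		return ZEN2_DESKTOP;
	default:
		return ZEN2_UNKNOWN;  /* other family 17h models not covered here */
	}
}
```

A driver could then choose per-variant SVI addresses and current factors from the returned value, instead of sharing one case between both kinds of parts.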

Best Regards,

Gabriel C

* Re: k10temp: ZEN3 readings are broken
  2020-12-23 10:41   ` Jan Engelhardt
  2020-12-23 11:22     ` Gabriel C
@ 2020-12-23 11:27     ` Gabriel C
  2020-12-23 14:25     ` Guenter Roeck
  2 siblings, 0 replies; 14+ messages in thread
From: Gabriel C @ 2020-12-23 11:27 UTC
  To: Jan Engelhardt; +Cc: Guenter Roeck, linux-hwmon, LKML, Wei Huang

On Wed, Dec 23, 2020 at 11:41 AM Jan Engelhardt <jengelh@inai.de> wrote:
> [...]
>
> For a processor sold as a 65 W part, observing reported sensors as
> 88 A x 1.17 V + 8 A x 1.02 V = 111.12 W just can't be. We are off by a
> factor of about 2.

Before I forget: even with 100% correct code, you could still be off by
a factor of 2 with a broken BIOS, or with a vendor that is trying to bypass
AMD spec limits.

See this topic for an example:
https://cutt.ly/7h1bT48

Best Regards,

Gabriel C

* Re: k10temp: ZEN3 readings are broken
  2020-12-23 10:41   ` Jan Engelhardt
  2020-12-23 11:22     ` Gabriel C
  2020-12-23 11:27     ` Gabriel C
@ 2020-12-23 14:25     ` Guenter Roeck
  2 siblings, 0 replies; 14+ messages in thread
From: Guenter Roeck @ 2020-12-23 14:25 UTC
  To: Jan Engelhardt; +Cc: Gabriel C, linux-hwmon, LKML, Wei Huang

On 12/23/20 2:41 AM, Jan Engelhardt wrote:
> [...]
>
> For a processor sold as a 65 W part, observing reported sensors as 
> 88 A x 1.17 V + 8 A x 1.02 V = 111.12 W just can't be. We are off by a 
> factor of about 2.
> 

Currents were always supposed to be unscaled, so this post again
proves my point.

In either case, the code removing voltage and current support is now upstream.

Guenter

Thread overview: 14+ messages
2020-12-22  1:45 k10temp: ZEN3 readings are broken Gabriel C
2020-12-22  3:58 ` Guenter Roeck
2020-12-22  4:33   ` Wei Huang
2020-12-22  5:09     ` Gabriel C
2020-12-22  6:08       ` Wei Huang
2020-12-22  4:33   ` Gabriel C
2020-12-22  6:07     ` Guenter Roeck
2020-12-22  6:16     ` Guenter Roeck
2020-12-22 15:26       ` Gabriel C
2020-12-22 15:51         ` Guenter Roeck
2020-12-23 10:41   ` Jan Engelhardt
2020-12-23 11:22     ` Gabriel C
2020-12-23 11:27     ` Gabriel C
2020-12-23 14:25     ` Guenter Roeck
