* librte_power w/ intel_pstate cpufreq governor
@ 2017-02-27  5:56 Threqn Peng
  2017-03-01  9:22 ` Threqn Peng
  0 siblings, 1 reply; 15+ messages in thread
From: Threqn Peng @ 2017-02-27  5:56 UTC (permalink / raw)
  To: dev

Hey guys,

    I have the same problem that was discussed in January 2016
(http://dpdk.org/ml/archives/dev/2016-January/031374.html): Intel CPU
frequency scaling control from Linux user space. There seems to have been
no further progress on, or solution to, this problem since then.

    I also checked the example code, "L3 Forwarding with Power Management
Sample Application", in the newest DPDK version (17.02); it has not been updated.

    Is it true that the newer P-state cpufreq driver (intel_pstate) only
supports two control modes, "performance" and "powersave", and that there is
nothing more we can do to control the CPU working frequency for power saving?

    Thanks for your help.

Best Regards,
Peng


* Re: librte_power w/ intel_pstate cpufreq governor
  2017-02-27  5:56 librte_power w/ intel_pstate cpufreq governor Threqn Peng
@ 2017-03-01  9:22 ` Threqn Peng
  0 siblings, 0 replies; 15+ messages in thread
From: Threqn Peng @ 2017-03-01  9:22 UTC (permalink / raw)
  To: dev

Hello,

    The solution should be "rte_epoll_wait". With a 10G NIC handling bursts
from the "rte_epoll_wait" state (12 RX queues), the packet loss rate is about
0.003%. Patch on the mailing list:
http://dpdk.org/ml/archives/dev/2015-February/014191.html
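
For readers unfamiliar with the pattern, here is a minimal sketch of a blocking wait that replaces busy polling. It uses plain Linux epoll, with an eventfd standing in for the RX-queue interrupt descriptor, and does not call the DPDK API. The helper names (make_epoll_for_fd, wait_for_rx_event) are hypothetical, and the sketch assumes rte_epoll_wait follows the same wait-with-timeout shape as epoll_wait.

```c
#include <stdint.h>
#include <sys/epoll.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Hypothetical helper: create an epoll instance that watches one fd for
 * readability. Returns the epoll fd, or -1 on error. */
static int make_epoll_for_fd(int fd)
{
    int epfd = epoll_create1(0);
    if (epfd < 0)
        return -1;

    struct epoll_event ev = { .events = EPOLLIN, .data.fd = fd };
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) < 0) {
        close(epfd);
        return -1;
    }
    return epfd;
}

/* Hypothetical helper: sleep until the watched fd is readable or
 * timeout_ms expires. Returns 1 on an event, 0 on timeout, -1 on error.
 * The thread is off the CPU while it waits, unlike a polling loop. */
static int wait_for_rx_event(int epfd, int timeout_ms)
{
    struct epoll_event ev;
    int n = epoll_wait(epfd, &ev, 1, timeout_ms);
    return n < 0 ? -1 : n;
}
```

The trade-off is wake-up latency: a thread that sleeps needs time to resume when traffic returns, which may be where a small loss rate like the one above comes from.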

    As for CPU frequency scaling, I think the better choice is to leave it
to the P-state driver, since control by the driver is better than relying on
the user's knowledge of the hardware state.

    FYI.

Best Regards,
Peng

On 27 February 2017 at 13:56, Threqn Peng <phyorat@gmail.com> wrote:

> Hey guys,
>
>     I have the same problem which have been discussed in January 2016
> (http://dpdk.org/ml/archives/dev/2016-January/031374.html), about intel
> cpu scaling frequency control in linux user space. But it seems no more
> progress/solution relate to this problem until now.
>
>     I also checked the example code:"L3 Forwarding with Power Management
> Sample Application" in newest dpdk version (17.02) , no update yet.
>
>     Is it true that, newer cpufreq driver p-state only support two
> control-mode: "performance" and "powersave", and we can't do any more about
> cpu working frequency control, for power-saving?
>
>     Thanks for your help.
>
> Best Regards,
> Peng
>


* Re: librte_power w/ intel_pstate cpufreq governor
  2018-03-05 11:25       ` Hunt, David
@ 2018-03-05 12:23         ` longtb5
  0 siblings, 0 replies; 15+ messages in thread
From: longtb5 @ 2018-03-05 12:23 UTC (permalink / raw)
  To: david.hunt, dev

Hi Dave,

Unfortunately I do not have access to our server BIOS settings. The power management task for our appliance is also on hold for now; I expect to return to it in April. Maybe we can still work out a patch before 18.05 (I'm not sure about the DPDK roadmap).

Regards,
-BL 



* Re: librte_power w/ intel_pstate cpufreq governor
  2018-03-05 10:48     ` longtb5
@ 2018-03-05 11:25       ` Hunt, David
  2018-03-05 12:23         ` longtb5
  0 siblings, 1 reply; 15+ messages in thread
From: Hunt, David @ 2018-03-05 11:25 UTC (permalink / raw)
  To: longtb5, dev


Hi BL,


On 5/3/2018 10:48 AM, longtb5@viettel.com.vn wrote:
> Hi Dave,
>
> Actually in my test lab which is a HP box running CentOS 7 on kernel version
> 3.10.0-693.5.2.el7.x86_64, the default cpufreq driver is pcc_cpufreq. So I guess
> disabling intel_pstate wouldn't help in my case.
>
> # cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_driver
> pcc-cpufreq
>
> # cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_available_governors
> conservative userspace powersave ondemand performance
>
> According to kernel doc, pcc_cpufreq also doesn't export scaling_available_frequencies
> in sysfs.
>
>  From kernel doc:
> "scaling_available_frequencies is not created in /sys. No intermediate
> frequencies need to be listed because the BIOS will try to achieve any
> frequency, within limits, requested by the governor. A frequency does not have
> to be strictly associated with a P-state."
>
> The lack of scaling_available_frequencies makes power_acpi_cpufreq_init()
> complain, similar to the problem with intel_pstate as in the other thread.
> I have tried (though with not much effort) to force the kernel
> to use acpi-cpufreq instead but without success.
>
> Luckily, as quoted above pcc_cpufreq supports setting of arbitrary frequency,
> so a simple workaround for now is to fake a scaling_available_frequencies file
> in another directory, then edit the code in librte_power to use that file instead.
>
> Regards,
> -BL
>
>> -----Original Message-----
>> From: david.hunt@intel.com [mailto:david.hunt@intel.com]
>> Sent: Monday, March 5, 2018 5:16 PM
>> To: longtb5@viettel.com.vn; dev@dpdk.org
>> Subject: Re: [dpdk-dev] librte_power w/ intel_pstate cpufreq governor
>>
>> Hi BL,
>>
>> I have always used "intel_pstate=disable" in my kernel parameters at boot so
>> as to disable the intel_pstate driver, and force the kernel to use the acpi-
>> cpufreq driver:
>>
>> # cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_driver
>> acpi-cpufreq
>>
>> This then gives me the following options for the governor:
>> ['conservative', 'ondemand', 'userspace', 'powersave', 'performance',
>> 'schedutil']
>>
>> Because DPDK threads typically poll, they appear as 100% busy to the p_state
>> driver, so if you want to be able to change core frequency down (as in l3fwd-
>> power), you need to use the acpi-cpufreq driver.
>>
>> I had a read through the docs just now, and this does not seem to be
>> mentioned, so I'll do up a patch to give some information on the correct
>> kernel parameters to use when using the power library.
>>
>> Regards,
>> Dave.
>>
>> On 2/3/2018 7:20 AM, longtb5@viettel.com.vn wrote:
>>> Forgot to link the original thread.
>>>
>>> http://dpdk.org/ml/archives/dev/2016-January/030930.html
>>>
>>> -BL
>>>
>>>> -----Original Message-----
>>>> From: longtb5@viettel.com.vn [mailto:longtb5@viettel.com.vn]
>>>> Sent: Friday, March 2, 2018 2:19 PM
>>>> To: dev@dpdk.org
>>>> Cc: david.hunt@intel.com; mhall@mhcomputing.net;
>>>> helin.zhang@intel.com; longtb5@viettel.com.vn
>>>> Subject: librte_power w/ intel_pstate cpufreq governor
>>>>
>>>> Hi everybody,
>>>>
>>>> I know this thread was from over 2 years ago but I ran into the same problem
>>>> with l3fwd-power today.
>>>>
>>>> Any updates on this?
>>>>
>>>> -BL

Good to hear you found a workaround.

So the issue really is "Getting the Power Library working with the
pcc-cpufreq kernel driver" :)

From wiki.archlinux.org:
pcc-cpufreq: This driver supports Processor Clocking Control interface by
Hewlett-Packard and Microsoft Corporation which is useful on some
ProLiant servers.

In the following doc:
https://www.kernel.org/doc/Documentation/cpu-freq/pcc-cpufreq.txt
it mentions - "When PCC mode is enabled, the platform will not expose
processor performance or throttle states (_PSS, _TSS and related ACPI
objects) to OSPM. Therefore, the native P-state driver (such as
acpi-cpufreq for Intel, powernow-k8 for AMD) will not load".
Is there a way to disable PCC mode in the BIOS on that server? That
wording seems to imply that there is a way to disable PCC
(seeing that it can be enabled).

If you can't disable PCC, I would suggest that a patch may be needed to
allow the power library to detect whether it's using acpi-cpufreq or
pcc-cpufreq, and obtain a list of CPU frequencies accordingly. However, I
don't have any HP servers available to me, so I'm currently unable to
research a method of getting a list of valid CPU frequencies on a machine
using the pcc-cpufreq driver.

If you come up with a snippet of code for listing available frequencies
on that server, let me know and we can look at adding that into the
power library. :)
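
A minimal sketch of what such detection might look like, assuming the driver is identified by the string read from scaling_driver. The enum and function names are hypothetical and not part of librte_power. The intel_pstate entry reflects the earlier reports in this thread that it also lacks scaling_available_frequencies.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical classification of the kernel cpufreq driver in use. */
enum cpufreq_backend {
    BACKEND_ACPI,     /* acpi-cpufreq */
    BACKEND_PCC,      /* pcc-cpufreq  */
    BACKEND_PSTATE,   /* intel_pstate */
    BACKEND_UNKNOWN
};

/* Map the contents of .../cpufreq/scaling_driver to a backend. */
static enum cpufreq_backend classify_driver(const char *scaling_driver)
{
    if (scaling_driver == NULL)
        return BACKEND_UNKNOWN;
    if (strcmp(scaling_driver, "acpi-cpufreq") == 0)
        return BACKEND_ACPI;
    if (strcmp(scaling_driver, "pcc-cpufreq") == 0)
        return BACKEND_PCC;
    if (strcmp(scaling_driver, "intel_pstate") == 0)
        return BACKEND_PSTATE;
    return BACKEND_UNKNOWN;
}

/* Returns 1 if the backend is expected to publish
 * scaling_available_frequencies, 0 if the caller has to obtain a
 * frequency list some other way. Based only on what this thread reports. */
static int backend_lists_frequencies(enum cpufreq_backend b)
{
    return b == BACKEND_ACPI;
}
```

The init code could then branch on the backend instead of failing as soon as the frequency file cannot be opened.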

Regards,
Dave.


* Re: librte_power w/ intel_pstate cpufreq governor
  2018-03-05 10:16   ` Hunt, David
@ 2018-03-05 10:48     ` longtb5
  2018-03-05 11:25       ` Hunt, David
  0 siblings, 1 reply; 15+ messages in thread
From: longtb5 @ 2018-03-05 10:48 UTC (permalink / raw)
  To: david.hunt, dev

Hi Dave,

Actually, in my test lab, which is an HP box running CentOS 7 on kernel version
3.10.0-693.5.2.el7.x86_64, the default cpufreq driver is pcc_cpufreq. So I guess
disabling intel_pstate wouldn't help in my case.

# cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_driver
pcc-cpufreq

# cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_available_governors
conservative userspace powersave ondemand performance

According to the kernel doc, pcc_cpufreq also doesn't export
scaling_available_frequencies in sysfs.

From the kernel doc:
"scaling_available_frequencies is not created in /sys. No intermediate
frequencies need to be listed because the BIOS will try to achieve any
frequency, within limits, requested by the governor. A frequency does not have
to be strictly associated with a P-state."

The lack of scaling_available_frequencies makes power_acpi_cpufreq_init()
complain, similar to the problem with intel_pstate in the other thread.
I have tried (though without much effort) to force the kernel
to use acpi-cpufreq instead, but without success.

Luckily, as quoted above, pcc_cpufreq supports setting an arbitrary frequency,
so a simple workaround for now is to fake a scaling_available_frequencies file
in another directory, then edit the code in librte_power to use that file instead.
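
The faked list could also be generated rather than written by hand. A minimal sketch, assuming the limits are taken from cpuinfo_min_freq and cpuinfo_max_freq in the same cpufreq directory (I have not checked that every pcc-cpufreq platform exposes both), and assuming a highest-first ordering, which as I understand it is how acpi-cpufreq lists them. build_freq_table is a hypothetical helper, not part of librte_power.

```c
#include <stddef.h>
#include <stdint.h>

/* Fill `out` with `n` evenly spaced frequencies in kHz, highest first,
 * covering [min_khz, max_khz]. Returns the number of entries written,
 * or 0 if the arguments are unusable. */
static size_t build_freq_table(uint32_t min_khz, uint32_t max_khz,
                               size_t n, uint32_t *out)
{
    if (out == NULL || n < 2 || min_khz >= max_khz)
        return 0;

    uint32_t span = max_khz - min_khz;
    for (size_t i = 0; i < n; i++) {
        /* 64-bit intermediate avoids overflow in span * i. */
        uint64_t step = (uint64_t)span * i / (n - 1);
        out[i] = max_khz - (uint32_t)step;
    }
    return n;
}
```

Since the BIOS "will try to achieve any frequency, within limits", the exact step size is a free choice; four to eight entries would be enough for the up/down stepping that l3fwd-power does.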

Regards,
-BL

> -----Original Message-----
> From: david.hunt@intel.com [mailto:david.hunt@intel.com]
> Sent: Monday, March 5, 2018 5:16 PM
> To: longtb5@viettel.com.vn; dev@dpdk.org
> Subject: Re: [dpdk-dev] librte_power w/ intel_pstate cpufreq governor
> 
> Hi BL,
> 
> I have always used "intel_pstate=disable" in my kernel parameters at boot so
> as to disable the intel_pstate driver, and force the kernel to use the acpi-
> cpufreq driver:
> 
> # cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_driver
> acpi-cpufreq
> 
> This then gives me the following options for the governor:
> ['conservative', 'ondemand', 'userspace', 'powersave', 'performance',
> 'schedutil']
> 
> Because DPDK threads typically poll, they appear as 100% busy to the p_state
> driver, so if you want to be able to change core frequency down (as in l3fwd-
> power), you need to use the acpi-cpufreq driver.
> 
> I had a read through the docs just now, and this does not seem to be
> mentioned, so I'll do up a patch to give some information on the correct
> kernel parameters to use when using the power library.
> 
> Regards,
> Dave.
> 
> On 2/3/2018 7:20 AM, longtb5@viettel.com.vn wrote:
> > Forgot to link the original thread.
> >
> > http://dpdk.org/ml/archives/dev/2016-January/030930.html
> >
> > -BL
> >
> >> -----Original Message-----
> >> From: longtb5@viettel.com.vn [mailto:longtb5@viettel.com.vn]
> >> Sent: Friday, March 2, 2018 2:19 PM
> >> To: dev@dpdk.org
> >> Cc: david.hunt@intel.com; mhall@mhcomputing.net;
> >> helin.zhang@intel.com; longtb5@viettel.com.vn
> >> Subject: librte_power w/ intel_pstate cpufreq governor
> >>
> >> Hi everybody,
> >>
> >> I know this thread was from over 2 years ago but I ran into the same problem
> >> with l3fwd-power today.
> >>
> >> Any updates on this?
> >>
> >> -BL
> >


* Re: librte_power w/ intel_pstate cpufreq governor
  2018-03-02  7:20 ` longtb5
@ 2018-03-05 10:16   ` Hunt, David
  2018-03-05 10:48     ` longtb5
  0 siblings, 1 reply; 15+ messages in thread
From: Hunt, David @ 2018-03-05 10:16 UTC (permalink / raw)
  To: longtb5, dev

Hi BL,

I have always used "intel_pstate=disable" in my kernel parameters at 
boot so as to disable the intel_pstate driver, and force the kernel to 
use the acpi-cpufreq driver:

# cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_driver
acpi-cpufreq

This then gives me the following options for the governor:
['conservative', 'ondemand', 'userspace', 'powersave', 'performance', 
'schedutil']

Because DPDK threads typically poll, they appear as 100% busy to the
intel_pstate driver, so if you want to be able to scale core frequency down
(as in l3fwd-power), you need to use the acpi-cpufreq driver.
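
The two checks above can also be done programmatically before initialising the power library. A minimal sketch, assuming each sysfs file holds a single line of text and that the governor list is space-separated as sysfs prints it. read_first_line and list_contains are hypothetical helpers, not DPDK functions.

```c
#include <stdio.h>
#include <string.h>

/* Read the first line of a file into buf, stripping the trailing newline.
 * Returns 0 on success, -1 on failure. */
static int read_first_line(const char *path, char *buf, size_t len)
{
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    char *ok = fgets(buf, (int)len, f);
    fclose(f);
    if (ok == NULL)
        return -1;
    buf[strcspn(buf, "\n")] = '\0';
    return 0;
}

/* Return 1 if `name` appears as a whole word in the space-separated
 * `list`, 0 otherwise. */
static int list_contains(const char *list, const char *name)
{
    size_t n = strlen(name);
    const char *p = list;

    while (n > 0 && (p = strstr(p, name)) != NULL) {
        int starts = (p == list) || (p[-1] == ' ');
        int ends = (p[n] == '\0') || (p[n] == ' ');
        if (starts && ends)
            return 1;
        p += n;
    }
    return 0;
}
```

An application could read scaling_driver and scaling_available_governors this way and only enable frequency scaling when the driver is acpi-cpufreq and the list contains "userspace".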

I had a read through the docs just now, and this does not seem to be 
mentioned, so I'll do up a patch to give some information on the correct 
kernel parameters to use when using the power library.

Regards,
Dave.

On 2/3/2018 7:20 AM, longtb5@viettel.com.vn wrote:
> Forgot to link the original thread.
>
> http://dpdk.org/ml/archives/dev/2016-January/030930.html
>
> -BL
>
>> -----Original Message-----
>> From: longtb5@viettel.com.vn [mailto:longtb5@viettel.com.vn]
>> Sent: Friday, March 2, 2018 2:19 PM
>> To: dev@dpdk.org
>> Cc: david.hunt@intel.com; mhall@mhcomputing.net; helin.zhang@intel.com;
>> longtb5@viettel.com.vn
>> Subject: librte_power w/ intel_pstate cpufreq governor
>>
>> Hi everybody,
>>
>> I know this thread was from over 2 years ago but I ran into the same problem
>> with l3fwd-power today.
>>
>> Any updates on this?
>>
>> -BL
>


* Re: librte_power w/ intel_pstate cpufreq governor
  2018-03-02  7:18 longtb5
@ 2018-03-02  7:20 ` longtb5
  2018-03-05 10:16   ` Hunt, David
  0 siblings, 1 reply; 15+ messages in thread
From: longtb5 @ 2018-03-02  7:20 UTC (permalink / raw)
  To: dev

Forgot to link the original thread.

http://dpdk.org/ml/archives/dev/2016-January/030930.html

-BL

> -----Original Message-----
> From: longtb5@viettel.com.vn [mailto:longtb5@viettel.com.vn]
> Sent: Friday, March 2, 2018 2:19 PM
> To: dev@dpdk.org
> Cc: david.hunt@intel.com; mhall@mhcomputing.net; helin.zhang@intel.com;
> longtb5@viettel.com.vn
> Subject: librte_power w/ intel_pstate cpufreq governor
> 
> Hi everybody,
> 
> I know this thread was from over 2 years ago but I ran into the same problem
> with l3fwd-power today.
> 
> Any updates on this?
> 
> -BL


* librte_power w/ intel_pstate cpufreq governor
@ 2018-03-02  7:18 longtb5
  2018-03-02  7:20 ` longtb5
  0 siblings, 1 reply; 15+ messages in thread
From: longtb5 @ 2018-03-02  7:18 UTC (permalink / raw)
  To: dev; +Cc: david.hunt, mhall, helin.zhang, longtb5

Hi everybody,

I know this thread was from over 2 years ago but I ran into the same problem
with l3fwd-power today.
Any updates on this?

-BL


* Re: librte_power w/ intel_pstate cpufreq governor
  2016-01-14  7:15       ` Zhang, Helin
@ 2016-01-14  7:44         ` Matthew Hall
  0 siblings, 0 replies; 15+ messages in thread
From: Matthew Hall @ 2016-01-14  7:44 UTC (permalink / raw)
  To: Zhang, Helin; +Cc: dev

On Thu, Jan 14, 2016 at 07:15:51AM +0000, Zhang, Helin wrote:
> That's disappointing if Skylake is like that. Let's have a learning first, 
> and then check if we can fix that. But in addition, DPDK provide interrupt 
> based packet receiving mechanism, can it be one of your choice?

Maybe I am wrong. But I could not disprove what the Linux p_state driver 
Documentation file and other places claimed, which is that the clockrate 
control is no-opped, because the white papers on Intel HWP are not findable in 
the Intel website, or by using Google with the operator "site:intel.com".

The IRQ based part is still enabled and works quite well in a very trivial 
test so far... but the clockrate callback handlers are null and the governor 
setting gets corrupted, both due to failed init of librte_power. So I will 
have to rebuild DPDK with the librte_power ACPI + KVM init commented out and 
the fastpath clockrate callback functions commented out of course. It is minor 
so I can do it to see what will happen.
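
Rather than commenting the callbacks out, the fastpath could guard them. A minimal sketch, assuming a callback type that takes an lcore id and returns an int. freq_change_fn, safe_freq_change and demo_freq_up are hypothetical stand-ins, not the librte_power declarations, and the return values are chosen only for illustration.

```c
#include <stddef.h>

/* Hypothetical stand-in for a per-lcore frequency callback type. */
typedef int (*freq_change_fn)(unsigned int lcore_id);

/* Call the callback only if power init actually installed one.
 * Returns the callback's result, or -1 when no callback is set. */
static int safe_freq_change(freq_change_fn fn, unsigned int lcore_id)
{
    if (fn == NULL)
        return -1;   /* init failed: skip scaling, keep forwarding */
    return fn(lcore_id);
}

/* Dummy callback used only to exercise the guard. */
static int demo_freq_up(unsigned int lcore_id)
{
    (void)lcore_id;
    return 1;
}
```

The extra branch is cheap compared with a crash on a NULL function pointer, and it keeps the IRQ-based path usable when frequency control is unavailable.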

> If no objection, I will find time later (may be in a month) to investigate 
> that. Of cause, please try to investigate that from your side.

Agreed.

> That's always there, for example, DPDK can exit accidently, without caring 
> anything. Then you can have the similar issue again.

Of course, it could. But if there was some kind of shutdown function, at least 
then I could call it from the signal handler I already have which closes the 
ports (this prevents nasty port lockups on virtio-net port DMA memory zones 
which can happen on future runs otherwise).
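
A minimal sketch of what such a shutdown helper might do: remember the governor that was active before it was changed, and write it back on exit. The struct and function names are hypothetical, and the path passed in would be the lcore's scaling_governor file. Note that stdio is not async-signal-safe, so this sketch assumes the signal handler only sets a flag and the restore runs from the normal shutdown path.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical record of the governor found before taking control. */
struct governor_backup {
    char path[256];   /* scaling_governor file this backup belongs to */
    char name[64];    /* governor that was active beforehand */
    int valid;        /* 1 once a governor has been saved */
};

/* Write a governor name to the given file. Returns 0 or -1. */
static int governor_write(const char *path, const char *name)
{
    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1;
    int rc = (fputs(name, f) < 0) ? -1 : 0;
    if (fclose(f) != 0)
        rc = -1;
    return rc;
}

/* Save the current governor so it can be restored later. */
static int governor_save(struct governor_backup *b, const char *path)
{
    b->valid = 0;
    if (strlen(path) >= sizeof(b->path))
        return -1;
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    char *ok = fgets(b->name, (int)sizeof(b->name), f);
    fclose(f);
    if (ok == NULL)
        return -1;
    b->name[strcspn(b->name, "\n")] = '\0';
    strcpy(b->path, path);
    b->valid = 1;
    return 0;
}

/* Put the saved governor back. Returns -1 if nothing was saved. */
static int governor_restore(const struct governor_backup *b)
{
    return b->valid ? governor_write(b->path, b->name) : -1;
}
```

This does not help when the process is killed outright, which is the case Helin raises, but it covers the orderly exits that already close the ports.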

> It seems that you are so important for Intel. :) I don't have Skylake in hand. :(

:) Hahaha... newegg.com to the rescue. I guess we need to be sure there is 
some program to test the stuff in DPDK for the new kernels and hardware. It 
appears we are pretty far behind now... I saw several threads about things 
that were behind just today.

> Regards,
> Helin

Matthew.


* Re: librte_power w/ intel_pstate cpufreq governor
  2016-01-14  7:03     ` Matthew Hall
  2016-01-14  7:11       ` Matthew Hall
@ 2016-01-14  7:15       ` Zhang, Helin
  2016-01-14  7:44         ` Matthew Hall
  1 sibling, 1 reply; 15+ messages in thread
From: Zhang, Helin @ 2016-01-14  7:15 UTC (permalink / raw)
  To: Matthew Hall; +Cc: dev



> -----Original Message-----
> From: Matthew Hall [mailto:mhall@mhcomputing.net]
> Sent: Thursday, January 14, 2016 3:04 PM
> To: Zhang, Helin
> Cc: dev@dpdk.org; Liang, Cunming; Zhou, Danny
> Subject: Re: [dpdk-dev] librte_power w/ intel_pstate cpufreq governor
> 
> On Tue, Jan 12, 2016 at 03:17:21PM +0000, Zhang, Helin wrote:
> > Hi Matthew
> >
> > Yes, you have indicated out the key, the power management module has
> changed or upgraded.
> > Could you help to try the legacy one to see if it still works, as indicated in
> your link?
> 
> I can do this, but according to the documents I am reading, the old Power
> Management module is secretly stubbed out / no-opped inside of the
> Skylake CPU core, and the core manages its own clockrate internally every 1
> msec instead of every 30 msec with input from the OS (Intel Speed Shift
> technology).
> 
> If this is true, then I suspect there is no point to getting it to work again with
> either the old frequency driver or the new driver, because the chip would
> not listen to it. So then it seems like it makes sense to skip the clock
> adjustment callbacks on Skylake and take extra stuff out of the fastpath code.
That's disappointing if Skylake is like that. Let's learn more about it first, and then check if we can fix that.
In addition, DPDK provides an interrupt-based packet receiving mechanism; could that be an option for you?

For now, I am afraid that I don't have time for it, as we are all focusing on development for the next release.
If there is no objection, I will find time later (maybe in a month) to investigate that.
Of course, please try to investigate it from your side as well.

> 
> > Taking control of the governor from kernel to user space, might need
> > one more checks before that. But it is actually not a big issue, as
> > user can switch it back to anything via 'echo'.
> 
> I think it's a bit bigger issue, as it leaves the chip in full-power mode without
> really warning anybody, instead of the standard default adaptive mode.
That risk is always there; for example, DPDK can exit accidentally without cleaning anything up.
Then you would have a similar issue again.

> 
> > Yes, it seems that librte_power is out of date for a while. It is not
> > easy to track all the kernel versions. Now we have good chance to do
> > that, as you have reported issues. Let's have a look on the new power
> > management mechanism and then see if we can do something.
> 
> Yes, let me know how I could help. I don't know very much yet. My machine
> is Skylake Core i7-6700k. Unfortunately I think I am in trouble here, because
> there is no whitepaper on the Intel website for Intel Speed Shift technology
> at all.
It seems that you are so important for Intel. :) I don't have a Skylake at hand. :(
Anyway, I will try to find time for that, and hopefully will find something, or a solution.
Thank you very much for the great work!

Regards,
Helin

> 
> > Really thanks to your questions!
> 
> I am looking forward to getting some answers figured out together.
> 
> > Regards,
> > Helin
> 
> Matthew.


* Re: librte_power w/ intel_pstate cpufreq governor
  2016-01-14  7:03     ` Matthew Hall
@ 2016-01-14  7:11       ` Matthew Hall
  2016-01-14  7:15       ` Zhang, Helin
  1 sibling, 0 replies; 15+ messages in thread
From: Matthew Hall @ 2016-01-14  7:11 UTC (permalink / raw)
  To: Zhang, Helin; +Cc: dev

On Thu, Jan 14, 2016 at 02:03:55AM -0500, Matthew Hall wrote:
> Yes, let me know how I could help. I don't know very much yet. My machine is 
> Skylake Core i7-6700k. Unfortunately I think I am in trouble here, because 
> there is no whitepaper on the Intel website for Intel Speed Shift technology 
> at all.

This is the closest thing I could find:

http://wccftech.com/idf15-intel-skylake-analysis-cpu-gpu-microarchitecture-ddr4-memory-impact/4/

Some copy of a presentation from Intel IDF15.

Can somebody at Intel help me to find more papers or the right instruction or 
architecture manuals for HWP (Hardware P-State) feature?

Matthew.


* Re: librte_power w/ intel_pstate cpufreq governor
  2016-01-12 15:17   ` Zhang, Helin
@ 2016-01-14  7:03     ` Matthew Hall
  2016-01-14  7:11       ` Matthew Hall
  2016-01-14  7:15       ` Zhang, Helin
  0 siblings, 2 replies; 15+ messages in thread
From: Matthew Hall @ 2016-01-14  7:03 UTC (permalink / raw)
  To: Zhang, Helin; +Cc: dev

On Tue, Jan 12, 2016 at 03:17:21PM +0000, Zhang, Helin wrote:
> Hi Matthew
> 
> Yes, you have indicated out the key, the power management module has changed or upgraded.
> Could you help to try the legacy one to see if it still works, as indicated in your link?

I can do this, but according to the documents I am reading, the old Power 
Management module is secretly stubbed out / no-opped inside of the Skylake CPU 
core, and the core manages its own clockrate internally every 1 msec instead 
of every 30 msec with input from the OS (Intel Speed Shift technology).

If this is true, then I suspect there is no point to getting it to work again 
with either the old frequency driver or the new driver, because the chip would 
not listen to it. So then it seems like it makes sense to skip the clock 
adjustment callbacks on Skylake and take extra stuff out of the fastpath code.
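
One way to decide at runtime whether to skip the callbacks is to ask the CPU whether it advertises HWP at all. A minimal sketch, assuming an x86 build with GCC or Clang (for <cpuid.h>) and relying on CPUID leaf 06H, where EAX bit 7 reports HWP support. The function names are hypothetical. The bit only says the feature exists, not that the OS has enabled it, and a hypervisor may hide it.

```c
#include <stdint.h>
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/* Pure helper: bit 7 of CPUID.06H:EAX reports HWP support. */
static int eax_reports_hwp(uint32_t leaf6_eax)
{
    return (int)((leaf6_eax >> 7) & 1u);
}

/* Returns 1 if the CPU advertises HWP, 0 if it does not,
 * and -1 if the leaf cannot be queried on this build or CPU. */
static int cpu_has_hwp(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (!__get_cpuid(6, &eax, &ebx, &ecx, &edx))
        return -1;
    return eax_reports_hwp(eax);
#else
    return -1;
#endif
}
```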

> Taking control of the governor from kernel to user space, might need one 
> more checks before that. But it is actually not a big issue, as user can 
> switch it back to anything via 'echo'.

I think it's a bit bigger issue, as it leaves the chip in full-power mode 
without really warning anybody, instead of the standard default adaptive mode. 
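
A minimal sketch of the ordering I have in mind: verify that a frequency list is readable first, and only then take the governor. The function names are hypothetical and this is not the current librte_power code. The two paths would be the lcore's scaling_available_frequencies and scaling_governor files.

```c
#include <stdio.h>

/* Return 1 if the file at `path` exists and starts with a number. */
static int has_frequency_list(const char *path)
{
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return 0;
    unsigned long khz = 0;
    int found = (fscanf(f, "%lu", &khz) == 1);
    fclose(f);
    return found;
}

/* Switch to the "userspace" governor only when the frequency list is
 * usable. Returns 0 if the governor was switched, -1 if it was left
 * untouched or the write failed. */
static int take_governor_if_controllable(const char *freq_path,
                                         const char *gov_path)
{
    if (!has_frequency_list(freq_path))
        return -1;   /* leave the kernel's governor alone */

    FILE *g = fopen(gov_path, "w");
    if (g == NULL)
        return -1;
    int rc = (fputs("userspace", g) < 0) ? -1 : 0;
    if (fclose(g) != 0)
        rc = -1;
    return rc;
}
```

With this order, a system without a frequency list keeps whatever governor it booted with instead of being left in userspace mode with nothing driving it.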

> Yes, it seems that librte_power is out of date for a while. It is not easy 
> to track all the kernel versions. Now we have good chance to do that, as you 
> have reported issues. Let's have a look on the new power management 
> mechanism and then see if we can do something.

Yes, let me know how I could help. I don't know very much yet. My machine is 
Skylake Core i7-6700k. Unfortunately I think I am in trouble here, because 
there is no whitepaper on the Intel website for Intel Speed Shift technology 
at all.

> Really thanks to your questions!

I am looking forward to getting some answers figured out together.

> Regards,
> Helin

Matthew.


* Re: librte_power w/ intel_pstate cpufreq governor
  2016-01-03  7:51 ` Matthew Hall
@ 2016-01-12 15:17   ` Zhang, Helin
  2016-01-14  7:03     ` Matthew Hall
  0 siblings, 1 reply; 15+ messages in thread
From: Zhang, Helin @ 2016-01-12 15:17 UTC (permalink / raw)
  To: Matthew Hall, dev

Hi Matthew

Yes, you have pointed out the key issue: the power management module has changed or been upgraded.
Could you try the legacy one to see if it still works, as indicated in your link?

Taking control of the governor from the kernel into user space might need one more check beforehand.
But it is actually not a big issue, as the user can switch it back to anything via 'echo'.

Yes, it seems that librte_power has been out of date for a while. It is not easy to track all the kernel versions.
Now we have a good chance to do that, as you have reported the issues. Let's have a look at the new power management mechanism and then see if we can do something.

Many thanks for your questions!

Regards,
Helin

> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Matthew Hall
> Sent: Sunday, January 3, 2016 3:51 PM
> To: dev@dpdk.org
> Subject: Re: [dpdk-dev] librte_power w/ intel_pstate cpufreq governor
> 
> Hello,
> 
> In about one month, I never received any response about all these major
> issues I was finding with librte_power and the intel_pstate based CPU
> clockrate control driver used in all the new Linux kernels.
> 
>  From what I can tell, none of this librte_power code ever worked right in the
> first place on Sandy Bridge and newer, because the chip secretly ignores
> clockrate adjustments from outside.
> 
> Can anyone who is more expert about Intel Power Management please help
> me check this and point me to some documentation which explains how this
> is supposed to work?
> 
> I am kind of blocked on doing performance / production quality
> improvements on my code, without some kind of basic help understanding
> how this librte_power stuff should work.
> 
> Thanks,
> Matthew.
> 
> On 12/5/15 4:08 PM, Matthew Hall wrote:
> > Hello all,
> >
> > I wanted to ask some questions about librte_power and the great
> > adaptive polling / IRQ mode example in l3fwd-power.
> >
> > I am very interested in getting this to work in my project because it
> > will make it much friendlier to attract new community developers if I
> > am as cooperative as possible with system resources.
> >
> > Let's discuss the init process for a moment. It has some problems on
> > my system, and I need some help to figure out how to handle this right.
> >
> > 1. Begins with the call to rte_power_init.
> >
> > 2. Attempts to init ACPI cpufreq mode.
> >
> > 2.1. Sets lcore cpufreq governor to userspace mode.
> >
> > 2.2. Function power_get_available_freqs checks lcore CPU frequencies from:
> >
> > /sys/devices/system/cpu/cpuX/cpufreq/scaling_available_frequencies
> >
> > 2.3. This fails with (cryptic) error "POWER: ERR: File not openned". I
> > am planning to write a patch for this error a bit later.
> >
> > My kernel is using the intel_pstate driver, so
> > scaling_available_frequencies does not exist:
> >
> > http://askubuntu.com/questions/544266/why-are-missing-the-frequency-options-on-cpufreq-utils-indicator
> >
> > 3. When power_get_available_freqs fails, rte_power_acpi_cpufreq_init fails.
> >
> > 4. rte_power_init will try rte_power_kvm_vm_init. That will fail
> > because it's a physical Skylake system not some kind of VM.
> >
> > 5. Now rte_power_init totally fails, with error "POWER: ERR: Unable to
> > set Power Management Environment for lcore 0".
> >
> > So, I have a couple of questions to figure out from here:
> >
> > 1. It seems bad to switch the governor into userspace before verifying
> > the frequencies available in scaling_available_frequencies. If there
> > are no frequencies available, it seems like it should not be trying to
> > take over control of an effectively uncontrollable value.
> >
> > 2. If the governor is switched to userspace, and then no governing is
> > done, it seems like the clockrate will necessarily always be wrong
> > also because nothing will be configuring it anymore, neither kernel,
> > nor failed DPDK userspace code, since rte_power_freq_up / down
> > function pointers will always be NULL. Is this true? This seems bad if so.
> >
> > It seems that the librte_power code is basically out of date, as
> > pstate has been present since Sandy Bridge, which is quite old by now
> > for network processing. I am not sure how to make this work right now.
> > So far I see a couple options but I really don't know much about this stuff:
> >
> > 1) skip rte_power_init completely, and let intel_pstate handle it
> > using HWP mode
> >
> > 2) disable intel_pstate, switch to the legacy ACPI cpufreq (but people
> > warned this old driver is mostly a no-op and the CPU ignores its frequency
> > requests).
> >
> > The Internet advice says it's possible, but not a very good idea, to
> > switch from the modern intel_pstate driver to the legacy ACPI mode.
> > The kernel docs (below) state that it's better to use HWP (Hardware
> > P State) mode:
> >
> > https://www.kernel.org/doc/Documentation/cpu-freq/intel-pstate.txt
> >
> > If none of this rte_power_init stuff works, are the other CPU
> > conservation measures inside the l3fwd-power example enough to work
> > right with HWP all by themselves with nothing additional?
> >
> > Thanks,
> > Matthew.
> >


* Re: librte_power w/ intel_pstate cpufreq governor
  2015-12-06  0:08 Matthew Hall
@ 2016-01-03  7:51 ` Matthew Hall
  2016-01-12 15:17   ` Zhang, Helin
  0 siblings, 1 reply; 15+ messages in thread
From: Matthew Hall @ 2016-01-03  7:51 UTC (permalink / raw)
  To: dev

Hello,

In about a month, I have not received any response about the major 
issues I found with librte_power and the intel_pstate based CPU 
clockrate control driver used in all the new Linux kernels.

From what I can tell, none of this librte_power code ever worked right 
in the first place on Sandy Bridge and newer, because the chip secretly 
ignores clockrate adjustments from outside.

Can anyone who is more of an expert on Intel Power Management please help 
me check this and point me to some documentation which explains how this 
is supposed to work?

I am kind of blocked on doing performance / production quality 
improvements on my code, without some kind of basic help understanding 
how this librte_power stuff should work.

Thanks,
Matthew.

On 12/5/15 4:08 PM, Matthew Hall wrote:
> Hello all,
>
> I wanted to ask some questions about librte_power and the great adaptive
> polling / IRQ mode example in l3fwd-power.
>
> I am very interested in getting this to work in my project because it will
> make it much friendlier to attract new community developers if I am as
> cooperative as possible with system resources.
>
> Let's discuss the init process for a moment. It has some problems on my
> system, and I need some help to figure out how to handle this right.
>
> 1. Begins with the call to rte_power_init.
>
> 2. Attempts to init ACPI cpufreq mode.
>
> 2.1. Sets lcore cpufreq governor to userspace mode.
>
> 2.2. Function power_get_available_freqs checks lcore CPU frequencies from:
>
> /sys/devices/system/cpu/cpuX/cpufreq/scaling_available_frequencies
>
> 2.3. This fails with (cryptic) error "POWER: ERR: File not openned". I am
> planning to write a patch for this error a bit later.
>
> My kernel is using the intel_pstate driver, so scaling_available_frequencies
> does not exist:
>
> http://askubuntu.com/questions/544266/why-are-missing-the-frequency-options-on-cpufreq-utils-indicator
>
> 3. When power_get_available_freqs fails, rte_power_acpi_cpufreq_init fails.
>
> 4. rte_power_init will try rte_power_kvm_vm_init. That will fail because it's
> a physical Skylake system not some kind of VM.
>
> 5. Now rte_power_init totally fails, with error "POWER: ERR: Unable to set
> Power Management Environment for lcore 0".
>
> So, I have a couple of questions to figure out from here:
>
> 1. It seems bad to switch the governor into userspace before verifying the
> frequencies available in scaling_available_frequencies. If there are no
> frequencies available, it seems like it should not be trying to take over
> control of an effectively uncontrollable value.
>
> 2. If the governor is switched to userspace, and then no governing is done, it
> seems like the clockrate will necessarily always be wrong also because nothing
> will be configuring it anymore, neither kernel, nor failed DPDK userspace
> code, since rte_power_freq_up / down function pointers will always be NULL. Is
> this true? This seems bad if so.
>
> It seems that the librte_power code is basically out of date, as pstate has
> been present since Sandy Bridge, which is quite old by now for network
> processing. I am not sure how to make this work right now. So far I see a
> couple options but I really don't know much about this stuff:
>
> 1) skip rte_power_init completely, and let intel_pstate handle it using HWP
> mode
>
> 2) disable intel_pstate, switch to the legacy ACPI cpufreq (but people warned
> this old driver is mostly a no-op and the CPU ignores its frequency requests).
>
> The Internet advice says it's possible, but not a very good idea, to switch
> from the modern intel_pstate driver to the legacy ACPI mode. The kernel docs
> (below) state that it's better to use HWP (Hardware P State) mode:
>
> https://www.kernel.org/doc/Documentation/cpu-freq/intel-pstate.txt
>
> If none of this rte_power_init stuff works, are the other CPU conservation
> measures inside the l3fwd-power example enough to work right with HWP all by
> themselves with nothing additional?
>
> Thanks,
> Matthew.
>


* librte_power w/ intel_pstate cpufreq governor
@ 2015-12-06  0:08 Matthew Hall
  2016-01-03  7:51 ` Matthew Hall
  0 siblings, 1 reply; 15+ messages in thread
From: Matthew Hall @ 2015-12-06  0:08 UTC (permalink / raw)
  To: dev

Hello all,

I wanted to ask some questions about librte_power and the great adaptive 
polling / IRQ mode example in l3fwd-power.

I am very interested in getting this to work in my project because it will 
make it much friendlier to attract new community developers if I am as 
cooperative as possible with system resources.

Let's discuss the init process for a moment. It has some problems on my 
system, and I need some help to figure out how to handle this right.

1. Begins with the call to rte_power_init.

2. Attempts to init ACPI cpufreq mode.

2.1. Sets lcore cpufreq governor to userspace mode.

2.2. Function power_get_available_freqs checks lcore CPU frequencies from:

/sys/devices/system/cpu/cpuX/cpufreq/scaling_available_frequencies

2.3. This fails with (cryptic) error "POWER: ERR: File not openned". I am 
planning to write a patch for this error a bit later.

My kernel is using the intel_pstate driver, so scaling_available_frequencies 
does not exist:

http://askubuntu.com/questions/544266/why-are-missing-the-frequency-options-on-cpufreq-utils-indicator

3. When power_get_available_freqs fails, rte_power_acpi_cpufreq_init fails.

4. rte_power_init will try rte_power_kvm_vm_init. That will fail because it's 
a physical Skylake system not some kind of VM.

5. Now rte_power_init totally fails, with error "POWER: ERR: Unable to set 
Power Management Environment for lcore 0".

So, I have a couple of questions to figure out from here:

1. It seems bad to switch the governor into userspace before verifying the 
frequencies available in scaling_available_frequencies. If there are no 
frequencies available, it seems like it should not be trying to take over 
control of an effectively uncontrollable value.

2. If the governor is switched to userspace, and then no governing is done, it 
seems like the clockrate will necessarily always be wrong also because nothing 
will be configuring it anymore, neither kernel, nor failed DPDK userspace 
code, since rte_power_freq_up / down function pointers will always be NULL. Is 
this true? This seems bad if so.

It seems that the librte_power code is basically out of date, as pstate has 
been present since Sandy Bridge, which is quite old by now for network 
processing. I am not sure how to make this work right now. So far I see a 
couple options but I really don't know much about this stuff:

1) skip rte_power_init completely, and let intel_pstate handle it using HWP 
mode

2) disable intel_pstate, switch to the legacy ACPI cpufreq (but people warned 
this old driver is mostly a no-op and the CPU ignores its frequency requests).

The Internet advice says it's possible, but not a very good idea, to switch 
from the modern intel_pstate driver to the legacy ACPI mode. The kernel docs 
(below) state that it's better to use HWP (Hardware P State) mode:

https://www.kernel.org/doc/Documentation/cpu-freq/intel-pstate.txt

If none of this rte_power_init stuff works, are the other CPU conservation 
measures inside the l3fwd-power example enough to work right with HWP all by 
themselves with nothing additional?

Thanks,
Matthew.


Thread overview: 15+ messages
2017-02-27  5:56 librte_power w/ intel_pstate cpufreq governor Threqn Peng
2017-03-01  9:22 ` Threqn Peng
  -- strict thread matches above, loose matches on Subject: below --
2018-03-02  7:18 longtb5
2018-03-02  7:20 ` longtb5
2018-03-05 10:16   ` Hunt, David
2018-03-05 10:48     ` longtb5
2018-03-05 11:25       ` Hunt, David
2018-03-05 12:23         ` longtb5
2015-12-06  0:08 Matthew Hall
2016-01-03  7:51 ` Matthew Hall
2016-01-12 15:17   ` Zhang, Helin
2016-01-14  7:03     ` Matthew Hall
2016-01-14  7:11       ` Matthew Hall
2016-01-14  7:15       ` Zhang, Helin
2016-01-14  7:44         ` Matthew Hall
