All of lore.kernel.org
 help / color / mirror / Atom feed
* [sched_delayed] sched: RT throttling activated
@ 2013-08-23  8:53 Martin Mokrejs
  2013-08-23 10:09 ` Peter Zijlstra
  0 siblings, 1 reply; 12+ messages in thread
From: Martin Mokrejs @ 2013-08-23  8:53 UTC (permalink / raw)
  To: Theodore Tso, Thomas Gleixner, mingo, peterz; +Cc: LKML

Hi,
  I tried to figure out what this message really means. I came to 
https://rt.wiki.kernel.org/index.php/Frequently_Asked_Questions
but I am still lost. I lack in the FAQ some user-related information.
The first paragraph is still unclear to me. I have a i7-2640M based
laptop, hyperthreading is enabled by BIOS but I shut down the two
emulated cores by (no BIOS option to disable HT):

echo 0 > /sys/devices/system/cpu/cpu2/online
echo 0 > /sys/devices/system/cpu/cpu3/online

At least I hope I shutdown those emulated ones. i7z claims I did the
right thing and IntelPerformanceCounterMonitorV2.5.1/pcm.x application
says the same:

 EXEC  : instructions per nominal CPU cycle
 IPC   : instructions per CPU cycle
 FREQ  : relation to nominal CPU frequency='unhalted clock ticks'/'invariant timer ticks' (includes Intel Turbo Boost)
 AFREQ : relation to nominal CPU frequency while in active state (not in power-saving C state)='unhalted clock ticks'/'invariant timer ticks while in C0-state'  (includes Intel Turbo Boost)
 L3MISS: L3 cache misses 
 L2MISS: L2 cache misses (including other core's L2 cache *hits*) 
 L3HIT : L3 cache hit ratio (0.00-1.00)
 L2HIT : L2 cache hit ratio (0.00-1.00)
 L3CLK : ratio of CPU cycles lost due to L3 cache misses (0.00-1.00), in some cases could be >1.0 due to a higher memory latency
 L2CLK : ratio of CPU cycles lost due to missing L2 cache but still hitting L3 cache (0.00-1.00)
 READ  : bytes read from memory controller (in GBytes)
 WRITE : bytes written to memory controller (in GBytes)
 TEMP  : Temperature reading in 1 degree Celsius relative to the TjMax temperature (thermal headroom): 0 corresponds to the max temperature


 Core (SKT) | EXEC | IPC  | FREQ  | AFREQ | L3MISS | L2MISS | L3HIT | L2HIT | L3CLK | L2CLK  | READ  | WRITE | TEMP

   0    0     1.78   1.51   1.18    1.18    1595 K   3363 K    0.53    0.00    0.09    0.02     N/A     N/A     23
   1    0     1.21   1.03   1.18    1.18    9359 K     13 M    0.31    0.00    0.51    0.04     N/A     N/A     24
-------------------------------------------------------------------------------------------------------------------
 SKT    0     1.50   1.27   1.18    1.18      10 M     16 M    0.35    0.00    0.30    0.03    1.32    0.37     24
-------------------------------------------------------------------------------------------------------------------
 TOTAL  *     1.50   1.27   1.18    1.18      10 M     16 M    0.35    0.00    0.30    0.03    1.32    0.37     N/A

 Instructions retired: 8368 M ; Active cycles: 6594 M ; Time (TSC): 2797 Mticks ; C0 (active,non-halted) core residency: 100.00 %

 C1 core residency: 0.00 %; C3 core residency: 0.00 %; C6 core residency: 0.00 %; C7 core residency: 0.00 %
 C2 package residency: 0.00 %; C3 package residency: 0.00 %; C6 package residency: 0.00 %; C7 package residency: 0.00 %

 PHYSICAL CORE IPC                 : 1.27 => corresponds to 31.73 % utilization for cores in active state
 Instructions per nominal CPU cycle: 1.50 => corresponds to 37.40 % core utilization over time interval
----------------------------------------------------------------------------------------------

----------------------------------------------------------------------------------------------
 SKT    0 package consumed 28.18 Joules
----------------------------------------------------------------------------------------------
 TOTAL:                    28.18 Joules





Why do I get the message at all? I have in 3.10.9 kernel:

...
CONFIG_IOSCHED_DEADLINE=y
CONFIG_DEFAULT_IOSCHED="deadline"
...
CONFIG_NR_CPUS=4
...
# CONFIG_PREEMPT_RCU is not set
CONFIG_PREEMPT_NONE=y
# CONFIG_PREEMPT_VOLUNTARY is not set
# CONFIG_PREEMPT is not set
...
# CONFIG_SCHED_MC is not set
CONFIG_SCHED_HRTICK=y

I fear this is about CPU being overloaded (both cores loaded
by user processes), by why do I get the message at all?



Cpu speed from cpuinfo 2796.00Mhz
cpuinfo might be wrong if cpufreq is enabled. To guess correctly try estimating via tsc
Linux's inbuilt cpu_khz code emulated now
True Frequency (without accounting Turbo) 2796 MHz
  CPU Multiplier 28x || Bus clock frequency (BCLK) 99.86 MHz

Socket [0] - [physical cores=2, logical cores=2, max online cores ever=2]
  TURBO ENABLED on 2 Cores, Hyper Threading OFF
  Max Frequency without considering Turbo 2895.86 MHz (99.86 x [29])
  Max TURBO Multiplier (if Enabled) with 1/2/3/4 Cores is  35x/33x/33x/33x
  Real Current Frequency 3295.29 MHz [99.86 x 33.00] (Max of below)
        Core [core-id]  :Actual Freq (Mult.)      C0%   Halt(C1)%  C3 %   C6 %   C7 %  Temp
        Core 1 [0]:       3295.28 (33.00x)       100       0       0       0       0    76
        Core 2 [1]:       3295.29 (33.00x)       100       0       0       0       0    76



Would you please clarify what the "[sched_delayed] sched: RT throttling activated"
really means? Is that because there is some RT-requiring application on my system?
I don't know of any (or don't care about real-time responsiveness except that ALSA
drivers require me to have CONFIG_SND_HRTIMER=y). Per Goggle answers could the
culprit be nfsd? Then I will recompile is as a module.

# ps -ef |  grep nfs
root       920     2  0 Aug22 ?        00:00:00 [nfsiod]
#


Thank you,
Martin

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [sched_delayed] sched: RT throttling activated
  2013-08-23  8:53 [sched_delayed] sched: RT throttling activated Martin Mokrejs
@ 2013-08-23 10:09 ` Peter Zijlstra
  2013-08-23 10:38   ` Martin Mokrejs
  0 siblings, 1 reply; 12+ messages in thread
From: Peter Zijlstra @ 2013-08-23 10:09 UTC (permalink / raw)
  To: Martin Mokrejs; +Cc: Theodore Tso, Thomas Gleixner, mingo, LKML

On Fri, Aug 23, 2013 at 10:53:02AM +0200, Martin Mokrejs wrote:
> Hi,
>   I tried to figure out what this message really means. I came to 
> https://rt.wiki.kernel.org/index.php/Frequently_Asked_Questions
> but I am still lost. I lack in the FAQ some user-related information.
> The first paragraph is still unclear to me. I have a i7-2640M based
> laptop, hyperthreading is enabled by BIOS but I shut down the two
> emulated cores by (no BIOS option to disable HT):
> 
> Would you please clarify what the "[sched_delayed] sched: RT throttling activated"
> really means? 

It means you have (a) real-time task(s) that consume significant amount
of time. At some point we throttle them in an attempt to keep the system
from falling over.

> Is that because there is some RT-requiring application on my system?

Yep.

> I don't know of any (or don't care about real-time responsiveness except that ALSA
> drivers require me to have CONFIG_SND_HRTIMER=y). Per Goggle answers could the
> culprit be nfsd? Then I will recompile is as a module.

Unlikely, I don't think I've ever seen anybody run their nfsd with RT
priority. Also, you can run RT tasks regardless of the config options.
SCHED_RR and SCHED_FIFO are POSIX specified and always available.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [sched_delayed] sched: RT throttling activated
  2013-08-23 10:09 ` Peter Zijlstra
@ 2013-08-23 10:38   ` Martin Mokrejs
  2013-08-23 11:02     ` Peter Zijlstra
  0 siblings, 1 reply; 12+ messages in thread
From: Martin Mokrejs @ 2013-08-23 10:38 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Theodore Tso, Thomas Gleixner, mingo, LKML

Hi Peter,

Peter Zijlstra wrote:
> On Fri, Aug 23, 2013 at 10:53:02AM +0200, Martin Mokrejs wrote:
>> Hi,
>>   I tried to figure out what this message really means. I came to 
>> https://rt.wiki.kernel.org/index.php/Frequently_Asked_Questions
>> but I am still lost. I lack in the FAQ some user-related information.
>> The first paragraph is still unclear to me. I have a i7-2640M based
>> laptop, hyperthreading is enabled by BIOS but I shut down the two
>> emulated cores by (no BIOS option to disable HT):
>>
>> Would you please clarify what the "[sched_delayed] sched: RT throttling activated"
>> really means? 
> 
> It means you have (a) real-time task(s) that consume significant amount

How can I find them? I don't think I need the RT, I have two CPU-bound
processes and want to run them at max speed. Rest of the system is unimportant.

I still don't understand what the $subj message actually says. Does it say
the RT-requiring task was slowed down? I am a bit lost here.

> of time. At some point we throttle them in an attempt to keep the system
> from falling over.

Will I get companion "[sched_delayed] sched: RT throttling deactivated"
at some point?

> 
>> Is that because there is some RT-requiring application on my system?
> 
> Yep.

Which? How can I find them and turn that requirement off (if I understand right they
interrupt my long-living computing processes)?

> 
>> I don't know of any (or don't care about real-time responsiveness except that ALSA
>> drivers require me to have CONFIG_SND_HRTIMER=y). Per Goggle answers could the
>> culprit be nfsd? Then I will recompile is as a module.
> 
> Unlikely, I don't think I've ever seen anybody run their nfsd with RT

Maybe false info in that thread, I don't know:
http://forums.opensuse.org/english/get-technical-help-here/applications/482756-kernel-panic-rt-throttling-activated.html

> priority. Also, you can run RT tasks regardless of the config options.
> SCHED_RR and SCHED_FIFO are POSIX specified and always available.

Are python-based apps requiring the realtime features?


I used to get the messages below which are now gone with my CPU cooler being replaced yesterday:

[ 4172.717272] CPU1: Core temperature above threshold, cpu clock throttled (total events = 153727)
[ 4172.717277] CPU1: Package temperature above threshold, cpu clock throttled (total events = 158008)
[ 4172.717348] CPU0: Package temperature above threshold, cpu clock throttled (total events = 158008)
[ 4172.718291] CPU1: Core temperature/speed normal
[ 4172.718293] CPU1: Package temperature/speed normal
[ 4172.718347] CPU0: Package temperature/speed normal
[ 4205.336883] mce: [Hardware Error]: Machine check events logged
...
[ 8966.052786] CPU1: Core temperature/speed normal
[ 8966.052788] CPU0: Package temperature/speed normal
[ 8966.052791] CPU1: Package temperature/speed normal
[ 9266.421068] CPU1: Core temperature above threshold, cpu clock throttled (total events = 530778)
[ 9266.421070] CPU0: Package temperature above threshold, cpu clock throttled (total events = 547228)
[ 9266.421075] CPU1: Package temperature above threshold, cpu clock throttled (total events = 547228)
[ 9266.422076] CPU1: Core temperature/speed normal
[ 9266.422078] CPU0: Package temperature/speed normal
[ 9266.422081] CPU1: Package temperature/speed normal
[ 9445.150679] [sched_delayed] sched: RT throttling activated
[ 9566.792369] CPU1: Core temperature above threshold, cpu clock throttled (total events = 559429)
[ 9566.792372] CPU0: Package temperature above threshold, cpu clock throttled (total events = 576882)
[ 9566.792378] CPU1: Package temperature above threshold, cpu clock throttled (total events = 576882)
[ 9566.793377] CPU1: Core temperature/speed normal
[ 9566.793380] CPU0: Package temperature/speed normal
[ 9566.793382] CPU1: Package temperature/speed normal
[ 9872.630811] CPU1: Core temperature above threshold, cpu clock throttled (total events = 583223)
[ 9872.630813] CPU0: Package temperature above threshold, cpu clock throttled (total events = 601532)
[ 9872.630817] CPU1: Package temperature above threshold, cpu clock throttled (total events = 601532)
[ 9872.631818] CPU1: Core temperature/speed normal
[ 9872.631820] CPU0: Package temperature/speed normal
[ 9872.631823] CPU1: Package temperature/speed normal

mcelog report in such cases:

Hardware event. This is not a software error.
MCE 0
CPU 1 THERMAL EVENT TSC 1bf82e2a146 
TIME 1375536062 Sat Aug  3 15:21:02 2013
Processor 1 heated above trip temperature. Throttling enabled.
Please check your system cooling. Performance will be impacted
STATUS 880003c3 MCGSTATUS 0
MCGCAP c07 APICID 2 SOCKETID 0 
CPUID Vendor Intel Family 6 Model 42





While my CPU cooler got replaced even now I still get (hence this email thread):

[39564.452795] blah.py[14396]: segfault at 7ff67af34a58 ip 00007ff67badff00 sp 00007fff771ce798 error 4 in libpython2.7.so.1.0[7ff67b9cf000+173000]
[44520.259205] [sched_delayed] sched: RT throttling activated
[48956.057816] blah.py[16623]: segfault at 2f ip 00007fd462e5d046 sp 00007fff638431e0 error 4 in libpython2.7.so.1.0[7fd462d7c000+173000]
[49288.388797] blah.py[28631]: segfault at 7fe254b6aa58 ip 00007fe255715f00 sp 00007fff6ddaaff8 error 4 in libpython2.7.so.1.0[7fe255605000+173000]
[49942.020084] blah.py[6950]: segfault at d0 ip 00007f3e8a9acf9c sp 00007fffa72288a0 error 4 in libpython2.7.so.1.0[7f3e8a904000+173000]
[66696.443342] blah.py[8015]: segfault at cf ip 00007f798f708f9c sp 00007fff420336e0 error 4 in libpython2.7.so.1.0[7f798f660000+173000]
[67561.587383] blah.py[7483]: segfault at 7f7b16e01540 ip 00007f7b17a85f00 sp 00007fffe663d9b8 error 4 in libpython2.7.so.1.0[7f7b17975000+173000]
[77262.490502] blah.py[29107]: segfault at 21e1458 ip 00007fc54cd17f00 sp 00007fff283c5c38 error 4 in libpython2.7.so.1.0[7fc54cc07000+173000]


So, what does this "[sched_delayed] sched: RT throttling activated" tell me?


Thank you for your guidance,
Martin



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [sched_delayed] sched: RT throttling activated
  2013-08-23 10:38   ` Martin Mokrejs
@ 2013-08-23 11:02     ` Peter Zijlstra
  2013-08-23 11:35       ` Martin Mokrejs
  2013-08-23 19:07       ` Martin Mokrejs
  0 siblings, 2 replies; 12+ messages in thread
From: Peter Zijlstra @ 2013-08-23 11:02 UTC (permalink / raw)
  To: Martin Mokrejs; +Cc: Theodore Tso, Thomas Gleixner, mingo, LKML

On Fri, Aug 23, 2013 at 12:38:53PM +0200, Martin Mokrejs wrote:
> > It means you have (a) real-time task(s) that consume significant amount
> 
> How can I find them? 

ps -deo pid,cls,cmd | grep -e RR -e FF

Should do I suppose

> I don't think I need the RT, I have two CPU-bound
> processes and want to run them at max speed. Rest of the system is unimportant.
> 
> I still don't understand what the $subj message actually says. Does it say
> the RT-requiring task was slowed down? I am a bit lost here.

Yeah, they were forcibly stopped from running for a little while.

> > of time. At some point we throttle them in an attempt to keep the system
> > from falling over.
> 
> Will I get companion "[sched_delayed] sched: RT throttling deactivated"
> at some point?

Nope, you get that message once to tell you that we throttle RT tasks.

> Are python-based apps requiring the realtime features?

I'm fairly sure python could use the relevant scheduling classes, but I
don't speak snake so I really wouldn't know.

> I used to get the messages below which are now gone with my CPU cooler being replaced yesterday:
> 
> [ 4172.717272] CPU1: Core temperature above threshold, cpu clock throttled (total events = 153727)

> mcelog report in such cases:
> 
> Hardware event. This is not a software error.
> MCE 0
> CPU 1 THERMAL EVENT TSC 1bf82e2a146 
> TIME 1375536062 Sat Aug  3 15:21:02 2013
> Processor 1 heated above trip temperature. Throttling enabled.
> Please check your system cooling. Performance will be impacted
> STATUS 880003c3 MCGSTATUS 0
> MCGCAP c07 APICID 2 SOCKETID 0 
> CPUID Vendor Intel Family 6 Model 42

Right, those are thermal events throttling the speed of your CPU to keep
the thing from heat damaging itself.

> While my CPU cooler got replaced even now I still get (hence this email thread):
> 
> [39564.452795] blah.py[14396]: segfault at 7ff67af34a58 ip 00007ff67badff00 sp 00007fff771ce798 error 4 in libpython2.7.so.1.0[7ff67b9cf000+173000]
> [44520.259205] [sched_delayed] sched: RT throttling activated
> [48956.057816] blah.py[16623]: segfault at 2f ip 00007fd462e5d046 sp 00007fff638431e0 error 4 in libpython2.7.so.1.0[7fd462d7c000+173000]
> [49288.388797] blah.py[28631]: segfault at 7fe254b6aa58 ip 00007fe255715f00 sp 00007fff6ddaaff8 error 4 in libpython2.7.so.1.0[7fe255605000+173000]
> [49942.020084] blah.py[6950]: segfault at d0 ip 00007f3e8a9acf9c sp 00007fffa72288a0 error 4 in libpython2.7.so.1.0[7f3e8a904000+173000]
> [66696.443342] blah.py[8015]: segfault at cf ip 00007f798f708f9c sp 00007fff420336e0 error 4 in libpython2.7.so.1.0[7f798f660000+173000]
> [67561.587383] blah.py[7483]: segfault at 7f7b16e01540 ip 00007f7b17a85f00 sp 00007fffe663d9b8 error 4 in libpython2.7.so.1.0[7f7b17975000+173000]
> [77262.490502] blah.py[29107]: segfault at 21e1458 ip 00007fc54cd17f00 sp 00007fff283c5c38 error 4 in libpython2.7.so.1.0[7fc54cc07000+173000]
> 
> 
> So, what does this "[sched_delayed] sched: RT throttling activated" tell me?

That of the past 1s, 0.95s were spend running RR/FIFO tasks. It is a
warning that comes only once per boot and should prompt you to
investigate.

You can turn the throttle off, but be advised that running a RR/FIFO
task at 100% can (and generally does) negatively affect the running of
your system (as in, these tasks can prevent system duties from taking
place and eventually make the system come to a halt).


As to those faults, investigate if your python prog does something
particualrly weird or your runtime is in order. Otherwise I would advise
you to run memtest for a while to make sure your machine is in proper
working order.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [sched_delayed] sched: RT throttling activated
  2013-08-23 11:02     ` Peter Zijlstra
@ 2013-08-23 11:35       ` Martin Mokrejs
  2013-08-23 12:10         ` Peter Zijlstra
  2013-08-23 19:07       ` Martin Mokrejs
  1 sibling, 1 reply; 12+ messages in thread
From: Martin Mokrejs @ 2013-08-23 11:35 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Theodore Tso, Thomas Gleixner, mingo, LKML



Peter Zijlstra wrote:
> On Fri, Aug 23, 2013 at 12:38:53PM +0200, Martin Mokrejs wrote:
>>> It means you have (a) real-time task(s) that consume significant amount
>>
>> How can I find them? 
> 
> ps -deo pid,cls,cmd | grep -e RR -e FF

# ps -deo pid,cls,cmd | grep -e 'RR \[' -e 'FF \['
    7  FF [migration/0]
   10  FF [watchdog/0]
   11  FF [watchdog/1]
   12  FF [migration/1]
   17  FF [migration/2]
   22  FF [migration/3]
 2161  FF [irq/50-iwlwifi]
#

The shell/python tasks have 'TS' in place of the FF value in the second column
so I guess they are not requiring realtime responsiveness.

> 
> Should do I suppose
> 
>> I don't think I need the RT, I have two CPU-bound
>> processes and want to run them at max speed. Rest of the system is unimportant.
>>
>> I still don't understand what the $subj message actually says. Does it say
>> the RT-requiring task was slowed down? I am a bit lost here.
> 
> Yeah, they were forcibly stopped from running for a little while.
> 
>>> of time. At some point we throttle them in an attempt to keep the system
>>> from falling over.
>>
>> Will I get companion "[sched_delayed] sched: RT throttling deactivated"
>> at some point?
> 
> Nope, you get that message once to tell you that we throttle RT tasks.

I think the message could improved to explain this is a warn ONCE message and
that there is no "[sched_delayed] sched: RT throttling deactivated" counterpart
message to be anticipated.

> 
>> Are python-based apps requiring the realtime features?
> 
> I'm fairly sure python could use the relevant scheduling classes, but I
> don't speak snake so I really wouldn't know.
> 
>> I used to get the messages below which are now gone with my CPU cooler being replaced yesterday:
>>
>> [ 4172.717272] CPU1: Core temperature above threshold, cpu clock throttled (total events = 153727)
> 
>> mcelog report in such cases:
>>
>> Hardware event. This is not a software error.
>> MCE 0
>> CPU 1 THERMAL EVENT TSC 1bf82e2a146 
>> TIME 1375536062 Sat Aug  3 15:21:02 2013
>> Processor 1 heated above trip temperature. Throttling enabled.
>> Please check your system cooling. Performance will be impacted
>> STATUS 880003c3 MCGSTATUS 0
>> MCGCAP c07 APICID 2 SOCKETID 0 
>> CPUID Vendor Intel Family 6 Model 42
> 
> Right, those are thermal events throttling the speed of your CPU to keep
> the thing from heat damaging itself.
> 
>> While my CPU cooler got replaced even now I still get (hence this email thread):
>>
>> [39564.452795] blah.py[14396]: segfault at 7ff67af34a58 ip 00007ff67badff00 sp 00007fff771ce798 error 4 in libpython2.7.so.1.0[7ff67b9cf000+173000]
>> [44520.259205] [sched_delayed] sched: RT throttling activated
>> [48956.057816] blah.py[16623]: segfault at 2f ip 00007fd462e5d046 sp 00007fff638431e0 error 4 in libpython2.7.so.1.0[7fd462d7c000+173000]
>> [49288.388797] blah.py[28631]: segfault at 7fe254b6aa58 ip 00007fe255715f00 sp 00007fff6ddaaff8 error 4 in libpython2.7.so.1.0[7fe255605000+173000]
>> [49942.020084] blah.py[6950]: segfault at d0 ip 00007f3e8a9acf9c sp 00007fffa72288a0 error 4 in libpython2.7.so.1.0[7f3e8a904000+173000]
>> [66696.443342] blah.py[8015]: segfault at cf ip 00007f798f708f9c sp 00007fff420336e0 error 4 in libpython2.7.so.1.0[7f798f660000+173000]
>> [67561.587383] blah.py[7483]: segfault at 7f7b16e01540 ip 00007f7b17a85f00 sp 00007fffe663d9b8 error 4 in libpython2.7.so.1.0[7f7b17975000+173000]
>> [77262.490502] blah.py[29107]: segfault at 21e1458 ip 00007fc54cd17f00 sp 00007fff283c5c38 error 4 in libpython2.7.so.1.0[7fc54cc07000+173000]
>>
>>
>> So, what does this "[sched_delayed] sched: RT throttling activated" tell me?
> 
> That of the past 1s, 0.95s were spend running RR/FIFO tasks. It is a
> warning that comes only once per boot and should prompt you to
> investigate.

Could kernel log by itself some kind of equivalent of the
"ps -deo pid,cls,cmd | grep -e 'RR \[' -e 'FF \['" command?

> 
> You can turn the throttle off, but be advised that running a RR/FIFO
> task at 100% can (and generally does) negatively affect the running of
> your system (as in, these tasks can prevent system duties from taking
> place and eventually make the system come to a halt).

Provided I have in my .config:

# grep EMPT .config.current 
# CONFIG_PREEMPT_RCU is not set
CONFIG_PREEMPT_NONE=y
# CONFIG_PREEMPT_VOLUNTARY is not set
# CONFIG_PREEMPT is not set

does that mean that I can't do much about those kernel tasks reported by the ps
command above? Or could kernel be tuned to be even less demanding and not
interrupt my tasks "that often" (no idea how often that happens if the message is
logged only once and how much harm is causes).

> 
> 
> As to those faults, investigate if your python prog does something
> particualrly weird or your runtime is in order. Otherwise I would advise
> you to run memtest for a while to make sure your machine is in proper
> working order.

I will re-check the stacktraces but last time I did I did not come to a single
place where it crashes. OK, will re-test the memory again but I think it is fine.
It seemed those results of the overheated CPU and thermal throttling. Now when the
thermal throttling does not happen due to new cooler I wondered what the RT throttling
does and whether that could be causing the segfaults.

Thank you,
Martin

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [sched_delayed] sched: RT throttling activated
  2013-08-23 11:35       ` Martin Mokrejs
@ 2013-08-23 12:10         ` Peter Zijlstra
  2013-08-23 12:33           ` Johannes Berg
  2013-08-23 12:46           ` Martin Mokrejs
  0 siblings, 2 replies; 12+ messages in thread
From: Peter Zijlstra @ 2013-08-23 12:10 UTC (permalink / raw)
  To: Martin Mokrejs; +Cc: Theodore Tso, Thomas Gleixner, mingo, LKML, Johannes Berg

On Fri, Aug 23, 2013 at 01:35:24PM +0200, Martin Mokrejs wrote:

> # ps -deo pid,cls,cmd | grep -e 'RR \[' -e 'FF \['

This explicitly only lists kernel threads; from your other comment:

> The shell/python tasks have 'TS' in place of the FF value in the second column
> so I guess they are not requiring realtime responsiveness.

I'll assume you actually inspected the other tasks and found none.

>     7  FF [migration/0]
>    10  FF [watchdog/0]
>    11  FF [watchdog/1]
>    12  FF [migration/1]
>    17  FF [migration/2]
>    22  FF [migration/3]

The 'migration' threads only look like FIFO threads but they're secretly
not and don't count to the limit. The watchdog threads shouldn't run
much either.

>  2161  FF [irq/50-iwlwifi]

Oh a threaded interrupt, I presume you're not using "threadiqrs" since
this is the only interrupt thread around and I see a
'request_threaded_irq()' call in
drivers/net/wireless/iwlwifi/pcie/trans.c

And wow, why would that thing consume that much cpu. 

Johill, ever seen the iwlwifi interrupt go 'funny' and consume gobs of
cpu-time?


> > Nope, you get that message once to tell you that we throttle RT tasks.
> 
> I think the message could improved to explain this is a warn ONCE message and
> that there is no "[sched_delayed] sched: RT throttling deactivated" counterpart
> message to be anticipated.

Would something like: 

  sched: [ONCE] RT throttle hit -- inspect system configuration.

Be a better message?

Also, could you clarify what exact kernel version you're running? I
couldn't find it in your previous messages.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [sched_delayed] sched: RT throttling activated
  2013-08-23 12:10         ` Peter Zijlstra
@ 2013-08-23 12:33           ` Johannes Berg
  2013-08-23 13:06             ` Peter Zijlstra
  2013-08-23 12:46           ` Martin Mokrejs
  1 sibling, 1 reply; 12+ messages in thread
From: Johannes Berg @ 2013-08-23 12:33 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Martin Mokrejs, Theodore Tso, Thomas Gleixner, mingo, LKML

On Fri, 2013-08-23 at 14:10 +0200, Peter Zijlstra wrote:

> >  2161  FF [irq/50-iwlwifi]
> 
> Oh a threaded interrupt, I presume you're not using "threadiqrs" since
> this is the only interrupt thread around and I see a
> 'request_threaded_irq()' call in
> drivers/net/wireless/iwlwifi/pcie/trans.c
> 
> And wow, why would that thing consume that much cpu. 

How much is "that much"?

> Johill, ever seen the iwlwifi interrupt go 'funny' and consume gobs of
> cpu-time?

Not really, no, though if you do a lot of data transfer it can actually
consume a fair bit of CPU - made better by two somewhat recent patches,
commits 01911dab97cb3e21e640aaca82689acec00ed848 and
68972c46f2975d3d61f9dc9f311f77bfc8a8b12b.

johannes


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [sched_delayed] sched: RT throttling activated
  2013-08-23 12:10         ` Peter Zijlstra
  2013-08-23 12:33           ` Johannes Berg
@ 2013-08-23 12:46           ` Martin Mokrejs
  2013-08-23 13:01             ` Martin Mokrejs
  1 sibling, 1 reply; 12+ messages in thread
From: Martin Mokrejs @ 2013-08-23 12:46 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Theodore Tso, Thomas Gleixner, mingo, LKML, Johannes Berg



Peter Zijlstra wrote:
> On Fri, Aug 23, 2013 at 01:35:24PM +0200, Martin Mokrejs wrote:
> 
>> # ps -deo pid,cls,cmd | grep -e 'RR \[' -e 'FF \['
> 
> This explicitly only lists kernel threads; from your other comment:
> 
>> The shell/python tasks have 'TS' in place of the FF value in the second column
>> so I guess they are not requiring realtime responsiveness.
> 
> I'll assume you actually inspected the other tasks and found none.

Yes, the other (false) matches were in the third or latter columns so I wanted to match
just those true matches and cut&paste it. I admit, this is not a general-purpose
REGEXP and is misleading.

> 
>>     7  FF [migration/0]
>>    10  FF [watchdog/0]
>>    11  FF [watchdog/1]
>>    12  FF [migration/1]
>>    17  FF [migration/2]
>>    22  FF [migration/3]
> 
> The 'migration' threads only look like FIFO threads but they're secretly
> not and don't count to the limit. The watchdog threads shouldn't run
> much either.
> 
>>  2161  FF [irq/50-iwlwifi]
> 
> Oh a threaded interrupt, I presume you're not using "threadiqrs" since

Is that what you talk about?

CONFIG_IRQ_FORCED_THREADING=y
CONFIG_GENERIC_SMP_IDLE_THREAD=y

> this is the only interrupt thread around and I see a
> 'request_threaded_irq()' call in
> drivers/net/wireless/iwlwifi/pcie/trans.c
> 
> And wow, why would that thing consume that much cpu. 
> 
> Johill, ever seen the iwlwifi interrupt go 'funny' and consume gobs of
> cpu-time?

I am not sure if I understand you but in case it helps somebody

Current values:

# cat /proc/interrupts 
           CPU0       CPU1       
  0:         23          0   IO-APIC-edge      timer
  1:         42          0   IO-APIC-edge      i8042
  8:         36          0   IO-APIC-edge      rtc0
  9:          3          0   IO-APIC-fasteoi   acpi
 12:     404650          0   IO-APIC-edge      i8042
 16:        109          0   IO-APIC-fasteoi   ehci_hcd:usb1
 23:     583646          0   IO-APIC-fasteoi   ehci_hcd:usb2
 40:          0          0   PCI-MSI-edge      pciehp
 41:      54319          0   PCI-MSI-edge      i915
 42:     553802          0   PCI-MSI-edge      ahci
 43:          0          0   PCI-MSI-edge      enp5s0
 44:     257268          0   PCI-MSI-edge      xhci_hcd
 45:          0          0   PCI-MSI-edge      xhci_hcd
 46:          0          0   PCI-MSI-edge      xhci_hcd
 47:          0          0   PCI-MSI-edge      xhci_hcd
 48:          0          0   PCI-MSI-edge      xhci_hcd
 49:     465462          0   PCI-MSI-edge      snd_hda_intel
 50:    3895788          0   PCI-MSI-edge      iwlwifi
NMI:       8687       9483   Non-maskable interrupts
LOC:   17531664   16978131   Local timer interrupts
SPU:          0          0   Spurious interrupts
PMI:       8687       9483   Performance monitoring interrupts
IWI:     213009     205171   IRQ work interrupts
RTR:          3          0   APIC ICR read retries
RES:    1922651    4491695   Rescheduling interrupts
CAL:      73741     348678   Function call interrupts
TLB:      98634     111173   TLB shootdowns
TRM:          0          0   Thermal event interrupts
THR:          0          0   Threshold APIC interrupts
MCE:          0          0   Machine check exceptions
MCP:        286        286   Machine check polls
ERR:          0
MIS:          0

# ifconfig wlp9s0
wlp9s0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.0.24  netmask 255.255.255.0  broadcast 192.168.0.255
        inet6 fe80::4e80:93ff:fe15:e6c7  prefixlen 64  scopeid 0x20<link>
        ether 4c:80:93:15:e6:c7  txqueuelen 1000  (Ethernet)
        RX packets 811806  bytes 992611146 (946.6 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 490006  bytes 71390887 (68.0 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

# dmesg
...
[   11.789302] Intel(R) Wireless WiFi driver for Linux, in-tree:d
[   11.789310] Copyright(c) 2003-2013 Intel Corporation
[   11.791626] iwlwifi 0000:09:00.0: irq 50 for MSI/MSI-X
[   12.044905] iwlwifi 0000:09:00.0: loaded firmware version 18.168.6.1 op_mode iwldvm
[   13.896033] iwlwifi 0000:09:00.0: CONFIG_IWLWIFI_DEBUG enabled
[   13.896041] iwlwifi 0000:09:00.0: CONFIG_IWLWIFI_DEBUGFS disabled
[   13.896044] iwlwifi 0000:09:00.0: CONFIG_IWLWIFI_DEVICE_TRACING disabled
[   13.896047] iwlwifi 0000:09:00.0: CONFIG_IWLWIFI_DEVICE_TESTMODE disabled
[   13.896049] iwlwifi 0000:09:00.0: CONFIG_IWLWIFI_P2P disabled
[   13.896054] iwlwifi 0000:09:00.0: Detected Intel(R) Centrino(R) Wireless-N 1030 BGN, REV=0xB0
[   13.896173] iwlwifi 0000:09:00.0: L1 Disabled; Enabling L0S
[   13.917705] ieee80211 phy0: Selected rate control algorithm 'iwl-agn-rs'


> 
> 
>>> Nope, you get that message once to tell you that we throttle RT tasks.
>>
>> I think the message could improved to explain this is a warn ONCE message and
>> that there is no "[sched_delayed] sched: RT throttling deactivated" counterpart
>> message to be anticipated.
> 
> Would something like: 
> 
>   sched: [ONCE] RT throttle hit -- inspect system configuration.
> 
> Be a better message?

Not really. I would prefer something like:

[sched_delayed] sched: stopped running $cmd on CPU%d in favor of RR/FIFO task $psname

where the "$cmd" would be something like what "ps -deo pid,cls,cmd | grep -e RR -e FF" gives?

Still, I think when the interrupted task is ran again, a "counterpart" message should be logged.
Especially to see the timing between them could be helpful?

> 
> Also, could you clarify what exact kernel version you're running? I
> couldn't find it in your previous messages.


Vanilla 3.10.9, I thought it was in the very first message ...

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [sched_delayed] sched: RT throttling activated
  2013-08-23 12:46           ` Martin Mokrejs
@ 2013-08-23 13:01             ` Martin Mokrejs
  0 siblings, 0 replies; 12+ messages in thread
From: Martin Mokrejs @ 2013-08-23 13:01 UTC (permalink / raw)
  To: Martin Mokrejs, Peter Zijlstra
  Cc: Theodore Tso, Thomas Gleixner, mingo, LKML, Johannes Berg



Martin Mokrejs wrote:

>>>> Nope, you get that message once to tell you that we throttle RT tasks.
>>>
>>> I think the message could improved to explain this is a warn ONCE message and
>>> that there is no "[sched_delayed] sched: RT throttling deactivated" counterpart
>>> message to be anticipated.
>>
>> Would something like: 
>>
>>   sched: [ONCE] RT throttle hit -- inspect system configuration.
>>
>> Be a better message?
> 
> Not really. I would prefer something like:
> 
> [sched_delayed] sched: stopped running $cmd on CPU%d in favor of RR/FIFO task $psname

Actually, to retain the message text appearing in current kernel so that people can find
by e.g. Google newer syntax and possibly this thread maybe much better would be:

[sched_delayed] sched: RT throttling limit $d hit. Stopped running $cmd on CPU%d in favor of RR/FIFO task $psname. Will not issue any more these messages until reboot.

I know, looong line.


I just realized this is about some threshold limit value, and you mean that iwlwifi
contributed the highest increase compared to the other kernel threads on my system.

sysctl -q -a |  grep -i limit

does not show what is the actual value. Am probably looking into a wrong place. ;-)


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [sched_delayed] sched: RT throttling activated
  2013-08-23 12:33           ` Johannes Berg
@ 2013-08-23 13:06             ` Peter Zijlstra
  2013-08-23 13:10               ` Johannes Berg
  0 siblings, 1 reply; 12+ messages in thread
From: Peter Zijlstra @ 2013-08-23 13:06 UTC (permalink / raw)
  To: Johannes Berg; +Cc: Martin Mokrejs, Theodore Tso, Thomas Gleixner, mingo, LKML

On Fri, Aug 23, 2013 at 02:33:42PM +0200, Johannes Berg wrote:
> On Fri, 2013-08-23 at 14:10 +0200, Peter Zijlstra wrote:
> 
> > >  2161  FF [irq/50-iwlwifi]
> > 
> > Oh a threaded interrupt, I presume you're not using "threadiqrs" since
> > this is the only interrupt thread around and I see a
> > 'request_threaded_irq()' call in
> > drivers/net/wireless/iwlwifi/pcie/trans.c
> > 
> > And wow, why would that thing consume that much cpu. 
> 
> How much is "that much"?

0.95s worth of time in the past 1s.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [sched_delayed] sched: RT throttling activated
  2013-08-23 13:06             ` Peter Zijlstra
@ 2013-08-23 13:10               ` Johannes Berg
  0 siblings, 0 replies; 12+ messages in thread
From: Johannes Berg @ 2013-08-23 13:10 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Martin Mokrejs, Theodore Tso, Thomas Gleixner, mingo, LKML

On Fri, 2013-08-23 at 15:06 +0200, Peter Zijlstra wrote:

> > > >  2161  FF [irq/50-iwlwifi]

> > > And wow, why would that thing consume that much cpu. 
> > 
> > How much is "that much"?
> 
> 0.95s worth of time in the past 1s.

Huh, no, I have no idea how that can be. Any idea what it's doing?

johannes


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [sched_delayed] sched: RT throttling activated
  2013-08-23 11:02     ` Peter Zijlstra
  2013-08-23 11:35       ` Martin Mokrejs
@ 2013-08-23 19:07       ` Martin Mokrejs
  1 sibling, 0 replies; 12+ messages in thread
From: Martin Mokrejs @ 2013-08-23 19:07 UTC (permalink / raw)
  To: Peter Zijlstra, Martin Mokrejs; +Cc: Theodore Tso, Thomas Gleixner, mingo, LKML

While you are probably thinking about the iwlwifi issue causing RT throttling
I have one more interesting followup below.

Peter Zijlstra wrote:
> On Fri, Aug 23, 2013 at 12:38:53PM +0200, Martin Mokrejs wrote:
>>> It means you have (a) real-time task(s) that consume significant amount
>>
>> How can I find them? 
> 
> ps -deo pid,cls,cmd | grep -e RR -e FF
> 
> Should do I suppose
> 
>> I don't think I need the RT, I have two CPU-bound
>> processes and want to run them at max speed. Rest of the system is unimportant.
>>
>> I still don't understand what the $subj message actually says. Does it say
>> the RT-requiring task was slowed down? I am a bit lost here.
> 
> Yeah, they were forcibly stopped from running for a little while.
> 
>>> of time. At some point we throttle them in an attempt to keep the system
>>> from falling over.
>>
>> Will I get companion "[sched_delayed] sched: RT throttling deactivated"
>> at some point?
> 
> Nope, you get that message once to tell you that we throttle RT tasks.
> 
>> Are python-based apps requiring the realtime features?
> 
> I'm fairly sure python could use the relevant scheduling classes, but I
> don't speak snake so I really wouldn't know.
> 
>> I used to get the messages below which are now gone with my CPU cooler being replaced yesterday:
>>
>> [ 4172.717272] CPU1: Core temperature above threshold, cpu clock throttled (total events = 153727)
> 
>> mcelog report in such cases:
>>
>> Hardware event. This is not a software error.
>> MCE 0
>> CPU 1 THERMAL EVENT TSC 1bf82e2a146 
>> TIME 1375536062 Sat Aug  3 15:21:02 2013
>> Processor 1 heated above trip temperature. Throttling enabled.
>> Please check your system cooling. Performance will be impacted
>> STATUS 880003c3 MCGSTATUS 0
>> MCGCAP c07 APICID 2 SOCKETID 0 
>> CPUID Vendor Intel Family 6 Model 42
> 
> Right, those are thermal events throttling the speed of your CPU to keep
> the thing from heat damaging itself.
> 
>> While my CPU cooler got replaced even now I still get (hence this email thread):
>>
>> [39564.452795] blah.py[14396]: segfault at 7ff67af34a58 ip 00007ff67badff00 sp 00007fff771ce798 error 4 in libpython2.7.so.1.0[7ff67b9cf000+173000]
>> [44520.259205] [sched_delayed] sched: RT throttling activated
>> [48956.057816] blah.py[16623]: segfault at 2f ip 00007fd462e5d046 sp 00007fff638431e0 error 4 in libpython2.7.so.1.0[7fd462d7c000+173000]
>> [49288.388797] blah.py[28631]: segfault at 7fe254b6aa58 ip 00007fe255715f00 sp 00007fff6ddaaff8 error 4 in libpython2.7.so.1.0[7fe255605000+173000]
>> [49942.020084] blah.py[6950]: segfault at d0 ip 00007f3e8a9acf9c sp 00007fffa72288a0 error 4 in libpython2.7.so.1.0[7f3e8a904000+173000]
>> [66696.443342] blah.py[8015]: segfault at cf ip 00007f798f708f9c sp 00007fff420336e0 error 4 in libpython2.7.so.1.0[7f798f660000+173000]
>> [67561.587383] blah.py[7483]: segfault at 7f7b16e01540 ip 00007f7b17a85f00 sp 00007fffe663d9b8 error 4 in libpython2.7.so.1.0[7f7b17975000+173000]
>> [77262.490502] blah.py[29107]: segfault at 21e1458 ip 00007fc54cd17f00 sp 00007fff283c5c38 error 4 in libpython2.7.so.1.0[7fc54cc07000+173000]
>>
>>
>> So, what does this "[sched_delayed] sched: RT throttling activated" tell me?
> 
> That of the past 1s, 0.95s were spend running RR/FIFO tasks. It is a
> warning that comes only once per boot and should prompt you to
> investigate.
> 
> You can turn the throttle off, but be advised that running a RR/FIFO
> task at 100% can (and generally does) negatively affect the running of
> your system (as in, these tasks can prevent system duties from taking
> place and eventually make the system come to a halt).
> 
> 
> As to those faults, investigate if your python prog does something
> particualrly weird or your runtime is in order. Otherwise I would advise
> you to run memtest for a while to make sure your machine is in proper
> working order.

Hmm, meanwhile the core dumps filled up my /var/dumps/ directory of / filesystem.
I do not have timing information what was the time since bootup. I deleted some
files on the disk and thought I am done. Now, few hours later I realized:

[85451.247130] traps: blah.py[30787] general protection ip:7faf7b57a046 sp:7fffd9f7b1d0 error:0 in libpython2.7.so.1.0[7faf7b499000+173000]
[87125.493730] nr_pdflush_threads exported in /proc is scheduled for removal
[87125.494238] sysctl: The scan_unevictable_pages sysctl/node-interface has been disabled for lack of a legitimate use case.  If you have one, please send an email to linux-mm@kvack.org.
[97959.812943] blah.py[13069]: segfault at 7f1f2cfdca58 ip 00007f1f2db87f00 sp 00007fffade41768 error 4 in libpython2.7.so.1.0[7f1f2da77000+173000]


I bet at about the time 87125 the disk was full. The laptop has 16GB of RAM
and the coredump files are really big, 300MB to 8GB. However, the nr_pdflush_threads
message sounds scary. Does linux 3.10.9 want to delete /proc on the fly? ;-)


Martin

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2013-08-23 19:07 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-08-23  8:53 [sched_delayed] sched: RT throttling activated Martin Mokrejs
2013-08-23 10:09 ` Peter Zijlstra
2013-08-23 10:38   ` Martin Mokrejs
2013-08-23 11:02     ` Peter Zijlstra
2013-08-23 11:35       ` Martin Mokrejs
2013-08-23 12:10         ` Peter Zijlstra
2013-08-23 12:33           ` Johannes Berg
2013-08-23 13:06             ` Peter Zijlstra
2013-08-23 13:10               ` Johannes Berg
2013-08-23 12:46           ` Martin Mokrejs
2013-08-23 13:01             ` Martin Mokrejs
2013-08-23 19:07       ` Martin Mokrejs

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.