* Re: rlimits: Print more information when CPU/RT limits are exceeded
From: Dave Jones @ 2017-05-02  3:12 UTC
  To: Linux Kernel Mailing List; +Cc: Arun Raghavan, Thomas Gleixner, Linus Torvalds

On Mon, May 01, 2017 at 11:21:52PM +0000, Linux Kernel wrote:
 > Web:        https://git.kernel.org/torvalds/c/e7ea7c9806a2681807257ea89085339d33f7fa0b
 > Commit:     e7ea7c9806a2681807257ea89085339d33f7fa0b
 > Parent:     4495c08e84729385774601b5146d51d9e5849f81
 > Refname:    refs/heads/master
 > Author:     Arun Raghavan <arun@arunraghavan.net>
 > AuthorDate: Wed Mar 1 20:23:09 2017 +0530
 > Committer:  Thomas Gleixner <tglx@linutronix.de>
 > CommitDate: Mon Mar 13 21:32:15 2017 +0100
 > 
 >     rlimits: Print more information when CPU/RT limits are exceeded
 >     
 >     When a process is sent a SIGKILL because it exceeded CPU or RT limits,
 >     the cause may not be obvious in userspace -- daemonised processes just
 >     get killed, and even foreground process just see a 'Killed' message. The
 >     lack of any information on why this might be happening in logs can be
 >     confusing to users who are not aware of this mechanism.
 >     
 >     Add messages which dump the process name and tid in dmesg when a process
 >     exceeds its CPU or RT limits (soft and hard) in order to make it clearer to
 >     people debugging such issues.
 >     
 >     Signed-off-by: Arun Raghavan <arun@arunraghavan.net>
 >     Link: http://lkml.kernel.org/r/20170301145309.27214-1-arun@arunraghavan.net
 >     Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

This needs to be configurable, because this is really obnoxious..

[  121.042170] RT Watchdog Timeout (hard): trinity-c40[7533]
[  125.670948] RT Watchdog Timeout (hard): trinity-c29[1612]
[  200.631968] CPU Watchdog Timeout (soft): trinity-c33[11454]
[  200.644308] CPU Watchdog Timeout (soft): trinity-c33[11454]
[  200.656551] CPU Watchdog Timeout (soft): trinity-c33[11454]
[  213.454504] CPU Watchdog Timeout (soft): trinity-c33[11454]
[  276.787116] CPU Watchdog Timeout (soft): trinity-c22[23943]
[  285.857773] CPU Watchdog Timeout (soft): trinity-c33[24908]
[  287.236710] CPU Watchdog Timeout (soft): trinity-c22[23943]
[  295.186400] CPU Watchdog Timeout (soft): trinity-c33[24908]
[  296.464352] CPU Watchdog Timeout (soft): trinity-c22[23943]
[  305.164011] CPU Watchdog Timeout (soft): trinity-c33[24908]
[  367.123564] CPU Watchdog Timeout (soft): trinity-c8[1472]
[  373.950321] CPU Watchdog Timeout (soft): trinity-c8[1472]
[  381.415054] CPU Watchdog Timeout (soft): trinity-c8[1472]
[  389.621759] CPU Watchdog Timeout (soft): trinity-c8[1472]
[  463.940996] CPU Watchdog Timeout (soft): trinity-c29[7725]
[  463.952215] CPU Watchdog Timeout (soft): trinity-c29[7725]
[  463.963306] CPU Watchdog Timeout (soft): trinity-c29[7725]
[  555.264401] RT Watchdog Timeout (hard): trinity-c58[19175]
[  610.536159] RT Watchdog Timeout (hard): trinity-c2[28363]
[  741.785688] CPU Watchdog Timeout (soft): trinity-c46[5034]
[  741.796600] CPU Watchdog Timeout (soft): trinity-c46[5034]
[  741.807384] CPU Watchdog Timeout (soft): trinity-c46[5034]
[  741.972679] CPU Watchdog Timeout (soft): trinity-c46[5034]
[  743.115630] CPU Watchdog Timeout (soft): trinity-c46[5034]
[  743.128628] CPU Watchdog Timeout (soft): trinity-c12[2803]
[  743.139032] CPU Watchdog Timeout (soft): trinity-c12[2803]
[  743.149276] CPU Watchdog Timeout (soft): trinity-c12[2803]
[  823.866183] CPU Watchdog Timeout (soft): trinity-c21[15684]
[  892.151230] CPU Watchdog Timeout (soft): trinity-c56[22668]
[  892.161239] CPU Watchdog Timeout (soft): trinity-c56[22668]
[  899.818894] CPU Watchdog Timeout (soft): trinity-c4[22072]
[  899.828718] CPU Watchdog Timeout (soft): trinity-c4[22072]
[  899.838439] CPU Watchdog Timeout (soft): trinity-c4[22072]
[  905.253660] CPU Watchdog Timeout (soft): trinity-c56[22668]
[  907.297573] CPU Watchdog Timeout (soft): trinity-c39[24134]
[  907.307170] CPU Watchdog Timeout (soft): trinity-c39[24134]
[  907.519560] CPU Watchdog Timeout (soft): trinity-c4[22072]
[  940.624120] RT Watchdog Timeout (hard): trinity-c34[30478]
[  959.189311] CPU Watchdog Timeout (soft): trinity-c61[31012]
[  968.873887] CPU Watchdog Timeout (soft): trinity-c61[31012]
[  980.305390] CPU Watchdog Timeout (soft): trinity-c61[31012]
[  992.532852] CPU Watchdog Timeout (soft): trinity-c61[31012]
[ 1032.126118] CPU Watchdog Timeout (soft): trinity-c34[5861]
[ 1036.723920] CPU Watchdog Timeout (soft): trinity-c34[5861]
[ 1046.628487] CPU Watchdog Timeout (soft): trinity-c34[5861]
[ 1054.169156] CPU Watchdog Timeout (soft): trinity-c34[5861]
[ 1064.102718] CPU Watchdog Timeout (soft): trinity-c34[5861]
[ 1076.722166] CPU Watchdog Timeout (soft): trinity-c34[5861]
[ 1129.844831] CPU Watchdog Timeout (soft): trinity-c4[14714]
[ 1134.980606] CPU Watchdog Timeout (soft): trinity-c4[14714]
[ 1146.512098] CPU Watchdog Timeout (soft): trinity-c4[14714]
[ 1158.420576] CPU Watchdog Timeout (soft): trinity-c4[14714]
[ 1170.275054] CPU Watchdog Timeout (soft): trinity-c4[14714]


	Dave


* Re: rlimits: Print more information when CPU/RT limits are exceeded
From: Arun Raghavan @ 2017-05-04  4:38 UTC
  To: Dave Jones, Linux Kernel Mailing List; +Cc: Thomas Gleixner, Linus Torvalds

On Tue, 2 May 2017, at 08:42 AM, Dave Jones wrote:
> 
> This needs to be configurable, because this is really obnoxious..

Is there an example of how this is done elsewhere that I can work off?

Thanks,
Arun


* Re: rlimits: Print more information when CPU/RT limits are exceeded
From: Dave Jones @ 2017-05-04 13:32 UTC
  To: Arun Raghavan; +Cc: Linux Kernel Mailing List, Thomas Gleixner, Linus Torvalds

On Thu, May 04, 2017 at 10:08:16AM +0530, Arun Raghavan wrote:
 > On Tue, 2 May 2017, at 08:42 AM, Dave Jones wrote:
 > > 
 > > This needs to be configurable, because this is really obnoxious..
 > 
 > Is there an example of how this is done elsewhere that I can work off?

The obvious one that comes to mind is /proc/sys/kernel/print-fatal-signals

	Dave
