* Re: rlimits: Print more information when CPU/RT limits are exceeded
From: Dave Jones @ 2017-05-02 3:12 UTC
To: Linux Kernel Mailing List; +Cc: Arun Raghavan, Thomas Gleixner, Linus Torvalds

On Mon, May 01, 2017 at 11:21:52PM +0000, Linux Kernel wrote:
> Web: https://git.kernel.org/torvalds/c/e7ea7c9806a2681807257ea89085339d33f7fa0b
> Commit: e7ea7c9806a2681807257ea89085339d33f7fa0b
> Parent: 4495c08e84729385774601b5146d51d9e5849f81
> Refname: refs/heads/master
> Author: Arun Raghavan <arun@arunraghavan.net>
> AuthorDate: Wed Mar 1 20:23:09 2017 +0530
> Committer: Thomas Gleixner <tglx@linutronix.de>
> CommitDate: Mon Mar 13 21:32:15 2017 +0100
>
> rlimits: Print more information when CPU/RT limits are exceeded
>
> When a process is sent a SIGKILL because it exceeded CPU or RT limits,
> the cause may not be obvious in userspace -- daemonised processes just
> get killed, and even foreground processes just see a 'Killed' message. The
> lack of any information on why this might be happening in logs can be
> confusing to users who are not aware of this mechanism.
>
> Add messages which dump the process name and tid in dmesg when a process
> exceeds its CPU or RT limits (soft and hard) in order to make it clearer to
> people debugging such issues.
>
> Signed-off-by: Arun Raghavan <arun@arunraghavan.net>
> Link: http://lkml.kernel.org/r/20170301145309.27214-1-arun@arunraghavan.net
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

This needs to be configurable, because this is really obnoxious..

[ 121.042170] RT Watchdog Timeout (hard): trinity-c40[7533]
[ 125.670948] RT Watchdog Timeout (hard): trinity-c29[1612]
[ 200.631968] CPU Watchdog Timeout (soft): trinity-c33[11454]
[ 200.644308] CPU Watchdog Timeout (soft): trinity-c33[11454]
[ 200.656551] CPU Watchdog Timeout (soft): trinity-c33[11454]
[ 213.454504] CPU Watchdog Timeout (soft): trinity-c33[11454]
[ 276.787116] CPU Watchdog Timeout (soft): trinity-c22[23943]
[ 285.857773] CPU Watchdog Timeout (soft): trinity-c33[24908]
[ 287.236710] CPU Watchdog Timeout (soft): trinity-c22[23943]
[ 295.186400] CPU Watchdog Timeout (soft): trinity-c33[24908]
[ 296.464352] CPU Watchdog Timeout (soft): trinity-c22[23943]
[ 305.164011] CPU Watchdog Timeout (soft): trinity-c33[24908]
[ 367.123564] CPU Watchdog Timeout (soft): trinity-c8[1472]
[ 373.950321] CPU Watchdog Timeout (soft): trinity-c8[1472]
[ 381.415054] CPU Watchdog Timeout (soft): trinity-c8[1472]
[ 389.621759] CPU Watchdog Timeout (soft): trinity-c8[1472]
[ 463.940996] CPU Watchdog Timeout (soft): trinity-c29[7725]
[ 463.952215] CPU Watchdog Timeout (soft): trinity-c29[7725]
[ 463.963306] CPU Watchdog Timeout (soft): trinity-c29[7725]
[ 555.264401] RT Watchdog Timeout (hard): trinity-c58[19175]
[ 610.536159] RT Watchdog Timeout (hard): trinity-c2[28363]
[ 741.785688] CPU Watchdog Timeout (soft): trinity-c46[5034]
[ 741.796600] CPU Watchdog Timeout (soft): trinity-c46[5034]
[ 741.807384] CPU Watchdog Timeout (soft): trinity-c46[5034]
[ 741.972679] CPU Watchdog Timeout (soft): trinity-c46[5034]
[ 743.115630] CPU Watchdog Timeout (soft): trinity-c46[5034]
[ 743.128628] CPU Watchdog Timeout (soft): trinity-c12[2803]
[ 743.139032] CPU Watchdog Timeout (soft): trinity-c12[2803]
[ 743.149276] CPU Watchdog Timeout (soft): trinity-c12[2803]
[ 823.866183] CPU Watchdog Timeout (soft): trinity-c21[15684]
[ 892.151230] CPU Watchdog Timeout (soft): trinity-c56[22668]
[ 892.161239] CPU Watchdog Timeout (soft): trinity-c56[22668]
[ 899.818894] CPU Watchdog Timeout (soft): trinity-c4[22072]
[ 899.828718] CPU Watchdog Timeout (soft): trinity-c4[22072]
[ 899.838439] CPU Watchdog Timeout (soft): trinity-c4[22072]
[ 905.253660] CPU Watchdog Timeout (soft): trinity-c56[22668]
[ 907.297573] CPU Watchdog Timeout (soft): trinity-c39[24134]
[ 907.307170] CPU Watchdog Timeout (soft): trinity-c39[24134]
[ 907.519560] CPU Watchdog Timeout (soft): trinity-c4[22072]
[ 940.624120] RT Watchdog Timeout (hard): trinity-c34[30478]
[ 959.189311] CPU Watchdog Timeout (soft): trinity-c61[31012]
[ 968.873887] CPU Watchdog Timeout (soft): trinity-c61[31012]
[ 980.305390] CPU Watchdog Timeout (soft): trinity-c61[31012]
[ 992.532852] CPU Watchdog Timeout (soft): trinity-c61[31012]
[ 1032.126118] CPU Watchdog Timeout (soft): trinity-c34[5861]
[ 1036.723920] CPU Watchdog Timeout (soft): trinity-c34[5861]
[ 1046.628487] CPU Watchdog Timeout (soft): trinity-c34[5861]
[ 1054.169156] CPU Watchdog Timeout (soft): trinity-c34[5861]
[ 1064.102718] CPU Watchdog Timeout (soft): trinity-c34[5861]
[ 1076.722166] CPU Watchdog Timeout (soft): trinity-c34[5861]
[ 1129.844831] CPU Watchdog Timeout (soft): trinity-c4[14714]
[ 1134.980606] CPU Watchdog Timeout (soft): trinity-c4[14714]
[ 1146.512098] CPU Watchdog Timeout (soft): trinity-c4[14714]
[ 1158.420576] CPU Watchdog Timeout (soft): trinity-c4[14714]
[ 1170.275054] CPU Watchdog Timeout (soft): trinity-c4[14714]

Dave
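A minimal sketch, in Python and assuming a Linux host, of the mechanism the commit message describes. At the RLIMIT_CPU soft limit the kernel sends SIGXCPU, whose default action terminates the process. If the process ignores or catches it, the getrlimit(2) man page documents that Linux repeats SIGXCPU once per further second of CPU time until the hard limit, where it sends SIGKILL. The helper name `run_with_cpu_limit` and the 1 s / 2 s limits are illustrative choices, not anything from the patch. On a kernel carrying this commit, each such event should also leave a "CPU Watchdog Timeout" line in dmesg like the ones above.

```python
import resource
import signal
import subprocess
import sys

# Child program: optionally ignore SIGXCPU, then spin on the CPU forever.
CHILD = r"""
import signal, sys
if sys.argv[1] == "ignore":
    signal.signal(signal.SIGXCPU, signal.SIG_IGN)  # survive the soft limit
else:
    signal.signal(signal.SIGXCPU, signal.SIG_DFL)  # default action: terminate
while True:
    pass
"""


def run_with_cpu_limit(soft_s, hard_s, ignore_sigxcpu):
    """Run a CPU-bound child under RLIMIT_CPU and return the signal that killed it."""

    def limit_child():
        # Runs in the child after fork() and before exec().
        resource.setrlimit(resource.RLIMIT_CPU, (soft_s, hard_s))
        # SIGXCPU's default action dumps core; disable core files for the demo.
        resource.setrlimit(resource.RLIMIT_CORE, (0, 0))

    mode = "ignore" if ignore_sigxcpu else "default"
    proc = subprocess.run([sys.executable, "-c", CHILD, mode], preexec_fn=limit_child)
    # subprocess reports death-by-signal as a negative return code.
    if proc.returncode < 0:
        return signal.Signals(-proc.returncode)
    return None
```

Calling `run_with_cpu_limit(1, 2, ignore_sigxcpu=False)` should report SIGXCPU. The same call with `ignore_sigxcpu=True` runs on to the hard limit and should report SIGKILL, which is the case where a shell prints nothing but "Killed". Together the two calls burn roughly three seconds of CPU.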
* Re: rlimits: Print more information when CPU/RT limits are exceeded
From: Arun Raghavan @ 2017-05-04 4:38 UTC
To: Dave Jones, Linux Kernel Mailing List; +Cc: Thomas Gleixner, Linus Torvalds

On Tue, 2 May 2017, at 08:42 AM, Dave Jones wrote:
> On Mon, May 01, 2017 at 11:21:52PM +0000, Linux Kernel wrote:
>
> This needs to be configurable, because this is really obnoxious..

Is there an example of how this is done elsewhere that I can work off?

Thanks,
Arun
* Re: rlimits: Print more information when CPU/RT limits are exceeded
From: Dave Jones @ 2017-05-04 13:32 UTC
To: Arun Raghavan; +Cc: Linux Kernel Mailing List, Thomas Gleixner, Linus Torvalds

On Thu, May 04, 2017 at 10:08:16AM +0530, Arun Raghavan wrote:
> On Tue, 2 May 2017, at 08:42 AM, Dave Jones wrote:
> > On Mon, May 01, 2017 at 11:21:52PM +0000, Linux Kernel wrote:
> >
> > This needs to be configurable, because this is really obnoxious..
>
> Is there an example of how this is done elsewhere that I can work off?

The obvious one that comes to mind is /proc/sys/kernel/print-fatal-signals.

Dave
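Dave's suggestion amounts to checking the existing print_fatal_signals setting (the kernel variable behind /proc/sys/kernel/print-fatal-signals) before emitting each watchdog message. The sketch below models that gate in userspace Python rather than as kernel C. The message format is copied from the log in the first message; `watchdog_message` and `print_fatal_signals_enabled` are hypothetical helpers for illustration, not kernel APIs.

```python
from pathlib import Path

# Sysctl suggested in the thread; it holds an integer, 0 meaning "off".
PRINT_FATAL_SIGNALS = Path("/proc/sys/kernel/print-fatal-signals")


def print_fatal_signals_enabled(path=PRINT_FATAL_SIGNALS):
    """Return True if the sysctl is non-zero; False if it is zero, missing or unreadable."""
    try:
        return int(path.read_text().strip()) != 0
    except (OSError, ValueError):
        return False


def watchdog_message(rt, hard, comm, tid, enabled):
    """Hypothetical model of the gated message: None when the gate is off."""
    if not enabled:
        return None  # quiet by default, as requested in the thread
    kind = "RT" if rt else "CPU"
    limit = "hard" if hard else "soft"
    return f"{kind} Watchdog Timeout ({limit}): {comm}[{tid}]"
```

In the kernel the equivalent would be a plain `if (print_fatal_signals)` test around the existing print call; with the sysctl at 0, nothing would be logged, and writing 1 to the file as root would bring the messages back.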