linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC PATCH] hung_task: show sysctl_hung_task_warnings
@ 2023-01-12  9:17 Weiping Zhang
  2023-01-13 13:43 ` Petr Mladek
  0 siblings, 1 reply; 6+ messages in thread
From: Weiping Zhang @ 2023-01-12  9:17 UTC (permalink / raw)
  To: pmladek, akpm, peterz; +Cc: linux-kernel, zwp10758

This patch try to add more debug info to detect lost kernel log or no
hung task was detected.

The user set 10 to the hung_task_timeout_secs, the kernel log:

[ 3942.642220] INFO: task mount:19066 blocked for more than 10 seconds.
[ 3952.876768] INFO: task kworker/u81:0:7 blocked for more than 10 seconds.
[ 3952.877088] INFO: task scsi_eh_0:506 blocked for more than 10 seconds.
[ 3952.878212] INFO: task mount:19066 blocked for more than 10 seconds.
[ 3963.116805] INFO: task kworker/u81:0:7 blocked for more than 10 seconds.
[ 3963.117137] INFO: task scsi_eh_0:506 blocked for more than 10 seconds.
[ 3963.118275] INFO: task mount:19066 blocked for more than 10 seconds.
[ 3973.356837] INFO: task kworker/u81:0:7 blocked for more than 10 seconds.
[ 3973.357148] INFO: task scsi_eh_0:506 blocked for more than 10 seconds.
[ 3973.358247] INFO: task mount:19066 blocked for more than 10 seconds.
[ 3993.836899] INFO: task kworker/u81:0:7 blocked for more than 10 seconds.
[ 3993.837238] INFO: task scsi_eh_0:506 blocked for more than 10 seconds.
[ 3993.838356] INFO: task mount:19066 blocked for more than 10 seconds.

There is no any log at about 3983, it's hard to know if kernel log was
lost or there is no hung task was detected at that moment. So this patch
print sysctl_hung_task_warnings to distinguish the above two cases.

Signed-off-by: Weiping Zhang <zhangweiping@didiglobal.com>
---
 kernel/hung_task.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/kernel/hung_task.c b/kernel/hung_task.c
index c71889f3f3fc..ca917931473d 100644
--- a/kernel/hung_task.c
+++ b/kernel/hung_task.c
@@ -127,8 +127,11 @@ static void check_hung_task(struct task_struct *t, unsigned long timeout)
 	 * complain:
 	 */
 	if (sysctl_hung_task_warnings) {
-		if (sysctl_hung_task_warnings > 0)
+		if (sysctl_hung_task_warnings > 0) {
 			sysctl_hung_task_warnings--;
+			pr_err("sysctl_hung_task_warnings: %d\n",
+				sysctl_hung_task_warnings);
+		}
 		pr_err("INFO: task %s:%d blocked for more than %ld seconds.\n",
 		       t->comm, t->pid, (jiffies - t->last_switch_time) / HZ);
 		pr_err("      %s %s %.*s\n",
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [RFC PATCH] hung_task: show sysctl_hung_task_warnings
  2023-01-12  9:17 [RFC PATCH] hung_task: show sysctl_hung_task_warnings Weiping Zhang
@ 2023-01-13 13:43 ` Petr Mladek
  2023-01-29  9:08   ` Weiping Zhang
  0 siblings, 1 reply; 6+ messages in thread
From: Petr Mladek @ 2023-01-13 13:43 UTC (permalink / raw)
  To: akpm, peterz, linux-kernel, zwp10758

On Thu 2023-01-12 17:17:45, Weiping Zhang wrote:
> This patch try to add more debug info to detect lost kernel log or no
> hung task was detected.
> 
> The user set 10 to the hung_task_timeout_secs, the kernel log:
> 
> [ 3942.642220] INFO: task mount:19066 blocked for more than 10 seconds.
> [ 3952.876768] INFO: task kworker/u81:0:7 blocked for more than 10 seconds.
> [ 3952.877088] INFO: task scsi_eh_0:506 blocked for more than 10 seconds.
> [ 3952.878212] INFO: task mount:19066 blocked for more than 10 seconds.
> [ 3963.116805] INFO: task kworker/u81:0:7 blocked for more than 10 seconds.
> [ 3963.117137] INFO: task scsi_eh_0:506 blocked for more than 10 seconds.
> [ 3963.118275] INFO: task mount:19066 blocked for more than 10 seconds.
> [ 3973.356837] INFO: task kworker/u81:0:7 blocked for more than 10 seconds.
> [ 3973.357148] INFO: task scsi_eh_0:506 blocked for more than 10 seconds.
> [ 3973.358247] INFO: task mount:19066 blocked for more than 10 seconds.
> [ 3993.836899] INFO: task kworker/u81:0:7 blocked for more than 10 seconds.
> [ 3993.837238] INFO: task scsi_eh_0:506 blocked for more than 10 seconds.
> [ 3993.838356] INFO: task mount:19066 blocked for more than 10 seconds.
> 
> There is no any log at about 3983, it's hard to know if kernel log was
> lost or there is no hung task was detected at that moment. So this patch
> print sysctl_hung_task_warnings to distinguish the above two cases.
> 
> Signed-off-by: Weiping Zhang <zhangweiping@didiglobal.com>
> ---
>  kernel/hung_task.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/hung_task.c b/kernel/hung_task.c
> index c71889f3f3fc..ca917931473d 100644
> --- a/kernel/hung_task.c
> +++ b/kernel/hung_task.c
> @@ -127,8 +127,11 @@ static void check_hung_task(struct task_struct *t, unsigned long timeout)
>  	 * complain:
>  	 */
>  	if (sysctl_hung_task_warnings) {
> -		if (sysctl_hung_task_warnings > 0)
> +		if (sysctl_hung_task_warnings > 0) {
>  			sysctl_hung_task_warnings--;
> +			pr_err("sysctl_hung_task_warnings: %d\n",
> +				sysctl_hung_task_warnings);
> +		}

It is too much noise. But it might make sense to report it when
the counter gets down to zero. Something like:

		if (sysctl_hung_task_warnings)
			pr_info("Future hung task reports are suppressed, see sysctl kernel.hung_task_warnings\n");

and move this down after printing all the details for this hung task report.

>  		pr_err("INFO: task %s:%d blocked for more than %ld seconds.\n",
>  		       t->comm, t->pid, (jiffies - t->last_switch_time) / HZ);
>  		pr_err("      %s %s %.*s\n",

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [RFC PATCH] hung_task: show sysctl_hung_task_warnings
  2023-01-13 13:43 ` Petr Mladek
@ 2023-01-29  9:08   ` Weiping Zhang
  2023-01-30 10:28     ` Petr Mladek
  0 siblings, 1 reply; 6+ messages in thread
From: Weiping Zhang @ 2023-01-29  9:08 UTC (permalink / raw)
  To: Petr Mladek; +Cc: akpm, peterz, linux-kernel

Sorry for the late reply.

On Fri, Jan 13, 2023 at 9:43 PM Petr Mladek <pmladek@suse.com> wrote:
>
> On Thu 2023-01-12 17:17:45, Weiping Zhang wrote:
> > This patch try to add more debug info to detect lost kernel log or no
> > hung task was detected.
> >
> > The user set 10 to the hung_task_timeout_secs, the kernel log:
> >
> > [ 3942.642220] INFO: task mount:19066 blocked for more than 10 seconds.
> > [ 3952.876768] INFO: task kworker/u81:0:7 blocked for more than 10 seconds.
> > [ 3952.877088] INFO: task scsi_eh_0:506 blocked for more than 10 seconds.
> > [ 3952.878212] INFO: task mount:19066 blocked for more than 10 seconds.
> > [ 3963.116805] INFO: task kworker/u81:0:7 blocked for more than 10 seconds.
> > [ 3963.117137] INFO: task scsi_eh_0:506 blocked for more than 10 seconds.
> > [ 3963.118275] INFO: task mount:19066 blocked for more than 10 seconds.
> > [ 3973.356837] INFO: task kworker/u81:0:7 blocked for more than 10 seconds.
> > [ 3973.357148] INFO: task scsi_eh_0:506 blocked for more than 10 seconds.
> > [ 3973.358247] INFO: task mount:19066 blocked for more than 10 seconds.

Expect but no hung task log at this moment.

> > [ 3993.836899] INFO: task kworker/u81:0:7 blocked for more than 10 seconds.
> > [ 3993.837238] INFO: task scsi_eh_0:506 blocked for more than 10 seconds.
> > [ 3993.838356] INFO: task mount:19066 blocked for more than 10 seconds.
> >
> > There is no any log at about 3983, it's hard to know if kernel log was
> > lost or there is no hung task was detected at that moment. So this patch
> > print sysctl_hung_task_warnings to distinguish the above two cases.
> >
> > Signed-off-by: Weiping Zhang <zhangweiping@didiglobal.com>
> > ---
> >  kernel/hung_task.c | 5 ++++-
> >  1 file changed, 4 insertions(+), 1 deletion(-)
> >
> > diff --git a/kernel/hung_task.c b/kernel/hung_task.c
> > index c71889f3f3fc..ca917931473d 100644
> > --- a/kernel/hung_task.c
> > +++ b/kernel/hung_task.c
> > @@ -127,8 +127,11 @@ static void check_hung_task(struct task_struct *t, unsigned long timeout)
> >        * complain:
> >        */
> >       if (sysctl_hung_task_warnings) {
> > -             if (sysctl_hung_task_warnings > 0)
> > +             if (sysctl_hung_task_warnings > 0) {
> >                       sysctl_hung_task_warnings--;
> > +                     pr_err("sysctl_hung_task_warnings: %d\n",
> > +                             sysctl_hung_task_warnings);
> > +             }
>
> It is too much noise. But it might make sense to report it when
> the counter gets down to zero. Something like:
>
>                 if (sysctl_hung_task_warnings)
>                         pr_info("Future hung task reports are suppressed, see sysctl kernel.hung_task_warnings\n");
>
> and move this down after printing all the details for this hung task report.
I'm fine to print this warning when it gets down to 0.

The warning counter is useful to detect kernel log lost or not, if add
a new line to print this count is too noise,
how about append this counter at the end of the following line:
        pr_err("INFO: task %s:%d blocked for more than %ld seconds,
sysctl_hung_task_warnings: %d\n",
                t->comm, t->pid, (jiffies - t->last_switch_time) / HZ,
sysctl_hung_task_warnings);
>
> >               pr_err("INFO: task %s:%d blocked for more than %ld seconds.\n",
> >                      t->comm, t->pid, (jiffies - t->last_switch_time) / HZ);
> >               pr_err("      %s %s %.*s\n",
>
> Best Regards,
> Petr

Thanks a ton
Weiping

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [RFC PATCH] hung_task: show sysctl_hung_task_warnings
  2023-01-29  9:08   ` Weiping Zhang
@ 2023-01-30 10:28     ` Petr Mladek
  2023-01-31  7:56       ` Weiping Zhang
  0 siblings, 1 reply; 6+ messages in thread
From: Petr Mladek @ 2023-01-30 10:28 UTC (permalink / raw)
  To: Weiping Zhang; +Cc: akpm, peterz, linux-kernel

On Sun 2023-01-29 17:08:29, Weiping Zhang wrote:
> Sorry for the late reply.
> 
> On Fri, Jan 13, 2023 at 9:43 PM Petr Mladek <pmladek@suse.com> wrote:
> >
> > On Thu 2023-01-12 17:17:45, Weiping Zhang wrote:
> > > This patch try to add more debug info to detect lost kernel log or no
> > > hung task was detected.
> > >
> > > The user set 10 to the hung_task_timeout_secs, the kernel log:
> > >
> > > [ 3942.642220] INFO: task mount:19066 blocked for more than 10 seconds.
> > > [ 3952.876768] INFO: task kworker/u81:0:7 blocked for more than 10 seconds.
> > > [ 3952.877088] INFO: task scsi_eh_0:506 blocked for more than 10 seconds.
> > > [ 3952.878212] INFO: task mount:19066 blocked for more than 10 seconds.
> > > [ 3963.116805] INFO: task kworker/u81:0:7 blocked for more than 10 seconds.
> > > [ 3963.117137] INFO: task scsi_eh_0:506 blocked for more than 10 seconds.
> > > [ 3963.118275] INFO: task mount:19066 blocked for more than 10 seconds.
> > > [ 3973.356837] INFO: task kworker/u81:0:7 blocked for more than 10 seconds.
> > > [ 3973.357148] INFO: task scsi_eh_0:506 blocked for more than 10 seconds.
> > > [ 3973.358247] INFO: task mount:19066 blocked for more than 10 seconds.
> 
> Expect but no hung task log at this moment.
> 
> > > [ 3993.836899] INFO: task kworker/u81:0:7 blocked for more than 10 seconds.
> > > [ 3993.837238] INFO: task scsi_eh_0:506 blocked for more than 10 seconds.
> > > [ 3993.838356] INFO: task mount:19066 blocked for more than 10 seconds.
> > >
> > > There is no any log at about 3983, it's hard to know if kernel log was
> > > lost or there is no hung task was detected at that moment. So this patch
> > > print sysctl_hung_task_warnings to distinguish the above two cases.
> > >
> > > Signed-off-by: Weiping Zhang <zhangweiping@didiglobal.com>
> > > ---
> > >  kernel/hung_task.c | 5 ++++-
> > >  1 file changed, 4 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/kernel/hung_task.c b/kernel/hung_task.c
> > > index c71889f3f3fc..ca917931473d 100644
> > > --- a/kernel/hung_task.c
> > > +++ b/kernel/hung_task.c
> > > @@ -127,8 +127,11 @@ static void check_hung_task(struct task_struct *t, unsigned long timeout)
> > >        * complain:
> > >        */
> > >       if (sysctl_hung_task_warnings) {
> > > -             if (sysctl_hung_task_warnings > 0)
> > > +             if (sysctl_hung_task_warnings > 0) {
> > >                       sysctl_hung_task_warnings--;
> > > +                     pr_err("sysctl_hung_task_warnings: %d\n",
> > > +                             sysctl_hung_task_warnings);
> > > +             }
> >
> > It is too much noise. But it might make sense to report it when
> > the counter gets down to zero. Something like:
> >
> >                 if (sysctl_hung_task_warnings)
> >                         pr_info("Future hung task reports are suppressed, see sysctl kernel.hung_task_warnings\n");
> >
> > and move this down after printing all the details for this hung task report.
> I'm fine to print this warning when it gets down to 0.

I prefer this variant.

> The warning counter is useful to detect kernel log lost or not, if add
> a new line to print this count is too noise,
> how about append this counter at the end of the following line:
>         pr_err("INFO: task %s:%d blocked for more than %ld seconds,
> sysctl_hung_task_warnings: %d\n",
>                 t->comm, t->pid, (jiffies - t->last_switch_time) / HZ,
> sysctl_hung_task_warnings);

Is it really important to print this on every line, please? IMHO, most people
do not care. IMHO. it would add more harm (confusion, noise) than good.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [RFC PATCH] hung_task: show sysctl_hung_task_warnings
  2023-01-30 10:28     ` Petr Mladek
@ 2023-01-31  7:56       ` Weiping Zhang
  0 siblings, 0 replies; 6+ messages in thread
From: Weiping Zhang @ 2023-01-31  7:56 UTC (permalink / raw)
  To: Petr Mladek, fuyuanli; +Cc: akpm, peterz, linux-kernel

On Mon, Jan 30, 2023 at 6:28 PM Petr Mladek <pmladek@suse.com> wrote:
>
> On Sun 2023-01-29 17:08:29, Weiping Zhang wrote:
> > Sorry for the late reply.
> >
> > On Fri, Jan 13, 2023 at 9:43 PM Petr Mladek <pmladek@suse.com> wrote:
> > >
> > > On Thu 2023-01-12 17:17:45, Weiping Zhang wrote:
> > > > This patch try to add more debug info to detect lost kernel log or no
> > > > hung task was detected.
> > > >
> > > > The user set 10 to the hung_task_timeout_secs, the kernel log:
> > > >
> > > > [ 3942.642220] INFO: task mount:19066 blocked for more than 10 seconds.
> > > > [ 3952.876768] INFO: task kworker/u81:0:7 blocked for more than 10 seconds.
> > > > [ 3952.877088] INFO: task scsi_eh_0:506 blocked for more than 10 seconds.
> > > > [ 3952.878212] INFO: task mount:19066 blocked for more than 10 seconds.
> > > > [ 3963.116805] INFO: task kworker/u81:0:7 blocked for more than 10 seconds.
> > > > [ 3963.117137] INFO: task scsi_eh_0:506 blocked for more than 10 seconds.
> > > > [ 3963.118275] INFO: task mount:19066 blocked for more than 10 seconds.
> > > > [ 3973.356837] INFO: task kworker/u81:0:7 blocked for more than 10 seconds.
> > > > [ 3973.357148] INFO: task scsi_eh_0:506 blocked for more than 10 seconds.
> > > > [ 3973.358247] INFO: task mount:19066 blocked for more than 10 seconds.
> >
> > Expect but no hung task log at this moment.
> >
> > > > [ 3993.836899] INFO: task kworker/u81:0:7 blocked for more than 10 seconds.
> > > > [ 3993.837238] INFO: task scsi_eh_0:506 blocked for more than 10 seconds.
> > > > [ 3993.838356] INFO: task mount:19066 blocked for more than 10 seconds.
> > > >
> > > > There is no any log at about 3983, it's hard to know if kernel log was
> > > > lost or there is no hung task was detected at that moment. So this patch
> > > > print sysctl_hung_task_warnings to distinguish the above two cases.
> > > >
> > > > Signed-off-by: Weiping Zhang <zhangweiping@didiglobal.com>
> > > > ---
> > > >  kernel/hung_task.c | 5 ++++-
> > > >  1 file changed, 4 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/kernel/hung_task.c b/kernel/hung_task.c
> > > > index c71889f3f3fc..ca917931473d 100644
> > > > --- a/kernel/hung_task.c
> > > > +++ b/kernel/hung_task.c
> > > > @@ -127,8 +127,11 @@ static void check_hung_task(struct task_struct *t, unsigned long timeout)
> > > >        * complain:
> > > >        */
> > > >       if (sysctl_hung_task_warnings) {
> > > > -             if (sysctl_hung_task_warnings > 0)
> > > > +             if (sysctl_hung_task_warnings > 0) {
> > > >                       sysctl_hung_task_warnings--;
> > > > +                     pr_err("sysctl_hung_task_warnings: %d\n",
> > > > +                             sysctl_hung_task_warnings);
> > > > +             }
> > >
> > > It is too much noise. But it might make sense to report it when
> > > the counter gets down to zero. Something like:
> > >
> > >                 if (sysctl_hung_task_warnings)
> > >                         pr_info("Future hung task reports are suppressed, see sysctl kernel.hung_task_warnings\n");
> > >
> > > and move this down after printing all the details for this hung task report.
> > I'm fine to print this warning when it gets down to 0.
>
> I prefer this variant.
It's ok, Yuanli will submit a patch for this.
>
> > The warning counter is useful to detect kernel log lost or not, if add
> > a new line to print this count is too noise,
> > how about append this counter at the end of the following line:
> >         pr_err("INFO: task %s:%d blocked for more than %ld seconds,
> > sysctl_hung_task_warnings: %d\n",
> >                 t->comm, t->pid, (jiffies - t->last_switch_time) / HZ,
> > sysctl_hung_task_warnings);
>
> Is it really important to print this on every line, please? IMHO, most people
> do not care. IMHO. it would add more harm (confusion, noise) than good.
>
Thanks
> Best Regards,
> Petr

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [RFC PATCH] hung_task: show sysctl_hung_task_warnings
@ 2023-01-12  9:29 Weiping Zhang
  0 siblings, 0 replies; 6+ messages in thread
From: Weiping Zhang @ 2023-01-12  9:29 UTC (permalink / raw)
  To: pmladek, akpm, peterz; +Cc: linux-kernel, zwp10758

This patch try to add more debug info to detect lost kernel log or no
hung task was detected.

The user set 10 to the hung_task_timeout_secs, the kernel log:

[ 3942.642220] INFO: task mount:19066 blocked for more than 10 seconds.
[ 3952.876768] INFO: task kworker/u81:0:7 blocked for more than 10 seconds.
[ 3952.877088] INFO: task scsi_eh_0:506 blocked for more than 10 seconds.
[ 3952.878212] INFO: task mount:19066 blocked for more than 10 seconds.
[ 3963.116805] INFO: task kworker/u81:0:7 blocked for more than 10 seconds.
[ 3963.117137] INFO: task scsi_eh_0:506 blocked for more than 10 seconds.
[ 3963.118275] INFO: task mount:19066 blocked for more than 10 seconds.
[ 3973.356837] INFO: task kworker/u81:0:7 blocked for more than 10 seconds.
[ 3973.357148] INFO: task scsi_eh_0:506 blocked for more than 10 seconds.
[ 3973.358247] INFO: task mount:19066 blocked for more than 10 seconds.
[ 3993.836899] INFO: task kworker/u81:0:7 blocked for more than 10 seconds.
[ 3993.837238] INFO: task scsi_eh_0:506 blocked for more than 10 seconds.
[ 3993.838356] INFO: task mount:19066 blocked for more than 10 seconds.

There is no any log at about 3983, it's hard to know if kernel log was
lost or there is no hung task was detected at that moment. So this patch
print sysctl_hung_task_warnings to distinguish the above two cases.

Signed-off-by: Weiping Zhang <zhangweiping@didiglobal.com>
---
 kernel/hung_task.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/kernel/hung_task.c b/kernel/hung_task.c
index c71889f3f3fc..ca917931473d 100644
--- a/kernel/hung_task.c
+++ b/kernel/hung_task.c
@@ -127,8 +127,11 @@ static void check_hung_task(struct task_struct *t, unsigned long timeout)
 	 * complain:
 	 */
 	if (sysctl_hung_task_warnings) {
-		if (sysctl_hung_task_warnings > 0)
+		if (sysctl_hung_task_warnings > 0) {
 			sysctl_hung_task_warnings--;
+			pr_err("sysctl_hung_task_warnings: %d\n",
+				sysctl_hung_task_warnings);
+		}
 		pr_err("INFO: task %s:%d blocked for more than %ld seconds.\n",
 		       t->comm, t->pid, (jiffies - t->last_switch_time) / HZ);
 		pr_err("      %s %s %.*s\n",
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2023-01-31  7:56 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-01-12  9:17 [RFC PATCH] hung_task: show sysctl_hung_task_warnings Weiping Zhang
2023-01-13 13:43 ` Petr Mladek
2023-01-29  9:08   ` Weiping Zhang
2023-01-30 10:28     ` Petr Mladek
2023-01-31  7:56       ` Weiping Zhang
2023-01-12  9:29 Weiping Zhang

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).