From: Andi Kleen <andi@firstfloor.org>
To: Radoslaw Szkodzinski <lkml@astralstorm.puszkin.org>
Cc: Andi Kleen <andi@firstfloor.org>,
	Arjan van de Ven <arjan@infradead.org>,
	Ingo Molnar <mingo@elte.hu>,
	linux-kernel@vger.kernel.org,
	Andrew Morton <akpm@linux-foundation.org>,
	Thomas Gleixner <tglx@linutronix.de>
Subject: Re: [feature] automatically detect hung TASK_UNINTERRUPTIBLE tasks
Date: Mon, 3 Dec 2007 11:27:15 +0100
Message-ID: <20071203102715.GC28560@one.firstfloor.org>
In-Reply-To: <20071203111520.33ed2139@astralstorm.puszkin.org>

> Kernel waiting 2 minutes on TASK_UNINTERRUPTIBLE is certainly broken.

What should it do when the NFS server doesn't answer anymore, or
when the network to the SAN RAID array located a few hundred km away
develops some hiccup?  Or when the SCSI driver just decides to do lengthy
error recovery -- you could argue that is broken if it takes longer
than 2 minutes, but in practice these things are hard to test
and to fix.

> Yes, that's exactly why the patch is needed - to find the bugs and fix

The way to find those would be source auditing, not breaking
perfectly fine error handling paths. Especially since this, at least
traditionally, hasn't been considered a bug but a fundamental design
parameter of network/block/etc. file systems.

> CIFS and similar have to be fixed - it tends to lock the app
> using it, in unkillable state.

Actually that's not true. You can umount -f and then kill for at least
NFS and CIFS. Not sure it is true for the other network file systems
though.

You could in theory do TASK_KILLABLE for all block devices too (not
a bad thing; I would love to have it).

But it would be equivalent in work (you have to patch all the same
places with similar code) to Suparna's big old fs AIO retry
patchkit, which never went forward because everyone was too worried
about excessive code impact. Maybe that has changed, maybe not ...

And even then you would need to check that all possible error handling
paths (scsi_error and the low level drivers at least) finish in less
than two minutes.
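
To make the "patch all the same places" point concrete, here is a minimal
user-space sketch of what converting one uninterruptible wait into a
killable one implies. It is not kernel code: struct sim_task and the
sim_wait_*() helpers are hypothetical names made up for illustration, and
the loop only stands in for a real sleep on an I/O completion.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a blocked task; not a kernel structure. */
struct sim_task {
    bool fatal_signal_pending;  /* e.g. a SIGKILL has been queued */
    int  ticks_until_io_done;   /* remaining polls before the I/O completes */
};

/*
 * Uninterruptible style: loop until the I/O finishes and ignore signals.
 * The caller never sees a failure, so it needs no error path.
 * Returns the number of ticks spent waiting.
 */
static int sim_wait_uninterruptible(struct sim_task *t)
{
    int waited = 0;

    assert(t != NULL);
    while (t->ticks_until_io_done > 0) {
        t->ticks_until_io_done--;
        waited++;
    }
    return waited;
}

/*
 * Killable style: the same loop, but it gives up with -EINTR as soon as
 * a fatal signal is pending. Every caller now has to handle that return
 * value and unwind whatever it had set up before waiting.
 */
static int sim_wait_killable(struct sim_task *t)
{
    assert(t != NULL);
    while (t->ticks_until_io_done > 0) {
        if (t->fatal_signal_pending)
            return -EINTR;
        t->ticks_until_io_done--;
    }
    return 0;
}
```

The wait itself is the easy part. The work is in each caller of the
killable variant, which gains a new failure case with the I/O still
outstanding.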

> > > wild guesses. Only one way to get the real false positive percentage.
> > 
> > Yes let's break things first instead of looking at the implications closely.
> 
> Throwing _rare_ stack traces is not breakage. 120s task_uninterruptible

Sorry, that's totally bogus. Throwing a stack trace is the kernel
equivalent of sending an S.O.S.: it worries the user significantly,
taxes reporting resources, etc. In the interest of saving
everybody trouble, the kernel should only do that when it is really
sure something is truly broken.

> in the usual case (no errors) is already broken - there are no sane
> loads that can invoke that IMO.

You are wrong on that.

-Andi


