From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S261437AbTEEWD3 (ORCPT ); Mon, 5 May 2003 18:03:29 -0400 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S261444AbTEEWD3 (ORCPT ); Mon, 5 May 2003 18:03:29 -0400 Received: from vladimir.pegasys.ws ([64.220.160.58]:50956 "HELO vladimir.pegasys.ws") by vger.kernel.org with SMTP id S261437AbTEEWDR (ORCPT ); Mon, 5 May 2003 18:03:17 -0400 Date: Mon, 5 May 2003 15:12:38 -0700 From: jw schultz To: linux kernel mailing list Subject: Re: processes stuck in D state Message-ID: <20030505221238.GD14677@pegasys.ws> Mail-Followup-To: jw schultz , linux kernel mailing list References: <200305051826.01134.fsdeveloper@yahoo.de> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <200305051826.01134.fsdeveloper@yahoo.de> User-Agent: Mutt/1.3.27i Sender: linux-kernel-owner@vger.kernel.org X-Mailing-List: linux-kernel@vger.kernel.org On Mon, May 05, 2003 at 06:25:48PM +0200, Michael Buesch wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > On Monday 05 May 2003 17:24, Mike Waychison wrote: > > On Mon, 5 May 2003, Michael Buesch wrote: > > > On Monday 05 May 2003 07:52, Zeev Fisher wrote: > > > > Hi! > > > > > > Hi Zeev! > > > > > > > I got a continuos problem of unkillable processes stuck in D state ( > > > > uninterruptable sleep ) on my Linux servers. > > > > It happens randomly every time on other server on another process ( all > > > > the servers are configured the same with 2.4.18-10 kernel ). Here's an > > > > example : > > > > > > [snip] > > > > > > > Has anyone noticed the same behavior ? Is this a well known problem ? > > > > > > I've had the same problem with some 2.4.21-preX twice (or maybe more > > > times, don't remember) on one of my machines. > > > IMHO it has something to do with NFS. (I'm using this box as a > > > NFS-client). I wish, I could reproduce it one more time, to do some > > > traces, etc on it. But I've not found a way to reproduce it, yet. > > > > This happens when you mount an NFS mount with the 'hard' option (default) > > and a mount's handle expires incorrectly (eg: server crash). > > Read the mount manpage for an explanation to the downsides of using > > the 'soft' option. > > > > > > Mike Waychison > > my fstab-entry: > 192.168.0.50:/mnt/nfs_1 /mnt/nfs_1 nfs rw,hard,intr,user,nodev,nosuid,exec 0 0 > > from man mount: > [snip] The process cannot be interrupted or killed unless you also specify intr. [/snip] > > I can't interrupt any process that accessed the NFS-server > while shutting down the server, although intr is specified. > _That's_ my problem. :) I had a similar problem with SuSE's 2.4.18. Random processes seemed to go into D state from whence intr is useless. I rebuilt the kernel with NFSv3 disabled and that problem went away. The logs are full of May 5 14:54:15 duncan kernel: NFS: NFSv3 not supported. May 5 14:54:15 duncan kernel: nfs warning: mount version older than kernel but that i can live with. Processes hung and umount failing i cannot abide. If there is a better answer, i'm listening. -- ________________________________________________________________ J.W. Schultz Pegasystems Technologies email address: jw@pegasys.ws Remember Cernan and Schmitt