From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S263424AbUDGBqu (ORCPT ); Tue, 6 Apr 2004 21:46:50 -0400 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S263429AbUDGBqu (ORCPT ); Tue, 6 Apr 2004 21:46:50 -0400 Received: from e33.co.us.ibm.com ([32.97.110.131]:2293 "EHLO e33.co.us.ibm.com") by vger.kernel.org with ESMTP id S263424AbUDGBqs (ORCPT ); Tue, 6 Apr 2004 21:46:48 -0400 Message-ID: <40735D7E.2090304@us.ibm.com> Date: Tue, 06 Apr 2004 20:46:38 -0500 From: Brian King User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.2.1) Gecko/20030225 X-Accept-Language: en-us, en MIME-Version: 1.0 To: Chris Wright CC: linux-kernel@vger.kernel.org Subject: Re: call_usermodehelper hang References: <4072F2B7.2070605@us.ibm.com> <20040406174150.D22989@build.pdx.osdl.net> In-Reply-To: <20040406174150.D22989@build.pdx.osdl.net> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org X-Mailing-List: linux-kernel@vger.kernel.org Chris Wright wrote: > * Brian King (brking@us.ibm.com) wrote: > >>I have been running into some kernel hangs due to call_usermodehelper. Looking >>at the backtrace, it looks to me like there are deadlock issues with adding >>devices from work queues. Attached is a sample backtrace from one of the >>hangs I experienced. My question is why does call_usermodehelper do 2 different >>things depending on whether or not it is called from the kevent task? It appears >>that the simple way to fix the hang would be to never have call_usermodehelper >>use a work_queue since it must be called from process context anyway, or >>am I missing something? > > > It does two different things because it's trying to run from keventd. > In the case that current is not keventd, it schedules the work, so > keventd will pick that work up later to run in it's process context. > > How early is this hang? Pretty early. Boot time, loading scsi drivers. While initializing the third scsi adapter on the system a device shows up dynamically on the first adapter, the LLD schedules work to call scsi_add_device and we end up with the hang. It looks like init thread adds work and waits > for it's completion while holding a semaphore. It is never woken up by > keventd which is sleeping waiting for wakeup from semaphore that init > thread took. > > Seems troubling to hold the sem while calling call_usermodehelper, as that > could go off for a long time. I agree. There is another similar hang I ran into recently with the scsi core in that I was calling scsi_add_device from keventd, which ended up sleeping on the scan_mutex since the module load process was calling scsi_scan_host. scsi_scan_host was in the same kobject hotplug code, calling call_usermodehelper. I figured my LLD shouldn't be doing that, so I synchronized the events. Perhaps that was the wrong fix... I suppose I could have changed the LLD to always call scsi_scan_host/scsi_add_device/etc. from keventd... -Brian