From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Fri, 30 Mar 2007 14:04:16 +0200
From: Ingo Molnar
To: Adrian Bunk
Cc: Linus Torvalds, Andrew Morton, linux-kernel@vger.kernel.org,
	Greg Kroah-Hartman
Subject: [bug] hung bootup in various drivers, was: "2.6.21-rc5: known regressions"
Message-ID: <20070330120416.GA19373@elte.hu>
References: <20070327015929.GY16477@stusta.de>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20070327015929.GY16477@stusta.de>
User-Agent: Mutt/1.4.2.2i
Sender: linux-kernel-owner@vger.kernel.org
X-Mailing-List: linux-kernel@vger.kernel.org

i just found a new category of driver regressions in 2.6.21, doing
allyesconfig bzImage bootup tests: the init methods of various drivers
hang in driver_unregister().

It is caused by this problem: the semantics of driver_unregister()
[also implicitly called in pci_unregister_driver()] have apparently
changed recently. If a driver does:

	pci_register_driver(&my_driver);
	...
	if (some_failure) {
		pci_unregister_driver(&my_driver);
		...
	}

it will hang the bootup in the following piece of code:

 drivers/base/driver.c:

  void driver_unregister(struct device_driver * drv)
  {
        bus_remove_driver(drv);
        wait_for_completion(&drv->unloaded);

the completion is never done - because nobody removes the bus while the
init is still happening, obviously. (and bootup is serialized anyway)

now, the majority of drivers do the driver unregistration from their
module-cleanup function, so they are not affected by this problem. But
if you apply the debug patch attached further below, and do an
allyesconfig bzImage bootup, there are 3 hits already:

 BUG: at drivers/base/driver.c:187 driver_unregister()
  [] show_trace_log_lvl+0x19/0x2e
  [] show_trace+0x12/0x14
  [] dump_stack+0x14/0x16
  [] driver_unregister+0x3d/0x43
  [] pci_unregister_driver+0x10/0x5f
  [] slgt_init+0x9b/0x1ca
  [] init+0x15d/0x2bd
  [] kernel_thread_helper+0x7/0x10

 BUG: at drivers/base/driver.c:187 driver_unregister()
  [] show_trace_log_lvl+0x19/0x2e
  [] show_trace+0x12/0x14
  [] dump_stack+0x14/0x16
  [] driver_unregister+0x3d/0x43
  [] pci_unregister_driver+0x10/0x5f
  [] init_ipmi_si+0x70a/0x738
  [] init+0x15d/0x2bd
  [] kernel_thread_helper+0x7/0x10

 BUG: at drivers/base/driver.c:187 driver_unregister()
  [] show_trace_log_lvl+0x19/0x2e
  [] show_trace+0x12/0x14
  [] dump_stack+0x14/0x16
  [] driver_unregister+0x3d/0x43
  [] pci_unregister_driver+0x10/0x5f
  [] tlan_probe+0x2dd/0x30e
  [] init+0x15d/0x2bd
  [] kernel_thread_helper+0x7/0x10

possibly more could trigger. Each of these 3 places caused an actual
bootup hang on my testbox, so these are real regressions and need to be
fixed.

because there are a good number of drivers that call
pci_unregister_driver() from their init function, and because i cannot
see anything obviously wrong in doing an unregister call after a
failure, i think it's driver_unregister() that needs to be fixed.

Greg, what do you think?

	Ingo

Index: linux/drivers/base/driver.c
===================================================================
--- linux.orig/drivers/base/driver.c
+++ linux/drivers/base/driver.c
@@ -183,7 +183,8 @@ int driver_register(struct device_driver
 void driver_unregister(struct device_driver * drv)
 {
 	bus_remove_driver(drv);
-	wait_for_completion(&drv->unloaded);
+	if (!drv->unloaded.done)
+		WARN_ON(1);
 }
 
 /**