From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752776AbXLIETz (ORCPT ); Sat, 8 Dec 2007 23:19:55 -0500 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S1751340AbXLIETp (ORCPT ); Sat, 8 Dec 2007 23:19:45 -0500 Received: from gherkin.frus.com ([192.158.254.49]:37990 "EHLO gherkin.frus.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751246AbXLIETo (ORCPT ); Sat, 8 Dec 2007 23:19:44 -0500 Subject: Re: [BUG] 2.6.23-rc3 can't see sd partitions on Alpha In-Reply-To: <475B3C23.6010304@orcon.net.nz> "from Michael Cree at Dec 9, 2007 01:51:47 pm" To: Michael Cree Date: Sat, 8 Dec 2007 22:19:39 -0600 (CST) CC: Kay Sievers , Bob Tracy , Andrew Morton , mingo@elte.hu, linux-kernel@vger.kernel.org, rjw@sisk.pl, rth@twiddle.net, ink@jurassic.park.msu.ru, linux-scsi@vger.kernel.org, greg@kroah.com X-Mailer: ELM [version 2.4ME+ PL82 (25)] MIME-Version: 1.0 Content-Transfer-Encoding: 7bit Content-Type: text/plain; charset=US-ASCII Message-Id: <20071209041939.7DE6FDBA2@gherkin.frus.com> From: rct@frus.com (Bob Tracy) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Michael Cree wrote: > Kay Sievers wrote: > > On Fri, 2007-12-07 at 23:05 -0600, Bob Tracy wrote: > >> Kay Sievers wrote: > >>> Is the udev daemon (still) running while it fails? > >> Yes, and there's something else I forgot to mention that may be > >> significant... For the bad case, in addition to udevd, "ps -ef" > >> shows a "sh -e /lib/udev/net.agent" running with a PPID of 1. This > >> process doesn't exit until I reboot. If this is normal under the > >> circumstances, please disregard. > > > > Does SysRq-T show where it hangs? > > Ummm... No. I didn't have the CONFIG_MAGIC_SYSRQ flag set, so I set it, > and recompiled the kernel. Guess what - now the system comes up > normally without any problem. The block devices appear in /dev. To > recap: without CONFIG_MAGIC_SYSRQ on the 2.6.24-rc3 kernel the missing > block devices error in /dev occurs and the init scripts fall over on > startup, and with CONFIG_MAGIC_SYSRQ the system comes up normally. I *do* have CONFIG_MAGIC_SYSRQ set. Anyone care to bet whether my machine starts working again if I disable it? Sheesh... The "kernel alignment issue" theory is making sense... We change the size of an initialized variable with the patch, and the problem shows up. We shift starting addresses a different way by tweaking kernel options, and two wrongs make a right? I've seen it happen, and tracking this down isn't going to be easy. Anyone want to wade through the different System.map files and hazard a guess where we're leaving the rails? Here's a very brief diff excerpt between the System.map files corresponding to "sysctl_check patch reverted" (the -dirty version) and "with sysctl_check patch". At least they agree up to line 10870 :-) ... --- /boot/System.map-2.6.24-rc2-g6f37ac79-dirty 2007-12-07 08:03:50.000000000 -0 600 +++ System.map 2007-12-07 13:43:37.000000000 -0600 @@ -10868,9414 +10868,9414 @@ fffffc0000684b00 R kallsyms_markers fffffc0000684d00 R kallsyms_token_table fffffc0000685100 R kallsyms_token_index -fffffc00006f61e0 r __pci_fixup_PCI_VENDOR_ID_SERVERWORKSPCI_DEVICE_ID_SERVERWORKS_CSB5IDEquirk_svwks_csb5ide -fffffc00006f61e0 R __start_pci_fixups_early -fffffc00006f61f0 r __pci_fixup_PCI_VENDOR_ID_INTELPCI_DEVICE_ID_INTEL_82801CA_10quirk_ide_samemode (...) -fffffc0000716120 r __param_bic_scale -fffffc0000716148 r __param_tcp_friendliness -fffffc0000716170 R __end_rodata -fffffc0000716170 R __stop___param +fffffc00006f61f0 r __pci_fixup_PCI_VENDOR_ID_SERVERWORKSPCI_DEVICE_ID_SERVERWORKS_CSB5IDEquirk_svwks_csb5ide +fffffc00006f61f0 R __start_pci_fixups_early +fffffc00006f6200 r __pci_fixup_PCI_VENDOR_ID_INTELPCI_DEVICE_ID_INTEL_82801CA_10quirk_ide_samemode (...) +fffffc0000716130 r __param_bic_scale +fffffc0000716158 r __param_tcp_friendliness +fffffc0000716180 R __end_rodata +fffffc0000716180 R __stop___param fffffc0000718000 A __init_begin fffffc0000718000 T _sinittext fffffc0000718000 t set_reset_devices > When running the broken kernel udev is running (according to 'ps') and > executing /sbin/udevtrigger manually generates a number of errors of the > form: > > scsi_id[]: scsi_id: unable to access '/block' > > The missing /dev/* entries do not appear. I don't get the errors that Michael is seeing, and udevtrigger seems to be exiting without errors (return code 0). The last part is the same: the missing /dev/* entries do not appear. --Bob T.