* Re: Oops in 2.6.10-rc1 (almost solved)
From: Chuck Ebbert @ 2004-11-13 3:45 UTC
To: Linus Torvalds; +Cc: linux-kernel, Matt Domsch
On Tue, 9 Nov 2004 at 17:01:10 -0800 Linus Torvalds <torvalds@osdl.org> wrote:
> > PS: do you have *any* idea how this could be related to the snd-es1371
> > driver (which is producing the oops then)?
>
> I bet it's overwriting some array, and just corrupting memory after it.
> For example, the edd_info[] array only has 6 entries,
That's almost certainly the problem. There can be up to 16 EDD devices
as of the Jun 30 update to the EDD code.
And sound_class is the next item after edd_info[] in my System.map...
--Chuck Ebbert 12-Nov-04 22:21:27
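Here is a small plain-C model of Linus's guess, with hypothetical names throughout. It shows a table sized for 6 pointers receiving 16 entries: everything past the sixth slot lands in whatever the linker placed next, which in Chuck's System.map is sound_class. The "memory" is one byte array so the overrun itself stays well-defined C. The 6 and 16 are the counts quoted above, and 4-byte slots assume i386 pointers.

```c
#include <stdint.h>
#include <string.h>

#define SLOT_SIZE   4u   /* pointer size on i386 (assumed) */
#define TABLE_SLOTS 6u   /* slots the table was sized for */
#define MAX_ENTRIES 16u  /* entries the EDD code may now try to store */

/* A flat byte array standing in for a stretch of .bss: the 6-slot table
 * occupies bytes 0..23 and an unrelated 4-byte variable follows at 24. */
static uint8_t bss[MAX_ENTRIES * SLOT_SIZE];
#define NEIGHBOUR_OFF (TABLE_SLOTS * SLOT_SIZE)

/* Clear the region and plant a known value in the neighbouring variable. */
static void model_reset(uint32_t neighbour_value)
{
    memset(bss, 0, sizeof bss);
    memcpy(&bss[NEIGHBOUR_OFF], &neighbour_value, sizeof neighbour_value);
}

/* Read back the neighbouring variable. */
static uint32_t model_neighbour(void)
{
    uint32_t v;
    memcpy(&v, &bss[NEIGHBOUR_OFF], sizeof v);
    return v;
}

/* Store n entries starting at slot 0. With 'checked' set, n is first
 * clamped to the table length, which keeps the neighbour intact. */
static void model_fill(unsigned n, int checked)
{
    if (checked && n > TABLE_SLOTS)
        n = TABLE_SLOTS;
    for (unsigned i = 0; i < n; i++) {
        uint32_t entry = 0xd0000000u + i;  /* fake "pointer" value */
        memcpy(&bss[i * SLOT_SIZE], &entry, sizeof entry);
    }
}
```

After model_reset(x) and model_fill(16, 0), model_neighbour() returns the seventh entry instead of x. With model_fill(16, 1), x survives.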
* Re: Oops in 2.6.10-rc1 (almost solved)
From: Matt Domsch @ 2004-11-13 14:28 UTC
To: Chuck Ebbert, Christian Kujau; +Cc: Linus Torvalds, linux-kernel

On Fri, Nov 12, 2004 at 10:45:12PM -0500, Chuck Ebbert wrote:
> On Tue, 9 Nov 2004 at 17:01:10 -0800 Linus Torvalds <torvalds@osdl.org> wrote:
>
> > > PS: do you have *any* idea how this could be related to the snd-es1371
> > > driver (which is producing the oops then)?
> >
> > I bet it's overwriting some array, and just corrupting memory after it.
> > For example, the edd_info[] array only has 6 entries,
>
> That's almost certainly the problem. There can be up to 16 EDD devices
> as of the Jun 30 update to the EDD code.

Bingo... edd_devices[] was too short. When we keep more than 6 signatures, it overruns the end. Also, I rewrote edd_num_devices to be clearer about its goal.

This patch is necessary even after the last edd.S patch was reverted. It still doesn't explain why Christian's BIOS reports more devices than he has; that's still UI, so don't re-apply the edd.S patch just reverted.

Signed-off-by: Matt Domsch

--
Matt Domsch
Sr. Software Engineer, Lead Engineer
Dell Linux Solutions linux.dell.com & www.dell.com/linux
Linux on Dell mailing lists @ http://lists.us.dell.com

===== drivers/firmware/edd.c 1.30 vs edited =====
--- 1.30/drivers/firmware/edd.c	2004-06-29 09:44:48 -05:00
+++ edited/drivers/firmware/edd.c	2004-11-13 07:56:00 -06:00
@@ -70,7 +70,7 @@
 static int edd_dev_is_type(struct edd_device *edev, const char *type);
 static struct pci_dev *edd_get_pci_dev(struct edd_device *edev);
 
-static struct edd_device *edd_devices[EDDMAXNR];
+static struct edd_device *edd_devices[EDD_MBR_SIG_MAX];
 
 #define EDD_DEVICE_ATTR(_name,_mode,_show,_test) \
 struct edd_attribute edd_attr_##_name = { \
@@ -728,9 +728,9 @@
 static inline int
 edd_num_devices(void)
 {
-	return min_t(unsigned char,
-		     max_t(unsigned char, edd.edd_info_nr, edd.mbr_signature_nr),
-		     max_t(unsigned char, EDD_MBR_SIG_MAX, EDDMAXNR));
+	return max_t(unsigned char,
+		     min_t(unsigned char, EDD_MBR_SIG_MAX, edd.mbr_signature_nr),
+		     min_t(unsigned char, EDDMAXNR, edd.edd_info_nr));
 }
 
 /**
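The edd_num_devices() rewrite can be checked in isolation. The sketch below re-expresses the old and new formulas with plain C helpers in place of the kernel's min_t/max_t macros. EDDMAXNR = 6 and EDD_MBR_SIG_MAX = 16 are assumptions taken from the counts quoted in this thread, not copied from the kernel headers.

Both formulas return 16 when the BIOS reports 16 MBR signatures, which is why the actual fix is sizing edd_devices[] by EDD_MBR_SIG_MAX. The two formulas only disagree when edd_info_nr exceeds both EDDMAXNR and the signature count.

```c
/* Assumed values for illustration, matching the counts in this thread. */
#define EDDMAXNR        6
#define EDD_MBR_SIG_MAX 16

/* Plain-C stand-ins for min_t(unsigned char, a, b) and max_t(...). */
static unsigned char uc_min(unsigned char a, unsigned char b) { return a < b ? a : b; }
static unsigned char uc_max(unsigned char a, unsigned char b) { return a > b ? a : b; }

/* Old formula: the larger reported count, capped at max(16, 6). */
static int old_num_devices(unsigned char edd_info_nr, unsigned char mbr_signature_nr)
{
    return uc_min(uc_max(edd_info_nr, mbr_signature_nr),
                  uc_max(EDD_MBR_SIG_MAX, EDDMAXNR));
}

/* New formula: clamp each count to its own limit, then take the larger. */
static int new_num_devices(unsigned char edd_info_nr, unsigned char mbr_signature_nr)
{
    return uc_max(uc_min(EDD_MBR_SIG_MAX, mbr_signature_nr),
                  uc_min(EDDMAXNR, edd_info_nr));
}

/* Largest count either formula can return, i.e. the slots edd_devices[] needs. */
enum { SLOTS_NEEDED = EDD_MBR_SIG_MAX > EDDMAXNR ? EDD_MBR_SIG_MAX : EDDMAXNR };
```

With 6 info entries and 16 signatures, both old_num_devices(6, 16) and new_num_devices(6, 16) give 16. That is more than a 6-slot array can hold.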
* Re: Oops in 2.6.10-rc1 (almost solved)
From: Matt Domsch @ 2004-11-13 18:55 UTC
To: Chuck Ebbert, Christian Kujau; +Cc: Linus Torvalds, linux-kernel

On Sat, Nov 13, 2004 at 08:28:35AM -0600, Matt Domsch wrote:
> On Fri, Nov 12, 2004 at 10:45:12PM -0500, Chuck Ebbert wrote:
> > On Tue, 9 Nov 2004 at 17:01:10 -0800 Linus Torvalds <torvalds@osdl.org> wrote:
> >
> > > > PS: do you have *any* idea how this could be related to the snd-es1371
> > > > driver (which is producing the oops then)?
> > >
> > > I bet it's overwriting some array, and just corrupting memory after it.
> > > For example, the edd_info[] array only has 6 entries,
> >
> > That's almost certainly the problem. There can be up to 16 EDD devices
> > as of the Jun 30 update to the EDD code.
>
> Bingo... edd_devices[] was too short. When we keep more
> than 6 signatures, it overruns the end.

In particular, depending on your .config, with EDD=y it overwrites 40 bytes past the end of edd_devices. (Here I've already extended it by the necessary amount, but the 40 bytes past its end are all subject to being overwritten.)

c043a880 b edd_devices
c043a8c0 b pci_bios_present
c043a8c4 B pci_mmcfg_base_addr
c043a8c8 b mmcfg_last_accessed_device
c043a8cc b called.0
c043a8d0 B pcibios_enable_irq
c043a8d4 b eisa_irq_mask.0
c043a8d8 b broken_hp_bios_irq9
c043a8dc b acer_tm360_irqrouting
c043a8e0 b pirq_table
c043a8e4 b pirq_router

Hence the failure Christian saw and attributed to the sound drivers:

EIP is at 0xc15d5820
eax: 00000000   ebx: dff20400   ecx: c15d5820   edx: dff205c4
esi: ffffffed   edi: dff20400   ebp: dff20400   esp: c17a3e58
ds: 007b   es: 007b   ss: 0068
Process modprobe (pid: 178, threadinfo=c17a2000 task=dfcf05a0)
Stack: c01fa5c8 dff20400 000007ff dff20400 c01fa5ff dff20400 000007ff c15ea400
       e082729d dff20400 c15ea400 00000000 e08469df c15ea400 000001f8 000000d0
       000000d0 df45ed14 00000000 c018e14e c15ea400 ffffffed dff20400 dff20400
Call Trace:
 [<c01fa5c8>] pci_enable_device_bars+0x28/0x40
 [<c01fa5ff>] pci_enable_device+0x1f/0x40
 [<e082729d>] snd_ensoniq_create+0x1d/0x480 [snd_ens1371]
 [<e08469df>] snd_card_new+0x1cf/0x2c0 [snd]
 [<c018e14e>] sysfs_new_dirent+0x2e/0x90
 [<e0827867>] snd_audiopci_probe+0x87/0x1e0 [snd_ens1371]
 [<c01fb012>] pci_device_probe_static+0x52/0x70
 [<c01fb05c>] __pci_device_probe+0x2c/0x30
 [<c01fb08c>] pci_device_probe+0x2c/0x60
 [<c0258f4f>] driver_probe_device+0x2f/0x80
 [<c02590b2>] driver_attach+0x52/0xa0
 [<c02595f8>] bus_add_driver+0x98/0xe0
 [<c0259c5f>] driver_register+0x2f/0x40
 [<c01fb340>] pci_register_driver+0x40/0x50
 [<e08279cf>] alsa_card_ens137x_init+0xf/0x13 [snd_ens1371]
 [<c0134279>] sys_init_module+0x169/0x240
 [<c01041eb>] syscall_call+0x7/0xb

With CONFIG_EDD=m, there just wasn't anything interesting in memory following edd_devices[] (thanks to the module loader using whole pages, I believe).

-Matt

--
Matt Domsch
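The 40 bytes follow from the array sizes. Storing 16 pointers instead of 6, at 4 bytes each on i386, writes (16 - 6) * 4 = 40 extra bytes.

The sketch below does that arithmetic. It then uses only the System.map lines quoted above to list which symbols start inside the 40-byte window after the already-extended, 64-byte edd_devices. All ten symbols from pci_bios_present to pirq_router fall inside it, including pcibios_enable_irq. If that variable is a function pointer (an assumption based on its name), clobbering it would fit Linus's earlier "call through a bad pointer" reading of the oops. That link is an inference, not something stated in the thread.

```c
#include <stddef.h>
#include <string.h>

#define PTR_SIZE 4u    /* bytes per pointer on i386 */
#define OLD_LEN  6u    /* original edd_devices[] length */
#define NEW_LEN  16u   /* entries that may now be stored */

/* The System.map excerpt quoted above, in address order. */
struct sym { unsigned long addr; const char *name; };

static const struct sym map_excerpt[] = {
    { 0xc043a880UL, "edd_devices" },
    { 0xc043a8c0UL, "pci_bios_present" },
    { 0xc043a8c4UL, "pci_mmcfg_base_addr" },
    { 0xc043a8c8UL, "mmcfg_last_accessed_device" },
    { 0xc043a8ccUL, "called.0" },
    { 0xc043a8d0UL, "pcibios_enable_irq" },
    { 0xc043a8d4UL, "eisa_irq_mask.0" },
    { 0xc043a8d8UL, "broken_hp_bios_irq9" },
    { 0xc043a8dcUL, "acer_tm360_irqrouting" },
    { 0xc043a8e0UL, "pirq_table" },
    { 0xc043a8e4UL, "pirq_router" },
};
#define MAP_LEN (sizeof map_excerpt / sizeof map_excerpt[0])

/* Extra bytes written when NEW_LEN entries go into an OLD_LEN table. */
static unsigned long overrun_bytes(void)
{
    return (unsigned long)(NEW_LEN - OLD_LEN) * PTR_SIZE;
}

/* First byte after the extended (NEW_LEN-entry) edd_devices[]. */
static unsigned long table_end(void)
{
    return map_excerpt[0].addr + (unsigned long)NEW_LEN * PTR_SIZE;
}

/* Number of symbols whose start address lies in [start, start + len). */
static size_t symbols_in_window(unsigned long start, unsigned long len)
{
    size_t count = 0;
    for (size_t i = 0; i < MAP_LEN; i++)
        if (map_excerpt[i].addr >= start && map_excerpt[i].addr < start + len)
            count++;
    return count;
}

/* 1 if the named symbol starts inside [start, start + len), else 0. */
static int symbol_in_window(const char *name, unsigned long start, unsigned long len)
{
    for (size_t i = 0; i < MAP_LEN; i++)
        if (strcmp(map_excerpt[i].name, name) == 0)
            return map_excerpt[i].addr >= start && map_excerpt[i].addr < start + len;
    return 0;
}
```

For example, symbols_in_window(table_end(), overrun_bytes()) counts the ten neighbours between 0xc043a8c0 and 0xc043a8e7.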
* Re: Oops in 2.6.10-rc1 (almost solved)
From: Matt Domsch @ 2004-11-14 2:58 UTC
To: Christian Kujau; +Cc: Linus Torvalds, linux-kernel, Chuck Ebbert

On Sat, Nov 13, 2004 at 08:28:35AM -0600, Matt Domsch wrote:
> It still doesn't explain why Christian's BIOS reports more devices
> than he has, that's still UI, so don't re-apply the edd.S patch just reverted.

Alexander van Heukelum noted to me that addw here modifies CF, so I think something like this should fix that. Christian, if you're in a position to test this, I'd really appreciate it. You've been a fantastic bug reporter / tester!

Not ready for Linus yet, and you'll need to re-apply the previous edd.S patch which is now reverted in Linus's tree. As your BIOS reports via CHECK EXTENSIONS PRESENT that you've got more devices than you actually have, hopefully the int13 EXTENDED READ won't succeed for non-existent devices anymore, and then neither will the READ SECTORS call.

--
Matt Domsch

===== arch/i386/boot/edd.S 1.3 vs edited =====
--- 1.3/arch/i386/boot/edd.S	2004-10-20 03:37:11 -05:00
+++ edited/arch/i386/boot/edd.S	2004-11-13 20:31:58 -06:00
@@ -58,8 +58,12 @@
 	sti					# work around buggy BIOSes
 	popw	%dx
 	popw	%si
-	addw	$EDD_DEV_ADDR_PACKET_LEN, %sp	# remove packet from stack
-	jnc	edd_mbr_store_sig
+	pushfl					# save EFLAGS into ebx
+	popl	%ebx				# because addw modifies CF
+	addw	$EDD_DEV_ADDR_PACKET_LEN, %sp	# remove packet from stack
+	pushl	%ebx				# get back right CF
+	popfl
+	jnc	edd_mbr_store_sig
 
 # otherwise, fall through to the legacy read function
 edd_mbr_read_sectors:
* Re: Oops in 2.6.10-rc1 (almost solved)
From: Linus Torvalds @ 2004-11-14 4:43 UTC
To: Matt Domsch; +Cc: Christian Kujau, linux-kernel, Chuck Ebbert

On Sat, 13 Nov 2004, Matt Domsch wrote:
>
> Not ready for Linus yet

Indeed. Please don't use pushfl/popfl to save the carry flag. There are tons of better ways.

For example, use "lea" instead of "add" to not write the flags (and add a comment). Or save the carry flag in a register with

	sbb %bx,%bx

and test %bx later. Or any of a million other _standard_ ways to handle this problem.

		Linus
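Alexander's point and Linus's two alternatives can be illustrated with a toy model of the carry flag. This is a C simulation, not real assembly. PACKET_LEN = 16 is an assumed stand-in for EDD_DEV_ADDR_PACKET_LEN, and the stack pointer values are arbitrary.

Each variant starts at the addw, with CF as the int13 call left it (the intervening sti/popw do not change CF). With plain addw, an int13 failure reported in CF is replaced by the carry of the stack adjustment. Saving CF first with the sbb trick, or using a flag-preserving lea-style adjustment, keeps the jnc decision tied to the BIOS result.

```c
#include <stdint.h>

#define PACKET_LEN 16u   /* assumed stand-in for EDD_DEV_ADDR_PACKET_LEN */

/* Toy CPU state: only what the snippet touches. cf is the carry flag. */
struct cpu { uint16_t sp, bx; int cf; };

/* addw $imm, %sp: SP += imm, and CF is overwritten with the unsigned carry. */
static void addw_sp(struct cpu *c, uint16_t imm)
{
    uint32_t sum = (uint32_t)c->sp + imm;
    c->cf = sum > 0xFFFFu;
    c->sp = (uint16_t)sum;
}

/* An lea-style adjustment: same arithmetic, flags left alone. */
static void lea_sp(struct cpu *c, uint16_t imm)
{
    c->sp = (uint16_t)(c->sp + imm);
}

/* sbb %bx,%bx: bx = bx - bx - CF, i.e. 0x0000 if CF was clear, 0xFFFF if set. */
static void sbb_bx_bx(struct cpu *c)
{
    c->bx = (uint16_t)(0u - (unsigned)c->cf);
}

/* Each variant returns 1 if it would take "jnc edd_mbr_store_sig". */
static int store_branch_addw(int int13_cf, uint16_t sp)
{
    struct cpu c = { sp, 0, int13_cf };
    addw_sp(&c, PACKET_LEN);     /* the int13 CF is lost here */
    return !c.cf;
}

static int store_branch_sbb(int int13_cf, uint16_t sp)
{
    struct cpu c = { sp, 0, int13_cf };
    sbb_bx_bx(&c);               /* remember CF in bx */
    addw_sp(&c, PACKET_LEN);     /* CF clobbered, bx unaffected */
    return c.bx == 0;            /* "test %bx,%bx" then branch on zero */
}

static int store_branch_lea(int int13_cf, uint16_t sp)
{
    struct cpu c = { sp, 0, int13_cf };
    lea_sp(&c, PACKET_LEN);      /* CF still holds the int13 result */
    return !c.cf;
}
```

With a failed read (CF = 1) and SP = 0x7000, store_branch_addw() still takes the "store signature" branch, while the sbb and lea variants do not. That matches the symptom of phantom devices being counted.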
* Re: Oops in 2.6.10-rc1 (almost solved)
From: Christian @ 2004-11-14 11:45 UTC
To: Matt Domsch; +Cc: Linus Torvalds, linux-kernel, Chuck Ebbert

Matt Domsch wrote:
>
> Alexander van Heukelum noted to me that addw here modifies CF, so I
> think something like this should fix that. Christian, if you're in a
> position to test this, I'd really appreciate it. You've been a

yes, i'll do so. right now i am off (and late) to sth. else, but i'll test this in the evening.

thank you,
Christian.
--
BOFH excuse #318:
Your EMAIL is now being delivered by the USPS.
* Re: Oops in 2.6.10-rc1 (almost solved)
From: Christian Kujau @ 2004-11-14 20:02 UTC
To: Matt Domsch; +Cc: Linus Torvalds, linux-kernel, Chuck Ebbert

sorry, took me a bit longer to get to the testing.

Matt Domsch wrote:
>
> Not ready for Linus yet, and you'll need to re-apply the previous
> edd.S patch which is now reverted in Linus's tree. As your BIOS

i've applied the patch to a pristine 2.6.10-rc1, so the (currently reverted) EDD change is still there. tell me, if the patch had to be applied to sth. else.

but for now i have to say, that it still oopses:

http://www.nerdbynature.de/bits/prinz/2.6.10-rc1/dmesg-2.6.10-rc1_edd-2.txt

...
BIOS EDD facility v0.16 2004-Jun-25, 16 devices found
...

(oh, i've added an ide-disk yesterday, so hde will show up in dmesg.)

sorry,
Christian.
--
BOFH excuse #401:
Sales staff sold a product we don't offer.
* Re: Oops in 2.6.10-rc1 (almost solved)
From: Matt Domsch @ 2004-11-14 21:55 UTC
To: Christian Kujau; +Cc: Linus Torvalds, linux-kernel, Chuck Ebbert

On Sun, Nov 14, 2004 at 09:02:33PM +0100, Christian Kujau wrote:
> > Not ready for Linus yet, and you'll need to re-apply the previous
> > edd.S patch which is now reverted in Linus's tree. As your BIOS
>
> i've applied the patch to a pristine 2.6.10-rc1, so the (currently
> reverted) EDD change is still there. tell me, if the patch had to be
> applied to sth. else.
>
> but for now i have to say, that it still oopses:
>
> http://www.nerdbynature.de/bits/prinz/2.6.10-rc1/dmesg-2.6.10-rc1_edd-2.txt

OK, the patch below (which Linus applied to his tree yesterday) should fix the oopses.

> BIOS EDD facility v0.16 2004-Jun-25, 16 devices found

but the patch to edd.S doesn't resolve that EDD believes you've got 16 devices (I would expect it to report 2, as you have only 2 disks).

Thanks for the quick testing. Back to the drawing board though for this second part.

Thanks,
Matt

--
Matt Domsch

===== drivers/firmware/edd.c 1.30 vs edited =====
--- 1.30/drivers/firmware/edd.c	2004-06-29 09:44:48 -05:00
+++ edited/drivers/firmware/edd.c	2004-11-13 07:56:00 -06:00
@@ -70,7 +70,7 @@
 static int edd_dev_is_type(struct edd_device *edev, const char *type);
 static struct pci_dev *edd_get_pci_dev(struct edd_device *edev);
 
-static struct edd_device *edd_devices[EDDMAXNR];
+static struct edd_device *edd_devices[EDD_MBR_SIG_MAX];
 
 #define EDD_DEVICE_ATTR(_name,_mode,_show,_test) \
 struct edd_attribute edd_attr_##_name = { \
@@ -728,9 +728,9 @@
 static inline int
 edd_num_devices(void)
 {
-	return min_t(unsigned char,
-		     max_t(unsigned char, edd.edd_info_nr, edd.mbr_signature_nr),
-		     max_t(unsigned char, EDD_MBR_SIG_MAX, EDDMAXNR));
+	return max_t(unsigned char,
+		     min_t(unsigned char, EDD_MBR_SIG_MAX, edd.mbr_signature_nr),
+		     min_t(unsigned char, EDDMAXNR, edd.edd_info_nr));
 }
 
 /**
* Re: Oops in 2.6.10-rc1 (solved)
From: Christian Kujau @ 2004-11-15 12:41 UTC
To: linux-kernel; +Cc: Matt Domsch, Linus Torvalds, Chuck Ebbert

Matt Domsch wrote:
>
> OK, the patch below (which Linus applied to his tree yesterday) should
> fix the oopses.
>

so i've compiled a pristine 2.6.10-rc1-bk24, as your patch should be included there (i've tried to apply your patch with --dry-run -> it did not succeed, -R *would* have been successful), and finally it works!

http://www.nerdbynature.de/bits/prinz/2.6.10-rc1/dmesg-2.6.10-rc1-bk24.txt
http://www.nerdbynature.de/bits/prinz/2.6.10-rc1/config-2.6.10-rc1-bk24

snd_ens1371 is working fine, no oops, i can load/unload the drivers, no problems ;-)

>
>>BIOS EDD facility v0.16 2004-Jun-25, 16 devices found
>
> but the patch to edd.S doesn't resolve that EDD believes you've got 16
> devices (I would expect it to report 2, as you have only 2 disks).

but still:

BIOS EDD facility v0.16 2004-Jun-25, 6 devices found

i have 2 disks now (1 ide, 1 scsi) and 2 cdrom drives (ide). as you can see from the dmesg, i have an additional ide-controller onboard:

PDC20265: chipset revision 2
PDC20265: ROM enabled at 0xdffe0000
PDC20265: 100% native mode on irq 10
PDC20265: (U)DMA Burst Bit ENABLED Primary PCI Mode Secondary PCI Mode.
    ide2: BM-DMA at 0xb400-0xb407, BIOS settings: hde:DMA, hdf:DMA
    ide3: BM-DMA at 0xb408-0xb40f, BIOS settings: hdg:pio, hdh:pio
Probing IDE interface ide2...
hde: ST320413A, ATA DISK drive
ide2 at 0xbc00-0xbc07,0xb802 on irq 10
Probing IDE interface ide3...
Probing IDE interface ide1...
Probing IDE interface ide3...
Probing IDE interface ide4...
ide4: Wait for ready failed before probe !
Probing IDE interface ide5...
ide5: Wait for ready failed before probe !

but there are only 4 ide channels on my board (Gigabyte GA7ZXR):

ide0 - with hda+hdb connected (2x cdrom)
ide1 - none
ide2 - with hde connected (ST320413A)
ide3 - none

so it's probing for a non-existent ide4+ide5! but it did that even in -bk4 times, so it's not "new behaviour", i guess.

http://www.nerdbynature.de/bits/prinz/2.6.10-rc1/dmesg-2.6.9-bk4.txt

anyway, it's working now, the oops is gone, but i can do further testing regarding this EDD issue of course.

Thanks to all involved,
Christian.
--
BOFH excuse #195:
We only support a 28000 bps connection.
* Oops in 2.6.10-rc1
From: Christian @ 2004-10-28 13:12 UTC
To: alsa-devel

[repost to alsa-devel as suggested by lkml]

hi,

yesterday i was updating to recent 2.6.10-rc1-BK and booting gives:

Unable to handle kernel NULL pointer dereference at virtual address 00000000
 printing eip:
dfc10ce0
*pde = 00000000
Oops: 0000 [#1]
PREEMPT
Modules linked in: snd_ens1371 snd_rawmidi snd_ac97_codec snd_pcm snd_timer snd soundcore snd_page_alloc rtc
CPU:    0
EIP:    0060:[<dfc10ce0>]    Not tainted VLI
EFLAGS: 00010282   (2.6.10-rc1)
EIP is at 0xdfc10ce0
eax: 00000000   ebx: dff1f800   ecx: dfc10ce0   edx: dff1f9c4
esi: ffffffed   edi: dff1f800   ebp: dff1f800   esp: de613e50
ds: 007b   es: 007b   ss: 0068
Process modprobe (pid: 186, threadinfo=de612000 task=deb5e5a0)
Stack: c01fc7b8 dff1f800 000007ff dff1f800 c01fc7ef dff1f800 000007ff dfc1e400
       e082729d dff1f800 dfc1e400 00000000 e08469cf dfc1e400 000001f8 000000d0
       c01667f7 de36da8c c0171759 dffe79e0 dfc1e400 ffffffed dff1f800 dff1f800
Call Trace:
 [<c01fc7b8>] pci_enable_device_bars+0x28/0x40
 [<c01fc7ef>] pci_enable_device+0x1f/0x40
 [<e082729d>] snd_ensoniq_create+0x1d/0x480 [snd_ens1371]
 [<e08469cf>] snd_card_new+0x1cf/0x2c0 [snd]
 [<c01667f7>] __lookup_hash+0xa7/0xe0
 [<c0171759>] alloc_inode+0x129/0x150
 [<e0827867>] snd_audiopci_probe+0x87/0x1e0 [snd_ens1371]
 [<c016f6c2>] dput+0x92/0x250
 [<c01fd202>] pci_device_probe_static+0x52/0x70
 [<c01fd24c>] __pci_device_probe+0x2c/0x30
 [<c01fd27c>] pci_device_probe+0x2c/0x60
 [<c025adff>] bus_match+0x3f/0x80
 [<c025af52>] driver_attach+0x52/0xa0
 [<c025b478>] bus_add_driver+0x98/0xe0
 [<c025ba8f>] driver_register+0x2f/0x40
 [<c01fd530>] pci_register_driver+0x40/0x50
 [<e08279cf>] alsa_card_ens137x_init+0xf/0x13 [snd_ens1371]
 [<c01341ba>] sys_init_module+0x18a/0x270
 [<c01041fb>] syscall_call+0x7/0xb
Code: 5f 64 65 76 38 62 00 00 00 00 00 00 00 00 00 02 00 00 00 88 0c c1 df 08 0d c1 df 10 fa 3a c0 00 fa 3a c0 00 00 00 00 6c 5a c1 df <0a> 00 00 00 36 46 37 46 00 00 00 00 f0 0c c1 df 69 6e 74 31 33

full dmesg output here: www.nerdbynature.de/bits/prinz/2.6.10-rc1/dmesg.txt

updating to an even more recent one (read: updated now) does not help, and the problem is really triggered when loading snd_ens1371.

well, the only "problem" is the oops and i have no sound :-( just strange that nobody else cries out loud. or am i just lacking enough information?

ok, this is debian/unstable (i386), gcc3.4.2, libc2.3.2, pls tell me if you need more information.

thank you,
Christian.
--
BOFH excuse #374:
It's the InterNIC's fault.
* Re: Oops in 2.6.10-rc1
From: Linus Torvalds @ 2004-11-07 16:57 UTC
To: Christian Kujau; +Cc: linux-kernel, alsa-devel, perex

On Sun, 7 Nov 2004, Christian Kujau wrote:
>
> since i got this oops between 2.6.9 and 2.6.10-rc1 i am still assuming
> that the change was made somewere between 15-Oct-2004 (2.6.9) and
> 22-Oct-2004 (2.6.10-rc1).

Not necessarily. The ALSA merge is the most likely reason for the oops, and since ALSA development does not merge with the kernel very often, it may be some much older change in the ALSA tree.

You can check the ALSA tree _before_ the merge, by doing (in the current tree):

	bk undo -a1.2000.7.2

which should give you a tree without any of "my" stuff, ie it was what Jaroslav was working on before he merged it into the standard tree.

(BK revision numbers change on merges, so the above number is not necessarily the right one unless you have the current -bk tree. It should have a changeset something like:

	ChangeSet@1.2000.7.2, 2004-10-20 20:51:33+02:00, perex@suse.cz
	  Merge suse.cz:/home/perex/bk/linux-sound/linux-sound
	  into suse.cz:/home/perex/bk/linux-sound/work

so that you can double-check).

> http://www.nerdbynature.de/bits/prinz/2.6.10-rc1/dmesg-debug_oops.txt

Yup, it's a call through a bad pointer again, and again the EIP value can be found in %ecx. But the source of the bug is not clear. The stack trace implies "show_stack()", but that function doesn't do any indirect calls, so I suspect the frame pointer didn't help in this case. And it's not "pci_enable_device()" either (which was there last time too), since that one calls "pci_enable_device_bars()" at the point it shows in the stack trace.

Quite frankly, it looks like something smashed the stack, and the fact that it happens _around_ when "pci_enable_device()" was called makes me seriously suspect the IRQ handler for the device. That's when IRQ routing is enabled, so often the interrupts start at that point. And since FRAME_POINTER didn't make the stack frame look sane, it's very possible that the bogus call isn't due to a real "call", but due to a return from a broken stack.

> there was an answer from the alsa-devel folks here:
> http://marc.theaimsgroup.com/?l=linux-kernel&m=109897024116288&w=2
>
> "It's a bit dead-lock, because we cannot help you. It seems that
> the pci structure passed to our code is broken. The driver has had
> no changes in initialization for a long time."

I seriously doubt that it's the PCI structure being broken. It's the ALSA merge, almost certainly - it's just that the stack is so confused that it's hard to tell where the bug has happened.

And I'll double-check the "regparm" changes, just in case. They change some irq calling conventions, although none of the involved stuff seems to be implied here.

A quick suggestion: make sure that there is not some stale object file lying around confusing things about memory layout, and do a "make clean" and make sure that all old modules are clean too and re-installed. The kernel dependencies should be correct, but even then there can be problems with clocks that are off a bit etc.

> (still wondering why nobody else has this bug, 1370 is not *that* weird, i
> thought)

Yes, that makes me suspicious, and is one reason why I wonder if it's just your tree not being built right.

> PS: if someone could explain me, why the ChangeSet numbers are always
> different: i've used "bk revtool sound/pci/ens1370.c" to find out the
> changes for this file and the suspicious patch reads
>
> sound/pci/ens1370.c@1.54.1.1, 2004-10-20....
>
> in "bk revtool". the changelog however reads:
>
> ChangeSet@1.2011, 2004-10-20 08:10:43-07:00, rusty@rustcorp.com.au

There are different revision numbers: there's the revision number for the _file_, and there is the revision number for the _change_.

Also, both (or one) of them can change when a merge occurs, since other people may have had different merge histories, and in a distributed environment the revision numbers are a lot more fluid than in CVS.

		Linus
* Re: Oops in 2.6.10-rc1
From: Christian Kujau @ 2004-11-07 18:31 UTC
To: Linus Torvalds; +Cc: linux-kernel, alsa-devel, perex

On Sun, 7 Nov 2004 08:57:40 -0800 (PST), Linus Torvalds wrote
>
> You can check the ALSA tree _before_ the merge, by doing (in
> the current tree):
>
> bk undo -a1.2000.7.2
>
> which should give you a tree without any of "my" stuff, ie it
> was what Jaroslav was working on before he merged it into the
> standard tree.

yes, i already did so, i think:
http://marc.theaimsgroup.com/?l=linux-kernel&m=109979092216919&w=2

but i did it this way:

bk clone -r1.2000.7.1 linux-2.6-BK linux-2.6-BK-test
bk undo -a1.2010

(probably wrong, so i'll repeat it as you suggested)

> (BK revision numbers change on merges, so the above number is
> not necessarily the right one unless you have the current -bk

aha!

> A quick suggestion: make sure that there is not some stale
> object file lying around confusing things about memory layout,
> and do a "make clean" and make sure that all old modules are
> clean too and re-installed.

really: i always do "make clean", even "make mrproper" sometimes, just to be sure. and i am quite certain that i did not forget to install the modules. but i'll keep my eyes open, yes.

> The kernel dependencies should be correct, but even then there can be
> problems with clocks that are off a bit etc.

i'm updating via "ntpdate" on every boot. i am even using a (faster) 2nd machine for my build and the bk things right now: building a current -bk on both hosts gives me this error.

> Yes, that makes me suspicious, and is one reason why I wonder
> if it's just your tree not being built right.

i'll build a -bk snapshot from a tar.bz2 later on and see what it gives.

> There are different revision numbers: there's the revision
> number for the _file_, and there is the revision number for
> the _change_.

aha. it was kinda confusing...now i got it, i think ;)

again: thank you for your time on this rainy weekend,
Christian.
--
BOFH excuse #8:
static buildup
* Re: Oops in 2.6.10-rc1
From: Christian Kujau @ 2004-11-07 23:45 UTC
To: linux-kernel; +Cc: Linus Torvalds, alsa-devel, linux-sound

Christian Kujau wrote:
> On Sun, 7 Nov 2004 08:57:40 -0800 (PST), Linus Torvalds wrote
>
>> bk undo -a1.2000.7.2
>>
>>which should give you a tree without any of "my" stuff, ie it
>>was what Jaroslav was working on before he merged it into the
>>standard tree.

i did so from a current tree (bk pull, undo, -r get) and it's working fine (url wraps):
http://www.nerdbynature.de/bits/prinz/2.6.10-rc1/dmesg-no-oops-2.6.9_a1.2000.7.2.txt

so i can see with "bk changes" that the ChangeSet is still there. this is what i expected, because -a says:

-a<rev> Remove all changesets which occurred after <rev>.

what i did not expect is that this ChangeSet is now *not* the culprit, because there is no oops. am i right? [1]

>>Yes, that makes me suspicious, and is one reason why I wonder
>>if it's just your tree not being built right.
>
> i'll build a -bk snapshot from a tar.bz2 later on and see what it gives.

i've built from linux-2.6.10-rc1.tar.bz2 with patch-2.6.10-rc1-bk17.bz2 from kernel.org with the same .config, and "modprobe snd-ens1371" oopses as expected :(

> Hmm.. That may well have worked fine, but it sounds in that post like
> you tried to undo the ALSA stuff, and what I suggested was really to
> do the reverse: take _only_ the ALSA changes, and then if it still

yes, i wanted to undo the alsa changes because i suspected the alsa framework (sorry guys) and wanted to see if it still oopses when the latest alsa patch was not applied.

i did another thing: i enabled the (deprecated) OSS driver (es1371.ko) and tried to load this thing:

http://www.nerdbynature.de/bits/prinz/2.6.10-rc1/dmesg-debug_oops-OSS.txt

it oopses.
- you said it's not a b0rken pci thingy
- i have to assume now that it's not an ALSA issue (since oss oopses too)
- it is OSS? the driver? i've CC'ed linux-sound...

> fails, at least you have now pinpointed it a bit more (admittedly to
> the _likely_ source, but that's as it should be: you narrow down the
> "known bad" source base until you've narrowed it down to the smallest
> change you can find that causes the problem).

yes, like Documentation/BUG-HUNTING says. but i seem to have difficulties in using my tools (bk). sorry for that.

> Sounds like you're doing everything right, but hey, it can't hurt to
> double-check.

yes, i really hope that it's not just a user error (on my side). building kernels since 2.0...but you never know...

thanks again for help,
Christian (whose only wish these days is to get over this strange thing and not waste people's precious time with a "sound driver". hey, at least the box is booting...)
--
BOFH excuse #224:
Jan 9 16:41:27 huber su: 'su root' succeeded for .... on /dev/pts/1
* Re: Oops in 2.6.10-rc1
From: Linus Torvalds @ 2004-11-08 1:16 UTC
To: Christian Kujau; +Cc: Kernel Mailing List, alsa-devel, linux-sound, Greg KH

On Mon, 8 Nov 2004, Christian Kujau wrote:
>
> what i did not expect is that this ChangeSet is now *not* the culprit,
> because there is no oops. am i right? [1]

Yes. So now I'd like to know _where_ the culprit is, since it turned out to be not the ALSA code.

> i did another thing: i enabled the (deprecated) OSS driver (es1371.ko)
> tried to load this thing:
>
> http://www.nerdbynature.de/bits/prinz/2.6.10-rc1/dmesg-debug_oops-OSS.txt
>
> it oopses.
> - you said it's not a b0rken pci thingy
> - i have to assume now that it's not an ALSA issue (since oss oopses too)
> - it is OSS? the driver? i've CC'ed linux-sound...

Sounds like something else changed, and likely the ALSA _and_ the OSS driver both broke. Which is not all that unlikely, since I suspect they share a lot of history.

> yes, like Documentation/BUG-HUNTING says. but i seem to have difficulties
> in using my tools (bk). sorry for that.

Not your fault. Think of this as a learning experience ;)

Anyway, now that the _other_ driver also oopses, and with a very similar oops too, so it looks like they both depended on some undocumented (or changed) detail in the PCI layer. Next step would be to see if the thing that breaks is this merge:

	ChangeSet@1.2463, 2004-11-04 17:07:16-08:00, torvalds@ppc970.osdl.org
	  Merge bk://kernel.bkbits.net/gregkh/linux/driver-2.6
	  into ppc970.osdl.org:/home/torvalds/v2.6/linux

which merges Greg's PCI/driver model changes.

It's all the same steps you took with the ALSA merge, you're a professional by now ;)

Greg, have you followed this thread?

> (whose only wish these days is to get over this strange thing and not
> wasting peoples precious time with a "sound driver". hey, at least the
> box is booting...)

Hey, sound is important. And especially if you somehow found something non-sound that just broke sound by mistake, all the more important to fix it.

		Linus
* Re: Oops in 2.6.10-rc1
From: Christian Kujau @ 2004-11-08 13:01 UTC
To: Kernel Mailing List; +Cc: Linus Torvalds, alsa-devel, linux-sound, Greg KH

Linus Torvalds wrote:
>
> Not your fault. Think of this as a learning experience ;)

it definitely is, yes.

> Anyway, now that the _other_ driver also oopses, and with a very similar
> oops too, so it looks like they both depended on some undocumented (or
> changed) detail in the PCI layer. Next step would be to see if the thing
> that breaks is this merge:

may i ask how you come to this conclusion? by technical knowledge or could this be deduced by some bk magic too?

>
> ChangeSet@1.2463, 2004-11-04 17:07:16-08:00, torvalds@ppc970.osdl.org
>   Merge bk://kernel.bkbits.net/gregkh/linux/driver-2.6
>   into ppc970.osdl.org:/home/torvalds/v2.6/linux
>
> which merges Greg's PCI/driver model changes.
>
> It's all the same steps you took with the ALSA merge, you're a
> professional by now ;)

i did "bk undo -a1.2463" from a current -BK tree and it oopses:

http://www.nerdbynature.de/bits/prinz/2.6.10-rc1/dmesg-debug_oops-a1.2463.txt

(i've booted with different boot options this time, because i noticed that i always booted with "acpi=force". changing this did not help either.)

next i wanted to do "bk undo -r1.2463" now to see if it does *not* break without this ChangeSet (because i already know it *breaks* with this ChangeSet), but that would leave some parentless child deltas. i read in the BK docs that "bk cset -x<version>" would help here, but "bk cset -x1.2463" aborts:

---------------------
evil@atlant:~/kernel/linux-2.6-BK$ bk changes | head -n3
ChangeSet@1.2463, 2004-11-04 17:07:16-08:00, torvalds@ppc970.osdl.org
  Merge bk://kernel.bkbits.net/gregkh/linux/driver-2.6
  into ppc970.osdl.org:/home/torvalds/v2.6/linux

evil@atlant:~/kernel/linux-2.6-BK$ bk cset -x1.2463
cset: Merge cset found in revision list: (1.2463). Aborting. (cset1)
---------------------

i've put everything on http://www.nerdbynature.de/bits/prinz/2.6.10-rc1/ : the .configs and the oopses are there.

i've double-checked a kernel built from "bk undo -a1.2000.7.2" yesterday, but the result was the same (no oops).

thank you,
Christian.
--
BOFH excuse #121:
halon system went off and killed the operators.
* Re: Oops in 2.6.10-rc1
From: Linus Torvalds @ 2004-11-08 18:13 UTC
To: Christian Kujau; +Cc: Kernel Mailing List, alsa-devel, linux-sound, Greg KH

On Mon, 8 Nov 2004, Christian Kujau wrote:
>
> > Anyway, now that the _other_ driver also oopses, and with a very similar
> > oops too, so it looks like they both depended on some undocumented (or
> > changed) detail in the PCI layer. Next step would be to see if the thing
> > that breaks is this merge:
>
> may i ask how you came to this conclusion? by technical knowledge or could
> this be deduced by some bk magic too?

No, just gut feel. If the pre-merge ALSA works, and the post-merge one doesn't, and the oops in both cases happen somewhere close to where it does "pci_enable_device()", there's not a lot left. There are interrupts, and there is the PCI layer...

> > ChangeSet@1.2463, 2004-11-04 17:07:16-08:00, torvalds@ppc970.osdl.org
> >   Merge bk://kernel.bkbits.net/gregkh/linux/driver-2.6
> >   into ppc970.osdl.org:/home/torvalds/v2.6/linux
> >
> > which merges Greg's PCI/driver model changes.
> >
> > It's all the same steps you took with the ALSA merge, you're a
> > professional by now ;)
>
> i did "bk undo -a1.2463" from a current -BK tree and it oopses:

Note that "bk undo -axxx" will _leave_ xxx in place, and undo everything after.

So what you did still has the merge in the tree, and that it still oopses is thus to be expected. BUT, we're getting closer.

> next i wanted to do "bk undo -r1.2463" now to see if it does *not* break
> without this ChangeSet (because i already know it *breaks* with this
> ChangeSet) but that would leave some parentless child deltas. i read in
> the BK docs that "bk cset -x<version>" would help here. but "bk cset
> -x1.2463" aborts:

"cset -x" only works on patches, not on complex operations. You still want "bk undo", but you want to use "bk revtool" to see what the merge point was, and tell _which_ of the merged top-of-trees you want to get to.

In other words, you can't just undo a merge, you need to tell which _way_ to undo it. See? It does actually make sense, and "bk revtool" will show you the relationships of merges (at least if the time range is big enough to show enough info).

Anyway, if you have the top-of-tree-is-1.2463, then go to "bk revtool", and select that node in the graph by clicking on it. Notice how those edges turned white, and you can now easily see which children were pre-merge. In this case, the top-of-tree tree _without_ the PCI merge is 1.2462:

    ChangeSet@1.2462, 2004-11-04 17:06:13-08:00, torvalds@ppc970.osdl.org
      Merge bk://kernel.bkbits.net/gregkh/linux/usb-2.6
      into ppc970.osdl.org:/home/torvalds/v2.6/linux

(you won't see it in "bk changes", since it's a trivial merge: use "bk changes -a" to see it).

So just before I merged Greg's PCI changes, I merged his USB changes.

Now, that's fine - the USB merge is likely to be ok, so try doing

    bk undo -a1.2462

and you will now have a tree that is exactly the same as before, except it does _not_ have the PCI merge from Greg.

And if this one does not oops, you can now officially blame Greg.

Now, if you want to get _really_ fancy, you can now look at each changeset that differed, with something like

    bk set -n -d -r1.2462 -r1.2463 | bk -R prs -h -d'<:P:@:HOST:>\n$each(:C:){\t(:C:)\n}\n' -

which is black magic that does a set operation and shows all the changes in between the sets of "bk at 1.2462" and "bk at 1.2463".

(This is _not_ the same as "bk changes -r1.2462..1.2463", because that one just shows the single merge change that is on the direct _path_ from one changeset to another. The black magic thing shows the set difference of changesets that comes from the full graph at two points).

Then you can look at each change individually and see if they matter.
And once you can do the set operations, you're officially a BK poweruser. Me, I just have a script, I'm a BK dabbler.

Looking at the list (appended), I don't see anything obvious, but hey, if it was obvious it wouldn't have been merged in the first place.

Thanks for your willingness to pursue this thing,

        Linus

-----
<maneesh@in.ibm.com>
    [PATCH] sysfs: fix sysfs backing store error path confusion

    o sysfs_new_dirent to return 0 if kmalloc fails.

    Thanks to Milton Miller for spotting this.

    Signed-off-by: Maneesh Soni <maneesh@in.ibm.com>
    Signed-off-by: Greg Kroah-Hartman <greg@kroah.com>

<bunk@stusta.de>
    [PATCH] small sysfs cleanups

    The patch below does the following cleanups for the sysfs code:
    - remove the unused global function sysfs_mknod
    - make some structs and functions static

    Please check whether this patch is correct, or whether some of the things I made static should be used globally in the foreseeable future.

    Signed-off-by: Adrian Bunk <bunk@stusta.de>
    Signed-off-by: Greg Kroah-Hartman <greg@kroah.com>

<kay.sievers@vrfy.org>
    [PATCH] add the physical device and the bus to the hotplug environment

    Add the sysfs path of the physical device to the hotplug event of class and block devices. This should solve the userspace issue not to know if the device is a virtual one and the "device" symlink will never be created, but we sit there and wait for it to show up not knowing when we should give up.

    Also the bus name is added to the hotplug event, so we don't need to reverse lookup in the /sys/bus/* directory which bus our physical device belongs to. This is e.g. the value matched against the BUS= key, that may be used in an udev rule.
    This is a PCI network card:
      ACTION=add
      SUBSYSTEM=net
      DEVPATH=/class/net/eth0
      PHYSDEVPATH=/devices/pci0000:00/0000:00:1e.0/0000:02:01.0
      PHYSDEVBUS=pci
      INTERFACE=eth0
      SEQNUM=827
      PATH=/sbin:/bin:/usr/sbin:/usr/bin
      HOME=/

    This is an IDE CDROM:
      ACTION=add
      SUBSYSTEM=block
      DEVPATH=/block/hdc
      PHYSDEVPATH=/devices/pci0000:00/0000:00:1f.1/ide1/1.0
      PHYSDEVBUS=ide
      SEQNUM=1017
      PATH=/sbin:/bin:/usr/sbin:/usr/bin
      HOME=/

    This is a USB-stick partition:
      ACTION=add
      SUBSYSTEM=block
      DEVPATH=/block/sda/sda1
      PHYSDEVPATH=/devices/pci0000:00/0000:00:1d.1/usb3/3-1/3-1:1.0/host1/target1:0:0/1:0:0:0
      PHYSDEVBUS=scsi
      SEQNUM=1032
      PATH=/sbin:/bin:/usr/sbin:/usr/bin
      HOME=/

    Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
    Signed-off-by: Greg Kroah-Hartman <greg@kroah.com>

<tj@home-tj.org>
    [PATCH] driver-model: comment fix in bus.c

    df_01_driver_attach_comment_fix.patch

    bus_match() was renamed to driver_probe_device() but the comment for device_attach() wasn't updated. This patch updates it.

    Signed-off-by: Tejun Heo <tj@home-tj.org>
    Signed-off-by: Greg Kroah-Hartman <greg@kroah.com>

<tj@home-tj.org>
    [PATCH] driver-model: bus_rescan_devices() locking fix

    df_02_bus_rescan_devcies_fix.patch

    bus_rescan_devices() eventually calls device_attach() and thus requires write locking the corresponding bus. The original code just called bus_for_each_dev() which only read locks the bus. This patch separates __bus_for_each_dev() and __bus_for_each_drv(), which don't do locking themselves, out from the original functions and call them with read lock in the original functions and with write lock in bus_rescan_devices().

    Signed-off-by: Tejun Heo <tj@home-tj.org>
    Signed-off-by: Greg Kroah-Hartman <greg@kroah.com>

<tj@home-tj.org>
    [PATCH] driver-model: sysfs_release() dangling pointer reference fix

    df_03_sysfs_release_fix.patch

    Some attributes are allocated dynamically (e.g. module and device parameters) and are usually deallocated when the associated kobject is released.
    So, it's not safe to access attr after putting the kobject.

    Signed-off-by: Tejun Heo <tj@home-tj.org>
    Signed-off-by: Greg Kroah-Hartman <greg@kroah.com>

<tj@home-tj.org>
    [PATCH] driver-model: kobject_add() error path reference counting fix

    df_04_kobject_add_ref_fix.patch

    In kobject_add(), @kobj wasn't put'd properly on error path. This patch fixes it.

    Signed-off-by: Tejun Heo <tj@home-tj.org>
    Signed-off-by: Greg Kroah-Hartman <greg@kroah.com>

<tj@home-tj.org>
    [PATCH] driver-model: device_add() error path reference counting fix

    df_05_device_add_ref_fix.patch

    In device_add(), @dev wasn't put'd properly when it has zero length bus_id (error path). Fixed.

    Signed-off-by: Tejun Heo <tj@home-tj.org>
    Signed-off-by: Greg Kroah-Hartman <greg@kroah.com>

<greg@kroah.com>
    kevent: fix build error if CONFIG_KOBJECT_UEVENT is not selected.

    Thanks to Serge Hallyn <serue@us.ibm.com> for pointing this out.

    Signed-off-by: Greg Kroah-Hartman <greg@kroah.com>

<rml@novell.com>
    [PATCH] kobject_uevent: fix init ordering

    Looks like kobject_uevent_init is executed before netlink_proto_init and consequently always fails. Not cool.

    Attached patch switches the initialization over from core_initcall (init level 1) to postcore_initcall (init level 2). Netlink's initialization is done in core_initcall, so this should fix the problem. We should be fine waiting until postcore_initcall.

    Also a couple white space changes mixed in, because I am anal.

    Signed-Off-By: Robert Love <rml@novell.com>
    Signed-off-by: Greg Kroah-Hartman <greg@kroah.com>

<rml@novell.com>
    [PATCH] kobject_uevent: add MAINTAINER entry

    Attached patch adds a MAINTAINER entry for the kernel event layer.

    Signed-Off-By: Robert Love <rml@novell.com>
    Signed-off-by: Greg Kroah-Hartman <greg@kroah.com>

<greg@kroah.com>
    Merge kroah.com:/home/greg/linux/BK/bleed-2.6
    into kroah.com:/home/greg/linux/BK/driver-2.6

<maneesh@in.ibm.com>
    [PATCH] fix kernel BUG at fs/sysfs/dir.c:20!
    On Thu, Nov 04, 2004 at 12:52:38PM -0800, Greg KH wrote:
    > Hi,
    >
    > I get the following BUG in the sysfs code when I do:
    > - plug in a usb-serial device.
    > - open the port with 'cat /dev/ttyUSB0'
    > - unplug the device.
    > - stop the 'cat' process with control-C
    >
    > This used to work just fine before your big sysfs changes.

    There is a similar problem reported by s390 people where we see parent kobject (directory) going away before child kobject (sub-directory). It seems kobject code is able to handle this, but not the sysfs.

    What could be happening is that in sysfs_remove_dir() of the parent directory, we try to remove its contents. It works well with the regular files as it is the final removal for sysfs_dirent corresponding to the files. But in case of sub-directory we are doing an extra sysfs_put(): once while removing the parent, and again when sysfs_remove_dir() is called for the child.

    The following patch worked for the s390 people, I hope the same will work in this case also.

    o Do not remove sysfs_dirents corresponding to the sub-directory in sysfs_remove_dir(). They will be removed in the sysfs_remove_dir() call for the specific sub-directory.

    Signed-off-by: Maneesh Soni <maneesh@in.ibm.com>
    Signed-off-by: Greg Kroah-Hartman <greg@kroah.com>

<torvalds@ppc970.osdl.org>
    Merge bk://kernel.bkbits.net/gregkh/linux/driver-2.6
    into ppc970.osdl.org:/home/torvalds/v2.6/linux
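The reference-count imbalance described in the last sysfs entry can be modelled with plain counters. The C sketch below is hypothetical: the struct, its fields and `dirent_put()` are made up and much simpler than the real sysfs_dirent code. It only shows two things. First, dropping a sub-directory's reference while removing the parent, and then again when the sub-directory itself is removed, puts that entry once too often. Second, skipping sub-directories in the parent's loop keeps the count balanced.

```c
/* Hypothetical, simplified model of a sysfs directory entry. */
struct dirent_model {
    int is_dir;     /* 1 for a sub-directory, 0 for a regular file */
    int refcount;   /* one reference held via the parent */
};

/* Drop one reference. A negative count means "put" was called once
 * too often, the kind of imbalance the changelog entry blames. */
static void dirent_put(struct dirent_model *d)
{
    d->refcount--;
}

/* Removing a parent directory drops the reference on each child.
 * skip_subdirs == 0 models the old behaviour; skip_subdirs == 1 models
 * the fix, which leaves sub-directories to their own removal call. */
static void remove_parent(struct dirent_model *children, int n, int skip_subdirs)
{
    for (int i = 0; i < n; i++) {
        if (skip_subdirs && children[i].is_dir)
            continue;
        dirent_put(&children[i]);
    }
}

/* The later removal of the sub-directory itself drops its reference. */
static void remove_subdir(struct dirent_model *subdir)
{
    dirent_put(subdir);
}
```

With one file and one sub-directory, each starting at a count of 1, the old order leaves the sub-directory at -1, while the fixed order leaves both entries at 0.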
* Re: Oops in 2.6.10-rc1
From: Christian Kujau @ 2004-11-08 20:59 UTC
To: Kernel Mailing List; +Cc: Linus Torvalds, alsa-devel, linux-sound, Greg KH

Linus Torvalds wrote:
>
> No, just gut feel. If the pre-merge ALSA works, and the post-merge one
> doesn't, and the oops in both cases happen somewhere close to where it
> does "pci_enable_device()", there's not a lot left. There are interrupts,
> and there is the PCI layer...

yes, makes sense.

>>
>>i did "bk undo -a1.2463" from a current -BK tree and it oopses:
>
> Note that "bk undo -axxx" will _leave_ xxx in place, and undo everything
> after.
>
> So what you did still has the merge in the tree, and that it still oopses
> is thus to be expected. BUT, we're getting closer.

yes, i think i understood that. that's why i wanted to revert 1.2463 too.

[...]
>
> Now, that's fine - the USB merge is likely to be ok, so try doing
>
>     bk undo -a1.2462

for now i appreciate your work here but i have to postpone the "bk revtool" stuff because i have no X _and_ bk here. (but i'm a good student and will do my homework)

> and you will now have a tree that is exactly the same as before, except it
> does _not_ have the PCI merge from Greg.
>
> And if this one does not oops, you can now officially blame Greg.

i can't wait... ;)

> Now, if you want to get _really_ fancy, you can now look at each changeset
> that differed, with something like
>
>     bk set -n -d -r1.2462 -r1.2463 | bk -R prs -h -d'<:P:@:HOST:>\n$each(:C:){\t(:C:)\n}\n' -
>
> which is black magic that does a set operation and shows all the changes
> in between the sets of "bk at 1.2462" and "bk at 1.2463".
>
> (This is _not_ the same as "bk changes -r1.2462..1.2463", because that one
> just shows the single merge change that is on the direct _path_ from one
> changeset to another. The black magic thing shows the set difference of
> changesets that comes from the full graph at two points).
>
> Then you can look at each change individually and see if they matter.

will do, after the build

>
> And once you can do the set operations, you're officially a BK poweruser.
> Me, I just have a script, I'm a BK dabbler.
>
> Looking at the list (appended), I don't see anything obvious, but hey, if
> it was obvious it wouldn't have been merged in the first place.
>
> Thanks for your willingness to pursue this thing,

hey, thanks to you and to the folks in the Cc: field for chasing a bug which only _i_ have encountered so far.

/me is building now....

thanks,
Christian.
--
BOFH excuse #111:

The salesman drove over the CPU board.
* Re: Oops in 2.6.10-rc1
From: Christian Kujau @ 2004-11-08 23:49 UTC
To: Kernel Mailing List; +Cc: Linus Torvalds, Greg KH

>>> Now, that's fine - the USB merge is likely to be ok, so try doing
>>>
>>>     bk undo -a1.2462

i did so, 1.2463 went away, building as usual - but the oops persists :(

http://www.nerdbynature.de/bits/prinz/2.6.10-rc1/dmesg-debug_oops-a1.2462.txt

> for now i appreciate your work here but i have to postpone the "bk
> revtool" stuff because i have no X _and_ bk here. (but i'm a good student
> and will do my homework)

...in progress...

>>>
>>>     bk set -n -d -r1.2462 -r1.2463 | bk -R prs -h -d'<:P:@:HOST:>\n$each(:C:){\t(:C:)\n}\n' -
>>>
>>> which is black magic that does a set operation and shows all the changes
>>> in between the sets of "bk at 1.2462" and "bk at 1.2463".

hm, i guess this has to wait now.

>>> Looking at the list (appended), I don't see anything obvious, but hey, if
>>> it was obvious it wouldn't have been merged in the first place.

yes, i'll look for changes regarding PCI.

i've started to compile the -bk snapshots too. there i can do fewer wrong things. when i have the "bad" -bk snapshot i'll use "bk" itself again to find the detailed change leading to the oops.

i hope to get another machine with another es1371 tomorrow and see if the error is reproducible.

thanks,
Christian.

PS: i've taken linux-sound and alsa-devel off the CC.
--
BOFH excuse #74:

You're out of memory
* Re: Oops in 2.6.10-rc1
From: Christian Kujau @ 2004-11-09 1:31 UTC
To: Kernel Mailing List; +Cc: Linus Torvalds, Greg KH

ok, i've done some other things here and built kernels from 2.6.10-rc1-bk13 and all were giving the oops:

http://www.nerdbynature.de/bits/prinz/2.6.10-rc1/config-2.6.10-rc1-bk13
http://www.nerdbynature.de/bits/prinz/2.6.10-rc1/dmesg-debug_oops-2.6.10-rc1-bk13.txt

the config is the same config i am usually using, never gave me a headache, new options (due to new kernel version) were left to default in most cases. anyway - i've pulled again a recent tree, did "bk undo -a1.2463" again but this time i stripped down my .config (via menuconfig) to the absolute necessary things:

http://www.nerdbynature.de/bits/prinz/2.6.10-rc1/config-2.6.10-rc1_a1.2463_take2

...and it did *NOT* oops:

http://www.nerdbynature.de/bits/prinz/2.6.10-rc1/dmesg-no-oops-2.6.10-rc1_a1.2463.txt

i'll investigate further, building former -bk snapshots, using other configs before i'll fiddle around with bk again (to get the smaller changes). but this is a tomorrow thing, real life calls in :(

Thank you all so far,
Christian.
--
BOFH excuse #92:

Stale file handle (next time use Tupperware(tm)!)
* Re: Oops in 2.6.10-rc1
From: Pekka Enberg @ 2004-11-09 7:40 UTC
To: Christian Kujau; +Cc: Kernel Mailing List, Linus Torvalds, Greg KH

Hi,

On Tue, 09 Nov 2004 02:31:28 +0100, Christian Kujau <evil@g-house.de> wrote:
> the config is the same config i am usually using, never gave me a
> headache, new options (due to new kernel version) were left to default in
> most cases. anyway - i've pulled again a recent tree, did
> "bk undo -a1.2463" again but this time i stripped down my .config (via
> menuconfig) to the absolute necessary things:
>
> http://www.nerdbynature.de/bits/prinz/2.6.10-rc1/config-2.6.10-rc1_a1.2463_take2
>
> ...and it did *NOT* oops:
>
> http://www.nerdbynature.de/bits/prinz/2.6.10-rc1/dmesg-no-oops-2.6.10-rc1_a1.2463.txt
>
> i'll investigate further, building former -bk snapshots, using other
> configs before i'll fiddle around with bk again (to get the smaller
> changes). but this is a tomorrow thing, real life calls in :(

CONFIG_PREEMPT is one obvious candidate (you have that enabled in the original config and disabled in the non-oopsing one).

        Pekka
* Re: Oops in 2.6.10-rc1
From: Christian Kujau @ 2004-11-09 12:33 UTC
To: Kernel Mailing List; +Cc: Pekka Enberg, Linus Torvalds, Greg KH

this damn thread is far too long already...

Pekka Enberg wrote:
> CONFIG_PREEMPT is one obvious candidate (you have that enabled in the
> original config and disabled in the non-oopsing one).

i've disabled *only* CONFIG_PREEMPT in another .config but it still oopses:

http://www.nerdbynature.de/bits/prinz/2.6.10-rc1/dmesg-debug_oops-2.6.10-rc1_no-preempt.txt
http://www.nerdbynature.de/bits/prinz/2.6.10-rc1/config-2.6.10-rc1_no-preempt.txt

2.6.9 with preempt enabled does not oops:

http://www.nerdbynature.de/bits/prinz/2.6.10-rc1/config-2.6.9_preempt.txt
http://www.nerdbynature.de/bits/prinz/2.6.10-rc1/dmesg-no-oops_2.6.9_preempt.txt

i was a fool to test further -bk snapshots but it was kinda late yesterday and i was confused:

patch-2.6.9.bz2          -> 19-Oct-2004
patch-2.6.10-rc1.bz2     -> 23-Oct-2004 00:12
patch-2.6.10-rc1-bk1.bz2 -> 23-Oct-2004 13:34

2.6.9 is not oopsing *here*, plain 2.6.10-rc1 is oopsing. so i can *not* use -bk snapshots any more and i will go on with BK (undo the ChangeSets Linus told me about) and use different .configs now.

sorry for the confusion and especially sorry to my bk mentor: we seem to be so close to the right ChangeSet and then i started to use *snapshots* again.

Thanks,
Christian
--
BOFH excuse #76:

Unoptimized hard drive
* Re: Oops in 2.6.10-rc1 (almost solved)
From: Christian Kujau @ 2004-11-09 17:26 UTC
To: Christian Kujau, Kernel Mailing List; +Cc: Pekka Enberg, Linus Torvalds, Greg KH

On Tue, 09 Nov 2004 13:33:20 +0100, Christian Kujau wrote
> i've disabled *only* CONFIG_PREEMPT in another .config but it
> still oopses:

at least i finally found the "bad" .config option: it's CONFIG_EDD. when i disable this option (and only this option; i can use the same .config as usual, only disabling this very option. diff is my witness.) i can boot a current (!) 2.6.10-rc1-bk and a working snd-ens1371!

i'll test with CONFIG_EDD=m later on.

here is a short summary:

2.6.9          CONFIG_EDD=y - OK
2.6.10-rc1-bk  CONFIG_EDD=y - OOPS!
2.6.10-rc1-bk  CONFIG_EDD=n - OK
2.6.10-rc1-bk  CONFIG_EDD=m - ??

yes, i'll continue to find out the ChangeSet but now i (and perhaps you too, if you are as curious as me) will know where to look.

i must admit that i was not entirely sure why i wanted to enable CONFIG_EDD at all. if i had never enabled it, it'd have saved me a week of bug chasing, but learning is fun, too.

thanks,
Christian.
--
BOFH excuse #209:

Only people with names beginning with 'A' are getting mail this week (a la Microsoft)
* Re: Oops in 2.6.10-rc1 (almost solved)
From: Linus Torvalds @ 2004-11-09 18:53 UTC
To: Christian Kujau; +Cc: Kernel Mailing List, Pekka Enberg, Greg KH, Matt_Domsch

On Tue, 9 Nov 2004, Christian Kujau wrote:
>
> at least i finally found the "bad" .config option: it's CONFIG_EDD.
> when i disable this option (and only this option; i can use the same
> .config as usual, only disabling this very option. diff is my witness.)
> i can boot a current (!) 2.6.10-rc1-bk and a working snd-ens1371!

Very strange. There's not a lot of stuff that affects EDD directly that I can see, but there is:

    ChangeSet@1.2000.5.108, 2004-10-20 08:36:22-07:00, Matt_Domsch@dell.com
      [PATCH] EDD: use EXTENDED READ command, add CONFIG_EDD_SKIP_MBR

      Some controller BIOSes have problems with the legacy int13 fn02 READ SECTORS command. int13 fn42 EXTENDED READ is used in preference by most boot loaders today, so let's use that. If EXTENDED READ fails or isn't supported, fall back to READ SECTORS. This hopefully resolves the three reports of BIOSes which would either long-pause (30+ seconds) or hang completely on the legacy READ SECTORS command.

      This also adds CONFIG_EDD_SKIP_MBR to eliminate reading the MBR on each BIOS-presented disk, in case there are further problems in this area.

      Signed-off-by: Matt Domsch <Matt_Domsch@dell.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>

which might fit the bill.

However, even that would just change the EDD _data_, it doesn't change the code that actually runs in the kernel. And I _really_ don't see what EDD has got to do with anything.

I wonder if the EDD stuff corrupts the sysfs tree or something, and you're just seeing some strange kobject interference. Greg, you'd likely still be on the line for that one.
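The fallback order in that changelog can be written out as a small C sketch: try int13 fn42 EXTENDED READ, and use fn02 READ SECTORS only if the extended call fails or is unsupported. Everything in the sketch is hypothetical. The real code is 16-bit boot-time assembly. The read functions and status codes below are made-up stand-ins, passed in as function pointers so the control flow can be exercised without a BIOS.

```c
/* Hypothetical result codes for a simulated BIOS disk read. */
enum read_status { READ_OK = 0, READ_FAILED = 1, READ_UNSUPPORTED = 2 };

/* Which method ended up supplying the sector (illustration only). */
enum read_method { METHOD_NONE, METHOD_EXTENDED, METHOD_LEGACY };

/* Stand-in signature for int13 fn42 (EXTENDED READ) and fn02
 * (READ SECTORS): drive number in, one 512-byte sector out. */
typedef enum read_status (*read_fn)(unsigned char drive, unsigned char *buf);

/* Try EXTENDED READ first; fall back to the legacy READ SECTORS call
 * only if the extended call fails or is not supported. */
static enum read_method read_first_sector(unsigned char drive, unsigned char *buf,
                                          read_fn extended, read_fn legacy)
{
    if (extended(drive, buf) == READ_OK)
        return METHOD_EXTENDED;
    if (legacy(drive, buf) == READ_OK)
        return METHOD_LEGACY;
    return METHOD_NONE;
}
```

Ordering the calls this way means a BIOS without working extended services still gets the legacy path. The trade-off is one extra, failing call on such machines.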
Christian, finding which change triggers this would be very good indeed. I think the merge with Greg is still a good place to start, although even just doing the snapshot trees (from _before_ -rc1: ie the patches in /pub/linux/kernel/v2.6/snapshots/old: patch-2.6.9-bk*.gz) is actually also a good way to narrow things down.

        Linus
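Narrowing things down over an ordered series of snapshots (or changesets) is commonly done as a binary search for the first build that oopses. The minimal C sketch below assumes a hypothetical `is_bad()` callback that stands for "build candidate i, boot it, load the sound driver, did it oops?". It also assumes the series is good up to some point and bad from there on.

```c
/* Callback standing in for "build and boot candidate i; does it oops?".
 * Hypothetical: in practice this is a manual build/boot/test cycle. */
typedef int (*is_bad_fn)(int index, void *ctx);

/* Return the index of the first bad candidate in [0, n), assuming all
 * candidates before it are good and all from it onward are bad.
 * Returns n if none is bad. Needs about log2(n) tests. */
static int first_bad(int n, is_bad_fn is_bad, void *ctx)
{
    int lo = 0, hi = n;            /* the answer lies in [lo, hi] */

    while (lo < hi) {
        int mid = lo + (hi - lo) / 2;
        if (is_bad(mid, ctx))
            hi = mid;              /* mid is bad: first bad is at or before mid */
        else
            lo = mid + 1;          /* mid is good: first bad is after mid */
    }
    return lo;
}
```

The good-then-bad assumption is the fragile part. In this thread the oops also depended on CONFIG_EDD, so a search like this only means something with the .config held fixed across every build.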
* Re: Oops in 2.6.10-rc1 (almost solved)
From: Christian Kujau @ 2004-11-09 23:30 UTC
To: Kernel Mailing List; +Cc: Linus Torvalds, Pekka Enberg, Greg KH, Matt_Domsch

Linus Torvalds wrote:
>
> Very strange. There's not a lot of stuff that affects EDD directly that I
> can see, but there is:
>
> ChangeSet@1.2000.5.108, 2004-10-20 08:36:22-07:00, Matt_Domsch@dell.com
>   [PATCH] EDD: use EXTENDED READ command, add CONFIG_EDD_SKIP_MBR

and i say: good catch! that does it!

i did "bk undo -a1.2000.5.108" on a current tree, booting this still gives an oops:

http://www.nerdbynature.de/bits/prinz/2.6.10-rc1/dmesg-2.6.9_a1.2000.5.108.txt

excluding this single ChangeSet with "bk undo -r1.2118" does work with CONFIG_EDD=y:

http://www.nerdbynature.de/bits/prinz/2.6.10-rc1/dmesg-2.6.9_r1.2000.5.108.txt

(the filename here should really read "...r1.2118.txt" because that was the number of the changeset representing the above [PATCH] *after* i did "bk undo -a1.2000.5.108". right?)

> However, even that would just change the EDD _data_, it doesn't change the
> code that actually runs in the kernel. And I _really_ don't see what EDD
> has got to do with anything.

understanding a lot less of all this than you guys, i also wonder why only this single driver broke. i've always loaded a couple of drivers here, maybe i could play around a bit, e.g. CONFIG_SND_ENS1371=y instead of =m, or see if other hw drivers break too.

> I wonder if the EDD stuff corrupts the sysfs tree or something, and you're
> just seeing some strange kobject interference.

do userspace tools matter here? there is "sysfsutils-1.1.0-1" and "libsysfs1-1.1.0-1" (both debian/unstable) installed here, /sys is mounted:

sysfs on /sys type sysfs (rw)

> Christian, finding which change triggers this would be very good indeed. I
> think the merge with greg is still a good place to start, although even

i'll look again over the -bk magic you told me about and see what it gives.

thanks so far to all involved here, i really enjoyed "working" with you. first class support at no charge...it's just incredible.

you guys rock,
Christian.
--
BOFH excuse #112:

The monitor is plugged into the serial port
* Re: Oops in 2.6.10-rc1 (almost solved)
From: Matt Domsch @ 2004-11-09 23:40 UTC
To: Christian Kujau; +Cc: Kernel Mailing List, Linus Torvalds, Pekka Enberg, Greg KH

On Wed, Nov 10, 2004 at 12:30:21AM +0100, Christian Kujau wrote:
> > ChangeSet@1.2000.5.108, 2004-10-20 08:36:22-07:00, Matt_Domsch@dell.com
> >   [PATCH] EDD: use EXTENDED READ command, add CONFIG_EDD_SKIP_MBR
>
> and i say: good catch! that does it!
>
> i did "bk undo -a1.2000.5.108" on a current tree, booting this still gives
> an oops:
>
> http://www.nerdbynature.de/bits/prinz/2.6.10-rc1/dmesg-2.6.9_a1.2000.5.108.txt
>
> excluding this single ChangeSet with "bk undo -r1.2118" does work with
> CONFIG_EDD=y:
>
> http://www.nerdbynature.de/bits/prinz/2.6.10-rc1/dmesg-2.6.9_r1.2000.5.108.txt

OK, thanks, that helps. From the diff of those dmesg:

-BIOS EDD facility v0.16 2004-Jun-25, 16 devices found
+BIOS EDD facility v0.16 2004-Jun-25, 6 devices found

So with the latest EDD patch noted above, it's finding more disks than before. How many disks do you actually have in the system?

I'll review the assembly again to see where I could have miscounted, and see how that may affect the EDD sysfs exports. Likely no answer from me before tomorrow though.

Thanks,
Matt

--
Matt Domsch
Sr. Software Engineer, Lead Engineer
Dell Linux Solutions linux.dell.com & www.dell.com/linux
Linux on Dell mailing lists @ http://lists.us.dell.com
* Re: Oops in 2.6.10-rc1 (almost solved)
From: Christian Kujau @ 2004-11-10 0:21 UTC
To: Kernel Mailing List; +Cc: Matt Domsch, Linus Torvalds, Pekka Enberg, Greg KH

Matt Domsch wrote:
>
> -BIOS EDD facility v0.16 2004-Jun-25, 16 devices found
> +BIOS EDD facility v0.16 2004-Jun-25, 6 devices found
>
> So with the latest EDD patch noted above, it's finding more disks than
> before. How many disks do you actually have in the system?

i have one scsi disk (sda) and two atapi cdrom drives:

hda: CRD-8483B, ATAPI CD/DVD-ROM drive
hdb: AOPEN CD-RW CRW3248 1.17 20020620, ATAPI CD/DVD-ROM drive
...
SCSI device sda: 35548320 512-byte hdwr sectors (18201 MB)
SCSI device sda: drive cache: write back

the "scsi0 : sym-2.1.18k" is on a pci card, the atapi devices are connected onboard. if it helps:

http://www.nerdbynature.de/bits/prinz/2.6.10-rc1/lspci-v.txt
http://www.nerdbynature.de/bits/prinz/2.6.10-rc1/lspci-vv.txt

> I'll review the assembly again to see where I could have miscounted,
> and see how that may affect the EDD sysfs exports. Likely no answer
> from me before tomorrow though.

that's ok, real life kicks in here too...

thanks,
Christian.

PS: do you have *any* idea how this could be related to the snd-es1371 driver (which is producing the oops then)?
--
BOFH excuse #449:

greenpeace free'd the mallocs
* Re: Oops in 2.6.10-rc1 (almost solved)
From: Linus Torvalds @ 2004-11-10 1:01 UTC
To: Christian Kujau
Cc: Kernel Mailing List, Matt Domsch, Pekka Enberg, Greg KH

On Wed, 10 Nov 2004, Christian Kujau wrote:
>
> Matt Domsch wrote:
> >
> > -BIOS EDD facility v0.16 2004-Jun-25, 16 devices found
> > +BIOS EDD facility v0.16 2004-Jun-25, 6 devices found
> >
> > So with the latest EDD patch noted above, it's finding more disks than
> > before.  How many disks do you actually have in the system?
>
> i have one scsi disk (sda) and two atapi cdrom drives:

Interestingly, "16" is also EDD_MBR_SIG_MAX, so my suspicion is that it
overflowed some EDD data area.

edd_num_devices() (which is what reports the above number) does

        min_t(unsigned char,
              max_t(unsigned char, edd.edd_info_nr, edd.mbr_signature_nr),
              max_t(unsigned char, EDD_MBR_SIG_MAX, EDDMAXNR));

where EDDMAXNR is 6, and EDD_MBR_SIG_MAX is the aforementioned 16, so we
know that either edd.edd_info_nr or edd.mbr_signature_nr is actually
_bigger_ than 16. Which is clearly totally bogus.

In fact, even your old "6 devices found" thing looks suspiciously bogus.

> PS: do you have *any* idea how this could be related to the snd-es1371
> driver (which is producing the oops then)?

I bet it's overwriting some array, and just corrupting memory after it.
For example, the edd_info[] array only has 6 entries, and for example,
the EDD_MBR_SIG_BUFFER is quite close to where we save the E820MAP memory
map at bootup, so if something stomps on that, the kernel might be
confused about where PCI memory can be allocated or similar.

Or it might have overwritten some ACPI memory data, who knows.

                Linus
* Re: Oops in 2.6.10-rc1 (almost solved)
From: Matt Domsch @ 2004-11-11 22:43 UTC
To: Christian Kujau
Cc: Kernel Mailing List, Linus Torvalds, Pekka Enberg, Greg KH

On Tue, Nov 09, 2004 at 05:40:54PM -0600, Matt Domsch wrote:
> OK, thanks, that helps.  From the diff of those dmesg:
>
> -BIOS EDD facility v0.16 2004-Jun-25, 16 devices found
> +BIOS EDD facility v0.16 2004-Jun-25, 6 devices found

As Linus points out, those are the magic numbers in EDD for the number of
device entries stored.  Your BIOS seems to be reporting that it has
more devices than it does, or the EDD assembly is horked in a way I
have not yet deciphered.

> I'll review the assembly again to see where I could have miscounted,
> and see how that may affect the EDD sysfs exports.  Likely no answer
> from me before tomorrow though.

I haven't been able to find a solution to your problem yet, and given
some external time constraints I've got, won't be able to look into
this again for another week or more.

Thanks,
Matt
* Re: Oops in 2.6.10-rc1 (almost solved)
From: Linus Torvalds @ 2004-11-11 22:53 UTC
To: Matt Domsch, Andrew Morton
Cc: Christian Kujau, Kernel Mailing List, Pekka Enberg, Greg KH

On Thu, 11 Nov 2004, Matt Domsch wrote:
>
> I haven't been able to find a solution to your problem yet, and given
> some external time constraints I've got, won't be able to look into
> this again for another week or more.

Matt, I'll revert the EXTENDED READ change for now, then. The random
behaviour of the problem it causes makes me really dislike this bug, and
I'd like to release a -rc2 and start calming down the 2.6.10 stuff, but
having known random stuff happen really disturbs me.

We can re-do it once it's more obvious why it broke..

                Linus
* Re: Oops in 2.6.10-rc1 (almost solved)
From: Matt Domsch @ 2004-11-11 22:55 UTC
To: Linus Torvalds
Cc: Andrew Morton, Christian Kujau, Kernel Mailing List, Pekka Enberg, Greg KH

On Thu, Nov 11, 2004 at 02:53:15PM -0800, Linus Torvalds wrote:
> Matt, I'll revert the EXTENDED READ change for now, then. The random
> behaviour of the problem it causes makes me really dislike this bug, and
> I'd like to release a -rc2 and start calming down the 2.6.10 stuff, but
> having known random stuff happen really disturbs me.
>
> We can re-do it once it's more obvious why it broke..

Good plan, thanks.
* Re: Oops in 2.6.10-rc1 (almost solved)
From: Christian Kujau @ 2004-11-12 0:27 UTC
To: Matt Domsch
Cc: Kernel Mailing List, Linus Torvalds, Pekka Enberg, Greg KH

Matt Domsch wrote:
>
> As Linus points out, those are the magic numbers in EDD for the number of
> device entries stored.  Your BIOS seems to be reporting that it has
> more devices than it does, or the EDD assembly is horked in a way I
> have not yet deciphered.

actually, my BIOS is too old even for e.g. ACPI, with the latest firmware
installed. i had no issues so far with the board/bios, but perhaps this
is no longer true. however, it's still strange that this thing is only
triggered with your change and CONFIG_EDD=y.

>
> I haven't been able to find a solution to your problem yet, and given
> some external time constraints I've got, won't be able to look into
> this again for another week or more.

nevermind then. as nobody else seems to be bothered by this i am happy with
the workaround (CONFIG_EDD=n) and since the lkml-archives exist we could
get back to it when it's bothering more people (n>1)

thank you for your time,
Christian.
--
BOFH excuse #396:

Mail server hit by UniSpammer.
* Re: Oops in 2.6.10-rc1 (almost solved)
From: Linus Torvalds @ 2004-11-12 0:49 UTC
To: Christian Kujau
Cc: Matt Domsch, Kernel Mailing List, Pekka Enberg, Greg KH

On Fri, 12 Nov 2004, Christian Kujau wrote:
>
> nevermind then. as nobody else seems to be bothered by this i am happy with
> the workaround (CONFIG_EDD=n) and since the lkml-archives exist we could
> get back to it when it's bothering more people (n>1)

The problem with that approach is that very few people are willing to
spend the time and effort to really try to figure out where the problem
triggers for them. Thanks again for testing lots of kernels, and
different configurations.

Basically, if it's a problem that only happens for a smallish percentage
of people, and an even smaller percentage of those is willing to dig down
and find it, it's not a problem we can afford to ignore. Ignoring it just
means that there will be "a few" error reports that we will either waste
time on, or (even worse) we'll dismiss as "known problems" and then
possibly miss _another_ bug.

This is why I take random unexplained (but pinpointed) problems so
seriously. If it wasn't as apparently random, we could file it under
"known problem" and decide to try to fix it later. As it is, it's filed
under "known cause", but since we don't know _why_, it might cause totally
different problems on another machine, and that just makes it too painful
for words.

So the changeset is reverted for now in the current -bk tree, and I'll
make a -rc2 this weekend and hope that we can stabilize for 2.6.10.

                Linus
* Re: Oops in 2.6.10-rc1 (almost solved)
From: Christian Kujau @ 2004-11-12 1:27 UTC
To: Linus Torvalds
Cc: Matt Domsch, Kernel Mailing List, Pekka Enberg, Greg KH

Linus Torvalds wrote:
>
> This is why I take random unexplained (but pinpointed) problems so
> seriously. If it wasn't as apparently random, we could file it under
> "known problem" and decide to try to fix it later. As it is, it's filed
> under "known cause", but since we don't know _why_, it might cause totally
> different problems on another machine, and that just makes it too painful
> for words.

just after sending my last mail i too (re)thought about this and i'd have
begged Matt to revert the patch if it was not *only* me having this issue.
but i can see your point here and i appreciate your decision.

> So the changeset is reverted for now in the current -bk tree, and I'll
> make a -rc2 this weekend and hope that we can stabilize for 2.6.10.

yay!

thanks,
Christian.
--
BOFH excuse #96:

Vendor no longer supports the product
Thread overview: 21+ messages
2004-11-13 3:45 Oops in 2.6.10-rc1 (almost solved) Chuck Ebbert
2004-11-13 14:28 ` Matt Domsch
2004-11-13 18:55 ` Matt Domsch
2004-11-14 2:58 ` Matt Domsch
2004-11-14 4:43 ` Linus Torvalds
2004-11-14 11:45 ` Christian
2004-11-14 20:02 ` Christian Kujau
2004-11-14 21:55 ` Matt Domsch
2004-11-15 12:41 ` Oops in 2.6.10-rc1 (solved) Christian Kujau
-- strict thread matches above, loose matches on Subject: below --
2004-10-28 13:12 Oops in 2.6.10-rc1 Christian
2004-11-07 16:57 ` Linus Torvalds
2004-11-07 18:31 ` Christian Kujau
2004-11-07 23:45 ` Christian Kujau
2004-11-08 1:16 ` Linus Torvalds
2004-11-08 13:01 ` Christian Kujau
2004-11-08 18:13 ` Linus Torvalds
2004-11-08 20:59 ` Christian Kujau
2004-11-08 23:49 ` Christian Kujau
2004-11-09 1:31 ` Christian Kujau
2004-11-09 7:40 ` Pekka Enberg
2004-11-09 12:33 ` Christian Kujau
2004-11-09 17:26 ` Oops in 2.6.10-rc1 (almost solved) Christian Kujau
2004-11-09 18:53 ` Linus Torvalds
2004-11-09 23:30 ` Christian Kujau
2004-11-09 23:40 ` Matt Domsch
2004-11-10 0:21 ` Christian Kujau
2004-11-10 1:01 ` Linus Torvalds
2004-11-11 22:43 ` Matt Domsch
2004-11-11 22:53 ` Linus Torvalds
2004-11-11 22:55 ` Matt Domsch
2004-11-12 0:27 ` Christian Kujau
2004-11-12 0:49 ` Linus Torvalds
2004-11-12 1:27 ` Christian Kujau