On Mon, Jan 12, 2009 at 7:17 PM, Pallipadi, Venkatesh wrote:
>>Hoping to fix this memtype problem I applied the patch from the pull
>>request to 29-rc1 and rebooted. Now the system completely locks up
>>when X is trying to start.
>>Via serial console I got this Oops:
>>[ 79.500149] BUG: unable to handle kernel NULL pointer dereference
>>at 0000000000000003
>>[ 79.509240] IP: [<0000000000000003>] 0x3
>>[ 79.510002] PGD 0
>>[ 79.510002] Oops: 0010 [#1] SMP
>>[ 79.510002] last sysfs file:
>>/sys/devices/pci0000:00/0000:00:0f.0/0000:01:00.0/enable
>>[ 79.510002] CPU 0
>>[ 79.510002] Modules linked in: w83792d tuner tea5767 tda8290
>>tuner_xc2028 xc5000 tda9887 tuner_simple tuner_types mt20xx tea5761
>>tvaudio msp3400 bttv ir_common v4l2_common videodev v4l1_compat
>>v4l2_compat_ioctl32 usbhid videobuf_dma_sg videobuf_core hid btcx_risc
>>tveeprom sg pata_amd
>>[ 79.510002] Pid: 0, comm: swapper Not tainted 2.6.29-rc1 #2
>>[ 79.510002] RIP: 0010:[<0000000000000003>] [<0000000000000003>] 0x3
>>[ 79.510002] RSP: 0018:ffffffff809a8b18 EFLAGS: 00010002
>>[ 79.510002] RAX: 0000000000000001 RBX: ffffffff00000000
>>RCX: 0000000000000000
>>[ 79.510002] RDX: 0000000000000001 RSI: 0000000000000000
>>RDI: ffffffff809a8ca8
>>[ 79.510002] RBP: ffffffff809a8b18 R08: 0000000000000001
>>R09: 0000000000000100
>>[ 79.510002] R10: ffffffff8026af40 R11: 00000000000068d8
>>R12: 0000000000000000
>>[ 79.510002] R13: ffff88007e4fd700 R14: ffff880028018d00
>>R15: ffffffff809a8aa8
>>[ 79.510002] FS: 00007ff217e406f0(0000) GS:ffffffff809b1040(0000)
>>knlGS:0000000000000000
>>[ 79.510002] CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
>>[ 79.510002] CR2: 0000000000000003 CR3: 0000000000201000
>>CR4: 00000000000006e0
>>[ 79.510002] DR0: 0000000000000000 DR1: 0000000000000000
>>DR2: 0000000000000000
>>[ 79.510002] DR3: 0000000000000000 DR6: 00000000ffff4ff0
>>DR7: 0000000000000400
>>[ 79.510002] Process swapper (pid: 0, threadinfo ffffffff8087e000,
>>task ffffffff807de360)
>>[ 79.510002] Stack:
>>[ 79.510002] ffffffff809a8b68 ffffffff802389d7 0000000000000000
>>ffffffff809a8b60
>>[ 79.510002] 0000000000000082 ffffffff8022a7a8 0000000000000000
>>0000000000000001
>>[ 79.510002] 0000000000000060 ffffffff807de360 ffffffff809a8b78
>>ffffffff80238b7d
>>[ 79.510002] Call Trace:
>>[ 79.510002] Call Trace:
>>[ 79.510002] <0> []
>>try_to_wake_up+0x137/0x2d0
>>[ 79.510002] [] ? do_page_fault+0x368/0x970
>>[ 79.510002] [] default_wake_function+0xd/0x10
>>[ 79.510002] [] autoremove_wake_function+0x11/0x40
>>[ 79.510002] [] ? ata_scsi_qc_complete+0x1df/0x4c0
>>[ 79.510002] [] ?
>>_spin_unlock_irqrestore+0x2f/0x40
>>[ 79.510002] [] ?
>>generic_smp_call_function_interrupt+0xec/0x100
>>[ 79.510002] [] ?
>>trace_hardirqs_off_thunk+0x3a/0x6c
>>[ 79.510002] [] ?
>>generic_smp_call_function_interrupt+0x0/0x100
>>[ 79.510002] [] ?
>>generic_smp_call_function_interrupt+0xec/0x100
>>[ 79.510002] [] ? page_fault+0x1f/0x30
>>[ 79.510002] [] ?
>>generic_smp_call_function_interrupt+0xec/0x100
>>[ 79.510002] [] ?
>>generic_smp_call_function_interrupt+0x0/0x100
>>[ 79.510002] [] ? warn_slowpath+0x4c/0x130
>>[ 79.510002] [] ? scsi_next_command+0x45/0x60
>>[ 79.510002] [] ? scsi_io_completion+0x376/0x4e0
>>[ 79.510002] [] ? scsi_finish_command+0xac/0xe0
>>[ 79.510002] [] ? scsi_softirq_done+0xb8/0x140
>>[ 79.510002] [] ? __remove_hrtimer+0x40/0xa0
>>[ 79.510002] [] ?
>>generic_smp_call_function_interrupt+0xec/0x100
>>[ 79.510002] [] ?
>>smp_call_function_interrupt+0x1f/0x30
>>[ 79.510002] [] ?
>>call_function_interrupt+0x13/0x20
>>[ 79.510002] <0>Code: Bad RIP value.
>>[ 79.510002] RIP [<0000000000000003>] 0x3
>>[ 79.510002] RSP
>>[ 79.510002] CR2: 0000000000000003
>>[ 79.510002] ---[ end trace 99e686e29f771a49 ]---
>>[ 79.510002] Kernel panic - not syncing: Fatal exception in interrupt
>>[ 79.510002] ------------[ cut here ]------------
> Torsten,
>
> I don't seem to be able to reproduce this failure on my test systems..
> What distribution are you using here?
> Can you send me the kernel config that you used.

I'm using Gentoo, the compiler is:
gcc (Gentoo 4.3.2-r2 p1.5, pie-10.1.5) 4.3.2

The system has 2x 2218 Opterons with 4GB of RAM, so it is a NUMA system
with 2 nodes.

What might be important is that I switched to the new TREE_RCU:
# CONFIG_CLASSIC_RCU is not set
CONFIG_TREE_RCU=y
# CONFIG_PREEMPT_RCU is not set
# CONFIG_RCU_TRACE is not set
CONFIG_RCU_FANOUT=4
# CONFIG_RCU_FANOUT_EXACT is not set
# CONFIG_TREE_RCU_TRACE is not set
# CONFIG_PREEMPT_RCU_TRACE is not set

The rest of the .config is attached.
I used the same .config for both kernels: the vanilla 2.6.29-rc1, which
worked apart from the DRM trouble that was also reported by others, and
the version patched with these fixes.

HTH

Torsten