* PROBLEM: kernel BUG at net/ipv6/ip6_output.c:718
@ 2006-08-27 14:23 cagri coltekin
  2006-08-28  0:16 ` Herbert Xu
  0 siblings, 1 reply; 15+ messages in thread
From: cagri coltekin @ 2006-08-27 14:23 UTC
  To: netdev, davem, pekkas

Hi,

[ Apologies for possible duplicates, and if I'm addressing the
  wrong people. ]

The following is the standard bug report form. I believe I have
included enough information for starters. I'd be happy to try
to provide more if you need it. Please let me know.

Kind Regards,
-- 
Cagri Coltekin

------------------------------------------------------------------------------

[1.] Kernel message: kernel BUG at net/ipv6/ip6_output.c:718

[2.] Full description of the problem/report:
    
 This is a busy DNS server (serving about 5k queries per
 second). The problem started after we switched to a 2.6
 kernel for its better performance. After the kernel BUG
 message below, the system continues to run. However, bind
 gets stuck and is completely unresponsive; killing it leaves a
 zombie (parent is init). A new instance of bind can be started,
 and works fine until the next time.

 The problem does not occur in 2.4; we were running kernel.org
 2.4.29 before.

 I've tried the patch at http://lkml.org/lkml/2006/8/13/56, just
 in case. It does not make any difference.

 The system is a dual-CPU (Hyperthreading) Dell PowerEdge 2650.

[3.] Keywords: Kernel, networking, IPv6, UDP, DNS 

[4.] Kernel version (from /proc/version):

 Linux version 2.6.17.11-ns-debug (cagri@x10) (gcc version 3.3.5 (Debian 1:3.3.5-8ubuntu2.1)) #6 SMP Sat Aug 26 05:06:53 CEST 2006

 It is vanilla 2.6.17.11 with all unnecessary functionality
 removed during compilation.

 Please note that the kernel is compiled on a different machine;
 I'll provide information on both systems below where appropriate.

[5.] Output of Oops.. 

  NOTE that there are bug messages from two consecutive events.

Aug 25 04:03:35 ns kernel: ------------[ cut here ]------------
Aug 25 04:03:35 ns kernel: kernel BUG at net/ipv6/ip6_output.c:718!
Aug 25 04:03:35 ns kernel: invalid operand: 0000 [#1]
Aug 25 04:03:35 ns kernel: SMP 
Aug 25 04:03:35 ns kernel: Modules linked in: uhci_hcd ehci_hcd ohci_hcd aic7xxx ide_cd
Aug 25 04:03:35 ns kernel: CPU:    3
Aug 25 04:03:35 ns kernel: EIP:    0060:[svc_create_socket+189/416]    Not tainted VLI
Aug 25 04:03:35 ns kernel: EFLAGS: 00010282   (2.6.12.6-ncc-server) 
Aug 25 04:03:35 ns kernel: EIP is at ip6_fragment+0x24d/0x880
Aug 25 04:03:35 ns kernel: eax: fffffff2   ebx: f5391ff8   ecx: 000007e8   edx: f792a600
Aug 25 04:03:35 ns kernel: esi: f792a800   edi: f5391ff8   ebp: f513d180   esp: f6defbac
Aug 25 04:03:35 ns kernel: ds: 007b   es: 007b   ss: 0068
Aug 25 04:03:35 ns kernel: Process named (pid: 1942, threadinfo=f6dee000 task=f7959a20)
Aug 25 04:03:35 ns kernel: Stack: f31a8480 000007e8 f5392000 fffffdd0 f6dee000 00000000 b6346cc8 8000021e 
Aug 25 04:03:35 ns kernel:        00000000 000007e8 00000000 fffffdd0 ffffffee 000007e8 fffffdd4 f7075780 
Aug 25 04:03:35 ns kernel:        f792a048 f792a018 f31a8480 f792a040 f7b18080 c031fd50 f31a8480 c031fa30 
Aug 25 04:03:35 ns kernel: Call Trace:
Aug 25 04:03:35 ns kernel:  [svc_tcp_accept+960/992] ip6_output+0x30/0x40
Aug 25 04:03:35 ns kernel:  [svc_tcp_accept+160/992] ip6_output2+0x0/0x2f0
Aug 25 04:03:35 ns kernel:  [svcauth_unix_accept+43/496] ip6_push_pending_frames+0x29b/0x470
Aug 25 04:03:35 ns kernel:  [__func__.2+898/986] udp_v6_push_pending_frames+0x148/0x1b0
Aug 25 04:03:35 ns kernel:  [ip_send_reply+144/592] ip_generic_getfrag+0x0/0xc0
Aug 25 04:03:35 ns kernel:  [__func__.1+0/17] udpv6_sendmsg+0x52c/0x920
Aug 25 04:03:35 ns kernel:  [arp_solicit+356/464] udp_recvmsg+0x54/0x300
Aug 25 04:03:35 ns kernel:  [igmp_rcv+125/336] inet_sendmsg+0x4d/0x60
Aug 25 04:03:35 ns kernel:  [sys_connect+90/176] sock_sendmsg+0xda/0x100
Aug 25 04:03:35 ns kernel:  [find_busiest_group+209/736] find_busiest_group+0xd1/0x2e0
Aug 25 04:03:35 ns kernel:  [pci_bus_read_config_dword+38/144] copy_from_user+0x46/0x80
Aug 25 04:03:35 ns kernel:  [autoremove_wake_function+0/96] autoremove_wake_function+0x0/0x60
Aug 25 04:03:35 ns kernel:  [sock_wfree+15/96] sys_sendmsg+0x18f/0x1f0
Aug 25 04:03:35 ns kernel:  [futex_wait+292/576] futex_wait+0x124/0x240
Aug 25 04:03:35 ns kernel:  [find_extend_vma+41/144] find_extend_vma+0x29/0x90
Aug 25 04:03:35 ns kernel:  [default_wake_function+0/32] default_wake_function+0x0/0x20
Aug 25 04:03:35 ns kernel:  [futex_wake+123/208] futex_wake+0x7b/0xd0
Aug 25 04:03:35 ns kernel:  [pci_bus_read_config_dword+38/144] copy_from_user+0x46/0x80
Aug 25 04:03:35 ns kernel:  [sock_alloc_send_pskb+370/480] sys_socketcall+0x242/0x260
Aug 25 04:03:35 ns kernel:  [syscall_call+7/11] syscall_call+0x7/0xb
Aug 25 04:03:35 ns kernel: Code: 00 00 00 00 8b 54 24 2c 8b 4c 24 24 89 54 24 0c 8b 45 24 89 4c 24 04 89 44 24 08 8b 44 24 58 89 04 24 e8 a7 44 fa ff 85 c0 74 08 <0f> 0b ce 02 86 07 3a c0 0f b7 44 24 20 0f b6 d0 c1 e2 08 c1 e8 
Aug 25 05:49:07 ns kernel:  <7>UDP: bad checksum. From 213.147.0.92:53 to 193.0.0.195:2101 ulen 52
Aug 25 06:30:02 ns kernel: ------------[ cut here ]------------
Aug 25 06:30:02 ns kernel: kernel BUG at net/ipv6/ip6_output.c:718!
Aug 25 06:30:02 ns kernel: invalid operand: 0000 [#2]
Aug 25 06:30:02 ns kernel: SMP 
Aug 25 06:30:02 ns kernel: Modules linked in: uhci_hcd ehci_hcd ohci_hcd aic7xxx ide_cd
Aug 25 06:30:02 ns kernel: CPU:    0
Aug 25 06:30:02 ns kernel: EIP:    0060:[svc_create_socket+189/416]    Not tainted VLI
Aug 25 06:30:02 ns kernel: EFLAGS: 00010282   (2.6.12.6-ncc-server) 
Aug 25 06:30:02 ns kernel: EIP is at ip6_fragment+0x24d/0x880
Aug 25 06:30:02 ns kernel: eax: fffffff2   ebx: f301a7f0   ecx: 000007e0   edx: f55cee00
Aug 25 06:30:02 ns kernel: esi: f55ceff8   edi: f301a7f0   ebp: f65d6c80   esp: f7305bac
Aug 25 06:30:02 ns kernel: ds: 007b   es: 007b   ss: 0068
Aug 25 06:30:02 ns kernel: Process named (pid: 13193, threadinfo=f7304000 task=f6867520)
Aug 25 06:30:02 ns kernel: Stack: f5036a80 000007e0 f301a7f8 fffffdd8 f7304000 00000000 b7b9bcc8 a80001fe 
Aug 25 06:30:02 ns kernel:        00000000 000007e0 00000000 fffffdd8 ffffffd6 000007e0 fffffddc f7075480 
Aug 25 06:30:02 ns kernel:        f55ce848 f55ce818 f5036a80 f55ce840 f7b18800 c031fd50 f5036a80 c031fa30 
Aug 25 06:30:02 ns kernel: Call Trace:
Aug 25 06:30:02 ns kernel:  [svc_tcp_accept+960/992] ip6_output+0x30/0x40
Aug 25 06:30:02 ns kernel:  [svc_tcp_accept+160/992] ip6_output2+0x0/0x2f0
Aug 25 06:30:02 ns kernel:  [svcauth_unix_accept+43/496] ip6_push_pending_frames+0x29b/0x470
Aug 25 06:30:02 ns kernel:  [__func__.2+898/986] udp_v6_push_pending_frames+0x148/0x1b0
Aug 25 06:30:02 ns kernel:  [ip_send_reply+144/592] ip_generic_getfrag+0x0/0xc0
Aug 25 06:30:02 ns kernel:  [__func__.1+0/17] udpv6_sendmsg+0x52c/0x920
Aug 25 06:30:02 ns kernel:  [arp_solicit+356/464] udp_recvmsg+0x54/0x300
Aug 25 06:30:02 ns kernel:  [igmp_rcv+125/336] inet_sendmsg+0x4d/0x60
Aug 25 06:30:02 ns kernel:  [sys_connect+90/176] sock_sendmsg+0xda/0x100
Aug 25 06:30:02 ns kernel:  [find_busiest_group+209/736] find_busiest_group+0xd1/0x2e0
Aug 25 06:30:02 ns kernel:  [pci_bus_read_config_dword+38/144] copy_from_user+0x46/0x80
Aug 25 06:30:02 ns kernel:  [autoremove_wake_function+0/96] autoremove_wake_function+0x0/0x60
Aug 25 06:30:02 ns kernel:  [sock_wfree+15/96] sys_sendmsg+0x18f/0x1f0
Aug 25 06:30:02 ns kernel:  [futex_wait+292/576] futex_wait+0x124/0x240
Aug 25 06:30:02 ns kernel:  [find_extend_vma+41/144] find_extend_vma+0x29/0x90
Aug 25 06:30:02 ns kernel:  [default_wake_function+0/32] default_wake_function+0x0/0x20
Aug 25 06:30:02 ns kernel:  [futex_wake+123/208] futex_wake+0x7b/0xd0
Aug 25 06:30:02 ns kernel:  [pci_bus_read_config_dword+38/144] copy_from_user+0x46/0x80
Aug 25 06:30:02 ns kernel:  [sock_alloc_send_pskb+370/480] sys_socketcall+0x242/0x260
Aug 25 06:30:02 ns kernel:  [syscall_call+7/11] syscall_call+0x7/0xb
Aug 25 06:30:02 ns kernel: Code: 00 00 00 00 8b 54 24 2c 8b 4c 24 24 89 54 24 0c 8b 45 24 89 4c 24 04 89 44 24 08 8b 44 24 58 89 04 24 e8 a7 44 fa ff 85 c0 74 08 <0f> 0b ce 02 86 07 3a c0 0f b7 44 24 20 0f b6 d0 c1 e2 08 c1 e8 
------------[ cut here ]------------

[6.] A small shell script or example program which triggers the
     problem (if possible)

 Even though the problem occurs quite deterministically, I do
 not (yet) have a method to reproduce it. Currently I have a
 workaround (queries via IPv6 are handled by another system
 running 2.4), so I do not hit the bug on the production system.
 However, I have traffic dumps from the time span in which we
 had problems, so I can try to reproduce it on a test system if
 necessary. 
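
 As a starting point for a test system, the send path seen in the
 call trace (udpv6_sendmsg -> ip6_push_pending_frames ->
 ip6_fragment) can be exercised with a single UDP datagram over
 IPv6 that is larger than the link MTU. The sketch below is only
 an illustration under assumptions: the helper names, the
 2024-byte payload size and the destination are hypothetical, and
 it is not known to trigger the BUG.

```python
import socket


def build_payload(size: int = 2024) -> bytes:
    """Build a deterministic payload; 2024 bytes is a hypothetical size
    chosen only to exceed a typical 1500-byte Ethernet MTU."""
    return bytes(i % 256 for i in range(size))


def send_large_udp6(dest: str, port: int, payload: bytes) -> int:
    """Send one UDP datagram over IPv6 and return the number of bytes sent.

    When the payload plus headers exceeds the path MTU, the kernel has to
    fragment the datagram on output.
    """
    with socket.socket(socket.AF_INET6, socket.SOCK_DGRAM) as sock:
        return sock.sendto(payload, (dest, port))
```

 A call such as send_large_udp6("2001:db8::1", 5353, build_payload())
 (a documentation-prefix address, to be replaced by a test host)
 would need to go out over a real interface; on loopback the MTU is
 normally large enough that no fragmentation takes place.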

[7.] Environment
[7.1.] Software (add the output of the ver_linux script here)


Environment in which the kernel is compiled:
------------[ cut here ]------------
x10$ scripts/ver_linux 
If some fields are empty or look unusual you may have an old version.
Compare to the current minimal requirements in Documentation/Changes.
 
Linux x10 2.6.10-5-686-smp #1 SMP Thu Aug 18 22:54:45 UTC 2005 i686 GNU/Linux
 
Gnu C                  3.3.5
Gnu make               3.80
binutils               2.15
util-linux             2.12p
mount                  2.12p
module-init-tools      3.1
e2fsprogs              1.35
jfsutils               1.1.6
reiserfsprogs          3.6.19
reiser4progs           1.0.3
xfsprogs               2.6.20
quota-tools            3.12.
nfs-utils              1.0.6
Linux C Library        2.3.2
Dynamic linker (ldd)   2.3.2
Procps                 3.2.4
Net-tools              1.60
Console-tools          0.2.3
Sh-utils               5.2.1
udev                   050
Modules Loaded         proc_intf freq_table cpufreq_userspace cpufreq_powersave cpufreq_ondemand md msdos fat usb_storage rfcomm l2cap hci_usb bluetooth nls_cp437 isofs udf af_packet nfs nfsd exportfs lockd sunrpc autofs4 ipv6 tg3 i2c_i801 i2c_core usbhid piix snd_intel8x0 snd_ac97_codec snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd soundcore snd_page_alloc ehci_hcd uhci_hcd usbcore pci_hotplug intel_agp agpgart rtc pcspkr capability commoncap tsdev evdev psmouse mousedev parport_pc lp parport ide_generic ide_disk ide_cd ide_core cdrom ext3 jbd mbcache dm_mod sd_mod ata_piix libata scsi_mod unix thermal processor fan fbcon font bitblit vesafb cfbcopyarea cfbimgblt cfbfillrect
------------[ cut here ]------------

Production environment in which the kernel runs:
------------[ cut here ]------------
ns# sh ./ver_linux 
If some fields are empty or look unusual you may have an old version.
Compare to the current minimal requirements in Documentation/Changes.
 
Linux ns 2.6.17.11-ns-debug #6 SMP Sat Aug 26 05:06:53 CEST 2006 i686 GNU/Linux
 
Gnu C                  3.3.5
Gnu make               3.80
binutils               2.15
util-linux             2.12p
mount                  2.12p
module-init-tools      3.2-pre1
e2fsprogs              1.37
nfs-utils              1.0.6
Linux C Library        2.3.2
Dynamic linker (ldd)   2.3.2
Procps                 3.2.1
Net-tools              1.60
Console-tools          0.2.3
Sh-utils               5.2.1
Modules Loaded         ide_cd cdrom
------------[ cut here ]------------

The production environment runs Debian sarge (it should be fairly
up to date). The bind version running on the system is 9.3.2 with
a patch (AFAIK not yet public) provided by ISC.

[7.2.] Processor information (from /proc/cpuinfo):
------------[ cut here ]------------
processor	: 0
vendor_id	: GenuineIntel
cpu family	: 15
model		: 2
model name	: Intel(R) Xeon(TM) CPU 2.40GHz
stepping	: 7
cpu MHz		: 2387.891
cache size	: 512 KB
physical id	: 0
siblings	: 2
core id		: 0
cpu cores	: 1
fdiv_bug	: no
hlt_bug		: no
f00f_bug	: no
coma_bug	: no
fpu		: yes
fpu_exception	: yes
cpuid level	: 2
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe cid
bogomips	: 4780.50

processor	: 1
vendor_id	: GenuineIntel
cpu family	: 15
model		: 2
model name	: Intel(R) Xeon(TM) CPU 2.40GHz
stepping	: 7
cpu MHz		: 2387.891
cache size	: 512 KB
physical id	: 0
siblings	: 2
core id		: 0
cpu cores	: 1
fdiv_bug	: no
hlt_bug		: no
f00f_bug	: no
coma_bug	: no
fpu		: yes
fpu_exception	: yes
cpuid level	: 2
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe cid
bogomips	: 4775.16

processor	: 2
vendor_id	: GenuineIntel
cpu family	: 15
model		: 2
model name	: Intel(R) Xeon(TM) CPU 2.40GHz
stepping	: 7
cpu MHz		: 2387.891
cache size	: 512 KB
physical id	: 3
siblings	: 2
core id		: 0
cpu cores	: 1
fdiv_bug	: no
hlt_bug		: no
f00f_bug	: no
coma_bug	: no
fpu		: yes
fpu_exception	: yes
cpuid level	: 2
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe cid
bogomips	: 4775.25

processor	: 3
vendor_id	: GenuineIntel
cpu family	: 15
model		: 2
model name	: Intel(R) Xeon(TM) CPU 2.40GHz
stepping	: 7
cpu MHz		: 2387.891
cache size	: 512 KB
physical id	: 3
siblings	: 2
core id		: 0
cpu cores	: 1
fdiv_bug	: no
hlt_bug		: no
f00f_bug	: no
coma_bug	: no
fpu		: yes
fpu_exception	: yes
cpuid level	: 2
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe cid
bogomips	: 4775.36
------------[ cut here ]------------

[7.3.] Module information (from /proc/modules):

ide_cd 41220 0 - Live 0xf8a2f000
cdrom 42784 1 ide_cd, Live 0xf8831000

[7.4.] Loaded driver and hardware information (/proc/ioports, /proc/iomem)
------------[ begin: /proc/ioports ]------------
0000-001f : dma1
0020-0021 : pic1
0040-0043 : timer0
0050-0053 : timer1
0060-006f : keyboard
0070-0077 : rtc
0080-008f : dma page reg
00a0-00a1 : pic2
00c0-00df : dma2
00f0-00ff : fpu
01f0-01f7 : ide0
02f8-02ff : serial
03c0-03df : vga+
03f6-03f6 : ide0
03f8-03ff : serial
0800-0803 : PM1b_EVT_BLK
0804-0805 : PM1b_CNT_BLK
0808-080b : PM_TMR
080c-0813 : GPE0_BLK
0844-0847 : PM1a_EVT_BLK
0848-0849 : PM1a_CNT_BLK
0850-0857 : GPE1_BLK
08b0-08bf : 0000:00:0f.1
c000-cfff : PCI Bus #05
  c800-c8ff : 0000:05:06.1
  cc00-ccff : 0000:05:06.0
e800-e8ff : 0000:00:0e.0
ec80-ecbf : 0000:00:04.1
  ec80-ec87 : serial
ece8-ecef : 0000:00:04.0
ecf4-ecf7 : 0000:00:04.2
ecf8-ecff : 0000:00:04.0
------------[ end: /proc/ioports ]------------
------------[ begin: /proc/iomem ]------------
00000000-0009ffff : System RAM
000a0000-000bffff : Video RAM area
000c0000-000c7fff : Video ROM
000cc000-000cc5ff : Adapter ROM
000f0000-000fffff : System ROM
00100000-effeffff : System RAM
  00100000-002d8cee : Kernel code
  002d8cef-003d3503 : Kernel data
efff0000-efffebff : ACPI Tables
efffec00-efffefff : reserved
f0000000-f7ffffff : 0000:04:08.1
f8000000-f801ffff : 0000:00:0e.0
f8100000-f81fffff : PCI Bus #05
  f8100000-f811ffff : 0000:05:06.1
fcb00000-fcb0ffff : 0000:04:08.1
fcc00000-fcdfffff : PCI Bus #05
  fccfe000-fccfefff : 0000:05:06.1
  fccff000-fccfffff : 0000:05:06.0
  fcd00000-fcd1ffff : 0000:05:06.0
fcf00000-fcf0ffff : 0000:03:08.0
  fcf00000-fcf0ffff : tg3
fcf10000-fcf1ffff : 0000:03:06.0
  fcf10000-fcf1ffff : tg3
fd000000-fdffffff : 0000:00:0e.0
fe000000-fe00ffff : 0000:00:04.0
fe100000-fe100fff : 0000:00:0f.2
fe101000-fe101fff : 0000:00:0e.0
fe102000-fe102fff : 0000:00:04.1
feb00000-feb7ffff : 0000:00:04.1
feb80000-feb80fff : 0000:00:04.0
fec00000-fec0ffff : reserved
fee00000-fee0ffff : reserved
fff80000-ffffffff : reserved
------------[ end: /proc/iomem ]------------

[7.5.] PCI information ('lspci -vvv' as root)
------------[ begin: 'lspci -vvv' ]------------
0000:00:00.0 Host bridge: ServerWorks CMIC-WS Host Bridge (GC-LE chipset) (rev 13)
	Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
	Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR-

0000:00:00.1 Host bridge: ServerWorks CMIC-WS Host Bridge (GC-LE chipset)
	Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
	Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR-

0000:00:00.2 Host bridge: ServerWorks CMIC-LE
	Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
	Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR-

0000:00:04.0 ff00: Dell Embedded Remote Access or ERA/O
	Subsystem: Dell Embedded Remote Access or ERA/O
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR+ FastB2B-
	Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
	Latency: 32, Cache Line Size: 0x10 (64 bytes)
	Interrupt: pin A routed to IRQ 11
	Region 0: Memory at feb80000 (32-bit, prefetchable) [size=4K]
	Region 1: I/O ports at ecf8 [size=8]
	Region 2: I/O ports at ece8 [size=8]
	Expansion ROM at fe000000 [disabled] [size=64K]
	Capabilities: [48] Power Management version 2
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
		Status: D0 PME-Enable- DSel=0 DScale=0 PME-

0000:00:04.1 ff00: Dell Remote Access Card III
	Subsystem: Dell Remote Access Card III
	Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B-
	Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
	Interrupt: pin B routed to IRQ 16
	Region 0: Memory at fe102000 (32-bit, non-prefetchable) [size=4K]
	Region 1: I/O ports at ec80 [size=64]
	Region 2: Memory at feb00000 (32-bit, prefetchable) [size=512K]
	Capabilities: [48] Power Management version 2
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
		Status: D0 PME-Enable- DSel=0 DScale=0 PME-

0000:00:04.2 ff00: Dell Embedded Remote Access: BMC/SMIC device
	Subsystem: Dell Embedded Remote Access: BMC/SMIC device
	Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B-
	Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
	Interrupt: pin C routed to IRQ 7
	Region 0: I/O ports at ecf4 [size=4]
	Capabilities: [48] Power Management version 2
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
		Status: D0 PME-Enable- DSel=0 DScale=0 PME-

0000:00:0e.0 VGA compatible controller: ATI Technologies Inc Rage XL (rev 27) (prog-if 00 [VGA])
	Subsystem: Dell: Unknown device 0121
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop+ ParErr- Stepping+ SERR- FastB2B-
	Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
	Latency: 32 (2000ns min), Cache Line Size: 0x10 (64 bytes)
	Region 0: Memory at fd000000 (32-bit, non-prefetchable) [size=16M]
	Region 1: I/O ports at e800 [size=256]
	Region 2: Memory at fe101000 (32-bit, non-prefetchable) [size=4K]
	Expansion ROM at f8000000 [disabled] [size=128K]
	Capabilities: [5c] Power Management version 2
		Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
		Status: D0 PME-Enable- DSel=0 DScale=0 PME-

0000:00:0f.0 Host bridge: ServerWorks CSB5 South Bridge (rev 93)
	Subsystem: ServerWorks CSB5 South Bridge
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B-
	Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort+ >SERR- <PERR-
	Latency: 32

0000:00:0f.1 IDE interface: ServerWorks CSB5 IDE Controller (rev 93) (prog-if 82 [Master PriP])
	Subsystem: ServerWorks CSB5 IDE Controller
	Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B-
	Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
	Latency: 0
	Region 0: I/O ports at <ignored>
	Region 1: I/O ports at <ignored>
	Region 2: I/O ports at <ignored>
	Region 3: I/O ports at <ignored>
	Region 4: I/O ports at 08b0 [size=16]

0000:00:0f.2 USB Controller: ServerWorks OSB4/CSB5 OHCI USB Controller (rev 05) (prog-if 10 [OHCI])
	Subsystem: ServerWorks OSB4/CSB5 OHCI USB Controller
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr+ Stepping- SERR+ FastB2B-
	Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
	Latency: 32 (20000ns max)
	Interrupt: pin A routed to IRQ 5
	Region 0: Memory at fe100000 (32-bit, non-prefetchable) [size=4K]

0000:00:0f.3 ISA bridge: ServerWorks CSB5 LPC bridge
	Subsystem: ServerWorks: Unknown device 0230
	Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
	Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
	Latency: 0

0000:00:10.0 Host bridge: ServerWorks CIOB-X2 PCI-X I/O Bridge (rev 03)
	Control: I/O- Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B-
	Status: Cap+ 66MHz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort+ >SERR- <PERR-
	Capabilities: [60] 
0000:00:10.2 Host bridge: ServerWorks CIOB-X2 PCI-X I/O Bridge (rev 03)
	Control: I/O- Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B-
	Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort+ >SERR- <PERR-
	Capabilities: [60] 
0000:00:11.0 Host bridge: ServerWorks CIOB-X2 PCI-X I/O Bridge (rev 03)
	Control: I/O- Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B-
	Status: Cap+ 66MHz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort+ >SERR- <PERR-
	Capabilities: [60] 
0000:00:11.2 Host bridge: ServerWorks CIOB-X2 PCI-X I/O Bridge (rev 03)
	Control: I/O- Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B-
	Status: Cap+ 66MHz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort+ >SERR- <PERR-
	Capabilities: [60] 
0000:03:06.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5701 Gigabit Ethernet (rev 15)
	Subsystem: Dell Broadcom BCM5701 1000Base-T
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B-
	Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
	Latency: 64 (16000ns min), Cache Line Size: 0x10 (64 bytes)
	Interrupt: pin A routed to IRQ 17
	Region 0: Memory at fcf10000 (64-bit, non-prefetchable) [size=64K]
	Capabilities: [40] PCI-X non-bridge device.
		Command: DPERE- ERO- RBC=0 OST=0
		Status: Bus=3 Dev=6 Func=1 64bit+ 133MHz+ SCD- USC-, DC=simple, DMMRBC=0, DMOST=0, DMCRS=0, RSCEM-
	Capabilities: [48] Power Management version 2
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot+,D3cold+)
		Status: D0 PME-Enable- DSel=0 DScale=1 PME-
	Capabilities: [50] Vital Product Data
	Capabilities: [58] Message Signalled Interrupts: 64bit+ Queue=0/3 Enable-
		Address: a526860a9e070490  Data: 4e6a

0000:03:08.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5701 Gigabit Ethernet (rev 15)
	Subsystem: Dell Broadcom BCM5701 1000Base-T
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B-
	Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
	Latency: 64 (16000ns min), Cache Line Size: 0x10 (64 bytes)
	Interrupt: pin A routed to IRQ 18
	Region 0: Memory at fcf00000 (64-bit, non-prefetchable) [size=64K]
	Capabilities: [40] PCI-X non-bridge device.
		Command: DPERE- ERO+ RBC=0 OST=0
		Status: Bus=3 Dev=8 Func=1 64bit+ 133MHz+ SCD- USC-, DC=simple, DMMRBC=0, DMOST=0, DMCRS=0, RSCEM-
	Capabilities: [48] Power Management version 2
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot+,D3cold+)
		Status: D0 PME-Enable- DSel=0 DScale=1 PME-
	Capabilities: [50] Vital Product Data
	Capabilities: [58] Message Signalled Interrupts: 64bit+ Queue=0/3 Enable-
		Address: 850016aeb9a02088  Data: 3569

0000:04:08.0 PCI bridge: Intel Corp. 80303 I/O Processor PCI-to-PCI Bridge (rev 01) (prog-if 00 [Normal decode])
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B-
	Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=slow >TAbort- <TAbort- <MAbort- >SERR- <PERR-
	Latency: 32, Cache Line Size: 0x10 (64 bytes)
	Bus: primary=04, secondary=05, subordinate=05, sec-latency=32
	I/O behind bridge: 0000c000-0000cfff
	Memory behind bridge: fcc00000-fcdfffff
	Prefetchable memory behind bridge: f8100000-f81fffff
	BridgeCtl: Parity- SERR+ NoISA+ VGA- MAbort- >Reset- FastB2B-
	Capabilities: [68] Power Management version 2
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
		Status: D0 PME-Enable- DSel=0 DScale=0 PME-

0000:04:08.1 RAID bus controller: Dell PowerEdge Expandable RAID Controller 3/Di (rev 01)
	Subsystem: Dell: Unknown device 0121
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR+ FastB2B-
	Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=slow >TAbort- <TAbort- <MAbort- >SERR- <PERR-
	Latency: 32, Cache Line Size: 0x10 (64 bytes)
	Interrupt: pin A routed to IRQ 19
	Region 0: Memory at f0000000 (32-bit, prefetchable) [size=128M]
	Expansion ROM at fcb00000 [disabled] [size=64K]
	Capabilities: [80] Power Management version 2
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
		Status: D0 PME-Enable- DSel=0 DScale=0 PME-

0000:05:06.0 SCSI storage controller: Adaptec RAID subsystem HBA (rev 01)
	Subsystem: Dell PowerEdge 2400,2500,2550,4400
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR+ FastB2B-
	Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
	Latency: 32 (10000ns min, 6250ns max), Cache Line Size: 0x10 (64 bytes)
	Interrupt: pin A routed to IRQ 7
	BIST result: 00
	Region 0: I/O ports at cc00 [size=256]
	Region 1: Memory at fccff000 (64-bit, non-prefetchable) [size=4K]
	Expansion ROM at fcd00000 [disabled] [size=128K]
	Capabilities: [dc] Power Management version 2
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
		Status: D0 PME-Enable- DSel=0 DScale=0 PME-

0000:05:06.1 SCSI storage controller: Adaptec RAID subsystem HBA (rev 01)
	Subsystem: Dell PowerEdge 2400,2500,2550,4400
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR+ FastB2B-
	Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
	Latency: 32 (10000ns min, 6250ns max), Cache Line Size: 0x10 (64 bytes)
	Interrupt: pin B routed to IRQ 11
	BIST result: 00
	Region 0: I/O ports at c800 [size=256]
	Region 1: Memory at fccfe000 (64-bit, non-prefetchable) [size=4K]
	Expansion ROM at f8100000 [disabled] [size=128K]
	Capabilities: [dc] Power Management version 2
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
		Status: D0 PME-Enable- DSel=0 DScale=0 PME-
------------[ end: 'lspci -vvv' ]------------


[7.6.] SCSI information (from /proc/scsi/scsi)

CONFIG_SCSI_PROC_FS is not set. I can provide SCSI information
if you think it is relevant.

[7.7.] Other information that might be relevant to the problem
       (please look in /proc and include all information that you
       think to be relevant):
------------------------------------------------------------------------------


* Re: PROBLEM: kernel BUG at net/ipv6/ip6_output.c:718
  2006-08-27 14:23 PROBLEM: kernel BUG at net/ipv6/ip6_output.c:718 cagri coltekin
@ 2006-08-28  0:16 ` Herbert Xu
  2006-08-28  0:49   ` cagri coltekin
  0 siblings, 1 reply; 15+ messages in thread
From: Herbert Xu @ 2006-08-28  0:16 UTC
  To: cagri coltekin; +Cc: netdev, davem, pekkas

cagri coltekin <cagri@ripe.net> wrote:
> 
> Aug 25 04:03:35 ns kernel: ------------[ cut here ]------------
> Aug 25 04:03:35 ns kernel: kernel BUG at net/ipv6/ip6_output.c:718!
> Aug 25 04:03:35 ns kernel: invalid operand: 0000 [#1]
> Aug 25 04:03:35 ns kernel: SMP 
> Aug 25 04:03:35 ns kernel: Modules linked in: uhci_hcd ehci_hcd ohci_hcd aic7xxx ide_cd
> Aug 25 04:03:35 ns kernel: CPU:    3
> Aug 25 04:03:35 ns kernel: EIP:    0060:[svc_create_socket+189/416]    Not tainted VLI
> Aug 25 04:03:35 ns kernel: EFLAGS: 00010282   (2.6.12.6-ncc-server) 

This is an ancient kernel.  Please really try 2.6.17 instead of just
talking about it (the line number confirms that it is 2.6.12).

Thanks,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


* Re: PROBLEM: kernel BUG at net/ipv6/ip6_output.c:718
  2006-08-28  0:16 ` Herbert Xu
@ 2006-08-28  0:49   ` cagri coltekin
  2006-08-29  8:28     ` Herbert Xu
  0 siblings, 1 reply; 15+ messages in thread
From: cagri coltekin @ 2006-08-28  0:49 UTC
  To: Herbert Xu; +Cc: netdev, davem, pekkas

On Mon, Aug 28, 2006 at 10:16:56AM +1000, Herbert Xu wrote:
> cagri coltekin <cagri@ripe.net> wrote:
> > 
> > Aug 25 04:03:35 ns kernel: ------------[ cut here ]------------
> > Aug 25 04:03:35 ns kernel: kernel BUG at net/ipv6/ip6_output.c:718!
> > Aug 25 04:03:35 ns kernel: invalid operand: 0000 [#1]
> > Aug 25 04:03:35 ns kernel: SMP 
> > Aug 25 04:03:35 ns kernel: Modules linked in: uhci_hcd ehci_hcd ohci_hcd aic7xxx ide_cd
> > Aug 25 04:03:35 ns kernel: CPU:    3
> > Aug 25 04:03:35 ns kernel: EIP:    0060:[svc_create_socket+189/416]    Not tainted VLI
> > Aug 25 04:03:35 ns kernel: EFLAGS: 00010282   (2.6.12.6-ncc-server) 
> 
> This is an ancient kernel.  Please really try 2.6.17 instead of just
> talking about it (the line number confirms that it is 2.6.12).

Oops, sorry for the confusion. It happens with 2.6.17 too (see
below); I cut and pasted from the wrong log. The rest of the data
provided in the previous message is actually fresh.

Aug 26 07:09:36 ns kernel: [17180077.732000] ------------[ cut here ]------------
Aug 26 07:09:36 ns kernel: [17180077.792000] kernel BUG at net/ipv6/ip6_output.c:693!
Aug 26 07:09:36 ns kernel: [17180077.856000] invalid opcode: 0000 [#1]
Aug 26 07:09:36 ns kernel: [17180077.900000] SMP 
Aug 26 07:09:36 ns kernel: [17180077.928000] Modules linked in: ide_cd cdrom
Aug 26 07:09:36 ns kernel: [17180077.980000] CPU:    2
Aug 26 07:09:36 ns kernel: [17180077.980000] EIP:    0060:[ip6_fragment+619/1981]    Not tainted VLI
Aug 26 07:09:36 ns kernel: [17180077.980000] EFLAGS: 00010282   (2.6.17.11-ns-debug #6) 
Aug 26 07:09:36 ns kernel: [17180078.148000] EIP is at ip6_fragment+0x26b/0x7bd
Aug 26 07:09:36 ns kernel: [17180078.204000] eax: fffffff2   ebx: fffffdd8   ecx: 000005b8   edx: f5ecc600
Aug 26 07:09:36 ns kernel: [17180078.288000] esi: f5ecc7f8   edi: f5e7bff0   ebp: c2ff6780   esp: f71f5bb8
Aug 26 07:09:36 ns kernel: [17180078.376000] ds: 007b   es: 007b   ss: 0068
Aug 26 07:09:36 ns kernel: [17180078.428000] Process named (pid: 1811, threadinfo=f71f4000 task=f7470a10)
Aug 26 07:09:36 ns kernel: [17180078.508000] Stack: f7208880 000007e0 f5e7bff8 fffffdd8 f71f4000 f71f5bdc 5d000000 00000000 
Aug 26 07:09:36 ns kernel: [17180078.612000]        000007e0 0e030000 ffffffee 000007e0 fffffddc f5e7bff0 f7fd7880 f5ecc048 
Aug 26 07:09:36 ns kernel: [17180078.720000]        f7208880 f7fd7880 f5ecc040 f774c080 c02adcc6 f7208880 c02adac2 c02afcc6 
Aug 26 07:09:36 ns kernel: [17180078.824000] Call Trace:
Aug 26 07:09:36 ns kernel: [17180078.860000]  <c02adcc6> ip6_output+0x3c/0x4c  <c02adac2> ip6_output2+0x0/0x1c8
Aug 26 07:09:36 ns kernel: [17180078.948000]  <c02afcc6> ip6_push_pending_frames+0x250/0x390  <c02c09ea> udp_v6_push_pending_frames+0x13d/0x1a4
Aug 26 07:09:36 ns kernel: [17180079.072000]  <c02c0fdb> udpv6_sendmsg+0x58a/0x953  <c0291d36> udp_recvmsg+0x56/0x24c
Aug 26 07:09:36 ns kernel: [17180079.172000]  <c02986e6> inet_sendmsg+0x4a/0x56  <c0253256> sock_sendmsg+0xeb/0x105
Aug 26 07:09:36 ns kernel: [17180079.264000]  <c01c18cc> __next_cpu+0x22/0x31  <c01167c7> find_busiest_group+0xd6/0x305
Aug 26 07:09:36 ns kernel: [17180079.364000]  <c01177e6> dependent_sleeper+0x1ec/0x32d  <c012f91e> autoremove_wake_function+0x0/0x57
Aug 26 07:09:36 ns kernel: [17180079.476000]  <c01c662e> copy_from_user+0x46/0x7c  <c01c662e> copy_from_user+0x46/0x7c
Aug 26 07:09:36 ns kernel: [17180079.576000]  <c0254d9d> sys_sendmsg+0x191/0x1f8  <c01334c6> futex_wait+0x129/0x238
Aug 26 07:09:36 ns kernel: [17180079.672000]  <c014b75c> find_extend_vma+0x29/0x7e  <c0117927> default_wake_function+0x0/0x12
Aug 26 07:09:36 ns kernel: [17180079.776000]  <c0132b91> futex_wake+0x4a/0xba  <c01667a8> pipe_write+0x0/0x3b
Aug 26 07:09:36 ns kernel: [17180079.864000]  <c01c662e> copy_from_user+0x46/0x7c  <c0255243> sys_socketcall+0x236/0x254
Aug 26 07:09:36 ns kernel: [17180079.964000]  <c0102be3> syscall_call+0x7/0xb 
Aug 26 07:09:36 ns kernel: [17180080.020000] Code: 24 8b 44 24 34 89 50 04 89 5c 24 0c 8b 4c 24 20 8b 45 1c 89 4c 24 04 89 44 24 08 8b 44 24 54 89 04 24 e8 25 a6 fa ff 85 c0 74 08 <0f> 0b b5 02 21 fb 30 c0 0f b7 44 24 1c 8b 4c 24 34 89 c2 c1 e8 
Aug 26 07:09:36 ns kernel: [17180080.264000] EIP: [ip6_fragment+619/1981] ip6_fragment+0x26b/0x7bd SS:ESP 0068:f71f5bb8
----------------------------------------------------------------------
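
A note for readers decoding the oops above: the two EIP forms carry the same information. `ip6_fragment+0x26b/0x7bd` gives the byte offset into the function and the function's total size in hex, and `[ip6_fragment+619/1981]` is the same pair in decimal. The sketch below shows one way to pull those fields apart. `parse_oops_location` and `struct oops_location` are hypothetical names made up for illustration; they are not kernel or libc interfaces.

```c
#include <stdio.h>

/* Hypothetical helper type, for illustration only. */
struct oops_location {
    char symbol[64];       /* function name, e.g. "ip6_fragment" */
    unsigned long offset;  /* bytes from the start of the function */
    unsigned long size;    /* total size of the function in bytes */
};

/*
 * Parse text of the form "symbol+0xOFFSET/0xSIZE".
 * Returns 1 on success, 0 if the text does not have that shape.
 */
int parse_oops_location(const char *text, struct oops_location *out)
{
    /* %lx accepts an optional "0x" prefix, as strtoul does with base 16. */
    return sscanf(text, "%63[^+]+%lx/%lx",
                  out->symbol, &out->offset, &out->size) == 3;
}
```

Parsing "ip6_fragment+0x26b/0x7bd" this way yields offset 619 and size 1981, which matches the decimal form printed on the EIP line.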


-- 
cagri


* Re: PROBLEM: kernel BUG at net/ipv6/ip6_output.c:718
  2006-08-28  0:49   ` cagri coltekin
@ 2006-08-29  8:28     ` Herbert Xu
  2006-08-31 15:12       ` cagri coltekin
  0 siblings, 1 reply; 15+ messages in thread
From: Herbert Xu @ 2006-08-29  8:28 UTC (permalink / raw)
  To: cagri coltekin; +Cc: netdev, davem, pekkas

On Mon, Aug 28, 2006 at 02:49:07AM +0200, cagri coltekin wrote:
> 
> Ooops, sorry for the confusion. It happens with 2.6.17 too (see
> below), cut&paste from wrong log. The rest of the data provided
> in the previous message is actually fresh.

Thanks.  Please try this patch and tell me if it prints anything out.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index 4fb47a2..5e2e4ea 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -508,6 +508,10 @@ static int ip6_fragment(struct sk_buff *
 	dev = rt->u.dst.dev;
 	hlen = ip6_find_1stfragopt(skb, &prevhdr);
 	nexthdr = *prevhdr;
+	if (unlikely(hlen > skb->len)) {
+		printk(KERN_CRIT "ip6_fragment: hlen = 0x%x, len = 0x%x, nexthdr=%d\n", hlen, skb->len, nexthdr);
+		BUG();
+	}
 
 	mtu = dst_mtu(&rt->u.dst);
 	if (np && np->frag_size < mtu) {


* Re: PROBLEM: kernel BUG at net/ipv6/ip6_output.c:718
  2006-08-29  8:28     ` Herbert Xu
@ 2006-08-31 15:12       ` cagri coltekin
  2006-09-01  7:05         ` Herbert Xu
  0 siblings, 1 reply; 15+ messages in thread
From: cagri coltekin @ 2006-08-31 15:12 UTC (permalink / raw)
  To: Herbert Xu; +Cc: netdev, davem, pekkas

Hi Again,

It took a while to find equipment for a test environment, but I now
have one that I can test on.

Here is the result:

---------------------------------------------------------------------------
[17180051.768000] ip6_fragment: hlen = 0x818, len = 0x7ce, nexthdr=4
[17180051.840000] ------------[ cut here ]------------
[17180051.840000] kernel BUG at net/ipv6/ip6_output.c:510!
[17180051.840000] invalid opcode: 0000 [#1]
[17180051.840000] SMP 
[17180051.840000] Modules linked in: ipmi_si ipmi_msghandler ide_cd cdrom
[17180051.840000] CPU:    0
[17180051.840000] EIP:    0060:[<c02bc6bd>]    Not tainted VLI
[17180051.840000] EFLAGS: 00010296   (2.6.17.11-ns-pri-debug-p1 #6) 
[17180051.840000] EIP is at ip6_fragment+0x7f6/0x803
[17180051.840000] eax: 00000048   ebx: f75c4c5c   ecx: c038f5bc   edx: 00000286
[17180051.840000] esi: f7605c50   edi: 00000000   ebp: f76e2c80   esp: f7605bb8
[17180051.840000] ds: 007b   es: 007b   ss: 0068
[17180051.840000] Process named (pid: 1899, threadinfo=f7604000 task=f75cead0)
[17180051.840000] Stack: c0324600 00000818 000007ce 00000004 00000000 f7605bdc 04000000 00000000 
[17180051.840000]        ffd14ca4 00000000 f7605ea8 00000818 f77a4040 000001fe f755d080 f7976048 
[17180051.840000]        f76e2c80 f7605c50 f7976040 f75c4a80 c02bb612 f76e2c80 c02bb40e c02bd66a 
[17180051.840000] Call Trace:
[17180051.840000]  <c02bb612> ip6_output+0x3c/0x4c  <c02bb40e> ip6_output2+0x0/0x1c8
[17180051.840000]  <c02bd66a> ip6_push_pending_frames+0x250/0x390  <c02ce38e> udp_v6_push_pending_frames+0x13d/0x1a4
[17180051.840000]  <c02ce97f> udpv6_sendmsg+0x58a/0x953  <c02cd7c2> udpv6_recvmsg+0x20c/0x303
[17180051.840000]  <c02a6032> inet_sendmsg+0x4a/0x56  <c0260b82> sock_sendmsg+0xeb/0x105
[17180051.840000]  <c01c18cc> __next_cpu+0x22/0x31  <c01167c7> find_busiest_group+0xd6/0x305
[17180051.840000]  <c012f91e> autoremove_wake_function+0x0/0x57  <c01c662e> copy_from_user+0x46/0x7c
[17180051.840000]  <c01c662e> copy_from_user+0x46/0x7c  <c02626c9> sys_sendmsg+0x191/0x1f8
[17180051.840000]  <c01334c6> futex_wait+0x129/0x238  <c014b75c> find_extend_vma+0x29/0x7e
[17180051.840000]  <c0117927> default_wake_function+0x0/0x12  <c0132b91> futex_wake+0x4a/0xba
[17180051.840000]  <c01c662e> copy_from_user+0x46/0x7c  <c0262b6f> sys_socketcall+0x236/0x254
[17180051.840000]  <c0102be3> syscall_call+0x7/0xb 
[17180051.840000] Code: 50 60 e9 36 f9 ff ff 0f b6 44 24 1b 8b 54 24 2c 89 44 24 0c 8b 45 60 c7 04 24 00 46 32 c0 89 54 24 04 89 44 24 08 e8 50 07 e6 ff <0f> 0b fe 01 41 13 32 c0 e9 68 f8 ff ff 55 57 56 31 f6 53 83 ec 
[17180051.840000] EIP: [<c02bc6bd>] ip6_fragment+0x7f6/0x803 SS:ESP 0068:f7605bb8
---------------------------------------------------------------------------

I hope this helps.

Cheers,
-- 
cagri

On Tue, Aug 29, 2006 at 06:28:28PM +1000, Herbert Xu wrote:
> 
> Thanks.  Please try this patch and tell me if it prints anything out.
> 
> Cheers,
> -- 
> Visit Openswan at http://www.openswan.org/
> Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
> Home Page: http://gondor.apana.org.au/~herbert/
> PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
> --
> diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
> index 4fb47a2..5e2e4ea 100644
> --- a/net/ipv6/ip6_output.c
> +++ b/net/ipv6/ip6_output.c
> @@ -508,6 +508,10 @@ static int ip6_fragment(struct sk_buff *
>  	dev = rt->u.dst.dev;
>  	hlen = ip6_find_1stfragopt(skb, &prevhdr);
>  	nexthdr = *prevhdr;
> +	if (unlikely(hlen > skb->len)) {
> +		printk(KERN_CRIT "ip6_fragment: hlen = 0x%x, len = 0x%x, nexthdr=%d\n", hlen, skb->len, nexthdr);
> +		BUG();
> +	}
>  
>  	mtu = dst_mtu(&rt->u.dst);
>  	if (np && np->frag_size < mtu) {


* Re: PROBLEM: kernel BUG at net/ipv6/ip6_output.c:718
  2006-08-31 15:12       ` cagri coltekin
@ 2006-09-01  7:05         ` Herbert Xu
  2006-09-01 16:22           ` cagri coltekin
  0 siblings, 1 reply; 15+ messages in thread
From: Herbert Xu @ 2006-09-01  7:05 UTC (permalink / raw)
  To: cagri coltekin; +Cc: netdev, davem, pekkas

On Thu, Aug 31, 2006 at 05:12:43PM +0200, cagri coltekin wrote:
> 
> It took a while to find equipment for a test environment, but I now
> have one that I can test on.
> 
> Here is the result:
> 
> ---------------------------------------------------------------------------
> [17180051.768000] ip6_fragment: hlen = 0x818, len = 0x7ce, nexthdr=4

Thanks for the result.  It looks like something is screwed up with the
extension headers.  What version of bind are you using?

Please try the following patch instead to see if we can further isolate
the problem.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index 4fb47a2..e5ba216 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -508,6 +508,10 @@ static int ip6_fragment(struct sk_buff *
 	dev = rt->u.dst.dev;
 	hlen = ip6_find_1stfragopt(skb, &prevhdr);
 	nexthdr = *prevhdr;
+	if (unlikely(hlen > skb->len)) {
+		printk(KERN_CRIT "ip6_fragment: hlen = 0x%x, len = 0x%x, nexthdr = %d\n", hlen, skb->len, skb->nh.ipv6h->nexthdr);
+		BUG();
+	}
 
 	mtu = dst_mtu(&rt->u.dst);
 	if (np && np->frag_size < mtu) {
@@ -1204,6 +1208,8 @@ int ip6_push_pending_frames(struct sock 
 	struct flowi *fl = &inet->cork.fl;
 	unsigned char proto = fl->proto;
 	int err = 0;
+	u8 *prevhdr;
+	unsigned int hlen;
 
 	if ((skb = __skb_dequeue(&sk->sk_write_queue)) == NULL)
 		goto out;
@@ -1249,6 +1255,14 @@ int ip6_push_pending_frames(struct sock 
 
 	skb->dst = dst_clone(&rt->u.dst);
 	IP6_INC_STATS(IPSTATS_MIB_OUTREQUESTS);	
+
+	hlen = ip6_find_1stfragopt(skb, &prevhdr);
+	if (unlikely(hlen > skb->len)) {
+		printk(KERN_CRIT "ip6_push: hlen = 0x%x, len = 0x%x, nexthdr1 = %d, nexthdr2 = %d, proto = %d\n", hlen, skb->len, skb->nh.ipv6h->nexthdr, *prevhdr, proto);
+		printk(KERN_CRIT "ip6_push: opt = 0x%x, flen = %d, nflen = %d\n", (unsigned int)opt, opt ? opt->opt_flen : 0, opt ? opt->opt_nflen : 0);
+		BUG();
+	}
+
 	err = NF_HOOK(PF_INET6, NF_IP6_LOCAL_OUT, skb, NULL, skb->dst->dev, dst_output);
 	if (err) {
 		if (err > 0)


* Re: PROBLEM: kernel BUG at net/ipv6/ip6_output.c:718
  2006-09-01  7:05         ` Herbert Xu
@ 2006-09-01 16:22           ` cagri coltekin
  2006-09-25 12:15             ` Herbert Xu
  0 siblings, 1 reply; 15+ messages in thread
From: cagri coltekin @ 2006-09-01 16:22 UTC (permalink / raw)
  To: Herbert Xu; +Cc: netdev, davem, pekkas

On Fri, Sep 01, 2006 at 05:05:57PM +1000, Herbert Xu wrote:
> On Thu, Aug 31, 2006 at 05:12:43PM +0200, cagri coltekin wrote:
> > 
> > It took a while to find equipment for a test environment, but I now
> > have one that I can test on.
> > 
> > Here is the result:
> > 
> > ---------------------------------------------------------------------------
> > [17180051.768000] ip6_fragment: hlen = 0x818, len = 0x7ce, nexthdr=4
> 
> Thanks for the result.  It looks like something is screwed up with the
> extension headers.  What version of bind are you using?

It's bind 9.3.2. The version we were using had a specific patch
applied, but I've just tested with unpatched bind 9.3.2 and it
happens there too. The system has a large number of zones, most of
them DNSSEC-enabled, which may be the reason for the peculiarity. I
can send configuration/zone files etc. if that would be helpful.

> Please try the following patch instead to see if we can further isolate
> the problem.

The second patch makes the system hit the bug a couple of seconds
after bind starts and loads the zones, without any traffic going
on. BTW, the patch applied with some offset differences (3 for the
first hunk, -48 for the other two) on a pristine 2.6.17.11
source tree.

Here is the new result:

---------------------------------------------------------------------------------------------
[17199663.616000] ip6_push: hlen = 0x388, len = 0x8f, nexthdr1 = 0, nexthdr2 = 162, proto = 0
[17199663.712000] ip6_push: opt = 0x0, flen = 0, nflen = 0
[17199663.776000] ------------[ cut here ]------------
[17199663.836000] kernel BUG at net/ipv6/ip6_output.c:1215!
[17199663.896000] invalid opcode: 0000 [#1]
[17199663.944000] SMP 
[17199663.972000] Modules linked in: ipmi_si ipmi_msghandler ide_cd cdrom
[17199664.048000] CPU:    1
[17199664.048000] EIP:    0060:[<c02bd7b9>]    Not tainted VLI
[17199664.048000] EFLAGS: 00010282   (2.6.17.11-ns-pri-debug-p2 #1) 
[17199664.220000] EIP is at ip6_push_pending_frames+0x39d/0x42e
[17199664.288000] eax: 0000003e   ebx: f60fae80   ecx: c038f5bc   edx: 00000286
[17199664.372000] esi: f7258d80   edi: f782ea40   ebp: f6171d00   esp: f60f7c0c
[17199664.456000] ds: 007b   es: 007b   ss: 0068
[17199664.508000] Process named (pid: 15561, threadinfo=f60f6000 task=f7ae9030)
[17199664.592000] Stack: c03246e0 00000000 00000000 00000000 000000a2 00000000 f6171e88 f7258d80 
[17199664.696000]        00000000 f6171edc f782ea48 f60f7c40 00000000 00000000 00000000 00000000 
[17199664.800000]        00000000 f6171e90 f6171ea0 f6171e88 f782ea40 c02ce42e f6171d00 00000008 
[17199664.904000] Call Trace:
[17199664.936000]  <c02ce42e> udp_v6_push_pending_frames+0x13d/0x1a4  <c02cea1f> udpv6_sendmsg+0x58a/0x953
[17199665.048000]  <c02a6032> inet_sendmsg+0x4a/0x56  <c0260b82> sock_sendmsg+0xeb/0x105
[17199665.144000]  <c01c18cc> __next_cpu+0x22/0x31  <c01167c7> find_busiest_group+0xd6/0x305
[17199665.244000]  <c0173c22> file_update_time+0x48/0xcb  <c01177e6> dependent_sleeper+0x1ec/0x32d
[17199665.348000]  <c012f91e> autoremove_wake_function+0x0/0x57  <c01c662e> copy_from_user+0x46/0x7c
[17199665.456000]  <c0267b9c> verify_iovec+0x3c/0x94  <c02626c9> sys_sendmsg+0x191/0x1f8
[17199665.548000]  <c02e4ff7> schedule_timeout+0xa8/0xaa  <c0133356> unqueue_me+0x56/0x9d
[17199665.644000]  <c012f726> add_wait_queue+0x1a/0x46  <c013356a> futex_wait+0x1cd/0x238
[17199665.740000]  <c014b75c> find_extend_vma+0x29/0x7e  <c01c18cc> __next_cpu+0x22/0x31
[17199665.832000]  <c01177e6> dependent_sleeper+0x1ec/0x32d  <c01c662e> copy_from_user+0x46/0x7c
[17199665.936000]  <c0262b6f> sys_socketcall+0x236/0x254  <c0102be3> syscall_call+0x7/0xb
[17199666.032000] Code: 20 89 44 24 0c 31 c0 85 d2 74 08 8b 54 24 20 0f b7 42 04 89 44 24 08 c7 04 24 e0 46 32 c0 8b 44 24 20 89 44 24 04 e8 54 f6 e5 ff <0f> 0b bf 04 41 13 32 c0 e9 b8 fe ff ff 66 c7 41 04 00 00 e9 21 
[17199666.268000] EIP: [<c02bd7b9>] ip6_push_pending_frames+0x39d/0x42e SS:ESP 0068:f60f7c0c
---------------------------------------------------------------------------------------------
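
The numbers in this report (nexthdr1 = 0, proto = 0, hlen far larger than len) are consistent with a header walk that starts from a zeroed Next Header field. The sketch below is a simplified userspace model of such a walk, not the kernel's ip6_find_1stfragopt: it assumes a packet of at least 40 bytes, ignores the special case for a destination-options header after a routing header, and every name in it is hypothetical. Next Header value 0 means hop-by-hop options, so a UDP packet whose IPv6 header carries 0 instead of 17 has its UDP header bytes misread as an extension header.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define IPV6_HDR_LEN     40
#define NEXTHDR_HOP       0   /* hop-by-hop options */
#define NEXTHDR_UDP      17
#define NEXTHDR_ROUTING  43
#define NEXTHDR_DEST     60   /* destination options */

/*
 * Simplified model: skip the extension headers that must precede a
 * fragment header and return the offset where they end.
 * Assumes pkt_len >= IPV6_HDR_LEN.
 */
size_t find_first_frag_offset(const uint8_t *pkt, size_t pkt_len)
{
    size_t offset = IPV6_HDR_LEN;
    uint8_t nexthdr = pkt[6];  /* Next Header field of the fixed header */

    while (offset + 2 <= pkt_len) {
        if (nexthdr != NEXTHDR_HOP && nexthdr != NEXTHDR_ROUTING &&
            nexthdr != NEXTHDR_DEST)
            return offset;

        /* Extension header: byte 0 = next header, byte 1 = length
         * in 8-byte units, not counting the first 8 bytes. */
        uint8_t hdr_next = pkt[offset];
        uint8_t hdr_len = pkt[offset + 1];

        offset += ((size_t)hdr_len + 1) * 8;
        nexthdr = hdr_next;
    }
    return offset;  /* may exceed pkt_len if the headers were bogus */
}

/* Build a 48-byte packet: IPv6 header plus a UDP header, source port 53. */
size_t build_udp_packet(uint8_t *buf, uint8_t nexthdr_field)
{
    memset(buf, 0, 48);
    buf[0] = 0x60;            /* IP version 6 */
    buf[6] = nexthdr_field;   /* 17 when correct, 0 when clobbered */
    buf[40] = 0x00;           /* UDP source port 53, high byte */
    buf[41] = 0x35;           /* UDP source port 53, low byte */
    buf[43] = 0x35;           /* UDP destination port 53, low byte */
    buf[45] = 8;              /* UDP length: header only */
    return 48;
}
```

With the Next Header field set to 17, find_first_frag_offset returns 40. With it set to 0, the source port bytes 0x00 0x35 are read as "next header 0, length 53", so the walk jumps (53 + 1) * 8 = 432 bytes and returns 472 for a 48-byte packet, the same "hlen > len" shape as the debug output.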

Cheers,
-- 
cagri


* Re: PROBLEM: kernel BUG at net/ipv6/ip6_output.c:718
  2006-09-01 16:22           ` cagri coltekin
@ 2006-09-25 12:15             ` Herbert Xu
  2006-09-26 11:21               ` cagri coltekin
  0 siblings, 1 reply; 15+ messages in thread
From: Herbert Xu @ 2006-09-25 12:15 UTC (permalink / raw)
  To: cagri coltekin; +Cc: netdev, davem, pekkas

On Fri, Sep 01, 2006 at 06:22:48PM +0200, cagri coltekin wrote:
>
> The second patch makes the system hit the bug a couple of seconds
> after bind starts and loads the zones, without any traffic going
> on. BTW, the patch applied with some offset differences (3 for the
> first hunk, -48 for the other two) on a pristine 2.6.17.11
> source tree.

Well the good news is that I found a bug with MSG_PROBE that can
cause exactly what you're seeing.  The bad news is that bind doesn't
use MSG_PROBE :)

So please try this patch to narrow the problem down further.

Thanks,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index 6671691..637b5c4 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -990,8 +990,10 @@ int ip6_append_data(struct sock *sk, int
 	int offset = 0;
 	int csummode = CHECKSUM_NONE;
 
-	if (flags&MSG_PROBE)
+	if (flags&MSG_PROBE) {
+		WARN_ON(1);
 		return 0;
+	}
 	if (skb_queue_empty(&sk->sk_write_queue)) {
 		/*
 		 * setup for corking
@@ -1013,6 +1015,7 @@ int ip6_append_data(struct sock *sk, int
 		dst_hold(&rt->u.dst);
 		np->cork.rt = rt;
 		inet->cork.fl = *fl;
+		BUG_ON(!fl->proto);
 		np->cork.hop_limit = hlimit;
 		np->cork.tclass = tclass;
 		mtu = dst_mtu(rt->u.dst.path);
@@ -1032,6 +1035,7 @@ int ip6_append_data(struct sock *sk, int
 	} else {
 		rt = np->cork.rt;
 		fl = &inet->cork.fl;
+		BUG_ON(!fl->proto);
 		if (inet->cork.flags & IPCORK_OPT)
 			opt = np->cork.opt;
 		transhdrlen = 0;
@@ -1285,6 +1289,7 @@ int ip6_push_pending_frames(struct sock 
 
 	if ((skb = __skb_dequeue(&sk->sk_write_queue)) == NULL)
 		goto out;
+	BUG_ON(!proto);
 	tail_skb = &(skb_shinfo(skb)->frag_list);
 
 	/* move skb->data to ip header from ext header */


* Re: PROBLEM: kernel BUG at net/ipv6/ip6_output.c:718
  2006-09-25 12:15             ` Herbert Xu
@ 2006-09-26 11:21               ` cagri coltekin
  2006-09-28  0:38                 ` Herbert Xu
  0 siblings, 1 reply; 15+ messages in thread
From: cagri coltekin @ 2006-09-26 11:21 UTC (permalink / raw)
  To: Herbert Xu; +Cc: netdev, davem, pekkas

Hi,

On Mon, Sep 25, 2006 at 10:15:30PM +1000, Herbert Xu wrote:
> On Fri, Sep 01, 2006 at 06:22:48PM +0200, cagri coltekin wrote:
> >
> > The second patch makes the system hit the bug a couple of seconds
> > after bind starts and loads the zones, without any traffic going
> > on. BTW, the patch applied with some offset differences (3 for the
> > first hunk, -48 for the other two) on a pristine 2.6.17.11
> > source tree.
> 
> Well the good news is that I found a bug with MSG_PROBE that can
> cause exactly what you're seeing.  The bad news is that bind doesn't
> use MSG_PROBE :)
> 
> So please try this patch to narrow the problem down further.

This time I applied the patch to 2.6.18. It applied with some
offset differences. I can stick to a version you suggest if 2.6.18
is not a good choice. Here is the new bug message:

------------------------------------------------------------------------------
[ 1395.890897] ------------[ cut here ]------------
[ 1395.946093] kernel BUG at net/ipv6/ip6_output.c:940!
[ 1396.005441] invalid opcode: 0000 [#1]
[ 1396.049225] SMP 
[ 1396.071419] Modules linked in: ipmi_si ipmi_msghandler ide_cd cdrom
[ 1396.146853] CPU:    2
[ 1396.146854] EIP:    0060:[<c02c6148>]    Not tainted VLI
[ 1396.146855] EFLAGS: 00010246   (2.6.18-ns-pri-debug-p3 #2) 
[ 1396.304174] EIP is at ip6_append_data+0xaf8/0xbd6
[ 1396.360405] eax: f7534d00   ebx: 00000000   ecx: f7534e9c   edx: f68f4480
[ 1396.441552] esi: f7534ee4   edi: f7534ee4   ebp: f7534ef0   esp: f742bc20
[ 1396.522691] ds: 007b   es: 007b   ss: 0068
[ 1396.571655] Process named (pid: 1897, ti=f742a000 task=c2b2c030 task.ti=f742)
[ 1396.659026] Stack: f68f4480 c03c3cb4 f742bf00 c02ef7e2 c02ce658 c02ce658 c03 
[ 1396.759947]        00000002 c02ef7e2 f7534eb4 f7534d70 00000000 00000000 f74 
[ 1396.860803]        f742bce4 c02c55c5 f7534d00 f7534e9c f7534d00 00000286 f74 
[ 1396.961659] Call Trace:
[ 1396.993128]  [<c02ef7e2>] _read_unlock_bh+0x12/0x16
[ 1397.051544]  [<c02ce658>] ip6_route_output+0xeb/0x1e9
[ 1397.112038]  [<c02ce658>] ip6_route_output+0xeb/0x1e9
[ 1397.172535]  [<c02ef7e2>] _read_unlock_bh+0x12/0x16
[ 1397.230952]  [<c02c55c5>] ip6_dst_lookup_tail+0xc6/0xd0
[ 1397.293524]  [<c02d7e29>] udpv6_sendmsg+0x3d4/0x9ac
[ 1397.351936]  [<c028b4a2>] ip_generic_getfrag+0x0/0xaf
[ 1397.412431]  [<c02d6e22>] udpv6_recvmsg+0x20c/0x303
[ 1397.470846]  [<c02ae7b3>] inet_sendmsg+0x4a/0x56
[ 1397.526148]  [<c02682f4>] sock_sendmsg+0xe8/0x101
[ 1397.582494]  [<c01306ca>] autoremove_wake_function+0x0/0x57
[ 1397.649214]  [<c01cadc4>] copy_from_user+0x46/0x7e
[ 1397.706594]  [<c0269e4b>] sys_sendmsg+0x191/0x1f8
[ 1397.762941]  [<c014fe63>] find_extend_vma+0x29/0x7e
[ 1397.821357]  [<c0133bca>] get_futex_key+0x4c/0x126
[ 1397.878740]  [<c0135be8>] do_futex+0x6c/0x10a
[ 1397.930928]  [<c01cadc4>] copy_from_user+0x46/0x7e
[ 1397.988307]  [<c026a2f1>] sys_socketcall+0x236/0x254
[ 1398.047762]  [<c0102cdf>] syscall_call+0x7/0xb
[ 1398.100989] Code: 34 c7 44 24 04 5a 00 00 00 89 4c 24 0c e8 89 02 02 00 b8 a 
[ 1398.333299] EIP: [<c02c6148>] ip6_append_data+0xaf8/0xbd6 SS:ESP 0068:f742bc0
------------------------------------------------------------------------------

Cheers,
-- 
cagri


* Re: PROBLEM: kernel BUG at net/ipv6/ip6_output.c:718
  2006-09-26 11:21               ` cagri coltekin
@ 2006-09-28  0:38                 ` Herbert Xu
  2006-09-28  8:40                   ` cagri coltekin
  0 siblings, 1 reply; 15+ messages in thread
From: Herbert Xu @ 2006-09-28  0:38 UTC (permalink / raw)
  To: cagri coltekin; +Cc: netdev, davem, pekkas

On Tue, Sep 26, 2006 at 01:21:22PM +0200, cagri coltekin wrote:
>
> ------------------------------------------------------------------------------
> [ 1395.890897] ------------[ cut here ]------------
> [ 1395.946093] kernel BUG at net/ipv6/ip6_output.c:940!

Could you go further back in the logs to see if there was a
warning message? Either that or turn the WARN_ON into a BUG.

Thanks,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


* Re: PROBLEM: kernel BUG at net/ipv6/ip6_output.c:718
  2006-09-28  0:38                 ` Herbert Xu
@ 2006-09-28  8:40                   ` cagri coltekin
  2006-10-03  5:49                     ` Herbert Xu
  0 siblings, 1 reply; 15+ messages in thread
From: cagri coltekin @ 2006-09-28  8:40 UTC (permalink / raw)
  To: Herbert Xu; +Cc: netdev, davem, pekkas

On Thu, Sep 28, 2006 at 10:38:29AM +1000, Herbert Xu wrote:
> On Tue, Sep 26, 2006 at 01:21:22PM +0200, cagri coltekin wrote:
> >
> > ------------------------------------------------------------------------------
> > [ 1395.890897] ------------[ cut here ]------------
> > [ 1395.946093] kernel BUG at net/ipv6/ip6_output.c:940!
> 
> Could you go further back in the logs to see if there was a
> warning message? Either that or turn the WARN_ON into a BUG.

No, there is no warning. The BUG is the first kernel trace after boot:

[   34.042841] ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[   44.110469] eth0: no IPv6 routers present
[   80.968012] process `syslogd' is using obsolete setsockopt SO_BSDCOMPAT
[   81.452248] process `named' is using obsolete setsockopt SO_BSDCOMPAT
[  110.559560] process `lwresd' is using obsolete setsockopt SO_BSDCOMPAT
[  140.568831] process `named' is using obsolete setsockopt SO_BSDCOMPAT
[ 1395.890897] ------------[ cut here ]------------
[ 1395.946093] kernel BUG at net/ipv6/ip6_output.c:940!
[ 1396.005441] invalid opcode: 0000 [#1]

Cheers,
-- 
cagri


* Re: PROBLEM: kernel BUG at net/ipv6/ip6_output.c:718
  2006-09-28  8:40                   ` cagri coltekin
@ 2006-10-03  5:49                     ` Herbert Xu
  2006-10-03  6:28                       ` Herbert Xu
  2006-10-03 13:56                       ` James Morris
  0 siblings, 2 replies; 15+ messages in thread
From: Herbert Xu @ 2006-10-03  5:49 UTC (permalink / raw)
  To: cagri coltekin; +Cc: netdev, davem, pekkas

On Thu, Sep 28, 2006 at 10:40:18AM +0200, cagri coltekin wrote:
>
> No, there is no warning. The BUG is the first kernel trace after boot:

OK, I think I've got the right bug this time.

[UDP6]: Fix flowi clobbering

The udpv6_sendmsg function uses a shared buffer to store the
flow without taking any locks.  This leads to races with SMP.
This patch moves the flowi object onto the stack.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

This bug is pretty old so we need the fix for 2.6.18 too.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -546,7 +546,7 @@ static int udpv6_sendmsg(struct kiocb *i
 	struct in6_addr *daddr, *final_p = NULL, final;
 	struct ipv6_txoptions *opt = NULL;
 	struct ip6_flowlabel *flowlabel = NULL;
-	struct flowi *fl = &inet->cork.fl;
+	struct flowi fl;
 	struct dst_entry *dst;
 	int addr_len = msg->msg_namelen;
 	int ulen = len;
@@ -626,19 +626,19 @@ do_udp_sendmsg:
 	}
 	ulen += sizeof(struct udphdr);
 
-	memset(fl, 0, sizeof(*fl));
+	memset(&fl, 0, sizeof(fl));
 
 	if (sin6) {
 		if (sin6->sin6_port == 0)
 			return -EINVAL;
 
-		fl->fl_ip_dport = sin6->sin6_port;
+		fl.fl_ip_dport = sin6->sin6_port;
 		daddr = &sin6->sin6_addr;
 
 		if (np->sndflow) {
-			fl->fl6_flowlabel = sin6->sin6_flowinfo&IPV6_FLOWINFO_MASK;
-			if (fl->fl6_flowlabel&IPV6_FLOWLABEL_MASK) {
-				flowlabel = fl6_sock_lookup(sk, fl->fl6_flowlabel);
+			fl.fl6_flowlabel = sin6->sin6_flowinfo&IPV6_FLOWINFO_MASK;
+			if (fl.fl6_flowlabel&IPV6_FLOWLABEL_MASK) {
+				flowlabel = fl6_sock_lookup(sk, fl.fl6_flowlabel);
 				if (flowlabel == NULL)
 					return -EINVAL;
 				daddr = &flowlabel->dst;
@@ -656,32 +656,32 @@ do_udp_sendmsg:
 		if (addr_len >= sizeof(struct sockaddr_in6) &&
 		    sin6->sin6_scope_id &&
 		    ipv6_addr_type(daddr)&IPV6_ADDR_LINKLOCAL)
-			fl->oif = sin6->sin6_scope_id;
+			fl.oif = sin6->sin6_scope_id;
 	} else {
 		if (sk->sk_state != TCP_ESTABLISHED)
 			return -EDESTADDRREQ;
 
-		fl->fl_ip_dport = inet->dport;
+		fl.fl_ip_dport = inet->dport;
 		daddr = &np->daddr;
-		fl->fl6_flowlabel = np->flow_label;
+		fl.fl6_flowlabel = np->flow_label;
 		connected = 1;
 	}
 
-	if (!fl->oif)
-		fl->oif = sk->sk_bound_dev_if;
+	if (!fl.oif)
+		fl.oif = sk->sk_bound_dev_if;
 
 	if (msg->msg_controllen) {
 		opt = &opt_space;
 		memset(opt, 0, sizeof(struct ipv6_txoptions));
 		opt->tot_len = sizeof(*opt);
 
-		err = datagram_send_ctl(msg, fl, opt, &hlimit, &tclass);
+		err = datagram_send_ctl(msg, &fl, opt, &hlimit, &tclass);
 		if (err < 0) {
 			fl6_sock_release(flowlabel);
 			return err;
 		}
-		if ((fl->fl6_flowlabel&IPV6_FLOWLABEL_MASK) && !flowlabel) {
-			flowlabel = fl6_sock_lookup(sk, fl->fl6_flowlabel);
+		if ((fl.fl6_flowlabel&IPV6_FLOWLABEL_MASK) && !flowlabel) {
+			flowlabel = fl6_sock_lookup(sk, fl.fl6_flowlabel);
 			if (flowlabel == NULL)
 				return -EINVAL;
 		}
@@ -695,39 +695,39 @@ do_udp_sendmsg:
 		opt = fl6_merge_options(&opt_space, flowlabel, opt);
 	opt = ipv6_fixup_options(&opt_space, opt);
 
-	fl->proto = IPPROTO_UDP;
-	ipv6_addr_copy(&fl->fl6_dst, daddr);
-	if (ipv6_addr_any(&fl->fl6_src) && !ipv6_addr_any(&np->saddr))
-		ipv6_addr_copy(&fl->fl6_src, &np->saddr);
-	fl->fl_ip_sport = inet->sport;
+	fl.proto = IPPROTO_UDP;
+	ipv6_addr_copy(&fl.fl6_dst, daddr);
+	if (ipv6_addr_any(&fl.fl6_src) && !ipv6_addr_any(&np->saddr))
+		ipv6_addr_copy(&fl.fl6_src, &np->saddr);
+	fl.fl_ip_sport = inet->sport;
 	
 	/* merge ip6_build_xmit from ip6_output */
 	if (opt && opt->srcrt) {
 		struct rt0_hdr *rt0 = (struct rt0_hdr *) opt->srcrt;
-		ipv6_addr_copy(&final, &fl->fl6_dst);
-		ipv6_addr_copy(&fl->fl6_dst, rt0->addr);
+		ipv6_addr_copy(&final, &fl.fl6_dst);
+		ipv6_addr_copy(&fl.fl6_dst, rt0->addr);
 		final_p = &final;
 		connected = 0;
 	}
 
-	if (!fl->oif && ipv6_addr_is_multicast(&fl->fl6_dst)) {
-		fl->oif = np->mcast_oif;
+	if (!fl.oif && ipv6_addr_is_multicast(&fl.fl6_dst)) {
+		fl.oif = np->mcast_oif;
 		connected = 0;
 	}
 
-	security_sk_classify_flow(sk, fl);
+	security_sk_classify_flow(sk, &fl);
 
-	err = ip6_sk_dst_lookup(sk, &dst, fl);
+	err = ip6_sk_dst_lookup(sk, &dst, &fl);
 	if (err)
 		goto out;
 	if (final_p)
-		ipv6_addr_copy(&fl->fl6_dst, final_p);
+		ipv6_addr_copy(&fl.fl6_dst, final_p);
 
-	if ((err = xfrm_lookup(&dst, fl, sk, 0)) < 0)
+	if ((err = xfrm_lookup(&dst, &fl, sk, 0)) < 0)
 		goto out;
 
 	if (hlimit < 0) {
-		if (ipv6_addr_is_multicast(&fl->fl6_dst))
+		if (ipv6_addr_is_multicast(&fl.fl6_dst))
 			hlimit = np->mcast_hops;
 		else
 			hlimit = np->hop_limit;
@@ -763,7 +763,7 @@ back_from_confirm:
 do_append_data:
 	up->len += ulen;
 	err = ip6_append_data(sk, ip_generic_getfrag, msg->msg_iov, ulen,
-		sizeof(struct udphdr), hlimit, tclass, opt, fl,
+		sizeof(struct udphdr), hlimit, tclass, opt, &fl,
 		(struct rt6_info*)dst,
 		corkreq ? msg->msg_flags|MSG_MORE : msg->msg_flags);
 	if (err)
@@ -774,10 +774,10 @@ do_append_data:
 	if (dst) {
 		if (connected) {
 			ip6_dst_store(sk, dst,
-				      ipv6_addr_equal(&fl->fl6_dst, &np->daddr) ?
+				      ipv6_addr_equal(&fl.fl6_dst, &np->daddr) ?
 				      &np->daddr : NULL,
 #ifdef CONFIG_IPV6_SUBTREES
-				      ipv6_addr_equal(&fl->fl6_src, &np->saddr) ?
+				      ipv6_addr_equal(&fl.fl6_src, &np->saddr) ?
 				      &np->saddr :
 #endif
 				      NULL);
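
The race described in the "[UDP6]: Fix flowi clobbering" message can be replayed without threads. The sketch below is only a model under stated assumptions: the struct and function names are hypothetical, locking is left out, and it replays one particular interleaving of two senders on the same socket rather than exercising real concurrency. It contrasts building the flow inside the socket's shared cork area with building it on the sender's stack and copying it in, as the `inet->cork.fl = *fl` assignment in ip6_append_data does.

```c
#include <string.h>

#define PROTO_UDP 17

struct flow_model {
    int proto;
    int dport;
};

/* Per-socket state; cork_fl is visible to every sender on the socket. */
struct sock_model {
    struct flow_model cork_fl;
};

/* Old pattern: each sender builds its flow directly in sk->cork_fl. */
int proto_seen_with_shared_flow(void)
{
    struct sock_model sk;
    struct flow_model *a, *b;

    memset(&sk, 0, sizeof(sk));

    /* Sender A fills in the shared flow and is about to push frames. */
    a = &sk.cork_fl;
    memset(a, 0, sizeof(*a));
    a->dport = 53;
    a->proto = PROTO_UDP;

    /* Sender B enters sendmsg on the same socket and zeroes the same
     * buffer before taking any lock. */
    b = &sk.cork_fl;
    memset(b, 0, sizeof(*b));

    /* A's push step now reads the flow that B has just cleared. */
    return sk.cork_fl.proto;
}

/* Fixed pattern: each sender builds a private flow on its own stack. */
int proto_seen_with_stack_flow(void)
{
    struct sock_model sk;
    struct flow_model fl_a, fl_b;

    memset(&sk, 0, sizeof(sk));

    memset(&fl_a, 0, sizeof(fl_a));
    fl_a.dport = 53;
    fl_a.proto = PROTO_UDP;
    sk.cork_fl = fl_a;  /* models inet->cork.fl = *fl */

    /* Sender B zeroes only its own stack copy. */
    memset(&fl_b, 0, sizeof(fl_b));

    return sk.cork_fl.proto;
}
```

In this replay the shared-buffer variant returns 0, the same proto = 0 seen in the earlier debug output, while the stack variant returns 17.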


* Re: PROBLEM: kernel BUG at net/ipv6/ip6_output.c:718
  2006-10-03  5:49                     ` Herbert Xu
@ 2006-10-03  6:28                       ` Herbert Xu
  2006-10-03 14:57                         ` cagri coltekin
  2006-10-03 13:56                       ` James Morris
  1 sibling, 1 reply; 15+ messages in thread
From: Herbert Xu @ 2006-10-03  6:28 UTC (permalink / raw)
  To: cagri coltekin; +Cc: netdev, davem, pekkas

On Tue, Oct 03, 2006 at 03:49:35PM +1000, Herbert Xu wrote:
>
> OK, I think I've got the right bug this time.

Here is the patch for the other bug that I found along the way:

[UDP6]: Fix MSG_PROBE crash

UDP tracks corking status through the pending variable.  The
IP layer also tracks it through the socket write queue.  It
is possible for the two to get out of sync when MSG_PROBE is
used.

This patch changes UDP to check the write queue to ensure
that the two stay in sync.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -675,6 +675,8 @@ do_append_data:
 		udp_flush_pending_frames(sk);
 	else if (!corkreq)
 		err = udp_push_pending_frames(sk, up);
+	else if (unlikely(skb_queue_empty(&sk->sk_write_queue)))
+		up->pending = 0;
 	release_sock(sk);
 
 out:
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -770,6 +770,8 @@ do_append_data:
 		udp_v6_flush_pending_frames(sk);
 	else if (!corkreq)
 		err = udp_v6_push_pending_frames(sk, up);
+	else if (unlikely(skb_queue_empty(&sk->sk_write_queue)))
+		up->pending = 0;
 
 	if (dst) {
 		if (connected) {
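
The desync described in the "[UDP6]: Fix MSG_PROBE crash" message can be modelled in a few lines. This is a sketch under stated assumptions, not the kernel code: the flag values, struct and function names are hypothetical, and the write queue is reduced to a counter. It shows how a corked send with the probe flag leaves UDP's pending marker set while nothing is queued, and how the added empty-queue check brings the two views back in line.

```c
/* Hypothetical flag values for this model only. */
#define MODEL_MSG_PROBE 0x1
#define MODEL_MSG_MORE  0x2

struct udp_model {
    int pending;  /* UDP's view: a corked datagram is in progress */
    int queued;   /* IP layer's view: buffers on the write queue */
};

/* Models the append step: with the probe flag nothing is queued. */
int model_append(struct udp_model *sk, int flags)
{
    if (flags & MODEL_MSG_PROBE)
        return 0;
    sk->queued++;
    return 0;
}

/* Models one sendmsg call; with_fix enables the added queue check. */
void model_sendmsg(struct udp_model *sk, int flags, int with_fix)
{
    int corkreq = flags & MODEL_MSG_MORE;
    int err;

    sk->pending = 1;
    err = model_append(sk, flags);
    if (err) {
        sk->queued = 0;   /* flush pending frames */
        sk->pending = 0;
    } else if (!corkreq) {
        sk->queued = 0;   /* push pending frames */
        sk->pending = 0;
    } else if (with_fix && sk->queued == 0) {
        sk->pending = 0;  /* nothing was queued, so nothing is pending */
    }
}
```

Without the check, a probe send that also requests corking ends with pending set and an empty queue, so a later send would take the "already pending" path with no queued data behind it. With the check, pending is cleared in that case, and an ordinary corked send is unaffected.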


* Re: PROBLEM: kernel BUG at net/ipv6/ip6_output.c:718
  2006-10-03  5:49                     ` Herbert Xu
  2006-10-03  6:28                       ` Herbert Xu
@ 2006-10-03 13:56                       ` James Morris
  1 sibling, 0 replies; 15+ messages in thread
From: James Morris @ 2006-10-03 13:56 UTC (permalink / raw)
  To: Herbert Xu; +Cc: cagri coltekin, netdev, davem, pekkas

On Tue, 3 Oct 2006, Herbert Xu wrote:

> On Thu, Sep 28, 2006 at 10:40:18AM +0200, cagri coltekin wrote:
> >
> > No, there is no warning. The BUG is the first kernel trace after boot:
> 
> OK, I think I've got the right bug this time.
> 
> [UDP6]: Fix flowi clobbering
> 
> The udpv6_sendmsg function uses a shared buffer to store the
> flow without taking any locks.  This leads to races with SMP.
> This patch moves the flowi object onto the stack.
> 
> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

Nice catch.

Acked-by: James Morris <jmorris@namei.org>



-- 
James Morris
<jmorris@namei.org>


* Re: PROBLEM: kernel BUG at net/ipv6/ip6_output.c:718
  2006-10-03  6:28                       ` Herbert Xu
@ 2006-10-03 14:57                         ` cagri coltekin
  0 siblings, 0 replies; 15+ messages in thread
From: cagri coltekin @ 2006-10-03 14:57 UTC (permalink / raw)
  To: Herbert Xu; +Cc: netdev, davem, pekkas

On Tue, Oct 03, 2006 at 04:28:20PM +1000, Herbert Xu wrote:
> On Tue, Oct 03, 2006 at 03:49:35PM +1000, Herbert Xu wrote:
> >
> > OK, I think I've got the right bug this time.
> 
> Here is the patch for the other bug that I found along the way:
> 
> [UDP6]: Fix MSG_PROBE crash
> 

This one fixes it. Thanks!

The patch does not apply cleanly to 2.6.18; it needed some manual
tweaking (a version that applies cleanly to vanilla 2.6.18 is
below, in case it is of any use).

Cheers,
-- 
cagri


--- linux-2.6.18/net/ipv6/udp.c	2006-09-20 05:42:06.000000000 +0200
+++ linux-2.6.18-p4/net/ipv6/udp.c	2006-10-03 08:57:31.000000000 +0200
@@ -613,7 +613,7 @@
 	struct in6_addr *daddr, *final_p = NULL, final;
 	struct ipv6_txoptions *opt = NULL;
 	struct ip6_flowlabel *flowlabel = NULL;
-	struct flowi *fl = &inet->cork.fl;
+	struct flowi fl;
 	struct dst_entry *dst;
 	int addr_len = msg->msg_namelen;
 	int ulen = len;
@@ -693,19 +693,19 @@
 	}
 	ulen += sizeof(struct udphdr);
 
-	memset(fl, 0, sizeof(*fl));
+	memset(&fl, 0, sizeof(fl));
 
 	if (sin6) {
 		if (sin6->sin6_port == 0)
 			return -EINVAL;
 
-		fl->fl_ip_dport = sin6->sin6_port;
+		fl.fl_ip_dport = sin6->sin6_port;
 		daddr = &sin6->sin6_addr;
 
 		if (np->sndflow) {
-			fl->fl6_flowlabel = sin6->sin6_flowinfo&IPV6_FLOWINFO_MASK;
-			if (fl->fl6_flowlabel&IPV6_FLOWLABEL_MASK) {
-				flowlabel = fl6_sock_lookup(sk, fl->fl6_flowlabel);
+			fl.fl6_flowlabel = sin6->sin6_flowinfo&IPV6_FLOWINFO_MASK;
+			if (fl.fl6_flowlabel&IPV6_FLOWLABEL_MASK) {
+				flowlabel = fl6_sock_lookup(sk, fl.fl6_flowlabel);
 				if (flowlabel == NULL)
 					return -EINVAL;
 				daddr = &flowlabel->dst;
@@ -723,32 +723,32 @@
 		if (addr_len >= sizeof(struct sockaddr_in6) &&
 		    sin6->sin6_scope_id &&
 		    ipv6_addr_type(daddr)&IPV6_ADDR_LINKLOCAL)
-			fl->oif = sin6->sin6_scope_id;
+			fl.oif = sin6->sin6_scope_id;
 	} else {
 		if (sk->sk_state != TCP_ESTABLISHED)
 			return -EDESTADDRREQ;
 
-		fl->fl_ip_dport = inet->dport;
+		fl.fl_ip_dport = inet->dport;
 		daddr = &np->daddr;
-		fl->fl6_flowlabel = np->flow_label;
+		fl.fl6_flowlabel = np->flow_label;
 		connected = 1;
 	}
 
-	if (!fl->oif)
-		fl->oif = sk->sk_bound_dev_if;
+	if (!fl.oif)
+		fl.oif = sk->sk_bound_dev_if;
 
 	if (msg->msg_controllen) {
 		opt = &opt_space;
 		memset(opt, 0, sizeof(struct ipv6_txoptions));
 		opt->tot_len = sizeof(*opt);
 
-		err = datagram_send_ctl(msg, fl, opt, &hlimit, &tclass);
+		err = datagram_send_ctl(msg, &fl, opt, &hlimit, &tclass);
 		if (err < 0) {
 			fl6_sock_release(flowlabel);
 			return err;
 		}
-		if ((fl->fl6_flowlabel&IPV6_FLOWLABEL_MASK) && !flowlabel) {
-			flowlabel = fl6_sock_lookup(sk, fl->fl6_flowlabel);
+		if ((fl.fl6_flowlabel&IPV6_FLOWLABEL_MASK) && !flowlabel) {
+			flowlabel = fl6_sock_lookup(sk, fl.fl6_flowlabel);
 			if (flowlabel == NULL)
 				return -EINVAL;
 		}
@@ -762,37 +762,37 @@
 		opt = fl6_merge_options(&opt_space, flowlabel, opt);
 	opt = ipv6_fixup_options(&opt_space, opt);
 
-	fl->proto = IPPROTO_UDP;
-	ipv6_addr_copy(&fl->fl6_dst, daddr);
-	if (ipv6_addr_any(&fl->fl6_src) && !ipv6_addr_any(&np->saddr))
-		ipv6_addr_copy(&fl->fl6_src, &np->saddr);
-	fl->fl_ip_sport = inet->sport;
+	fl.proto = IPPROTO_UDP;
+	ipv6_addr_copy(&fl.fl6_dst, daddr);
+	if (ipv6_addr_any(&fl.fl6_src) && !ipv6_addr_any(&np->saddr))
+		ipv6_addr_copy(&fl.fl6_src, &np->saddr);
+	fl.fl_ip_sport = inet->sport;
 	
 	/* merge ip6_build_xmit from ip6_output */
 	if (opt && opt->srcrt) {
 		struct rt0_hdr *rt0 = (struct rt0_hdr *) opt->srcrt;
-		ipv6_addr_copy(&final, &fl->fl6_dst);
-		ipv6_addr_copy(&fl->fl6_dst, rt0->addr);
+		ipv6_addr_copy(&final, &fl.fl6_dst);
+		ipv6_addr_copy(&fl.fl6_dst, rt0->addr);
 		final_p = &final;
 		connected = 0;
 	}
 
-	if (!fl->oif && ipv6_addr_is_multicast(&fl->fl6_dst)) {
-		fl->oif = np->mcast_oif;
+	if (!fl.oif && ipv6_addr_is_multicast(&fl.fl6_dst)) {
+		fl.oif = np->mcast_oif;
 		connected = 0;
 	}
 
-	err = ip6_sk_dst_lookup(sk, &dst, fl);
+	err = ip6_sk_dst_lookup(sk, &dst, &fl);
 	if (err)
 		goto out;
 	if (final_p)
-		ipv6_addr_copy(&fl->fl6_dst, final_p);
+		ipv6_addr_copy(&fl.fl6_dst, final_p);
 
-	if ((err = xfrm_lookup(&dst, fl, sk, 0)) < 0)
+	if ((err = xfrm_lookup(&dst, &fl, sk, 0)) < 0)
 		goto out;
 
 	if (hlimit < 0) {
-		if (ipv6_addr_is_multicast(&fl->fl6_dst))
+		if (ipv6_addr_is_multicast(&fl.fl6_dst))
 			hlimit = np->mcast_hops;
 		else
 			hlimit = np->hop_limit;
@@ -828,18 +828,20 @@
 do_append_data:
 	up->len += ulen;
 	err = ip6_append_data(sk, ip_generic_getfrag, msg->msg_iov, ulen,
-		sizeof(struct udphdr), hlimit, tclass, opt, fl,
+		sizeof(struct udphdr), hlimit, tclass, opt, &fl,
 		(struct rt6_info*)dst,
 		corkreq ? msg->msg_flags|MSG_MORE : msg->msg_flags);
 	if (err)
 		udp_v6_flush_pending_frames(sk);
 	else if (!corkreq)
 		err = udp_v6_push_pending_frames(sk, up);
+	else if (unlikely(skb_queue_empty(&sk->sk_write_queue)))
+		up->pending = 0;
 
 	if (dst) {
 		if (connected) {
 			ip6_dst_store(sk, dst,
-				      ipv6_addr_equal(&fl->fl6_dst, &np->daddr) ?
+				      ipv6_addr_equal(&fl.fl6_dst, &np->daddr) ?
 				      &np->daddr : NULL);
 		} else {
 			dst_release(dst);

^ permalink raw reply	[flat|nested] 15+ messages in thread


Thread overview: 15+ messages
2006-08-27 14:23 PROBLEM: kernel BUG at net/ipv6/ip6_output.c:718 cagri coltekin
2006-08-28  0:16 ` Herbert Xu
2006-08-28  0:49   ` cagri coltekin
2006-08-29  8:28     ` Herbert Xu
2006-08-31 15:12       ` cagri coltekin
2006-09-01  7:05         ` Herbert Xu
2006-09-01 16:22           ` cagri coltekin
2006-09-25 12:15             ` Herbert Xu
2006-09-26 11:21               ` cagri coltekin
2006-09-28  0:38                 ` Herbert Xu
2006-09-28  8:40                   ` cagri coltekin
2006-10-03  5:49                     ` Herbert Xu
2006-10-03  6:28                       ` Herbert Xu
2006-10-03 14:57                         ` cagri coltekin
2006-10-03 13:56                       ` James Morris
