linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [OOPS] network related crash with Linux 2.4
@ 2001-07-09 11:51       ` Moritz Schulte
  2001-07-10  5:19         ` Rainer Clasen
  0 siblings, 1 reply; 17+ messages in thread
From: Moritz Schulte @ 2001-07-09 11:51 UTC (permalink / raw)
  To: linux-kernel

Hi,

I often see my Gateway (Cx486DX4 CPU, 14364K RAM, 124956K Swap)
running Linux 2.4.[56] crashing (should I test previous
versions?). These crashes seem related to networking, because they
happen when trying to access some hosts. Now, the system crashes
(reproducible) when attempting a TCP connection to wwwkeys.eu.pgp.net
from a masqueraded System (running Linux 2.2.19) - it doesn't crash
when trying to connect from the Gateway itself. After such a crash,
the system seems really dead; I've to fetch the Oops via a serial
line.

Here's the decoded Oops from clean Linux 2.4.6:

--------------------------------
Unable to handle kernel paging request at virtual address 0100e018
c01268a2
*pde = 00000000
Oops: 0000
CPU:    0
EIP:    0010:[<c01268a2>]
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010246
eax: 0100e000   ebx: 00000000   ecx: 0100e000   edx: 00000000
esi: c068c470   edi: c0ce7a70   ebp: 00000050   esp: c021dde0
ds: 0018   es: 0018   ss: 0018
Process swapper (pid: 0, stackpage=c021d000)
Stack: c0191ebd 00000100 c068c470 c01924a3 c068c470 c068c470 c008c800 c06550b0
       c008c800 fffffffa c0194f3d c068c470 00000020 c0df9770 c068c470 c06550b0
       c01983dd c068c470 c068c470 00000000 00000004 c01a2d60 c01a2ded c068c470
Call Trace: [<c0191ebd>] [<c01924a3>] [<c0194f3d>] [<c01983dd>] [<c01a2d60>] [<c01a2ded>] [<c019a6c7>]
       [<c01a0400>] [<c01a2d32>] [<c01a2d60>] [<c01a044a>] [<c019a6c7>] [<c01905dc>] [<c01a03a4>] [<c01a0400>]
       [<c019f630>] [<c019f7b9>] [<c019f630>] [<c019a6c7>] [<c019f476>] [<c019f630>] [<c019553d>] [<c011416f>]
       [<c010807d>] [<c0105140>] [<c0106c40>] [<c0105140>] [<c0105163>] [<c01051c8>] [<c0105000>]
Code: 8b 41 18 85 c0 7c 11 ff 49 14 0f 94 c0 84 c0 74 07 89 c8 e8

>>EIP; c01268a2 <__free_pages+2/20>   <=====
Trace; c0191ebd <skb_release_data+3d/70>
Trace; c01924a3 <skb_linearize+d3/140>
Trace; c0194f3d <dev_queue_xmit+6d/240>
Trace; c01983dd <neigh_resolve_output+13d/1b0>
Trace; c01a2d60 <ip_finish_output2+0/d0>
Trace; c01a2ded <ip_finish_output2+8d/d0>
Trace; c019a6c7 <nf_hook_slow+e7/140>
Trace; c01a0400 <ip_forward_finish+0/50>
Trace; c01a2d32 <ip_finish_output+f2/100>
Trace; c01a2d60 <ip_finish_output2+0/d0>
Trace; c01a044a <ip_forward_finish+4a/50>
Trace; c019a6c7 <nf_hook_slow+e7/140>
Trace; c01905dc <sys_socketcall+1c/210>
Trace; c01a03a4 <ip_forward+1a4/200>
Trace; c01a0400 <ip_forward_finish+0/50>
Trace; c019f630 <ip_rcv_finish+0/1c0>
Trace; c019f7b9 <ip_rcv_finish+189/1c0>
Trace; c019f630 <ip_rcv_finish+0/1c0>
Trace; c019a6c7 <nf_hook_slow+e7/140>
Trace; c019f476 <ip_rcv+336/370>
Trace; c019f630 <ip_rcv_finish+0/1c0>
Trace; c019553d <net_rx_action+13d/220>
Trace; c011416f <do_softirq+3f/70>
Trace; c010807d <do_IRQ+9d/b0>
Trace; c0105140 <default_idle+0/30>
Trace; c0106c40 <ret_from_intr+0/7>
Trace; c0105140 <default_idle+0/30>
Trace; c0105163 <default_idle+23/30>
Trace; c01051c8 <cpu_idle+38/50>
Trace; c0105000 <_stext+0/0>
Code;  c01268a2 <__free_pages+2/20>
00000000 <_EIP>:
Code;  c01268a2 <__free_pages+2/20>   <=====
   0:   8b 41 18                  mov    0x18(%ecx),%eax   <=====
Code;  c01268a5 <__free_pages+5/20>
   3:   85 c0                     test   %eax,%eax
Code;  c01268a7 <__free_pages+7/20>
   5:   7c 11                     jl     18 <_EIP+0x18> c01268ba <__free_pages+1a/20>
Code;  c01268a9 <__free_pages+9/20>
   7:   ff 49 14                  decl   0x14(%ecx)
Code;  c01268ac <__free_pages+c/20>
   a:   0f 94 c0                  sete   %al
Code;  c01268af <__free_pages+f/20>
   d:   84 c0                     test   %al,%al
Code;  c01268b1 <__free_pages+11/20>
   f:   74 07                     je     18 <_EIP+0x18> c01268ba <__free_pages+1a/20>
Code;  c01268b3 <__free_pages+13/20>
  11:   89 c8                     mov    %ecx,%eax
Code;  c01268b5 <__free_pages+15/20>
  13:   e8 00 00 00 00            call   18 <_EIP+0x18> c01268ba <__free_pages+1a/20>
n
Kernel panic: Aiee, killing interrupt handler!

1 warning issued.  Results may not be reliable.
--------------------------------

Boot messages:

--------------------------------
Linux version 2.4.6 (root@gryffindor) (gcc version 2.95.2 20000220 (Debian GNU/Linux)) #1 Mon Jul 9 00:58:38 CEST 2001
BIOS-provided physical RAM map:
 BIOS-e820: 0000000000000000 - 00000000000a0000 (usable)
 BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
 BIOS-e820: 0000000000100000 - 0000000001000000 (usable)
 BIOS-e820: 00000000ffff0000 - 0000000100000000 (reserved)
On node 0 totalpages: 4096
zone(0): 4096 pages.
zone(1): 0 pages.
zone(2): 0 pages.
Kernel command line: auto BOOT_IMAGE=linux ro root=302 BOOT_FILE=/boot/vmlinuz-2.4.6 console=ttyS1,9600
Initializing CPU#0
Console: colour VGA+ 80x25
Calibrating delay loop... 39.62 BogoMIPS
Memory: 14180k/16384k available (857k kernel code, 1820k reserved, 274k data, 184k init, 0k highmem)
Checking if this processor honours the WP bit even in supervisor mode... Ok.
Dentry-cache hash table entries: 2048 (order: 2, 16384 bytes)
Inode-cache hash table entries: 1024 (order: 1, 8192 bytes)
Mount-cache hash table entries: 512 (order: 0, 4096 bytes)
Buffer-cache hash table entries: 1024 (order: 0, 4096 bytes)
Page-cache hash table entries: 4096 (order: 2, 16384 bytes)
CPU: Before vendor init, caps: 00000000 00000000 00000000, vendor = 1
CPU: After vendor init, caps: 00000000 00000000 00000000 00000000
CPU:     After generic, caps: 00000000 00000000 00000000 00000000
CPU:             Common caps: 00000000 00000000 00000000 00000000
CPU: Cyrix Cx486DX4 stepping 06
Checking 'hlt' instruction... OK.
POSIX conformance testing by UNIFIX
PCI: PCI BIOS revision 2.10 entry at 0xfc0b0, last bus=0
PCI: Using configuration type 1
PCI: Probing PCI hardware
Disabling direct PCI/PCI transfers.
Linux NET4.0 for Linux 2.4
Based upon Swansea University Computer Society NET3.039
Initializing RT netlink socket
Starting kswapd v1.8
devfs: v0.106 (20010617) Richard Gooch (rgooch@atnf.csiro.au)
devfs: boot_options: 0x2
pty: 256 Unix98 ptys configured
Serial driver version 5.05a (2001-03-20) with MANY_PORTS SHARE_IRQ SERIAL_PCI enabled
ttyS00 at 0x03f8 (irq = 4) is a 16550A
ttyS01 at 0x02f8 (irq = 3) is a 16550A
block: queued sectors max/low 9304kB/3101kB, 64 slots per queue
Uniform Multi-Platform E-IDE driver Revision: 6.31
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
hda: ST3630A, ATA DISK drive
ide0: unexpected interrupt, status=0x51, count=1
ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
hda: 1232784 sectors (631 MB) w/120KiB Cache, CHS=611/32/63
Partition check:
 /dev/ide/host0/bus0/target0/lun0: p1 p2 p3 p4 < p5 p6 >
Floppy drive(s): fd0 is 1.44M
FDC 0 is an 8272A
PPP generic driver version 2.4.1
PPP Deflate Compression module registered
PPP BSD Compression module registered
NET4: Linux TCP/IP 1.0 for NET4.0
IP Protocols: ICMP, UDP, TCP
IP: routing cache hash table of 512 buckets, 4Kbytes
TCP: Hash tables configured (established 1024 bind 1024)
ip_conntrack (128 buckets, 1024 max)
ip_tables: (c)2000 Netfilter core team
NET4: Unix domain sockets 1.0/SMP for Linux NET4.0.
VFS: Mounted root (ext2 filesystem) readonly.
Mounted devfs on /dev
Freeing unused kernel memory: 184k freed
Adding Swap: 124956k swap-space (priority -1)
eth0: 3c5x9 at 0x300, 10baseT port, address  00 a0 24 47 bc 56, IRQ 10.
3c509.c:1.18 12Mar2001 becker@scyld.com
http://www.scyld.com/network/3c509.html
ISDN subsystem Rev: 1.114.6.12/1.94.6.2/1.140.6.6/1.85.6.5/none/1.5.6.3 loaded
HiSax: Linux Driver for passive ISDN cards
HiSax: Version 3.5 (module)
HiSax: Layer1 Revision 2.41.6.3
HiSax: Layer2 Revision 2.25.6.3
HiSax: TeiMgr Revision 2.17.6.2
HiSax: Layer3 Revision 2.17.6.4
HiSax: LinkLayer Revision 2.51.6.4
HiSax: Approval certification valid
HiSax: Approved with ELSA Microlink PCI cards
HiSax: Approved with Eicon Technology Diva 2.01 PCI cards
HiSax: Approved with Sedlbauer Speedfax + cards
HiSax: Total 1 card defined
HiSax: Card 1 Protocol EDSS1 Id=HiSax (0)
HiSax: AVM driver Rev. 2.13.6.1
AVM A1: Byte at 1b40 is 2
AVM A1: Byte at 1b43 is 3
AVM A1: Byte at 1b42 is 2
AVM A1: Byte at 1b40 is 6
HiSax: AVM A1 config irq:4 cfg:0x1B40
HiSax: isac:0x1740/0x1340
HiSax: hscx A:0x740/0x340  hscx B:0xF40/0xB40
AVM A1: ISAC version (0): 2086/2186 V1.1
AVM A1: HSCX version A: V2.1  B: V2.1
AVM A1: IRQ 4 count 0
AVM A1: IRQ 4 count 0
AVM A1: IRQ(4) getting no interrupts during init 1
AVM A1: IRQ 4 count 3
HiSax: DSS1 Rev. 2.30.6.1
HiSax: 2 channels added
HiSax: MAX_WAITING_CALLS added
eth0: Setting Rx mode to 1 addresses.
--------------------------------

lspci -vv:
--------------------------------
00:05.0 Host bridge: Silicon Integrated Systems [SiS] 85C496 (rev 31)
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
	Status: Cap- 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort+ >SERR- <PERR+
	Latency: 0 set

00:0f.0 VGA compatible controller: S3 Inc. 86c764/765 [Trio32/64/64V+] (prog-if 00 [VGA])
	Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
	Status: Cap- 66Mhz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
	Interrupt: pin A routed to IRQ 12
	Region 0: Memory at f0000000 (32-bit, non-prefetchable) [size=8M]
	Expansion ROM at <unassigned> [disabled] [size=64K]
--------------------------------

Something I should test?

	moritz
-- 
Moritz Schulte <moritz@chaosdorf.de> http://www.chaosdorf.de/moritz/
Debian/GNU supporter - http://www.debian.org/ http://www.gnu.org/
GPG fingerprint = 3A14 3923 15BE FD57 FC06  B501 0841 2D7B 6F98 4199

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [OOPS] network related crash with Linux 2.4
  2001-07-09 11:51       ` [OOPS] network related crash with Linux 2.4 Moritz Schulte
@ 2001-07-10  5:19         ` Rainer Clasen
  2001-08-01 20:21           ` Rainer Clasen
  0 siblings, 1 reply; 17+ messages in thread
From: Rainer Clasen @ 2001-07-10  5:19 UTC (permalink / raw)
  To: linux-kernel


Am Mon, Jul 09, 2001 at 01:51:17PM +0200 schrieb Moritz Schulte:
> I often see my Gateway (Cx486DX4 CPU, 14364K RAM, 124956K Swap)
> running Linux 2.4.[56] crashing (should I test previous
> versions?). These crashes seem related to networking, because they
> happen when trying to access some hosts. Now, the system crashes

this seems to be similar to my oopsen - 
see MsgId <20010620201648.B12694@zuto.de>

Rainer

-- 
KeyID=759975BD fingerprint=887A 4BE3 6AB7 EE3C 4AE0  B0E1 0556 E25A 7599 75BD

^ permalink raw reply	[flat|nested] 17+ messages in thread

* kernel panic problem. (smp, iptables?)
@ 2001-07-17  2:35 Andrew Friedley
  2001-07-18  3:58 ` [PATCH] PPPOE can kfree SKB twice (was Re: kernel panic problem. (smp, iptables?)) David S. Miller
  2001-07-22  2:07 ` kernel panic problem. (smp, iptables?) Rusty Russell
  0 siblings, 2 replies; 17+ messages in thread
From: Andrew Friedley @ 2001-07-17  2:35 UTC (permalink / raw)
  To: linux-kernel

I am sorry if this posts in HTML.
Some background information on my system.

Tyan S1863D (Tomcat IIID) motherboard, dual socket 7.
  128mb 60ns EDO ram (8x16mb)
  Dual Pentium 200mhz processors, non-mmx.
  PIIX3 IDE Controller, onboard
    Single 1.2gb drive on each of the two channels.
  Advansys ISA SCSI Controller
    Seagate SCSI HD, 9.1gb (Ultra Wide? not sure)
  NE2000 Compatible ISA 10mbit ethernet nic IRQ 11 IO 0x300
  Kingston KNE100TX PCI 100mbit fast ethernet nic, (uses tulip.o) IRQ 12 IO
0x6400

I am running a custom built linux system using linuxfromscratch, everything
except the kernel was compiled -O3 -march=i586.  I am running kernel 2.4.6,
glibc 2.2.1, gcc 2.95.3, iptables 1.2.2.  The NE2000 nic connects to my DSL
modem (swbell is my ISP) and the KNE100TX connects to my 100mbit switched
lan.  The first IDE drive contains a small ext2 /boot partition, a swap
partition, and a linux raid autodetect partition.  The second IDE drive has
a swap partition and a linux raid autodetect partition exactly the same size
as the partition on the first drive.  These two partitions are configured
for software RAID0, with a reiserfs filesystem and mounted on /.  The scsi
drive has a reiserfs and a swap partition. The reiserfs partition is mounted
on /home and exported to other linux systems on the lan via knfsd, using
nfsv3 support.  There are no other directories or filesystems exported.  The
kernel has SMP, software raid0, iptables, pppoe, nfs, reiserfs, ext2, and
all drivers except floppy.o compiled in.  My .config file I use for 2.4.6
can be found here:  http://saai.dyn.dhs.org/kernel-2.4.6.config.  As far as
software running on the system, I have named, pppd (the patched version for
pppoe support), rpc.portmap, rpc.nfsd, rpc.mountd, apache, and mysql along
with telnet, rsh and proftpd running through inetd.  As far as usage of the
system goes, I have its /home export mounted as /home on my other two linux
systems (all linuxfromscratch, same version software), I play mp3's residing
on the /home mount over nfs in linux, and via Apache using Winamp in
windows.  This system acts as router for my lan, connected to an ADSL modem.

My problem is, that I seem to be having "random" kernel panics.  When I say
random I mean that the system will run 30 minutes before it panics, or as
long as a week before it panics.  System load and type of load does not seem
to matter, the system will panic in the middle of the night when there is
little/no cpu/network activity, or during very heavy multiuser cpu and
network usage -- it does not seem to matter.  Always, the panic is pretty
much the same, occuring in the Process swapper.  I believe that the problem
is related to iptables.  I have a windows 98SE machine on my lan, that
accesses the internet though this linux system acting as a router.  If I try
to connect to a server using Napster/Napigator on the windows box through
the linux router box I am having problems with, the system always panics.
When I used a 2.4.4 kernel with the reiserfs-nfs patch the system would
panic when I tried to check my email using Outlook Express 5.0 on the
windows box on my lan. Outlook no longer causes a panic, but rather
Napster/Napigator.

Below is the panic output, captured by setting the system console to a
serial port and using hyperterm on my windows box via a 115200kbps serial
link:

 Unable to handle kernel paging request at virtual address 0000d7c5
*pde = 00000000
Oops: 0000
CPU:    1
EIP:    0010:[<c01d6fc3>]
EFLAGS: 00010206
eax: c166ae00   ebx: 0000d7c5   ecx: 00000000   edx: 0000d7c5
esi: c7b28b40   edi: c166aba0   ebp: 00000060   esp: c1253d8c
ds: 0018   es: 0018   ss: 0018
Process swapper (pid: 0, stackpage=c1253000)
Stack: fffffe00 c01d706b c7b28b40 fffffe00 c7b28b40 c01d7653 c7b28b40
c12ad800
       c7b28b40 0000000e c7b28b40 ffffffe6 c01da0f7 c7b28b40 00000020
00000004
       c7058f40 0000000e c01ddfed c7b28b40 00000001 00000000 c7b28b40
c01e7ea0
Call Trace: [<c01d706b>] [<c01d7653>] [<c01da0f7>] [<c01ddfed>] [<c01e7ea0>]
[<c01e7f60>] [<c01df228>]
       [<c01e5450>] [<c01e7e83>] [<c01e7ea0>] [<c01e54aa>] [<c01df228>]
[<c01e05dc>] [<c01e53e4>] [<c01e5450>]
       [<c01e4488>] [<c01e462a>] [<c01e4488>] [<c01df228>] [<c01e42c7>]
[<c01e4488>] [<c01da8ce>] [<c0116c8a>]
       [<c0108680>] [<c0105180>] [<c0106d40>] [<c0105180>] [<c01051ac>]
[<c0105212>] [<c0108680>] [<c0220d4a>]
       [<c019d05f>]

Code: 8b 1b 8b 42 70 83 f8 01 74 0b f0 ff 4a 70 0f 94 c0 84 c0 74
Kernel panic: Aiee, killing interrupt handler!
In interrupt handler - not syncing


Like I stated before, it is always the Process swapper.  This panic was
caused by using Napster/Napigator, as opposed to a seemingly random panic.
I took the above panic output and pasted it into ksymoops:

>>EIP; c01d6fc3 <skb_drop_fraglist+17/3c>   <=====
Trace; c01d706b <skb_release_data+5f/74>
Trace; c01d7653 <skb_linearize+cf/130>
Trace; c01da0f7 <dev_queue_xmit+27/2e8>
Trace; c01ddfed <neigh_resolve_output+1cd/240>
Trace; c01e7ea0 <ip_finish_output2+0/f8>
Trace; c01e7f60 <ip_finish_output2+c0/f8>
Trace; c01df228 <nf_hook_slow+110/194>
Trace; c01e5450 <ip_forward_finish+0/60>
Trace; c01e7e83 <ip_finish_output+127/130>
Trace; c01e7ea0 <ip_finish_output2+0/f8>
Trace; c01e54aa <ip_forward_finish+5a/60>
Trace; c01df228 <nf_hook_slow+110/194>
Trace; c01e05dc <rt_check_expire__thr+154/184>
Trace; c01e53e4 <ip_forward+1b4/220>
Trace; c01e5450 <ip_forward_finish+0/60>
Trace; c01e4488 <ip_rcv_finish+0/1e8>
Trace; c01e462a <ip_rcv_finish+1a2/1e8>
Trace; c01e4488 <ip_rcv_finish+0/1e8>
Trace; c01df228 <nf_hook_slow+110/194>
Trace; c01e42c7 <ip_rcv+367/3b0>
Trace; c01e4488 <ip_rcv_finish+0/1e8>
Trace; c01da8ce <net_rx_action+16a/274>
Trace; c0116c8a <do_softirq+5a/88>
Trace; c0108680 <do_IRQ+e0/f0>
Trace; c0105180 <default_idle+0/34>
Trace; c0106d40 <ret_from_intr+0/7>
Trace; c0105180 <default_idle+0/34>
Trace; c01051ac <default_idle+2c/34>
Trace; c0105212 <cpu_idle+3e/54>
Trace; c0108680 <do_IRQ+e0/f0>
Trace; c0220d4a <vsprintf+31e/34c>
Trace; c019d05f <serial_console_write+2b/194>
Code;  c01d6fc3 <skb_drop_fraglist+17/3c>
00000000 <_EIP>:
Code;  c01d6fc3 <skb_drop_fraglist+17/3c>   <=====
   0:   8b 1b                     mov    (%ebx),%ebx   <=====
Code;  c01d6fc5 <skb_drop_fraglist+19/3c>
   2:   8b 42 70                  mov    0x70(%edx),%eax
Code;  c01d6fc8 <skb_drop_fraglist+1c/3c>
   5:   83 f8 01                  cmp    $0x1,%eax
Code;  c01d6fcb <skb_drop_fraglist+1f/3c>
   8:   74 0b                     je     15 <_EIP+0x15> c01d6fd8
<skb_drop_fraglist+2c/3c>
Code;  c01d6fcd <skb_drop_fraglist+21/3c>
   a:   f0 ff 4a 70               lock decl 0x70(%edx)
Code;  c01d6fd1 <skb_drop_fraglist+25/3c>
   e:   0f 94 c0                  sete   %al
Code;  c01d6fd4 <skb_drop_fraglist+28/3c>
  11:   84 c0                     test   %al,%al
Code;  c01d6fd6 <skb_drop_fraglist+2a/3c>
  13:   74 00                     je     15 <_EIP+0x15> c01d6fd8
<skb_drop_fraglist+2c/3c>

Kernel panic: Aiee, killing interrupt handler!


Which, by my limited understanding, seems to indicate iptables/softirq or
smp.  My obvious question is, is this a bug? Is there a fix for it?  I have
had this problem since kernel 2.4.3, when I first set this machine up as a
router.  Below are some links with more information that might be useful,
and if theres anything else I can provide please ask.

dmesg output from bootup: http://saai.dyn.dhs.org/dmesg.txt
ifconfig & route output and iptables config:
http://saai.dyn.dhs.org/network.txt
kernel 2.4.6 .config file: http://saai.dyn.dhs.org/kernel-2.4.6.config
panic and ksymoops output: http://saai.dyn.dhs.org/panic.txt
packages installed on my linux systems: http://saai.dyn.dhs.org/packages/
  (In case you would like to know version numbers of any other software I
have)


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH] PPPOE can kfree SKB twice (was Re: kernel panic problem. (smp, iptables?))
  2001-07-17  2:35 kernel panic problem. (smp, iptables?) Andrew Friedley
@ 2001-07-18  3:58 ` David S. Miller
  2001-07-18 14:23   ` Michal Ostrowski
                     ` (5 more replies)
  2001-07-22  2:07 ` kernel panic problem. (smp, iptables?) Rusty Russell
  1 sibling, 6 replies; 17+ messages in thread
From: David S. Miller @ 2001-07-18  3:58 UTC (permalink / raw)
  To: Andrew Friedley
  Cc: linux-kernel, linux-net, netdev, mostrows, prefect_, moritz,
	egger, srwalter, kuznet, rusty


Andrew Friedley writes:
 > >>EIP; c01d6fc3 <skb_drop_fraglist+17/3c>   <=====
 > Trace; c01d706b <skb_release_data+5f/74>
 > Trace; c01d7653 <skb_linearize+cf/130>
 > Trace; c01da0f7 <dev_queue_xmit+27/2e8>
 > Trace; c01ddfed <neigh_resolve_output+1cd/240>

This report has been plagueing us for a month or two now, from
different people.  But we hadn't been able to track it down.

Initially we believed it might be some obscure bug in netfilter
which got triggered more easily when the zerocopy stuff went in.
But all of our code audits turned up nothing.

Then I began to notice that "pppoe" was showing up in all the reports
where the user actually bothered to mention what net devices the
machine was using when it crashed.

I spent the past few days auditing the driver and I think I figured
out what was causing the OOPS (along with some other bugs I found
along the way).

I have no way to test out these changes, so can folks do that for me?
If I didn't screw something else up, then I'm pretty sure the OOPS
will go away.  Let me know if something goes wrong due to these
changes.

Thanks.

--- drivers/net/pppoe.c.~1~	Mon Jul  9 17:52:06 2001
+++ drivers/net/pppoe.c	Tue Jul 17 20:46:24 2001
@@ -5,7 +5,7 @@
  * PPPoE --- PPP over Ethernet (RFC 2516)
  *
  *
- * Version:    0.6.6
+ * Version:    0.6.7
  *
  * 030700 :     Fixed connect logic to allow for disconnect.
  * 270700 :	Fixed potential SMP problems; we must protect against
@@ -20,10 +20,16 @@
  * 111100 :	Fix recvmsg.
  * 050101 :	Fix PADT procesing.
  * 140501 :	Use pppoe_rcv_core to handle all backlog. (Alexey)
+ * 170701 :	Do not lock_sock with rwlock held. (DaveM)
+ *		Ignore discovery frames if user has socket
+ *		locked. (DaveM)
+ *		Ignore return value of dev_queue_xmit in __pppoe_xmit
+ *		or else we may kfree an SKB twice. (DaveM)
  *
  * Author:	Michal Ostrowski <mostrows@styx.uwaterloo.ca>
  * Contributors:
  * 		Arnaldo Carvalho de Melo <acme@xconectiva.com.br>
+ *		David S. Miller (davem@redhat.com)
  *
  * License:
  *		This program is free software; you can redistribute it and/or
@@ -100,10 +106,11 @@
 
 static int hash_item(unsigned long sid, unsigned char *addr)
 {
-	char hash=0;
-	int i,j;
-	for (i = 0; i < ETH_ALEN ; ++i){
-		for (j = 0; j < 8/PPPOE_HASH_BITS ; ++j){
+	char hash = 0;
+	int i, j;
+
+	for (i = 0; i < ETH_ALEN ; ++i) {
+		for (j = 0; j < 8/PPPOE_HASH_BITS ; ++j) {
 			hash ^= addr[i] >> ( j * PPPOE_HASH_BITS );
 		}
 	}
@@ -188,7 +195,7 @@
 
 	read_lock_bh(&pppoe_hash_lock);
 	po = __get_item(sid, addr);
-	if(po)
+	if (po)
 		sock_hold(po->sk);
 	read_unlock_bh(&pppoe_hash_lock);
 
@@ -233,62 +240,83 @@
  *  Certain device events require that sockets be unconnected.
  *
  **************************************************************************/
+
+static void pppoe_flush_dev(struct net_device *dev)
+{
+	int hash;
+
+	if (dev == NULL)
+		BUG();
+
+	read_lock_bh(&pppoe_hash_lock);
+	for (hash = 0; hash < PPPOE_HASH_SIZE; hash++) {
+		struct pppox_opt *po = item_hash_table[hash];
+
+		while (po != NULL) {
+			if (po->pppoe_dev == dev) {
+				struct sock *sk = po->sk;
+
+				sock_hold(sk);
+				po->pppoe_dev = NULL;
+
+				/* We hold a reference to SK, now drop the
+				 * hash table lock so that we may attempt
+				 * to lock the socket (which can sleep).
+				 */
+				read_unlock_bh(&pppoe_hash_lock);
+
+				lock_sock(sk);
+
+				if (sk->state & (PPPOX_CONNECTED | PPPOX_BOUND)) {
+					pppox_unbind_sock(sk);
+					dev_put(dev);
+					sk->state = PPPOX_DEAD;
+					sk->state_change(sk);
+				}
+
+				release_sock(sk);
+
+				sock_put(sk);
+
+				read_lock_bh(&pppoe_hash_lock);
+
+				/* Now restart from the beginning of this
+				 * hash chain.  We always NULL out pppoe_dev
+				 * so we are guarenteed to make forward
+				 * progress.
+				 */
+				po = item_hash_table[hash];
+				continue;
+			}
+			po = po->next;
+		}
+	}
+	read_unlock_bh(&pppoe_hash_lock);
+}
+
 static int pppoe_device_event(struct notifier_block *this,
 			      unsigned long event, void *ptr)
 {
-	int error = NOTIFY_DONE;
 	struct net_device *dev = (struct net_device *) ptr;
-	struct pppox_opt *po = NULL;
-	int hash = 0;
 
 	/* Only look at sockets that are using this specific device. */
 	switch (event) {
 	case NETDEV_CHANGEMTU:
-	    /* A change in mtu is a bad thing, requiring
-	     * LCP re-negotiation.
-	     */
+		/* A change in mtu is a bad thing, requiring
+		 * LCP re-negotiation.
+		 */
+
 	case NETDEV_GOING_DOWN:
 	case NETDEV_DOWN:
-
 		/* Find every socket on this device and kill it. */
-		read_lock_bh(&pppoe_hash_lock);
-
-		while (!po && hash < PPPOE_HASH_SIZE){
-			po = item_hash_table[hash];
-			++hash;
-		}
-
-		while (po && hash < PPPOE_HASH_SIZE){
-			if(po->pppoe_dev == dev){
-				lock_sock(po->sk);
-				if (po->sk->state & (PPPOX_CONNECTED|PPPOX_BOUND)){
-					pppox_unbind_sock(po->sk);
-
-					dev_put(po->pppoe_dev);
-					po->pppoe_dev = NULL;
-
-					po->sk->state = PPPOX_DEAD;
-					po->sk->state_change(po->sk);
-				}
- 				release_sock(po->sk);
-			}
-			if (po->next) {
-				po = po->next;
-			} else {
-				po = NULL;
-				while (!po && hash < PPPOE_HASH_SIZE){
-					po = item_hash_table[hash];
-					++hash;
-				}
-			}
-		}
-		read_unlock_bh(&pppoe_hash_lock);
+		pppoe_flush_dev(dev);
 		break;
+
 	default:
 		break;
 	};
 
-	return error;
+	return NOTIFY_DONE;
 }
 
 
@@ -304,40 +332,39 @@
  * Do the real work of receiving a PPPoE Session frame.
  *
  ***********************************************************************/
-int pppoe_rcv_core(struct sock *sk, struct sk_buff *skb){
-	struct pppox_opt  *po=sk->protinfo.pppox;
+int pppoe_rcv_core(struct sock *sk, struct sk_buff *skb)
+{
+	struct pppox_opt *po = sk->protinfo.pppox;
 	struct pppox_opt *relay_po = NULL;
 
 	if (sk->state & PPPOX_BOUND) {
 		skb_pull(skb, sizeof(struct pppoe_hdr));
-
 		ppp_input(&po->chan, skb);
-	} else if( sk->state & PPPOX_RELAY ){
-
-		relay_po = get_item_by_addr( &po->pppoe_relay );
+	} else if (sk->state & PPPOX_RELAY) {
+		relay_po = get_item_by_addr(&po->pppoe_relay);
 
-		if( relay_po == NULL  ||
-		    !( relay_po->sk->state & PPPOX_CONNECTED ) ){
-			goto abort;
-		}
+		if (relay_po == NULL)
+			goto abort_kfree;
+			
+		if ((relay_po->sk->state & PPPOX_CONNECTED) == 0)
+			goto abort_put;
 
 		skb_pull(skb, sizeof(struct pppoe_hdr));
-		if( !__pppoe_xmit( relay_po->sk , skb) ){
-			goto abort;
-		}
+		if (!__pppoe_xmit( relay_po->sk , skb))
+			goto abort_put;
 	} else {
 		sock_queue_rcv_skb(sk, skb);
 	}
-	return 1;
-abort:
-	if(relay_po)
-		sock_put(relay_po->sk);
-	return 0;
-
-}
 
+	return NET_RX_SUCCESS;
 
+abort_put:
+	sock_put(relay_po->sk);
 
+abort_kfree:
+	kfree_skb(skb);
+	return NET_RX_DROP;
+}
 
 /************************************************************************
  *
@@ -356,24 +383,25 @@
 
 	po = get_item((unsigned long) ph->sid, skb->mac.ethernet->h_source);
 
-	if(!po){
+	if (!po) {
 		kfree_skb(skb);
-		return 0;
+		return NET_RX_DROP;
 	}
 
 	sk = po->sk;
         bh_lock_sock(sk);
 
 	/* Socket state is unknown, must put skb into backlog. */
-	if( sk->lock.users != 0 ){
-		sk_add_backlog( sk, skb);
-		ret = 1;
-	}else{
+	if (sk->lock.users != 0) {
+		sk_add_backlog(sk, skb);
+		ret = NET_RX_SUCCESS;
+	} else {
 		ret = pppoe_rcv_core(sk, skb);
 	}
 
 	bh_unlock_sock(sk);
 	sock_put(sk);
+
 	return ret;
 }
 
@@ -390,24 +418,31 @@
 {
 	struct pppoe_hdr *ph = (struct pppoe_hdr *) skb->nh.raw;
 	struct pppox_opt *po;
-	struct sock *sk = NULL;
 
 	if (ph->code != PADT_CODE)
 		goto abort;
 
 	po = get_item((unsigned long) ph->sid, skb->mac.ethernet->h_source);
+	if (po) {
+		struct sock *sk = po->sk;
 
-	if (!po)
-		goto abort;
+		bh_lock_sock(sk);
 
-	sk = po->sk;
+		/* If the user has locked the socket, just ignore
+		 * the packet.  With the way two rcv protocols hook into
+		 * one socket family type, we cannot (easily) distinguish
+		 * what kind of SKB it is during backlog rcv.
+		 */
+		if (sk->lock.users == 0)
+			pppox_unbind_sock(sk);
 
-	pppox_unbind_sock(sk);
+		bh_unlock_sock(sk);
+		sock_put(sk);
+	}
 
-	sock_put(sk);
- abort:
+abort:
 	kfree_skb(skb);
-	return 0;
+	return NET_RX_SUCCESS; /* Lies... :-) */
 }
 
 struct packet_type pppoes_ptype = {
@@ -524,7 +559,7 @@
 	struct sock *sk = sock->sk;
 	struct net_device *dev = NULL;
 	struct sockaddr_pppox *sp = (struct sockaddr_pppox *) uservaddr;
-	struct pppox_opt  *po=sk->protinfo.pppox;
+	struct pppox_opt *po = sk->protinfo.pppox;
 	int error;
 
 	lock_sock(sk);
@@ -569,8 +604,9 @@
 
 		po->pppoe_dev = dev;
 
-		if( ! (dev->flags & IFF_UP) )
+		if (!(dev->flags & IFF_UP))
 			goto err_put;
+
 		memcpy(&po->pppoe_pa,
 		       &sp->sa_addr.pppoe,
 		       sizeof(struct pppoe_addr));
@@ -687,7 +723,7 @@
 		/* PPPoE address from the user specifies an outbound
 		   PPPoE address to which frames are forwarded to */
 		err = -EFAULT;
-		if( copy_from_user(&po->pppoe_relay,
+		if (copy_from_user(&po->pppoe_relay,
 				   (void*)arg,
 				   sizeof(struct sockaddr_pppox)))
 			break;
@@ -752,7 +788,7 @@
 	dev = sk->protinfo.pppox->pppoe_dev;
 
 	error = -EMSGSIZE;
- 	if(total_len > dev->mtu+dev->hard_header_len)
+ 	if (total_len > (dev->mtu + dev->hard_header_len))
 		goto end;
 
 
@@ -775,7 +811,7 @@
 	ph = (struct pppoe_hdr *) skb_put(skb, total_len + sizeof(struct pppoe_hdr));
 	start = (char *) &ph->tag[0];
 
-	error = memcpy_fromiovec( start, m->msg_iov, total_len);
+	error = memcpy_fromiovec(start, m->msg_iov, total_len);
 
 	if (error < 0) {
 		kfree_skb(skb);
@@ -793,7 +829,7 @@
 
 	dev_queue_xmit(skb);
 
- end:
+end:
 	release_sock(sk);
 	return error;
 }
@@ -811,9 +847,8 @@
 	int headroom = skb_headroom(skb);
 	int data_len = skb->len;
 
-	if (sk->dead  || !(sk->state & PPPOX_CONNECTED)) {
+	if (sk->dead  || !(sk->state & PPPOX_CONNECTED))
 		goto abort;
-	}
 
 	hdr.ver	= 1;
 	hdr.type = 1;
@@ -821,9 +856,8 @@
 	hdr.sid	= sk->num;
 	hdr.length = htons(skb->len);
 
-	if (!dev) {
+	if (!dev)
 		goto abort;
-	}
 
 	/* Copy the skb if there is no space for the header. */
 	if (headroom < (sizeof(struct pppoe_hdr) + dev->hard_header_len)) {
@@ -844,6 +878,12 @@
 		skb = skb2;
 	}
 
+	/* We may not return error beyond this point.  Potentially this
+	 * is a new SKB and in such a case we've already freed the
+	 * original SKB.  So if we were to return error, our caller would
+	 * double free that original SKB. -DaveM
+	 */
+
 	ph = (struct pppoe_hdr *) skb_push(skb, sizeof(struct pppoe_hdr));
 	memcpy(ph, &hdr, sizeof(struct pppoe_hdr));
 	skb->protocol = __constant_htons(ETH_P_PPP_SES);
@@ -856,11 +896,10 @@
 			 sk->protinfo.pppox->pppoe_pa.remote,
 			 NULL, data_len);
 
-	if (dev_queue_xmit(skb) < 0)
-		goto abort;
-
+	dev_queue_xmit(skb);
 	return 1;
- abort:
+
+abort:
 	return 0;
 }
 


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] PPPOE can kfree SKB twice (was Re: kernel panic problem. (smp, iptables?))
  2001-07-18  3:58 ` [PATCH] PPPOE can kfree SKB twice (was Re: kernel panic problem. (smp, iptables?)) David S. Miller
@ 2001-07-18 14:23   ` Michal Ostrowski
  2001-07-19 12:30   ` Michal Ostrowski
                     ` (4 subsequent siblings)
  5 siblings, 0 replies; 17+ messages in thread
From: Michal Ostrowski @ 2001-07-18 14:23 UTC (permalink / raw)
  To: David S. Miller
  Cc: Andrew Friedley, linux-kernel, linux-net, netdev, prefect_,
	moritz, egger, srwalter, kuznet, rusty

"David S. Miller" <davem@redhat.com> writes:
> 
> This report has been plagueing us for a month or two now, from
> different people.  But we hadn't been able to track it down.
> 
> Initially we believed it might be some obscure bug in netfilter
> which got triggered more easily when the zerocopy stuff went in.
> But all of our code audits turned up nothing.

This brings up an interesting point.

One of the initial issues in developing PPPoE support was how to
ensure that the layers passing packets to PPPoE allocated the correct
amount of header space in the skb (so as to avoid a copy of the skb to
create space for PPPoE headers).

This issue was resolved by having the PPP layer respect the header
lengths specified by the underlying transport layers (PPPoE) when
defining dev->hard_header_len.  However, just to be on the safe side,
this code that copied the packet was left in place.

My guess is that before zerocopy netfilter, netfilter made an skb that
conformed to the header requirements of PPPoE.  Once netfilter stopped
making copies PPPoE was passed skb's without space for PPPoE headers
and thus invoked the code path you've just fixed.

Is it possible for netfilter to pass to the PPP device a packet that
respect's the PPP device's hard_header_len? (This would avoid the copy
in PPPoE.)  More generally, I'm concerned (without having seen the
code) that we may have problems when passing skb's between devices via
zerocopy-netfilter when those devices have varying hard_header_len
requirements.

> 
> I have no way to test out these changes, so can folks do that for me?
> If I didn't screw something else up, then I'm pretty sure the OOPS
> will go away.  Let me know if something goes wrong due to these
> changes.
> 

As far as I can tell it all seems fine.  (I'd like to hear some
success stories from some of the people using netfilter heavily with
this though who have experienced the oops'es.)

Michal Ostrowski
mostrows@speakeasy.net


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] PPPOE can kfree SKB twice (was Re: kernel panic problem. (smp, iptables?))
  2001-07-18  3:58 ` [PATCH] PPPOE can kfree SKB twice (was Re: kernel panic problem. (smp, iptables?)) David S. Miller
  2001-07-18 14:23   ` Michal Ostrowski
@ 2001-07-19 12:30   ` Michal Ostrowski
  2001-07-19 17:27     ` kuznet
  2001-07-19 18:57   ` Michal Ostrowski
                     ` (3 subsequent siblings)
  5 siblings, 1 reply; 17+ messages in thread
From: Michal Ostrowski @ 2001-07-19 12:30 UTC (permalink / raw)
  To: David S. Miller
  Cc: Andrew Friedley, linux-kernel, linux-net, netdev, prefect_,
	moritz, egger, srwalter, kuznet, rusty


After sleeping on it a bit I've come to the realization that are
still some issues outstanding.

Essentially, if __pppoe_xmit has been forced to make a copy of the skb
(because netfilter did not leave enough room for PPPoE headers), and
dev_queue_xmit fails, the copy of the skb is not freed and we have a
memory leak.

The generic PPP layer assumes that if a PPP-channel's xmit routine
fails then the skb is still available to it (for retransmission) and
otherwise the skb is gone -- handled by the channel.  These are the
semantics that must be implemented by __pppoe_xmit.

The patch below (which requires David Miller's patch from 17/07/01)
implements these semantics.

Michal Ostrowski
mostrows@speakeasy.net

--- drivers/net/pppoe.c~	Wed Jul 18 10:24:25 2001
+++ drivers/net/pppoe.c	Thu Jul 19 08:28:56 2001
@@ -5,7 +5,7 @@
  * PPPoE --- PPP over Ethernet (RFC 2516)
  *
  *
- * Version:    0.6.7
+ * Version:    0.6.8
  *
  * 030700 :     Fixed connect logic to allow for disconnect.
  * 270700 :	Fixed potential SMP problems; we must protect against
@@ -25,8 +25,12 @@
  *		locked. (DaveM)
  *		Ignore return value of dev_queue_xmit in __pppoe_xmit
  *		or else we may kfree an SKB twice. (DaveM)
+ * 190701 :	When doing copies of skb's in __pppoe_xmit, always delete
+ *		the original skb that was passed in on success, never on
+ *		failure.  Delete the copy of the skb on failure to avoid
+ *		a memory leak.
  *
- * Author:	Michal Ostrowski <mostrows@styx.uwaterloo.ca>
+ * Author:	Michal Ostrowski <mostrows@speakeasy.net>
  * Contributors:
  * 		Arnaldo Carvalho de Melo <acme@xconectiva.com.br>
  *		David S. Miller (davem@redhat.com)
@@ -849,6 +853,7 @@
 	struct pppoe_hdr *ph;
 	int headroom = skb_headroom(skb);
 	int data_len = skb->len;
+	struct sk_buff *old_skb = NULL;
 
 	if (sk->dead  || !(sk->state & PPPOX_CONNECTED))
 		goto abort;
@@ -876,17 +881,12 @@
 		skb_reserve(skb2, dev->hard_header_len + sizeof(struct pppoe_hdr));
 		memcpy(skb_put(skb2, skb->len), skb->data, skb->len);
 
-		skb_unlink(skb);
-		kfree_skb(skb);
+		/* Keep a reference to the original */
+		old_skb = skb;
+
 		skb = skb2;
 	}
 
-	/* We may not return error beyond this point.  Potentially this
-	 * is a new SKB and in such a case we've already freed the
-	 * original SKB.  So if we were to return error, our caller would
-	 * double free that original SKB. -DaveM
-	 */
-
 	ph = (struct pppoe_hdr *) skb_push(skb, sizeof(struct pppoe_hdr));
 	memcpy(ph, &hdr, sizeof(struct pppoe_hdr));
 	skb->protocol = __constant_htons(ETH_P_PPP_SES);
@@ -899,7 +899,34 @@
 			 sk->protinfo.pppox->pppoe_pa.remote,
 			 NULL, data_len);
 
-	dev_queue_xmit(skb);
+	/* The skb we are to transmit may be a copy (see above).  If
+	 * this fails, then the caller is responsible for the original
+	 * skb, otherwise we must free it.  Also if this fails we must
+	 * free the copy that we made.
+	 */
+
+	if (dev_queue_xmit(skb)<0) {
+		if (old_skb) {
+			/* The skb we tried to send was a copy.  We
+			 * have to free it (the copy) and let the
+			 * caller deal with the original one.
+			 */
+			skb_unlink(skb);
+			kfree_skb(skb);
+		}
+		goto abort;
+	}
+
+	/* Free original skb now that we know dev_queue_xmit
+	 * succeeded.  We free only "old_skb" because dev_queue_xmit
+	 * actually sent a copy, not the original.
+	 */
+
+	if (old_skb) {
+		skb_unlink(old_skb);
+		kfree_skb(old_skb);
+	}
+
 	return 1;
 
 abort:
@@ -1049,7 +1076,6 @@
  	int err = register_pppox_proto(PX_PROTO_OE, &pppoe_proto);
 
 	if (err == 0) {
-		printk(KERN_INFO "Registered PPPoE v0.6.5\n");
 
 		dev_add_pack(&pppoes_ptype);
 		dev_add_pack(&pppoed_ptype);
--- drivers/net/pppox.c~	Tue Feb 13 16:15:05 2001
+++ drivers/net/pppox.c	Wed Jul 18 10:27:25 2001
@@ -148,10 +148,6 @@
 
 	err = sock_register(&pppox_proto_family);
 
-	if (err == 0) {
-		printk(KERN_INFO "Registered PPPoX v0.5\n");
-	}
-
 	return err;
 }
 


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] PPPOE can kfree SKB twice (was Re: kernel panic problem. (smp, iptables?))
  2001-07-19 12:30   ` Michal Ostrowski
@ 2001-07-19 17:27     ` kuznet
  2001-07-19 18:00       ` Michal Ostrowski
  0 siblings, 1 reply; 17+ messages in thread
From: kuznet @ 2001-07-19 17:27 UTC (permalink / raw)
  To: mostrows
  Cc: davem, saai, linux-kernel, linux-net, netdev, prefect_, moritz,
	egger, srwalter, rusty

Hello!

SOme short comment on the patch:


> -	dev_queue_xmit(skb);
> +	/* The skb we are to transmit may be a copy (see above).  If
> +	 * this fails, then the caller is responsible for the original
> +	 * skb, otherwise we must free it.  Also if this fails we must
> +	 * free the copy that we made.
> +	 */
> +
> +	if (dev_queue_xmit(skb)<0) {

dev_queue_xmit _frees_ frame, not depending on return value.
Return value is not a criterium to assume anything.



> +		if (old_skb) {
> +			/* The skb we tried to send was a copy.  We
> +			 * have to free it (the copy) and let the
> +			 * caller deal with the original one.
> +			 */
> +			skb_unlink(skb);

So, do you pass to dev_queue_xmit some skb, which is on some list?
Not a good idea. Please, clone it and submit clone, if you need to hold
original in some list.

Alexey

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] PPPOE can kfree SKB twice (was Re: kernel panic problem. (smp, iptables?))
  2001-07-19 17:27     ` kuznet
@ 2001-07-19 18:00       ` Michal Ostrowski
  2001-07-19 18:17         ` kuznet
  0 siblings, 1 reply; 17+ messages in thread
From: Michal Ostrowski @ 2001-07-19 18:00 UTC (permalink / raw)
  To: kuznet; +Cc: davem, linux-kernel, linux-net, netdev

kuznet@ms2.inr.ac.ru writes:

> Hello!
> 
> SOme short comment on the patch:
> 
> 
> > -	dev_queue_xmit(skb);
> > +	/* The skb we are to transmit may be a copy (see above).  If
> > +	 * this fails, then the caller is responsible for the original
> > +	 * skb, otherwise we must free it.  Also if this fails we must
> > +	 * free the copy that we made.
> > +	 */
> > +
> > +	if (dev_queue_xmit(skb)<0) {
> 
> dev_queue_xmit _frees_ frame, not depending on return value.
> Return value is not a criterium to assume anything.
> 

My mistake.  It seemed perfectly reasonable at 6:00 am.  :-)

However, could we not have dev_queue_xmit behave as such (not free
frame on failure)?  That is, could we extend dev_queue_xmit to tell it
(optionally) that we want the skb back in case of failure?
dev_queue_xmit unconditionally frees the skb in any failure mode,
which is I would venture to say that we could do this.

The reason why I'm proposing this is that ppp_generic.c assumes that
the skb is still available after a transmission failure via pppoe.  To
support the semantics of dev_queue_xmit and ppp_generic we would have
to always copy skb's inside pppoe_xmit.  Then, if dev_queue_xmit fails
the original is deleted.

In the common case dev_queue_xmit will not fail, and so in that case
I'd like to have to avoid making a copy of the skb.

Michal Ostrowski
mostrows@speakeasy.net


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] PPPOE can kfree SKB twice (was Re: kernel panic problem. (smp, iptables?))
  2001-07-19 18:00       ` Michal Ostrowski
@ 2001-07-19 18:17         ` kuznet
  0 siblings, 0 replies; 17+ messages in thread
From: kuznet @ 2001-07-19 18:17 UTC (permalink / raw)
  To: mostrows; +Cc: davem, linux-kernel, linux-net, netdev

Hello!

> However, could we not have dev_queue_xmit behave as such (not free
> frame on failure)?  

If you need to hold original skb, you may hold its refcnt.

However, this feature inevitably results in big troubles: dev_queue_xmit()
is allowed to change skb and you cannot assume anything about this.
So that resuing skb dev_queue_xmit() is fatal bug not depending on anything.


> The reason why I'm proposing this is that ppp_generic.c assumes that
> the skb is still available after a transmission failure via pppoe.  To
> support the semantics of dev_queue_xmit and ppp_generic we would have
> to always copy skb's inside pppoe_xmit.  Then, if dev_queue_xmit fails
> the original is deleted.

You need not copy it. I said "clone".

Nobody is allowed to touch _data_ part of skb, so that you need not
to copy data.

Alexey

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] PPPOE can kfree SKB twice (was Re: kernel panic problem. (smp, iptables?))
  2001-07-18  3:58 ` [PATCH] PPPOE can kfree SKB twice (was Re: kernel panic problem. (smp, iptables?)) David S. Miller
  2001-07-18 14:23   ` Michal Ostrowski
  2001-07-19 12:30   ` Michal Ostrowski
@ 2001-07-19 18:57   ` Michal Ostrowski
  2001-07-19 23:13   ` David S. Miller
                     ` (2 subsequent siblings)
  5 siblings, 0 replies; 17+ messages in thread
From: Michal Ostrowski @ 2001-07-19 18:57 UTC (permalink / raw)
  To: David S. Miller; +Cc: linux-kernel, linux-net, netdev, kuznet


Alexey replied to my last post with some valuable comments and in
response I have a new patch (that goes on top of David Miller's patch
from yesterday).

The approach here is that in case we don't have room in the skb for
PPPoE headers, we create a new one (skb2) and copy the entire thing.
If we do have space, we clone it.  We always give dev_queue_xmit the
copy/clone pointer and let it free it.  We then kfree_skb the original
skb depending on the return value of dev_queue_xmit (to conform to the
expectations of ppp_generic).

Michal Ostrowski
mostrows@speakeasy.net

--- drivers/net/pppoe.c~	Wed Jul 18 10:24:25 2001
+++ drivers/net/pppoe.c	Thu Jul 19 14:49:36 2001
@@ -5,7 +5,7 @@
  * PPPoE --- PPP over Ethernet (RFC 2516)
  *
  *
- * Version:    0.6.7
+ * Version:    0.6.8
  *
  * 030700 :     Fixed connect logic to allow for disconnect.
  * 270700 :	Fixed potential SMP problems; we must protect against
@@ -25,8 +25,12 @@
  *		locked. (DaveM)
  *		Ignore return value of dev_queue_xmit in __pppoe_xmit
  *		or else we may kfree an SKB twice. (DaveM)
+ * 190701 :	When doing copies of skb's in __pppoe_xmit, always delete
+ *		the original skb that was passed in on success, never on
+ *		failure.  Delete the copy of the skb on failure to avoid
+ *		a memory leak.
  *
- * Author:	Michal Ostrowski <mostrows@styx.uwaterloo.ca>
+ * Author:	Michal Ostrowski <mostrows@speakeasy.net>
  * Contributors:
  * 		Arnaldo Carvalho de Melo <acme@xconectiva.com.br>
  *		David S. Miller (davem@redhat.com)
@@ -837,6 +841,7 @@
 	return error;
 }
 
+
 /************************************************************************
  *
  * xmit function for internal use.
@@ -849,6 +854,7 @@
 	struct pppoe_hdr *ph;
 	int headroom = skb_headroom(skb);
 	int data_len = skb->len;
+	struct sk_buff *skb2;
 
 	if (sk->dead  || !(sk->state & PPPOX_CONNECTED))
 		goto abort;
@@ -864,7 +870,6 @@
 
 	/* Copy the skb if there is no space for the header. */
 	if (headroom < (sizeof(struct pppoe_hdr) + dev->hard_header_len)) {
-		struct sk_buff *skb2;
 
 		skb2 = dev_alloc_skb(32+skb->len +
 				     sizeof(struct pppoe_hdr) +
@@ -876,30 +881,36 @@
 		skb_reserve(skb2, dev->hard_header_len + sizeof(struct pppoe_hdr));
 		memcpy(skb_put(skb2, skb->len), skb->data, skb->len);
 
-		skb_unlink(skb);
-		kfree_skb(skb);
-		skb = skb2;
+	} else {
+		/* Make a clone so as to not disturb the original skb,
+		 * give dev_queue_xmit something it can free.
+		 */
+		skb2 = skb_clone(skb, GFP_ATOMIC);
 	}
 
-	/* We may not return error beyond this point.  Potentially this
-	 * is a new SKB and in such a case we've already freed the
-	 * original SKB.  So if we were to return error, our caller would
-	 * double free that original SKB. -DaveM
-	 */
-
-	ph = (struct pppoe_hdr *) skb_push(skb, sizeof(struct pppoe_hdr));
+	ph = (struct pppoe_hdr *) skb_push(skb2, sizeof(struct pppoe_hdr));
 	memcpy(ph, &hdr, sizeof(struct pppoe_hdr));
-	skb->protocol = __constant_htons(ETH_P_PPP_SES);
+	skb2->protocol = __constant_htons(ETH_P_PPP_SES);
 
-	skb->nh.raw = skb->data;
+	skb2->nh.raw = skb2->data;
 
-	skb->dev = dev;
+	skb2->dev = dev;
 
-	dev->hard_header(skb, dev, ETH_P_PPP_SES,
+	dev->hard_header(skb2, dev, ETH_P_PPP_SES,
 			 sk->protinfo.pppox->pppoe_pa.remote,
 			 NULL, data_len);
 
-	dev_queue_xmit(skb);
+	/* We're transmitting skb2, and assuming that dev_queue_xmit
+	 * will free it.  The generic ppp layer however, is expecting
+	 * that we give back the skb in case of failure,
+	 * but free it in case of success.
+	 */
+
+	if (dev_queue_xmit(skb2)<0) {
+		goto abort;
+	}
+
+	kfree_skb(skb);
 	return 1;
 
 abort:
@@ -1049,7 +1060,6 @@
  	int err = register_pppox_proto(PX_PROTO_OE, &pppoe_proto);
 
 	if (err == 0) {
-		printk(KERN_INFO "Registered PPPoE v0.6.5\n");
 
 		dev_add_pack(&pppoes_ptype);
 		dev_add_pack(&pppoed_ptype);
--- drivers/net/pppox.c~	Tue Feb 13 16:15:05 2001
+++ drivers/net/pppox.c	Wed Jul 18 10:27:25 2001
@@ -148,10 +148,6 @@
 
 	err = sock_register(&pppox_proto_family);
 
-	if (err == 0) {
-		printk(KERN_INFO "Registered PPPoX v0.5\n");
-	}
-
 	return err;
 }
 


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] PPPOE can kfree SKB twice (was Re: kernel panic problem. (smp, iptables?))
  2001-07-18  3:58 ` [PATCH] PPPOE can kfree SKB twice (was Re: kernel panic problem. (smp, iptables?)) David S. Miller
                     ` (2 preceding siblings ...)
  2001-07-19 18:57   ` Michal Ostrowski
@ 2001-07-19 23:13   ` David S. Miller
  2001-07-19 23:53     ` Andrew Friedley
  2001-07-20  7:13   ` Rainer Clasen
  2001-07-20  7:28   ` David S. Miller
  5 siblings, 1 reply; 17+ messages in thread
From: David S. Miller @ 2001-07-19 23:13 UTC (permalink / raw)
  To: mostrows; +Cc: linux-kernel, linux-net, netdev, kuznet


Michal Ostrowski writes:
 > Alexey replied to my last post with some valuable comments and in
 > response I have a new patch (that goes on top of David Miller's patch
 > from yesterday).

Applied to my tree, thanks.

Later,
David S. Miller
davem@redhat.com

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] PPPOE can kfree SKB twice (was Re: kernel panic problem. (smp, iptables?))
  2001-07-19 23:13   ` David S. Miller
@ 2001-07-19 23:53     ` Andrew Friedley
  0 siblings, 0 replies; 17+ messages in thread
From: Andrew Friedley @ 2001-07-19 23:53 UTC (permalink / raw)
  To: David S. Miller; +Cc: linux-kernel

Hey, I'm having trouble applying your first patch to kernel 2.4.6.  I have a
list of 18 failed hunks.
The command I used is: cd /usr/src/linux && patch -Np0 -i
/home/arch/pppoe-davidmiller.patch  Is the patch for a different kernel? I
had the same problem applying it to 2.4.7-pre8.

Andrew Friedley


> Michal Ostrowski writes:
>  > Alexey replied to my last post with some valuable comments and in
>  > response I have a new patch (that goes on top of David Miller's patch
>  > from yesterday).
>
> Applied to my tree, thanks.
>
> Later,
> David S. Miller
> davem@redhat.com



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] PPPOE can kfree SKB twice (was Re: kernel panic problem. (smp, iptables?))
  2001-07-18  3:58 ` [PATCH] PPPOE can kfree SKB twice (was Re: kernel panic problem. (smp, iptables?)) David S. Miller
                     ` (3 preceding siblings ...)
  2001-07-19 23:13   ` David S. Miller
@ 2001-07-20  7:13   ` Rainer Clasen
  2001-07-20  7:28   ` David S. Miller
  5 siblings, 0 replies; 17+ messages in thread
From: Rainer Clasen @ 2001-07-20  7:13 UTC (permalink / raw)
  To: linux-kernel, linux-net

On Tue, Jul 17, 2001 at 08:58:32PM -0700, David S. Miller wrote:
> Initially we believed it might be some obscure bug in netfilter
> which got triggered more easily when the zerocopy stuff went in.
> But all of our code audits turned up nothing.
> 
> Then I began to notice that "pppoe" was showing up in all the reports
> where the user actually bothered to mention what net devices the
> machine was using when it crashed.

well, I have no PPPoE running and my OOPSen look very similar. (see MsgId
<20010620201648.B12694@zuto.de>).

I am using tulip, dummy, Ben Grear's dot1q VLAN devices and some ISDN
syncppp and ISDN rawip devices are configured (but not actively used),
too.


Rainer

-- 
KeyID=759975BD fingerprint=887A 4BE3 6AB7 EE3C 4AE0  B0E1 0556 E25A 7599 75BD

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] PPPOE can kfree SKB twice (was Re: kernel panic problem. (smp, iptables?))
  2001-07-18  3:58 ` [PATCH] PPPOE can kfree SKB twice (was Re: kernel panic problem. (smp, iptables?)) David S. Miller
                     ` (4 preceding siblings ...)
  2001-07-20  7:13   ` Rainer Clasen
@ 2001-07-20  7:28   ` David S. Miller
  2001-07-20 15:36     ` Rainer Clasen
  5 siblings, 1 reply; 17+ messages in thread
From: David S. Miller @ 2001-07-20  7:28 UTC (permalink / raw)
  To: bj; +Cc: linux-kernel, linux-net


Rainer Clasen writes:
 > I am using tulip, dummy, Ben Grear's dot1q VLAN devices and some ISDN
 > syncppp and ISDN rawip devices are configured (but not actively used),
 > too.

Can you test without dummy and VLAN?  Man, I now have to audit that
friggin' code too :-(

Later,
David S. Miller
davem@redhat.com

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] PPPOE can kfree SKB twice (was Re: kernel panic problem. (smp, iptables?))
  2001-07-20  7:28   ` David S. Miller
@ 2001-07-20 15:36     ` Rainer Clasen
  2001-07-09 11:51       ` [OOPS] network related crash with Linux 2.4 Moritz Schulte
  0 siblings, 1 reply; 17+ messages in thread
From: Rainer Clasen @ 2001-07-20 15:36 UTC (permalink / raw)
  To: David S. Miller; +Cc: linux-kernel, linux-net

On Fri, Jul 20, 2001 at 12:28:35AM -0700, David S. Miller wrote:
> 
> Rainer Clasen writes:
>  > I am using tulip, dummy, Ben Grear's dot1q VLAN devices and some ISDN
>  > syncppp and ISDN rawip devices are configured (but not actively used),
>  > too.
> 
> Can you test without dummy and VLAN?  Man, I now have to audit that
> friggin' code too :-(

As first step I've removed dummy. Eliminating Vlan is difficult and will take
me some more time. 

I could easily reproduce the oops with several nmap -sS through this router.

# ksymoops -K -L -o /lib/modules/2.4.6/ -m /boot/System.map-2.4.6-obs.1.1  < blurb 
ksymoops 2.4.1 on i586 2.4.1.  Options used
     -V (default)
     -K (specified)
     -L (specified)
     -o /lib/modules/2.4.6/ (specified)
     -m /boot/System.map-2.4.6-obs.1.1 (specified)

No modules in ksyms, skipping objects
Unable to handle kernel paging request at virtual address 67720a25 printing eip:
c012612a
*pde = 00000000
Oops: 0000
CPU:    0
EIP:    0010:[<c012612a>]
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010246
eax: 67720a0d   ebx: 00000000   ecx: 67720a0d   edx: 00000000
esi: c165d800   edi: c12d2680   ebp: 00000060   esp: c0209dd8
ds: 0018   es: 0018   ss: 0018
Process swapper (pid: 0, stackpage=c0209000)
Stack: c0181e4d fffff800 c165d800 c0182443 c165d800 c165d800 c12f3000 c12c10a0
       c12f3000 ffffffee c01853bd c165d800 00000020 c165d800 00000000 c12c10a0
       c0188935 c165d800 c165d800 00000000 00000004 c01961cc c019625d c165d800
Call Trace: [<c0181e4d>] [<c0182443>] [<c01853bd>] [<c0188935>] [<c01961cc>] [<c019625d>] [<c018aa56>] 
       [<c01938b0>] [<c01961b2>] [<c01961cc>] [<c01938fa>] [<c018aa56>] [<c019385b>] [<c01938b0>] [<c0192c69>]
       [<c0192aa8>] [<c018aa56>] [<c01928f6>] [<c0192aa8>] [<c0185a8d>] [<c0113aff>] [<c0107e5d>] [<c0105120>]
       [<c0106b60>] [<c0105120>] [<c0105143>] [<c01051a7>] [<c0105000>]
Code: 8b 41 18 85 c0 7c 11 ff 49 14 0f 94 c0 84 c0 74 07 89 c8 e8

>>EIP; c012612a <__free_pages+2/1c>   <=====
Trace; c0181e4d <skb_release_data+41/74>
Trace; c0182443 <skb_linearize+cf/130>
Trace; c01853bd <dev_queue_xmit+6d/244>
Trace; c0188935 <neigh_connected_output+95/c8>
Trace; c01961cc <ip_finish_output2+0/c8>
Trace; c019625d <ip_finish_output2+91/c8>
Trace; c018aa56 <nf_hook_slow+ee/144>
Trace; c01938b0 <ip_forward_finish+0/50>
Trace; c01961b2 <ip_finish_output+ee/f4>
Trace; c01961cc <ip_finish_output2+0/c8>
Trace; c01938fa <ip_forward_finish+4a/50>
Trace; c018aa56 <nf_hook_slow+ee/144>
Trace; c019385b <ip_forward+1eb/240>
Trace; c01938b0 <ip_forward_finish+0/50>
Trace; c0192c69 <ip_rcv_finish+1c1/1f8>
Trace; c0192aa8 <ip_rcv_finish+0/1f8>
Trace; c018aa56 <nf_hook_slow+ee/144>
Trace; c01928f6 <ip_rcv+376/3b0>
Trace; c0192aa8 <ip_rcv_finish+0/1f8>
Trace; c0185a8d <net_rx_action+135/258>
Trace; c0113aff <do_softirq+3f/68>
Trace; c0107e5d <do_IRQ+9d/b0>
Trace; c0105120 <default_idle+0/28>
Trace; c0106b60 <ret_from_intr+0/7>
Trace; c0105120 <default_idle+0/28>
Trace; c0105143 <default_idle+23/28>
Trace; c01051a7 <cpu_idle+3f/54>
Trace; c0105000 <_stext+0/0>
Code;  c012612a <__free_pages+2/1c>
00000000 <_EIP>:
Code;  c012612a <__free_pages+2/1c>   <=====
   0:   8b 41 18                  mov    0x18(%ecx),%eax   <=====
Code;  c012612d <__free_pages+5/1c>
   3:   85 c0                     test   %eax,%eax
Code;  c012612f <__free_pages+7/1c>
   5:   7c 11                     jl     18 <_EIP+0x18> c0126142 <__free_pages+1a/1c>
Code;  c0126131 <__free_pages+9/1c>
   7:   ff 49 14                  decl   0x14(%ecx)
Code;  c0126134 <__free_pages+c/1c>
   a:   0f 94 c0                  sete   %al
Code;  c0126137 <__free_pages+f/1c>
   d:   84 c0                     test   %al,%al
Code;  c0126139 <__free_pages+11/1c>
   f:   74 07                     je     18 <_EIP+0x18> c0126142 <__free_pages+1a/1c>
Code;  c012613b <__free_pages+13/1c>
  11:   89 c8                     mov    %ecx,%eax
Code;  c012613d <__free_pages+15/1c>
  13:   e8 00 00 00 00            call   18 <_EIP+0x18> c0126142 <__free_pages+1a/1c>

Kernel panic: Aiee, killing interrupt handler!

Rainer

-- 
KeyID=759975BD fingerprint=887A 4BE3 6AB7 EE3C 4AE0  B0E1 0556 E25A 7599 75BD

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: kernel panic problem. (smp, iptables?)
  2001-07-17  2:35 kernel panic problem. (smp, iptables?) Andrew Friedley
  2001-07-18  3:58 ` [PATCH] PPPOE can kfree SKB twice (was Re: kernel panic problem. (smp, iptables?)) David S. Miller
@ 2001-07-22  2:07 ` Rusty Russell
  1 sibling, 0 replies; 17+ messages in thread
From: Rusty Russell @ 2001-07-22  2:07 UTC (permalink / raw)
  To: Andrew Friedley; +Cc: linux-kernel

In message <005f01c10e69$28273e60$0200a8c0@loki> you write:
> My problem is, that I seem to be having "random" kernel panics.  When I say
> random I mean that the system will run 30 minutes before it panics, or as

Looks like the pppoe problem DaveM found.  Patch in progress...

Thanks,
Rusty.
--
Premature optmztion is rt of all evl. --DK

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [OOPS] network related crash with Linux 2.4
  2001-07-10  5:19         ` Rainer Clasen
@ 2001-08-01 20:21           ` Rainer Clasen
  0 siblings, 0 replies; 17+ messages in thread
From: Rainer Clasen @ 2001-08-01 20:21 UTC (permalink / raw)
  To: linux-kernel, David S. Miller, linux-net

On Tue, Jul 10, 2001 at 07:19:24AM +0200, Rainer Clasen wrote:
> Am Mon, Jul 09, 2001 at 01:51:17PM +0200 schrieb Moritz Schulte:
> > I often see my Gateway (Cx486DX4 CPU, 14364K RAM, 124956K Swap)
> > running Linux 2.4.[56] crashing (should I test previous
> > versions?). These crashes seem related to networking, because they
> > happen when trying to access some hosts. Now, the system crashes
> 
> this seems to be similar to my oopsen - 

On Fri, Jul 20, 2001 at 05:36:55PM +0200, Rainer Clasen wrote:
> On Fri, Jul 20, 2001 at 12:28:35AM -0700, David S. Miller wrote:
> > Rainer Clasen writes:
> >  > I am using tulip, dummy, Ben Grear's dot1q VLAN devices and some ISDN
> >  > syncppp and ISDN rawip devices are configured (but not actively used),
> >  > too.
> > 
> > Can you test without dummy and VLAN?  Man, I now have to audit that
> > friggin' code too :-(
> 
> As first step I've removed dummy. Eliminating Vlan is difficult and will take
> me some more time. 

Marc Boucher posted a patch on the netfilter list, that solved my
problems.


Rainer

-- 
KeyID=759975BD fingerprint=887A 4BE3 6AB7 EE3C 4AE0  B0E1 0556 E25A 7599 75BD

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2001-08-01 20:21 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2001-07-17  2:35 kernel panic problem. (smp, iptables?) Andrew Friedley
2001-07-18  3:58 ` [PATCH] PPPOE can kfree SKB twice (was Re: kernel panic problem. (smp, iptables?)) David S. Miller
2001-07-18 14:23   ` Michal Ostrowski
2001-07-19 12:30   ` Michal Ostrowski
2001-07-19 17:27     ` kuznet
2001-07-19 18:00       ` Michal Ostrowski
2001-07-19 18:17         ` kuznet
2001-07-19 18:57   ` Michal Ostrowski
2001-07-19 23:13   ` David S. Miller
2001-07-19 23:53     ` Andrew Friedley
2001-07-20  7:13   ` Rainer Clasen
2001-07-20  7:28   ` David S. Miller
2001-07-20 15:36     ` Rainer Clasen
2001-07-09 11:51       ` [OOPS] network related crash with Linux 2.4 Moritz Schulte
2001-07-10  5:19         ` Rainer Clasen
2001-08-01 20:21           ` Rainer Clasen
2001-07-22  2:07 ` kernel panic problem. (smp, iptables?) Rusty Russell

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).