linux-kernel.vger.kernel.org archive mirror
* Panic in ipt_do_table with 2.6.16.13-xen
@ 2006-05-15 17:46 Matt Ayres
  2006-05-15 19:27 ` Patrick McHardy
  0 siblings, 1 reply; 20+ messages in thread
From: Matt Ayres @ 2006-05-15 17:46 UTC (permalink / raw)
  To: Linux Kernel Mailing List

I have been noticing this same problem dozens of times and have finally 
caught a full trace.  I have run it through ksymoops, but there is no 
/proc/ksyms.  Is there a better method for getting information out of 
the Code line than using ksymoops in 2.6 kernels?

The kernel is for Xen, but it does not appear to be related to Xen.

ksymoops run:

# ksymoops -m System.map -k /proc/kallsyms </root/xen-oops.log
ksymoops 2.4.9 on i686 2.6.16.13-xen.  Options used
      -V (default)
      -k /proc/kallsyms (specified)
      -l /proc/modules (default)
      -o /lib/modules/2.6.16.13-xen/ (default)
      -m System.map (specified)

Warning (read_ksyms): no kernel symbols in ksyms, is /proc/kallsyms a 
valid ksyms file?
No modules in ksyms, skipping objects
No ksyms, skipping lsmod
Unable to handle kernel paging request at virtual addr
c03d96a9
008c2000 -> *pde = 00000001:aeead001
00e1f000 -> *pme = 00000000:eb2ef067
0faef000 -> *pte = 00000000:00000000
Oops: 0000 [#1]
CPU:    0
EIP:    0061:[<c03d96a9>]    Not tainted VLI
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010286   (2.6.16.13-xen #1)
eax: 00000002   ebx: cfcc6020   ecx: cfcc6020   edx: 00000000
esi: d1320000   edi: 00004000   ebp: d1324a00   esp: c0517d8c
ds: 007b   es: 007b   ss: 0069
Stack: <0>c04d056c 00000000 cfcc6020 00000000 cffa8000 c7f4fc00 d1320000 
0000000
00000001 00000000 c0517e2c 80000000 c03acc04 c7f4fc00 c03da724 c0517e70
00000002 cffa8000 c7f4fc00 c04d0540 00000000 c03a2e50 00000002 c0517e70
Call Trace:
[<c03acc04>] ip_forward_finish+0x0/0x36
[<c03da724>] ipt_hook+0x1c/0x20
[<c03a2e50>] nf_iterate+0x2c/0x5e
[<c03acc04>] ip_forward_finish+0x0/0x36
[<c03acc04>] ip_forward_finish+0x0/0x36
[<c03a2f4b>] nf_hook_slow+0x3c/0xc3
[<c03acc04>] ip_forward_finish+0x0/0x36
[<c03acdd8>] ip_forward+0x19e/0x22e
[<c03acc04>] ip_forward_finish+0x0/0x36
[<c03abcf7>] ip_rcv+0x40e/0x48f
[<c037dcb5>] netif_receive_skb+0x255/0x294
[<c02e82e6>] tg3_poll+0x532/0x76c
[<c037bd82>] net_rx_action+0xaa/0x17c
[<c011ea17>] __do_softirq+0x73/0xf0
[<c011ead4>] do_softirq+0x40/0x64
[<c010606b>] do_IRQ+0x1f/0x25
[<c02fc87f>] evtchn_do_upcall+0x60/0x96
[<c0104a2c>] hypervisor_callback+0x2c/0x34
[<c010342e>] xen_idle+0x5e/0x7d
[<c0103509>] cpu_idle+0xbc/0xd5
[<c05184e0>] start_kernel+0x344/0x34b
Code: 89 44 24 18 89 c6 8b 44 24 40 8b 6c 24 18 03 74 83 0c 03 6c 83 20 
c7 44 24


 >>EIP; c03d96a9 <ipt_do_table+ad/2d0>   <=====

 >>esp; c0517d8c <init_thread_union+1d8c/2000>

Trace; c03acc04 <ip_forward_finish+0/36>
Trace; c03da724 <ipt_hook+1c/20>
Trace; c03a2e50 <nf_iterate+2c/5e>
Trace; c03acc04 <ip_forward_finish+0/36>
Trace; c03acc04 <ip_forward_finish+0/36>
Trace; c03a2f4b <nf_hook_slow+3c/c3>
Trace; c03acc04 <ip_forward_finish+0/36>
Trace; c03acdd8 <ip_forward+19e/22e>
Trace; c03acc04 <ip_forward_finish+0/36>
Trace; c03abcf7 <ip_rcv+40e/48f>
Trace; c037dcb5 <netif_receive_skb+255/294>
Trace; c02e82e6 <tg3_poll+532/76c>
Trace; c037bd82 <net_rx_action+aa/17c>
Trace; c011ea17 <__do_softirq+73/f0>
Trace; c011ead4 <do_softirq+40/64>
Trace; c010606b <do_IRQ+1f/25>
Trace; c02fc87f <evtchn_do_upcall+60/96>
Trace; c0104a2c <hypervisor_callback+2c/34>
Trace; c010342e <xen_idle+5e/7d>
Trace; c0103509 <cpu_idle+bc/d5>
Trace; c05184e0 <start_kernel+344/34b>

Code;  c03d96a9 <ipt_do_table+ad/2d0>
00000000 <_EIP>:
Code;  c03d96a9 <ipt_do_table+ad/2d0>   <=====
    0:   89 44 24 18               mov    %eax,0x18(%esp)   <=====
Code;  c03d96ad <ipt_do_table+b1/2d0>
    4:   89 c6                     mov    %eax,%esi
Code;  c03d96af <ipt_do_table+b3/2d0>
    6:   8b 44 24 40               mov    0x40(%esp),%eax
Code;  c03d96b3 <ipt_do_table+b7/2d0>
    a:   8b 6c 24 18               mov    0x18(%esp),%ebp
Code;  c03d96b7 <ipt_do_table+bb/2d0>
    e:   03 74 83 0c               add    0xc(%ebx,%eax,4),%esi
Code;  c03d96bb <ipt_do_table+bf/2d0>
   12:   03 6c 83 20               add    0x20(%ebx,%eax,4),%ebp
Code;  c03d96bf <ipt_do_table+c3/2d0>
   16:   c7 44 24 00 00 00 00      movl   $0x0,0x0(%esp)
Code;  c03d96c6 <ipt_do_table+ca/2d0>
   1d:   00


1 warning issued.  Results may not be reliable.
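
As a rough aid to reading the disassembly above: the two `add` instructions use base + index*scale + displacement addressing. The sketch below only redoes that arithmetic with the ebx and eax values from the register dump. It assumes those registers hold the values the instructions would see, which the oops does not guarantee, and `effective_address` is a hypothetical helper name.

```python
def effective_address(base, index, scale, disp):
    """Compute an x86 base + index*scale + disp address, wrapped to 32 bits."""
    return (base + index * scale + disp) & 0xFFFFFFFF


# Register values copied from the oops register dump (assumed, see above).
EBX = 0xCFCC6020
EAX = 0x00000002

# add 0xc(%ebx,%eax,4),%esi
first = effective_address(EBX, EAX, 4, 0x0C)
# add 0x20(%ebx,%eax,4),%ebp
second = effective_address(EBX, EAX, 4, 0x20)

print(hex(first), hex(second))  # 0xcfcc6034 0xcfcc6048
```

Both operands would then be small offsets into whatever structure ebx points at, indexed by the value in eax.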


* Re: Panic in ipt_do_table with 2.6.16.13-xen
  2006-05-15 17:46 Panic in ipt_do_table with 2.6.16.13-xen Matt Ayres
@ 2006-05-15 19:27 ` Patrick McHardy
  2006-05-16  0:01   ` Matt Ayres
  0 siblings, 1 reply; 20+ messages in thread
From: Patrick McHardy @ 2006-05-15 19:27 UTC (permalink / raw)
  To: Matt Ayres; +Cc: Linux Kernel Mailing List, Netfilter Development Mailinglist

Matt Ayres wrote:
> I have been noticing this same problem dozens of times and have finally
> caught a full trace.  I have run it through ksymoops, but there is no
> /proc/ksyms.  Is there a better method for getting information out of
> the Code line than using ksymoops in 2.6 kernels?


CONFIG_KALLSYMS will make the kernel decode the oops itself.

> The kernel is for Xen, but it does not appear to be related to Xen.


We haven't had problems in that code for ages, so my initial feeling
is that it probably is related to Xen. Do you have any other patches
applied besides Xen? Please also post the full ruleset you're using
and anything else that might appear special about your setup.
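
Regarding the CONFIG_KALLSYMS remark above: decoding an address into symbol+offset form amounts to finding the nearest symbol at or below the address. A minimal sketch, assuming a tiny hand-made symbol table whose start addresses are derived from the symbol+offset pairs in the posted trace (for example ipt_do_table = c03d96a9 - 0xad). `resolve` is a hypothetical helper, not a kernel or ksymoops function.

```python
import bisect

# Start addresses derived from the trace (address minus printed offset).
# A real System.map has many more entries; with a sparse table like this,
# addresses belonging to unlisted functions would resolve to the wrong name.
SYMBOLS = sorted([
    (0xC03A2E24, "nf_iterate"),
    (0xC03ACC04, "ip_forward_finish"),
    (0xC03ACC3A, "ip_forward"),
    (0xC03D95FC, "ipt_do_table"),
    (0xC03DA708, "ipt_hook"),
])
ADDRS = [addr for addr, _ in SYMBOLS]


def resolve(addr):
    """Return 'symbol+0xoffset' for the nearest symbol at or below addr."""
    i = bisect.bisect_right(ADDRS, addr) - 1
    if i < 0:
        return None  # address lies below the first known symbol
    start, name = SYMBOLS[i]
    return "%s+0x%x" % (name, addr - start)


print(resolve(0xC03D96A9))  # ipt_do_table+0xad
```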



* Re: Panic in ipt_do_table with 2.6.16.13-xen
  2006-05-15 19:27 ` Patrick McHardy
@ 2006-05-16  0:01   ` Matt Ayres
  2006-05-16  3:31     ` James Morris
  0 siblings, 1 reply; 20+ messages in thread
From: Matt Ayres @ 2006-05-16  0:01 UTC (permalink / raw)
  To: Patrick McHardy
  Cc: Linux Kernel Mailing List, Netfilter Development Mailinglist, xen-devel

Patrick McHardy wrote:
> Matt Ayres wrote:
>> I have been noticing this same problem dozens of times and have finally
>> caught a full trace.  I have run it through ksymoops, but there is no
>> /proc/ksyms.  Is there a better method for getting information out of
>> the Code line than using ksymoops in 2.6 kernels?
> 
> 
> CONFIG_KALLSYMS will make the kernel decode the oops itself.
> 

That's odd, I had thought that too.  This is what "zcat /proc/config.gz 
| grep KALL" shows:

CONFIG_KALLSYMS=y
# CONFIG_KALLSYMS_ALL is not set
# CONFIG_KALLSYMS_EXTRA_PASS is not set

I take it my run through ksymoops was of no help in diagnosing the 
problem?  The panic is _always_ in ipt_do_table.

>> The kernel is for Xen, but it does not appear to be related to Xen.
> 
> 
> We haven't had problems in that code for ages, so my initial feeling
> is that it probably is related to Xen. Do you have any other patches
> applied besides Xen? Please also post the full ruleset you're using
> and anything else that might appear special about your setup.
> 

I had initially sent my traces to the Xen guys.  They have not stated it 
is NOT specific to Xen, just that it's unlikely.  I did not experience 
the problem with kernel 2.6.12, only with 2.6.16 (up to the .13 bugfix 
release).  I have completely disabled all support for SCTP 
(protocol/netfilter/conntrack) as I know it is still quite buggy.  I 
know Xen touches the network code a lot, but nothing specific to 
iptables.  I had contacted them twice before LKML as I didn't want to 
post patch-specific problems here.  I have no other patches applied 
besides the Xen patch.

My ruleset is pretty bland: 2 rules in the raw table to tell the system 
to track only my forwarded ports, 2 rules in the nat table for 
forwarding (intercepting) 2 ports, and then in the FORWARD chain 2 
rules per VM just to account traffic.

I've CC'ed xen-devel on this in case they can provide some insight.  I 
am not subscribed to LKML so please make sure to reply to me also in 
responses.

Thank you,
Matt Ayres


* Re: Panic in ipt_do_table with 2.6.16.13-xen
  2006-05-16  0:01   ` Matt Ayres
@ 2006-05-16  3:31     ` James Morris
  2006-05-16 13:49       ` [Xen-devel] " Matt Ayres
  0 siblings, 1 reply; 20+ messages in thread
From: James Morris @ 2006-05-16  3:31 UTC (permalink / raw)
  To: Matt Ayres
  Cc: Patrick McHardy, Linux Kernel Mailing List,
	Netfilter Development Mailinglist, xen-devel

On Mon, 15 May 2006, Matt Ayres wrote:

> I had initially sent my traces to the Xen guys.  They have not stated it is
> NOT specific to Xen, just that it's unlikely.  I did not experience the
> problem with kernel 2.6.12, just with 2.6.16 (up to .13 bugfix release).  I
> have completely disabled all support for SCTP (protocol/netfilter/conntrack)
> as I know it is still quite buggy.  I know Xen touches the network code a lot,
> but nothing specific to iptables.  I had contacted them twice before LKML as I
> didn't want to post patch specific problems here.  I have no other patches
> applied besides the Xen patch.
> 
> My ruleset is pretty bland.  2 rules in the raw table to tell the system to
> only track my forwarded ports, 2 rules in the nat table for forwarding
> (intercepting) 2 ports, and then in the FORWARD tables 2 rules per VM to just
> account traffic.

Can you try using a different NIC?


- James
-- 
James Morris
<jmorris@namei.org>


* Re: [Xen-devel] Re: Panic in ipt_do_table with 2.6.16.13-xen
  2006-05-16  3:31     ` James Morris
@ 2006-05-16 13:49       ` Matt Ayres
  2006-05-16 15:28         ` James Morris
  0 siblings, 1 reply; 20+ messages in thread
From: Matt Ayres @ 2006-05-16 13:49 UTC (permalink / raw)
  To: James Morris
  Cc: xen-devel, Netfilter Development Mailinglist, Patrick McHardy,
	Linux Kernel Mailing List



James Morris wrote:
> On Mon, 15 May 2006, Matt Ayres wrote:
> 
>> I had initially sent my traces to the Xen guys.  They have not stated it is
>> NOT specific to Xen, just that it's unlikely.  I did not experience the
>> problem with kernel 2.6.12, just with 2.6.16 (up to .13 bugfix release).  I
>> have completely disabled all support for SCTP (protocol/netfilter/conntrack)
>> as I know it is still quite buggy.  I know Xen touches the network code a lot,
>> but nothing specific to iptables.  I had contacted them twice before LKML as I
>> didn't want to post patch specific problems here.  I have no other patches
>> applied besides the Xen patch.
>>
>> My ruleset is pretty bland.  2 rules in the raw table to tell the system to
>> only track my forwarded ports, 2 rules in the nat table for forwarding
>> (intercepting) 2 ports, and then in the FORWARD tables 2 rules per VM to just
>> account traffic.
> 
> Can you try using a different NIC?
> 

This happens on 30 different hosts.  Using the same kernel, uptime varies 
from "hasn't crashed since the upgrade to 2.6.16" to "crashes every 
day".  All are Tyan S2882D boards w/ integrated Tigon3.  The trace I 
posted to this thread indicates tg3, but many of my other traces don't 
include any driver calls.  They all panic in ipt_do_table.  I would 
have pasted the others, but I didn't save the System.map for any of 
them and they are all pretty similar.

Here is another that just crashed:

 >>EIP; c03d96a9 <ipt_do_table+ad/2d0>   <=====

 >>esp; c0517d8c <init_thread_union+1d8c/2000>

Trace; c03acc04 <ip_forward_finish+0/36>
Trace; c03da724 <ipt_hook+1c/20>
Trace; c03a2e50 <nf_iterate+2c/5e>
Trace; c03acc04 <ip_forward_finish+0/36>
Trace; c03acc04 <ip_forward_finish+0/36>
Trace; c03a2f4b <nf_hook_slow+3c/c3>
Trace; c03acc04 <ip_forward_finish+0/36>
Trace; c03acdd8 <ip_forward+19e/22e>
Trace; c03acc04 <ip_forward_finish+0/36>
Trace; c03abcf7 <ip_rcv+40e/48f>
Trace; c037dcb5 <netif_receive_skb+255/294>
Trace; c02e82e6 <tg3_poll+532/76c>
Trace; c037bd82 <net_rx_action+aa/17c>
Trace; c011ea17 <__do_softirq+73/f0>
Trace; c011ead4 <do_softirq+40/64>
Trace; c010606b <do_IRQ+1f/25>
Trace; c02fc87f <evtchn_do_upcall+60/96>
Trace; c0104a2c <hypervisor_callback+2c/34>
Trace; c010342e <xen_idle+5e/7d>
Trace; c0103509 <cpu_idle+bc/d5>
Trace; c05184e0 <start_kernel+344/34b>

Code;  c03d96a9 <ipt_do_table+ad/2d0>
00000000 <_EIP>:
Code;  c03d96a9 <ipt_do_table+ad/2d0>   <=====
    0:   89 44 24 18               mov    %eax,0x18(%esp)   <=====
Code;  c03d96ad <ipt_do_table+b1/2d0>
    4:   89 c6                     mov    %eax,%esi
Code;  c03d96af <ipt_do_table+b3/2d0>
    6:   8b 44 24 40               mov    0x40(%esp),%eax
Code;  c03d96b3 <ipt_do_table+b7/2d0>
    a:   8b 6c 24 18               mov    0x18(%esp),%ebp
Code;  c03d96b7 <ipt_do_table+bb/2d0>
    e:   03 74 83 0c               add    0xc(%ebx,%eax,4),%esi
Code;  c03d96bb <ipt_do_table+bf/2d0>
   12:   03 6c 83 20               add    0x20(%ebx,%eax,4),%ebp
Code;  c03d96bf <ipt_do_table+c3/2d0>
   16:   c7 44 24 00 00 00 00      movl   $0x0,0x0(%esp)
Code;  c03d96c6 <ipt_do_table+ca/2d0>
   1d:   00

  <0>Kernel panic - not syncing: Fatal exception in interrupt 



Thanks,
Matt Ayres


* Re: [Xen-devel] Re: Panic in ipt_do_table with 2.6.16.13-xen
  2006-05-16 13:49       ` [Xen-devel] " Matt Ayres
@ 2006-05-16 15:28         ` James Morris
  2006-05-18 23:58           ` Matt Ayres
  0 siblings, 1 reply; 20+ messages in thread
From: James Morris @ 2006-05-16 15:28 UTC (permalink / raw)
  To: Matt Ayres
  Cc: xen-devel, Netfilter Development Mailinglist, Patrick McHardy,
	Linux Kernel Mailing List

On Tue, 16 May 2006, Matt Ayres wrote:

> > > My ruleset is pretty bland.  2 rules in the raw table to tell the system
> > > to
> > > only track my forwarded ports, 2 rules in the nat table for forwarding
> > > (intercepting) 2 ports, and then in the FORWARD tables 2 rules per VM to
> > > just
> > > account traffic.
> > 
> > Can you try using a different NIC?
> > 
> 
> This happens on 30 different hosts.  Using the same kernel I get varying
> uptime of "hasn't crashed since the upgrade to 2.6.16" to "crashes every day".
> All are Tyan S2882D boards w/ integrated Tigon3.  The trace I posted to this
> thread indicate tg3, but in many other traces I have the trace doesn't include
> any driver calls.  They all panic in ipt_do_table.  I would have pasted the
> others, but I didn't save the System.map for either of them and they are all
> pretty similar.

I'm trying to suggest eliminating this driver & possible interaction with 
Xen network changes as a cause.  If you can find a different type of NIC 
to plug in and use, or even try and change all of the params for the tg3 
with ethtool, it'll help.

-- 
James Morris
<jmorris@namei.org>


* Re: [Xen-devel] Re: Panic in ipt_do_table with 2.6.16.13-xen
  2006-05-16 15:28         ` James Morris
@ 2006-05-18 23:58           ` Matt Ayres
  2006-05-19  0:05             ` James Morris
  0 siblings, 1 reply; 20+ messages in thread
From: Matt Ayres @ 2006-05-18 23:58 UTC (permalink / raw)
  To: James Morris
  Cc: xen-devel, Netfilter Development Mailinglist, Patrick McHardy,
	Linux Kernel Mailing List



James Morris wrote:
> On Tue, 16 May 2006, Matt Ayres wrote:
> 
>>>> My ruleset is pretty bland.  2 rules in the raw table to tell the system
>>>> to
>>>> only track my forwarded ports, 2 rules in the nat table for forwarding
>>>> (intercepting) 2 ports, and then in the FORWARD tables 2 rules per VM to
>>>> just
>>>> account traffic.
>>> Can you try using a different NIC?
>>>
>> This happens on 30 different hosts.  Using the same kernel I get varying
>> uptime of "hasn't crashed since the upgrade to 2.6.16" to "crashes every day".
>> All are Tyan S2882D boards w/ integrated Tigon3.  The trace I posted to this
>> thread indicate tg3, but in many other traces I have the trace doesn't include
>> any driver calls.  They all panic in ipt_do_table.  I would have pasted the
>> others, but I didn't save the System.map for either of them and they are all
>> pretty similar.
> 
> I'm trying to suggest eliminating this driver & possible interaction with 
> Xen network changes as a cause.  If you can find a different type of NIC 
> to plug in and use, or even try and change all of the params for the tg3 
> with ethtool, it'll help.
> 

Hi,

Thank you for the assistance. Which parameters do you suggest changing? 
  TSO/flow control off?

Here is my ruleset for those interested:

# iptables -t raw -L -v
Chain OUTPUT (policy ACCEPT 27441 packets, 4832K bytes)
 pkts bytes target     prot opt in     out     source               destination

Chain PREROUTING (policy ACCEPT 195M packets, 156G bytes)
 pkts bytes target     prot opt in     out     source               destination
1332K  144M NOTRACK   !tcp  --  any    any     anywhere             anywhere
   54  5293 ACCEPT     tcp  --  any    any     anywhere             anywhere            tcp dpt:7373
 4564  223K ACCEPT     tcp  --  any    any     anywhere             anywhere            tcp dpt:7322
 194M  156G NOTRACK    tcp  --  any    any     anywhere             anywhere            tcp dpt:!7373
 194M  156G NOTRACK    tcp  --  any    any     anywhere             anywhere            tcp dpt:!7322

# iptables -t nat -L -v
Chain OUTPUT (policy ACCEPT 2114 packets, 155K bytes)
 pkts bytes target     prot opt in     out     source               destination

Chain POSTROUTING (policy ACCEPT 2114 packets, 155K bytes)
 pkts bytes target     prot opt in     out     source               destination

Chain PREROUTING (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination
    6   344 DNAT       tcp  --  eth0   any     anywhere             anywhere            tcp dpt:7373 to:host.ip.address:443
    8   408 DNAT       tcp  --  eth0   any     anywhere             anywhere            tcp dpt:7322 to:host.ip.address:22


iptables -L -v just shows 2 rules per Virtual Machine for accounting. 
This averages about 100 rules in the FORWARD chain.  Example:

# iptables -L -v
Chain FORWARD (policy ACCEPT 195M packets, 156G bytes)
 pkts bytes target     prot opt in     out     source               destination
    0     0            all  --  any    any     xx.xx.xx.xx          anywhere
    0     0            all  --  any    any     anywhere             xx.xx.xx.xx


* Re: [Xen-devel] Re: Panic in ipt_do_table with 2.6.16.13-xen
  2006-05-18 23:58           ` Matt Ayres
@ 2006-05-19  0:05             ` James Morris
  2006-05-19  0:16               ` Matt Ayres
  0 siblings, 1 reply; 20+ messages in thread
From: James Morris @ 2006-05-19  0:05 UTC (permalink / raw)
  To: Matt Ayres
  Cc: xen-devel, Netfilter Development Mailinglist, Patrick McHardy,
	Linux Kernel Mailing List

On Thu, 18 May 2006, Matt Ayres wrote:

> > I'm trying to suggest eliminating this driver & possible interaction with
> > Xen network changes as a cause.  If you can find a different type of NIC to
> > plug in and use, or even try and change all of the params for the tg3 with
> > ethtool, it'll help.
> > 
> 
> Hi,
> 
> Thank you for the assistance. Which parameters do you suggest changing?
> TSO/flow control off?

Yep, anything.

> iptables -L -v just shows 2 rules per Virtual Machine for accounting. This
> averages about 100 rules in the FORWARD chain.  Example:

Do you know if the problem starts appearing after a certain number of 
hosts?


- James
-- 
James Morris
<jmorris@namei.org>


* Re: [Xen-devel] Re: Panic in ipt_do_table with 2.6.16.13-xen
  2006-05-19  0:05             ` James Morris
@ 2006-05-19  0:16               ` Matt Ayres
  2006-05-19  0:45                 ` Matt Ayres
  0 siblings, 1 reply; 20+ messages in thread
From: Matt Ayres @ 2006-05-19  0:16 UTC (permalink / raw)
  To: James Morris
  Cc: xen-devel, Netfilter Development Mailinglist, Patrick McHardy,
	Linux Kernel Mailing List



James Morris wrote:
> On Thu, 18 May 2006, Matt Ayres wrote:
> 
>>> I'm trying to suggest eliminating this driver & possible interaction with
>>> Xen network changes as a cause.  If you can find a different type of NIC to
>>> plug in and use, or even try and change all of the params for the tg3 with
>>> ethtool, it'll help.
>>>
>> Hi,
>>
>> Thank you for the assistance. Which parameters do you suggest changing?
>> TSO/flow control off?
> 
> Yep, anything.

Ok, "ethtool -K eth0 rx off tx off sg off tso off" should have turned it 
  all off.

> 
>> iptables -L -v just shows 2 rules per Virtual Machine for accounting. This
>> averages about 100 rules in the FORWARD chain.  Example:
> 
> Do you know if the problem starts appearing after a certain number of 
> hosts?
> 

No... I have some servers that have been running plain 2.6.16-xen (no 
bugfix patches) for 30 days without a problem, and some of these have 
rulesets larger than the ones that crash daily.  I'd estimate this 
affects 90% of my servers; some reboot daily and others make it to 
7-10 days.

Thanks,
Matt



* Re: [Xen-devel] Re: Panic in ipt_do_table with 2.6.16.13-xen
  2006-05-19  0:16               ` Matt Ayres
@ 2006-05-19  0:45                 ` Matt Ayres
  2006-05-21 17:43                   ` Patrick McHardy
  0 siblings, 1 reply; 20+ messages in thread
From: Matt Ayres @ 2006-05-19  0:45 UTC (permalink / raw)
  To: Matt Ayres
  Cc: James Morris, xen-devel, Netfilter Development Mailinglist,
	Patrick McHardy, Linux Kernel Mailing List



Matt Ayres wrote:
> 
> 
> James Morris wrote:
>> On Thu, 18 May 2006, Matt Ayres wrote:
>>
>>>> I'm trying to suggest eliminating this driver & possible interaction 
>>>> with
>>>> Xen network changes as a cause.  If you can find a different type of 
>>>> NIC to
>>>> plug in and use, or even try and change all of the params for the 
>>>> tg3 with
>>>> ethtool, it'll help.
>>>>
>>> Hi,
>>>
>>> Thank you for the assistance. Which parameters do you suggest changing?
>>> TSO/flow control off?
>>
>> Yep, anything.
> 
> Ok, "ethtool -K eth0 rx off tx off sg off tso off" should have turned it 
>  all off.
> 

I think I have confirmed the NIC is not the source of the problem.  A 
few of my servers have e100/tulip NICs because of a bug involving TSO 
and the firmware of the on-board TG3 chipset.  The servers that use the 
e100/tulip drivers also experience the ipt_do_table bug.

Thanks,
Matt


* Re: [Xen-devel] Re: Panic in ipt_do_table with 2.6.16.13-xen
  2006-05-19  0:45                 ` Matt Ayres
@ 2006-05-21 17:43                   ` Patrick McHardy
  2006-05-22 14:31                     ` Matt Ayres
  0 siblings, 1 reply; 20+ messages in thread
From: Patrick McHardy @ 2006-05-21 17:43 UTC (permalink / raw)
  To: Matt Ayres
  Cc: James Morris, xen-devel, Netfilter Development Mailinglist,
	Linux Kernel Mailing List

Matt Ayres wrote:
> I think I confirmed the NIC is not the source of the problem.  A few of
> my servers have e100/tulip NIC's due to a bug with the chipset of the
> on-board TG3 cards firmware and TSO.  These servers that use the
> e100/tulip drivers also experience the ipt_do_table bug.

There is an identical report in the netfilter bugzilla, also crashes
(on x86_64) in ipt_do_table with Xen. I haven't heard anything of
similar crashes without Xen, so I doubt that the bug is in the
netfilter code.

https://bugzilla.netfilter.org/bugzilla/show_bug.cgi?id=478


* Re: [Xen-devel] Re: Panic in ipt_do_table with 2.6.16.13-xen
  2006-05-21 17:43                   ` Patrick McHardy
@ 2006-05-22 14:31                     ` Matt Ayres
  2006-05-22 14:42                       ` Keir Fraser
  2006-05-22 14:43                       ` Patrick McHardy
  0 siblings, 2 replies; 20+ messages in thread
From: Matt Ayres @ 2006-05-22 14:31 UTC (permalink / raw)
  To: Patrick McHardy
  Cc: James Morris, xen-devel, Netfilter Development Mailinglist,
	Linux Kernel Mailing List



Patrick McHardy wrote:
> Matt Ayres wrote:
>> I think I confirmed the NIC is not the source of the problem.  A few of
>> my servers have e100/tulip NIC's due to a bug with the chipset of the
>> on-board TG3 cards firmware and TSO.  These servers that use the
>> e100/tulip drivers also experience the ipt_do_table bug.
> 
> There is an identical report in the netfilter bugzilla, also crashes
> (on x86_64) in ipt_do_table with Xen. I haven't heard anything of
> similar crashes without Xen, so I doubt that the bug is in the
> netfilter code.
> 
> https://bugzilla.netfilter.org/bugzilla/show_bug.cgi?id=478

Yep... too coincidental.  I'd say it has _something_ to do with Xen. 
I've been doing different things on my side to try to reduce the 
severity of the problem, but I'd really like to hear what the Xen guys 
have to say about this now..

Thanks,
Matt


* Re: [Xen-devel] Re: Panic in ipt_do_table with 2.6.16.13-xen
  2006-05-22 14:31                     ` Matt Ayres
@ 2006-05-22 14:42                       ` Keir Fraser
  2006-05-22 14:43                       ` Patrick McHardy
  1 sibling, 0 replies; 20+ messages in thread
From: Keir Fraser @ 2006-05-22 14:42 UTC (permalink / raw)
  To: Matt Ayres
  Cc: Patrick McHardy, James Morris, xen-devel,
	Netfilter Development Mailinglist, Linux Kernel Mailing List


On 22 May 2006, at 15:31, Matt Ayres wrote:

>> There is an identical report in the netfilter bugzilla, also crashes
>> (on x86_64) in ipt_do_table with Xen. I haven't heard anything of
>> similar crashes without Xen, so I doubt that the bug is in the
>> netfilter code.
>> https://bugzilla.netfilter.org/bugzilla/show_bug.cgi?id=478
>
> Yep... too coincidental.  I'd say it has _something_ to do with Xen. 
> I've been doing different things on my side to try to reduce the 
> severity of the problem, but I'd really like to hear what the Xen guys 
> have to say about this now..

If you can provide a vmlinux image and a backtrace we'll take a look.

  -- Keir



* Re: [Xen-devel] Re: Panic in ipt_do_table with 2.6.16.13-xen
  2006-05-22 14:31                     ` Matt Ayres
  2006-05-22 14:42                       ` Keir Fraser
@ 2006-05-22 14:43                       ` Patrick McHardy
  2006-05-23  9:54                         ` Keir Fraser
  2006-05-23 21:15                         ` Keir Fraser
  1 sibling, 2 replies; 20+ messages in thread
From: Patrick McHardy @ 2006-05-22 14:43 UTC (permalink / raw)
  To: Matt Ayres
  Cc: James Morris, xen-devel, Netfilter Development Mailinglist,
	Linux Kernel Mailing List

Matt Ayres wrote:
> 
> 
> Patrick McHardy wrote:
> 
>> Matt Ayres wrote:
>>
>>> I think I confirmed the NIC is not the source of the problem.  A few of
>>> my servers have e100/tulip NIC's due to a bug with the chipset of the
>>> on-board TG3 cards firmware and TSO.  These servers that use the
>>> e100/tulip drivers also experience the ipt_do_table bug.
>>
>>
>> There is an identical report in the netfilter bugzilla, also crashes
>> (on x86_64) in ipt_do_table with Xen. I haven't heard anything of
>> similar crashes without Xen, so I doubt that the bug is in the
>> netfilter code.
>>
>> https://bugzilla.netfilter.org/bugzilla/show_bug.cgi?id=478
> 
> 
> Yep... too coincidental.  I'd say it has _something_ to do with Xen.
> I've been doing different things on my side to try to reduce the
> severity of the problem, but I'd really like to hear what the Xen guys
> have to say about this now..


Maybe this helps: there is not too much the Xen code could be doing
wrong here. If I read your crash correctly it happened in the FORWARD
chain, which could mean that the outgoing device (probably the Xen
virtual network driver) has some bugs, but iptables really only cares
about the names at this point, which practically can't be bogus.
The only other thing I can imagine is that something is wrong with
the per-CPU copy of the ruleset, i.e. either smp_processor_id is
returning garbage or for_each_possible_cpu misses a CPU during
initialization. I have no idea if Xen really does touch this code,
but other than that I don't really see what it could break.
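
The per-CPU hypothesis can be pictured with a toy model: one copy of the ruleset per possible CPU, selected by the current CPU id. A minimal sketch follows; all names (`make_percpu_copies`, `lookup`) are hypothetical and a Python list stands in for kernel memory, so this is an illustration of the two failure modes named above, not the netfilter code.

```python
def make_percpu_copies(ruleset, possible_cpus, nr_cpus):
    """Give every CPU listed in possible_cpus its own copy of the ruleset."""
    copies = [None] * nr_cpus
    for cpu in possible_cpus:
        copies[cpu] = list(ruleset)
    return copies


def lookup(copies, cpu_id):
    """Return the ruleset copy for cpu_id, or fail the way the toy model can."""
    if not 0 <= cpu_id < len(copies):
        # Models the CPU id lookup returning garbage.
        raise ValueError("CPU id %d is out of range" % cpu_id)
    table = copies[cpu_id]
    if table is None:
        # Models the initialization loop having skipped this CPU.
        raise LookupError("CPU %d has no ruleset copy" % cpu_id)
    return table


# Two CPU slots, but only CPU 0 was visited during initialization.
copies = make_percpu_copies(["rule-a", "rule-b"], possible_cpus=[0], nr_cpus=2)
print(lookup(copies, 0))
```

In the kernel neither case is checked at packet time, so either one would show up as a bad memory access rather than a clean error.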



* Re: [Xen-devel] Re: Panic in ipt_do_table with 2.6.16.13-xen
  2006-05-22 14:43                       ` Patrick McHardy
@ 2006-05-23  9:54                         ` Keir Fraser
  2006-05-23 12:03                           ` Matt Ayres
  2006-05-23 21:15                         ` Keir Fraser
  1 sibling, 1 reply; 20+ messages in thread
From: Keir Fraser @ 2006-05-23  9:54 UTC (permalink / raw)
  To: Patrick McHardy
  Cc: James Morris, xen-devel, Netfilter Development Mailinglist,
	Linux Kernel Mailing List, Matt Ayres


On 22 May 2006, at 15:43, Patrick McHardy wrote:

> The only other thing I can imagine is that something is wrong with
> the per-CPU copy of the ruleset, i.e. either smp_processor_id is
> returning garbage or for_each_possible_cpu misses a CPU during
> initialization. I have no idea if Xen really does touch this code,
> but other than that I don't really see what it could break.

Of the options you consider, this sounds most likely. Really we need 
some more info from a crash: I'd like to get disassembly from a vmlinux 
image if that's possible, Matt.

  Thanks,
  Keir



* Re: [Xen-devel] Re: Panic in ipt_do_table with 2.6.16.13-xen
  2006-05-23  9:54                         ` Keir Fraser
@ 2006-05-23 12:03                           ` Matt Ayres
  0 siblings, 0 replies; 20+ messages in thread
From: Matt Ayres @ 2006-05-23 12:03 UTC (permalink / raw)
  To: Keir Fraser
  Cc: Patrick McHardy, James Morris, xen-devel,
	Netfilter Development Mailinglist, Linux Kernel Mailing List

Keir Fraser wrote:
> 
> On 22 May 2006, at 15:43, Patrick McHardy wrote:
> 
>> The only other thing I can imagine is that something is wrong with
>> the per-CPU copy of the ruleset, i.e. either smp_processor_id is
>> returning garbage or for_each_possible_cpu misses a CPU during
>> initialization. I have no idea if Xen really does touch this code,
>> but other than that I don't really see what it could break.
> 
> Of the options you consider, this sounds most likely. Really we need 
> some more info from a crash: I'd like to get disassembly from a vmlinux 
> image if that's possible, Matt.
> 

I have an un-stripped vmlinux built with kernel debugging and the 
corresponding System.map.  I will be sending these to you privately 
shortly.  You can see the multiple traces sent to this list.

It appears that having the bandwidth accounting performed by counting 
rules in the FORWARD chain is what triggers it in my setup.  I suppose I 
could optimize this so the kernel spends as little time as possible in 
ipt_do_table for the FORWARD chain.  I flushed the FORWARD 
chain (normally 100-120 rules) and have not experienced any crashes so 
far... but disabling bandwidth monitoring is by no means a long-term fix.

The symptoms might be more generic, perhaps any chain with many rules 
and lots of traffic, or it may be just the FORWARD chain that does it 
for me, as that is where ipt_do_table spends most of its time.
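
To illustrate why a chain of accounting rules keeps the kernel in the rule walk for so long: rules without a target only update their counters and fall through, so every forwarded packet visits every rule. A minimal sketch of that behaviour, assuming a simplified linear chain; `make_accounting_rules` and `walk_chain` are hypothetical names and the dictionaries are stand-ins for the real rule entries.

```python
def make_accounting_rules(vm_ips):
    """Build two target-less counting rules per VM: one on source, one on destination."""
    rules = []
    for ip in vm_ips:
        rules.append({"match": lambda p, ip=ip: p["src"] == ip,
                      "target": None, "pkts": 0, "bytes": 0})
        rules.append({"match": lambda p, ip=ip: p["dst"] == ip,
                      "target": None, "pkts": 0, "bytes": 0})
    return rules


def walk_chain(rules, packet, policy="ACCEPT"):
    """Walk the chain in order; return (verdict, number of rules visited)."""
    visits = 0
    for rule in rules:
        visits += 1
        if rule["match"](packet):
            rule["pkts"] += 1
            rule["bytes"] += packet["len"]
            if rule["target"] is not None:
                return rule["target"], visits
    return policy, visits  # no terminating rule matched: chain policy applies


# 50 made-up VM addresses give 100 rules, roughly the chain size mentioned above.
rules = make_accounting_rules(["10.0.0.%d" % i for i in range(1, 51)])
verdict, visits = walk_chain(rules, {"src": "10.0.0.3", "dst": "192.0.2.1", "len": 1500})
print(verdict, visits)  # ACCEPT 100
```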

Thanks,
Matt Ayres


* Re: [Xen-devel] Re: Panic in ipt_do_table with 2.6.16.13-xen
  2006-05-22 14:43                       ` Patrick McHardy
  2006-05-23  9:54                         ` Keir Fraser
@ 2006-05-23 21:15                         ` Keir Fraser
  2006-05-23 21:23                           ` Matt Ayres
  1 sibling, 1 reply; 20+ messages in thread
From: Keir Fraser @ 2006-05-23 21:15 UTC (permalink / raw)
  To: Patrick McHardy
  Cc: James Morris, xen-devel, Netfilter Development Mailinglist,
	Linux Kernel Mailing List, Matt Ayres


On 22 May 2006, at 15:43, Patrick McHardy wrote:

> Maybe this helps: there is not too much the Xen code could be doing
> wrong here. If I read your crash correctly it happened in the FORWARD
> chain, which could mean that the outgoing device (probably the Xen
> virtual network driver) has some bugs, but iptables really only cares
> about the names at this point, which practically can't be bogus.
> The only other thing I can imagine is that something is wrong with
> the per-CPU copy of the ruleset, i.e. either smp_processor_id is
> returning garbage or for_each_possible_cpu misses a CPU during
> initialization. I have no idea if Xen really does touch this code,
> but other than that I don't really see what it could break.

Having looked at disassembly, the fault happens when accessing 
e->ip.invflags in ip_packet_match() inlined inside ipt_do_table().

  e = private->entries[smp_processor_id()] + private->hook_entry[NF_IP_FORWARD]

smp_processor_id() should be 0 (since the oops appears to occur on 
cpu0) and presumably all the ipt_entry structures are static once set 
up. Since this crash happens on a common path in ipt_do_table(), and 
since it happens only after the system has been up a while (I 
believe?), it rather looks as though something has either corrupted a 
pointer or unmapped memory from under iptables' feet.
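
A minimal userspace sketch of that lookup, assuming simplified stand-in 
types: every sketch_ name below is hypothetical and is not the kernel's 
real structure or API. It only shows that the first FORWARD rule is found 
by adding a per-hook byte offset to a per-CPU base pointer, so a wrong CPU 
index, a corrupted base pointer, or a bad offset would each send the 
e->ip.invflags read to an unrelated address.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical constants for the sketch; the hook numbering mirrors IPv4 netfilter. */
enum { SKETCH_NR_CPUS = 2, SKETCH_NUMHOOKS = 5, SKETCH_FORWARD = 2 };

/* Stand-in for a rule entry: only the field named in the oops analysis. */
struct sketch_entry {
    unsigned char invflags;
};

/* Stand-in for the table bookkeeping: one rule blob per CPU, one offset per hook. */
struct sketch_table_info {
    size_t size;                             /* bytes in each per-CPU blob */
    size_t hook_entry[SKETCH_NUMHOOKS];      /* byte offset of each chain's first rule */
    unsigned char *entries[SKETCH_NR_CPUS];  /* per-CPU copy of the ruleset */
};

/*
 * Return the first rule of `hook` in the copy belonging to `cpu`.
 * The asserts are sketch-only sanity checks; they make explicit the
 * three inputs that all have to be valid for the dereference to be safe.
 */
static struct sketch_entry *sketch_first_entry(const struct sketch_table_info *t,
                                               int cpu, int hook)
{
    assert(cpu >= 0 && cpu < SKETCH_NR_CPUS);
    assert(hook >= 0 && hook < SKETCH_NUMHOOKS);
    assert(t->entries[cpu] != NULL);
    assert(t->hook_entry[hook] + sizeof(struct sketch_entry) <= t->size);

    return (struct sketch_entry *)(t->entries[cpu] + t->hook_entry[hook]);
}
```

Calling sketch_first_entry(&t, 0, SKETCH_FORWARD) on a table whose blobs 
are still mapped returns a pointer inside CPU 0's copy; the sketch does not 
model the blob itself being unmapped later, which is the other possibility 
raised above.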

  -- Keir


* Re: [Xen-devel] Re: Panic in ipt_do_table with 2.6.16.13-xen
  2006-05-23 21:15                         ` Keir Fraser
@ 2006-05-23 21:23                           ` Matt Ayres
  2006-05-23 21:27                             ` Keir Fraser
  0 siblings, 1 reply; 20+ messages in thread
From: Matt Ayres @ 2006-05-23 21:23 UTC (permalink / raw)
  To: Keir Fraser
  Cc: Patrick McHardy, xen-devel, Netfilter Development Mailinglist,
	James Morris, Linux Kernel Mailing List



Keir Fraser wrote:
> 
> On 22 May 2006, at 15:43, Patrick McHardy wrote:
> 
>> Maybe this helps: there is not too much the Xen code could be doing
>> wrong here. If I read your crash correctly it happened in the FORWARD
>> chain, which could mean that the outgoing device (probably the Xen
>> virtual network driver) has some bugs, but iptables really only cares
>> about the names at this point, which practically can't be bogus.
>> The only other thing I can imagine is that something is wrong with
>> the per-CPU copy of the ruleset, i.e. either smp_processor_id is
>> returning garbage or for_each_possible_cpu misses a CPU during
>> initialization. I have no idea if Xen really does touch this code,
>> but other than that I don't really see what it could break.
> 
> Having looked at disassembly, the fault happens when accessing 
> e->ip.invflags in ip_packet_match() inlined inside ipt_do_table().
> 
>  e = private->entries[smp_processor_id()] + 
> private->hook_entry[NF_IP_FORWARD]
> 
> smp_processor_id() should be 0 (since the oops appears to occur on cpu0) 
> and presumably all the ipt_entry structures are static once set up. 
> Since this crash happens on a common path in ipt_do_table(), and since 
> it happens only after the system has been up a while (I believe?), it 
> rather looks as though something has either corrupted a pointer or 
> unmapped memory from under iptables' feet.
> 

As the concerned user, what does this mean for me?  Will it only affect 
SMP systems?  Is it a bug in Xen or in netfilter?

I'd just like to understand what is going on.

Thanks,
Matt

* Re: [Xen-devel] Re: Panic in ipt_do_table with 2.6.16.13-xen
  2006-05-23 21:23                           ` Matt Ayres
@ 2006-05-23 21:27                             ` Keir Fraser
  2006-05-24  7:16                               ` Gerd Hoffmann
  0 siblings, 1 reply; 20+ messages in thread
From: Keir Fraser @ 2006-05-23 21:27 UTC (permalink / raw)
  To: Matt Ayres
  Cc: Patrick McHardy, James Morris, xen-devel,
	Netfilter Development Mailinglist, Linux Kernel Mailing List


On 23 May 2006, at 22:23, Matt Ayres wrote:

>> Having looked at disassembly, the fault happens when accessing 
>> e->ip.invflags in ip_packet_match() inlined inside ipt_do_table().
>>  e = private->entries[smp_processor_id()] + 
>> private->hook_entry[NF_IP_FORWARD]
>> smp_processor_id() should be 0 (since the oops appears to occur on 
>> cpu0) and presumably all the ipt_entry structures are static once set 
>> up. Since this crash happens on a common path in ipt_do_table(), and 
>> since it happens only after the system has been up a while (I 
>> believe?), it rather looks as though something has either corrupted a 
>> pointer or unmapped memory from under iptables' feet.
>
> As the concerned user, what does this mean for me?  Will it only affect 
> SMP systems?  Is it a bug in Xen or in netfilter?

Probably a Xen bug, but if so then it's basically a memory corruption. 
It's weird it would hit the iptables rules every time though, and not 
be a more random crash. This might well need reproducing in a developer 
test-machine environment to stand a chance of tracking down.

  -- Keir

> I'd just like to understand what is going on.


* Re: [Xen-devel] Re: Panic in ipt_do_table with 2.6.16.13-xen
  2006-05-23 21:27                             ` Keir Fraser
@ 2006-05-24  7:16                               ` Gerd Hoffmann
  0 siblings, 0 replies; 20+ messages in thread
From: Gerd Hoffmann @ 2006-05-24  7:16 UTC (permalink / raw)
  To: Keir Fraser
  Cc: Matt Ayres, Patrick McHardy, James Morris, xen-devel,
	Netfilter Development Mailinglist, Linux Kernel Mailing List

[-- Attachment #1: Type: text/plain, Size: 599 bytes --]

>> As the concerned user, what does this mean for me?  Will it only affect
>> SMP systems?  Is it a bug in Xen or in netfilter?
> 
> Probably a Xen bug, but if so then it's basically a memory corruption.

It might also be a netfilter bug that is simply triggered by the way Xen
manages memory.  Due to ballooning you can have holes in memory, so an
out-of-range access may fault under Xen whereas it would go unnoticed
on normal kernels.

One such beast is in the bridging netfilter code; additionally, it
triggers only with certain Ethernet cards.  Patch below.  Pinned down
last week ;)

cheers,

  Gerd

[-- Attachment #2: nf_bridge-header-size --]
[-- Type: text/plain, Size: 1279 bytes --]

Subject: nf_bridge: ethernet header is 14 not 16 bytes
From: jbeulich@novell.com
Acked-by: kraxel@suse.de
References: 150410

The bridge netfilter code saves two more bytes than it should.
In most cases it doesn't hurt because many drivers use NET_IP_ALIGN
to make the IP header aligned, so there are two extra bytes of head
room available.

Some drivers don't do that though (sky2 for example), so the copy
accesses data outside the skbuff data allocation.  On Xen kernels
this can kill the machine with a page fault because of the way
skbuffs are allocated and memory is managed.

Index: linux-2.6.16/include/linux/netfilter_bridge.h
===================================================================
--- linux-2.6.16.orig/include/linux/netfilter_bridge.h
+++ linux-2.6.16/include/linux/netfilter_bridge.h
@@ -73,14 +73,14 @@ void nf_bridge_maybe_copy_header(struct 
 			memcpy(skb->data - 18, skb->nf_bridge->data, 18);
 			skb_push(skb, 4);
 		} else
-			memcpy(skb->data - 16, skb->nf_bridge->data, 16);
+			memcpy(skb->data - 14, skb->nf_bridge->data, 14);
 	}
 }
 
 static inline
 void nf_bridge_save_header(struct sk_buff *skb)
 {
-        int header_size = 16;
+        int header_size = 14;
 
 	if (skb->protocol == __constant_htons(ETH_P_8021Q))
 		header_size = 18;
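
The size arithmetic behind the patch can be sketched in plain C. This is 
a userspace illustration under stated assumptions, not the kernel code: 
the sketch_ names are hypothetical, and the headroom check is added here 
only to make the out-of-range case visible (the patched functions above 
simply copy 14 or 18 bytes).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Ethernet header: 6 (dst) + 6 (src) + 2 (type) = 14; an 802.1Q tag adds 4. */
enum { SKETCH_ETH_HLEN = 14, SKETCH_VLAN_HLEN = 4 };

/* Number of link-layer header bytes sitting in front of the network header. */
static size_t sketch_header_size(int is_vlan)
{
    return is_vlan ? SKETCH_ETH_HLEN + SKETCH_VLAN_HLEN : SKETCH_ETH_HLEN;
}

/*
 * Copy the link-layer header that precedes `data` into `saved`.
 * `buf_start` marks the beginning of the allocation, so data - buf_start
 * is the available headroom. Returns 0 on success, or -1 if the copy
 * would read before the buffer or overflow `saved`.
 */
static int sketch_save_header(const unsigned char *buf_start,
                              const unsigned char *data, int is_vlan,
                              unsigned char *saved, size_t saved_len)
{
    size_t n = sketch_header_size(is_vlan);
    size_t headroom;

    assert(data >= buf_start);
    headroom = (size_t)(data - buf_start);

    if (n > headroom || n > saved_len)
        return -1;  /* the 16-byte copy hit this case with 14 bytes of headroom */

    memcpy(saved, data - n, n);
    return 0;
}
```

With a buffer whose data pointer sits exactly 14 bytes in (no NET_IP_ALIGN 
padding), a 14-byte save stays inside the allocation, while any larger 
copy, such as the old 16-byte one, would reach two or more bytes before 
the start of the buffer.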

Thread overview: 20+ messages
2006-05-15 17:46 Panic in ipt_do_table with 2.6.16.13-xen Matt Ayres
2006-05-15 19:27 ` Patrick McHardy
2006-05-16  0:01   ` Matt Ayres
2006-05-16  3:31     ` James Morris
2006-05-16 13:49       ` [Xen-devel] " Matt Ayres
2006-05-16 15:28         ` James Morris
2006-05-18 23:58           ` Matt Ayres
2006-05-19  0:05             ` James Morris
2006-05-19  0:16               ` Matt Ayres
2006-05-19  0:45                 ` Matt Ayres
2006-05-21 17:43                   ` Patrick McHardy
2006-05-22 14:31                     ` Matt Ayres
2006-05-22 14:42                       ` Keir Fraser
2006-05-22 14:43                       ` Patrick McHardy
2006-05-23  9:54                         ` Keir Fraser
2006-05-23 12:03                           ` Matt Ayres
2006-05-23 21:15                         ` Keir Fraser
2006-05-23 21:23                           ` Matt Ayres
2006-05-23 21:27                             ` Keir Fraser
2006-05-24  7:16                               ` Gerd Hoffmann