* [B.A.T.M.A.N.] Kernel crashes when using more than one interface in bat0
@ 2015-08-17 21:47 Bjoern Franke
  2015-08-18  8:08 ` Bastian Bittorf
  0 siblings, 1 reply; 7+ messages in thread
From: Bjoern Franke @ 2015-08-17 21:47 UTC (permalink / raw)
  To: b.a.t.m.a.n

Hi,

we are using batman-adv in our Freifunk network for meshing over
several interfaces. The nodes are connected to VPN servers via fastd,
and the VPN servers are connected to each other via tincd or gretap.

When we add both the fastd interface and the tinc (or gretap) interface
to bat0, the kernel crashes after some time. With 2013.4, no crashes
occurred, but since the update to 2014.3 the issue appears, sometimes
after 5 minutes, sometimes after 3 days. Upgrading to 2015.1 did not
solve the problem; the crashes even seem to occur more often with it.
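For readers unfamiliar with this setup: an interface is usually attached to bat0 either with `batctl if add <iface>` or by writing the mesh interface name into that interface's `batman_adv/mesh_iface` sysfs file. The sketch below uses the sysfs route. The function name `attach_to_mesh`, the `SYSFS_ROOT` override (there only so the function can be tried against a scratch directory), and the interface names `mesh-vpn` and `tinc0` are all hypothetical.

```shell
#!/bin/sh
# Attach one or more interfaces to a batman-adv mesh interface via sysfs.
# Usage: attach_to_mesh MESH IFACE...
# SYSFS_ROOT defaults to /sys; override it to test against a scratch tree.
attach_to_mesh() {
    mesh="$1"
    shift
    for iface in "$@"; do
        f="${SYSFS_ROOT:-/sys}/class/net/$iface/batman_adv/mesh_iface"
        if [ ! -e "$f" ]; then
            echo "no batman_adv sysfs entry for $iface" >&2
            return 1
        fi
        # Writing the mesh name attaches the interface to that mesh.
        echo "$mesh" > "$f" || return 1
    done
}

# Example (hypothetical interface names, needs root):
# attach_to_mesh bat0 mesh-vpn tinc0
```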

Unfortunately we have no logs so far, because after a crash the
servers are no longer reachable via SSH.

nc / mm / bl are disabled; toggling them seems to have no effect on this issue.
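To confirm which of these features are really active, the mesh-wide settings can be read back. This sketch assumes the sysfs layout of batman-adv releases from this period (`/sys/class/net/bat0/mesh/network_coding`, `multicast_mode`, `bridge_loop_avoidance`, `fragmentation`); an attribute may be missing if the feature was not compiled in, and later releases deprecated sysfs in favor of netlink-based `batctl`. The function name `show_mesh_settings` and the `SYSFS_ROOT` override are hypothetical.

```shell
#!/bin/sh
# Print selected batman-adv mesh settings from sysfs.
# Usage: show_mesh_settings [MESH]   (MESH defaults to bat0)
# SYSFS_ROOT defaults to /sys; override it to test against a scratch tree.
show_mesh_settings() {
    dir="${SYSFS_ROOT:-/sys}/class/net/${1:-bat0}/mesh"
    for opt in network_coding multicast_mode bridge_loop_avoidance fragmentation; do
        if [ -r "$dir/$opt" ]; then
            printf '%s: %s\n' "$opt" "$(cat "$dir/$opt")"
        else
            # Attribute missing: feature not compiled in, or no sysfs interface.
            printf '%s: n/a\n' "$opt"
        fi
    done
}
```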

Regards
Bjoern
-- 
xmpp bjo@schafweide.org 

* Re: [B.A.T.M.A.N.] Kernel crashes when using more than one interface in bat0
  2015-08-17 21:47 [B.A.T.M.A.N.] Kernel crashes when using more than one interface in bat0 Bjoern Franke
@ 2015-08-18  8:08 ` Bastian Bittorf
  2015-08-18 16:35   ` Bjoern Franke
  0 siblings, 1 reply; 7+ messages in thread
From: Bastian Bittorf @ 2015-08-18  8:08 UTC (permalink / raw)
  To: The list for a Better Approach To Mobile Ad-hoc Networking

* Bjoern Franke <bjo@nord-west.org> [18.08.2015 09:47]:
> Unfortunately we have no logs now, because after the crashes no ssh
> -access to the servers are possible.

Try setting:

/sbin/sysctl -w kernel.panic_on_oops=1
/sbin/sysctl -w kernel.panic=10
/sbin/sysctl -w vm.panic_on_oom=2

If the device crashes (and now reboots), you will find
the crash in the file:

/sys/kernel/debug/crashlog
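The `sysctl -w` calls above only last until the next reboot. As a minimal sketch for making them persistent, assuming a distribution that reads drop-in files from `/etc/sysctl.d/` (the function name `write_panic_sysctl` and the file name `90-panic.conf` are hypothetical choices): `kernel.panic_on_oops = 1` turns every oops into a panic, `kernel.panic = 10` reboots 10 seconds after a panic, and `vm.panic_on_oom = 2` panics on every out-of-memory condition.

```shell
#!/bin/sh
# Write the panic/reboot settings from above to a sysctl drop-in file.
# Usage: write_panic_sysctl [FILE]
# The default path is a hypothetical choice.
write_panic_sysctl() {
    conf="${1:-/etc/sysctl.d/90-panic.conf}"
    cat > "$conf" <<'EOF'
# Turn an oops into a full panic.
kernel.panic_on_oops = 1
# Reboot 10 seconds after a panic.
kernel.panic = 10
# Panic on every OOM condition, including cpuset/memcg-constrained ones.
vm.panic_on_oom = 2
EOF
}

# Example (needs root): write the file, then load it without rebooting.
# write_panic_sysctl && sysctl -p /etc/sysctl.d/90-panic.conf
```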

bye, bastian - happy crashing!

* Re: [B.A.T.M.A.N.] Kernel crashes when using more than one interface in bat0
  2015-08-18  8:08 ` Bastian Bittorf
@ 2015-08-18 16:35   ` Bjoern Franke
  2015-08-20 10:08     ` Hans-Werner Hilse
  2015-08-20 11:09     ` Simon Wunderlich
  0 siblings, 2 replies; 7+ messages in thread
From: Bjoern Franke @ 2015-08-18 16:35 UTC (permalink / raw)
  To: The list for a Better Approach To Mobile Ad-hoc Networking

On Tuesday, 18.08.2015, at 10:08 +0200, Bastian Bittorf wrote:
> * Bjoern Franke <bjo@nord-west.org> [18.08.2015 09:47]:
> > Unfortunately we have no logs now, because after the crashes no ssh
> > -access to the servers are possible.
> 
> try to set
> 
> /sbin/sysctl -w kernel.panic_on_oops=1
> /sbin/sysctl -w kernel.panic=10
> /sbin/sysctl -w vm.panic_on_oom=2
> 
> if the devices crashed (and reboots now) you have
> the crash in the file:
> 
> /sys/kernel/debug/crashlog
> 

Thanks for the hint. It did not work on the Debian machines, but I got
the systems running with crashkernel enabled, and now we have captured
the first crash:
https://p.rrbone.net/paste/nnNHrIJI#oHfBMOs2
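Whether the crash kernel is actually armed can be checked before waiting for the next crash: the running kernel's command line should contain a `crashkernel=` memory reservation, and `/sys/kernel/kexec_crash_loaded` reads `1` once a capture kernel has been loaded (on Debian typically by the kdump-tools package). A minimal sketch; the function name `check_kdump` is hypothetical, and the file paths are parameters only so it can be tested against scratch files.

```shell
#!/bin/sh
# Report whether kdump is ready to capture the next crash.
# Usage: check_kdump [CMDLINE_FILE] [LOADED_FILE]
check_kdump() {
    cmdline="${1:-/proc/cmdline}"
    loaded="${2:-/sys/kernel/kexec_crash_loaded}"
    # Memory for the capture kernel must be reserved at boot.
    if ! grep -q 'crashkernel=' "$cmdline"; then
        echo "no crashkernel= reservation on the kernel command line"
        return 1
    fi
    # 1 means a capture kernel has been loaded with kexec.
    if [ "$(cat "$loaded" 2>/dev/null)" != "1" ]; then
        echo "crash kernel not loaded"
        return 1
    fi
    echo "kdump armed"
}
```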

Regards
Bjoern
-- 
xmpp bjo@schafweide.org 

* Re: [B.A.T.M.A.N.] Kernel crashes when using more than one interface in bat0
  2015-08-18 16:35   ` Bjoern Franke
@ 2015-08-20 10:08     ` Hans-Werner Hilse
  2015-08-20 11:10       ` Matthias Schiffer
  2015-08-23 17:59       ` Bjoern Franke
  2015-08-20 11:09     ` Simon Wunderlich
  1 sibling, 2 replies; 7+ messages in thread
From: Hans-Werner Hilse @ 2015-08-20 10:08 UTC (permalink / raw)
  To: The list for a Better Approach To Mobile Ad-hoc Networking

Hi,

On 2015-08-18 18:35, Bjoern Franke wrote:

> Thanks for the hint, it did not work on the debian machines, but I got
> the systems running with crashkernel enabled. Now we got the first
> crash:
> https://p.rrbone.net/paste/nnNHrIJI#oHfBMOs2

We've seen these on our Goettingen Freifunk gateways, too. There, too, 
batadv_frag_purge_orig was the smoking gun. However, I didn't report it, 
because:

- first and foremost, we were using the outdated legacy 2013.4 version
- it was most probably an issue with RCU lists
- and either disabling SMP or using a much more current kernel fixed it.

So I blamed a buggy RCU implementation in older kernels, plus maybe some 
ill behaviour in the old batman-adv codebase. The crashing kernel was 
the old Debian Wheezy one - pretty old, I'd say.
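To see whether a given kernel was built with SMP support at all, its build configuration can be inspected; Debian installs it as `/boot/config-<release>`. A minimal sketch under that assumption (other distributions may expose it only as `/proc/config.gz`, or not at all); the function name `smp_status` is hypothetical.

```shell
#!/bin/sh
# Report whether a kernel build config enables SMP, and how many CPUs are online.
# Usage: smp_status [CONFIG_FILE]   (defaults to Debian's /boot/config-$(uname -r))
smp_status() {
    config="${1:-/boot/config-$(uname -r)}"
    if [ ! -r "$config" ]; then
        echo "config not found: $config"
        return 1
    fi
    if grep -q '^CONFIG_SMP=y$' "$config"; then
        echo "CONFIG_SMP=y, $(getconf _NPROCESSORS_ONLN) CPU(s) online"
    else
        echo "SMP disabled in this build"
    fi
}
```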

-hwh

* Re: [B.A.T.M.A.N.] Kernel crashes when using more than one interface in bat0
  2015-08-18 16:35   ` Bjoern Franke
  2015-08-20 10:08     ` Hans-Werner Hilse
@ 2015-08-20 11:09     ` Simon Wunderlich
  1 sibling, 0 replies; 7+ messages in thread
From: Simon Wunderlich @ 2015-08-20 11:09 UTC (permalink / raw)
  To: b.a.t.m.a.n

Hi Bjoern,

thanks a lot for reporting this issue - it looks like there are some problems 
in the cleanup of OGMs when fragmentation is used.

For now, I've created a ticket here:

https://www.open-mesh.org/issues/223

Thanks,
    Simon

On Tuesday 18 August 2015 18:35:17 Bjoern Franke wrote:
> On Tuesday, 18.08.2015, at 10:08 +0200, Bastian Bittorf wrote:
> > * Bjoern Franke <bjo@nord-west.org> [18.08.2015 09:47]:
> > > Unfortunately we have no logs now, because after the crashes no ssh
> > > -access to the servers are possible.
> > 
> > try to set
> > 
> > /sbin/sysctl -w kernel.panic_on_oops=1
> > /sbin/sysctl -w kernel.panic=10
> > /sbin/sysctl -w vm.panic_on_oom=2
> > 
> > if the devices crashed (and reboots now) you have
> > the crash in the file:
> > 
> > /sys/kernel/debug/crashlog
> 
> Thanks for the hint, it did not work on the debian machines, but I got
> the systems running with crashkernel enabled. Now we got the first
> crash:
> https://p.rrbone.net/paste/nnNHrIJI#oHfBMOs2
> 
> Regards
> Bjoern

* Re: [B.A.T.M.A.N.] Kernel crashes when using more than one interface in bat0
  2015-08-20 10:08     ` Hans-Werner Hilse
@ 2015-08-20 11:10       ` Matthias Schiffer
  2015-08-23 17:59       ` Bjoern Franke
  1 sibling, 0 replies; 7+ messages in thread
From: Matthias Schiffer @ 2015-08-20 11:10 UTC (permalink / raw)
  To: b.a.t.m.a.n

On 08/20/2015 12:08 PM, Hans-Werner Hilse wrote:
> Hi,
> 
> On 2015-08-18 18:35, Bjoern Franke wrote:
> 
>> Thanks for the hint, it did not work on the debian machines, but I got
>> the systems running with crashkernel enabled. Now we got the first
>> crash:
>> https://p.rrbone.net/paste/nnNHrIJI#oHfBMOs2
> 
> We've seen these on our Goettingen Freifunk gateways, too. There, too,
> batadv_frag_purge_orig was the smoking gun. However, I didn't report it,
> because:
> 
> - first and foremost, we were using the outdated legacy 2013.4 version
> - it was most probably an issue with RCU lists
> - and either disabling SMP or using a much more current kernel fixed it.
> 
> So I blamed a buggy RCU implementation in older kernels, plus maybe some
> ill behaviour in the old batman-adv codebase. The crashing kernel was
> the old debian-wheezy one - pretty old, I'd say.
> 
> -hwh

This is an independent bug (2013.4 uses a completely different
fragmentation implementation) that has been reported in
https://github.com/freifunk-gluon/batman-adv-legacy/issues/1 . Please
don't bother the upstream BATMAN developers with batman-adv-legacy bugs.

Matthias

* Re: [B.A.T.M.A.N.] Kernel crashes when using more than one interface in bat0
  2015-08-20 10:08     ` Hans-Werner Hilse
  2015-08-20 11:10       ` Matthias Schiffer
@ 2015-08-23 17:59       ` Bjoern Franke
  1 sibling, 0 replies; 7+ messages in thread
From: Bjoern Franke @ 2015-08-23 17:59 UTC (permalink / raw)
  To: The list for a Better Approach To Mobile Ad-hoc Networking

Hi,

> We've seen these on our Goettingen Freifunk gateways, too. There,
> too, batadv_frag_purge_orig was the smoking gun. However, I didn't
> report it, because:
> 
> - first and foremost, we were using the outdated legacy 2013.4 version

We also had some other issues with older versions, so we upgraded to
2015.1, hoping the gateways would run stably.

> - it was most probably an issue with RCU lists
> - and either disabling SMP or using a much more current kernel fixed it.

Did you build your own kernels with SMP disabled?

> So I blamed a buggy RCU implementation in older kernels, plus maybe
> some ill behaviour in the old batman-adv codebase. The crashing kernel
> was the old debian-wheezy one - pretty old, I'd say.
> 

We are running 3.16, and some gateways have been upgraded to 4.1 for
testing. However, on the gateways we also see some "general protection
fault: 0000 [#1] SMP" errors unrelated to batman-adv, on different
hardware. For the record, a KVM gateway (the other ones are dedicated
servers without virtualization) does not crash.

Regards
Bjoern

-- 
xmpp bjo@schafweide.org 

Thread overview: 7 messages
2015-08-17 21:47 [B.A.T.M.A.N.] Kernel crashes when using more than one interface in bat0 Bjoern Franke
2015-08-18  8:08 ` Bastian Bittorf
2015-08-18 16:35   ` Bjoern Franke
2015-08-20 10:08     ` Hans-Werner Hilse
2015-08-20 11:10       ` Matthias Schiffer
2015-08-23 17:59       ` Bjoern Franke
2015-08-20 11:09     ` Simon Wunderlich