* [B.A.T.M.A.N.] Kernel crashes when using more than one interface in bat0
@ 2015-08-17 21:47 Bjoern Franke
2015-08-18 8:08 ` Bastian Bittorf
0 siblings, 1 reply; 7+ messages in thread
From: Bjoern Franke @ 2015-08-17 21:47 UTC (permalink / raw)
To: b.a.t.m.a.n
Hi,
we are using batman-adv in our Freifunk network for meshing over
several interfaces. The nodes are connected to vpn-servers via fastd,
the vpn-servers are connected via tincd or gretap.
When we put the fastd interface and the tinc (or gretap) interface into
bat0, the kernel crashes after some time. With 2013.4, no crashes
occurred, but since the update to 2014.3 the issue happens sometimes
after 5 minutes, sometimes after 3 days. Upgrading to 2015.1 did not
solve the problem; if anything, the crashes seem to occur more often.
Unfortunately we have no logs yet, because after a crash no SSH
access to the servers is possible.
nc / mm / bl are disabled and seem to have no effect on this issue.
Regards
Bjoern
--
xmpp bjo@schafweide.org
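For readers trying to reproduce this setup, the configuration described above (two tunnel interfaces added to bat0, with nc / mm / bl turned off) could be sketched roughly as follows. This is only an assumption about how such a gateway might be set up, not the configuration from the report: the interface names mesh-vpn and tinc0 and the helper batctl_setup_cmds are hypothetical. The helper only prints the batctl commands so they can be reviewed before being run as root.

```shell
#!/bin/sh
# Print (do not run) the batctl commands that would add the given
# interfaces to a mesh interface and disable nc, mm and bl on it.
# Usage: batctl_setup_cmds MESH_IFACE HARD_IFACE...
batctl_setup_cmds() {
    mesh="$1"
    shift
    for hardif in "$@"; do
        echo "batctl -m $mesh if add $hardif"
    done
    # nc = network coding, mm = multicast mode, bl = bridge loop avoidance
    for feature in nc mm bl; do
        echo "batctl -m $mesh $feature 0"
    done
}

# Interface names below are assumptions, not taken from the report.
batctl_setup_cmds bat0 mesh-vpn tinc0
```

Printing instead of executing keeps the sketch safe to run on a machine without batman-adv loaded.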
* Re: [B.A.T.M.A.N.] Kernel crashes when using more than one interface in bat0
2015-08-17 21:47 [B.A.T.M.A.N.] Kernel crashes when using more than one interface in bat0 Bjoern Franke
@ 2015-08-18 8:08 ` Bastian Bittorf
2015-08-18 16:35 ` Bjoern Franke
0 siblings, 1 reply; 7+ messages in thread
From: Bastian Bittorf @ 2015-08-18 8:08 UTC (permalink / raw)
To: The list for a Better Approach To Mobile Ad-hoc Networking
* Bjoern Franke <bjo@nord-west.org> [18.08.2015 09:47]:
> Unfortunately we have no logs yet, because after a crash no SSH
> access to the servers is possible.
Try setting:
/sbin/sysctl -w kernel.panic_on_oops=1
/sbin/sysctl -w kernel.panic=10
/sbin/sysctl -w vm.panic_on_oom=2
If a device crashes (it will now reboot), you will find the
crash log in the file:
/sys/kernel/debug/crashlog
bye, bastian - happy crashing!
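The sysctl -w settings above last only until the next reboot. A minimal sketch for keeping them across reboots, assuming a distribution that reads drop-in files from /etc/sysctl.d/ (the file name 90-panic.conf and the helper write_panic_sysctl are hypothetical, not part of the original advice):

```shell
#!/bin/sh
# Write the three panic-related settings from the message above to a
# sysctl configuration file. The default path is an assumed example.
write_panic_sysctl() {
    conf="${1:-/etc/sysctl.d/90-panic.conf}"
    cat > "$conf" <<'EOF'
# Turn a kernel oops into a panic
kernel.panic_on_oops = 1
# Reboot 10 seconds after a panic
kernel.panic = 10
# Panic instead of invoking the OOM killer
vm.panic_on_oom = 2
EOF
}
```

Calling write_panic_sysctl as root and then running sysctl -p /etc/sysctl.d/90-panic.conf should load the values immediately.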
* Re: [B.A.T.M.A.N.] Kernel crashes when using more than one interface in bat0
2015-08-18 8:08 ` Bastian Bittorf
@ 2015-08-18 16:35 ` Bjoern Franke
2015-08-20 10:08 ` Hans-Werner Hilse
2015-08-20 11:09 ` Simon Wunderlich
0 siblings, 2 replies; 7+ messages in thread
From: Bjoern Franke @ 2015-08-18 16:35 UTC (permalink / raw)
To: The list for a Better Approach To Mobile Ad-hoc Networking
On Tuesday, 18.08.2015, 10:08 +0200, Bastian Bittorf wrote:
> * Bjoern Franke <bjo@nord-west.org> [18.08.2015 09:47]:
> > Unfortunately we have no logs yet, because after a crash no SSH
> > access to the servers is possible.
>
> Try setting:
>
> /sbin/sysctl -w kernel.panic_on_oops=1
> /sbin/sysctl -w kernel.panic=10
> /sbin/sysctl -w vm.panic_on_oom=2
>
> If a device crashes (it will now reboot), you will find the
> crash log in the file:
>
> /sys/kernel/debug/crashlog
>
Thanks for the hint. It did not work on the Debian machines, but I got
the systems running with crashkernel enabled. Now we have captured the
first crash:
https://p.rrbone.net/paste/nnNHrIJI#oHfBMOs2
Regards
Bjoern
--
xmpp bjo@schafweide.org
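For anyone wanting to capture a similar dump: "crashkernel enabled" usually means that memory for a kdump capture kernel was reserved with a crashkernel= boot parameter. A small sketch for checking that, assuming a Linux system with /proc/cmdline (the helper name has_crashkernel is hypothetical, and the exact kdump setup used on these servers is not stated in the thread):

```shell
#!/bin/sh
# Succeed if the given kernel command line file (default: /proc/cmdline)
# contains a crashkernel= parameter, i.e. memory was requested for a
# kdump capture kernel.
has_crashkernel() {
    cmdline_file="${1:-/proc/cmdline}"
    grep -Eq '(^|[[:space:]])crashkernel=' "$cmdline_file"
}

if has_crashkernel; then
    echo "crashkernel= is set on the kernel command line"
else
    echo "no crashkernel= parameter found"
fi
```

On kernels with kexec support, /sys/kernel/kexec_crash_loaded reads 1 once a capture kernel has actually been loaded, which is a useful second check.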
* Re: [B.A.T.M.A.N.] Kernel crashes when using more than one interface in bat0
2015-08-18 16:35 ` Bjoern Franke
@ 2015-08-20 10:08 ` Hans-Werner Hilse
2015-08-20 11:10 ` Matthias Schiffer
2015-08-23 17:59 ` Bjoern Franke
2015-08-20 11:09 ` Simon Wunderlich
1 sibling, 2 replies; 7+ messages in thread
From: Hans-Werner Hilse @ 2015-08-20 10:08 UTC (permalink / raw)
To: The list for a Better Approach To Mobile Ad-hoc Networking
Hi,
On 2015-08-18 18:35, Bjoern Franke wrote:
> Thanks for the hint. It did not work on the Debian machines, but I got
> the systems running with crashkernel enabled. Now we have captured the
> first crash:
> https://p.rrbone.net/paste/nnNHrIJI#oHfBMOs2
We've seen these on our Goettingen Freifunk gateways, too. There, too,
batadv_frag_purge_orig was the smoking gun. However, I didn't report it,
because:
- first and foremost, we were using the outdated legacy 2013.4 version
- it was most probably an issue with RCU lists
- and either disabling SMP or using a much more current kernel fixed it.
So I blamed a buggy RCU implementation in older kernels, plus maybe some
ill behaviour in the old batman-adv codebase. The crashing kernel was
the old debian-wheezy one - pretty old, I'd say.
-hwh
* Re: [B.A.T.M.A.N.] Kernel crashes when using more than one interface in bat0
2015-08-18 16:35 ` Bjoern Franke
2015-08-20 10:08 ` Hans-Werner Hilse
@ 2015-08-20 11:09 ` Simon Wunderlich
1 sibling, 0 replies; 7+ messages in thread
From: Simon Wunderlich @ 2015-08-20 11:09 UTC (permalink / raw)
To: b.a.t.m.a.n
Hi Bjoern,
thanks a lot for reporting this issue - it looks like there are some problems
in the cleanup of OGMs when fragmentation is used.
For now, I've created a ticket here:
https://www.open-mesh.org/issues/223
Thanks,
Simon
On Tuesday 18 August 2015 18:35:17 Bjoern Franke wrote:
> On Tuesday, 18.08.2015, 10:08 +0200, Bastian Bittorf wrote:
> > * Bjoern Franke <bjo@nord-west.org> [18.08.2015 09:47]:
> > > Unfortunately we have no logs yet, because after a crash no SSH
> > > access to the servers is possible.
> >
> > Try setting:
> >
> > /sbin/sysctl -w kernel.panic_on_oops=1
> > /sbin/sysctl -w kernel.panic=10
> > /sbin/sysctl -w vm.panic_on_oom=2
> >
> > If a device crashes (it will now reboot), you will find the
> > crash log in the file:
> >
> > /sys/kernel/debug/crashlog
>
> Thanks for the hint. It did not work on the Debian machines, but I got
> the systems running with crashkernel enabled. Now we have captured the
> first crash:
> https://p.rrbone.net/paste/nnNHrIJI#oHfBMOs2
>
> Regards
> Bjoern
* Re: [B.A.T.M.A.N.] Kernel crashes when using more than one interface in bat0
2015-08-20 10:08 ` Hans-Werner Hilse
@ 2015-08-20 11:10 ` Matthias Schiffer
2015-08-23 17:59 ` Bjoern Franke
1 sibling, 0 replies; 7+ messages in thread
From: Matthias Schiffer @ 2015-08-20 11:10 UTC (permalink / raw)
To: b.a.t.m.a.n
On 08/20/2015 12:08 PM, Hans-Werner Hilse wrote:
> Hi,
>
> On 2015-08-18 18:35, Bjoern Franke wrote:
>
>> Thanks for the hint. It did not work on the Debian machines, but I got
>> the systems running with crashkernel enabled. Now we have captured the
>> first crash:
>> https://p.rrbone.net/paste/nnNHrIJI#oHfBMOs2
>
> We've seen these on our Goettingen Freifunk gateways, too. There, too,
> batadv_frag_purge_orig was the smoking gun. However, I didn't report it,
> because:
>
> - first and foremost, we were using the outdated legacy 2013.4 version
> - it was most probably an issue with RCU lists
> - and either disabling SMP or using a much more current kernel fixed it.
>
> So I blamed a buggy RCU implementation in older kernels, plus maybe some
> ill behaviour in the old batman-adv codebase. The crashing kernel was
> the old debian-wheezy one - pretty old, I'd say.
>
> -hwh
This is an independent bug (2013.4 uses a completely different
fragmentation implementation) that has been reported in
https://github.com/freifunk-gluon/batman-adv-legacy/issues/1 . Please
don't bother the upstream BATMAN developers with batman-adv-legacy bugs.
Matthias
* Re: [B.A.T.M.A.N.] Kernel crashes when using more than one interface in bat0
2015-08-20 10:08 ` Hans-Werner Hilse
2015-08-20 11:10 ` Matthias Schiffer
@ 2015-08-23 17:59 ` Bjoern Franke
1 sibling, 0 replies; 7+ messages in thread
From: Bjoern Franke @ 2015-08-23 17:59 UTC (permalink / raw)
To: The list for a Better Approach To Mobile Ad-hoc Networking
Hi,
> We've seen these on our Goettingen Freifunk gateways, too. There,
> too,
> batadv_frag_purge_orig was the smoking gun. However, I didn't report
> it,
> because:
>
> - first and foremost, we were using the outdated legacy 2013.4
> version
We also had some other issues with older versions, so we upgraded to
2015.1, hoping the gateways would run stably.
> - it was most probably an issue with RCU lists
> - and either disabling SMP or using a much more current kernel fixed
> it.
Did you build your own kernels with SMP disabled?
> So I blamed a buggy RCU implementation in older kernels, plus maybe
> some
> ill behaviour in the old batman-adv codebase. The crashing kernel was
> the old debian-wheezy one - pretty old, I'd say.
>
We are running 3.16 and have partially upgraded to 4.1 for testing purposes.
But we also see some "general protection fault: 0000 [#1] SMP" crashes
unrelated to batman on the gateways, on different hardware. For the record,
a KVM gateway (the other ones are dedicated servers without
virtualization) does not crash.
Regards
Bjoern
--
xmpp bjo@schafweide.org