b.a.t.m.a.n.lists.open-mesh.org archive mirror
 help / color / mirror / Atom feed
* [B.A.T.M.A.N.] [RFC] batman-adv: Remove recursive bat-on-bat netdevice check
@ 2015-11-13 19:46 Sven Eckelmann
  2016-01-19  0:39 ` Antonio Quartulli
  0 siblings, 1 reply; 5+ messages in thread
From: Sven Eckelmann @ 2015-11-13 19:46 UTC (permalink / raw)
  To: b.a.t.m.a.n; +Cc: Jetmir Gigollai

batman-adv checks in different situation if a new device is already on top
of a different batman-adv device. This is done by getting the iflink of a
device and all its parent. It assumes that this iflink is always a parent
device in an acyclic graph. But this assumption is broken by devices like
veth which are actually a pair of two devices linked to each other. The
recursive check would therefore get veth0 when calling dev_get_iflink on
veth1. And it gets veth0 when calling dev_get_iflink with veth1.

Creating a veth pair and loading batman-adv freezes parts of the system

    ip link add veth0 type veth peer name veth1
    modprobe batman-adv

An RCU stall will be detected on the system which cannot be fixed.

    INFO: rcu_sched self-detected stall on CPU
            1: (5264 ticks this GP) idle=3e9/140000000000001/0
    softirq=144683/144686 fqs=5249
             (t=5250 jiffies g=46 c=45 q=43)
    Task dump for CPU 1:
    insmod          R  running task        0   247    245 0x00000008
     ffffffff8151f140 ffffffff8107888e ffff88000fd141c0 ffffffff8151f140
     0000000000000000 ffffffff81552df0 ffffffff8107b420 0000000000000001
     ffff88000e3fa700 ffffffff81540b00 ffffffff8107d667 0000000000000001
    Call Trace:
     <IRQ>  [<ffffffff8107888e>] ? rcu_dump_cpu_stacks+0x7e/0xd0
     [<ffffffff8107b420>] ? rcu_check_callbacks+0x3f0/0x6b0
     [<ffffffff8107d667>] ? hrtimer_run_queues+0x47/0x180
     [<ffffffff8107cf9d>] ? update_process_times+0x2d/0x50
     [<ffffffff810873fb>] ? tick_handle_periodic+0x1b/0x60
     [<ffffffff810290ae>] ? smp_trace_apic_timer_interrupt+0x5e/0x90
     [<ffffffff813bbae2>] ? apic_timer_interrupt+0x82/0x90
     <EOI>  [<ffffffff812c3fd7>] ? __dev_get_by_index+0x37/0x40
     [<ffffffffa0031f3e>] ? batadv_hard_if_event+0xee/0x3a0 [batman_adv]
     [<ffffffff812c5801>] ? register_netdevice_notifier+0x81/0x1a0
    [...]

Signed-off-by: Sven Eckelmann <sven@narfation.org>
Cc: Jetmir Gigollai <jetmir_gigollaj@web.de>
---
 compat-include/linux/netdevice.h |  6 ------
 net/batman-adv/hard-interface.c  | 42 ----------------------------------------
 2 files changed, 48 deletions(-)

diff --git a/compat-include/linux/netdevice.h b/compat-include/linux/netdevice.h
index f19f624..0e4fcc8 100644
--- a/compat-include/linux/netdevice.h
+++ b/compat-include/linux/netdevice.h
@@ -111,10 +111,4 @@ static inline int batadv_netdev_set_master(struct net_device *slave,
 
 #endif /* < KERNEL_VERSION(3, 17, 0) */
 
-#if LINUX_VERSION_CODE < KERNEL_VERSION(4, 1, 0)
-
-#define dev_get_iflink(_net_dev) ((_net_dev)->iflink)
-
-#endif /* < KERNEL_VERSION(3, 19, 0) */
-
 #endif	/* _NET_BATMAN_ADV_COMPAT_LINUX_NETDEVICE_H_ */
diff --git a/net/batman-adv/hard-interface.c b/net/batman-adv/hard-interface.c
index bef9147..0c2643d 100644
--- a/net/batman-adv/hard-interface.c
+++ b/net/batman-adv/hard-interface.c
@@ -75,44 +75,6 @@ out:
 	return hard_iface;
 }
 
-/**
- * batadv_is_on_batman_iface - check if a device is a batman iface descendant
- * @net_dev: the device to check
- *
- * If the user creates any virtual device on top of a batman-adv interface, it
- * is important to prevent this new interface to be used to create a new mesh
- * network (this behaviour would lead to a batman-over-batman configuration).
- * This function recursively checks all the fathers of the device passed as
- * argument looking for a batman-adv soft interface.
- *
- * Return: true if the device is descendant of a batman-adv mesh interface (or
- * if it is a batman-adv interface itself), false otherwise
- */
-static bool batadv_is_on_batman_iface(const struct net_device *net_dev)
-{
-	struct net_device *parent_dev;
-	bool ret;
-
-	/* check if this is a batman-adv mesh interface */
-	if (batadv_softif_is_valid(net_dev))
-		return true;
-
-	/* no more parents..stop recursion */
-	if (dev_get_iflink(net_dev) == 0 ||
-	    dev_get_iflink(net_dev) == net_dev->ifindex)
-		return false;
-
-	/* recurse over the parent device */
-	parent_dev = __dev_get_by_index(&init_net, dev_get_iflink(net_dev));
-	/* if we got a NULL parent_dev there is something broken.. */
-	if (WARN(!parent_dev, "Cannot find parent device"))
-		return false;
-
-	ret = batadv_is_on_batman_iface(parent_dev);
-
-	return ret;
-}
-
 static int batadv_is_valid_iface(const struct net_device *net_dev)
 {
 	if (net_dev->flags & IFF_LOOPBACK)
@@ -124,10 +86,6 @@ static int batadv_is_valid_iface(const struct net_device *net_dev)
 	if (net_dev->addr_len != ETH_ALEN)
 		return 0;
 
-	/* no batman over batman */
-	if (batadv_is_on_batman_iface(net_dev))
-		return 0;
-
 	return 1;
 }
 
-- 
2.6.2


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [B.A.T.M.A.N.] [RFC] batman-adv: Remove recursive bat-on-bat netdevice check
  2015-11-13 19:46 [B.A.T.M.A.N.] [RFC] batman-adv: Remove recursive bat-on-bat netdevice check Sven Eckelmann
@ 2016-01-19  0:39 ` Antonio Quartulli
  2016-01-19  1:22   ` Andrew Lunn
  0 siblings, 1 reply; 5+ messages in thread
From: Antonio Quartulli @ 2016-01-19  0:39 UTC (permalink / raw)
  To: Sven Eckelmann
  Cc: Jetmir Gigollai,
	The list for a Better Approach To Mobile Ad-hoc Networking

[-- Attachment #1: Type: text/plain, Size: 1553 bytes --]

Hi Sven and thanks for this RFC. Sorry for taking soo long to comment.



On 14/11/15 03:46, Sven Eckelmann wrote:
> batman-adv checks in different situation if a new device is already on top
> of a different batman-adv device. This is done by getting the iflink of a
> device and all its parent. It assumes that this iflink is always a parent
> device in an acyclic graph. But this assumption is broken by devices like
> veth which are actually a pair of two devices linked to each other. The
> recursive check would therefore get veth0 when calling dev_get_iflink on
> veth1. And it gets veth0 when calling dev_get_iflink with veth1.
> 

I agree that this check implemented this way represents a problem.
However I also believe that we should still have some kind of prevention
against this particular scenario (chain of batman interfaces), because
if ignored it could lead to troubles.

Unfortunately I don't know how to implement this check in an elegant and
extendible manner.

First of all we should add a check that follows the master interface,
but at the same time we should still follow the iflink, otherwise we
can't check relationships like VLAN_DEV->REAL_DEV. But how to implement
the latter without hitting the cyclic case is not clear to me ..

An option is to add a specific check for veth and break the recursion,
but this is not really nice, because the next time a cyclic interface
type will be introduced our check will become troublesome again.

Suggestions?


Cheers,

-- 
Antonio Quartulli


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [B.A.T.M.A.N.] [RFC] batman-adv: Remove recursive bat-on-bat netdevice check
  2016-01-19  0:39 ` Antonio Quartulli
@ 2016-01-19  1:22   ` Andrew Lunn
  2016-01-19 12:34     ` Antonio Quartulli
  0 siblings, 1 reply; 5+ messages in thread
From: Andrew Lunn @ 2016-01-19  1:22 UTC (permalink / raw)
  To: The list for a Better Approach To Mobile Ad-hoc Networking
  Cc: Jetmir Gigollai

On Tue, Jan 19, 2016 at 08:39:07AM +0800, Antonio Quartulli wrote:
> Hi Sven and thanks for this RFC. Sorry for taking soo long to comment.
> 
> 
> 
> On 14/11/15 03:46, Sven Eckelmann wrote:
> > batman-adv checks in different situation if a new device is already on top
> > of a different batman-adv device. This is done by getting the iflink of a
> > device and all its parent. It assumes that this iflink is always a parent
> > device in an acyclic graph. But this assumption is broken by devices like
> > veth which are actually a pair of two devices linked to each other. The
> > recursive check would therefore get veth0 when calling dev_get_iflink on
> > veth1. And it gets veth0 when calling dev_get_iflink with veth1.
> > 
> 
> I agree that this check implemented this way represents a problem.
> However I also believe that we should still have some kind of prevention
> against this particular scenario (chain of batman interfaces), because
> if ignored it could lead to troubles.
> 
> Unfortunately I don't know how to implement this check in an elegant and
> extendible manner.
> 
> First of all we should add a check that follows the master interface,
> but at the same time we should still follow the iflink, otherwise we
> can't check relationships like VLAN_DEV->REAL_DEV. But how to implement
> the latter without hitting the cyclic case is not clear to me ..
> 
> An option is to add a specific check for veth and break the recursion,
> but this is not really nice, because the next time a cyclic interface
> type will be introduced our check will become troublesome again.
> 
> Suggestions?

Hi Antonio

I've got a set of patches i went to send out as soon, once net-next
reopens. They add support for network name spaces.

One of the issues i had is in this piece of code and with veth. The
other end of the veth can be in a different name space. Hence you
cannot look it up using the ifindex in the current name space. So i
made the code just exit with an O.K. if the parent device cannot be
found.
 
Something to think about when redesigning this code. The parent could
be in a different network stack!

You probably can handle veth and both ends in the same namespace. It
should not be too difficult to detect two interfaces which are the
parent of each other. You just need to keep a bit more state when
doing the recursion. It does however get more complex when there are
more than two devices in a parent loop. But can that happen?

   Andrew

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [B.A.T.M.A.N.] [RFC] batman-adv: Remove recursive bat-on-bat netdevice check
  2016-01-19  1:22   ` Andrew Lunn
@ 2016-01-19 12:34     ` Antonio Quartulli
  2016-01-19 14:40       ` Andrew Lunn
  0 siblings, 1 reply; 5+ messages in thread
From: Antonio Quartulli @ 2016-01-19 12:34 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Jetmir Gigollai,
	The list for a Better Approach To Mobile Ad-hoc Networking

[-- Attachment #1: Type: text/plain, Size: 3339 bytes --]

On 19/01/16 09:22, Andrew Lunn wrote:
> On Tue, Jan 19, 2016 at 08:39:07AM +0800, Antonio Quartulli wrote:
>> Hi Sven and thanks for this RFC. Sorry for taking soo long to comment.
>>
>>
>>
>> On 14/11/15 03:46, Sven Eckelmann wrote:
>>> batman-adv checks in different situation if a new device is already on top
>>> of a different batman-adv device. This is done by getting the iflink of a
>>> device and all its parent. It assumes that this iflink is always a parent
>>> device in an acyclic graph. But this assumption is broken by devices like
>>> veth which are actually a pair of two devices linked to each other. The
>>> recursive check would therefore get veth0 when calling dev_get_iflink on
>>> veth1. And it gets veth0 when calling dev_get_iflink with veth1.
>>>
>>
>> I agree that this check implemented this way represents a problem.
>> However I also believe that we should still have some kind of prevention
>> against this particular scenario (chain of batman interfaces), because
>> if ignored it could lead to troubles.
>>
>> Unfortunately I don't know how to implement this check in an elegant and
>> extendible manner.
>>
>> First of all we should add a check that follows the master interface,
>> but at the same time we should still follow the iflink, otherwise we
>> can't check relationships like VLAN_DEV->REAL_DEV. But how to implement
>> the latter without hitting the cyclic case is not clear to me ..
>>
>> An option is to add a specific check for veth and break the recursion,
>> but this is not really nice, because the next time a cyclic interface
>> type will be introduced our check will become troublesome again.
>>
>> Suggestions?
> 
> Hi Antonio
> 
> I've got a set of patches i went to send out as soon, once net-next
> reopens. They add support for network name spaces.
> 
> One of the issues i had is in this piece of code and with veth. The
> other end of the veth can be in a different name space. Hence you
> cannot look it up using the ifindex in the current name space. So i
> made the code just exit with an O.K. if the parent device cannot be
> found.

Is this really ok ? Isn't it possible to create "loops" by jumping from
one namespace to the other ?

>  
> Something to think about when redesigning this code. The parent could
> be in a different network stack!

exactly my point above! :)
So this means that an interface A might have A.iflink pointing to a
device in another namespace ? Is this what you are saying ? will there
be any way to "retrieve" the namespace of such device?

> 
> You probably can handle veth and both ends in the same namespace. It
> should not be too difficult to detect two interfaces which are the
> parent of each other. You just need to keep a bit more state when
> doing the recursion. It does however get more complex when there are
> more than two devices in a parent loop. But can that happen?

first of all I'd like to answer the question: which device type can
create a loop? only veth? or there are other possibilities?

But having more than one interface in the parent loop should probably
never happen..

This said, what's your plan about this code? How are you going to change
it? :)


Thank you very much for your feedback Andrew!

Cheers,

-- 
Antonio Quartulli


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [B.A.T.M.A.N.] [RFC] batman-adv: Remove recursive bat-on-bat netdevice check
  2016-01-19 12:34     ` Antonio Quartulli
@ 2016-01-19 14:40       ` Andrew Lunn
  0 siblings, 0 replies; 5+ messages in thread
From: Andrew Lunn @ 2016-01-19 14:40 UTC (permalink / raw)
  To: Antonio Quartulli
  Cc: Jetmir Gigollai,
	The list for a Better Approach To Mobile Ad-hoc Networking

> > One of the issues i had is in this piece of code and with veth. The
> > other end of the veth can be in a different name space. Hence you
> > cannot look it up using the ifindex in the current name space. So i
> > made the code just exit with an O.K. if the parent device cannot be
> > found.
> 
> Is this really ok ? Isn't it possible to create "loops" by jumping from
> one namespace to the other ?

Quite possible. I took the easy option:

diff --git a/net/batman-adv/hard-interface.c b/net/batman-adv/hard-interface.c
index a5ce58a..a7a92ae 100644
--- a/net/batman-adv/hard-interface.c
+++ b/net/batman-adv/hard-interface.c
@@ -104,8 +104,12 @@ static bool batadv_is_on_batman_iface(const struct net_device *net_dev)
 
        /* recurse over the parent device */
        parent_dev = __dev_get_by_index(net, dev_get_iflink(net_dev));
-       /* if we got a NULL parent_dev there is something broken.. */
-       if (WARN(!parent_dev, "Cannot find parent device"))
+       /* if we got a NULL parent_dev, it might mean the parent is in
+        * a different network namespace. Things then get really
+        * complex, so lets just assume the user knows what she is
+        * doing.
+        */
+       if (!parent_dev)
                return false;
 
        ret = batadv_is_on_batman_iface(parent_dev);

I didn't spend the time to think all the possible loop situations
though. It is not too bad in that the software interface cannot be
moved into a different netns. Also, all the hard interfaces have to be
in the same netns as the soft interface. If you move a hard interface
into a different netns, it will automatically be removed from the soft
interface using the notification mechanism.

There are some other restrictions which help avoid loops. You cannot
in general move 'soft interfaces' between netns. You cannot move
bridge interfaces, tunnel interfaces, team/bonding interfaces, etc.

> exactly my point above! :)
> So this means that an interface A might have A.iflink pointing to a
> device in another namespace ?

Partly correct. The current code for getting the parent is:

parent_dev = __dev_get_by_index(net, dev_get_iflink(net_dev));

This is actually totally broken, when used with veth. It possible can
return a completely different interface than the parent, since the
ifindex returned for a veth could be in a different namespace. ifindex
is only unique within a namespace, not across name spaces.

I need to look deeper into the code and see if there is a different
parenting mechanism that can be used.

	  Andrew

^ permalink raw reply related	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2016-01-19 14:40 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-11-13 19:46 [B.A.T.M.A.N.] [RFC] batman-adv: Remove recursive bat-on-bat netdevice check Sven Eckelmann
2016-01-19  0:39 ` Antonio Quartulli
2016-01-19  1:22   ` Andrew Lunn
2016-01-19 12:34     ` Antonio Quartulli
2016-01-19 14:40       ` Andrew Lunn

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).