* [B.A.T.M.A.N.] Roaming issues in basic network
@ 2014-07-01  2:22 Simon Wong
  2014-07-01  8:50 ` Linus Lüssing
From: Simon Wong @ 2014-07-01  2:22 UTC
  To: b.a.t.m.a.n

Hi list,

I'm new to batman-adv, and I'm setting up some basic tests to verify
my use case. I'm running into a roaming issue.

Consider a basic 2-node network running batman-adv 2014.2 with default
settings on OpenWrt AA. Let the routing nodes be A and B. Each node
has 2 wireless interfaces: wlan0 is for the mesh, and wlan1 is the AP.
The interfaces are bridged per the batman-adv start guide.

Let there be 2 wifi clients, client 1 and 2. Initially, both clients
are wirelessly attached to node A. Client 2 can ping 1 and the nodes.
Client 2 can also Telnet into node A and B, so all is fine.

I take client 2 and roam to node B. Client 2 can no longer ping client
1 and that is the issue.

If client 2 roams back to A, pings to client 1 work again. A few more
observations while the pings are failing:

1) Client 2 can ping the nodes and the Telnet sessions to the nodes are fine.
2) Node B local translation table says 2 is at B, node A local
translation table says 1 is at A. So the local translation tables
check out.
3) Both nodes A and B can ping client 1, so client 1 is still up.
4) Running 'batctl td wlan1' on node A shows ICMP requests and
replies, but 'batctl td bat0' shows only requests. So client 1 is
getting the ICMP packets and is responding.
5) If I run 'iw wlan1 station del <client 2 mac>' on node A, pings
work again.

It almost looks like the wifi driver (ath5k) is blocking data for
roamed clients that were once attached to it. So this issue might not
be a batman-adv issue but a driver one. Has anyone run into this problem?

Thanks,

Simon

* Re: [B.A.T.M.A.N.] Roaming issues in basic network
  2014-07-01  2:22 [B.A.T.M.A.N.] Roaming issues in basic network Simon Wong
@ 2014-07-01  8:50 ` Linus Lüssing
  2014-07-01 11:26   ` Antonio Quartulli
From: Linus Lüssing @ 2014-07-01  8:50 UTC
  To: The list for a Better Approach To Mobile Ad-hoc Networking

Hi Simon,

On Mon, Jun 30, 2014 at 07:22:51PM -0700, Simon Wong wrote:
> Let there be 2 wifi clients, client 1 and 2. Initially, both clients
> are wirelessly attached to node A. Client 2 can ping 1 and the nodes.
> Client 2 can also Telnet into node A and B, so all is fine.
> 
> I take client 2 and roam to node B. Client 2 can no longer ping client
> 1 and that is the issue.

At the Wireless Battle Mesh a few months ago we discussed just
such a (until now?) hypothetical problem. Maybe it applies here,
maybe it doesn't:

It could be a problem with a not-yet-updated MAC address table in
the bridge, causing the bridge on node A not to forward ICMP
requests from client 1 towards client 2.

Questions: Are your clients using IPv4, IPv6 or both? Are your
clients issuing gratuitous ARP replies or ICMPv6 unsolicited
Neighbor Advertisements upon roaming? Is this a permanent problem
or are clients 1 and 2 able to reach each other again after a
while? In your tests, did client 1 ping client 2 or the other way
round?

To check whether it is a problem with the bridge's MAC learning,
you could try turning the bridges on node A and node B into dumb
hubs:

$ brctl setageing br0 0


Cheers, Linus

* Re: [B.A.T.M.A.N.] Roaming issues in basic network
  2014-07-01  8:50 ` Linus Lüssing
@ 2014-07-01 11:26   ` Antonio Quartulli
  2014-07-02  1:24     ` Simon Wong
From: Antonio Quartulli @ 2014-07-01 11:26 UTC
  To: locborgtus, Linus Lüssing; +Cc: b.a.t.m.a.n

On 01/07/14 10:50, Linus Lüssing wrote:
> It could be a problem with a not-yet-updated MAC address table in
> the bridge, causing the bridge on node A not to forward ICMP
> requests from client 1 towards client 2.
> 

Hey Linus,

I agree that the problem is probably in the bridge, but how can it be an
inconsistency in the table given that the bridge is receiving the Echo
requests from client 2 through bat0?

Shouldn't this immediately update the bridge table to reflect the client
movement (client2 --is-behind--> bat0)?
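
One way to check this expectation directly on node A is to look up which bridge port client 2's MAC has been learned on. A minimal sketch, assuming `brctl showmacs` prints its usual four-column table (port no, mac addr, is local?, ageing timer) and that the bridge is named br0 as in Linus's command; the helper name `fdb_port` and the MAC in the example are placeholders.

```sh
#!/bin/sh
# Hypothetical helper: print the bridge port number on which a MAC address
# was learned, by parsing `brctl showmacs` output read from stdin.
# Assumed columns: port no, mac addr, is local?, ageing timer.
fdb_port() {
  # Skip the header line; compare MACs case-insensitively.
  awk -v mac="$1" 'NR > 1 && tolower($2) == tolower(mac) { print $1 }'
}

# Example on node A (bridge name and MAC are placeholders):
#   brctl showmacs br0 | fdb_port 00:11:22:33:44:55
# Map the printed number to an interface (e.g. with `brctl showstp br0`)
# to see whether client 2 is learned behind bat0 or behind the AP interface.
```

If the port turns out to be bat0 while the pings still fail, the bridge table is consistent and the problem is more likely below the bridge.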


@Simon: are you sure that the client is not associated anymore with node
A at that moment (maybe it was jumping here and there)? You said that
you can fix this situation by deleting the station entry, but is this
station entry obsolete at that point? (meaning: is the inactivity time
high? - you can see this through the "iw dev wlan0 station get <client2
mac>" command before deleting it)
If not, it can be that something wrong is happening at the wifi layer
and given the driver you are using (ath5k) it would not be totally
unexpected.

I am asking this because I expect the station to disappear immediately
in case of roaming (the client usually deauthenticates itself before
associating with the new AP). Still, we can have cases when this does
not happen, but the AP should be able to react properly.
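
The inactivity check above can be scripted so the value is captured right before the entry is deleted. A minimal sketch, assuming the usual `iw ... station get` output in which one line reads `inactive time: <n> ms`; the helper names `inactive_ms` and `is_stale` are hypothetical, and the interface and MAC in the example are placeholders (Simon's own commands use wlan1 for the AP).

```sh
#!/bin/sh
# Hypothetical helper: print the "inactive time" value (in ms) from
# `iw dev <if> station get <mac>` output read from stdin.
inactive_ms() {
  # Default whitespace splitting: "inactive" "time:" "<n>" "ms"
  awk '/inactive time:/ { print $3 }'
}

# Hypothetical helper: succeed if the entry has been idle for at least $1 ms.
is_stale() {
  ms=$(inactive_ms)
  [ -n "$ms" ] && [ "$ms" -ge "$1" ]
}

# Example on node A (interface and MAC are placeholders):
#   iw dev wlan1 station get 00:11:22:33:44:55 | inactive_ms
```

A value that keeps growing points to a stale entry; one that keeps dropping back towards 0 means something is still refreshing it.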


Cheers,

-- 
Antonio Quartulli


* Re: [B.A.T.M.A.N.] Roaming issues in basic network
  2014-07-01 11:26   ` Antonio Quartulli
@ 2014-07-02  1:24     ` Simon Wong
  2014-07-02  5:58       ` Antonio Quartulli
From: Simon Wong @ 2014-07-02  1:24 UTC
  To: Antonio Quartulli; +Cc: b.a.t.m.a.n

A few more observations:

- Client 1 is a Win 7 machine, and Client 2 I have tried a Win 7 and
OSX machine. In both cases the behavior is repeatable.

- All clients are on IPv4 only.

- I ran a Wireshark capture on the roaming client; no gratuitous ARP
replies were seen during the roam

- Client 2 is doing the pinging to Client 1

- The problem is permanent, and can be fixed by one of the below:
  - Manually delete the roaming client via 'iw station del'
  - Restart node A's network stack (/etc/init.d/network restart), but
which client then attaches to which AP is not deterministic.
  - Client 2 roams back to A

- I tried 'brctl setageing' with an ageing time of 0 on node A's bridge; that didn't change the behavior

- Running 'iw station get' on the 2 nodes during the problem yields
some interesting results. On both nodes, the inactive time resets to 0
while the ping is running. If I stop the ping, the inactive time on
both nodes rises as expected.

- Even more strange with 'iw station get' during the problem:
interacting with the Telnet connection from Client 2 to Node A will
also reset the inactive time count for Client 2, and this is while
Client 2 is roamed to node B. On node A, only the tx {bytes, packets}
counters will increase. rx counts do not. On node B, the tx/rx counts
increase as expected.

- I am in a relatively small area, so even if Client 2 roamed to B, it
is still within RF range of both nodes.

I mentioned before that both nodes' local translation tables were
accurate after the roam. I also mentioned that running 'iw station
del' fixes the problem. So I took advantage of this and wrote a
quick hack script to verify. The pseudo code is as follows:

while true
  run batctl tl and get current local client list
  compare current local client list with the last client list
  if (old list has clients that the new list doesn't have)
    run iw station del for those clients
  save current list to last client list
  sleep
done

Terrible hack, but I was able to roam successfully while this script was running.
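
The pseudo code above could look roughly like this as a POSIX shell script. This is a sketch, not the script Simon actually ran: the AP interface name (wlan1), the one-second interval, the `run` argument and the helper names are assumptions, and MACs are pulled from `batctl tl` with a regex rather than by column position, since the exact output layout may vary between batctl versions.

```sh
#!/bin/sh
# Sketch of the workaround: evict stations from the AP interface once
# batman-adv no longer lists them in the local translation table.
# Assumptions: AP interface wlan1, 1 s polling, loop started with "run".
AP_IF=${AP_IF:-wlan1}
INTERVAL=${INTERVAL:-1}

# Print the MAC addresses found in `batctl tl` output, lowercased, one per line.
local_clients() {
  batctl tl | grep -oiE '([0-9a-f]{2}:){5}[0-9a-f]{2}' | tr 'A-F' 'a-f' | sort -u
}

# Print each MAC from list $1 that does not appear in list $2.
departed() {
  for mac in $1; do
    case " $(echo $2) " in
      *" $mac "*) ;;       # still listed as a local client
      *) echo "$mac" ;;    # no longer local: roamed away or left
    esac
  done
}

run() {
  last=$(local_clients)
  while true; do
    sleep "$INTERVAL"
    cur=$(local_clients)
    for mac in $(departed "$last" "$cur"); do
      # Remove the stale station entry; ignore the error if it is not a station.
      iw dev "$AP_IF" station del "$mac" 2>/dev/null
    done
    last=$cur
  done
}

# Only start the polling loop when invoked as: sh tt-evict.sh run
if [ "${1:-}" = run ]; then
  run
fi
```

Running it on every node mirrors the pseudo code; the node's own MACs in the local table never "depart", so they are never touched.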

Thanks,

- Simon

* Re: [B.A.T.M.A.N.] Roaming issues in basic network
  2014-07-02  1:24     ` Simon Wong
@ 2014-07-02  5:58       ` Antonio Quartulli
  2014-07-02  6:22         ` Antonio Quartulli
From: Antonio Quartulli @ 2014-07-02  5:58 UTC
  To: Simon Wong; +Cc: b.a.t.m.a.n

Simon,

On 02/07/14 03:24, Simon Wong wrote:
> 
> - Even more strange with 'iw station get' during the problem:
> interacting with the Telnet connection from Client 2 to Node A will
> also reset the inactive time count for Client 2, and this is while
> Client 2 is roamed to node B. On node A, only the tx {bytes, packets}
> counters will increase. rx counts do not. On node B, the tx/rx counts
> increase as expected.


very stupid question: but the two nodes have different MAC addresses for
wlan1, right ? I expect the answer to be yes, otherwise this would have
probably created more problems...but just to be sure...


However, there is something strange with the AP interface (as you already
pointed out).
Did you see any deauth sent by the client while roaming?

Cheers,

-- 
Antonio Quartulli


* Re: [B.A.T.M.A.N.] Roaming issues in basic network
  2014-07-02  5:58       ` Antonio Quartulli
@ 2014-07-02  6:22         ` Antonio Quartulli
  2014-07-04  7:36           ` Simon Wong
From: Antonio Quartulli @ 2014-07-02  6:22 UTC
  To: Simon Wong; +Cc: b.a.t.m.a.n

On 02/07/14 07:58, Antonio Quartulli wrote:
> Simon,
> 
> On 02/07/14 03:24, Simon Wong wrote:
>>
>> - Even more strange with 'iw station get' during the problem:
>> interacting with the Telnet connection from Client 2 to Node A will
>> also reset the inactive time count for Client 2, and this is while
>> Client 2 is roamed to node B. On node A, only the tx {bytes, packets}
>> counters will increase. rx counts do not. On node B, the tx/rx counts
>> increase as expected.
> 
> 
> very stupid question: but the two nodes have different MAC addresses for
> wlan1, right ? 

ehm, here I meant wlan0 (the AP interface the clients connect to).

-- 
Antonio Quartulli


* Re: [B.A.T.M.A.N.] Roaming issues in basic network
  2014-07-02  6:22         ` Antonio Quartulli
@ 2014-07-04  7:36           ` Simon Wong
  2014-07-04  8:13             ` Antonio Quartulli
From: Simon Wong @ 2014-07-04  7:36 UTC
  To: The list for a Better Approach To Mobile Ad-hoc Networking

Antonio,

I haven't tried monitoring for deauths yet, but I have tried another
device for the AP interface (a USB stick using ath9k_htc, on wlan2). I
was able to reproduce the same inter-AP roaming problem.

I was thinking this could have been a problem with the ath5k driver,
but that now seems less likely.

Another observation:
Say the roaming client, client 2, is attached to node A, and I am monitoring
client 2 on node A via `iw wlan1 station dump`. If I turn off client
2's WiFi or switch SSIDs, client 2 disappears from the station list as
expected; a deauth probably got sent. I am guessing roaming might not
trigger a deauth on the client. In any case, we can't count on deauth
being received anyways.

Hypothesis:
It seems as if the wireless driver/hardware has an internal forwarding
rule. If the AP interface thinks it's got the client, it'll forward
data internally to it and batman never sees the data and thus can't
route it. But since the roam happened and another node has picked up
the roaming client, translation tables updates are still triggered and
states are still synchronized.

What do you think?

Thanks,
- Simon


* Re: [B.A.T.M.A.N.] Roaming issues in basic network
  2014-07-04  7:36           ` Simon Wong
@ 2014-07-04  8:13             ` Antonio Quartulli
From: Antonio Quartulli @ 2014-07-04  8:13 UTC
  To: Simon Wong; +Cc: The list for a Better Approach To Mobile Ad-hoc Networking

Simon,

On 04/07/14 09:36, Simon Wong wrote:
> I am guessing roaming might not
> trigger a deauth on the client. 

at least a disassoc should be sent.

> In any case, we can't count on deauth
> being received anyways.

of course, but we should be able to rely on the layer below working consistently.

> 
> Hypothesis:
> It seems as if the wireless driver/hardware has an internal forwarding
> rule. If the AP interface thinks it's got the client, it'll forward
> data internally to it and batman never sees the data and thus can't
> route it. 

this is exactly how AP mode is supposed to work: if source and
destination are associated with the same interface, unicast traffic is not
delivered to the upper layer but is forwarded directly to the
destination.
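
This rule also shows why a stale station entry on node A can swallow client 1's replies. A toy model of the decision, assuming the station table is just a list of MACs and ignoring everything else the real mac80211 datapath checks; `ap_forward` is a hypothetical name, not kernel code.

```sh
#!/bin/sh
# Toy model (not mac80211 code) of AP-mode forwarding for a unicast frame
# received from an associated station:
#   $1  = destination MAC of the frame
#   $2+ = MACs currently in this AP's station table
ap_forward() {
  dst=$1
  shift
  for sta in "$@"; do
    if [ "$sta" = "$dst" ]; then
      echo "air"      # sent straight back out on this BSS; bridge/bat0 never see it
      return
    fi
  done
  echo "bridge"       # handed to the upper layer (bridge, then possibly bat0)
}

# Node A still holding a stale entry for client 2:
#   ap_forward "$CLIENT2" "$CLIENT1" "$CLIENT2"   -> air
# After `iw ... station del` removed client 2:
#   ap_forward "$CLIENT2" "$CLIENT1"              -> bridge
```

In Simon's setup this is consistent with what he observed: client 1's Echo Replies never reach bat0 ('batctl td bat0' shows only requests), and only node A's tx counters for client 2 increase.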

> But since the roam happened and another node has picked up
> the roaming client, translation tables updates are still triggered and
> states are still synchronized.
> 
> What do you think?
> 

Looks like there is a problem at the wifi layer. batman-adv here is only
playing the role of a generic Distribution System. The current behaviour
would break any other backbone you might use instead of
batman-adv. The inactivity time getting reset while the client is
connected to another AP is definitely bogus behaviour and points
towards a wifi problem.


At this point I would suggest involving the linux-wireless folks
(they have their own mailing list) and describing the
problem to them. As far as I can tell, batman-adv is
unrelated here.

Cheers,


-- 
Antonio Quartulli


Thread overview: 8+ messages
2014-07-01  2:22 [B.A.T.M.A.N.] Roaming issues in basic network Simon Wong
2014-07-01  8:50 ` Linus Lüssing
2014-07-01 11:26   ` Antonio Quartulli
2014-07-02  1:24     ` Simon Wong
2014-07-02  5:58       ` Antonio Quartulli
2014-07-02  6:22         ` Antonio Quartulli
2014-07-04  7:36           ` Simon Wong
2014-07-04  8:13             ` Antonio Quartulli
