* Network stops passing traffic randomly
@ 2020-05-25  8:35 smartwires
  2020-05-25  8:43 ` Sven Eckelmann
  2020-05-28 19:03 ` Steve Newcomb
  0 siblings, 2 replies; 23+ messages in thread
From: smartwires @ 2020-05-25  8:35 UTC (permalink / raw)
  To: b.a.t.m.a.n

I have been battling a weird problem recently. It occurs on two (2) separate networks, one with 2 nodes and the other with 3 nodes. The network is fine and then, all of a sudden, the clients cannot reach the Internet. I have seen this on both OpenWrt 19.07 and 18.07. A reboot of the gateway corrects the problem. This is what I have observed:
1. The gateway is up and running and able to reach the Internet.
2. batctl o shows the neighbor(s).
3. batctl ping [MAC] fails.

root@Main-GW:~# batctl o
[B.A.T.M.A.N. adv openwrt-2018.1-5, MainIF/MAC: mesh0/e8:5b:b7:00:10:73 (bat0/22:55:4d:3e:5f:8f BATMAN_IV)]
   Originator        last-seen (#/255) Nexthop           [outgoingIF]
 * e8:5b:b7:00:10:6b    0.880s   (255) e8:5b:b7:00:10:6b [     mesh0]
root@Main-GW:~# batctl ping e8:5b:b7:00:10:6b
PING e8:5b:b7:00:10:6b (e8:5b:b7:00:10:6b) 20(48) bytes of data
Reply from host e8:5b:b7:00:10:6b timed out
Reply from host e8:5b:b7:00:10:6b timed out
Reply from host e8:5b:b7:00:10:6b timed out
Reply from host e8:5b:b7:00:10:6b timed out


* Re: Network stops passing traffic randomly
  2020-05-25  8:35 Network stops passing traffic randomly smartwires
@ 2020-05-25  8:43 ` Sven Eckelmann
       [not found]   ` <CAL3ir7+RWLrYOzjNQh1VwiKg1sxSgHZMwwqx=9xSfXFnFjE_KQ@mail.gmail.com>
                     ` (2 more replies)
  2020-05-28 19:03 ` Steve Newcomb
  1 sibling, 3 replies; 23+ messages in thread
From: Sven Eckelmann @ 2020-05-25  8:43 UTC (permalink / raw)
  To: b.a.t.m.a.n; +Cc: smartwires

[-- Attachment #1: Type: text/plain, Size: 937 bytes --]

On Monday, 25 May 2020 10:35:12 CEST smartwires@gmail.com wrote:
> I have been battling a weird problem recently, I have this problem occurring on two (2) separate networks, one with 2 nodes and the other with 3 nodes. What happens is the network is fine and all of a sudden the clients can not reach the Internet, This what I have observed. on both Openwrt 19.07, 18.07. A reboot of the gateway corrects the problem. 
[...]
> root@Main-GW:~# batctl ping e8:5b:b7:00:10:6b
> PING e8:5b:b7:00:10:6b (e8:5b:b7:00:10:6b) 20(48) bytes of data
> Reply from host e8:5b:b7:00:10:6b timed out
> Reply from host e8:5b:b7:00:10:6b timed out
> Reply from host e8:5b:b7:00:10:6b timed out
> Reply from host e8:5b:b7:00:10:6b timed out

My first guess is that the underlying interface (mesh0) stopped transporting 
unicast frames. Did you check this by setting an IP on mesh0 and pinging between 
these devices using the IPv4 ping?

Kind regards,
	Sven

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 833 bytes --]


* Re: Network stops passing traffic randomly
       [not found]   ` <CAL3ir7+RWLrYOzjNQh1VwiKg1sxSgHZMwwqx=9xSfXFnFjE_KQ@mail.gmail.com>
@ 2020-05-25 13:22     ` Sven Eckelmann
  2020-05-25 13:45     ` Sven Eckelmann
  1 sibling, 0 replies; 23+ messages in thread
From: Sven Eckelmann @ 2020-05-25 13:22 UTC (permalink / raw)
  To: Daniel Ghansah; +Cc: b.a.t.m.a.n

[-- Attachment #1: Type: text/plain, Size: 382 bytes --]

On Monday, 25 May 2020 15:19:22 CEST Daniel Ghansah wrote:
> Hi Sven,
> Yes I did ping via the IP, there is no response, I am using IPV4

Ok, when your underlying layer (mesh0) is not working then you should not 
expect batman-adv to work.


Btw. the `batctl dc` output doesn't look like you tested the IPv4 ping
on mesh0 but on bat0 - which is not what I asked for.

Kind regards,
	Sven

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 833 bytes --]


* Re: Network stops passing traffic randomly
       [not found]   ` <CAL3ir7+RWLrYOzjNQh1VwiKg1sxSgHZMwwqx=9xSfXFnFjE_KQ@mail.gmail.com>
  2020-05-25 13:22     ` Sven Eckelmann
@ 2020-05-25 13:45     ` Sven Eckelmann
  1 sibling, 0 replies; 23+ messages in thread
From: Sven Eckelmann @ 2020-05-25 13:45 UTC (permalink / raw)
  To: Daniel Ghansah; +Cc: b.a.t.m.a.n

[-- Attachment #1: Type: text/plain, Size: 294 bytes --]

On Monday, 25 May 2020 15:19:22 CEST Daniel Ghansah wrote:
> [B.A.T.M.A.N. adv openwrt-2018.1-5, MainIF/MAC: mesh0/e8:5b:b7:00:10:73

Just noticed another thing - why is your revision of batman-adv so low? The 
OpenWrt 18.06.x version of batman-adv is already at 2018.1-11.
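
As a hedged aside, the version string can be pulled out of the `batctl o` header line with a small shell helper. This is only a sketch: the function name `batadv_version` is made up for illustration, and it assumes the header keeps the format shown in the quoted output.

```shell
# Extract the batman-adv version string from a `batctl o` header line.
# Reads the batctl output on stdin and prints e.g. "openwrt-2018.1-5".
# NOTE: batadv_version is a hypothetical helper name, not part of batctl.
batadv_version() {
    sed -n 's/^\[B\.A\.T\.M\.A\.N\. adv \([^,]*\),.*/\1/p'
}
```

On a node, `batctl o | batadv_version` should print the running version, which on OpenWrt can then be compared with what `opkg list-installed | grep batman-adv` reports.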

Kind regards,
	Sven

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 833 bytes --]


* Re: Network stops passing traffic randomly
  2020-05-25  8:43 ` Sven Eckelmann
       [not found]   ` <CAL3ir7+RWLrYOzjNQh1VwiKg1sxSgHZMwwqx=9xSfXFnFjE_KQ@mail.gmail.com>
@ 2020-05-28  1:05   ` smartwires
  2020-05-28  8:46     ` Sven Eckelmann
       [not found]   ` <cf75d66e-b0ac-632d-34e6-681ed9c6769d@coolheads.com>
  2 siblings, 1 reply; 23+ messages in thread
From: smartwires @ 2020-05-28  1:05 UTC (permalink / raw)
  To: b.a.t.m.a.n

When the problem occurs I have no access to the non-gateway node; when it is working I can ping it.


* Re: Network stops passing traffic randomly
  2020-05-28  1:05   ` smartwires
@ 2020-05-28  8:46     ` Sven Eckelmann
  0 siblings, 0 replies; 23+ messages in thread
From: Sven Eckelmann @ 2020-05-28  8:46 UTC (permalink / raw)
  To: b.a.t.m.a.n; +Cc: smartwires

[-- Attachment #1: Type: text/plain, Size: 463 bytes --]

On Thursday, 28 May 2020 03:05:07 CEST smartwires@gmail.com wrote:
> When the problem occurs I have no access to the non-gateway node, with it is working I can do a ping

You said before that the ping on the underlying device (mesh0) is not working 
when this problem is observed. I would therefore propose to contact the 
developers of the driver for the underlying device to figure out why it is no 
longer able to transport unicast frames.

Kind regards,
	Sven

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 833 bytes --]


* Re: Network stops passing traffic randomly
  2020-05-25  8:35 Network stops passing traffic randomly smartwires
  2020-05-25  8:43 ` Sven Eckelmann
@ 2020-05-28 19:03 ` Steve Newcomb
  2020-05-28 19:19   ` Sven Eckelmann
  2020-05-29  0:13   ` smartwires
  1 sibling, 2 replies; 23+ messages in thread
From: Steve Newcomb @ 2020-05-28 19:03 UTC (permalink / raw)
  To: b.a.t.m.a.n; +Cc: smartwires

I have exactly the same problem with the same symptoms.  I'm running a 
fresh build of OpenWRT trunk.  The problem is not new. On some days it 
happens several times.  On other days it doesn't happen at all.

I'm curious to know what your hardware(s) and driver(s) are, 
Smartwires.  Mine is TPLink Archer [AC]7 v[245].  I'm running the QCA 
988x driver on the 5GHz radio.  My solution is the same as yours: reboot 
the gateway.  It's a terrible solution, having only one advantage, which 
is that it (sort of) works.

I have seen Sven's remark about unicast packets.  I'm not sanguine about 
getting Qualcomm to fix a driver for an older product. The Candela 
Technologies driver refuses to function on the DFS channels (100, 116, 
132), which in my large, populous US residential environment work far, 
far better than channels 36 or 149.

All ideas welcome.

On 5/25/20 4:35 AM, smartwires@gmail.com wrote:
> I have been battling a weird problem recently, I have this problem occurring on two (2) separate networks, one with 2 nodes and the other with 3 nodes. What happens is the network is fine and all of a sudden the clients can not reach the Internet, This what I have observed. on both Openwrt 19.07, 18.07. A reboot of the gateway corrects the problem.
> 1. Gateway is up and running and able the reach the internet.
> 2. batctl o show the neighbor/s
> 3. batctl ping [MAC] fails
>
> root@Main-GW:~# batctl o
> [B.A.T.M.A.N. adv openwrt-2018.1-5, MainIF/MAC: mesh0/e8:5b:b7:00:10:73 (bat0/22:55:4d:3e:5f:8f BATMAN_IV)]
>     Originator        last-seen (#/255) Nexthop           [outgoingIF]
>   * e8:5b:b7:00:10:6b    0.880s   (255) e8:5b:b7:00:10:6b [     mesh0]
> root@Main-GW:~# batctl ping e8:5b:b7:00:10:6b
> PING e8:5b:b7:00:10:6b (e8:5b:b7:00:10:6b) 20(48) bytes of data
> Reply from host e8:5b:b7:00:10:6b timed out
> Reply from host e8:5b:b7:00:10:6b timed out
> Reply from host e8:5b:b7:00:10:6b timed out
> Reply from host e8:5b:b7:00:10:6b timed out


* Re: Network stops passing traffic randomly
  2020-05-28 19:03 ` Steve Newcomb
@ 2020-05-28 19:19   ` Sven Eckelmann
  2020-05-28 19:22     ` Ben Greear
  2020-05-28 20:59     ` Steve Newcomb
  2020-05-29  0:13   ` smartwires
  1 sibling, 2 replies; 23+ messages in thread
From: Sven Eckelmann @ 2020-05-28 19:19 UTC (permalink / raw)
  To: b.a.t.m.a.n, srn; +Cc: smartwires, greearb

[-- Attachment #1: Type: text/plain, Size: 773 bytes --]

On Thursday, 28 May 2020 21:03:20 CEST Steve Newcomb wrote:
> I have seen Sven's remark about unicast packets.  I'm not sanguine about 
> getting Qualcomm to fix a driver for an older product. 

I am slightly confused now about the mention of the candelatech driver.

Just to sync both of you up:

* Are you using ath10k-ct with the ath10k-firmware*-ct or are you using ath10k 
  with the ath10k-firmware*?
* And are you using IBSS or 802.11s (meshpoint with mesh_fwding=0)? Is this 
  encrypted or not encrypted?

> The Candela 
> Technologies driver refuses to function on the DFS channels (100, 116, 
> 132), which in my large, populous US residential environment work far, 
> far better than channels 36 or 149.

Was this reported to Ben Greear?

Kind regards,
	Sven

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 833 bytes --]


* Re: Network stops passing traffic randomly
  2020-05-28 19:19   ` Sven Eckelmann
@ 2020-05-28 19:22     ` Ben Greear
  2020-05-28 20:59     ` Steve Newcomb
  1 sibling, 0 replies; 23+ messages in thread
From: Ben Greear @ 2020-05-28 19:22 UTC (permalink / raw)
  To: Sven Eckelmann, b.a.t.m.a.n, srn; +Cc: smartwires



On 05/28/2020 12:19 PM, Sven Eckelmann wrote:
> On Thursday, 28 May 2020 21:03:20 CEST Steve Newcomb wrote:
>> I have seen Sven's remark about unicast packets.  I'm not sanguine about
>> getting Qualcomm to fix a driver for an older product.
>
> I am slightly confused now about the mentioning of the candelatech driver.
>
> Just to sync both of you up:
>
> * Are you using ath10k-ct with the ath10k-firmware*-ct or are you using ath10k
>   with the ath10k-firmware*?
> * And are you using IBSS or 802.11s (meshpoint with mesh_fwding=0)? Is this
>   encrypted or not encrypted?
>
>> The Candela
>> Technologies driver refuses to function on the DFS channels (100, 116,
>> 132), which in my large, populous US residential environment work far,
>> far better than channels 36 or 149.
>
> Was this reported to Ben Greear?
>
> Kind regards,
> 	Sven

If you are using my firmware, what chipset are you using?

Thanks,
Ben

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com


* Re: Network stops passing traffic randomly
       [not found]   ` <cf75d66e-b0ac-632d-34e6-681ed9c6769d@coolheads.com>
@ 2020-05-28 19:31     ` Sven Eckelmann
  2020-05-28 21:17       ` Steve Newcomb
  0 siblings, 1 reply; 23+ messages in thread
From: Sven Eckelmann @ 2020-05-28 19:31 UTC (permalink / raw)
  To: srn; +Cc: b.a.t.m.a.n

[-- Attachment #1: Type: text/plain, Size: 2078 bytes --]

[please don't send me private mails about batman-adv - unless you have a 
 really good reason to do so. And if not stated otherwise, I must assume
 that you actually wanted to send your message to the mailing list]

On Thursday, 28 May 2020 21:18:36 CEST Steve Newcomb wrote:
> > My first guess is that the underlying interfaces (mesh0) stopped to transport
> > unicast frames. Did you check this by setting an IP on mesh0 and ping between
> > these devices using the IPv4 ping?
> Not sure what the phrase "to set an IP on mesh0" means, if not simply to 
> endow the corresponding bridge with a static IP.  Which is what I'm doing.
> 
> Not sure what "IPv4 ping" means.  I've disabled IPv6, so I'm not using 
> anything but IPv4.

I am assuming that mesh0 is the device which was added to bat0 as slave. 
Please replace this with whatever you are using

    # on device 1
    ip addr add 192.168.23.1/24 dev mesh0

    # on device 2
    ip addr add 192.168.23.2/24 dev mesh0


> If "IPv4 ping" means "the ordinary Linux ping command", then, yes, I've 
> tried that.

The IPv4 ping was just a placeholder for "not batman-adv ping packets". So you 
can also use ICMPv6 if you prefer. Just make sure to send it over the 
underlying ("slave") interface of batman-adv. And not on bat0 or any higher 
layer bridge/vlan/... interface.

With the addresses mentioned earlier:

    # on device 1
    ping 192.168.23.2

    # on device 2
    ping 192.168.23.1

And also observe with tcpdump what is received by the other end.
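
One possible way to do that capture, as a sketch only: the interface name `mesh0` and the `icmp` filter follow the example addresses above, and the wrapper name `capture_ping` is hypothetical.

```shell
# Slave interface from the example above; replace with whatever you are using.
IFACE="mesh0"
# Capture filter: plain IPv4 ICMP only (the ordinary ping, not batctl ping).
FILTER="icmp"

# Run on the receiving device (needs root) while the other device pings it.
# -n: do not resolve names, -e: also print the link-level (MAC) header.
# NOTE: capture_ping is a hypothetical wrapper name used for illustration.
capture_ping() {
    tcpdump -i "$IFACE" -n -e "$FILTER"
}
```

Calling `capture_ping` on device 2 while device 1 runs `ping 192.168.23.2` shows whether the echo requests arrive at all and whether replies are sent back.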

> 100% packet loss when the offline condition occurs.  Batctl 
> o, on the other hand, looks just fine.

Sounds to me like "mesh0" is still able to transport broadcast frames (which 
are used for the OGMs - which "create" the originator lists in `batctl o`). 
And if you cannot send unicast frames anymore on mesh0 then something is wrong 
with the unicast part.

For example, when you are using encryption for the mesh0 link, maybe the group 
key is still set correctly but something happened with the pairwise key and it
is now "corrupted".
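
The distinction between the broadcast path and the unicast path can be sketched as a small shell check. This is an illustration under assumptions, not a definitive diagnosis: the helper names `originator_listed` and `verdict` are invented here, and the wording of the verdicts (in particular the "no OGMs" case) is a guess at the likely cause.

```shell
# Succeeds if the given MAC shows up in `batctl o` output read from stdin,
# i.e. broadcast OGMs from that peer are still being received.
# NOTE: originator_listed and verdict are hypothetical helper names.
originator_listed() {
    grep -qiF -- "$1"
}

# Combine the two observations into a rough verdict.
# $1: "yes" if the peer is listed in `batctl o` (broadcast works)
# $2: "yes" if a plain ping over the slave interface is answered (unicast works)
verdict() {
    case "$1/$2" in
        yes/yes) echo "broadcast and unicast both work on the slave interface" ;;
        yes/no)  echo "broadcast works but unicast fails: suspect driver, firmware or pairwise key" ;;
        no/*)    echo "no OGMs received: link is down or peer is out of range" ;;
        *)       echo "unknown input" ;;
    esac
}

# Example on a live node, using the addresses from this thread:
#   batctl o | originator_listed e8:5b:b7:00:10:6b && seen=yes || seen=no
#   ping -c 3 -I mesh0 192.168.23.2 >/dev/null && uni=yes || uni=no
#   verdict "$seen" "$uni"
```

The symptoms reported in this thread (peer visible in `batctl o`, 100% packet loss on ping) correspond to the `yes/no` case.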

Kind regards,
	Sven

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 833 bytes --]


* Re: Network stops passing traffic randomly
  2020-05-28 19:19   ` Sven Eckelmann
  2020-05-28 19:22     ` Ben Greear
@ 2020-05-28 20:59     ` Steve Newcomb
  2020-05-28 21:28       ` Ben Greear
  1 sibling, 1 reply; 23+ messages in thread
From: Steve Newcomb @ 2020-05-28 20:59 UTC (permalink / raw)
  To: Sven Eckelmann, b.a.t.m.a.n; +Cc: smartwires, greearb



On 5/28/20 3:19 PM, Sven Eckelmann wrote:
> On Thursday, 28 May 2020 21:03:20 CEST Steve Newcomb wrote:
>> I have seen Sven's remark about unicast packets.  I'm not sanguine about
>> getting Qualcomm to fix a driver for an older product.
> I am slightly confused now about the mentioning of the candelatech driver.
>
> Just to sync both of you up:
>
> * Are you using ath10k-ct with the ath10k-firmware*-ct or are you using ath10k
>    with the ath10k-firmware*?
> * And are you using IBSS or 802.11s (meshpoint with mesh_fwding=0)? Is this
>    encrypted or not encrypted?
Speaking only for myself:

CONFIG_PACKAGE_ath10k-firmware-qca988x=y
CONFIG_PACKAGE_kmod-ath10k=y
# CONFIG_PACKAGE_ath10k-firmware-qca988x-ct is not set
# CONFIG_PACKAGE_kmod-ath10k-ct is not set

option mesh_fwding '0'
option encryption 'psk2+ccmp'

>
>> The Candela
>> Technologies driver refuses to function on the DFS channels (100, 116,
>> 132), which in my large, populous US residential environment work far,
>> far better than channels 36 or 149.
> Was this reported to Ben Greear?
Not yet, no.  I am planning to do that when I can get serious about 
testing the adhoc alternative.  I tried it just long enough to discover 
that DFS didn't work.  The log message was something like "forbidden" (I 
can't remember exactly what it said right now, but that was the sense of 
it), although I had specified country 'US' and the driver seemed to be 
aware of the corresponding hex code.  No such log message appeared when 
the channel was 36 or 149.  I said to myself: hmmm, at least the QCA 
driver *sort-of* works in my environment, and returned to it.
>
> Kind regards,
> 	Sven



* Re: Network stops passing traffic randomly
  2020-05-28 19:31     ` Sven Eckelmann
@ 2020-05-28 21:17       ` Steve Newcomb
  0 siblings, 0 replies; 23+ messages in thread
From: Steve Newcomb @ 2020-05-28 21:17 UTC (permalink / raw)
  To: Sven Eckelmann; +Cc: b.a.t.m.a.n

Thanks very much for the advice and clues.  I'll report what happens.

By the way, the problem *never* occurs when all devices are inside my 
house.  It only happens in the field.  It will take a long time to do 
this test, because I'll have to set one device up, first, in a remote 
location, wait for the problem to occur, and then perform the test.  If 
the problem doesn't occur, I assume that would be significant, too.

On 5/28/20 3:31 PM, Sven Eckelmann wrote:
> [please don't send me private mails about batman-adv - unless you have a
>   really good reason to do so. And if not stated otherwise, I must assume
>   that you actually wanted to send you message to the mailing list]
I did.  Oops.
>
> On Thursday, 28 May 2020 21:18:36 CEST Steve Newcomb wrote:
>>> My first guess is that the underlying interfaces (mesh0) stopped to transport
>>> unicast frames. Did you check this by setting an IP on mesh0 and ping between
>>> these devices using the IPv4 ping?
>> Not sure what the phrase "to set an IP on mesh0" means, if not simply to
>> endow the corresponding bridge with a static IP.  Which is what I'm doing.
>>
>> Not sure what "IPv4 ping" means.  I've disabled IPv6, so I'm not using
>> anything but IPv4.
> I am assuming that mesh0 is the device which was added to bat0 as slave.
> Please replace this with whatever you are using
>
>      # on device 1
>      ip addr add 192.168.23.1/24 dev mesh0
>
>      # on device 2
>      ip addr add 192.168.23.2/24 dev mesh0
>
>
>> If "IPv4 ping" means "the ordinary Linux ping command", then, yes, I've
>> tried that.
> The IPv4 ping was just a placeholder for "not batman-adv ping packets". So you
> can also use ICMPv6 if you prefer. Just make sure to send it over the
> underlying ("slave") interface of batman-adv. And not on bat0 or any higher
> layer bridge/vlan/... interface.
>
> With the addresses mentioned earlier:
>
>      # on device 1
>      ping 192.168.23.2
>
>      # on device 2
>      ping 192.168.23.1
>
> And also observe with tcpdump what is received by the other end.
>
>> 100% packet loss when the offline condition occurs.  Batctl
>> o, on the other hand, looks just fine.
> Sounds to me like "mesh0" is still able to transport broadcast frames (which
> are used for the OGMs - which "create" the originator lists in `batctl o`).
> And if you cannot send unicast frames anymore on mesh0 then something is wrong
> with the unicast part.
>
> For example, when you are using encryption for the mesh0 link, maybe the group
> key is still set correctly but something happened with the pairwise key and it
> is now "corrupted".
>
> Kind regards,
> 	Sven



* Re: Network stops passing traffic randomly
  2020-05-28 20:59     ` Steve Newcomb
@ 2020-05-28 21:28       ` Ben Greear
  2020-06-02  1:41         ` Steve Newcomb
  0 siblings, 1 reply; 23+ messages in thread
From: Ben Greear @ 2020-05-28 21:28 UTC (permalink / raw)
  To: srn, Sven Eckelmann, b.a.t.m.a.n; +Cc: smartwires



On 05/28/2020 01:59 PM, Steve Newcomb wrote:
>
>
> On 5/28/20 3:19 PM, Sven Eckelmann wrote:
>> On Thursday, 28 May 2020 21:03:20 CEST Steve Newcomb wrote:
>>> I have seen Sven's remark about unicast packets.  I'm not sanguine about
>>> getting Qualcomm to fix a driver for an older product.
>> I am slightly confused now about the mentioning of the candelatech driver.
>>
>> Just to sync both of you up:
>>
>> * Are you using ath10k-ct with the ath10k-firmware*-ct or are you using ath10k
>>    with the ath10k-firmware*?
>> * And are you using IBSS or 802.11s (meshpoint with mesh_fwding=0)? Is this
>>    encrypted or not encrypted?
> Speaking only for myself:
>
> CONFIG_PACKAGE_ath10k-firmware-qca988x=y
> CONFIG_PACKAGE_kmod-ath10k=y
> # CONFIG_PACKAGE_ath10k-firmware-qca988x-ct is not set
> # CONFIG_PACKAGE_kmod-ath10k-ct is not set
>
> option mesh_fwding '0'
> option encryption 'psk2+ccmp'

wave-1 ath10k-ct does not support mesh, and while it supports ADHOC, it sometimes
has issues, especially when using encryption, and I have not had any interest in
debugging it so far.

wave-2 firmware supports mesh, and I think adhoc is stable as well.  I have not done
any serious testing on either mesh or adhoc though.

I've tested DFS in STA/AP mode and that works fine on my driver/firmware, possibly
due to us setting the regdom as a fwcfg option, I suppose.

Thanks,
Ben

>
>>
>>> The Candela
>>> Technologies driver refuses to function on the DFS channels (100, 116,
>>> 132), which in my large, populous US residential environment work far,
>>> far better than channels 36 or 149.
>> Was this reported to Ben Greear?
> Not yet, no.  I am planning to do that when I can get serious about testing the adhoc alternative.  I tried it just long enough to discover that DFS didn't work (log message was something like "forbidden" (can't remember exactly what it said right now, but that was the sense of it), although I had specified country 'US' and the driver seemed to be aware of the corresponding hex code.  No such log message appeared when channel was 36 or 149.  I said to myself: hmmm, at least the QCA driver *sort-of* works in my environment and returned to it.
>>
>> Kind regards,
>>     Sven
>

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com


* Re: Network stops passing traffic randomly
  2020-05-28 19:03 ` Steve Newcomb
  2020-05-28 19:19   ` Sven Eckelmann
@ 2020-05-29  0:13   ` smartwires
  2020-06-02  2:05     ` Steve Newcomb
  1 sibling, 1 reply; 23+ messages in thread
From: smartwires @ 2020-05-29  0:13 UTC (permalink / raw)
  To: b.a.t.m.a.n

Steve, I am also using an AP with a QCA9558 SoC, and I am also using ath10k-firmware-qca988x. I have also considered using adhoc.


* Re: Network stops passing traffic randomly
  2020-05-28 21:28       ` Ben Greear
@ 2020-06-02  1:41         ` Steve Newcomb
  2020-06-02 12:40           ` Steve Newcomb
  0 siblings, 1 reply; 23+ messages in thread
From: Steve Newcomb @ 2020-06-02  1:41 UTC (permalink / raw)
  To: Sven Eckelmann, b.a.t.m.a.n; +Cc: smartwires



On 5/28/20 5:28 PM, Ben Greear wrote:
> I've tested DFS in STA/AP mode and that works fine on my 
> driver/firmware, possibly
> due to us setting the regdom as a fwcfg option, I suppose. 
I wonder how I can set the regdom as a fwcfg option?  I don't know the 
procedure to try that.  Do I need to cross-compile the firmware myself, 
rather than using the OpenWRT package?



* Re: Network stops passing traffic randomly
  2020-05-29  0:13   ` smartwires
@ 2020-06-02  2:05     ` Steve Newcomb
  2020-06-02 20:02       ` Ben Greear
  0 siblings, 1 reply; 23+ messages in thread
From: Steve Newcomb @ 2020-06-02  2:05 UTC (permalink / raw)
  To: b.a.t.m.a.n, Ben Greear, smartwires



On 5/28/20 8:13 PM, smartwires@gmail.com wrote:
> Steve, I am also using ap with a QCA9558 SOC and Also using ath10k-firmware-qca988x . I have also considered using adhoc.

I think I discovered something yesterday that explains everything, and 
it's very reproducible.  The mesh mode in the QCA firmware works 
reliably in the lab and in the field, but only when there are 3 or fewer 
nodes.  If I add one more node, the mesh will completely fail, either 
immediately or within a few hours.  If the nodes are strung out in a 
daisy chain, failure is usually, but not always, delayed for a while, 
and the links break in a piecemeal fashion, one at a time.  If the nodes 
are close enough to each other, total failure occurs quite quickly.  I 
surmise that the 802.11s implementation in the QCA driver was not tested 
with more than 3 nodes, or perhaps it wasn't designed to support more 
than 3 nodes.  Sigh.
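
To keep an eye on whether a mesh has grown past that size, the originator table can be counted with a short shell sketch. The helper names `count_originators` and `mesh_size` are hypothetical, and the sketch assumes that each known originator appears exactly once in `batctl o` as a line marked with ` * ` (its best route), as in the output quoted earlier in this thread.

```shell
# Count originators in `batctl o` output read from stdin.
# Only lines marked with "*" (the selected route per originator) are counted.
# NOTE: count_originators and mesh_size are hypothetical helper names.
count_originators() {
    # grep -c prints 0 but exits non-zero when nothing matches, hence "|| true".
    grep -c '^ *\* ' || true
}

# Total number of nodes in the mesh: the known originators plus this node.
mesh_size() {
    n=$(count_originators)
    echo $((n + 1))
}
```

On a node, `batctl o | mesh_size` gives the node count to compare against the 3-node limit observed above.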

Sven, I think this epiphany obviates the need for your test (which I 
still haven't figured out how to execute in the field), but I'll return 
to that effort if you think I should.

So in the end, unless I replace the hardware throughout the neighborhood 
with far more expensive hardware, I must find a way to use Ben's driver, 
or to have no mesh network with more than 3 nodes in it.



* Re: Network stops passing traffic randomly
  2020-06-02  1:41         ` Steve Newcomb
@ 2020-06-02 12:40           ` Steve Newcomb
  0 siblings, 0 replies; 23+ messages in thread
From: Steve Newcomb @ 2020-06-02 12:40 UTC (permalink / raw)
  To: b.a.t.m.a.n

Never mind.  I shouldn't have asked, because 
openwrt/dl/ath10k-ct-2020-03-25-3d173a47.tar.xz!ath10k-ct-2020-03-25-3d173a47/README.txt 
clearly states:

This is a copy of the drivers/net/wireless/ath/ath10k tree from
the Candela-Technologies (CT) 4.7, 4.9, and 4.13 kernels.

This package may be useful for people trying to use CT ath10k
firmware on LEDE/OpenWRT, or other custom-built kernels.

The ath10k driver has a lot of patches, most of which are to enable it to
work more effectively with the ath10k CT firmware:

http://www.candelatech.com/ath10k.php

To compile with some help:
./build_me.sh

To compile manually:
cd ath10k
cp make_all make_all.mine
chmod a+x make_all.mine
# Edit make_all.mine to point to your compiled kernel
# Copy ath/*.h files into ../
# This header file stuff is not obvious, sorry..but it helps us compile
# properly on LEDE/OpenWRT backports infrastructure.
./make_all.mine


For full kernel source that these drivers came from, see:

http://dmz2.candelatech.com/?p=linux-4.7.dev.y/.git;a=summary
git clone git://dmz2.candelatech.com/linux-4.7.dev.y

http://dmz2.candelatech.com/?p=linux-4.4.dev.y/.git;a=summary
git clone git://dmz2.candelatech.com/linux-4.4.dev.y

Please send bug reports to:  greearb@candelatech.com


On 6/1/20 9:41 PM, Steve Newcomb wrote:
>
>
> On 5/28/20 5:28 PM, Ben Greear wrote:
>> I've tested DFS in STA/AP mode and that works fine on my 
>> driver/firmware, possibly
>> due to us setting the regdom as a fwcfg option, I suppose. 
> I wonder how I can set the regdom as fwcfg option?  I don't know the 
> procedure to try that.  Do I need to cross-compile the firmware 
> myself, rather than using the OpenWRT package?


* Re: Network stops passing traffic randomly
  2020-06-02  2:05     ` Steve Newcomb
@ 2020-06-02 20:02       ` Ben Greear
  2020-06-03  2:06         ` Steve Newcomb
  0 siblings, 1 reply; 23+ messages in thread
From: Ben Greear @ 2020-06-02 20:02 UTC (permalink / raw)
  To: srn, b.a.t.m.a.n, smartwires



On 06/01/2020 07:05 PM, Steve Newcomb wrote:
>
>
> On 5/28/20 8:13 PM, smartwires@gmail.com wrote:
>> Steve, I am also using ap with a QCA9558 SOC and Also using ath10k-firmware-qca988x . I have also considered using adhoc.
>
> I think I discovered something yesterday that explains everything, and it's very reproducible.  The mesh mode in the QCA firmware works reliably in the lab and in the field, but only when there are 3 or fewer nodes.  If I add one more node, the mesh will completely fail, either immediately or within a few hours.  If the nodes are strung out in a daisy chain, failure is usually, but not always, delayed for a while, and the links break in a piecemeal fashion, one at a time.  If the nodes are close enough to each other, total failure occurs quite quickly.  I surmise that the 802.11s implementation in the QCA driver was not tested with more than 3 nodes, or perhaps it wasn't designed to support more than 3 nodes.  Sigh.
>
> Sven, I think this epiphany obviates the need for your test (which I still haven't figured out how to execute in the field), but I'll return to that effort if you think I should.
>
> So in the end, unless I replace the hardware throughout the neighborhood with far more expensive hardware, I must find a way to use Ben's driver, or to have no mesh network with more than 3 nodes in it.

Have you tried using IPQ4019 based systems?  They seem pretty affordable, and the 3-radio Linksys MR8300 & EA8300 seem pretty stable
in my testing recently (in AP mode, not tested mesh).

Thanks,
Ben

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com


* Re: Network stops passing traffic randomly
  2020-06-02 20:02       ` Ben Greear
@ 2020-06-03  2:06         ` Steve Newcomb
  2020-06-03 12:48           ` Ben Greear
  0 siblings, 1 reply; 23+ messages in thread
From: Steve Newcomb @ 2020-06-03  2:06 UTC (permalink / raw)
  To: Ben Greear, b.a.t.m.a.n, smartwires



On 6/2/20 4:02 PM, Ben Greear wrote:
> Have you tried using IPQ4019 based systems?  They seem pretty 
> affordable, and the 3-radio Linksys MR8300 & EA8300 seem pretty stable
> in my testing recently (in AP mode, not tested mesh).
No.  Used, on eBay, the cheaper of the two, the EA8300, is at least 
twice as expensive as what we're using now.  To buy a dozen or so in 
hope of getting them to work with Batman is out of the question, alas.

Ben, I fully understand your lack of incentive to spend your time on 
drivers for older hardware.  This is not your problem, really. (Unless, 
like me, you are seeking ways to address the digital divide, where low 
entry cost is the key consideration.  Every dollar cheaper means more 
people can connect, which was important even before the pandemic began.  
I selected these Archer [AC]7 v[245] units because there are so many of 
them for sale that they are kind of hard to sell.  True, I didn't know 
the QCA driver would limit me to 3 nodes per mesh, nor did I know your 
driver couldn't support encryption or DFS, at least not out of the 
box.  It looked like a reasonable bet at the time; with 2 drivers to 
choose from, what could go wrong?  Oh, well, nothing worthwhile was ever 
easy.)

Steve


* Re: Network stops passing traffic randomly
  2020-06-03  2:06         ` Steve Newcomb
@ 2020-06-03 12:48           ` Ben Greear
  2020-06-03 15:35             ` Steve Newcomb
  0 siblings, 1 reply; 23+ messages in thread
From: Ben Greear @ 2020-06-03 12:48 UTC (permalink / raw)
  To: srn, b.a.t.m.a.n, smartwires



On 06/02/2020 07:06 PM, Steve Newcomb wrote:
>
>
> On 6/2/20 4:02 PM, Ben Greear wrote:
>> Have you tried using IPQ4019 based systems?  They seem pretty affordable, and the 3-radio Linksys MR8300 & EA8300 seem pretty stable
>> in my testing recently (in AP mode, not tested mesh).
> No.  Used, on E-bay, the cheaper of the two, the EA8300  is at least twice as expensive as what we're using now.  To buy a dozen or so in hope of getting them to work with Batman is out of the question, alas.
>
> Ben, I fully understand your lack of incentive to spend your time on drivers for older hardware.  This is not your problem, really. (Unless, like me, you are seeking ways to address the digital divide, where low entry cost is the key consideration.  Every dollar cheaper means more people can connect, which was important even before the pandemic began.  I selected these Archer [AC]7 v[245] units because there are so many of them for sale that they are kind of hard to sell.  True, I didn't know the QCA driver would limit me to 3 nodes per mesh, nor did I know your driver couldn't support encryption nor DFS, at least not out of the box.  It looked like a reasonable bet at the time; with 2 drivers to choose from, what could go wrong?  Oh, well, nothing worthwhile was ever easy.)
>
> Steve

I'm working with the TIP project, which aims to provide stable OpenWrt capable hardware, among lots of other things.

I know some others in that group are interested in low cost solutions, so curious to know what price you think is
viable for your market...

https://telecominfraproject.com/wifi/

Thanks,
Ben

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Network stops passing traffic randomly
  2020-06-03 12:48           ` Ben Greear
@ 2020-06-03 15:35             ` Steve Newcomb
  2020-06-03 16:42               ` Ben Greear
  0 siblings, 1 reply; 23+ messages in thread
From: Steve Newcomb @ 2020-06-03 15:35 UTC (permalink / raw)
  To: Ben Greear, b.a.t.m.a.n, smartwires



On 6/3/20 8:48 AM, Ben Greear wrote:
> I'm working with the TIP project, which aims to provide stable OpenWrt capable hardware, among lots of other things.
>
> I know some others in that group are interested in low cost solutions, so curious to know what price you think is
> viable for your market...
>
> https://telecominfraproject.com/wifi/

Ben, your question is appropriately industrial, but I've already given 
my answer, which is: "Less is more."  The lower the price, the more 
human capital is protected from the abyss of the digital divide.

True, the primary problem is not so much the one-time cost of a cheap 
radio.  However, one way to address the primary problem is to give 
public-spirited digital-haves a way to lift up their digital-have-not 
neighbors.  With mesh computing, that means purchasing multiple routers 
and making gifts of at least one of them, along with the gift of 
access.  What can the digital-haves afford to give, besides access?  
Typically, not much.

What follows is a rant.  Advice: skip it.

---------------------------------------------

Let me explain my perspective, here.

Personally, I believe that there is no difference between the 
information highway and any other public highway.  The digital divide is 
compelling evidence of oppression of the poor by the wealthy.  The 
nature of the oppression is comparable to highway tolls that restrict 
the mobility of the poor.  Since the capture of US federal regulatory 
bodies by the telecom industry, and in the absence of effective telecom 
regulation in the public interest, the only course available to people 
who want to relieve the damage caused by the exclusion of the 
dispossessed is to work around the edges, which is what I'm doing.  
While I would welcome industrial help, I expect none, because I'm only 
interested in the prosperity of the *entire* public.  (In my retirement, 
I can just barely afford to be.)

I'm unfamiliar with the Telecom Infra Project (TIP).  However, I spent 
more than 20 years voluntarily working on ANSI and ISO information 
interchange standards (ISO 10743, 10744, 13250), and I know exactly what 
I'm talking about when I say that the public interest is unlikely to be 
served by industrial consortia who say things like what the TIP website 
says:

      "We believe that accelerating innovation coupled with new business 
approaches and cost efficiencies will help the industry build the 
networks of the future and create business opportunities for new and 
existing companies, alike."


Such information technology consortia are typically violators of the 
spirit, and generally the letter, of the antitrust legislation that has 
been on the books since the end of the gilded age that preceded the 
current gilded age, and which no recent U.S. administration has seen fit 
to enforce.  They are simply dog fights in which the public interest has 
no dog.  The reason for their existence is to form alliances between 
aggregations of capital as they conspire against the market-leading 
aggregations of capital.

ANSI (the American National Standards Institute) promulgates rules for 
such activities that keep all participants from violating antitrust law 
-- from being "conspiracies in restraint of trade" -- but you never hear 
about ANSI standards any more because nobody bothers to avoid antitrust 
prosecution.  There isn't any, basically.  Why put up with burdensome 
transparency rules?  Why put up with the sandbagging machinations of 
representatives of the actual market leaders?  Open societies are 
expensive, frustrating, and annoying.

The purpose of a business is to make a profit, and that incentive *does* 
serve the public interest, but only in the context of regulation that 
forces the public interest to be served by it.  In the case of the US 
telecom industry, and basically since the Consent Decree of 1982, 
regulation has served the interests of its investors, but not the public 
interest.  The history is appalling, really, and the story keeps getting 
worse.

That's just how things are these days, as we flush whole sections of 
each generation's human potential down the toilet.  It bothers me a 
lot.  For each succeeding generation, the cost of each generation's loss 
is exponentially increased.  Try not to think about it.



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Network stops passing traffic randomly
  2020-06-03 15:35             ` Steve Newcomb
@ 2020-06-03 16:42               ` Ben Greear
  2020-06-03 17:56                 ` Steve Newcomb
  0 siblings, 1 reply; 23+ messages in thread
From: Ben Greear @ 2020-06-03 16:42 UTC (permalink / raw)
  To: srn, b.a.t.m.a.n, smartwires



On 06/03/2020 08:35 AM, Steve Newcomb wrote:
>
>
> On 6/3/20 8:48 AM, Ben Greear wrote:
>> I'm working with the TIP project, which aims to provide stable OpenWrt capable hardware, among lots of other things.
>>
>> I know some others in that group are interested in low cost solutions, so curious to know what price you think is
>> viable for your market...
>>
>> https://telecominfraproject.com/wifi/
>
> Ben, your question is appropriately industrial, but I've already given my answer, which is: "Less is more."  The lower the price, the more human capital is protected from the abyss of the digital divide.
>
> True, the primary problem is not so much the one-time cost of a cheap radio.  However, one way to address the primary problem is to give public-spirited digital-haves a way to lift up their digital-have-not neighbors.  With mesh computing, that means purchasing multiple routers and making gifts of at least one of them, along with the gift of access.  What can the digital-haves afford to give, besides access?  Typically, not much.
>
> What follows is a rant.  Advice: skip it.

If you are trying to lift up a broad swath of the world, then you need scale and vision, and part of that is how
to make it self sustaining.  Giving a few crumbs to folks is less useful in my mind than helping give them the means
to make their own bread.  Think someone starting a company that wants to deploy 10k hotspots with 40k satellite
wifi mesh nodes....

Open source software (and maybe hardware) with high volume, affordable, and solid hardware
is one of the core aims of TIP.  Think of a price and minimum hardware that meets your goals,
if I find someone that can make such a thing at such a price, I'll let you know.

If you want wave-1 ath10k to mesh, my advice is to use 7 virtual station vdevs and one AP on each radio.  ath10k-ct
firmware and software will support this nicely.  Have those 7 stations connect to peers' AP vdevs.  Do routing mesh
magic through this topology.  Then you don't care about anything other than STA + AP working.  This might also scale to other platforms
that don't support IBSS or MESH well.
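
(A rough sketch of the link planning this topology implies, in Python. Everything here is hypothetical and for illustration only: the node names, the function name plan_sta_links, and the simple "first come, first served" policy are assumptions, not anything taken from ath10k-ct or batman-adv. It only assumes what is stated above: each radio has one AP vdev and at most 7 station vdevs, and a pair of nodes needs just one STA-to-AP association to be linked. It does not model how many clients a peer's AP can accept.)

```python
from typing import Dict, List

# Assumption taken from the advice above: 7 station vdevs (plus one AP) per radio.
MAX_STA_VDEVS = 7


def plan_sta_links(candidates: Dict[str, List[str]],
                   max_sta: int = MAX_STA_VDEVS) -> Dict[str, List[str]]:
    """For each node, list the peers whose AP vdev its STA vdevs should join.

    `candidates` maps a node name to the peers it can hear, in order of
    preference. A pair of nodes gets at most one STA -> AP link.
    """
    links: Dict[str, List[str]] = {node: [] for node in candidates}
    for node, peers in candidates.items():
        for peer in peers:
            if len(links[node]) >= max_sta:
                break  # every station vdev on this radio is already used
            if peer == node or peer in links[node]:
                continue  # ignore self-links and duplicates
            if node in links.get(peer, []):
                continue  # the peer's STA already joined our AP; one link is enough
            links[node].append(peer)
    return links


if __name__ == "__main__":
    plan = plan_sta_links({
        "gw": ["node1", "node2"],
        "node1": ["gw", "node2"],
        "node2": ["gw", "node1"],
    })
    for node, peers in plan.items():
        print(node, "-> STA vdevs join APs of:", peers)
```

(The routing protocol would then run over the resulting STA and AP interfaces; how those interfaces are created and attached is left out here on purpose.)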

Thanks,
Ben

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Network stops passing traffic randomly
  2020-06-03 16:42               ` Ben Greear
@ 2020-06-03 17:56                 ` Steve Newcomb
  0 siblings, 0 replies; 23+ messages in thread
From: Steve Newcomb @ 2020-06-03 17:56 UTC (permalink / raw)
  To: Ben Greear, b.a.t.m.a.n, smartwires



On 6/3/20 12:42 PM, Ben Greear wrote:
> If you are trying to lift up a broad swath of the world, then you need 
> scale and vision, and part of that is how
> to make it self sustaining.  Giving a few crumbs to folks is less 
> useful in my mind than helping give them the means
> to make their own bread.  Think someone starting a company that wants 
> to deploy 10k hotspots with 40k satellite
> wifi mesh nodes....
Your vision is deeply correct, but so is mine.  One difference between 
the two ideas is that yours is top-down, with the purpose of offering a 
service, while mine is bottom-up, with the purpose of developing 
community and neighborliness, regardless of service provider(s).   (I'm 
an admirer of the late Fred Rogers, and I'm not ashamed to admit it.)

But you are right, too.  I'm also an admirer of the Indian cataract 
surgery guy (can't remember his name) who insisted that all patients pay 
for their surgery "because charity doesn't scale".  He ultimately built 
a mind-boggling practice that, among other things, presumably now 
manufactures most of the world's intraocular lenses.

>
> Open source software (and maybe hardware) with high volume, 
> affordable, and solid hardware
> is one of the core aims of TIP.  Think of a price and minimum hardware 
> that meets your goals,
> if I find someone that can make such a thing at such a price, I'll let 
> you know.
Fair enough.  How about a *delivered* price, i.e. an out-of-pocket cost, 
of $50?  Obviously a minimum of 2 radios, 128 MB, and reasonable CPU 
power.  And, since we're forward-looking here, support for the new 
channels now presumably forthcoming from the FCC.
> If you want wave-1 ath10k to mesh, my advice is to use 7 virtual 
> station vdevs and one AP on each radio.  ath10k-ct
> firmware and software will support this nicely.  Have those 7 stations 
> connect to peers' AP vdevs.  Do routing mesh
> magic through this topology.  Then you don't care about anything other 
> than STA + AP working.  This might also scale to other platforms
> that don't support IBSS or MESH well. 
Many thanks, Ben.  This is a helpful idea, and I daresay not many people 
could have come up with it.  (Certainly not I!)  You have just given me 
some homework to do, and I'm grateful for it.  Bravo. If I get anywhere 
with it, you'll be the first to know.

Steve



^ permalink raw reply	[flat|nested] 23+ messages in thread
