All of lore.kernel.org
 help / color / mirror / Atom feed
* [B.A.T.M.A.N.] [RFC PATCH] batman-adv: mitigate issue when empty vlan is received
@ 2018-05-06 19:55 Leonardo Mörlein
  2018-05-07  6:32 ` Sven Eckelmann
                   ` (3 more replies)
  0 siblings, 4 replies; 16+ messages in thread
From: Leonardo Mörlein @ 2018-05-06 19:55 UTC (permalink / raw)
  To: b.a.t.m.a.n; +Cc: Leonardo Mörlein

This patch is not a fix for the issue itself, but a fix for the
other nodes, which are also influenced by the issue.

- Some (affected) nodes send TT announcements for an empty vlan (for
  now only seen with vlan_id = 0).
- This behaviour is a bug! Batman-adv nodes must not send TT
  announcements for empty vlans.
- The receiving batman-adv can not handle incoming TT announcements
  with empty vlans. (The crc check routine batadv_tt_global_check_crc()
  fails.)
- As the receiving node thinks, the crc is broken, it does a full
  table request then.
- The announcing node sends a TT full table to the receiving node
  then, which also contains the empty vlan, so
  *batadv_tt_global_check_crc()* fails again on the receiver side.
- This causes a lot of unneeded TT traffic. In Freifunk Hannover we
  decreased the amount of from ~20kpp/s to ~4kpp/s on our supernodes.
  We have ~800 nodes, which are connected via vpn to one of six
  supernodes. Those supernodes are connected with each other in a fully
  connected mesh.

Signed-off-by: Leonardo Mörlein <me@irrelefant.net>
---
 net/batman-adv/translation-table.c | 17 +++++++++++++++--
 1 file changed, 15 insertions(+), 2 deletions(-)

diff --git a/net/batman-adv/translation-table.c b/net/batman-adv/translation-table.c
index 0225616d..d2a843fd 100644
--- a/net/batman-adv/translation-table.c
+++ b/net/batman-adv/translation-table.c
@@ -2996,8 +2996,21 @@ static bool batadv_tt_global_check_crc(struct batadv_orig_node *orig_node,
 
 		vlan = batadv_orig_node_vlan_get(orig_node,
 						 ntohs(tt_vlan_tmp->vid));
-		if (!vlan)
-			return false;
+		if (!vlan) {
+			/* Due to a bug, some originators send TT
+			 * announcements for empty vlans. As the receiving
+			 * nodes will ignore those empty vlans (they do not
+			 * add a batadv_orig_node_vlan into the transglobal
+			 * for the originating node), crc check will fail
+			 * here. To circumvent this issue, we skip the
+			 * verification for the vlan if the crc is
+			 * equal to 0x00000000.
+			 */
+			if (tt_vlan_tmp->crc == 0x00000000)
+				continue;
+			else
+				return false;
+		}
 
 		crc = vlan->tt.crc;
 		batadv_orig_node_vlan_put(vlan);
-- 
2.17.0


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [B.A.T.M.A.N.] [RFC PATCH] batman-adv: mitigate issue when empty vlan is received
  2018-05-06 19:55 [B.A.T.M.A.N.] [RFC PATCH] batman-adv: mitigate issue when empty vlan is received Leonardo Mörlein
@ 2018-05-07  6:32 ` Sven Eckelmann
  2018-05-07  8:06   ` me
  2018-05-09 14:08 ` Marek Lindner
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 16+ messages in thread
From: Sven Eckelmann @ 2018-05-07  6:32 UTC (permalink / raw)
  To: b.a.t.m.a.n; +Cc: Leonardo Mörlein, Antonio Quartulli

[-- Attachment #1: Type: text/plain, Size: 1268 bytes --]

On Sonntag, 6. Mai 2018 21:55:59 CEST Leonardo Mörlein wrote:
> This patch is not a fix for the issue itself, but a fix for the
> other nodes, which are also influenced by the issue.
> 
> - Some (affected) nodes send TT announcements for an empty vlan (for
>   now only seen with vlan_id = 0).
> - This behaviour is a bug! Batman-adv nodes must not send TT
>   announcements for empty vlans.
> - The receiving batman-adv can not handle incoming TT announcements
>   with empty vlans. (The crc check routine batadv_tt_global_check_crc()
>   fails.)
> - As the receiving node thinks, the crc is broken, it does a full
>   table request then.
> - The announcing node sends a TT full table to the receiving node
>   then, which also contains the empty vlan, so
>   *batadv_tt_global_check_crc()* fails again on the receiver side.
> - This causes a lot of unneeded TT traffic. In Freifunk Hannover we
>   decreased the amount of from ~20kpp/s to ~4kpp/s on our supernodes.
>   We have ~800 nodes, which are connected via vpn to one of six
>   supernodes. Those supernodes are connected with each other in a fully
>   connected mesh.

Can you attach a (small) pcap of actual traffic which show a packet which 
triggers this behavior?

Thanks,
	Sven

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [B.A.T.M.A.N.] [RFC PATCH] batman-adv: mitigate issue when empty vlan is received
  2018-05-07  6:32 ` Sven Eckelmann
@ 2018-05-07  8:06   ` me
  2018-05-07 10:23     ` Sven Eckelmann
  0 siblings, 1 reply; 16+ messages in thread
From: me @ 2018-05-07  8:06 UTC (permalink / raw)
  To: Sven Eckelmann, b.a.t.m.a.n; +Cc: Antonio Quartulli

[-- Attachment #1: Type: text/plain, Size: 368 bytes --]

On 05/07/2018 08:32 AM, Sven Eckelmann wrote:
> Can you attach a (small) pcap of actual traffic which show a packet which 
> triggers this behavior?

For sure. See for the originator 5e:25:f7:13:2d:eb and vid=0x8000 in the
attached pcap. I selected the interested packets (one ogm, one full
table request and one full table response).

Kind regards,
Leonardo Mörlein

[-- Attachment #2: interesting.pcap --]
[-- Type: application/vnd.tcpdump.pcap, Size: 778 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [B.A.T.M.A.N.] [RFC PATCH] batman-adv: mitigate issue when empty vlan is received
  2018-05-07  8:06   ` me
@ 2018-05-07 10:23     ` Sven Eckelmann
  2018-05-08  8:45       ` Sven Eckelmann
  0 siblings, 1 reply; 16+ messages in thread
From: Sven Eckelmann @ 2018-05-07 10:23 UTC (permalink / raw)
  To: Leonardo Mörlein; +Cc: b.a.t.m.a.n, Antonio Quartulli

[-- Attachment #1: Type: text/plain, Size: 3495 bytes --]

On Montag, 7. Mai 2018 10:06:20 CEST me wrote:
> On 05/07/2018 08:32 AM, Sven Eckelmann wrote:
> > Can you attach a (small) pcap of actual traffic which show a packet which 
> > triggers this behavior?
> 
> For sure. See for the originator 5e:25:f7:13:2d:eb and vid=0x8000 in the
> attached pcap. I selected the interested packets (one ogm, one full
> table request and one full table response).

Thanks,

These are always from intermediate nodes. I was hoping to see the initial 
packet which "introduced" the incorrect vid=0x8000 without entry in the first 
place to better understand the problem which causes this.

At least I would have hoped that this might help the people at WBM to have a 
better discussion about it. At least to understand whether this is from an 
intermediate node or from the sender.

On Sonntag, 6. Mai 2018 21:55:59 CEST Leonardo Mörlein wrote:
[...]
> - The receiving batman-adv can not handle incoming TT announcements
>   with empty vlans. (The crc check routine batadv_tt_global_check_crc()
>   fails.)

This part is not 100% clear in the commit message.

Right now it just sounds like softif_vlan_list (see 
batadv_tt_prepare_tvlv_local_data) has an "empty" vlan entry (an entry without 
mac addresses). Which brings me to the point that batadv_softif_create_vlan 
initializes new vlan's with the CRC 0x0 before it adds it to the list (see tt/
batadv_vlan_tt in batadv_softif_vlan). We know now that the function 
batadv_tt_local_update_crc is responsible for updating the crc (holding the 
tt.commit_lock) on the sender site (for non-intermediate nodes). And this crc 
is calculated over the bat_priv->tt.local_hash - which (according to the crc 
0x0) doesn't have a single (non-NEW) entry for this VID. The return (new crc) 
is therefore 0x0 and it will be submitted this way.

The same should be true for the receiver. The batadv_tt_global_update_crc goes 
over the list of entries for this vid. And we just received a full table which 
only contains non-vlan entries and no vlan 0 entries. My assumption would 
therefore be that the crc on the receiver should also be 0x0 when 
batadv_tt_global_update_crc was done. But if it is not then it would mean that 
there was still some entry on the receiver which batadv_tt_global_crc used to 
calculate the crc for this vid or the crc was just not calculated for it 
(because it doesn't exist).

The latter would match the content of the full table reply in your pcap, the 
comment in the patch and the info info I got from Antonio: The vid entry will 
only be created in the originator vlan_list when an entry was received (I 
haven't checked this). This would sound to me like the vlan should just be 
created instead of adding this workaround. But maybe you should discuss the 
details here with Antonio because at the end it is just about the different 
ways of handling "!vlan" case from your patch.



Btw. Antonio, wasn't the VLAN 0 always created on each device in the local 
table by the VLAN driver? See the message

   [   29.866038] 8021q: adding VLAN 0 to HW filter on device bat0

when batman-adv attaches itself to the vlan monitoring. But usually, there is 
at least one entry in it (the bat0 mac address). Not sure why this isn't the 
case here. At least batadv_interface_add_vid should add it at the end. Maybe 
there was an error before the entry was created? Would be interesting to know 
where exactly.

Kind regards,
	Sven

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [B.A.T.M.A.N.] [RFC PATCH] batman-adv: mitigate issue when empty vlan is received
  2018-05-07 10:23     ` Sven Eckelmann
@ 2018-05-08  8:45       ` Sven Eckelmann
  0 siblings, 0 replies; 16+ messages in thread
From: Sven Eckelmann @ 2018-05-08  8:45 UTC (permalink / raw)
  To: b.a.t.m.a.n; +Cc: Leonardo Mörlein, Antonio Quartulli

[-- Attachment #1: Type: text/plain, Size: 2887 bytes --]

n Montag, 7. Mai 2018 12:23:57 CEST Sven Eckelmann wrote:
> This would sound to me like the vlan should just be 
> created instead of adding this workaround. But maybe you should discuss the 
> details here with Antonio because at the end it is just about the different 
> ways of handling "!vlan" case from your patch.

Just had a look at the cleanup code for vlans and it seems like each vlan
in vlan_list must contain at least one entry (according to
vlan->tt.num_entries) and otherwise batadv_tt_global_size_mod will drop it.
So it isn't as easy as just adding a simple vlan for each received vlan
entry via TT response.

So your patch (or a variant of it) will most likely be the way to go for
receivers.


Now to your commit message:

On Sonntag, 6. Mai 2018 21:55:59 CEST Leonardo Mörlein wrote:

> batman-adv: mitigate issue when empty vlan is received

Something more specific would be nice. Maybe somebody else has a better
idea but what about:

batman-adv: Mitigate TT loop when receiving empty VLAN


> This patch is not a fix for the issue itself, but a fix for the
> other nodes, which are also influenced by the issue.

I personally don't like to read "this patch" in commit messages. This part
should maybe be reformulated and put at the end (before the Fixes line).
Something like:

While the problem is caused by the sender, the receiver can mitigate the
problem by ignoring (potentially) empty VLANs during the CRC check.

[...]
> - This behaviour is a bug! Batman-adv nodes must not send TT
>   announcements for empty vlans.

Where is this defined as incorrect message? But yes, the current
implementation cannot handle it.

> - The receiving batman-adv can not handle incoming TT announcements
>   with empty vlans. (The crc check routine batadv_tt_global_check_crc()
>   fails.)

This is to vague as discussed earlier. Maybe something more like:

- The receiving batman-adv can not handle incoming TT announcements with
  empty vlans. No new struct batadv_orig_node_vlan entry will be added
  to batadv_orig_node->vlan_list. This will cause the crc check routine
  batadv_tt_global_check_crc() to fail because it cannot find the local
  representation of this empty vlan.

[...]
> - This causes a lot of unneeded TT traffic. In Freifunk Hannover we
>   decreased the amount of from ~20kpp/s to ~4kpp/s on our supernodes.
>   We have ~800 nodes, which are connected via vpn to one of six
>   supernodes. Those supernodes are connected with each other in a fully
>   connected mesh.

This part doesn't seem to fit in the list and might just be a new paragraph
at the end to summarize the impact of the problem.


A "Fixes: " line is missing here. Please check whether this sounds about
right:

Fixes: 7ea7b4a14275 ("batman-adv: make the TT CRC logic VLAN specific")


Kind regards,
	Sven


[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [B.A.T.M.A.N.] [RFC PATCH] batman-adv: mitigate issue when empty vlan is received
  2018-05-06 19:55 [B.A.T.M.A.N.] [RFC PATCH] batman-adv: mitigate issue when empty vlan is received Leonardo Mörlein
  2018-05-07  6:32 ` Sven Eckelmann
@ 2018-05-09 14:08 ` Marek Lindner
  2018-05-09 16:06 ` [B.A.T.M.A.N.] [RFC] batman-adv: prevent TT request storms by not sending inconsistent TT TLVLs Marek Lindner
  2018-05-10 14:41 ` [B.A.T.M.A.N.] [RFC v2] " Marek Lindner
  3 siblings, 0 replies; 16+ messages in thread
From: Marek Lindner @ 2018-05-09 14:08 UTC (permalink / raw)
  To: b.a.t.m.a.n

[-- Attachment #1: Type: text/plain, Size: 2302 bytes --]

On Monday, May 7, 2018 3:55:59 AM HKT Leonardo Mörlein wrote:
> -               if (!vlan)
> -                       return false;
> +               if (!vlan) {
> +                       /* Due to a bug, some originators send TT
> +                        * announcements for empty vlans. As the receiving
> +                        * nodes will ignore those empty vlans (they do not
> +                        * add a batadv_orig_node_vlan into the transglobal
> +                        * for the originating node), crc check will fail
> +                        * here. To circumvent this issue, we skip the
> +                        * verification for the vlan if the crc is
> +                        * equal to 0x00000000.
> +                        */
> +                       if (tt_vlan_tmp->crc == 0x00000000)
> +                               continue;
> +                       else
> +                               return false;
> +               }

There are some issues with this approach:

 * As you might be aware, a CRC of 0x00000000 is not an invalid or 
uninitialized CRC. That means it is conceivable the CRC over the translation 
table entries and flags results in a CRC of 0x00000000 without any bugs being 
present. Applying this patch would lead to introducing a bug in these 
conditions.

 * The second concern is about how to deal with 'the bug' (referring to your 
patch comment above).  In your case 'the bug' results in missing TT entries 
for a given VLAN plus a CRC of 0x00000000. What if the CRC was 0x00000001 due 
to some other bug ? Then you're check would totally miss the mark and we're 
back at the beginning.
If we wish to deal with such misbehavior on a scale where we assume the code 
should be able to deal with garbage values in some or all fields a different 
approach is needed. Relying on a magic number (CRC 0x00000000) is as bad as 
the current behavior (assuming each VLAN has an entry).

Shortly, I will send a patch to prevent sending an empty VLAN around. Please 
note that this patch still is more of a band-aid. We still don't know how the 
empty VLAN came into being in the first place. Plus, that patch won't protect 
against garbage packets which certainly requires more effort.

Cheers,
Marek


[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [B.A.T.M.A.N.] [RFC] batman-adv: prevent TT request storms by not sending inconsistent TT TLVLs
  2018-05-06 19:55 [B.A.T.M.A.N.] [RFC PATCH] batman-adv: mitigate issue when empty vlan is received Leonardo Mörlein
  2018-05-07  6:32 ` Sven Eckelmann
  2018-05-09 14:08 ` Marek Lindner
@ 2018-05-09 16:06 ` Marek Lindner
  2018-05-09 17:21   ` Sven Eckelmann
  2018-05-10 14:41 ` [B.A.T.M.A.N.] [RFC v2] " Marek Lindner
  3 siblings, 1 reply; 16+ messages in thread
From: Marek Lindner @ 2018-05-09 16:06 UTC (permalink / raw)
  To: b.a.t.m.a.n; +Cc: Marek Lindner

A translation table TVLV changset sent with an OGM consists
of a number of headers (one per VLAN) plus the changeset
itself (addition and/or deletion of entries).

The per-VLAN headers are used by OGM recipients for consistency
checks. Said consistency check might determine that a full
translation table request is needed to restore consistency. If
the TT sender adds per-VLAN headers of empty VLANs into the OGM,
recipients are led to believe to have reached an inconsistent
state and thus request a full table update. The full table does
not contain empty VLANs (due to missing entries) the cycle
restarts when the next OGM is issued.

Consequently, when the translation table TVLV headers are
composed, empty VLANs are to be excluded.

Fixes: 7ea7b4a14275 ("batman-adv: make the TT CRC logic VLAN specific")

Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
---
 net/batman-adv/translation-table.c | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/net/batman-adv/translation-table.c b/net/batman-adv/translation-table.c
index 0225616d..def8d109 100644
--- a/net/batman-adv/translation-table.c
+++ b/net/batman-adv/translation-table.c
@@ -855,7 +855,8 @@ batadv_tt_prepare_tvlv_global_data(struct batadv_orig_node *orig_node,
 				   s32 *tt_len)
 {
 	u16 num_vlan = 0;
-	u16 num_entries = 0;
+	u16 vlan_entries = 0;
+	u16 total_entries = 0;
 	u16 change_offset;
 	u16 tvlv_len;
 	struct batadv_tvlv_tt_vlan_data *tt_vlan;
@@ -864,8 +865,12 @@ batadv_tt_prepare_tvlv_global_data(struct batadv_orig_node *orig_node,
 
 	rcu_read_lock();
 	hlist_for_each_entry_rcu(vlan, &orig_node->vlan_list, list) {
+		vlan_entries = atomic_read(&vlan->tt.num_entries);
+		if (vlan_entries < 1)
+			continue;
+
 		num_vlan++;
-		num_entries += atomic_read(&vlan->tt.num_entries);
+		total_entries += vlan_entries;
 	}
 
 	change_offset = sizeof(**tt_data);
@@ -873,7 +878,7 @@ batadv_tt_prepare_tvlv_global_data(struct batadv_orig_node *orig_node,
 
 	/* if tt_len is negative, allocate the space needed by the full table */
 	if (*tt_len < 0)
-		*tt_len = batadv_tt_len(num_entries);
+		*tt_len = batadv_tt_len(total_entries);
 
 	tvlv_len = *tt_len;
 	tvlv_len += change_offset;
@@ -890,6 +895,10 @@ batadv_tt_prepare_tvlv_global_data(struct batadv_orig_node *orig_node,
 
 	tt_vlan = (struct batadv_tvlv_tt_vlan_data *)(*tt_data + 1);
 	hlist_for_each_entry_rcu(vlan, &orig_node->vlan_list, list) {
+		vlan_entries = atomic_read(&vlan->tt.num_entries);
+		if (vlan_entries < 1)
+			continue;
+
 		tt_vlan->vid = htons(vlan->vid);
 		tt_vlan->crc = htonl(vlan->tt.crc);
 
-- 
2.17.0


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [B.A.T.M.A.N.] [RFC] batman-adv: prevent TT request storms by not sending inconsistent TT TLVLs
  2018-05-09 16:06 ` [B.A.T.M.A.N.] [RFC] batman-adv: prevent TT request storms by not sending inconsistent TT TLVLs Marek Lindner
@ 2018-05-09 17:21   ` Sven Eckelmann
  2018-05-09 18:20     ` Sven Eckelmann
  0 siblings, 1 reply; 16+ messages in thread
From: Sven Eckelmann @ 2018-05-09 17:21 UTC (permalink / raw)
  To: b.a.t.m.a.n; +Cc: Marek Lindner

[-- Attachment #1: Type: text/plain, Size: 659 bytes --]

On Mittwoch, 9. Mai 2018 18:06:05 CEST Marek Lindner wrote:
[...]
> Fixes: 7ea7b4a14275 ("batman-adv: make the TT CRC logic VLAN specific")

I know, my fault but please use 21a57f6e7a3b instead of 7ea7b4a14275. The 
latter is in the official Linux repository and not batman-adv.git.

[...]
> diff --git a/net/batman-adv/translation-table.c b/net/batman-adv/translation-table.c
> index 0225616d..def8d109 100644
> --- a/net/batman-adv/translation-table.c
> +++ b/net/batman-adv/translation-table.c
> @@ -855,7 +855,8 @@ batadv_tt_prepare_tvlv_global_data(struct batadv_orig_node *orig_node,

What about batadv_tt_prepare_tvlv_local_data?

Kind regards,
	Sven


[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [B.A.T.M.A.N.] [RFC] batman-adv: prevent TT request storms by not sending inconsistent TT TLVLs
  2018-05-09 17:21   ` Sven Eckelmann
@ 2018-05-09 18:20     ` Sven Eckelmann
  2018-05-09 20:38       ` Sven Eckelmann
  2018-05-10 10:28       ` Marek Lindner
  0 siblings, 2 replies; 16+ messages in thread
From: Sven Eckelmann @ 2018-05-09 18:20 UTC (permalink / raw)
  To: b.a.t.m.a.n; +Cc: Marek Lindner

[-- Attachment #1: Type: text/plain, Size: 753 bytes --]

On Mittwoch, 9. Mai 2018 19:21:32 CEST Sven Eckelmann wrote:
[...]
> [...]
> > diff --git a/net/batman-adv/translation-table.c b/net/batman-adv/translation-table.c
> > index 0225616d..def8d109 100644
> > --- a/net/batman-adv/translation-table.c
> > +++ b/net/batman-adv/translation-table.c
> > @@ -855,7 +855,8 @@ batadv_tt_prepare_tvlv_global_data(struct batadv_orig_node *orig_node,
> 
> What about batadv_tt_prepare_tvlv_local_data?

Let me rephrase that: Are we sure that this is from intermediate nodes or is 
there still the possibility that it is from the actual originator? Regarding 
the latter: Should batadv_tt_prepare_tvlv_local_data be modified or will you 
wait for Leonardo to debug why it happens in the first place?

Kind regards,
	Sven

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [B.A.T.M.A.N.] [RFC] batman-adv: prevent TT request storms by not sending inconsistent TT TLVLs
  2018-05-09 18:20     ` Sven Eckelmann
@ 2018-05-09 20:38       ` Sven Eckelmann
  2018-05-09 21:34         ` lemoer
  2018-05-10  9:12         ` Antonio Quartulli
  2018-05-10 10:28       ` Marek Lindner
  1 sibling, 2 replies; 16+ messages in thread
From: Sven Eckelmann @ 2018-05-09 20:38 UTC (permalink / raw)
  To: b.a.t.m.a.n; +Cc: Marek Lindner

[-- Attachment #1: Type: text/plain, Size: 1475 bytes --]

On Mittwoch, 9. Mai 2018 20:20:58 CEST Sven Eckelmann wrote:
> On Mittwoch, 9. Mai 2018 19:21:32 CEST Sven Eckelmann wrote:
> [...]
> > [...]
> > > diff --git a/net/batman-adv/translation-table.c b/net/batman-adv/translation-table.c
> > > index 0225616d..def8d109 100644
> > > --- a/net/batman-adv/translation-table.c
> > > +++ b/net/batman-adv/translation-table.c
> > > @@ -855,7 +855,8 @@ batadv_tt_prepare_tvlv_global_data(struct batadv_orig_node *orig_node,
> > 
> > What about batadv_tt_prepare_tvlv_local_data?
> 
> Let me rephrase that: Are we sure that this is from intermediate nodes or is 
> there still the possibility that it is from the actual originator? Regarding 
> the latter: Should batadv_tt_prepare_tvlv_local_data be modified or will you 
> wait for Leonardo to debug why it happens in the first place?

Wait. When it is a problem of an intermediate node that doesn't have (for 
whatever reason) not a valid copy of the entries, shouldn't the intermediate
node just ignore the request and not answer it? Otherwise the next OGM with 
the data from batadv_tt_prepare_tvlv_local_data from the actual node will 
just trigger the next request by the receiver.

And the next thing is that your commit message is talking about OGMs all the 
time. But the OGM TVLV are not filled by the 
batadv_tt_prepare_tvlv_global_data but by the 
batadv_tt_prepare_tvlv_local_data. I am unable to find out what you actually
doing here and why you do it.

Kind regards,
	Sven

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [B.A.T.M.A.N.] [RFC] batman-adv: prevent TT request storms by not sending inconsistent TT TLVLs
  2018-05-09 20:38       ` Sven Eckelmann
@ 2018-05-09 21:34         ` lemoer
  2018-05-10  9:12         ` Antonio Quartulli
  1 sibling, 0 replies; 16+ messages in thread
From: lemoer @ 2018-05-09 21:34 UTC (permalink / raw)
  To: b.a.t.m.a.n

On 05/09/2018 10:38 PM, Sven Eckelmann wrote:
> Wait. When it is a problem of an intermediate node that doesn't have (for 
> whatever reason) not a valid copy of the entries, shouldn't the intermediate
> node just ignore the request and not answer it?

Yes. It's forwarded here

https://elixir.bootlin.com/linux/v4.16.8/source/net/batman-adv/translation-table.c#L4245

if the check here

https://elixir.bootlin.com/linux/v4.16.8/source/net/batman-adv/translation-table.c#L3199

fails. I also wrote a debug patch which adds counters which
differentiate between crc misses, ttvn misses and other misses. I can
also prepare them, when it is desired.

> And the next thing is that your commit message is talking about OGMs all the 
> time. But the OGM TVLV are not filled by the 
> batadv_tt_prepare_tvlv_global_data but by the 
> batadv_tt_prepare_tvlv_local_data. I am unable to find out what you actually
> doing here and why you do it.

Yes. IIRC the checksums in the OGMs are actually causing the issue. The
checksums in the TT responses are not evaluated for now.

Kind regards,
Leonardo Mörlein

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [B.A.T.M.A.N.] [RFC] batman-adv: prevent TT request storms by not sending inconsistent TT TLVLs
  2018-05-09 20:38       ` Sven Eckelmann
  2018-05-09 21:34         ` lemoer
@ 2018-05-10  9:12         ` Antonio Quartulli
  1 sibling, 0 replies; 16+ messages in thread
From: Antonio Quartulli @ 2018-05-10  9:12 UTC (permalink / raw)
  To: The list for a Better Approach To Mobile Ad-hoc Networking,
	Sven Eckelmann
  Cc: Marek Lindner


[-- Attachment #1.1: Type: text/plain, Size: 1328 bytes --]



On 10/05/18 04:38, Sven Eckelmann wrote:
>> Let me rephrase that: Are we sure that this is from intermediate nodes or is 
>> there still the possibility that it is from the actual originator? Regarding 
>> the latter: Should batadv_tt_prepare_tvlv_local_data be modified or will you 
>> wait for Leonardo to debug why it happens in the first place?
> 
> Wait. When it is a problem of an intermediate node that doesn't have (for 
> whatever reason) not a valid copy of the entries, shouldn't the intermediate
> node just ignore the request and not answer it? Otherwise the next OGM with 
> the data from batadv_tt_prepare_tvlv_local_data from the actual node will 
> just trigger the next request by the receiver.
> 
> And the next thing is that your commit message is talking about OGMs all the 
> time. But the OGM TVLV are not filled by the 
> batadv_tt_prepare_tvlv_global_data but by the 
> batadv_tt_prepare_tvlv_local_data. I am unable to find out what you actually
> doing here and why you do it.

I believe batadv_tt_prepare_tvlv_local_data() was the actual function
that Marek wanted to modify. But we probably spent too much time digging
into translation-table.c these days and I managed to confuse him enough.


Thanks for pointing this out, Sven!


Cheers,

-- 
Antonio Quartulli


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [B.A.T.M.A.N.] [RFC] batman-adv: prevent TT request storms by not sending inconsistent TT TLVLs
  2018-05-09 18:20     ` Sven Eckelmann
  2018-05-09 20:38       ` Sven Eckelmann
@ 2018-05-10 10:28       ` Marek Lindner
  1 sibling, 0 replies; 16+ messages in thread
From: Marek Lindner @ 2018-05-10 10:28 UTC (permalink / raw)
  To: Sven Eckelmann; +Cc: b.a.t.m.a.n

[-- Attachment #1: Type: text/plain, Size: 993 bytes --]

On Thursday, May 10, 2018 2:20:58 AM HKT Sven Eckelmann wrote:
> On Mittwoch, 9. Mai 2018 19:21:32 CEST Sven Eckelmann wrote:
> [...]
> 
> > [...]
> > 
> > > diff --git a/net/batman-adv/translation-table.c
> > > b/net/batman-adv/translation-table.c index 0225616d..def8d109 100644
> > > --- a/net/batman-adv/translation-table.c
> > > +++ b/net/batman-adv/translation-table.c
> > > @@ -855,7 +855,8 @@ batadv_tt_prepare_tvlv_global_data(struct
> > > batadv_orig_node *orig_node,> 
> > What about batadv_tt_prepare_tvlv_local_data?
> 
> Let me rephrase that: Are we sure that this is from intermediate nodes or is
> there still the possibility that it is from the actual originator?
> Regarding the latter: Should batadv_tt_prepare_tvlv_local_data be modified
> or will you wait for Leonardo to debug why it happens in the first place?

We believe to have an idea why this happens in the first place and I have 
another patch in the works which Leonardo should test in his setup.

Cheers,
Marek

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [B.A.T.M.A.N.] [RFC v2] batman-adv: prevent TT request storms by not sending inconsistent TT TLVLs
  2018-05-06 19:55 [B.A.T.M.A.N.] [RFC PATCH] batman-adv: mitigate issue when empty vlan is received Leonardo Mörlein
                   ` (2 preceding siblings ...)
  2018-05-09 16:06 ` [B.A.T.M.A.N.] [RFC] batman-adv: prevent TT request storms by not sending inconsistent TT TLVLs Marek Lindner
@ 2018-05-10 14:41 ` Marek Lindner
  2018-05-10 15:50   ` Sven Eckelmann
  3 siblings, 1 reply; 16+ messages in thread
From: Marek Lindner @ 2018-05-10 14:41 UTC (permalink / raw)
  To: b.a.t.m.a.n; +Cc: Marek Lindner

A translation table TVLV changset sent with an OGM consists
of a number of headers (one per VLAN) plus the changeset
itself (addition and/or deletion of entries).

The per-VLAN headers are used by OGM recipients for consistency
checks. Said consistency check might determine that a full
translation table request is needed to restore consistency. If
the TT sender adds per-VLAN headers of empty VLANs into the OGM,
recipients are led to believe to have reached an inconsistent
state and thus request a full table update. The full table does
not contain empty VLANs (due to missing entries) the cycle
restarts when the next OGM is issued.

Consequently, when the translation table TVLV headers are
composed, empty VLANs are to be excluded.

Fixes: 21a57f6e7a3b ("batman-adv: make the TT CRC logic VLAN specific")

Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
---

v2:
 * moved changes into batadv_tt_prepare_tvlv_local_data()
 * updated 'fixes' commit id

 net/batman-adv/translation-table.c | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/net/batman-adv/translation-table.c b/net/batman-adv/translation-table.c
index 0225616d..b51c034a 100644
--- a/net/batman-adv/translation-table.c
+++ b/net/batman-adv/translation-table.c
@@ -931,15 +931,20 @@ batadv_tt_prepare_tvlv_local_data(struct batadv_priv *bat_priv,
 	struct batadv_tvlv_tt_vlan_data *tt_vlan;
 	struct batadv_softif_vlan *vlan;
 	u16 num_vlan = 0;
-	u16 num_entries = 0;
+	u16 vlan_entries = 0;
+	u16 total_entries = 0;
 	u16 tvlv_len;
 	u8 *tt_change_ptr;
 	int change_offset;
 
 	rcu_read_lock();
 	hlist_for_each_entry_rcu(vlan, &bat_priv->softif_vlan_list, list) {
+		vlan_entries = atomic_read(&vlan->tt.num_entries);
+		if (vlan_entries < 1)
+			continue;
+
 		num_vlan++;
-		num_entries += atomic_read(&vlan->tt.num_entries);
+		total_entries += vlan_entries;
 	}
 
 	change_offset = sizeof(**tt_data);
@@ -947,7 +952,7 @@ batadv_tt_prepare_tvlv_local_data(struct batadv_priv *bat_priv,
 
 	/* if tt_len is negative, allocate the space needed by the full table */
 	if (*tt_len < 0)
-		*tt_len = batadv_tt_len(num_entries);
+		*tt_len = batadv_tt_len(total_entries);
 
 	tvlv_len = *tt_len;
 	tvlv_len += change_offset;
@@ -964,6 +969,10 @@ batadv_tt_prepare_tvlv_local_data(struct batadv_priv *bat_priv,
 
 	tt_vlan = (struct batadv_tvlv_tt_vlan_data *)(*tt_data + 1);
 	hlist_for_each_entry_rcu(vlan, &bat_priv->softif_vlan_list, list) {
+		vlan_entries = atomic_read(&vlan->tt.num_entries);
+		if (vlan_entries < 1)
+			continue;
+
 		tt_vlan->vid = htons(vlan->vid);
 		tt_vlan->crc = htonl(vlan->tt.crc);
 
-- 
2.17.0


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [B.A.T.M.A.N.] [RFC v2] batman-adv: prevent TT request storms by not sending inconsistent TT TLVLs
  2018-05-10 14:41 ` [B.A.T.M.A.N.] [RFC v2] " Marek Lindner
@ 2018-05-10 15:50   ` Sven Eckelmann
  2018-05-11 15:58     ` Marek Lindner
  0 siblings, 1 reply; 16+ messages in thread
From: Sven Eckelmann @ 2018-05-10 15:50 UTC (permalink / raw)
  To: b.a.t.m.a.n; +Cc: Marek Lindner

[-- Attachment #1: Type: text/plain, Size: 734 bytes --]

On Donnerstag, 10. Mai 2018 16:41:37 CEST Marek Lindner wrote:
>         rcu_read_lock();
>         hlist_for_each_entry_rcu(vlan, &bat_priv->softif_vlan_list, list) {
> +               vlan_entries = atomic_read(&vlan->tt.num_entries);
> +               if (vlan_entries < 1)
> +                       continue;
> +
>                 num_vlan++;
> -               num_entries += atomic_read(&vlan->tt.num_entries);
> +               total_entries += vlan_entries;
>         }

Please submit it on top of the maint branch before submitting it the next 
time. The rcu_read_lock isn't there anymore.

The rest also applies only with some offsets (which sound like you have some 
other modifications in your branch).

Kind regards,
	Sven

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [B.A.T.M.A.N.] [RFC v2] batman-adv: prevent TT request storms by not sending inconsistent TT TLVLs
  2018-05-10 15:50   ` Sven Eckelmann
@ 2018-05-11 15:58     ` Marek Lindner
  0 siblings, 0 replies; 16+ messages in thread
From: Marek Lindner @ 2018-05-11 15:58 UTC (permalink / raw)
  To: b.a.t.m.a.n

[-- Attachment #1: Type: text/plain, Size: 436 bytes --]

On Thursday, May 10, 2018 11:50:48 PM HKT Sven Eckelmann wrote:
> Please submit it on top of the maint branch before submitting it the next
> time. The rcu_read_lock isn't there anymore.

I did not intend this patch to go into maint (hence the missing maint subject 
prefix) as I considered that to be more of a sanity check. The "fix adding 
VLANs with partial state" is the real fix.

Though, I am fine with maint too.

Cheers,
Marek

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2018-05-11 15:58 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-05-06 19:55 [B.A.T.M.A.N.] [RFC PATCH] batman-adv: mitigate issue when empty vlan is received Leonardo Mörlein
2018-05-07  6:32 ` Sven Eckelmann
2018-05-07  8:06   ` me
2018-05-07 10:23     ` Sven Eckelmann
2018-05-08  8:45       ` Sven Eckelmann
2018-05-09 14:08 ` Marek Lindner
2018-05-09 16:06 ` [B.A.T.M.A.N.] [RFC] batman-adv: prevent TT request storms by not sending inconsistent TT TLVLs Marek Lindner
2018-05-09 17:21   ` Sven Eckelmann
2018-05-09 18:20     ` Sven Eckelmann
2018-05-09 20:38       ` Sven Eckelmann
2018-05-09 21:34         ` lemoer
2018-05-10  9:12         ` Antonio Quartulli
2018-05-10 10:28       ` Marek Lindner
2018-05-10 14:41 ` [B.A.T.M.A.N.] [RFC v2] " Marek Lindner
2018-05-10 15:50   ` Sven Eckelmann
2018-05-11 15:58     ` Marek Lindner

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.