b.a.t.m.a.n.lists.open-mesh.org archive mirror
* [B.A.T.M.A.N.] [PATCH maint] batman-adv: fix packet checksum in receive path
@ 2018-01-22 19:24 Matthias Schiffer
  2018-01-22 20:52 ` Sven Eckelmann
  0 siblings, 1 reply; 6+ messages in thread
From: Matthias Schiffer @ 2018-01-22 19:24 UTC
  To: b.a.t.m.a.n

skb_postpull_rcsum() is necessary after eth_type_trans() to adjust the
skb checksum, otherwise log spam of the form "bat0: hw csum failure" will
result when packets with CHECKSUM_COMPLETE are received (at least in some
setups, e.g. when stacking batman-adv on top of VXLAN).

Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net>
---

I don't know what the exact circumstances are that trigger the log spam,
but it seems this was broken forever (I could also reproduce the issue with
our compat-14 legacy branch)... so please ask David to queue this up for
stable :)


 net/batman-adv/soft-interface.c | 8 +-------
 1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/net/batman-adv/soft-interface.c b/net/batman-adv/soft-interface.c
index c95e2b26..edeffcb9 100644
--- a/net/batman-adv/soft-interface.c
+++ b/net/batman-adv/soft-interface.c
@@ -459,13 +459,7 @@ void batadv_interface_rx(struct net_device *soft_iface,
 
 	/* skb->dev & skb->pkt_type are set here */
 	skb->protocol = eth_type_trans(skb, soft_iface);
-
-	/* should not be necessary anymore as we use skb_pull_rcsum()
-	 * TODO: please verify this and remove this TODO
-	 * -- Dec 21st 2009, Simon Wunderlich
-	 */
-
-	/* skb->ip_summed = CHECKSUM_UNNECESSARY; */
+	skb_postpull_rcsum(skb, eth_hdr(skb), ETH_HLEN);
 
 	batadv_inc_counter(bat_priv, BATADV_CNT_RX);
 	batadv_add_counter(bat_priv, BATADV_CNT_RX_BYTES,
-- 
2.16.0



* Re: [B.A.T.M.A.N.] [PATCH maint] batman-adv: fix packet checksum in receive path
  2018-01-22 19:24 [B.A.T.M.A.N.] [PATCH maint] batman-adv: fix packet checksum in receive path Matthias Schiffer
@ 2018-01-22 20:52 ` Sven Eckelmann
  2018-01-22 21:18   ` Matthias Schiffer
  2018-01-23 14:25   ` Maximilian Wilhelm
  0 siblings, 2 replies; 6+ messages in thread
From: Sven Eckelmann @ 2018-01-22 20:52 UTC
  To: b.a.t.m.a.n
  Cc: Matthias Schiffer, Maximilian Wilhelm, Felix Kaechele, Mephisto

On Monday, 22 January 2018 20:24:50 CET Matthias Schiffer wrote:
> skb_postpull_rcsum() is necessary after eth_type_trans() to adjust the
> skb checksum, otherwise log spam of the form "bat0: hw csum failure" will
> result when packets with CHECKSUM_COMPLETE are received (at least in some
> setups, e.g. when stacking batman-adv on top of VXLAN).

Would be nice to have a better explanation here.

The comment previously assumed that skb_pull_rcsum would be enough. But the 
problem here is that the skb_pull_rcsum only pulls the batman-adv headers. The 
actual pull of the ethernet header (with skb_pull_inline) happens inside 
eth_type_trans. Or did I miss anything?

[...]
> I don't know what the exact circumstances are that trigger the log spam,
> but it seems this was broken forever (I could also reproduce the issue with
> our compat-14 legacy branch)... so please ask David to queue this up for
> stable :)

Yes, this has been broken since the earliest commits. The most relevant commit in
batman-adv is:

Fixes: fe28a94c01e1 ("batman-adv: receive packets directly using skbs")

But I would propose using the following in the kernel tree:

Fixes: c6c8fea29769 ("net: Add batman-adv meshing protocol")

The 4.15 release will be soon(tm) and Simon is currently on vacation. So we
will most likely postpone the submission to David until Simon has found a way
out of the snow and 4.15 has been released...

But it would be nice if some people could test the patch [1] (together with
vxlan?) on batman-adv or batman-adv-legacy. And please provide a
"Tested-by: Full Name <email@example.org>" [2] reply when it works.

Thanks,
	Sven

[1] https://patchwork.open-mesh.org/patch/17250/
[2] https://www.kernel.org/doc/html/v4.12/process/submitting-patches.html#using-reported-by-tested-by-reviewed-by-suggested-by-and-fixes


* Re: [B.A.T.M.A.N.] [PATCH maint] batman-adv: fix packet checksum in receive path
  2018-01-22 20:52 ` Sven Eckelmann
@ 2018-01-22 21:18   ` Matthias Schiffer
  2018-01-23  9:12     ` Matthias Schiffer
  2018-01-23 14:25   ` Maximilian Wilhelm
  1 sibling, 1 reply; 6+ messages in thread
From: Matthias Schiffer @ 2018-01-22 21:18 UTC
  To: Sven Eckelmann, b.a.t.m.a.n; +Cc: Maximilian Wilhelm, Felix Kaechele, Mephisto


On 01/22/2018 09:52 PM, Sven Eckelmann wrote:
> On Monday, 22 January 2018 20:24:50 CET Matthias Schiffer wrote:
>> skb_postpull_rcsum() is necessary after eth_type_trans() to adjust the
>> skb checksum, otherwise log spam of the form "bat0: hw csum failure" will
>> result when packets with CHECKSUM_COMPLETE are received (at least in some
>> setups, e.g. when stacking batman-adv on top of VXLAN).
> 
> Would be nice to have a better explanation here.
> 
> The comment previously assumed that skb_pull_rcsum would be enough. But the 
> problem here is that the skb_pull_rcsum only pulls the batman-adv headers. The 
> actual pull of the ethernet header (with skb_pull_inline) happens inside 
> eth_type_trans. Or did I miss anything?

This is correct, eth_type_trans() contains a simple skb_pull(), so the csum
must be adjusted afterwards (grepping the kernel for eth_type_trans will
find a lot of this). I can send a v2 with a better commit message later.

> 
> [...]
>> I don't know what the exact circumstances are that trigger the log spam,
>> but it seems this was broken forever (I could also reproduce the issue with
>> our compat-14 legacy branch)... so please ask David to queue this up for
>> stable :)
> 
> Yes, this is broken since earliest commits. The most relevant commit in 
> batman-adv is:
> 
> Fixes: fe28a94c01e1 ("batman-adv: receive packets directly using skbs")
> 
> But I would propose to use following in the kernel tree:
> 
> Fixes: c6c8fea29769 ("net: Add batman-adv meshing protocol")
> 
> The 4.15 release will be soon(tm) and Simon is currently on vacation. So we 
> will most likely postpone the submission to David until Simon found a way out 
> of the snow and after 4.15 is released...
> 
> But it would be nice when some people could test the patch [1] (together with 
> vxlan?) on batman-adv or batman-adv-legacy. And please provide a 
> "Tested-by: Full Name <email@example.org>" [2] reply when it works.
> 
> Thanks,
> 	Sven

I've tested this on Kernel 4.14.14 (everything working correctly now) and
4.4.110 (here, there are still checksum errors; it seems on older kernels,
the checksum handling in VXLAN is broken too? Still debugging this...)

Matthias



> 
> [1] https://patchwork.open-mesh.org/patch/17250/
> [2] https://www.kernel.org/doc/html/v4.12/process/submitting-patches.html#using-reported-by-tested-by-reviewed-by-suggested-by-and-fixes
> 




* Re: [B.A.T.M.A.N.] [PATCH maint] batman-adv: fix packet checksum in receive path
  2018-01-22 21:18   ` Matthias Schiffer
@ 2018-01-23  9:12     ` Matthias Schiffer
  2018-01-23 21:56       ` Maximilian Wilhelm
  0 siblings, 1 reply; 6+ messages in thread
From: Matthias Schiffer @ 2018-01-23  9:12 UTC
  To: Sven Eckelmann, b.a.t.m.a.n; +Cc: Maximilian Wilhelm, Mephisto, Felix Kaechele


On 01/22/2018 10:18 PM, Matthias Schiffer wrote:
> On 01/22/2018 09:52 PM, Sven Eckelmann wrote:
>> On Monday, 22 January 2018 20:24:50 CET Matthias Schiffer wrote:
>>> skb_postpull_rcsum() is necessary after eth_type_trans() to adjust the
>>> skb checksum, otherwise log spam of the form "bat0: hw csum failure" will
>>> result when packets with CHECKSUM_COMPLETE are received (at least in some
>>> setups, e.g. when stacking batman-adv on top of VXLAN).
>>
>> Would be nice to have a better explanation here.
>>
>> The comment previously assumed that skb_pull_rcsum would be enough. But the 
>> problem here is that the skb_pull_rcsum only pulls the batman-adv headers. The 
>> actual pull of the ethernet header (with skb_pull_inline) happens inside 
>> eth_type_trans. Or did I miss anything?
> 
> This is correct, eth_type_trans() contains a simple skb_pull(), so the csum
> must be adjusted afterwards (grepping the kernel for eth_type_trans will
> find a lot of this). I can send a v2 with a better commit message later.
> 
>>
>> [...]
>>> I don't know what the exact circumstances are that trigger the log spam,
>>> but it seems this was broken forever (I could also reproduce the issue with
>>> our compat-14 legacy branch)... so please ask David to queue this up for
>>> stable :)
>>
>> Yes, this is broken since earliest commits. The most relevant commit in 
>> batman-adv is:
>>
>> Fixes: fe28a94c01e1 ("batman-adv: receive packets directly using skbs")
>>
>> But I would propose to use following in the kernel tree:
>>
>> Fixes: c6c8fea29769 ("net: Add batman-adv meshing protocol")
>>
>> The 4.15 release will be soon(tm) and Simon is currently on vacation. So we 
>> will most likely postpone the submission to David until Simon found a way out 
>> of the snow and after 4.15 is released...
>>
>> But it would be nice when some people could test the patch [1] (together with 
>> vxlan?) on batman-adv or batman-adv-legacy. And please provide a 
>> "Tested-by: Full Name <email@example.org>" [2] reply when it works.
>>
>> Thanks,
>> 	Sven
> 
> I've tested this on Kernel 4.14.14 (everything working correctly now) and
> 4.4.110 (here, there are still checksum errors; it seems on older kernels,
> the checksum handling in VXLAN is broken too? Still debugging this...)

I've found the cause of this other checksum problem: the batman-adv
fragmentation code doesn't handle the checksum on reassembly at all. I
think the best option here is to simply set ip_summed to CHECKSUM_NONE on
reassembly; I will send another patch for that.

The IP fragmentation code does more fancy things when all fragments have
CHECKSUM_COMPLETE, adding up the checksums of the fragments under certain
circumstances. This only works because IP fragments are guaranteed to be
split at even byte offsets (multiples of 8, actually); as far as I can
tell, batman-adv allows odd fragment sizes, making it impossible to add up
the 16bit checksums in the general case.

Matthias


> 
> 
> 
>>
>> [1] https://patchwork.open-mesh.org/patch/17250/
>> [2] https://www.kernel.org/doc/html/v4.12/process/submitting-patches.html#using-reported-by-tested-by-reviewed-by-suggested-by-and-fixes
>>
> 
> 




* Re: [B.A.T.M.A.N.] [PATCH maint] batman-adv: fix packet checksum in receive path
  2018-01-22 20:52 ` Sven Eckelmann
  2018-01-22 21:18   ` Matthias Schiffer
@ 2018-01-23 14:25   ` Maximilian Wilhelm
  1 sibling, 0 replies; 6+ messages in thread
From: Maximilian Wilhelm @ 2018-01-23 14:25 UTC
  To: Sven Eckelmann
  Cc: b.a.t.m.a.n, Matthias Schiffer, Maximilian Wilhelm,
	Felix Kaechele, Mephisto

Anno domini 2018 Sven Eckelmann scripsit:

> On Monday, 22 January 2018 20:24:50 CET Matthias Schiffer wrote:
[...]
> > I don't know what the exact circumstances are that trigger the log spam,
> > but it seems this was broken forever (I could also reproduce the issue with
> > our compat-14 legacy branch)... so please ask David to queue this up for
> > stable :)
> 
> Yes, this is broken since earliest commits. The most relevant commit in 
> batman-adv is:
> 
> Fixes: fe28a94c01e1 ("batman-adv: receive packets directly using skbs")
> 
> But I would propose to use following in the kernel tree:
> 
> Fixes: c6c8fea29769 ("net: Add batman-adv meshing protocol")
> 
> The 4.15 release will be soon(tm) and Simon is currently on vacation. So we 
> will most likely postpone the submission to David until Simon found a way out 
> of the snow and after 4.15 is released...
> 
> But it would be nice when some people could test the patch [1] (together with 
> vxlan?) on batman-adv or batman-adv-legacy. And please provide a 
> "Tested-by: Full Name <email@example.org>" [2] reply when it works.

I took a Debian Kernel package (4.14.13-1~bpo9+1), applied the patch
and deployed the package on a gateway running BATMAN over VTEPs. The
log messages from previous kernels don't show up anymore <3.

Tested-by: Maximilian Wilhelm <max@sdn.clinic>

Thanks!

Best
Max
-- 
"I have to admit I've always suspected that MTBWTF would be a more useful
 metric of real-world performance."
 -- Valdis Kletnieks on NANOG


* Re: [B.A.T.M.A.N.] [PATCH maint] batman-adv: fix packet checksum in receive path
  2018-01-23  9:12     ` Matthias Schiffer
@ 2018-01-23 21:56       ` Maximilian Wilhelm
  0 siblings, 0 replies; 6+ messages in thread
From: Maximilian Wilhelm @ 2018-01-23 21:56 UTC
  To: The list for a Better Approach To Mobile Ad-hoc Networking
  Cc: Sven Eckelmann, Maximilian Wilhelm, Mephisto, Felix Kaechele

Anno domini 2018 Matthias Schiffer scripsit:

Hi,

> On 01/22/2018 10:18 PM, Matthias Schiffer wrote:
> > On 01/22/2018 09:52 PM, Sven Eckelmann wrote:
> >> On Monday, 22 January 2018 20:24:50 CET Matthias Schiffer wrote:
> >>> skb_postpull_rcsum() is necessary after eth_type_trans() to adjust the
> >>> skb checksum, otherwise log spam of the form "bat0: hw csum failure" will
> >>> result when packets with CHECKSUM_COMPLETE are received (at least in some
> >>> setups, e.g. when stacking batman-adv on top of VXLAN).
> >>
> >> Would be nice to have a better explanation here.
> >>
> >> The comment previously assumed that skb_pull_rcsum would be enough. But the 
> >> problem here is that the skb_pull_rcsum only pulls the batman-adv headers. The 
> >> actual pull of the ethernet header (with skb_pull_inline) happens inside 
> >> eth_type_trans. Or did I miss anything?
> > 
> > This is correct, eth_type_trans() contains a simple skb_pull(), so the csum
> > must be adjusted afterwards (grepping the kernel for eth_type_trans will
> > find a lot of this). I can send a v2 with a better commit message later.
> > 
> >>
> >> [...]
> >>> I don't know what the exact circumstances are that trigger the log spam,
> >>> but it seems this was broken forever (I could also reproduce the issue with
> >>> our compat-14 legacy branch)... so please ask David to queue this up for
> >>> stable :)
> >>
> >> Yes, this is broken since earliest commits. The most relevant commit in 
> >> batman-adv is:
> >>
> >> Fixes: fe28a94c01e1 ("batman-adv: receive packets directly using skbs")
> >>
> >> But I would propose to use following in the kernel tree:
> >>
> >> Fixes: c6c8fea29769 ("net: Add batman-adv meshing protocol")
> >>
> >> The 4.15 release will be soon(tm) and Simon is currently on vacation. So we 
> >> will most likely postpone the submission to David until Simon found a way out 
> >> of the snow and after 4.15 is released...
> >>
> >> But it would be nice when some people could test the patch [1] (together with 
> >> vxlan?) on batman-adv or batman-adv-legacy. And please provide a 
> >> "Tested-by: Full Name <email@example.org>" [2] reply when it works.
> >>
> >> Thanks,
> >> 	Sven
> > 
> > I've tested this on Kernel 4.14.14 (everything working correctly now) and
> > 4.4.110 (here, there are still checksum errors; it seems on older kernels,
> > the checksum handling in VXLAN is broken too? Still debugging this...)
> 
> I've found the issue of this other checksum problem: batman-adv
> fragmentation code doesn't handle the checksum on reassembly at all. I
> think the best option here is to simply set ip_summed to CHECKSUM_NONE on
> reassembly, I will send another patch for that.
> 
> The IP fragmentation code does more fancy things when all fragments have
> CHECKSUM_COMPLETE, adding up the checksums of the fragments under certain
> circumstances. This only works because IP fragments are guaranteed to be
> split at even byte offsets (multiples of 8, actually); as far as I can
> tell, batman-adv allows odd fragment sizes, making it impossible to add up
> the 16bit checksums in the general case.

And

  Tested-by: Maximilian Wilhelm <max@sdn.clinic>

to the fix for fragmentation.c, too.

Disclaimer: as MTUs are calculated accordingly in our backbone,
fragmentation of VXLAN packets isn't an issue and we did not see these
messages before. I can confirm that I still don't see any now,
meaning the log spam addressed by the previous fix is still gone and no
new issues have arisen so far.

Thanks a lot! <3

Best
Max
-- 
"Does it bother me that people hurt others because they are too weak to face the truth? Yeah. Sorry 'bout that."
 -- Thirteen, House M.D.
