All of lore.kernel.org
 help / color / mirror / Atom feed
* ebtables: load-on-demand extensions
@ 2020-06-16 14:48 Eugene Crosser
  2020-06-16 15:21 ` Jan Engelhardt
  0 siblings, 1 reply; 16+ messages in thread
From: Eugene Crosser @ 2020-06-16 14:48 UTC (permalink / raw)
  To: netfilter-devel


[-- Attachment #1.1: Type: text/plain, Size: 1767 bytes --]

Hello all,

we need a custom "extention" module working on Layer 2 (nfproto "bridge"
or "netdev"). Making a kernel module and an xtables "extention" shared
object is simple enough. But then it gets more complicated.

In order to make an "extention" work on Layer 3 (with command
`iptables`) it is sufficient to place the shared object in the directory
with other existing extensions (/usr/lib/x86_64-linux-gnu/xtables in a
typical linux distro), and then existing `iptables` command will load it
on demand if it was specified as a "match extension" option `-m` or used
as a jump target. (Existing third party tools such as iptables-netflow
use this approach).

However such "load on demand" mechanism does not exist for Layer 2
(`ebtables` command). Instead, `ebtables` has a hardcoded list of
extensions that are all loaded at startup time (iptables/xtables-eb.c,
function `ebt_load_match_extensions()`). This makes the use of in-house
extensions impossible without rolling out a modified version of
`ebtables` command.

Questions:

1. Is my analysis correct, and it is not possible to use existing
netfilter control tools to configure and enable custom extensions on
layer 2?

2. Is it correct that "new generation" `nft` filtering infrastructure
does not support dynamically loadable extensions at all? (We need a
custom kernel module because we need access to the fields in the skb
that are not exposed to `nft`, and we need a custom extension to
configure the custom module.)

3. If modification to `ebtables` is indeed inevitable, and we make such
modification to allow on demand loading the same way as in `iptables`,
would upstream be interested to incorporate such patch for some future
relase?

Thank you,

Eugene


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: ebtables: load-on-demand extensions
  2020-06-16 14:48 ebtables: load-on-demand extensions Eugene Crosser
@ 2020-06-16 15:21 ` Jan Engelhardt
  2020-06-16 15:54   ` Eugene Crosser
  0 siblings, 1 reply; 16+ messages in thread
From: Jan Engelhardt @ 2020-06-16 15:21 UTC (permalink / raw)
  To: Eugene Crosser; +Cc: netfilter-devel


On Tuesday 2020-06-16 16:48, Eugene Crosser wrote:
>
>2. Is it correct that "new generation" `nft` filtering infrastructure
>does not support dynamically loadable extensions at all? (We need a
>custom kernel module because we need access to the fields in the skb
>that are not exposed to `nft`, and we need a custom extension to
>configure the custom module.)

Why not make a patch to publicly expose the skb's data via nft_meta?
No more custom modules, no more userspace modifications, that would seem 
to be a win-win situation.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: ebtables: load-on-demand extensions
  2020-06-16 15:21 ` Jan Engelhardt
@ 2020-06-16 15:54   ` Eugene Crosser
  2020-06-16 16:33     ` Jan Engelhardt
  0 siblings, 1 reply; 16+ messages in thread
From: Eugene Crosser @ 2020-06-16 15:54 UTC (permalink / raw)
  To: Jan Engelhardt; +Cc: netfilter-devel


[-- Attachment #1.1: Type: text/plain, Size: 1056 bytes --]

On 6/16/20 5:21 PM, Jan Engelhardt wrote:

>> 2. Is it correct that "new generation" `nft` filtering infrastructure
>> does not support dynamically loadable extensions at all? (We need a
>> custom kernel module because we need access to the fields in the skb
>> that are not exposed to `nft`, and we need a custom extension to
>> configure the custom module.)
> 
> Why not make a patch to publicly expose the skb's data via nft_meta?
> No more custom modules, no more userspace modifications, that would seem 
> to be a win-win situation.

This looks unrealistic to me, at least at the first glance. For our
particular use case, we are running the skb through the kernel function
`skb_validate_network_len()` with custom mtu size, and make decision
based on the outcome. Who knows what other things other people may need
to do. At least the kernel module seems to be a must.

OTOH if it were possible to configure the module in an "agnostic" way,
without a custom "extension" shared object, that would be a win.

Regards,

Eugene


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: ebtables: load-on-demand extensions
  2020-06-16 15:54   ` Eugene Crosser
@ 2020-06-16 16:33     ` Jan Engelhardt
  2020-06-19 13:45       ` Eugene Crosser
  0 siblings, 1 reply; 16+ messages in thread
From: Jan Engelhardt @ 2020-06-16 16:33 UTC (permalink / raw)
  To: Eugene Crosser; +Cc: netfilter-devel


On Tuesday 2020-06-16 17:54, Eugene Crosser wrote:
>>> 2. Is it correct that "new generation" `nft` filtering infrastructure
>>> does not support dynamically loadable extensions at all? (We need a
>>> custom kernel module because we need access to the fields in the skb
>>> that are not exposed to `nft` [..]
>> 
>> Why not make a patch to publicly expose the skb's data via nft_meta?
>> No more custom modules, no more userspace modifications [..]
>
>For our particular use case, we are running the skb through the kernel
>function `skb_validate_network_len()` with custom mtu size [..]

I find no such function in the current or past kernels. Perhaps you could post
the code of the module(s) you already have, and we can assess if it, or the
upstream ideals, can be massaged to make the code stick.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: ebtables: load-on-demand extensions
  2020-06-16 16:33     ` Jan Engelhardt
@ 2020-06-19 13:45       ` Eugene Crosser
  2020-06-19 15:15         ` Pablo Neira Ayuso
  0 siblings, 1 reply; 16+ messages in thread
From: Eugene Crosser @ 2020-06-19 13:45 UTC (permalink / raw)
  To: Jan Engelhardt; +Cc: netfilter-devel


[-- Attachment #1.1: Type: text/plain, Size: 1490 bytes --]

On 6/16/20 6:33 PM, Jan Engelhardt wrote:

>>> Why not make a patch to publicly expose the skb's data via nft_meta?
>>> No more custom modules, no more userspace modifications [..]
>>
>> For our particular use case, we are running the skb through the kernel
>> function `skb_validate_network_len()` with custom mtu size [..]
> 
> I find no such function in the current or past kernels. Perhaps you could post
> the code of the module(s) you already have, and we can assess if it, or the
> upstream ideals, can be massaged to make the code stick.

I really really don't see our module being useful for anyone else! Even
for us, it's just a stopgap measure, hopefully to be dropped after a few
months. That said, I believe that the company will have no objections
against publishing it. I've uploaded initial (untested) code on github
here https://github.com/crosser/ebt-pmtud, in case anyone is interested.

On the other hand, in my opinion there will always be a use case for
custom kernel modules acting as extensions to netfilter. Alternative, in
most cases, would be a standalone kernel module, not using netfilter
infrastructure. But netfilter infrastructure is so useful and
convenient, it would be a shame to lose it!

Anyway, I would like to suggest a patch to `ebtables-nft` that
introduces `-m` option like in `iptables`. Turns out that it's only a
few lines, including manpage update! I will send the patch in a separate
email.

Regards,

Eugene


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: ebtables: load-on-demand extensions
  2020-06-19 13:45       ` Eugene Crosser
@ 2020-06-19 15:15         ` Pablo Neira Ayuso
  2020-06-19 16:21           ` Eugene Crosser
  2020-06-20 10:34           ` Expose skb_gso_validate_network_len() [Was: ebtables: load-on-demand extensions] Eugene Crosser
  0 siblings, 2 replies; 16+ messages in thread
From: Pablo Neira Ayuso @ 2020-06-19 15:15 UTC (permalink / raw)
  To: Eugene Crosser; +Cc: Jan Engelhardt, netfilter-devel

On Fri, Jun 19, 2020 at 03:45:57PM +0200, Eugene Crosser wrote:
> On 6/16/20 6:33 PM, Jan Engelhardt wrote:
>
> >>> Why not make a patch to publicly expose the skb's data via nft_meta?
> >>> No more custom modules, no more userspace modifications [..]
> >>
> >> For our particular use case, we are running the skb through the kernel
> >> function `skb_validate_network_len()` with custom mtu size [..]
> >
> > I find no such function in the current or past kernels. Perhaps you could post
> > the code of the module(s) you already have, and we can assess if it, or the
> > upstream ideals, can be massaged to make the code stick.
>
> I really really don't see our module being useful for anyone else! Even
> for us, it's just a stopgap measure, hopefully to be dropped after a few
> months. That said, I believe that the company will have no objections
> against publishing it. I've uploaded initial (untested) code on github
> here https://github.com/crosser/ebt-pmtud, in case anyone is interested.

I think there is a way to achieve this with nft 0.9.6 ?

commit 2a20b5bdbde8a1b510f75b1522772b07e51a77d7
Author: Michael Braun <...>
Date:   Wed May 6 11:46:23 2020 +0200

    datatype: add frag-needed (ipv4) to reject options

    This enables to send icmp frag-needed messages using reject target.

    I have a bridge with connects an gretap tunnel with some ethernet lan.
    On the gretap device I use ignore-df to avoid packets being lost without
    icmp reject to the sender of the bridged packet.

    Still I want to avoid packet fragmentation with the gretap packets.
    So I though about adding an nftables rule like this:

    nft insert rule bridge filter FORWARD \
      ip protocol tcp \
      ip length > 1400 \
      ip frag-off & 0x4000 != 0 \
      reject with icmp type frag-needed

    This would reject all tcp packets with ip dont-fragment bit set that are
    bigger than some threshold (here 1400 bytes). The sender would then receive
    ICMP unreachable - fragmentation needed and reduce its packet size (as
    defined with PMTU).

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: ebtables: load-on-demand extensions
  2020-06-19 15:15         ` Pablo Neira Ayuso
@ 2020-06-19 16:21           ` Eugene Crosser
  2020-06-20 10:34           ` Expose skb_gso_validate_network_len() [Was: ebtables: load-on-demand extensions] Eugene Crosser
  1 sibling, 0 replies; 16+ messages in thread
From: Eugene Crosser @ 2020-06-19 16:21 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: netfilter-devel, Jan Engelhardt


[-- Attachment #1.1: Type: text/plain, Size: 2429 bytes --]

On 6/19/20 5:15 PM, Pablo Neira Ayuso wrote:

>>>>> Why not make a patch to publicly expose the skb's data via nft_meta?
>>>>> No more custom modules, no more userspace modifications [..]
>>>>
>>>> For our particular use case, we are running the skb through the kernel
>>>> function `skb_validate_network_len()` with custom mtu size [..]
>>>
>>> I find no such function in the current or past kernels. Perhaps you could post
>>> the code of the module(s) you already have, and we can assess if it, or the
>>> upstream ideals, can be massaged to make the code stick.
>>
>> I really really don't see our module being useful for anyone else! Even
>> for us, it's just a stopgap measure, hopefully to be dropped after a few
>> months. That said, I believe that the company will have no objections
>> against publishing it. I've uploaded initial (untested) code on github
>> here https://github.com/crosser/ebt-pmtud, in case anyone is interested.
> 
> I think there is a way to achieve this with nft 0.9.6 ?

But this does not take into account that large gso skbs are acceptable,
does it? What we need is to check `skb->len` (minus mac header) for
_non-gso_ skbs and `skb_gso_validate_network_len()` for _gso_ skbs.

Does such functionality exist? I did not find any reference to gso in
the code.

Regards,

Eugene

> commit 2a20b5bdbde8a1b510f75b1522772b07e51a77d7
> Author: Michael Braun <...>
> Date:   Wed May 6 11:46:23 2020 +0200
> 
>     datatype: add frag-needed (ipv4) to reject options
> 
>     This enables to send icmp frag-needed messages using reject target.
> 
>     I have a bridge with connects an gretap tunnel with some ethernet lan.
>     On the gretap device I use ignore-df to avoid packets being lost without
>     icmp reject to the sender of the bridged packet.
> 
>     Still I want to avoid packet fragmentation with the gretap packets.
>     So I though about adding an nftables rule like this:
> 
>     nft insert rule bridge filter FORWARD \
>       ip protocol tcp \
>       ip length > 1400 \
>       ip frag-off & 0x4000 != 0 \
>       reject with icmp type frag-needed
> 
>     This would reject all tcp packets with ip dont-fragment bit set that are
>     bigger than some threshold (here 1400 bytes). The sender would then receive
>     ICMP unreachable - fragmentation needed and reduce its packet size (as
>     defined with PMTU).
> 


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Expose skb_gso_validate_network_len() [Was: ebtables: load-on-demand extensions]
  2020-06-19 15:15         ` Pablo Neira Ayuso
  2020-06-19 16:21           ` Eugene Crosser
@ 2020-06-20 10:34           ` Eugene Crosser
  2020-06-20 11:04             ` Florian Westphal
  1 sibling, 1 reply; 16+ messages in thread
From: Eugene Crosser @ 2020-06-20 10:34 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: Jan Engelhardt, netfilter-devel


[-- Attachment #1.1: Type: text/plain, Size: 3480 bytes --]

On 19/06/2020 17:15, Pablo Neira Ayuso wrote:
>>>>> Why not make a patch to publicly expose the skb's data via nft_meta?
>>>>> No more custom modules, no more userspace modifications [..]
>>>>
>>>> For our particular use case, we are running the skb through the kernel
>>>> function `skb_validate_network_len()` with custom mtu size [..]

(the function name is skb_gso_validate_network_len, my mistake)

I previously expressed strong opinion that our "hack" to send icmp rejects on
Layer 2 will not be useful for anyone else. But the existence of the commit from
Michael Braun proves that I was wrong, and Jan Engelhards was right: it probably
makes sense to implement the functionality that we need within the "new" nft
infrastructure.

As far as I understand, the part that is missing in the existing implementation
is exposure (in some form) of `skb_gso_validate_network_len()` function to
user-configurable filters. Because the kernel does now expose the _size_ under
which a gso skb can be segmented, but only the _boolean_ with the meaning "this
gso skb can fit in mtu that you've specified", I could envision a new match that
could be named like "fits-in-mtu-size" or "segmentable-under". Then an nftables
rule could look roughly like this (for ipv4):

    nft insert rule bridge filter FORWARD \
      ip frag-off & 0x4000 != 0 \
      ip protocol tcp \
      not tcp segmentable-under 1400 \
      reject with icmp type frag-needed

This new function would act the same as "ip len < XXX" for non-gso skbs, and
call skb_gso_validate_network_len(skb, XXX) for gso skbs.

Do you think it makes sense? Shall I try to implement this and submit a patch?

Thank you,

Eugene

>>> I find no such function in the current or past kernels. Perhaps you could post
>>> the code of the module(s) you already have, and we can assess if it, or the
>>> upstream ideals, can be massaged to make the code stick.
>>
>> I really really don't see our module being useful for anyone else! Even
>> for us, it's just a stopgap measure, hopefully to be dropped after a few
>> months. That said, I believe that the company will have no objections
>> against publishing it. I've uploaded initial (untested) code on github
>> here https://github.com/crosser/ebt-pmtud, in case anyone is interested.
> 
> I think there is a way to achieve this with nft 0.9.6 ?
> 
> commit 2a20b5bdbde8a1b510f75b1522772b07e51a77d7
> Author: Michael Braun <...>
> Date:   Wed May 6 11:46:23 2020 +0200
> 
>     datatype: add frag-needed (ipv4) to reject options
> 
>     This enables to send icmp frag-needed messages using reject target.
> 
>     I have a bridge with connects an gretap tunnel with some ethernet lan.
>     On the gretap device I use ignore-df to avoid packets being lost without
>     icmp reject to the sender of the bridged packet.
> 
>     Still I want to avoid packet fragmentation with the gretap packets.
>     So I though about adding an nftables rule like this:
> 
>     nft insert rule bridge filter FORWARD \
>       ip protocol tcp \
>       ip length > 1400 \
>       ip frag-off & 0x4000 != 0 \
>       reject with icmp type frag-needed
> 
>     This would reject all tcp packets with ip dont-fragment bit set that are
>     bigger than some threshold (here 1400 bytes). The sender would then receive
>     ICMP unreachable - fragmentation needed and reduce its packet size (as
>     defined with PMTU).
> 



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Expose skb_gso_validate_network_len() [Was: ebtables: load-on-demand extensions]
  2020-06-20 10:34           ` Expose skb_gso_validate_network_len() [Was: ebtables: load-on-demand extensions] Eugene Crosser
@ 2020-06-20 11:04             ` Florian Westphal
  2020-06-20 21:16               ` Eugene Crosser
  0 siblings, 1 reply; 16+ messages in thread
From: Florian Westphal @ 2020-06-20 11:04 UTC (permalink / raw)
  To: Eugene Crosser; +Cc: Pablo Neira Ayuso, Jan Engelhardt, netfilter-devel

Eugene Crosser <crosser@average.org> wrote:
> On 19/06/2020 17:15, Pablo Neira Ayuso wrote:
> >>>>> Why not make a patch to publicly expose the skb's data via nft_meta?
> >>>>> No more custom modules, no more userspace modifications [..]
> >>>>
> >>>> For our particular use case, we are running the skb through the kernel
> >>>> function `skb_validate_network_len()` with custom mtu size [..]
> 
> (the function name is skb_gso_validate_network_len, my mistake)
> 
> I previously expressed strong opinion that our "hack" to send icmp rejects on
> Layer 2 will not be useful for anyone else. But the existence of the commit from
> Michael Braun proves that I was wrong, and Jan Engelhards was right: it probably
> makes sense to implement the functionality that we need within the "new" nft
> infrastructure.

Yes, just do what Jan suggested and expost this in nft_meta.c

> As far as I understand, the part that is missing in the existing implementation
> is exposure (in some form) of `skb_gso_validate_network_len()` function to
> user-configurable filters.

No, nft already has "< $value" logic.
The only missing piece of the puzzle is a way to populate an nft
register with the "size per segment" value.

> Because the kernel does now expose the _size_ under
> which a gso skb can be segmented, but only the _boolean_ with the meaning "this
> gso skb can fit in mtu that you've specified",

It would be best to remove "static" from skb_gso_network_seglen() and
add an EXPORT_SYMBOL_GPL for it.

Then, extend nft_meta.c to set register content to that for gso
or the ip/ipv6 packet size for !gso.

Then, extend nft to support something like

      nft insert rule bridge filter FORWARD \
        ip frag-off & 0x4000 != 0 \
        ip protocol tcp \
	meta nh_segment_length > 1400 \
        reject with icmp type frag-needed

[ NB: I suck at naming, so feel free to come up
  with somethng more descriptive than "nh_segment_length".
  l3size? nh-len...? ]

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Expose skb_gso_validate_network_len() [Was: ebtables: load-on-demand extensions]
  2020-06-20 11:04             ` Florian Westphal
@ 2020-06-20 21:16               ` Eugene Crosser
  2020-06-21  3:24                 ` Florian Westphal
  0 siblings, 1 reply; 16+ messages in thread
From: Eugene Crosser @ 2020-06-20 21:16 UTC (permalink / raw)
  To: Florian Westphal; +Cc: Pablo Neira Ayuso, Jan Engelhardt, netfilter-devel


[-- Attachment #1.1: Type: text/plain, Size: 2675 bytes --]

On 20/06/2020 13:04, Florian Westphal wrote:

>>>>>>> Why not make a patch to publicly expose the skb's data via nft_meta?
>>>>>>> No more custom modules, no more userspace modifications [..]
>>>>>>
>>>>>> For our particular use case, we are running the skb through the kernel
>>>>>> function `skb_validate_network_len()` with custom mtu size [..]

>> makes sense to implement the functionality that we need within the "new" nft
>> infrastructure.
> 
> Yes, just do what Jan suggested and expost this in nft_meta.c
> 
>> As far as I understand, the part that is missing in the existing implementation
>> is exposure (in some form) of `skb_gso_validate_network_len()` function to
>> user-configurable filters.
> 
> No, nft already has "< $value" logic.
> The only missing piece of the puzzle is a way to populate an nft
> register with the "size per segment" value.

I don't think that it works. `skb_gso_network_seglen()` gives the (same for all
segments) segment length _only_ when `shinfo->gso_size != GSO_BY_FRAGS`. If we
were to expose maximum segment length for skbs with `gso_size == GSO_BY_FRAGS`,
we'd need a new function that basically replicates the functionality of
`skb_gso_size_check()` and performs `skb_walk_frags()`, only instead of
returning `false` on first violation finds and then returns the maximum
encoutered value.

That means we'd need to introduce a new function for the sole purpose of making
the proposed check fit in the "less-equal-greater" model. And the only practical
use of the feature is to check "fits-doesn't fit" anyway. It would seem logical
to rather use the kernel function that already exists and already has the
"fits-doesn't fit" semantic.

Do you think this is a valid argument to implement a boolean predicate rather
than expose an arithmetic value?

Regards,

Eugene

>> Because the kernel does now expose the _size_ under
>> which a gso skb can be segmented, but only the _boolean_ with the meaning "this
>> gso skb can fit in mtu that you've specified",
> 
> It would be best to remove "static" from skb_gso_network_seglen() and
> add an EXPORT_SYMBOL_GPL for it.
> 
> Then, extend nft_meta.c to set register content to that for gso
> or the ip/ipv6 packet size for !gso.
> 
> Then, extend nft to support something like
> 
>       nft insert rule bridge filter FORWARD \
>         ip frag-off & 0x4000 != 0 \
>         ip protocol tcp \
> 	meta nh_segment_length > 1400 \
>         reject with icmp type frag-needed
> 
> [ NB: I suck at naming, so feel free to come up
>   with somethng more descriptive than "nh_segment_length".
>   l3size? nh-len...? ]
> 



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Expose skb_gso_validate_network_len() [Was: ebtables: load-on-demand extensions]
  2020-06-20 21:16               ` Eugene Crosser
@ 2020-06-21  3:24                 ` Florian Westphal
  2020-06-21 10:03                   ` Jan Engelhardt
  0 siblings, 1 reply; 16+ messages in thread
From: Florian Westphal @ 2020-06-21  3:24 UTC (permalink / raw)
  To: Eugene Crosser
  Cc: Florian Westphal, Pablo Neira Ayuso, Jan Engelhardt, netfilter-devel

Eugene Crosser <crosser@average.org> wrote:
> > No, nft already has "< $value" logic.
> > The only missing piece of the puzzle is a way to populate an nft
> > register with the "size per segment" value.
> 
> I don't think that it works. `skb_gso_network_seglen()` gives the (same for all
> segments) segment length _only_ when `shinfo->gso_size != GSO_BY_FRAGS`. If we
> were to expose maximum segment length for skbs with `gso_size == GSO_BY_FRAGS`,
> we'd need a new function that basically replicates the functionality of
> `skb_gso_size_check()` and performs `skb_walk_frags()`, only instead of
> returning `false` on first violation finds and then returns the maximum
> encoutered value.

Yes.
 
> That means we'd need to introduce a new function for the sole purpose of making
> the proposed check fit in the "less-equal-greater" model.

Yes and no.

> And the only practical
> use of the feature is to check "fits-doesn't fit" anyway.

Why?  Maybe someone wants to collect statistics on encountered packet
size or something like that.

(Yes, they could also use tcpdump of course).

Or maybe someone wants to do QoS markings on packet sizes, so tehy could
have a map like

{ 0 - 64 : 0x1, 65-1280 : 0x2, 1281-1400 : 0x3 } or whatever.

Point is that nft tries to provide only basic building blocks and allow
users to plumb this together.

> Do you think this is a valid argument to implement a boolean predicate rather
> than expose an arithmetic value?

I would rather see an arithmetic value.

GSO_BY_FRAGS should not occur in forwarding path nornally, but I guess
it might show up with veth coming from a VM, and I indeed forgot about it.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Expose skb_gso_validate_network_len() [Was: ebtables: load-on-demand extensions]
  2020-06-21  3:24                 ` Florian Westphal
@ 2020-06-21 10:03                   ` Jan Engelhardt
  2020-06-21 18:48                     ` Eugene Crosser
  2020-06-21 23:52                     ` Florian Westphal
  0 siblings, 2 replies; 16+ messages in thread
From: Jan Engelhardt @ 2020-06-21 10:03 UTC (permalink / raw)
  To: Florian Westphal; +Cc: Eugene Crosser, Pablo Neira Ayuso, netfilter-devel


On Sunday 2020-06-21 05:24, Florian Westphal wrote:

>Eugene Crosser <crosser@average.org> wrote:
>> > No, nft already has "< $value" logic.
>> > The only missing piece of the puzzle is a way to populate an nft
>> > register with the "size per segment" value.
>> 
>> I don't think that it works. `skb_gso_network_seglen()` gives the (same for all
>> segments) segment length _only_ when `shinfo->gso_size != GSO_BY_FRAGS`. If we
>> were to expose maximum segment length for skbs with `gso_size == GSO_BY_FRAGS`,
>> we'd need a new function that basically replicates the functionality of
>> `skb_gso_size_check()` and performs `skb_walk_frags()`, only instead of
>> returning `false` on first violation finds and then returns the maximum
>> encoutered value.
>
>Yes.
> 
>> That means we'd need to introduce a new function for the sole purpose of making
>> the proposed check fit in the "less-equal-greater" model.
>
>Yes and no.
>
>> And the only practical
>> use of the feature is to check "fits-doesn't fit" anyway.
>
>Why?  Maybe someone wants to collect statistics on encountered packet
>size or something like that.

Possibly so, but you would not want to penalize users who do
want the short-circuiting behavior when they are not interested
in the statistics.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Expose skb_gso_validate_network_len() [Was: ebtables: load-on-demand extensions]
  2020-06-21 10:03                   ` Jan Engelhardt
@ 2020-06-21 18:48                     ` Eugene Crosser
  2020-06-21 23:52                     ` Florian Westphal
  1 sibling, 0 replies; 16+ messages in thread
From: Eugene Crosser @ 2020-06-21 18:48 UTC (permalink / raw)
  To: Jan Engelhardt, Florian Westphal; +Cc: Pablo Neira Ayuso, netfilter-devel


[-- Attachment #1.1: Type: text/plain, Size: 1770 bytes --]

On 21/06/2020 12:03, Jan Engelhardt wrote:

>>>> No, nft already has "< $value" logic.
>>>> The only missing piece of the puzzle is a way to populate an nft
>>>> register with the "size per segment" value.
>>>
>>> I don't think that it works. `skb_gso_network_seglen()` gives the (same for all
>>> segments) segment length _only_ when `shinfo->gso_size != GSO_BY_FRAGS`. If we
>>> were to expose maximum segment length for skbs with `gso_size == GSO_BY_FRAGS`,
>>> we'd need a new function that basically replicates the functionality of
>>> `skb_gso_size_check()` and performs `skb_walk_frags()`, only instead of
>>> returning `false` on first violation finds and then returns the maximum
>>> encoutered value.
>>
>> Yes.
>>
>>> That means we'd need to introduce a new function for the sole purpose of making
>>> the proposed check fit in the "less-equal-greater" model.
>>
>> Yes and no.
>>
>>> And the only practical
>>> use of the feature is to check "fits-doesn't fit" anyway.
>>
>> Why?  Maybe someone wants to collect statistics on encountered packet
>> size or something like that.
> 
> Possibly so, but you would not want to penalize users who do
> want the short-circuiting behavior when they are not interested
> in the statistics.

In my opinion, for what it's worth, performance pentalty will likely be
insignificant, and in most cases (`shinfo->gso_size != GSO_BY_FRAGS`) non-existent.

But the thing that makes me feel rather uneasy about the "expose the value" plan
is that the kernel will get two distinct, very similar in their workflow, but
slightly different functions. Next time someone wants to change GSO processing
they will need to take care of both places.

So, which way to go?..

Regards,

Eugene


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Expose skb_gso_validate_network_len() [Was: ebtables: load-on-demand extensions]
  2020-06-21 10:03                   ` Jan Engelhardt
  2020-06-21 18:48                     ` Eugene Crosser
@ 2020-06-21 23:52                     ` Florian Westphal
  2020-06-22  4:50                       ` Jan Engelhardt
  1 sibling, 1 reply; 16+ messages in thread
From: Florian Westphal @ 2020-06-21 23:52 UTC (permalink / raw)
  To: Jan Engelhardt
  Cc: Florian Westphal, Eugene Crosser, Pablo Neira Ayuso, netfilter-devel

Jan Engelhardt <jengelh@inai.de> wrote:
> >Why?  Maybe someone wants to collect statistics on encountered packet
> >size or something like that.
> 
> Possibly so, but you would not want to penalize users who do
> want the short-circuiting behavior when they are not interested
> in the statistics.

What short-circuit behaviour?

The difference we're talking about is:

*reg = get_gso_segment_or_nh_len(skb);

vs.

if (!skb_is_gso(skb) ||
    get_gso_segment_len(skb) <= priv->len))
       regs->verdict.code = NFT_BREAK;

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Expose skb_gso_validate_network_len() [Was: ebtables: load-on-demand extensions]
  2020-06-21 23:52                     ` Florian Westphal
@ 2020-06-22  4:50                       ` Jan Engelhardt
  2020-06-22  7:41                         ` Eugene Crosser
  0 siblings, 1 reply; 16+ messages in thread
From: Jan Engelhardt @ 2020-06-22  4:50 UTC (permalink / raw)
  To: Florian Westphal; +Cc: Eugene Crosser, Pablo Neira Ayuso, netfilter-devel


On Monday 2020-06-22 01:52, Florian Westphal wrote:
>Jan Engelhardt <jengelh@inai.de> wrote:
>> >Why?  Maybe someone wants to collect statistics on encountered packet
>> >size or something like that.
>> 
>> Possibly so, but you would not want to penalize users who do
>> want the short-circuiting behavior when they are not interested
>> in the statistics.
>
>What short-circuit behaviour?
>
>The difference we're talking about is:
>*reg = get_gso_segment_or_nh_len(skb);
>vs.
>if (!skb_is_gso(skb) || get_gso_segment_len(skb) <= priv->len))
>       regs->verdict.code = NFT_BREAK;

I was under the impression the discussion had steered on

  *reg1 = skb_gso_size_check(skb, skb_gso_validate_network_len(skb, priv->len));
  verdict = *reg1 ? NFT_CONTINUE : NFT_BREAK;

vs.

  *reg1 = 0;
  skb_walk_frags(skb, iter)
      *reg1 += seg_len + skb_headlen(iter);
  // and leave reg1 for the next nft op (lt/gt/feeding it to a counter/etc.)

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Expose skb_gso_validate_network_len() [Was: ebtables: load-on-demand extensions]
  2020-06-22  4:50                       ` Jan Engelhardt
@ 2020-06-22  7:41                         ` Eugene Crosser
  0 siblings, 0 replies; 16+ messages in thread
From: Eugene Crosser @ 2020-06-22  7:41 UTC (permalink / raw)
  To: Jan Engelhardt, Florian Westphal; +Cc: Pablo Neira Ayuso, netfilter-devel


[-- Attachment #1.1: Type: text/plain, Size: 1206 bytes --]

On 22/06/2020 06:50, Jan Engelhardt wrote:

>> What short-circuit behaviour?
>>
>> The difference we're talking about is:
>> *reg = get_gso_segment_or_nh_len(skb);
>> vs.
>> if (!skb_is_gso(skb) || get_gso_segment_len(skb) <= priv->len))
>>       regs->verdict.code = NFT_BREAK;
> 
> I was under the impression the discussion had steered on
> 
>   *reg1 = skb_gso_size_check(skb, skb_gso_validate_network_len(skb, priv->len));
>   verdict = *reg1 ? NFT_CONTINUE : NFT_BREAK;
> 
> vs.
> 
>   *reg1 = 0;
>   skb_walk_frags(skb, iter)
>       *reg1 += seg_len + skb_headlen(iter);
>   // and leave reg1 for the next nft op (lt/gt/feeding it to a counter/etc.)

skb_gso_size_check() has skb_walk_frags() inside. This internal skb_walk_frags()
terminates early (is "short-cirquited"?) when a non-compliant segment is
encountered.

If we want to expose the _maximum length_ of the segments, we need another
function that _also_ performs skb_walk_frags() and runs it to the end (does not
"short-circuit").

Performance-wise, this is probably not a significant penalty in most cases. But
it does require a new function that finds the maximum segment length.

Regards,

Eugene


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2020-06-22  7:41 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-06-16 14:48 ebtables: load-on-demand extensions Eugene Crosser
2020-06-16 15:21 ` Jan Engelhardt
2020-06-16 15:54   ` Eugene Crosser
2020-06-16 16:33     ` Jan Engelhardt
2020-06-19 13:45       ` Eugene Crosser
2020-06-19 15:15         ` Pablo Neira Ayuso
2020-06-19 16:21           ` Eugene Crosser
2020-06-20 10:34           ` Expose skb_gso_validate_network_len() [Was: ebtables: load-on-demand extensions] Eugene Crosser
2020-06-20 11:04             ` Florian Westphal
2020-06-20 21:16               ` Eugene Crosser
2020-06-21  3:24                 ` Florian Westphal
2020-06-21 10:03                   ` Jan Engelhardt
2020-06-21 18:48                     ` Eugene Crosser
2020-06-21 23:52                     ` Florian Westphal
2020-06-22  4:50                       ` Jan Engelhardt
2020-06-22  7:41                         ` Eugene Crosser

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.