All of lore.kernel.org
 help / color / mirror / Atom feed
* AF_PACKET fanout not working on MPLS traffic
@ 2019-01-14 18:40 Martijn van Oosterhout
  2019-01-14 20:03 ` Willem de Bruijn
  2019-01-15 17:35 ` Ido Schimmel
  0 siblings, 2 replies; 5+ messages in thread
From: Martijn van Oosterhout @ 2019-01-14 18:40 UTC (permalink / raw)
  To: netdev

[-- Attachment #1: Type: text/plain, Size: 2579 bytes --]

Hi netdev,

We're running into an issue where incoming traffic for Suricata is not
being distributed across the workers despite AF_PACKET with fanout
being used, and it appears to be a kernel issue. Below is a description
of the problem and possible solution.

Seen on version kernel 4.19, but the code on 4.20 seem largely
unchanged.

When a packet needs to be distributed by fanout it calls
net/packet/af_packet.c:fanout_demux_hash which in turns calls
net/core/flow_dissector.c:__skb_get_hash_symmetric which in turn calls
net/core/flow_dissector.c:__skb_flow_dissect. However, if you look at
the code that parses MPLS traffic it looks like so:

--- snip ---
net/core/flow_dissector.c:1023
        case htons(ETH_P_MPLS_UC):
        case htons(ETH_P_MPLS_MC):
                fdret = __skb_flow_dissect_mpls(skb, flow_dissector,
                                                target_container, data,
                                                nhoff, hlen);
                break;
--- snip ---

What's going on here is that the dissector goes to extract the MPLS
flow information and then stops (it returns either GOOD or BAD here).
However because flow_keys_dissector_symmetric does not include
FLOW_DISSECTOR_KEY_MPLS no information is extracted at all, with the
result that the hash is always the same for every packet.

I see a two ways this could be fixed.

Option 1: include FLOW_DISSECTOR_KEY_MPLS in
flow_keys_dissector_symmetric but that seems a big assumption, we don't
do that for VLANs for example.

Option 2: Teach the dissector to, in the case where there is an MPLS
header that is not for entropy, to skip the MPLS header(s) and continue
the dissection on the IP headers that come after the MPLS header.

I think option 2 seems to me the right approach, however the dissector
(AFAICT) is used extensively from many places in the kernel so I'd like
some confirmation before spending too much time on it. It seems like it
could lead to an unexpected performance impact on systems using MPLS.
Or perhaps there is something else going on I missed.

And there is actually another problem: MPLS provides no information
about the next header because it assumes the endpoints in the network
recognise the MPLS headers. Which means you'd have to make a guess
about what the next layer should be.

Any ideas? Is this the right approach?

Thanks in advance,
-- 
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> The combine: one man, one day, wheat for half a million loaves of bread.

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 811 bytes --]

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: AF_PACKET fanout not working on MPLS traffic
  2019-01-14 18:40 AF_PACKET fanout not working on MPLS traffic Martijn van Oosterhout
@ 2019-01-14 20:03 ` Willem de Bruijn
  2019-01-15  8:22   ` Martijn van Oosterhout
  2019-01-15 17:35 ` Ido Schimmel
  1 sibling, 1 reply; 5+ messages in thread
From: Willem de Bruijn @ 2019-01-14 20:03 UTC (permalink / raw)
  To: Martijn van Oosterhout; +Cc: Network Development

On Mon, Jan 14, 2019 at 2:12 PM Martijn van Oosterhout
<kleptog@svana.org> wrote:
>
> Hi netdev,
>
> We're running into an issue where incoming traffic for Suricata is not
> being distributed across the workers despite AF_PACKET with fanout
> being used, and it appears to be a kernel issue. Below is a description
> of the problem and possible solution.
>
> Seen on version kernel 4.19, but the code on 4.20 seem largely
> unchanged.
>
> When a packet needs to be distributed by fanout it calls
> net/packet/af_packet.c:fanout_demux_hash which in turns calls
> net/core/flow_dissector.c:__skb_get_hash_symmetric which in turn calls
> net/core/flow_dissector.c:__skb_flow_dissect. However, if you look at
> the code that parses MPLS traffic it looks like so:
>
> --- snip ---
> net/core/flow_dissector.c:1023
>         case htons(ETH_P_MPLS_UC):
>         case htons(ETH_P_MPLS_MC):
>                 fdret = __skb_flow_dissect_mpls(skb, flow_dissector,
>                                                 target_container, data,
>                                                 nhoff, hlen);
>                 break;
> --- snip ---
>
> What's going on here is that the dissector goes to extract the MPLS
> flow information and then stops (it returns either GOOD or BAD here).
> However because flow_keys_dissector_symmetric does not include
> FLOW_DISSECTOR_KEY_MPLS no information is extracted at all, with the
> result that the hash is always the same for every packet.
>
> I see a two ways this could be fixed.
>
> Option 1: include FLOW_DISSECTOR_KEY_MPLS in
> flow_keys_dissector_symmetric but that seems a big assumption, we don't
> do that for VLANs for example.

This sounds fine to me. Though it will require extra work to make
__skb_get_has_symmetric actually use the entropy. And in practice it's
not clear that this will result in much entropy.

> Option 2: Teach the dissector to, in the case where there is an MPLS
> header that is not for entropy, to skip the MPLS header(s) and continue
> the dissection on the IP headers that come after the MPLS header.
>
> I think option 2 seems to me the right approach, however the dissector
> (AFAICT) is used extensively from many places in the kernel so I'd like
> some confirmation before spending too much time on it. It seems like it
> could lead to an unexpected performance impact on systems using MPLS.
> Or perhaps there is something else going on I missed.
>
> And there is actually another problem: MPLS provides no information
> about the next header because it assumes the endpoints in the network
> recognise the MPLS headers. Which means you'd have to make a guess
> about what the next layer should be.

This is the real issue. I don't think this can be done in general
purpose code. The new BPF flow dissector, however, does allow you to
deploy a custom dissector in environments where the inner protocol is
known.

  https://lwn.net/Articles/764200/

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: AF_PACKET fanout not working on MPLS traffic
  2019-01-14 20:03 ` Willem de Bruijn
@ 2019-01-15  8:22   ` Martijn van Oosterhout
  2019-01-15 16:06     ` Willem de Bruijn
  0 siblings, 1 reply; 5+ messages in thread
From: Martijn van Oosterhout @ 2019-01-15  8:22 UTC (permalink / raw)
  To: Willem de Bruijn; +Cc: Network Development

[-- Attachment #1: Type: text/plain, Size: 2243 bytes --]

Hoi Willem,

On Mon, Jan 14, 2019 at 03:03:42PM -0500, Willem de Bruijn wrote:
> On Mon, Jan 14, 2019 at 2:12 PM Martijn van Oosterhout wrote:
> > I see a two ways this could be fixed.
> >
> > Option 1: include FLOW_DISSECTOR_KEY_MPLS in
> > flow_keys_dissector_symmetric but that seems a big assumption, we don't
> > do that for VLANs for example.
> 
> This sounds fine to me. Though it will require extra work to make
> __skb_get_has_symmetric actually use the entropy. And in practice it's
> not clear that this will result in much entropy.

Ok, I guess this really depends on the kind of traffic. You'd have to
assume that flows in both directions get the same labels, which may or
may not be true. (The assumption here is that you want flows to stay
together, otherwise you may as well use round-robin).

> > And there is actually another problem: MPLS provides no information
> > about the next header because it assumes the endpoints in the network
> > recognise the MPLS headers. Which means you'd have to make a guess
> > about what the next layer should be.
> 
> This is the real issue. I don't think this can be done in general
> purpose code. The new BPF flow dissector, however, does allow you to
> deploy a custom dissector in environments where the inner protocol is
> known.
> 
>   https://lwn.net/Articles/764200/

Thanks for the reference! Is this code considered mature enough for
production?

Also, since this is new, I can't find much info on how to use it. But
if I understand correctly you create a flow dissector in BPF which
extracts the various parts.  There is one such dissector per namespace,
it replaces the internal one completely.  You attach it using
bpftool[1], which loads the program and confiigures where the BPF
program will store various keys.  We can then make a BPF program which
works for our traffic, load it and then the hashing code from AF_PACKET
will start using it instead of the builtin dissector.

Does that sound right?

[1]: https://patchwork.kernel.org/patch/10673201/

Thanks in advance,
-- 
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> The combine: one man, one day, wheat for half a million loaves of bread.

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 811 bytes --]

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: AF_PACKET fanout not working on MPLS traffic
  2019-01-15  8:22   ` Martijn van Oosterhout
@ 2019-01-15 16:06     ` Willem de Bruijn
  0 siblings, 0 replies; 5+ messages in thread
From: Willem de Bruijn @ 2019-01-15 16:06 UTC (permalink / raw)
  To: Martijn van Oosterhout; +Cc: Network Development

On Tue, Jan 15, 2019 at 3:22 AM Martijn van Oosterhout
<kleptog@svana.org> wrote:
>
> Hoi Willem,
>
> On Mon, Jan 14, 2019 at 03:03:42PM -0500, Willem de Bruijn wrote:
> > On Mon, Jan 14, 2019 at 2:12 PM Martijn van Oosterhout wrote:
> > > I see a two ways this could be fixed.
> > >
> > > Option 1: include FLOW_DISSECTOR_KEY_MPLS in
> > > flow_keys_dissector_symmetric but that seems a big assumption, we don't
> > > do that for VLANs for example.
> >
> > This sounds fine to me. Though it will require extra work to make
> > __skb_get_has_symmetric actually use the entropy. And in practice it's
> > not clear that this will result in much entropy.
>
> Ok, I guess this really depends on the kind of traffic. You'd have to
> assume that flows in both directions get the same labels, which may or
> may not be true. (The assumption here is that you want flows to stay
> together, otherwise you may as well use round-robin).

That's a good point, and I guess another reason to not enable this by default.

> > > And there is actually another problem: MPLS provides no information
> > > about the next header because it assumes the endpoints in the network
> > > recognise the MPLS headers. Which means you'd have to make a guess
> > > about what the next layer should be.
> >
> > This is the real issue. I don't think this can be done in general
> > purpose code. The new BPF flow dissector, however, does allow you to
> > deploy a custom dissector in environments where the inner protocol is
> > known.
> >
> >   https://lwn.net/Articles/764200/
>
> Thanks for the reference! Is this code considered mature enough for
> production?

It should be. It only became available in the most recent stable
kernel (4.20), so there may be some kinks to iron out.

> Also, since this is new, I can't find much info on how to use it. But
> if I understand correctly you create a flow dissector in BPF which
> extracts the various parts.  There is one such dissector per namespace,
> it replaces the internal one completely.  You attach it using
> bpftool[1], which loads the program and confiigures where the BPF
> program will store various keys.  We can then make a BPF program which
> works for our traffic, load it and then the hashing code from AF_PACKET
> will start using it instead of the builtin dissector.
>
> Does that sound right?

Yep, pretty much :) Besides the LWN article, the commits themselves
may also be informative. Specifically the one that introduced the
selftests: commit 50b3ed57dee9 ("selftests/bpf: test bpf flow
dissection").

> [1]: https://patchwork.kernel.org/patch/10673201/
>
> Thanks in advance,
> --
> Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> > The combine: one man, one day, wheat for half a million loaves of bread.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: AF_PACKET fanout not working on MPLS traffic
  2019-01-14 18:40 AF_PACKET fanout not working on MPLS traffic Martijn van Oosterhout
  2019-01-14 20:03 ` Willem de Bruijn
@ 2019-01-15 17:35 ` Ido Schimmel
  1 sibling, 0 replies; 5+ messages in thread
From: Ido Schimmel @ 2019-01-15 17:35 UTC (permalink / raw)
  To: Martijn van Oosterhout; +Cc: netdev, dsahern

On Mon, Jan 14, 2019 at 07:40:46PM +0100, Martijn van Oosterhout wrote:
> Hi netdev,
> 
> We're running into an issue where incoming traffic for Suricata is not
> being distributed across the workers despite AF_PACKET with fanout
> being used, and it appears to be a kernel issue. Below is a description
> of the problem and possible solution.
> 
> Seen on version kernel 4.19, but the code on 4.20 seem largely
> unchanged.
> 
> When a packet needs to be distributed by fanout it calls
> net/packet/af_packet.c:fanout_demux_hash which in turns calls
> net/core/flow_dissector.c:__skb_get_hash_symmetric which in turn calls
> net/core/flow_dissector.c:__skb_flow_dissect. However, if you look at
> the code that parses MPLS traffic it looks like so:
> 
> --- snip ---
> net/core/flow_dissector.c:1023
>         case htons(ETH_P_MPLS_UC):
>         case htons(ETH_P_MPLS_MC):
>                 fdret = __skb_flow_dissect_mpls(skb, flow_dissector,
>                                                 target_container, data,
>                                                 nhoff, hlen);
>                 break;
> --- snip ---
> 
> What's going on here is that the dissector goes to extract the MPLS
> flow information and then stops (it returns either GOOD or BAD here).
> However because flow_keys_dissector_symmetric does not include
> FLOW_DISSECTOR_KEY_MPLS no information is extracted at all, with the
> result that the hash is always the same for every packet.
> 
> I see a two ways this could be fixed.
> 
> Option 1: include FLOW_DISSECTOR_KEY_MPLS in
> flow_keys_dissector_symmetric but that seems a big assumption, we don't
> do that for VLANs for example.
> 
> Option 2: Teach the dissector to, in the case where there is an MPLS
> header that is not for entropy, to skip the MPLS header(s) and continue
> the dissection on the IP headers that come after the MPLS header.
> 
> I think option 2 seems to me the right approach, however the dissector
> (AFAICT) is used extensively from many places in the kernel so I'd like
> some confirmation before spending too much time on it. It seems like it
> could lead to an unexpected performance impact on systems using MPLS.
> Or perhaps there is something else going on I missed.
> 
> And there is actually another problem: MPLS provides no information
> about the next header because it assumes the endpoints in the network
> recognise the MPLS headers. Which means you'd have to make a guess
> about what the next layer should be.

You might want to check mpls_multipath_hash(). I think it could be
converted to use the flow dissector like IPv4 & IPv6

> 
> Any ideas? Is this the right approach?
> 
> Thanks in advance,
> -- 
> Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> > The combine: one man, one day, wheat for half a million loaves of bread.

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2019-01-15 17:42 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-01-14 18:40 AF_PACKET fanout not working on MPLS traffic Martijn van Oosterhout
2019-01-14 20:03 ` Willem de Bruijn
2019-01-15  8:22   ` Martijn van Oosterhout
2019-01-15 16:06     ` Willem de Bruijn
2019-01-15 17:35 ` Ido Schimmel

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.