* "Forwarding" from TC classifier
@ 2020-05-13 16:40 Lorenz Bauer
  2020-05-13 17:48 ` David Ahern
  2020-05-13 21:23 ` David Ahern
  0 siblings, 2 replies; 10+ messages in thread
From: Lorenz Bauer @ 2020-05-13 16:40 UTC (permalink / raw)
  To: bpf, Networking, David Ahern, Martynas Pumputis, kernel-team

We've recently open sourced a key component of our L4 load balancer:
cls_redirect [1].
In the commit description, I call out the following caveat:

    cls_redirect relies on receiving encapsulated packets directly from a
    router. This is because we don't have access to the neighbour tables
    from BPF, yet.

The code in question lives in forward_to_next_hop() [2], and does the following:
1. Swap source and destination MAC of the packet
2. Update source and destination IP address
3. Transmit the packet via bpf_redirect(skb->ifindex, 0)
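
As an illustration of step 1 only, here is a minimal userspace sketch of the MAC swap. The struct and function names (eth_hdr_sketch, swap_mac) are made up for this example and are not taken from cls_redirect; the real code operates on the packet's Ethernet header inside the BPF program.

```c
#include <stdint.h>
#include <string.h>

#define MAC_LEN 6

/* Hypothetical stand-in for an Ethernet header; field order mirrors
 * destination, source, ethertype. */
struct eth_hdr_sketch {
    uint8_t h_dest[MAC_LEN];
    uint8_t h_source[MAC_LEN];
    uint16_t h_proto;
};

/* Swap source and destination MAC so the frame goes back to whoever
 * sent it to us (the router, in the setup described above). */
static void swap_mac(struct eth_hdr_sketch *eth)
{
    uint8_t tmp[MAC_LEN];

    memcpy(tmp, eth->h_dest, MAC_LEN);
    memcpy(eth->h_dest, eth->h_source, MAC_LEN);
    memcpy(eth->h_source, tmp, MAC_LEN);
}
```

This only works when the sender of the frame is also the correct next hop, which is exactly the caveat from the commit description.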

Really, I'd like to get rid of step 1, and instead rely on the network
stack to switch or route the packet for me. The bpf_fib_lookup helper is
very close to what I need. I've hacked around a bit, and come up with the
following replacement for step 1:

    switch (bpf_fib_lookup(skb, &fib, sizeof(fib), 0)) {
    case BPF_FIB_LKUP_RET_SUCCESS:
        /* There is a cached neighbour, bpf_redirect without going
         * through the stack. */
        return bpf_redirect(...);

    case BPF_FIB_LKUP_RET_NO_NEIGH:
        /* We have no information about this target. Let the stack handle it. */
        return TC_ACT_OK;

    case BPF_FIB_LKUP_RET_FWD_DISABLED:
        return TC_ACT_SHOT;

    default:
        return TC_ACT_SHOT;
    }

I have a couple of questions:

First, I think I can get BPF_FIB_LKUP_RET_NO_NEIGH if the packet needs
to be routed, but there is no neighbour entry for the default gateway.
Is that correct?

Second, is it possible to originate the packet from the local machine,
instead of keeping the original source address when passing the packet
to the stack on NO_NEIGH?
This is what I get with my current approach:

  IP (tos 0x0, ttl 64, id 25769, offset 0, flags [DF], proto UDP (17), length 124)
      10.42.0.2.37074 > 10.42.0.4.2483: [bad udp cksum 0x14d3 -> 0x3c0d!] UDP, length 96
  IP (tos 0x0, ttl 63, id 25769, offset 0, flags [DF], proto UDP (17), length 124)
      10.42.0.2.37074 > 10.42.0.3.2483: [no cksum] UDP, length 96
  IP (tos 0x0, ttl 64, id 51342, offset 0, flags [none], proto ICMP (1), length 84)
      10.42.0.3 > 10.42.0.2: ICMP echo reply, id 33779, seq 0, length 64

The first and second packets use our custom GUE header and contain an
ICMP echo request. Packet three contains the reply to that request. As
you can see, the second packet keeps the 10.42.0.2 source address
instead of using 10.42.0.4.

Third, what effect does BPF_FIB_LOOKUP_OUTPUT have? Seems like I should set it,
but I get somewhat sensible results without it as well. Same for LOOKUP_DIRECT.

1: https://lore.kernel.org/bpf/20200424185556.7358-1-lmb@cloudflare.com/
2: https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/tree/tools/testing/selftests/bpf/progs/test_cls_redirect.c#n509

--
Lorenz Bauer  |  Systems Engineer
6th Floor, County Hall/The Riverside Building, SE1 7PB, UK

www.cloudflare.com


* Re: "Forwarding" from TC classifier
  2020-05-13 16:40 "Forwarding" from TC classifier Lorenz Bauer
@ 2020-05-13 17:48 ` David Ahern
  2020-05-14 15:41   ` Lorenz Bauer
  2020-05-13 21:23 ` David Ahern
  1 sibling, 1 reply; 10+ messages in thread
From: David Ahern @ 2020-05-13 17:48 UTC (permalink / raw)
  To: Lorenz Bauer, bpf, Networking, Martynas Pumputis, kernel-team

On 5/13/20 10:40 AM, Lorenz Bauer wrote:
> We've recently open sourced a key component of our L4 load balancer:
> cls_redirect [1].
> In the commit description, I call out the following caveat:
> 
>     cls_redirect relies on receiving encapsulated packets directly
> from a router. This is
>     because we don't have access to the neighbour tables from BPF, yet.

Can you explain more about this limitation? Why does access to neighbor
tables solve the problem?

> 
> The code in question lives in forward_to_next_hop() [2], and does the following:
> 1. Swap source and destination MAC of the packet
> 2. Update source and destination IP address
> 3. Transmit the packet via bpf_redirect(skb->ifindex, 0)
> 
> Really, I'd like to get rid of step 1, and instead rely on the network
> stack to switch or route
> the packet for me. The bpf_fib_lookup helper is very close to what I need. I've
> hacked around a bit, and come up with the following replacement for step 1:
> 
>     switch (bpf_fib_lookup(skb, &fib, sizeof(fib), 0)) {
>     case BPF_FIB_LKUP_RET_SUCCESS:
>         /* There is a cached neighbour, bpf_redirect without going
> through the stack. */
>         return bpf_redirect(...);
> 
>     case BPF_FIB_LKUP_RET_NO_NEIGH:
>         /* We have no information about this target. Let the stack handle it. */
>         return TC_ACT_OK;
> 
>     case BPF_FIB_LKUP_RET_FWD_DISABLED:
>         return TC_ACT_SHOT;
> 
>     default:
>         return TC_ACT_SHOT;
>     }
> 
> I have a couple of questions:
> 
> First, I think I can get BPF_FIB_LKUP_RET_NO_NEIGH if the packet needs
> to be routed,
> but there is no neighbour entry for the default gateway. Is that correct?

Correct.

> 
> Second, is it possible to originate the packet from the local machine,
> instead of keeping
> the original source address when passing the packet to the stack on NO_NEIGH?

Network address or MAC address? Swapping the network address is not a
usual part of routing a packet so I presume you mean mac but just making
sure. Swapping mac addresses should be done for all routed packets.

> This is what I get with my current approach:
> 
>   IP (tos 0x0, ttl 64, id 25769, offset 0, flags [DF], proto UDP (17),
> length 124)
>       10.42.0.2.37074 > 10.42.0.4.2483: [bad udp cksum 0x14d3 ->
> 0x3c0d!] UDP, length 96
>   IP (tos 0x0, ttl 63, id 25769, offset 0, flags [DF], proto UDP (17),
> length 124)
>       10.42.0.2.37074 > 10.42.0.3.2483: [no cksum] UDP, length 96
>   IP (tos 0x0, ttl 64, id 51342, offset 0, flags [none], proto ICMP
> (1), length 84)
>       10.42.0.3 > 10.42.0.2: ICMP echo reply, id 33779, seq 0, length 64
> 
> The first and second packet are using our custom GUE header, they
> contain an ICMP echo request. Packet three contains the answer to the
> request. As you can see, the second packet keeps the 10.42.0.2 source
> address instead of using 10.42.0.4.
> 
> Third, what effect does BPF_FIB_LOOKUP_OUTPUT have? Seems like I should set it,
> but I get somewhat sensible results without it as well. Same for LOOKUP_DIRECT.

BPF_FIB_LOOKUP_OUTPUT affects the flow parameters passed to the FIB lookup:
        if (flags & BPF_FIB_LOOKUP_OUTPUT) {
                fl4.flowi4_iif = 1;
                fl4.flowi4_oif = params->ifindex;
        } else {
                fl4.flowi4_iif = params->ifindex;
                fl4.flowi4_oif = 0;
        }

Whether iif or oif is set can influence the FIB lookup result, e.g.,
through FIB rules that direct the lookup to a table or require the
lookup result to use the specified device.

Usually, 'output' is for locally generated traffic headed out. XDP
programs run on ingress are from an Rx perspective and do the lookup
from the perspective of 'is this forwarded or locally delivered'.

BPF_FIB_LOOKUP_DIRECT is really only useful for complex FIB setups -
e.g., VRF. It means skip the FIB rules and go direct to the table
associated with the device.



* Re: "Forwarding" from TC classifier
  2020-05-13 16:40 "Forwarding" from TC classifier Lorenz Bauer
  2020-05-13 17:48 ` David Ahern
@ 2020-05-13 21:23 ` David Ahern
  2020-05-14 15:41   ` Lorenz Bauer
  1 sibling, 1 reply; 10+ messages in thread
From: David Ahern @ 2020-05-13 21:23 UTC (permalink / raw)
  To: Lorenz Bauer, bpf, Networking, Martynas Pumputis, kernel-team

On 5/13/20 10:40 AM, Lorenz Bauer wrote:
> Really, I'd like to get rid of step 1, and instead rely on the network
> stack to switch or route
> the packet for me. The bpf_fib_lookup helper is very close to what I need. I've
> hacked around a bit, and come up with the following replacement for step 1:
> 
>     switch (bpf_fib_lookup(skb, &fib, sizeof(fib), 0)) {
>     case BPF_FIB_LKUP_RET_SUCCESS:
>         /* There is a cached neighbour, bpf_redirect without going
> through the stack. */
>         return bpf_redirect(...);

BTW, as shown in samples/bpf/xdp_fwd_kern.c, you have a bit more work to
do for proper L3 forwarding:

        if (rc == BPF_FIB_LKUP_RET_SUCCESS) {
		...
                if (h_proto == htons(ETH_P_IP))
                        ip_decrease_ttl(iph);
                else if (h_proto == htons(ETH_P_IPV6))
                        ip6h->hop_limit--;

                memcpy(eth->h_dest, fib_params.dmac, ETH_ALEN);
                memcpy(eth->h_source, fib_params.smac, ETH_ALEN);
                return bpf_redirect_map(&xdp_tx_ports,
                                        fib_params.ifindex, 0);

The ttl / hop limit decrements assume you checked earlier that the
value is > 1.
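
To make the TTL point concrete, here is a userspace sketch of a checked IPv4 TTL decrement with an incremental header checksum update in the style of RFC 1624. It is not the code from samples/bpf/xdp_fwd_kern.c; the function names are made up for this example, and the header is handled as a raw 20-byte array (no IP options) to keep byte order explicit.

```c
#include <stddef.h>
#include <stdint.h>

/* One's-complement checksum over the header, reading 16-bit words in
 * network byte order. Over a header with a valid checksum it returns 0. */
static uint16_t ipv4_checksum(const uint8_t *hdr, size_t len)
{
    uint32_t sum = 0;

    for (size_t i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)hdr[i] << 8) | hdr[i + 1];
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}

/* Returns 0 and leaves the header untouched if the TTL is <= 1 (the
 * caller should not forward). Otherwise decrements the TTL, patches the
 * checksum as HC' = ~(~HC + ~m + m'), and returns 1. */
static int decrease_ttl_checked(uint8_t *hdr)
{
    if (hdr[8] <= 1)
        return 0;

    /* The TTL shares a 16-bit word with the protocol field. */
    uint16_t old_word = (uint16_t)((hdr[8] << 8) | hdr[9]);
    hdr[8]--;
    uint16_t new_word = (uint16_t)((hdr[8] << 8) | hdr[9]);
    uint16_t old_check = (uint16_t)((hdr[10] << 8) | hdr[11]);

    uint32_t sum = (uint16_t)~old_check;
    sum += (uint16_t)~old_word;
    sum += new_word;
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);

    uint16_t new_check = (uint16_t)~sum;
    hdr[10] = (uint8_t)(new_check >> 8);
    hdr[11] = (uint8_t)(new_check & 0xff);
    return 1;
}
```

A forwarding program would call something like decrease_ttl_checked() before rewriting the MAC addresses, and hand the packet to the stack (or drop it) when it returns 0.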


* Re: "Forwarding" from TC classifier
  2020-05-13 17:48 ` David Ahern
@ 2020-05-14 15:41   ` Lorenz Bauer
  2020-05-14 18:54     ` David Ahern
  0 siblings, 1 reply; 10+ messages in thread
From: Lorenz Bauer @ 2020-05-14 15:41 UTC (permalink / raw)
  To: David Ahern; +Cc: bpf, Networking, Martynas Pumputis, kernel-team

On Wed, 13 May 2020 at 18:48, David Ahern <dsahern@gmail.com> wrote:
>
> On 5/13/20 10:40 AM, Lorenz Bauer wrote:
> > We've recently open sourced a key component of our L4 load balancer:
> > cls_redirect [1].
> > In the commit description, I call out the following caveat:
> >
> >     cls_redirect relies on receiving encapsulated packets directly
> > from a router. This is
> >     because we don't have access to the neighbour tables from BPF, yet.
>
> Can you explain more about this limitation? Why does access to neighbor
> tables solve the problem?

We want to forward the packet to another machine, based on an IP address
stored in our custom encapsulation header.
If we always receive packets from a router we can plug in the new IP, swap
the MAC and send the packet back to the router. Inefficient, but it means we
don't have to deal with MAC addresses ourselves.

I think I used the wrong terminology, sorry. By "access to the neighbour
table" I mean being able to go from IP to MAC address.

>
> >
> > The code in question lives in forward_to_next_hop() [2], and does the following:
> > 1. Swap source and destination MAC of the packet
> > 2. Update source and destination IP address
> > 3. Transmit the packet via bpf_redirect(skb->ifindex, 0)
> >
> > Really, I'd like to get rid of step 1, and instead rely on the network
> > stack to switch or route
> > the packet for me. The bpf_fib_lookup helper is very close to what I need. I've
> > hacked around a bit, and come up with the following replacement for step 1:
> >
> >     switch (bpf_fib_lookup(skb, &fib, sizeof(fib), 0)) {
> >     case BPF_FIB_LKUP_RET_SUCCESS:
> >         /* There is a cached neighbour, bpf_redirect without going
> > through the stack. */
> >         return bpf_redirect(...);
> >
> >     case BPF_FIB_LKUP_RET_NO_NEIGH:
> >         /* We have no information about this target. Let the stack handle it. */
> >         return TC_ACT_OK;
> >
> >     case BPF_FIB_LKUP_RET_FWD_DISABLED:
> >         return TC_ACT_SHOT;
> >
> >     default:
> >         return TC_ACT_SHOT;
> >     }
> >
> > I have a couple of questions:
> >
> > First, I think I can get BPF_FIB_LKUP_RET_NO_NEIGH if the packet needs
> > to be routed,
> > but there is no neighbour entry for the default gateway. Is that correct?
>
> Correct.
>
> >
> > Second, is it possible to originate the packet from the local machine,
> > instead of keeping
> > the original source address when passing the packet to the stack on NO_NEIGH?
>
> Network address or MAC address? Swapping the network address is not a
> usual part of routing a packet so I presume you mean mac but just making
> sure. Swapping mac addresses should be done for all routed packets.

No, I'd like to do network address swapping. The code already swaps MAC.
Basically, I'd like to pretend that I'm outputting a new packet.

Just setting the source network address and then doing TC_ACT_OK doesn't
work due to sysctl accept_local=0.

>
> > This is what I get with my current approach:
> >
> >   IP (tos 0x0, ttl 64, id 25769, offset 0, flags [DF], proto UDP (17),
> > length 124)
> >       10.42.0.2.37074 > 10.42.0.4.2483: [bad udp cksum 0x14d3 ->
> > 0x3c0d!] UDP, length 96
> >   IP (tos 0x0, ttl 63, id 25769, offset 0, flags [DF], proto UDP (17),
> > length 124)
> >       10.42.0.2.37074 > 10.42.0.3.2483: [no cksum] UDP, length 96
> >   IP (tos 0x0, ttl 64, id 51342, offset 0, flags [none], proto ICMP
> > (1), length 84)
> >       10.42.0.3 > 10.42.0.2: ICMP echo reply, id 33779, seq 0, length 64
> >
> > The first and second packet are using our custom GUE header, they
> > contain an ICMP echo request. Packet three contains the answer to the
> > request. As you can see, the second packet keeps the 10.42.0.2 source
> > address instead of using 10.42.0.4.
> >
> > Third, what effect does BPF_FIB_LOOKUP_OUTPUT have? Seems like I should set it,
> > but I get somewhat sensible results without it as well. Same for LOOKUP_DIRECT.
>
> BPF_FIB_LOOKUP_OUTPUT affects the flow parameters passed to the FIB lookup:
>         if (flags & BPF_FIB_LOOKUP_OUTPUT) {
>                 fl4.flowi4_iif = 1;
>                 fl4.flowi4_oif = params->ifindex;
>         } else {
>                 fl4.flowi4_iif = params->ifindex;
>                 fl4.flowi4_oif = 0;
>         }
>
> iif / oif set can have an influence on the FIB lookup result - e.g., FIB
> rules directing the lookup to a table or requiring the lookup result to
> use the specified device.
>
> Usually, 'output' is for locally generated traffic headed out. XDP
> programs run on ingress are from an Rx perspective and do the lookup
> from the perspective of 'is this forwarded or locally delivered'.

What if the XDP program encapsulates the packet? At this point I know
that I want to forward it elsewhere. Would that use LOOKUP_OUTPUT?

Thanks!


-- 
Lorenz Bauer  |  Systems Engineer
6th Floor, County Hall/The Riverside Building, SE1 7PB, UK

www.cloudflare.com


* Re: "Forwarding" from TC classifier
  2020-05-13 21:23 ` David Ahern
@ 2020-05-14 15:41   ` Lorenz Bauer
  0 siblings, 0 replies; 10+ messages in thread
From: Lorenz Bauer @ 2020-05-14 15:41 UTC (permalink / raw)
  To: David Ahern; +Cc: bpf, Networking, Martynas Pumputis, kernel-team

On Wed, 13 May 2020 at 22:23, David Ahern <dsahern@gmail.com> wrote:
>
> On 5/13/20 10:40 AM, Lorenz Bauer wrote:
> > Really, I'd like to get rid of step 1, and instead rely on the network
> > stack to switch or route
> > the packet for me. The bpf_fib_lookup helper is very close to what I need. I've
> > hacked around a bit, and come up with the following replacement for step 1:
> >
> >     switch (bpf_fib_lookup(skb, &fib, sizeof(fib), 0)) {
> >     case BPF_FIB_LKUP_RET_SUCCESS:
> >         /* There is a cached neighbour, bpf_redirect without going
> > through the stack. */
> >         return bpf_redirect(...);
>
> BTW, as shown in samples/bpf/xdp_fwd_kern.c, you have a bit more work to
> do for proper L3 forwarding:
>
>         if (rc == BPF_FIB_LKUP_RET_SUCCESS) {
>                 ...
>                 if (h_proto == htons(ETH_P_IP))
>                         ip_decrease_ttl(iph);
>                 else if (h_proto == htons(ETH_P_IPV6))
>                         ip6h->hop_limit--;
>
>                 memcpy(eth->h_dest, fib_params.dmac, ETH_ALEN);
>                 memcpy(eth->h_source, fib_params.smac, ETH_ALEN);
>                 return bpf_redirect_map(&xdp_tx_ports,
> fib_params.ifindex, 0);
>
> The ttl / hoplimit decrements assumed you checked it earlier to be > 1

Thanks for the pointer :)


-- 
Lorenz Bauer  |  Systems Engineer
6th Floor, County Hall/The Riverside Building, SE1 7PB, UK

www.cloudflare.com


* Re: "Forwarding" from TC classifier
  2020-05-14 15:41   ` Lorenz Bauer
@ 2020-05-14 18:54     ` David Ahern
  2020-05-15  9:59       ` Lorenz Bauer
  0 siblings, 1 reply; 10+ messages in thread
From: David Ahern @ 2020-05-14 18:54 UTC (permalink / raw)
  To: Lorenz Bauer; +Cc: bpf, Networking, Martynas Pumputis, kernel-team

On 5/14/20 9:41 AM, Lorenz Bauer wrote:
> On Wed, 13 May 2020 at 18:48, David Ahern <dsahern@gmail.com> wrote:
>>
>> On 5/13/20 10:40 AM, Lorenz Bauer wrote:
>>> We've recently open sourced a key component of our L4 load balancer:
>>> cls_redirect [1].
>>> In the commit description, I call out the following caveat:
>>>
>>>     cls_redirect relies on receiving encapsulated packets directly
>>> from a router. This is
>>>     because we don't have access to the neighbour tables from BPF, yet.
>>
>> Can you explain more about this limitation? Why does access to neighbor
>> tables solve the problem?
> 
> We want to forward the packet to another machine, based on an IP address
> stored in our custom encapsulation header.
> If we always receive packets from a router we can plug in the new IP, swap
> the MAC and send the packet back to the router. Inefficient, but it means we
> don't have to deal with MAC addresses ourselves.

Ok, so swapping source and destination addresses in the IP header, doing
a fib lookup and redirecting to an interface based on the lookup. That
does require a neighbor entry for the dest address. Access to the
neighbor table does not directly solve that problem - if it is not there
for the fib lookup, it won't be there for the straight neigh lookup.

You could let the first packet go up the stack to create and resolve the
neighbor entry. At that point follow-on packets will take the fast path.
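
The slow-path-then-fast-path idea can be modelled in plain userspace C. Everything below (the table layout, neigh_resolve(), forward(), the verdict names) is a hypothetical stand-in: in a real program the lookup would be bpf_fib_lookup and the resolution would be done by the kernel, not by the program.

```c
#include <stdint.h>
#include <string.h>

#define NEIGH_SLOTS 8
#define MAC_LEN 6

enum verdict { VERDICT_TO_STACK, VERDICT_REDIRECT };

struct neigh_entry {
    uint32_t ip;
    uint8_t mac[MAC_LEN];
    int valid;
};

struct neigh_table {
    struct neigh_entry slots[NEIGH_SLOTS];
};

/* Stand-in for the stack resolving a neighbour (e.g. via ARP).
 * Entries are never removed in this sketch, so valid slots stay
 * contiguous. Returns 0 on success, -1 if the table is full. */
static int neigh_resolve(struct neigh_table *t, uint32_t ip,
                         const uint8_t *mac)
{
    for (int i = 0; i < NEIGH_SLOTS; i++) {
        if (!t->slots[i].valid || t->slots[i].ip == ip) {
            t->slots[i].ip = ip;
            memcpy(t->slots[i].mac, mac, MAC_LEN);
            t->slots[i].valid = 1;
            return 0;
        }
    }
    return -1;
}

/* Fast path if the neighbour is known (copy out its MAC and redirect),
 * otherwise defer to the stack so it can resolve the neighbour. */
static enum verdict forward(const struct neigh_table *t, uint32_t dst_ip,
                            uint8_t *dmac_out)
{
    for (int i = 0; i < NEIGH_SLOTS; i++) {
        if (t->slots[i].valid && t->slots[i].ip == dst_ip) {
            memcpy(dmac_out, t->slots[i].mac, MAC_LEN);
            return VERDICT_REDIRECT;
        }
    }
    return VERDICT_TO_STACK;
}
```

The first call to forward() for a new destination yields VERDICT_TO_STACK; once the (simulated) stack has resolved the neighbour, later calls return VERDICT_REDIRECT.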

Alternatively, you can create static entries in the table for known
forwarding addresses or have a process on the server initiate neighbor
resolution for known forwarding addresses.
>>
>> Usually, 'output' is for locally generated traffic headed out. XDP
>> programs run on ingress are from an Rx perspective and do the lookup
>> from the perspective of 'is this forwarded or locally delivered'.
> 
> What if the XDP encapsulates the packet? At this point I know that I
> want to forward it elsewhere. Would that use LOOKUP_OUTPUT?

Yes, if you want the lookup to respond as if it is a locally sent packet
versus a forwarded packet.


* Re: "Forwarding" from TC classifier
  2020-05-14 18:54     ` David Ahern
@ 2020-05-15  9:59       ` Lorenz Bauer
  2020-05-15 14:24         ` David Ahern
  0 siblings, 1 reply; 10+ messages in thread
From: Lorenz Bauer @ 2020-05-15  9:59 UTC (permalink / raw)
  To: David Ahern; +Cc: bpf, Networking, Martynas Pumputis, kernel-team

On Thu, 14 May 2020 at 19:54, David Ahern <dsahern@gmail.com> wrote:
>
> On 5/14/20 9:41 AM, Lorenz Bauer wrote:
> > On Wed, 13 May 2020 at 18:48, David Ahern <dsahern@gmail.com> wrote:
> >>
> >> On 5/13/20 10:40 AM, Lorenz Bauer wrote:
> >>> We've recently open sourced a key component of our L4 load balancer:
> >>> cls_redirect [1].
> >>> In the commit description, I call out the following caveat:
> >>>
> >>>     cls_redirect relies on receiving encapsulated packets directly
> >>> from a router. This is
> >>>     because we don't have access to the neighbour tables from BPF, yet.
> >>
> >> Can you explain more about this limitation? Why does access to neighbor
> >> tables solve the problem?
> >
> > We want to forward the packet to another machine, based on an IP address
> > stored in our custom encapsulation header.
> > If we always receive packets from a router we can plug in the new IP, swap
> > the MAC and send the packet back to the router. Inefficient, but it means we
> > don't have to deal with MAC addresses ourselves.
>
> Ok, so swapping source and destination addresses in the IP header, doing
> a fib lookup and redirecting to an interface based on the lookup. That
> does require a neighbor entry for the dest address. Access to the
> neighbor table does not directly solve that problem - if it is not there
> for the fib lookup, it won't be there for the straight neigh lookup.
>
> You could let the first packet go up the stack to create and resolve the
> neighbor entry. At that point follow on packets will take the fast path.

Yes, but that doesn't play well with changing the source address to
the local machine's, since the upper part of the stack will drop the
packet due to accept_local=0.

For this to work I need to set accept_local=1, which isn't desirable,
or redirect into the output queue of the device, which currently doesn't
trigger neighbour lookup, etc.

To sum it up: fib_lookup enables the fast path, but I don't have a way
to trigger the slow path in the way I want to. Maybe I need to dig into
bpf_redirect some more.

>
> Alternatively, you can create static entries in the table for known
> forwarding addresses or have a process on the server initiate neighbor
> resolution for known forwarding addresses.
> >>
> >> Usually, 'output' is for locally generated traffic headed out. XDP
> >> programs run on ingress are from an Rx perspective and do the lookup
> >> from the perspective of 'is this forwarded or locally delivered'.
> >
> > What if the XDP encapsulates the packet? At this point I know that I
> > want to forward it elsewhere. Would that use LOOKUP_OUTPUT?
>
> Yes, if you want the lookup to respond as if it is a locally sent packet
> versus a forwarded packet.



-- 
Lorenz Bauer  |  Systems Engineer
6th Floor, County Hall/The Riverside Building, SE1 7PB, UK

www.cloudflare.com


* Re: "Forwarding" from TC classifier
  2020-05-15  9:59       ` Lorenz Bauer
@ 2020-05-15 14:24         ` David Ahern
  2020-05-18  9:38           ` Lorenz Bauer
  0 siblings, 1 reply; 10+ messages in thread
From: David Ahern @ 2020-05-15 14:24 UTC (permalink / raw)
  To: Lorenz Bauer; +Cc: bpf, Networking, Martynas Pumputis, kernel-team

On 5/15/20 3:59 AM, Lorenz Bauer wrote:
> 
> Yes, but that doesn't play well with changing the source address to
> the local machine's, since the upper part of the stack will drop the
> packet due to accept_local=0.

Can you defer the source address swap to the Tx path? Let the packet go
up the stack and do the fib lookup again as an skb. The neighbor entry
does not exist, so the packet is stashed while neighbor resolution is
done; once resolved, the packet goes out. A tc program on the egress
device can flip the source address, and then subsequent packets take the
XDP fast path.

If the next host is on the same LAN I believe the stack will want to
generate an ICMP redirect, but that can be squashed.


* Re: "Forwarding" from TC classifier
  2020-05-15 14:24         ` David Ahern
@ 2020-05-18  9:38           ` Lorenz Bauer
  2020-05-18 14:32             ` David Ahern
  0 siblings, 1 reply; 10+ messages in thread
From: Lorenz Bauer @ 2020-05-18  9:38 UTC (permalink / raw)
  To: David Ahern; +Cc: bpf, Networking, Martynas Pumputis, kernel-team

On Fri, 15 May 2020 at 15:24, David Ahern <dsahern@gmail.com> wrote:
>
> On 5/15/20 3:59 AM, Lorenz Bauer wrote:
> >
> > Yes, but that doesn't play well with changing the source address to
> > the local machine's, since the upper part of the stack will drop the
> > packet due to accept_local=0.
>
> Can you defer the source address swap to the Tx path? Let the packet go
> up the stack and do the fib lookup again as an skb. neighbor entry does
> not exist, so the packet is stashed, neighbor resolution done, once
> resolved the packet goes out. tc program on the egress device can flip
> the source address, and then subsequent packets take the XDP fast path.

Hm, that's an interesting idea! I guess this means I have to mark the packet
somehow, to make sure I can identify it on the TX path. Plus, in theory
the packet could exit via any interface, so I'd have to attach classifiers to
a bunch of places if I want to be on the safe side.
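
Roughly what I have in mind, as a userspace model rather than real BPF. The struct, the mark value and both function names are invented for this sketch. It assumes a mark set on ingress is still attached to the packet when it reaches the egress classifier, which I'd need to verify, and a real program would also have to fix up the IP and UDP checksums.

```c
#include <stdint.h>

/* Hypothetical mark value; any value not used elsewhere would do. */
#define FWD_MARK 0x4c42u

/* Stand-in for the few packet fields this sketch cares about. */
struct pkt_sketch {
    uint32_t mark;
    uint32_t saddr;
    uint32_t daddr;
};

/* Ingress, NO_NEIGH case: set the new destination, leave the source
 * alone so the stack accepts the packet, and tag it for egress. */
static void ingress_slow_path(struct pkt_sketch *p, uint32_t next_hop)
{
    p->daddr = next_hop;
    p->mark = FWD_MARK;
}

/* Egress: only touch packets tagged on ingress. Returns 1 if the
 * source address was rewritten, 0 if the packet was left alone. */
static int egress_fixup(struct pkt_sketch *p, uint32_t local_addr)
{
    if (p->mark != FWD_MARK)
        return 0;
    p->saddr = local_addr;
    p->mark = 0;
    return 1;
}
```

With the addresses from my capture, ingress_slow_path() would set the destination to 10.42.0.3 and egress_fixup() would rewrite the source from 10.42.0.2 to 10.42.0.4.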

Upside: this seems doable in current kernels. Downside: seems more fragile
than I'd like.

Thanks for the thought, I'll play around with it :)

>
> If the next host is on the same LAN I believe the stack will want to
> generate an ICMP redirect, but that can be squashed.

-- 
Lorenz Bauer  |  Systems Engineer
6th Floor, County Hall/The Riverside Building, SE1 7PB, UK

www.cloudflare.com


* Re: "Forwarding" from TC classifier
  2020-05-18  9:38           ` Lorenz Bauer
@ 2020-05-18 14:32             ` David Ahern
  0 siblings, 0 replies; 10+ messages in thread
From: David Ahern @ 2020-05-18 14:32 UTC (permalink / raw)
  To: Lorenz Bauer; +Cc: bpf, Networking, Martynas Pumputis, kernel-team

On 5/18/20 3:38 AM, Lorenz Bauer wrote:
> On Fri, 15 May 2020 at 15:24, David Ahern <dsahern@gmail.com> wrote:
>>
>> On 5/15/20 3:59 AM, Lorenz Bauer wrote:
>>>
>>> Yes, but that doesn't play well with changing the source address to
>>> the local machine's, since the upper part of the stack will drop the
>>> packet due to accept_local=0.
>>
>> Can you defer the source address swap to the Tx path? Let the packet go
>> up the stack and do the fib lookup again as an skb. neighbor entry does
>> not exist, so the packet is stashed, neighbor resolution done, once
>> resolved the packet goes out. tc program on the egress device can flip
>> the source address, and then subsequent packets take the XDP fast path.
> 
> Hm, that's an interesting idea! I guess this means I have to mark the packet
> somehow, to make sure I can identify it on the TX path. Plus, in theory
> the packet could exit via any interface, so I'd have to attach classifiers to
> a bunch of places if I want to be on the safe side.

Shared blocks might save you some overhead. Create a filter block that
is shared across devices.

> 
> Upside: this seems doable in current kernels. Downside: seems more fragile
> than I'd like.
> 
> Thanks for the thought, I'll play around with it :)
> 
>>
>> If the next host is on the same LAN I believe the stack will want to
>> generate an ICMP redirect, but that can be squashed.
> 


