All of lore.kernel.org
 help / color / mirror / Atom feed
* bpf_fib_lookup support for firewall mark
@ 2021-06-08 22:59 Rumen Telbizov
  2021-06-09  1:21 ` David Ahern
  0 siblings, 1 reply; 16+ messages in thread
From: Rumen Telbizov @ 2021-06-08 22:59 UTC (permalink / raw)
  To: bpf

Dear BPF list,

I am new to eBPF so go easy on me.
It seems to me that currently eBPF has no support for route table
lookups including firewall marks. The bpf_fib_lookup structure itself
has no mark field as per
https://elixir.bootlin.com/linux/v5.10.28/source/include/uapi/linux/bpf.h#L4864

Additionally bpf_fib_lookup() function does not incorporate the
firewall mark in its route lookup. It explicitly sets it to 0 as per
https://elixir.bootlin.com/linux/v5.10.28/source/net/core/filter.c#L5329
along with other fields which are used during the regular routing
policy database lookup.

Thus lookups from within eBPF and outside of it result in different
outcomes if there are rules directing traffic based on fwmark.
Can you please advise what the rationale for this is or if there
anything that I might be missing.

Let me know if I can provide any further information.

Cheers,
Rumen Telbizov

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: bpf_fib_lookup support for firewall mark
  2021-06-08 22:59 bpf_fib_lookup support for firewall mark Rumen Telbizov
@ 2021-06-09  1:21 ` David Ahern
  2021-06-09 18:30   ` Rumen Telbizov
  0 siblings, 1 reply; 16+ messages in thread
From: David Ahern @ 2021-06-09  1:21 UTC (permalink / raw)
  To: Rumen Telbizov, bpf

On 6/8/21 4:59 PM, Rumen Telbizov wrote:
> Dear BPF list,
> 
> I am new to eBPF so go easy on me.
> It seems to me that currently eBPF has no support for route table
> lookups including firewall marks. The bpf_fib_lookup structure itself
> has no mark field as per
> https://elixir.bootlin.com/linux/v5.10.28/source/include/uapi/linux/bpf.h#L4864
> 
> Additionally bpf_fib_lookup() function does not incorporate the
> firewall mark in its route lookup. It explicitly sets it to 0 as per
> https://elixir.bootlin.com/linux/v5.10.28/source/net/core/filter.c#L5329
> along with other fields which are used during the regular routing
> policy database lookup.
> 
> Thus lookups from within eBPF and outside of it result in different
> outcomes if there are rules directing traffic based on fwmark.
> Can you please advise what the rationale for this is or if there
> anything that I might be missing.
> 
> Let me know if I can provide any further information.
> 

The API (struct bpf_fib_lookup) is constrained to 64B for performance.
It is not possible to support all of the policy routing options that
Linux has in 64B. Choices had to be made.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: bpf_fib_lookup support for firewall mark
  2021-06-09  1:21 ` David Ahern
@ 2021-06-09 18:30   ` Rumen Telbizov
  2021-06-09 21:56     ` Rumen Telbizov
  0 siblings, 1 reply; 16+ messages in thread
From: Rumen Telbizov @ 2021-06-09 18:30 UTC (permalink / raw)
  To: David Ahern; +Cc: bpf

Hi David,

Thanks for the quick response. I appreciate it.
A couple of quick follow up questions:
1. Do you have any performance data that would indicate how much of a
performance drop adding an extra 4 or 8 bytes to the structure would
cause?
2. If I patch locally the structure in libc and the kernel by adding
an extra _u32 mark member is there anything that such a modification
would break?

Regards,
Rumen Telbizov


On Tue, Jun 8, 2021 at 6:21 PM David Ahern <dsahern@gmail.com> wrote:
>
> On 6/8/21 4:59 PM, Rumen Telbizov wrote:
> > Dear BPF list,
> >
> > I am new to eBPF so go easy on me.
> > It seems to me that currently eBPF has no support for route table
> > lookups including firewall marks. The bpf_fib_lookup structure itself
> > has no mark field as per
> > https://elixir.bootlin.com/linux/v5.10.28/source/include/uapi/linux/bpf.h#L4864
> >
> > Additionally bpf_fib_lookup() function does not incorporate the
> > firewall mark in its route lookup. It explicitly sets it to 0 as per
> > https://elixir.bootlin.com/linux/v5.10.28/source/net/core/filter.c#L5329
> > along with other fields which are used during the regular routing
> > policy database lookup.
> >
> > Thus lookups from within eBPF and outside of it result in different
> > outcomes if there are rules directing traffic based on fwmark.
> > Can you please advise what the rationale for this is or if there
> > anything that I might be missing.
> >
> > Let me know if I can provide any further information.
> >
>
> The API (struct bpf_fib_lookup) is constrained to 64B for performance.
> It is not possible to support all of the policy routing options that
> Linux has in 64B. Choices had to be made.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: bpf_fib_lookup support for firewall mark
  2021-06-09 18:30   ` Rumen Telbizov
@ 2021-06-09 21:56     ` Rumen Telbizov
  2021-06-09 22:08       ` Daniel Borkmann
  0 siblings, 1 reply; 16+ messages in thread
From: Rumen Telbizov @ 2021-06-09 21:56 UTC (permalink / raw)
  To: David Ahern; +Cc: bpf

List,

For what it's worth I patched the structure locally by introducing a
new __u32 mark field
to the structure and adding the proper assignment of the field in
filter.c. Recompiled without any issues.
With that patch a bpf lookup matches ip rule that contains fwmark.

Still interested to know how much of a performance penalty adding an 4
bytes to the
structure brings. I'd certainly vote for adding at least the firewall
mark to the set of fields used in the lookup.

Cheers

On Wed, Jun 9, 2021 at 11:30 AM Rumen Telbizov
<rumen.telbizov@menlosecurity.com> wrote:
>
> Hi David,
>
> Thanks for the quick response. I appreciate it.
> A couple of quick follow up questions:
> 1. Do you have any performance data that would indicate how much of a
> performance drop adding an extra 4 or 8 bytes to the structure would
> cause?
> 2. If I patch locally the structure in libc and the kernel by adding
> an extra _u32 mark member is there anything that such a modification
> would break?
>
> Regards,
> Rumen Telbizov
>
>
> On Tue, Jun 8, 2021 at 6:21 PM David Ahern <dsahern@gmail.com> wrote:
> >
> > On 6/8/21 4:59 PM, Rumen Telbizov wrote:
> > > Dear BPF list,
> > >
> > > I am new to eBPF so go easy on me.
> > > It seems to me that currently eBPF has no support for route table
> > > lookups including firewall marks. The bpf_fib_lookup structure itself
> > > has no mark field as per
> > > https://elixir.bootlin.com/linux/v5.10.28/source/include/uapi/linux/bpf.h#L4864
> > >
> > > Additionally bpf_fib_lookup() function does not incorporate the
> > > firewall mark in its route lookup. It explicitly sets it to 0 as per
> > > https://elixir.bootlin.com/linux/v5.10.28/source/net/core/filter.c#L5329
> > > along with other fields which are used during the regular routing
> > > policy database lookup.
> > >
> > > Thus lookups from within eBPF and outside of it result in different
> > > outcomes if there are rules directing traffic based on fwmark.
> > > Can you please advise what the rationale for this is or if there
> > > anything that I might be missing.
> > >
> > > Let me know if I can provide any further information.
> > >
> >
> > The API (struct bpf_fib_lookup) is constrained to 64B for performance.
> > It is not possible to support all of the policy routing options that
> > Linux has in 64B. Choices had to be made.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: bpf_fib_lookup support for firewall mark
  2021-06-09 21:56     ` Rumen Telbizov
@ 2021-06-09 22:08       ` Daniel Borkmann
  2021-06-10 14:37         ` David Ahern
  0 siblings, 1 reply; 16+ messages in thread
From: Daniel Borkmann @ 2021-06-09 22:08 UTC (permalink / raw)
  To: Rumen Telbizov, David Ahern; +Cc: bpf

Hi Rumen, hi David,

(please avoid top-posting)

On 6/9/21 11:56 PM, Rumen Telbizov wrote:
> List,
> 
> For what it's worth I patched the structure locally by introducing a
> new __u32 mark field
> to the structure and adding the proper assignment of the field in
> filter.c. Recompiled without any issues.
> With that patch a bpf lookup matches ip rule that contains fwmark.
> 
> Still interested to know how much of a performance penalty adding an 4
> bytes to the
> structure brings. I'd certainly vote for adding at least the firewall
> mark to the set of fields used in the lookup.

I agree with David here that performance of the helper is paramount.
As a side-note, we should probably add a build_bug_on() to ensure that
the size of struct bpf_fib_lookup will stay at 64b / one cacheline.

That said, given h_vlan_proto/h_vlan_TCI are both output parameters,
maybe we could just union the two fields with a __u32 mark extension
that we then transfer into the flowi{4,6}?

Thanks,
Daniel

> On Wed, Jun 9, 2021 at 11:30 AM Rumen Telbizov
> <rumen.telbizov@menlosecurity.com> wrote:
>>
>> Hi David,
>>
>> Thanks for the quick response. I appreciate it.
>> A couple of quick follow up questions:
>> 1. Do you have any performance data that would indicate how much of a
>> performance drop adding an extra 4 or 8 bytes to the structure would
>> cause?
>> 2. If I patch locally the structure in libc and the kernel by adding
>> an extra _u32 mark member is there anything that such a modification
>> would break?
>>
>> Regards,
>> Rumen Telbizov
>>
>>
>> On Tue, Jun 8, 2021 at 6:21 PM David Ahern <dsahern@gmail.com> wrote:
>>>
>>> On 6/8/21 4:59 PM, Rumen Telbizov wrote:
>>>> Dear BPF list,
>>>>
>>>> I am new to eBPF so go easy on me.
>>>> It seems to me that currently eBPF has no support for route table
>>>> lookups including firewall marks. The bpf_fib_lookup structure itself
>>>> has no mark field as per
>>>> https://elixir.bootlin.com/linux/v5.10.28/source/include/uapi/linux/bpf.h#L4864
>>>>
>>>> Additionally bpf_fib_lookup() function does not incorporate the
>>>> firewall mark in its route lookup. It explicitly sets it to 0 as per
>>>> https://elixir.bootlin.com/linux/v5.10.28/source/net/core/filter.c#L5329
>>>> along with other fields which are used during the regular routing
>>>> policy database lookup.
>>>>
>>>> Thus lookups from within eBPF and outside of it result in different
>>>> outcomes if there are rules directing traffic based on fwmark.
>>>> Can you please advise what the rationale for this is or if there
>>>> anything that I might be missing.
>>>>
>>>> Let me know if I can provide any further information.
>>>>
>>>
>>> The API (struct bpf_fib_lookup) is constrained to 64B for performance.
>>> It is not possible to support all of the policy routing options that
>>> Linux has in 64B. Choices had to be made.


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: bpf_fib_lookup support for firewall mark
  2021-06-09 22:08       ` Daniel Borkmann
@ 2021-06-10 14:37         ` David Ahern
  2021-06-10 17:41           ` Rumen Telbizov
  0 siblings, 1 reply; 16+ messages in thread
From: David Ahern @ 2021-06-10 14:37 UTC (permalink / raw)
  To: Daniel Borkmann, Rumen Telbizov; +Cc: bpf, Jesper Dangaard Brouer

On 6/9/21 4:08 PM, Daniel Borkmann wrote:
> Hi Rumen, hi David,
> 
> (please avoid top-posting)
> 
> On 6/9/21 11:56 PM, Rumen Telbizov wrote:
>> List,
>>
>> For what it's worth I patched the structure locally by introducing a
>> new __u32 mark field
>> to the structure and adding the proper assignment of the field in
>> filter.c. Recompiled without any issues.
>> With that patch a bpf lookup matches ip rule that contains fwmark.
>>
>> Still interested to know how much of a performance penalty adding an 4
>> bytes to the
>> structure brings. I'd certainly vote for adding at least the firewall
>> mark to the set of fields used in the lookup.
> 
> I agree with David here that performance of the helper is paramount.
> As a side-note, we should probably add a build_bug_on() to ensure that
> the size of struct bpf_fib_lookup will stay at 64b / one cacheline.

that's the key point on performance - crossing a cacheline. I do not
have performance data at hand, but it is a substantial hit. That is why
the struct is so overloaded (and complicated for a uapi) with the input
vs output setting.

Presumably you are parsing the packet to id a flow to find the mark that
should be used with the FIB lookup. correct? Hardware hints will make
that part easier whenever that feature actually lands and if the NIC can
add the mark.

> 
> That said, given h_vlan_proto/h_vlan_TCI are both output parameters,
> maybe we could just union the two fields with a __u32 mark extension
> that we then transfer into the flowi{4,6}?

That is one option.

I would go for a union on sport and/or dport. It is a fair tradeoff to
request users to pick one - policy routing based on L4 ports or fwmark.
A bit harder to do with a straight up union at this point, but we could
also limit the supported fwmark to 16-bits. Hard choices have to be made.


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: bpf_fib_lookup support for firewall mark
  2021-06-10 14:37         ` David Ahern
@ 2021-06-10 17:41           ` Rumen Telbizov
  2021-06-10 18:52             ` David Ahern
  0 siblings, 1 reply; 16+ messages in thread
From: Rumen Telbizov @ 2021-06-10 17:41 UTC (permalink / raw)
  To: David Ahern; +Cc: Daniel Borkmann, bpf, Jesper Dangaard Brouer

> that's the key point on performance - crossing a cacheline. I do not
> have performance data at hand, but it is a substantial hit. That is why
> the struct is so overloaded (and complicated for a uapi) with the input
> vs output setting.

Makes perfect sense now. Thanks for clarifying David and Daniel.

> Presumably you are parsing the packet to id a flow to find the mark that
> should be used with the FIB lookup. correct?

Let me briefly present my high-level use case here to give more colour.
What I am building is an overlay network based on geneve. I have multiple
sites, each of which is going to be represented by a separate routing table.
The selection of the destination site (routing table) is based on firewall marks
and the original packet is preserved intact, encapsulated in geneve. I have a
TC/eBPF program running on the geneve interface which has to query the
appropriate routing table based on the firewall mark and use the
returned next hop
as the tunnel key in the skb. Also worth mentioning is that those routing tables
contain multiple (default) routes as I use ECMP to balance traffic/provide HA
between sites,

> > That said, given h_vlan_proto/h_vlan_TCI are both output parameters,
> > maybe we could just union the two fields with a __u32 mark extension
> > that we then transfer into the flowi{4,6}?
>
> That is one option.
>
> I would go for a union on sport and/or dport. It is a fair tradeoff to
> request users to pick one - policy routing based on L4 ports or fwmark.
> A bit harder to do with a straight up union at this point, but we could
> also limit the supported fwmark to 16-bits. Hard choices have to be made.

A couple of comments on those two options: if the union is between the ports
and the mark then a user of the function would have to choose between
src+dst port or the mark in lookup, correct? If so wouldn't that
result in a loss
of the ability to use multipathing - since the hashing would be static? In my
case that would certainly be another significant drawback.

Having said that, what Daniel suggests looks very interesting to me.
If I understand
it correctly there are 32 bits in h_vlan_proto+h_vlan_TCI that are used only for
output today so if they are merged in a union with a 32 bit mark then we'd stay
at 64B structure and we can pass a full 32 bit mark.

So something like this?
union {
    /* input */
    __u32 mark;

    /* output */
    __be16 h_vlan_proto;
    __be16 h_vlan_TCI;
}

Moreover, there are 12 extra bytes used only as output for the smac/dmac.
If the above works then maybe this opens up the opportunity to incorporate
even more input parameters that way?

Thank you once again for your time and suggestions.

Regards,
Rumen Telbizov

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: bpf_fib_lookup support for firewall mark
  2021-06-10 17:41           ` Rumen Telbizov
@ 2021-06-10 18:52             ` David Ahern
  2021-06-10 20:58               ` Daniel Borkmann
  0 siblings, 1 reply; 16+ messages in thread
From: David Ahern @ 2021-06-10 18:52 UTC (permalink / raw)
  To: Rumen Telbizov; +Cc: Daniel Borkmann, bpf, Jesper Dangaard Brouer

On 6/10/21 11:41 AM, Rumen Telbizov wrote:
>> that's the key point on performance - crossing a cacheline. I do not
>> have performance data at hand, but it is a substantial hit. That is why
>> the struct is so overloaded (and complicated for a uapi) with the input
>> vs output setting.
> 
> Makes perfect sense now. Thanks for clarifying David and Daniel.
> 
>> Presumably you are parsing the packet to id a flow to find the mark that
>> should be used with the FIB lookup. correct?
> 
> Let me briefly present my high-level use case here to give more colour.
> What I am building is an overlay network based on geneve. I have multiple
> sites, each of which is going to be represented by a separate routing table.
> The selection of the destination site (routing table) is based on firewall marks

To show my bias here - VRF is better than firewall marks for selecting
routing tables. :-)


> and the original packet is preserved intact, encapsulated in geneve. I have a
> TC/eBPF program running on the geneve interface which has to query the
> appropriate routing table based on the firewall mark and use the
> returned next hop
> as the tunnel key in the skb. Also worth mentioning is that those routing tables
> contain multiple (default) routes as I use ECMP to balance traffic/provide HA
> between sites,

thanks for explaining the use case

> 
>>> That said, given h_vlan_proto/h_vlan_TCI are both output parameters,
>>> maybe we could just union the two fields with a __u32 mark extension
>>> that we then transfer into the flowi{4,6}?
>>
>> That is one option.
>>
>> I would go for a union on sport and/or dport. It is a fair tradeoff to
>> request users to pick one - policy routing based on L4 ports or fwmark.
>> A bit harder to do with a straight up union at this point, but we could
>> also limit the supported fwmark to 16-bits. Hard choices have to be made.
> 
> A couple of comments on those two options: if the union is between the ports
> and the mark then a user of the function would have to choose between
> src+dst port or the mark in lookup, correct? If so wouldn't that
> result in a loss
> of the ability to use multipathing - since the hashing would be static? In my
> case that would certainly be another significant drawback.

yes, good point.

> 
> Having said that, what Daniel suggests looks very interesting to me.
> If I understand
> it correctly there are 32 bits in h_vlan_proto+h_vlan_TCI that are used only for
> output today so if they are merged in a union with a 32 bit mark then we'd stay
> at 64B structure and we can pass a full 32 bit mark.
> 
> So something like this?
> union {
>     /* input */
>     __u32 mark;
> 
>     /* output */
>     __be16 h_vlan_proto;
>     __be16 h_vlan_TCI;
> }

I think more like this:

	union {
		/* input: fwmark to use in lookup */
		__u32 fwmark;

		/* output: vlan information if egress is on a vlan */
		struct {
			__be16  h_vlan_proto;
			__be16  h_vlan_TCI;
		};
	};

But, I do not think the vlan data should be overloaded right now. We
still have an open design issue around supporting vlans on ingress
(XDP). One option is to allow the lookup to take the vlan as an input,
have the the bpf helper lookup the vlan device that goes with the
{device index, vlan} pair and use that as the input device. If we
overload the vlan_TCI with fwmark that prohibits this option.

> 
> Moreover, there are 12 extra bytes used only as output for the smac/dmac.
> If the above works then maybe this opens up the opportunity to incorporate
> even more input parameters that way?
> 

I think that's going to be tricky since the macs are 6-byte arrays.


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: bpf_fib_lookup support for firewall mark
  2021-06-10 18:52             ` David Ahern
@ 2021-06-10 20:58               ` Daniel Borkmann
  2021-06-11  1:41                 ` David Ahern
  0 siblings, 1 reply; 16+ messages in thread
From: Daniel Borkmann @ 2021-06-10 20:58 UTC (permalink / raw)
  To: David Ahern, Rumen Telbizov; +Cc: bpf, Jesper Dangaard Brouer

On 6/10/21 8:52 PM, David Ahern wrote:
> On 6/10/21 11:41 AM, Rumen Telbizov wrote:
>>> that's the key point on performance - crossing a cacheline. I do not
>>> have performance data at hand, but it is a substantial hit. That is why
>>> the struct is so overloaded (and complicated for a uapi) with the input
>>> vs output setting.
>>
>> Makes perfect sense now. Thanks for clarifying David and Daniel.
>>
>>> Presumably you are parsing the packet to id a flow to find the mark that
>>> should be used with the FIB lookup. correct?
>>
>> Let me briefly present my high-level use case here to give more colour.
>> What I am building is an overlay network based on geneve. I have multiple
>> sites, each of which is going to be represented by a separate routing table.
>> The selection of the destination site (routing table) is based on firewall marks
> 
> To show my bias here - VRF is better than firewall marks for selecting
> routing tables. :-)
> 
>> and the original packet is preserved intact, encapsulated in geneve. I have a
>> TC/eBPF program running on the geneve interface which has to query the
>> appropriate routing table based on the firewall mark and use the
>> returned next hop
>> as the tunnel key in the skb. Also worth mentioning is that those routing tables
>> contain multiple (default) routes as I use ECMP to balance traffic/provide HA
>> between sites,
> 
> thanks for explaining the use case

+1

>>>> That said, given h_vlan_proto/h_vlan_TCI are both output parameters,
>>>> maybe we could just union the two fields with a __u32 mark extension
>>>> that we then transfer into the flowi{4,6}?
>>>
>>> That is one option.
>>>
>>> I would go for a union on sport and/or dport. It is a fair tradeoff to
>>> request users to pick one - policy routing based on L4 ports or fwmark.
>>> A bit harder to do with a straight up union at this point, but we could
>>> also limit the supported fwmark to 16-bits. Hard choices have to be made.
>>
>> A couple of comments on those two options: if the union is between the ports
>> and the mark then a user of the function would have to choose between
>> src+dst port or the mark in lookup, correct? If so wouldn't that
>> result in a loss
>> of the ability to use multipathing - since the hashing would be static? In my
>> case that would certainly be another significant drawback.
> 
> yes, good point.
> 
>> Having said that, what Daniel suggests looks very interesting to me.
>> If I understand
>> it correctly there are 32 bits in h_vlan_proto+h_vlan_TCI that are used only for
>> output today so if they are merged in a union with a 32 bit mark then we'd stay
>> at 64B structure and we can pass a full 32 bit mark.
>>
>> So something like this?
>> union {
>>      /* input */
>>      __u32 mark;
>>
>>      /* output */
>>      __be16 h_vlan_proto;
>>      __be16 h_vlan_TCI;
>> }
> 
> I think more like this:
> 
> 	union {
> 		/* input: fwmark to use in lookup */
> 		__u32 fwmark;
> 
> 		/* output: vlan information if egress is on a vlan */
> 		struct {
> 			__be16  h_vlan_proto;
> 			__be16  h_vlan_TCI;
> 		};
> 	};

Agree.

> But, I do not think the vlan data should be overloaded right now. We
> still have an open design issue around supporting vlans on ingress
> (XDP). One option is to allow the lookup to take the vlan as an input,
> have the the bpf helper lookup the vlan device that goes with the
> {device index, vlan} pair and use that as the input device. If we
> overload the vlan_TCI with fwmark that prohibits this option.

I guess it's not overly pretty, but if all things break down and there's no other
unused space, wouldn't it work if we opt into the vlan as input (instead of mark)
in future via flag?

>> Moreover, there are 12 extra bytes used only as output for the smac/dmac.
>> If the above works then maybe this opens up the opportunity to incorporate
>> even more input parameters that way?
> 
> I think that's going to be tricky since the macs are 6-byte arrays.

Thanks,
Daniel

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: bpf_fib_lookup support for firewall mark
  2021-06-10 20:58               ` Daniel Borkmann
@ 2021-06-11  1:41                 ` David Ahern
  2021-06-11 17:32                   ` Rumen Telbizov
  0 siblings, 1 reply; 16+ messages in thread
From: David Ahern @ 2021-06-11  1:41 UTC (permalink / raw)
  To: Daniel Borkmann, Rumen Telbizov; +Cc: bpf, Jesper Dangaard Brouer

On 6/10/21 2:58 PM, Daniel Borkmann wrote:
>> But, I do not think the vlan data should be overloaded right now. We
>> still have an open design issue around supporting vlans on ingress
>> (XDP). One option is to allow the lookup to take the vlan as an input,
>> have the the bpf helper lookup the vlan device that goes with the
>> {device index, vlan} pair and use that as the input device. If we
>> overload the vlan_TCI with fwmark that prohibits this option.
> 
> I guess it's not overly pretty, but if all things break down and there's
> no other
> unused space, wouldn't it work if we opt into the vlan as input (instead
> of mark)
> in future via flag?
> 
>>> Moreover, there are 12 extra bytes used only as output for the
>>> smac/dmac.
>>> If the above works then maybe this opens up the opportunity to
>>> incorporate
>>> even more input parameters that way?
>>
>> I think that's going to be tricky since the macs are 6-byte arrays.

This should work (whitespace damaged on paste). It preserves the vlan
for potential later use, and we have 2 more 4-byte holes. Untested, not
even compiled. Can you try it out?

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 418b9b813d65..476bc81f3d04 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -5954,8 +5954,20 @@ struct bpf_fib_lookup {
        /* output */
        __be16  h_vlan_proto;
        __be16  h_vlan_TCI;
-       __u8    smac[6];     /* ETH_ALEN */
-       __u8    dmac[6];     /* ETH_ALEN */
+
+       union {
+               /* input */
+               struct {
+                       __u32 fwmark;
+                       /* 2 4-byte holes for input */
+               };
+
+               /* output: source and dest mac */
+               struct {
+                       __u8    smac[6];     /* ETH_ALEN */
+                       __u8    dmac[6];     /* ETH_ALEN */
+               };
+       };
 };

 struct bpf_redir_neigh {

diff --git a/net/core/filter.c b/net/core/filter.c
index 239de1306de9..a9b4fd2a6657 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -5303,6 +5303,7 @@ static int bpf_ipv4_fib_lookup(struct net *net,
struct bpf_fib_lookup *params,
        fl4.saddr = params->ipv4_src;
        fl4.fl4_sport = params->sport;
        fl4.fl4_dport = params->dport;
+       fl4.flowi4_mark = params->fwmark;
        fl4.flowi4_multipath_hash = 0;

        if (flags & BPF_FIB_LOOKUP_DIRECT) {
@@ -5315,7 +5316,6 @@ static int bpf_ipv4_fib_lookup(struct net *net,
struct bpf_fib_lookup *params,

                err = fib_table_lookup(tb, &fl4, &res, FIB_LOOKUP_NOREF);
        } else {
-               fl4.flowi4_mark = 0;
                fl4.flowi4_secid = 0;
                fl4.flowi4_tun_key.tun_id = 0;
                fl4.flowi4_uid = sock_net_uid(net, NULL);
@@ -5429,6 +5429,7 @@ static int bpf_ipv6_fib_lookup(struct net *net,
struct bpf_fib_lookup *params,
        fl6.saddr = *src;
        fl6.fl6_sport = params->sport;
        fl6.fl6_dport = params->dport;
+       fl6.flowi6_mark = params->fwmark;

        if (flags & BPF_FIB_LOOKUP_DIRECT) {
                u32 tbid = l3mdev_fib_table_rcu(dev) ? : RT_TABLE_MAIN;
@@ -5441,7 +5442,6 @@ static int bpf_ipv6_fib_lookup(struct net *net,
struct bpf_fib_lookup *params,
                err = ipv6_stub->fib6_table_lookup(net, tb, oif, &fl6, &res,
                                                   strict);
        } else {
-               fl6.flowi6_mark = 0;
                fl6.flowi6_secid = 0;
                fl6.flowi6_tun_key.tun_id = 0;
                fl6.flowi6_uid = sock_net_uid(net, NULL);


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: bpf_fib_lookup support for firewall mark
  2021-06-11  1:41                 ` David Ahern
@ 2021-06-11 17:32                   ` Rumen Telbizov
       [not found]                     ` <CA+FoirA-eAfux3PfxjgyO=--7duWCKuyeJfxWTdW6jiMWzShTw@mail.gmail.com>
  0 siblings, 1 reply; 16+ messages in thread
From: Rumen Telbizov @ 2021-06-11 17:32 UTC (permalink / raw)
  To: David Ahern; +Cc: Daniel Borkmann, bpf, Jesper Dangaard Brouer

Hi David,

Thanks a lot for the proposed patch.
I see you're overloading the mac addresses and thus this leaves us
with even more space
I will give it a try and will let you know.

Cheers,
Rumen Telbizov

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: bpf_fib_lookup support for firewall mark
       [not found]                     ` <CA+FoirA-eAfux3PfxjgyO=--7duWCKuyeJfxWTdW6jiMWzShTw@mail.gmail.com>
@ 2021-06-12  0:00                       ` Rumen Telbizov
  2021-06-12  1:13                       ` David Ahern
  2021-06-29 17:18                       ` Rumen Telbizov
  2 siblings, 0 replies; 16+ messages in thread
From: Rumen Telbizov @ 2021-06-12  0:00 UTC (permalink / raw)
  To: bpf

Hey David,

I just tested your proposed patch and I want to confirm that it works
great for my use-case.
I get the same rule-selected route based on firewall mark as the
outside ip route get ... mark ...
Also since we didn't touch the ports, multipathing works as expected -
gives a different route
based on the hash of the socket tuple.

So having said that I can't help but wonder what the next steps might be.

Are you able/willing to incorporate this work in the kernel?
If yes I also wonder if it would be possible to backport it into 5.10 branch?

Thank you and Daniel for looking into this. I appreciate your efforts.

Cheers,
Rumen Telbizov

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: bpf_fib_lookup support for firewall mark
       [not found]                     ` <CA+FoirA-eAfux3PfxjgyO=--7duWCKuyeJfxWTdW6jiMWzShTw@mail.gmail.com>
  2021-06-12  0:00                       ` Rumen Telbizov
@ 2021-06-12  1:13                       ` David Ahern
  2021-06-14 16:53                         ` Rumen Telbizov
  2021-06-29 17:18                       ` Rumen Telbizov
  2 siblings, 1 reply; 16+ messages in thread
From: David Ahern @ 2021-06-12  1:13 UTC (permalink / raw)
  To: Rumen Telbizov; +Cc: Daniel Borkmann, bpf

On 6/11/21 5:56 PM, Rumen Telbizov wrote:
> Hey David,
> 
> I just tested your proposed patch and I want to confirm that it works
> great for my use-case.
> I get the same rule-selected route based on firewall mark as the outside
> ip route get ... mark ...
> Also since we didn't touch the ports, multipathing works as expected -
> gives a different route
> based on the hash of the socket tuple.

great.

> 
> So having said that I can't help but wonder what the next steps might be.
> 
> Are you able/willing to incorporate this work in the kernel?

I do not have the time or hardware at the moment to do the testing - or
create the selftest for it. If you do have the time for the selftest, we
can tag team it.


> If yes I also wonder if it would be possible to backport it into 5.10
> branch?
> 

5.10 backports are done out of tree, so you are on your own there.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: bpf_fib_lookup support for firewall mark
  2021-06-12  1:13                       ` David Ahern
@ 2021-06-14 16:53                         ` Rumen Telbizov
  0 siblings, 0 replies; 16+ messages in thread
From: Rumen Telbizov @ 2021-06-14 16:53 UTC (permalink / raw)
  To: David Ahern; +Cc: Daniel Borkmann, bpf

> I do not have the time or hardware at the moment to do the testing - or
> create the selftest for it. If you do have the time for the selftest, we
> can tag team it.

Sure I can definitely do what I can but you'll need to walk me
through, since I am not sure what I is needed.
Can you point me in the right direction?

Cheers,
Rumen Telbizov

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: bpf_fib_lookup support for firewall mark
       [not found]                     ` <CA+FoirA-eAfux3PfxjgyO=--7duWCKuyeJfxWTdW6jiMWzShTw@mail.gmail.com>
  2021-06-12  0:00                       ` Rumen Telbizov
  2021-06-12  1:13                       ` David Ahern
@ 2021-06-29 17:18                       ` Rumen Telbizov
  2021-06-29 17:21                         ` Greg KH
  2 siblings, 1 reply; 16+ messages in thread
From: Rumen Telbizov @ 2021-06-29 17:18 UTC (permalink / raw)
  To: bpf; +Cc: Daniel Borkmann, David Ahern, Jesper Dangaard Brouer

[-- Attachment #1: Type: text/plain, Size: 937 bytes --]

Daniel, BPF list,

Over the last week or so David Ahern and I worked on a patchset that solves
the problem discussed here along with a self-test.

Attached here is a patchset of 3 files which covers the following:

[PATCH 1/3] bpf: Add support for mark with bpf_fib_lookup
[PATCH 2/3] tools: Update bpf header
[PATCH 3/3] selftests: Add selftests for fwmark support in bpf_fib_lookup

I have tested those against a very recent clone of the latest 5.13-rc7.
Self-test results look like this:

----
# ./test_bpf_fib_lookup.sh
[test_bpf_fib_lookup.sh] START
- Running test_egress_ipv4_fwmark
  * mark 0: PASS
  * mark 2: PASS
- Running test_egress_ipv6_fwmark
  * mark 0: PASS
  * mark 2: PASS
[test_bpf_fib_lookup.sh] PASS: 4 -- FAIL: 0
# echo $?
0
----

Let me know what you think and if there's anything else needed to incorporate
the patchset into the kernel as well as what you think the next steps should be.

Regards,
Rumen Telbizov

[-- Attachment #2: 0002-tools-Update-bpf-header.patch --]
[-- Type: application/x-patch, Size: 1121 bytes --]

[-- Attachment #3: 0003-selftests-Add-selftests-for-fwmark-support-in-bpf_fi.patch --]
[-- Type: application/x-patch, Size: 10687 bytes --]

[-- Attachment #4: 0001-bpf-Add-support-for-mark-with-bpf_fib_lookup.patch --]
[-- Type: application/x-patch, Size: 2832 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: bpf_fib_lookup support for firewall mark
  2021-06-29 17:18                       ` Rumen Telbizov
@ 2021-06-29 17:21                         ` Greg KH
  0 siblings, 0 replies; 16+ messages in thread
From: Greg KH @ 2021-06-29 17:21 UTC (permalink / raw)
  To: Rumen Telbizov; +Cc: bpf, Daniel Borkmann, David Ahern, Jesper Dangaard Brouer

On Tue, Jun 29, 2021 at 10:18:14AM -0700, Rumen Telbizov wrote:
> Daniel, BPF list,
> 
> Over the last week or so David Ahern and I worked on a patchset that solves
> the problem discussed here along with a self-test.
> 
> Attached here is a patchset of 3 files which covers the following:
> 
> [PATCH 1/3] bpf: Add support for mark with bpf_fib_lookup
> [PATCH 2/3] tools: Update bpf header
> [PATCH 3/3] selftests: Add selftests for fwmark support in bpf_fib_lookup
> 
> I have tested those against a very recent clone of the latest 5.13-rc7.
> Self-test results look like this:
> 
> ----
> # ./test_bpf_fib_lookup.sh
> [test_bpf_fib_lookup.sh] START
> - Running test_egress_ipv4_fwmark
>   * mark 0: PASS
>   * mark 2: PASS
> - Running test_egress_ipv6_fwmark
>   * mark 0: PASS
>   * mark 2: PASS
> [test_bpf_fib_lookup.sh] PASS: 4 -- FAIL: 0
> # echo $?
> 0
> ----
> 
> Let me know what you think and if there's anything else needed to incorporate
> the patchset into the kernel as well as what you think the next steps should be.

Hi,

This is the friendly patch-bot of Greg Kroah-Hartman.  You have sent him
a patch that has triggered this response.  He used to manually respond
to these common problems, but in order to save his sanity (he kept
writing the same thing over and over, yet to different people), I was
created.  Hopefully you will not take offence and will fix the problem
in your patch and resubmit it so that it can be accepted into the Linux
kernel tree.

You are receiving this message because of the following common error(s)
as indicated below:

- Your patch was attached, please place it inline so that it can be
  applied directly from the email message itself.

- You did not specify a description of why the patch is needed, or
  possibly, any description at all, in the email body.  Please read the
  section entitled "The canonical patch format" in the kernel file,
  Documentation/SubmittingPatches for what is needed in order to
  properly describe the change.

- You did not write a descriptive Subject: for the patch, allowing Greg,
  and everyone else, to know what this patch is all about.  Please read
  the section entitled "The canonical patch format" in the kernel file,
  Documentation/SubmittingPatches for what a proper Subject: line should
  look like.

If you wish to discuss this problem further, or you have questions about
how to resolve this issue, please feel free to respond to this email and
Greg will reply once he has dug out from the pending patches received
from other developers.

thanks,

greg k-h's patch email bot

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2021-06-29 17:22 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-06-08 22:59 bpf_fib_lookup support for firewall mark Rumen Telbizov
2021-06-09  1:21 ` David Ahern
2021-06-09 18:30   ` Rumen Telbizov
2021-06-09 21:56     ` Rumen Telbizov
2021-06-09 22:08       ` Daniel Borkmann
2021-06-10 14:37         ` David Ahern
2021-06-10 17:41           ` Rumen Telbizov
2021-06-10 18:52             ` David Ahern
2021-06-10 20:58               ` Daniel Borkmann
2021-06-11  1:41                 ` David Ahern
2021-06-11 17:32                   ` Rumen Telbizov
     [not found]                     ` <CA+FoirA-eAfux3PfxjgyO=--7duWCKuyeJfxWTdW6jiMWzShTw@mail.gmail.com>
2021-06-12  0:00                       ` Rumen Telbizov
2021-06-12  1:13                       ` David Ahern
2021-06-14 16:53                         ` Rumen Telbizov
2021-06-29 17:18                       ` Rumen Telbizov
2021-06-29 17:21                         ` Greg KH

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.