* [net v1] fib_rules: interface group matching
@ 2016-09-14 12:40 Vincent Bernat
2016-09-14 12:43 ` Vincent Bernat
2016-09-14 14:15 ` David Ahern
0 siblings, 2 replies; 8+ messages in thread
From: Vincent Bernat @ 2016-09-14 12:40 UTC (permalink / raw)
To: David S. Miller, Nicolas Dichtel, David Ahern, Wilson Kok, netdev
Cc: Vincent Bernat
When a user wants to assign a routing table to a group of incoming
interfaces, the current solutions are:
- one IP rule for each interface (scalability problems)
- use of fwmark and devgroup matcher (don't work with internal route
lookups, used for example by RPF)
- use of VRF devices (more complex)
Each interface can be assigned to a numeric group using IFLA_GROUP. This
commit enables a user to reference such a group into an IP rule. Here is
an example of output of iproute2:
$ ip rule show
0: from all lookup local
32764: from all iifgroup 2 lookup 2
32765: from all iifgroup 1 lookup 1
32766: from all lookup main
32767: from all lookup default
Signed-off-by: Vincent Bernat <vincent@bernat.im>
---
include/net/fib_rules.h | 6 ++++-
include/uapi/linux/fib_rules.h | 2 ++
net/core/fib_rules.c | 57 +++++++++++++++++++++++++++++++++++++++++-
3 files changed, 63 insertions(+), 2 deletions(-)
diff --git a/include/net/fib_rules.h b/include/net/fib_rules.h
index 456e4a6006ab..a96b186ccd02 100644
--- a/include/net/fib_rules.h
+++ b/include/net/fib_rules.h
@@ -28,6 +28,8 @@ struct fib_rule {
u32 pref;
int suppress_ifgroup;
int suppress_prefixlen;
+ int iifgroup;
+ int oifgroup;
char iifname[IFNAMSIZ];
char oifname[IFNAMSIZ];
struct rcu_head rcu;
@@ -92,7 +94,9 @@ struct fib_rules_ops {
[FRA_SUPPRESS_PREFIXLEN] = { .type = NLA_U32 }, \
[FRA_SUPPRESS_IFGROUP] = { .type = NLA_U32 }, \
[FRA_GOTO] = { .type = NLA_U32 }, \
- [FRA_L3MDEV] = { .type = NLA_U8 }
+ [FRA_L3MDEV] = { .type = NLA_U8 }, \
+ [FRA_IIFGROUP] = { .type = NLA_U32 }, \
+ [FRA_OIFGROUP] = { .type = NLA_U32 }
static inline void fib_rule_get(struct fib_rule *rule)
{
diff --git a/include/uapi/linux/fib_rules.h b/include/uapi/linux/fib_rules.h
index 14404b3ebb89..0bf5a5e94d9a 100644
--- a/include/uapi/linux/fib_rules.h
+++ b/include/uapi/linux/fib_rules.h
@@ -51,6 +51,8 @@ enum {
FRA_OIFNAME,
FRA_PAD,
FRA_L3MDEV, /* iif or oif is l3mdev goto its table */
+ FRA_IIFGROUP, /* interface group */
+ FRA_OIFGROUP,
__FRA_MAX
};
diff --git a/net/core/fib_rules.c b/net/core/fib_rules.c
index be4629c344a6..f8ed6ba85c72 100644
--- a/net/core/fib_rules.c
+++ b/net/core/fib_rules.c
@@ -37,6 +37,9 @@ int fib_default_rule_add(struct fib_rules_ops *ops,
r->suppress_prefixlen = -1;
r->suppress_ifgroup = -1;
+ r->iifgroup = -1;
+ r->oifgroup = -1;
+
/* The lock is not required here, the list in unreacheable
* at the moment this function is called */
list_add_tail(&r->list, &ops->rules_list);
@@ -193,6 +196,30 @@ static int fib_rule_match(struct fib_rule *rule, struct fib_rules_ops *ops,
if (rule->l3mdev && !l3mdev_fib_rule_match(rule->fr_net, fl, arg))
goto out;
+ if (rule->iifgroup != -1) {
+ struct net_device *dev;
+
+ rcu_read_lock();
+ dev = dev_get_by_index_rcu(rule->fr_net, fl->flowi_iif);
+ if (!dev || dev->group != rule->iifgroup) {
+ rcu_read_unlock();
+ goto out;
+ }
+ rcu_read_unlock();
+ }
+
+ if (rule->oifgroup != -1) {
+ struct net_device *dev;
+
+ rcu_read_lock();
+ dev = dev_get_by_index_rcu(rule->fr_net, fl->flowi_oif);
+ if (!dev || dev->group != rule->oifgroup) {
+ rcu_read_unlock();
+ goto out;
+ }
+ rcu_read_unlock();
+ }
+
ret = ops->match(rule, fl, flags);
out:
return (rule->flags & FIB_RULE_INVERT) ? !ret : ret;
@@ -305,6 +332,12 @@ static int rule_exists(struct fib_rules_ops *ops, struct fib_rule_hdr *frh,
if (r->l3mdev != rule->l3mdev)
continue;
+ if (r->iifgroup != rule->iifgroup)
+ continue;
+
+ if (r->oifgroup != rule->oifgroup)
+ continue;
+
if (!ops->compare(r, frh, tb))
continue;
return 1;
@@ -391,6 +424,16 @@ int fib_nl_newrule(struct sk_buff *skb, struct nlmsghdr *nlh)
goto errout_free;
}
+ if (tb[FRA_IIFGROUP])
+ rule->iifgroup = nla_get_u32(tb[FRA_IIFGROUP]);
+ else
+ rule->iifgroup = -1;
+
+ if (tb[FRA_OIFGROUP])
+ rule->oifgroup = nla_get_u32(tb[FRA_OIFGROUP]);
+ else
+ rule->oifgroup = -1;
+
rule->action = frh->action;
rule->flags = frh->flags;
rule->table = frh_get_table(frh, tb);
@@ -552,6 +595,14 @@ int fib_nl_delrule(struct sk_buff *skb, struct nlmsghdr *nlh)
(rule->l3mdev != nla_get_u8(tb[FRA_L3MDEV])))
continue;
+ if (tb[FRA_IIFGROUP] &&
+ (rule->iifgroup != nla_get_u32(tb[FRA_IIFGROUP])))
+ continue;
+
+ if (tb[FRA_OIFGROUP] &&
+ (rule->oifgroup != nla_get_u32(tb[FRA_OIFGROUP])))
+ continue;
+
if (!ops->compare(rule, frh, tb))
continue;
@@ -679,7 +730,11 @@ static int fib_nl_fill_rule(struct sk_buff *skb, struct fib_rule *rule,
(rule->tun_id &&
nla_put_be64(skb, FRA_TUN_ID, rule->tun_id, FRA_PAD)) ||
(rule->l3mdev &&
- nla_put_u8(skb, FRA_L3MDEV, rule->l3mdev)))
+ nla_put_u8(skb, FRA_L3MDEV, rule->l3mdev)) ||
+ ((rule->iifgroup != -1) &&
+ nla_put_u32(skb, FRA_IIFGROUP, rule->iifgroup)) ||
+ ((rule->oifgroup != -1) &&
+ nla_put_u32(skb, FRA_OIFGROUP, rule->oifgroup)))
goto nla_put_failure;
if (rule->suppress_ifgroup != -1) {
--
2.9.3
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [net v1] fib_rules: interface group matching
2016-09-14 12:40 [net v1] fib_rules: interface group matching Vincent Bernat
@ 2016-09-14 12:43 ` Vincent Bernat
2016-09-14 14:15 ` David Ahern
1 sibling, 0 replies; 8+ messages in thread
From: Vincent Bernat @ 2016-09-14 12:43 UTC (permalink / raw)
To: David S. Miller; +Cc: Nicolas Dichtel, David Ahern, Wilson Kok, netdev
❦ 14 septembre 2016 14:40 CEST, Vincent Bernat <vincent@bernat.im> :
> Each interface can be assigned to a numeric group using IFLA_GROUP. This
> commit enables a user to reference such a group into an IP rule. Here is
> an example of output of iproute2:
>
> $ ip rule show
> 0: from all lookup local
> 32764: from all iifgroup 2 lookup 2
> 32765: from all iifgroup 1 lookup 1
> 32766: from all lookup main
> 32767: from all lookup default
The patch for iproute2 is available here (didn't post it inline to avoid
confuse patchwork):
http://paste.debian.net/821247/
--
Say what you mean, simply and directly.
- The Elements of Programming Style (Kernighan & Plauger)
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [net v1] fib_rules: interface group matching
2016-09-14 12:40 [net v1] fib_rules: interface group matching Vincent Bernat
2016-09-14 12:43 ` Vincent Bernat
@ 2016-09-14 14:15 ` David Ahern
2016-09-14 14:25 ` Vincent Bernat
1 sibling, 1 reply; 8+ messages in thread
From: David Ahern @ 2016-09-14 14:15 UTC (permalink / raw)
To: Vincent Bernat, David S. Miller, Nicolas Dichtel, Wilson Kok, netdev
On 9/14/16 6:40 AM, Vincent Bernat wrote:
> When a user wants to assign a routing table to a group of incoming
> interfaces, the current solutions are:
>
> - one IP rule for each interface (scalability problems)
> - use of fwmark and devgroup matcher (don't work with internal route
> lookups, used for example by RPF)
> - use of VRF devices (more complex)
Why do you believe that? A VRF is a formalized grouping of interfaces that includes an API for locally generated traffic to specify which VRF/group to use. And, with the l3mdev rule you only need 1 rule for all VRFs regardless of the number which is the best solution to the scalability problem of adding rules per device/group/VRF.
What use case are trying to solve?
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [net v1] fib_rules: interface group matching
2016-09-14 14:15 ` David Ahern
@ 2016-09-14 14:25 ` Vincent Bernat
2016-09-14 14:39 ` David Ahern
0 siblings, 1 reply; 8+ messages in thread
From: Vincent Bernat @ 2016-09-14 14:25 UTC (permalink / raw)
To: David Ahern; +Cc: David S. Miller, Nicolas Dichtel, Wilson Kok, netdev
❦ 14 septembre 2016 16:15 CEST, David Ahern <dsa@cumulusnetworks.com> :
>> When a user wants to assign a routing table to a group of incoming
>> interfaces, the current solutions are:
>>
>> - one IP rule for each interface (scalability problems)
>> - use of fwmark and devgroup matcher (don't work with internal route
>> lookups, used for example by RPF)
>> - use of VRF devices (more complex)
>
> Why do you believe that? A VRF is a formalized grouping of interfaces
> that includes an API for locally generated traffic to specify which
> VRF/group to use. And, with the l3mdev rule you only need 1 rule for
> all VRFs regardless of the number which is the best solution to the
> scalability problem of adding rules per device/group/VRF.
>
> What use case are trying to solve?
Local processes have to be made aware of the VRF by binding to the
pseudo-device. Some processes may be tricked by LD_PRELOAD but some
won't (like stuff written in Go). Maybe I should just find a better way
to bind a process to a VRF without its cooperation.
--
Instrument your programs. Measure before making "efficiency" changes.
- The Elements of Programming Style (Kernighan & Plauger)
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [net v1] fib_rules: interface group matching
2016-09-14 14:25 ` Vincent Bernat
@ 2016-09-14 14:39 ` David Ahern
2016-09-14 15:14 ` Vincent Bernat
0 siblings, 1 reply; 8+ messages in thread
From: David Ahern @ 2016-09-14 14:39 UTC (permalink / raw)
To: Vincent Bernat; +Cc: David S. Miller, Nicolas Dichtel, Wilson Kok, netdev
On 9/14/16 8:25 AM, Vincent Bernat wrote:
> ❦ 14 septembre 2016 16:15 CEST, David Ahern <dsa@cumulusnetworks.com> :
>
>>> When a user wants to assign a routing table to a group of incoming
>>> interfaces, the current solutions are:
>>>
>>> - one IP rule for each interface (scalability problems)
>>> - use of fwmark and devgroup matcher (don't work with internal route
>>> lookups, used for example by RPF)
>>> - use of VRF devices (more complex)
>>
>> Why do you believe that? A VRF is a formalized grouping of interfaces
>> that includes an API for locally generated traffic to specify which
>> VRF/group to use. And, with the l3mdev rule you only need 1 rule for
>> all VRFs regardless of the number which is the best solution to the
>> scalability problem of adding rules per device/group/VRF.
>>
>> What use case are trying to solve?
>
> Local processes have to be made aware of the VRF by binding to the
> pseudo-device. Some processes may be tricked by LD_PRELOAD but some
> won't (like stuff written in Go). Maybe I should just find a better way
> to bind a process to a VRF without its cooperation.
>
What API are you using for interface groups? How does an app tell the kernel to use interface group 1 versus group 2?
LD_PRELOAD and overloading socket is an ad-hoc hack at best with many holes - as you have found.
We (Cumulus Linux) are using this cgroups patch:
http://www.mail-archive.com/netdev@vger.kernel.org/msg93408.html
I want something formal like the cgroups patch or even the first idea of adding a default sk_bound_dev_if to the task struct:
https://github.com/dsahern/linux/commit/b3e5ccc291505c8a503edb20ea2c2b5e86bed96f
Parent-to-child inheritance of the setting is a requirement as is the setting getting applied to all IPv4/v6 sockets without action by the process itself.
Still have some work to do to get a solution into the kernel.
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [net v1] fib_rules: interface group matching
2016-09-14 14:39 ` David Ahern
@ 2016-09-14 15:14 ` Vincent Bernat
2016-09-14 15:25 ` David Ahern
0 siblings, 1 reply; 8+ messages in thread
From: Vincent Bernat @ 2016-09-14 15:14 UTC (permalink / raw)
To: David Ahern; +Cc: David S. Miller, Nicolas Dichtel, Wilson Kok, netdev
❦ 14 septembre 2016 16:39 CEST, David Ahern <dsa@cumulusnetworks.com> :
>>>> When a user wants to assign a routing table to a group of incoming
>>>> interfaces, the current solutions are:
>>>>
>>>> - one IP rule for each interface (scalability problems)
>>>> - use of fwmark and devgroup matcher (don't work with internal route
>>>> lookups, used for example by RPF)
>>>> - use of VRF devices (more complex)
>>>
>>> Why do you believe that? A VRF is a formalized grouping of interfaces
>>> that includes an API for locally generated traffic to specify which
>>> VRF/group to use. And, with the l3mdev rule you only need 1 rule for
>>> all VRFs regardless of the number which is the best solution to the
>>> scalability problem of adding rules per device/group/VRF.
>>>
>>> What use case are trying to solve?
>>
>> Local processes have to be made aware of the VRF by binding to the
>> pseudo-device. Some processes may be tricked by LD_PRELOAD but some
>> won't (like stuff written in Go). Maybe I should just find a better way
>> to bind a process to a VRF without its cooperation.
>
> What API are you using for interface groups? How does an app tell the
> kernel to use interface group 1 versus group 2?
In my testbed, I have only one local application which is dnsmasq as a
DHCP server. It sends back the answer to the physical interface (with
sendmsg() and auxillary data). So it makes my argument a bit moot as the
situation is in fact worse without VRF. :-/
My testbed is here (with use of VRF, more recent commits just use plain
ip rules):
https://github.com/vincentbernat/network-lab/blob/d86e9ed658863ef0f51d7b853d0dc9f8b7427b21/lab-l3-hyperv/setup
I could just give more time to VRF. I also had some concerns over
performance with the way Netfilter integration is done, but I understand
that I could just stay away from POSTROUTING rules which is the only
hook executed twice?
--
All things that are, are with more spirit chased than enjoyed.
-- Shakespeare, "Merchant of Venice"
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [net v1] fib_rules: interface group matching
2016-09-14 15:14 ` Vincent Bernat
@ 2016-09-14 15:25 ` David Ahern
2016-09-14 16:01 ` Vincent Bernat
0 siblings, 1 reply; 8+ messages in thread
From: David Ahern @ 2016-09-14 15:25 UTC (permalink / raw)
To: Vincent Bernat; +Cc: David S. Miller, Nicolas Dichtel, Wilson Kok, netdev
On 9/14/16 9:14 AM, Vincent Bernat wrote:
> I could just give more time to VRF. I also had some concerns over
> performance with the way Netfilter integration is done, but I understand
> that I could just stay away from POSTROUTING rules which is the only
> hook executed twice?
>
With the changes that were committed this past weekend, the VRF code is now setup where I can set a flag on a per VRF basis to disable the extra rx and tx processing - ie., no network taps, no netfilter, no qdisc, etc. Drops the overhead of VRF to ~3% maybe a bit less. I need to think about the user api a bit more and formalize the patch. Given my other commitments that probably won't happen until mid-October. But in terms of a building block, the overhead of VRF is continuing to drop.
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [net v1] fib_rules: interface group matching
2016-09-14 15:25 ` David Ahern
@ 2016-09-14 16:01 ` Vincent Bernat
0 siblings, 0 replies; 8+ messages in thread
From: Vincent Bernat @ 2016-09-14 16:01 UTC (permalink / raw)
To: David Ahern; +Cc: David S. Miller, Nicolas Dichtel, Wilson Kok, netdev
❦ 14 septembre 2016 17:25 CEST, David Ahern <dsa@cumulusnetworks.com> :
>> I could just give more time to VRF. I also had some concerns over
>> performance with the way Netfilter integration is done, but I understand
>> that I could just stay away from POSTROUTING rules which is the only
>> hook executed twice?
> With the changes that were committed this past weekend, the VRF code
> is now setup where I can set a flag on a per VRF basis to disable the
> extra rx and tx processing - ie., no network taps, no netfilter, no
> qdisc, etc. Drops the overhead of VRF to ~3% maybe a bit less. I need
> to think about the user api a bit more and formalize the patch. Given
> my other commitments that probably won't happen until mid-October. But
> in terms of a building block, the overhead of VRF is continuing to
> drop.
Fine by me. We can drop my patch.
Thanks!
--
Program defensively.
- The Elements of Programming Style (Kernighan & Plauger)
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2016-09-14 16:01 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-09-14 12:40 [net v1] fib_rules: interface group matching Vincent Bernat
2016-09-14 12:43 ` Vincent Bernat
2016-09-14 14:15 ` David Ahern
2016-09-14 14:25 ` Vincent Bernat
2016-09-14 14:39 ` David Ahern
2016-09-14 15:14 ` Vincent Bernat
2016-09-14 15:25 ` David Ahern
2016-09-14 16:01 ` Vincent Bernat
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.