From mboxrd@z Thu Jan 1 00:00:00 1970
From: ebiederm@xmission.com (Eric W. Biederman)
Subject: Re: iproute2 mpls max labels
Date: Fri, 22 Jul 2016 14:20:31 -0500
Message-ID: <87r3alv5s0.fsf@x220.int.ebiederm.org>
References: <578A7BF0.2020107@nordu.net> <57911A26.3080203@cumulusnetworks.com> <8737n23goi.fsf@x220.int.ebiederm.org> <5791BA22.7050309@cumulusnetworks.com>
Mime-Version: 1.0
Content-Type: text/plain
Cc: Magnus Bergroth, netdev@vger.kernel.org, Robert Shearman
To: Roopa Prabhu
Return-path:
Received: from out03.mta.xmission.com ([166.70.13.233]:48701 "EHLO out03.mta.xmission.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751928AbcGVTdb (ORCPT ); Fri, 22 Jul 2016 15:33:31 -0400
In-Reply-To: <5791BA22.7050309@cumulusnetworks.com> (Roopa Prabhu's message of "Thu, 21 Jul 2016 23:16:02 -0700")
Sender: netdev-owner@vger.kernel.org
List-ID:

Roopa Prabhu writes:

> On 7/21/16, 1:00 PM, Eric W. Biederman wrote:
>> Roopa Prabhu writes:
>>
>>> On 7/16/16, 11:24 AM, Magnus Bergroth wrote:
>>>> Wanted to use more than the default maximum of 8 mpls labels. Max labels
>>>> seems to be hardcoded to 8 in two places.
>>>>
>>>> --- iproute2-4.6.0/lib/utils.c 2016-05-18 20:56:02.000000000 +0200
>>>> +++ iproute2-4.6.0-bergroth/lib/utils.c 2016-07-16 20:12:10.714958071
>>>> +0200
>>>> @@ -476,7 +476,7 @@
>>>>          addr->bytelen = 4;
>>>>          addr->bitlen = 20;
>>>>          /* How many bytes do I need? */
>>>> -        for (i = 0; i < 8; i++) {
>>>> +        for (i = 0; i < MPLS_MAX_LABELS; i++) {
>>>>                  if (ntohl(addr->data[i]) & MPLS_LS_S_MASK) {
>>>>                          addr->bytelen = (i + 1)*4;
>>>>                          break;
>>>>
>>>>
>>>> --- iproute2-4.6.0/include/utils.h 2016-05-18 20:56:02.000000000 +0200
>>>> +++ iproute2-4.6.0-bergroth/include/utils.h 2016-07-15
>>>> 11:55:57.297681742 +0200
>>>> @@ -54,6 +54,9 @@
>>>>  #define NEXT_ARG_FWD() do { argv++; argc--; } while(0)
>>>>  #define PREV_ARG() do { argv--; argc++; } while(0)
>>>>
>>>> +/* Maximum number of labels the mpls helpers support */
>>>> +#define MPLS_MAX_LABELS 8
>>>> +
>>>>  typedef struct
>>>>  {
>>>>          __u16 flags;
>>>> @@ -61,7 +64,7 @@
>>>>          __s16 bitlen;
>>>>          /* These next two fields match rtvia */
>>>>          __u16 family;
>>>> -        __u32 data[8];
>>>> +        __u32 data[MPLS_MAX_LABELS];
>>>>  } inet_prefix;
>>>>
>> This structure is not MPLS specific, so it is not appropriate to use
>> MPLS_MAX_LABELS when defining the structure. Likewise, changing this
>> structure and limiting the changes to the mpls parts of the code is not
>> appropriate.
>>
>>>>  #define PREFIXLEN_SPECIFIED 1
>>>> @@ -88,9 +91,6 @@
>>>>  # define AF_MPLS 28
>>>>  #endif
>>>>
>>>> -/* Maximum number of labels the mpls helpers support */
>>>> -#define MPLS_MAX_LABELS 8
>>>> -
>>>>  __u32 get_addr32(const char *name);
>>>>  int get_addr_1(inet_prefix *dst, const char *arg, int family);
>>>>  int get_prefix_1(inet_prefix *dst, char *arg, int family);
>>>>
>>>>
>>> I did not realize it is hardcoded to 8 in iproute2, because the kernel
>>> has a hardcoded limit of 2.
>>> I think we need to fix it in a few places:
>>> a) we should move the kernel #define to a uapi header file which iproute2 can use
>>> b) there has been a general ask to bump the kernel MAX_LABELS from 2 and I don't see
>>> a problem with it yet. So, we could bump it to 8.
>>>
>>> Were you planning to post patches for one or both of the above?
>>>
>>> I can post them too. Let me know.
>> a) I just looked and the kernel netlink protocol does not have a limit.
>>    The kernel does have a limit but the netlink protocol does not, so
>>    there is no point in exporting a limit in a uapi header; it will
>>    just be out of date and wrong.
> sure, if you have concerns about making it part of uapi, we can
> separately maintain the same limit in iproute2 and kernel.

The different tools already have different limits and it is not a
problem. The important thing is for the userspace tool to have the
larger limit.

>> b) I can see in principle bumping up the kernel's MAX_LABELS past two,
>>    although I haven't heard those requests, or understand the use cases.
>>    I don't recall seeing any documentation on cases where it is
>>    desirable to push a lot of labels at once. (Do hardware
>>    implementations support pushing a lot of labels at once?)
> I don't know of any use cases either. But I have received multiple requests
> on bumping the current limit of two.
>>
>>    Bumping past 8 seems quite a lot. That starts feeling like people
>>    trying to break other people's mpls stacks. That is asking for more
>>    packet space for labels than ipv6 uses for addresses, and ipv6 is way
>>    oversized. The commonly agreed wisdom is the world only needs 40 to
>>    48 bits to route on to reach the entire world.
>>
>>    I can completely understand a few specialty labels going beyond what
>>    is needed for general purpose routing, but pushing more than 8 at
>>    once seems huge. Especially since you can recirculate packets if
>>    you really need to and push more labels that way.
>
> I don't think there is an ask for going more than 8. Anything greater than
> the current 2 is good.

Except the patch that got all of this started.

>>    Add to that, for a software implementation we have these pesky things
>>    called cache lines. I can see in the kernel pushing struct
>>    mpls_route towards the size of a full cacheline. Today we are at 52
>>    bytes not counting the via address.
>>    With the via address we are at 56 (ipv4), 58 (ethernet), and 60
>>    (ipv6) bytes. Which means we have to make the kernel data
>>    structures smarter or we risk messing up the performance of the
>>    common case.
>>
>>    Also we do need some kind of limit in the kernel to protect against
>>    insane inputs.
>>
>> So while I can imagine there are reasonable cases for bumping up the
>> maximum number of labels in the kernel, I think we need to be smart if
>> we are going to do that. Which probably means we will want a
>> __mpls_nh_label helper function.
>>
> sure, yes, the current static label array works well for the common case
> of 2 labels. Does it make sense for it to be configurable
> with the default being 2 and max something like 8?

We have two structures, both with one byte holes:

struct mpls_route { /* next hop label forwarding entry */
	struct rcu_head		rt_rcu;
	u8			rt_protocol;
	u8			rt_payload_type;
	u8			rt_max_alen;
	unsigned int		rt_nhn;
	unsigned int		rt_nhn_alive;
	struct mpls_nh		rt_nh[0];
};

struct mpls_nh { /* next hop label forwarding entry */
	struct net_device __rcu	*nh_dev;
	unsigned int		nh_flags;
	u32			nh_label[MAX_NEW_LABELS];
	u8			nh_labels;
	u8			nh_via_alen;
	u8			nh_via_table;
};

If we were to define them as:

struct mpls_route { /* next hop label forwarding entry */
	struct rcu_head		rt_rcu;
	u8			rt_protocol;
	u8			rt_payload_type;
	u8			rt_max_alen;
	u8			rt_max_labels;
	unsigned int		rt_nhn;
	unsigned int		rt_nhn_alive;
	struct mpls_nh		rt_nh[0];
};

struct mpls_nh { /* next hop label forwarding entry */
	struct net_device __rcu	*nh_dev;
	unsigned int		nh_flags;
	u8			nh_labels;
	u8			nh_via_alen;
	u8			nh_via_table;
};

static u32 *__mpls_nh_labels(struct mpls_route *rt, struct mpls_nh *nh)
{
	/* The label blocks start right after the last next hop. */
	u32 *nh0_labels = PTR_ALIGN((u32 *)&rt->rt_nh[rt->rt_nhn], sizeof(u32));
	int nh_index = nh - rt->rt_nh;

	return nh0_labels + rt->rt_max_labels * nh_index;
}

static u8 *__mpls_nh_via(struct mpls_route *rt, struct mpls_nh *nh)
{
	/* The via addresses follow the label blocks; offsets are in bytes. */
	u8 *nh0_via = PTR_ALIGN((u8 *)&rt->rt_nh[rt->rt_nhn] +
				(sizeof(u32) * rt->rt_max_labels * rt->rt_nhn),
				VIA_ALEN_ALIGN);
	int nh_index = nh - rt->rt_nh;

	return nh0_via + rt->rt_max_alen * nh_index;
}

Ugh. I just noticed we have a nasty 4 byte gap in the mpls_route by
having both rt_nhn and rt_nhn_alive in there, as rt_nh[0] has pointer
alignment.

Anyway, something like the above should allow us to remove the limit on
the number of labels from the implementation and still fit everything in
a cache line in the common case, as the change above doesn't take up any
extra space in struct mpls_route.

Then we just pick a reasonable maximum and set MAX_NEW_LABELS to that.
That will change struct mpls_route_config. So we need a small enough
value that putting struct mpls_route_config continues to make sense.

I propose 8 for MAX_NEW_LABELS after such a change.

It looks pretty straightforward on the kernel side.

Eric
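
For illustration, the packed layout proposed above can be modelled in
userspace C. This is only a sketch under stated assumptions, not the
kernel code: route_sketch, nh_sketch, route_alloc, nh_labels, nh_via and
VIA_ALIGN_SKETCH are hypothetical names, the rcu_head is left out, and a
4 byte via alignment is assumed. It shows one allocation holding the
header, the next hops, then one block of labels per next hop, then one
via address per next hop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define VIA_ALIGN_SKETCH 4	/* assumed alignment of the via block */

struct nh_sketch {		/* hypothetical stand-in for struct mpls_nh */
	void		*dev;
	unsigned int	flags;
	uint8_t		labels;
	uint8_t		via_alen;
	uint8_t		via_table;
};

struct route_sketch {		/* hypothetical stand-in for struct mpls_route */
	uint8_t		protocol;
	uint8_t		payload_type;
	uint8_t		max_alen;
	uint8_t		max_labels;
	unsigned int	nhn;
	unsigned int	nhn_alive;
	struct nh_sketch nh[];	/* labels and vias follow this array */
};

/* Round n up to the next multiple of a. */
static size_t align_up(size_t n, size_t a)
{
	return (n + a - 1) / a * a;
}

/* Byte offset of the first label block, just past the last next hop. */
static size_t labels_offset(unsigned int nhn)
{
	return align_up(offsetof(struct route_sketch, nh) +
			nhn * sizeof(struct nh_sketch), sizeof(uint32_t));
}

/* Byte offset of the first via address, just past the last label block. */
static size_t via_offset(unsigned int nhn, uint8_t max_labels)
{
	return align_up(labels_offset(nhn) +
			sizeof(uint32_t) * max_labels * nhn, VIA_ALIGN_SKETCH);
}

/* Total size of one route including all trailing data. */
static size_t route_size(unsigned int nhn, uint8_t max_labels, uint8_t max_alen)
{
	return via_offset(nhn, max_labels) + (size_t)max_alen * nhn;
}

/* Allocate a zeroed route sized for nhn next hops; caller frees it. */
static struct route_sketch *route_alloc(unsigned int nhn, uint8_t max_labels,
					uint8_t max_alen)
{
	struct route_sketch *rt = calloc(1, route_size(nhn, max_labels, max_alen));

	if (!rt)
		return NULL;
	rt->nhn = nhn;
	rt->max_labels = max_labels;
	rt->max_alen = max_alen;
	return rt;
}

/* Labels of one next hop: max_labels slots per next hop, indexed by position. */
static uint32_t *nh_labels(struct route_sketch *rt, struct nh_sketch *nh)
{
	size_t idx;

	assert(nh >= rt->nh && nh < rt->nh + rt->nhn);
	idx = (size_t)(nh - rt->nh);
	return (uint32_t *)((unsigned char *)rt + labels_offset(rt->nhn)) +
	       rt->max_labels * idx;
}

/* Via address of one next hop: max_alen bytes per next hop. */
static uint8_t *nh_via(struct route_sketch *rt, struct nh_sketch *nh)
{
	size_t idx;

	assert(nh >= rt->nh && nh < rt->nh + rt->nhn);
	idx = (size_t)(nh - rt->nh);
	return (uint8_t *)rt + via_offset(rt->nhn, rt->max_labels) +
	       rt->max_alen * idx;
}
```

With this layout a route carrying few labels pays only for the labels it
declares in max_labels, which is the point of moving the label array out
of the per next hop structure.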