All of lore.kernel.org
 help / color / mirror / Atom feed
From: Roopa Prabhu <roopa@cumulusnetworks.com>
To: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Magnus Bergroth <bergroth@nordu.net>,
	netdev@vger.kernel.org, Robert Shearman <rshearma@brocade.com>,
	olivier.dugeon@orange.com
Subject: Re: iproute2 mpls max labels
Date: Sat, 23 Jul 2016 16:03:21 -0700	[thread overview]
Message-ID: <5793F7B9.5060603@cumulusnetworks.com> (raw)
In-Reply-To: <87r3alv5s0.fsf@x220.int.ebiederm.org>

On 7/22/16, 12:20 PM, Eric W. Biederman wrote:
> Roopa Prabhu <roopa@cumulusnetworks.com> writes:
>
>> On 7/21/16, 1:00 PM, Eric W. Biederman wrote:
>>> Roopa Prabhu <roopa@cumulusnetworks.com> writes:
>>>
>>>

[snip]
>>> I did not realize it is hardcoded to 8 in iproute2. Because kernel has a hard coded limit of
>>> 2.
>>> I think we need to fix it in a few places:
>>> a) we should move the kernel #define to a uapi header file which iproute2 can use
>>> b) there has been a general ask to bump the kernel MAX_LABELS from 2 and I don't see
>>> a problem with it yet. so, we could bump it to 8.
>>>
>>> Were you planning to post patches for one or both of the above ?.
>>>
>>> I can post them too. Let me know.
>>> a) I just looked and the kernel netlink protocol does not have a limit.
>>>    The kernel does have a limit but the netlink protocol does not  so
>>>    there is no point in exporting a limit in a uapi header,  it will
>>>    just be out of date and wrong.
>> sure, if you have concerns about making it part of uapi, we can
>> separately maintain the same limit in iproute2 and kernel.
> The different tools already have different limits and it is not
> a problem.  The important thing is for the userspace tool to have
> the larger limit.
>>> b) I can see in principle bumping up the kernels MAX_LABELS past two
>>>    although I haven't heard those requests, or understand the use cases.
>>>    I don't recall seeing any ducumentation on cases where it is
>>>    desirable to push a lot of labels at once.  (Do hardware
>>>    implementations support pushing a lot of labels at once?)
>> I don't know of any use cases either. But i have received multiple requests
>> on bumping the current limit of two
>>>    Bumping past 8 seems quite a lot.  That starts feeling like people
>>>    trying to break other peoples mpls stacks.  That is asking for more
>>>    packet space for labels than ipv6 uses for addresses and ipv6 is way
>>>    oversized.  The commonly agreed wisdom is the world only needs 40 to
>>>    48 bits to route on to reach the entire world.  
>>>
>>>    I can completely understand a few specialty labels going beyond what
>>>    is needed for general purpose routing but pushing more that 8 at
>>>    once seems huge.  Especially since you can recirculate packets if
>>>    you really need to and push more labels that way.
>> I don't think there is an ask for going more than 8. anything greater than
>> current 2 is good.
> Except the patch that got all of this started.

ok, missed that. yesterday I also received some info on a segment routing use-case
where there is an ongoing study which is currently leaning towards a max label stack
depth of 17.

>
>>>    Add to that for a software implementation we have these pesky things
>>>    called cache lines.  I can see in the kernel pushing struct
>>>    mpls_route towards the size of a full cacheline.  Today we are at 52
>>>    bytes not counting the via adress.  With the via address we are at 56
>>>    (ipv4), 58 (ethernet), and 60 (ipv6) bytes.  Which means in we have
>>>    to make the kernel data structures smarter or we risk messing up the
>>>    performance of the common case.
>>>
>>>    Also we do need some kind of limit in the kernel to protect against
>>>    insane inputs.
>>>    
>>>    So while I can imagine there are reasonable cases for bumping up the
>>>    maximum number of labels in the kernel I think we need to be smart if
>>>    we ware going to do that.  Which probably means we will want a
>>>    __mpls_nh_label helper function.
>>>
>> sure, yes, the current static label array works well for the common case
>> of 2 labels. does it make sense for it to be configurable
>> with the default being 2 and max something like 8 ?
> We have two structures both with one byte holes:
> struct mpls_route { /* next hop label forwarding entry */
> 	struct rcu_head		rt_rcu;
> 	u8			rt_protocol;
> 	u8			rt_payload_type;
> 	u8			rt_max_alen;
> 	unsigned int		rt_nhn;
> 	unsigned int		rt_nhn_alive;
> 	struct mpls_nh		rt_nh[0];
> };
>
> struct mpls_nh { /* next hop label forwarding entry */
> 	struct net_device __rcu *nh_dev;
> 	unsigned int		nh_flags;
> 	u32			nh_label[MAX_NEW_LABELS];
> 	u8			nh_labels;
> 	u8			nh_via_alen;
> 	u8			nh_via_table;
> };
>
> If we were to define them as:
> struct mpls_route { /* next hop label forwarding entry */
> 	struct rcu_head		rt_rcu;
> 	u8			rt_protocol;
> 	u8			rt_payload_type;
> 	u8			rt_max_alen;
>         u8			rt_max_labels;
> 	unsigned int		rt_nhn;
> 	unsigned int		rt_nhn_alive;
> 	struct mpls_nh		rt_nh[0];
> };
>
> struct mpls_nh { /* next hop label forwarding entry */
> 	struct net_device __rcu *nh_dev;
> 	unsigned int		nh_flags;
> 	u8			nh_labels;
> 	u8			nh_via_alen;
> 	u8			nh_via_table;
> };
>
> static 32 *__mpls_nh_labels(struct mpls_route *rt, struct mpls_nh *nh)
> {
> 	u32 *nh0_labels = PTR_ALIGN((u32 *)&rt->rt_nh[rt->rt_nhn], sizeof(u32));
> 	int nh_index = nh - rt->rt_nh;
>
> 	return nh0_labels + rt->rt_max_labels * nh_index;
> }
>
> static u8 *__mpls_nh_via(struct mpls_route *rt, struct mpls_nh *nh)
> {
> 	u8 *nh0_via = PTR_ALIGN((u8 *)(&rt->rt_nh[rt->rt_nhn] + (sizeof(u32) *rt->max_labels * rt->nhn)), VIA_ALEN_ALIGN);
> 	int nh_index = nh - rt->rt_nh;
>
> 	return nh0_via + rt->rt_max_alen * nh_index;
> }
>
> Ugh.  I just noticed we have a nasty 4 byte gap in the mpls_route by
> having both rt_nhn and rt_nhn_alive in there.  As rt_nh[0] has pointer
> alignment.
>
> Anyway something like the above should allow us to remove the limit
> of the number of labels from the implementation and still fit everything
> in a cache line in the common case, as the change above doesn't take up
> any extra space in struct mpls_route.
>
> Then we just pick a reasonable maximum and set MAX_NEW_LABELS to that.
> That will change struct mpls_route_config.  So we need a small enough
> value that putting struct mpls_route_config continues to make sense.
> I propose 8 for MAX_NEW_LABELS after such a change.
>
> It looks pretty straighforward on the kernel side.
I like it. It follows how via is handled today and I agree seems like the best way to
represent varying number of labels without affecting the common case.

thanks for the suggestion.

      reply	other threads:[~2016-07-23 23:03 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-07-16 18:24 iproute2 mpls max labels Magnus Bergroth
2016-07-21 18:53 ` Roopa Prabhu
2016-07-21 19:43   ` Magnus Bergroth
2016-07-23 23:04     ` Roopa Prabhu
2016-07-21 20:00   ` Eric W. Biederman
2016-07-21 21:08     ` Magnus Bergroth
2016-07-22 19:24       ` Eric W. Biederman
2016-07-22  6:16     ` Roopa Prabhu
2016-07-22 19:20       ` Eric W. Biederman
2016-07-23 23:03         ` Roopa Prabhu [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5793F7B9.5060603@cumulusnetworks.com \
    --to=roopa@cumulusnetworks.com \
    --cc=bergroth@nordu.net \
    --cc=ebiederm@xmission.com \
    --cc=netdev@vger.kernel.org \
    --cc=olivier.dugeon@orange.com \
    --cc=rshearma@brocade.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.