netdev.vger.kernel.org archive mirror
* Scaling 'ip addr add' (was Re: What's the right way to use a *large* number of source addresses?)
@ 2014-05-27 21:29 sowmini varadhan
  2014-05-28  1:41 ` Eric Dumazet
  2014-05-29  6:34 ` Julian Anastasov
  0 siblings, 2 replies; 13+ messages in thread
From: sowmini varadhan @ 2014-05-27 21:29 UTC
  To: Jamal Hadi Salim; +Cc: Eric Dumazet, Niels Möller, netdev, Jonas Bonn

On Sat, May 24, 2014 at 8:06 AM, Jamal Hadi Salim <jhs@mojatatu.com> wrote:
> On 05/23/14 10:14, Eric Dumazet wrote:
>
>> Use the batch mode, and it will be much faster than ifconfig, as
>> ifconfig does not support this mode (you need one fork()/exec() per IP
>> address)
>>
>> ip -batch filename
>>
>
> The address dumping algorithm is a very likely contributor as well.
> It tries to remember indices and then skips on the next iteration
> all the way to where it left off.... has never been a big deal until
> someone tries a substantial number of addresses.
>
> cheers,
> jamal

Niels (nisse@southpole.se) reported:

   I've done a simple benchmark with a script assigning n addresses
   using "ip address add", and this seems to have O(n^2) complexity.
   E.g., assigning n=25500 addresses took 26 s, and doubling n, assigning
   51000 addresses, took 122 s, 4.6 times longer. Which isn't
   necessarily a problem once all the addresses are assigned, but it
   sounds a bit like there's a linear datastructure in there, not
   intended for a large number of addresses.
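
For reference, the arithmetic behind that estimate can be checked with a
small sketch. If the total cost followed a pure power law t ~ n^k, doubling
n would multiply the run time by 2^k: 2 for linear, 4 for quadratic, 8 for
cubic. The rounded figures in the report give 122/26, roughly 4.7, which
sits just above the quadratic prediction. The function names below are made
up for illustration; only the two timings come from the report.

```c
#include <stdio.h>

/* Slowdown predicted by a pure power law t ~ n^k when n doubles: 2^k. */
double predicted_ratio(int k)
{
    double r = 1.0;

    for (int i = 0; i < k; i++)
        r *= 2.0;
    return r;
}

/* Slowdown actually measured between the small and the large run. */
double observed_ratio(double t_small, double t_large)
{
    return t_large / t_small;
}

/* Print the observed slowdown next to the linear/quadratic/cubic cases. */
void report_scaling(double t_small, double t_large)
{
    printf("observed %.2f  linear %.0f  quadratic %.0f  cubic %.0f\n",
           observed_ratio(t_small, t_large),
           predicted_ratio(1), predicted_ratio(2), predicted_ratio(3));
}
```

Calling report_scaling(26.0, 122.0) prints the comparison for the two
reported runs. Two data points are of course not enough to pin down the
exponent; they only show that the growth is clearly worse than linear.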

This bothered me: both the suggested workaround of "ip -b" and the
comment about the slow address dumping algorithm suggest that
there may be some fundamental scaling issues here.
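
As an aside, the batch input for "ip -b" is just one ip command per line
without the leading "ip". A minimal sketch of a generator for such a file
follows. The helper names, the device name and the address range are
placeholders, not taken from Niels' script, and consecutive addresses are
assumed to be acceptable for the test.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Format one batch line, e.g. "addr add 10.0.0.1/8 dev dummy0\n".
 * addr is an IPv4 address in host byte order.
 * Returns the line length, or -1 if buf is too small.
 */
int format_batch_line(char *buf, size_t len, uint32_t addr,
                      unsigned int prefix_len, const char *dev)
{
    int n = snprintf(buf, len, "addr add %u.%u.%u.%u/%u dev %s\n",
                     (unsigned int)(addr >> 24),
                     (unsigned int)((addr >> 16) & 0xff),
                     (unsigned int)((addr >> 8) & 0xff),
                     (unsigned int)(addr & 0xff),
                     prefix_len, dev);

    return (n < 0 || (size_t)n >= len) ? -1 : n;
}

/*
 * Write count lines for consecutive addresses starting at base.
 * Returns count on success, -1 on a formatting or write error.
 */
int write_batch(FILE *out, uint32_t base, unsigned int prefix_len,
                const char *dev, int count)
{
    char line[128];

    for (int i = 0; i < count; i++) {
        if (format_batch_line(line, sizeof line, base + (uint32_t)i,
                              prefix_len, dev) < 0)
            return -1;
        if (fputs(line, out) == EOF)
            return -1;
    }
    return count;
}
```

The resulting file would be passed to "ip -batch filename". This only
removes the per-address fork()/exec(); it does nothing about the kernel-side
cost discussed below.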

Also, my earlier comment about netlink vs ioctl was possibly
a red herring. When I compared my experiment with what Niels is
trying to do, the experiments were different: I was adding
an address to a (newly created) tunnel interface each time (so
both the number of interfaces and the number of addresses grow),
whereas Niels is adding all addresses to the same interface.

So I looked at Niels' test script with perf. Some observations:

perf tells me:

   80.13%       ip  [other]
                 |
                 |--30.12%-- fib_sync_up
                 |          |
                 |           --30.12%-- fib_inetaddr_event
                 |                     notifier_call_chain
                 |                     __blocking_notifier_call_chain
                 |                     blocking_notifier_call_chain
                 |                     __inet_insert_ifa
                 |                     inet_rtm_newaddr
                 |                     rtnetlink_rcv_msg
                 |                     netlink_rcv_skb
                 |                     rtnetlink_rcv
                 |                     netlink_unicast
                 |                     netlink_sendmsg
                 |                     sock_sendmsg
                 |                     ___sys_sendmsg
                 |                     __sys_sendmsg
                 |                     SyS_sendmsg
                 |                     SyS_socketcall
                 |                     syscall_call

So fib_sync_up() itself doesn't scale very well. I'm not sure
how much room for tuning there is here.

Further, in __inet_insert_ifa, we walk the ifa_list at least once
(which is probably unavoidable),

static int __inet_insert_ifa( /* .. */
                             u32 portid)
{
        /* ... */
        for (ifap = &in_dev->ifa_list; (ifa1 = *ifap) != NULL;
             ifap = &ifa1->ifa_next) {
                /* ... */
        }
        /* ... */
        blocking_notifier_call_chain(&inetaddr_chain, NETDEV_UP, ifa);

        return 0;
}
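
To make the cost of that walk concrete, here is a simplified user-space
model, not the kernel code. The struct and function names are hypothetical.
It uses the same pointer-to-pointer idiom to walk to the end of a singly
linked list before linking in a new node, and counts how many nodes are
visited.

```c
#include <stddef.h>

/* Stand-in for one address on an interface's list. */
struct addr_node {
    unsigned int addr;
    struct addr_node *next;
};

/*
 * Walk the whole list, then link new_node at the tail.
 * Returns the number of existing nodes visited during the walk.
 */
size_t append_counting(struct addr_node **head, struct addr_node *new_node)
{
    struct addr_node **pp, *cur;
    size_t visited = 0;

    for (pp = head; (cur = *pp) != NULL; pp = &cur->next)
        visited++;

    new_node->next = NULL;
    *pp = new_node;
    return visited;
}

/*
 * Insert n nodes (storage supplied by the caller) into an empty list and
 * return the total number of node visits: 0 + 1 + ... + (n - 1).
 */
size_t total_visits(struct addr_node *nodes, size_t n)
{
    struct addr_node *head = NULL;
    size_t total = 0;

    for (size_t i = 0; i < n; i++) {
        nodes[i].addr = (unsigned int)i;
        total += append_counting(&head, &nodes[i]);
    }
    return total;
}
```

With one full walk per insertion, n insertions cost n(n-1)/2 visits in this
model, so doubling n roughly quadruples the work. That matches the shape of
the slowdown Niels measured, although the model ignores everything else the
kernel does per address.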

But in addition, the fib callback, fib_inetaddr_event(), has another
potential ifa_list walk for SECONDARY addresses.

        switch (event) {
        case NETDEV_UP:
                fib_add_ifaddr(ifa);
#ifdef CONFIG_IP_ROUTE_MULTIPATH
                fib_sync_up(dev);
#endif

For Niels' script, since there are many addresses in the same
subnet, we'll have a lot of cases of an IFA_F_SECONDARY address,
so fib_add_ifaddr will then do another walk of the ifa_list.
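
A rough sketch of why nearly every address in such a test ends up
secondary: as far as I understand it, a new address is flagged
IFA_F_SECONDARY when an address already on the interface has the same mask
and falls in the same subnet. The model below is a simplification with
hypothetical names, assumes addresses and masks in host byte order, and
leaves out the scope and duplicate checks.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for an interface address: IPv4 address plus mask. */
struct ifaddr_model {
    uint32_t address;
    uint32_t mask;
};

/* True if a and b use the same mask and share the same network prefix. */
bool same_subnet(const struct ifaddr_model *a, const struct ifaddr_model *b)
{
    return a->mask == b->mask &&
           ((a->address ^ b->address) & a->mask) == 0;
}

/*
 * True if candidate would be treated as secondary in this model, i.e. if
 * any of the n existing addresses is already in the same subnet.
 */
bool would_be_secondary(const struct ifaddr_model *existing, size_t n,
                        const struct ifaddr_model *candidate)
{
    for (size_t i = 0; i < n; i++) {
        if (same_subnet(&existing[i], candidate))
            return true;
    }
    return false;
}
```

In this model, when all n addresses share one subnet, every address after
the first is secondary, so the extra walk is taken on almost every add.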

Has anyone looked at consolidating some of this?
All of this could easily become a factor when the system
has a large number of interfaces and addresses, and the
control plane only wants to modify a very small subset of
that state.

--Sowmini

Thread overview: 13+ messages
2014-05-27 21:29 Scaling 'ip addr add' (was Re: What's the right way to use a *large* number of source addresses?) sowmini varadhan
2014-05-28  1:41 ` Eric Dumazet
2014-05-28 10:01   ` sowmini varadhan
2014-05-28 11:23     ` Jamal Hadi Salim
2014-05-28 11:54       ` sowmini varadhan
2014-05-28 12:18   ` sowmini varadhan
2014-05-28 13:44     ` Eric Dumazet
2014-05-28 14:48       ` Eric Dumazet
2014-05-28 16:00         ` Eric Dumazet
2014-05-28 17:18         ` sowmini varadhan
2014-05-29  6:34 ` Julian Anastasov
2014-05-29 16:11   ` sowmini varadhan
2014-05-29 16:19     ` David Ahern
