* [PATCH net] nexthop: Fix performance regression in nexthop deletion
@ 2020-10-16 17:29 Ido Schimmel
2020-10-16 21:46 ` Jesse Brandeburg
` (3 more replies)
0 siblings, 4 replies; 5+ messages in thread
From: Ido Schimmel @ 2020-10-16 17:29 UTC (permalink / raw)
To: netdev; +Cc: davem, kuba, dsahern, nikolay, mlxsw, Ido Schimmel
From: Ido Schimmel <idosch@nvidia.com>
While insertion of 16k nexthops all using the same netdev ('dummy10')
takes less than a second, deletion takes about 130 seconds:
# time -p ip -b nexthop.batch
real 0.29
user 0.01
sys 0.15
# time -p ip link set dev dummy10 down
real 131.03
user 0.06
sys 0.52
This is because of repeated calls to synchronize_rcu() whenever a
nexthop is removed from a nexthop group:
# /usr/share/bcc/tools/offcputime -p `pgrep -nx ip` -K
...
b'finish_task_switch'
b'schedule'
b'schedule_timeout'
b'wait_for_completion'
b'__wait_rcu_gp'
b'synchronize_rcu.part.0'
b'synchronize_rcu'
b'__remove_nexthop'
b'remove_nexthop'
b'nexthop_flush_dev'
b'nh_netdev_event'
b'raw_notifier_call_chain'
b'call_netdevice_notifiers_info'
b'__dev_notify_flags'
b'dev_change_flags'
b'do_setlink'
b'__rtnl_newlink'
b'rtnl_newlink'
b'rtnetlink_rcv_msg'
b'netlink_rcv_skb'
b'rtnetlink_rcv'
b'netlink_unicast'
b'netlink_sendmsg'
b'____sys_sendmsg'
b'___sys_sendmsg'
b'__sys_sendmsg'
b'__x64_sys_sendmsg'
b'do_syscall_64'
b'entry_SYSCALL_64_after_hwframe'
- ip (277)
126554955
Since nexthops are always deleted under RTNL, synchronize_net() can be
used instead. It will call synchronize_rcu_expedited() which only blocks
for several microseconds as opposed to multiple milliseconds like
synchronize_rcu().
With this patch deletion of 16k nexthops takes less than a second:
# time -p ip link set dev dummy10 down
real 0.12
user 0.00
sys 0.04
Tested with fib_nexthops.sh which includes torture tests that prompted
the initial change:
# ./fib_nexthops.sh
...
Tests passed: 134
Tests failed: 0
Fixes: 90f33bffa382 ("nexthops: don't modify published nexthop groups")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
---
net/ipv4/nexthop.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/ipv4/nexthop.c b/net/ipv4/nexthop.c
index 8c0f17c6863c..0dc43ad28eb9 100644
--- a/net/ipv4/nexthop.c
+++ b/net/ipv4/nexthop.c
@@ -845,7 +845,7 @@ static void remove_nexthop_from_groups(struct net *net, struct nexthop *nh,
remove_nh_grp_entry(net, nhge, nlinfo);
/* make sure all see the newly published array before releasing rtnl */
- synchronize_rcu();
+ synchronize_net();
}
static void remove_nexthop_group(struct nexthop *nh, struct nl_info *nlinfo)
--
2.26.2
^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [PATCH net] nexthop: Fix performance regression in nexthop deletion
2020-10-16 17:29 [PATCH net] nexthop: Fix performance regression in nexthop deletion Ido Schimmel
@ 2020-10-16 21:46 ` Jesse Brandeburg
2020-10-17 4:37 ` David Ahern
` (2 subsequent siblings)
3 siblings, 0 replies; 5+ messages in thread
From: Jesse Brandeburg @ 2020-10-16 21:46 UTC (permalink / raw)
To: Ido Schimmel
Cc: netdev, davem, kuba, dsahern, nikolay, mlxsw, Ido Schimmel, paulmck
Ido Schimmel wrote:
> From: Ido Schimmel <idosch@nvidia.com>
>
> While insertion of 16k nexthops all using the same netdev ('dummy10')
> takes less than a second, deletion takes about 130 seconds:
>
> # time -p ip -b nexthop.batch
> real 0.29
> user 0.01
> sys 0.15
>
> # time -p ip link set dev dummy10 down
> real 131.03
> user 0.06
> sys 0.52
snip...
> Since nexthops are always deleted under RTNL, synchronize_net() can be
> used instead. It will call synchronize_rcu_expedited() which only blocks
> for several microseconds as opposed to multiple milliseconds like
> synchronize_rcu().
>
> With this patch deletion of 16k nexthops takes less than a second:
>
> # time -p ip link set dev dummy10 down
> real 0.12
> user 0.00
> sys 0.04
That's a nice result! Well done! I can't really speak to whether or not
there is any horrible side effect of using synchronize_rcu_expedited(),
but FWIW:
Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
>
> Tested with fib_nexthops.sh which includes torture tests that prompted
> the initial change:
>
> # ./fib_nexthops.sh
> ...
> Tests passed: 134
> Tests failed: 0
>
> Fixes: 90f33bffa382 ("nexthops: don't modify published nexthop groups")
> Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Nice fix, good commit message, thanks!
> ---
> net/ipv4/nexthop.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/net/ipv4/nexthop.c b/net/ipv4/nexthop.c
> index 8c0f17c6863c..0dc43ad28eb9 100644
> --- a/net/ipv4/nexthop.c
> +++ b/net/ipv4/nexthop.c
> @@ -845,7 +845,7 @@ static void remove_nexthop_from_groups(struct net *net, struct nexthop *nh,
> remove_nh_grp_entry(net, nhge, nlinfo);
>
> /* make sure all see the newly published array before releasing rtnl */
> - synchronize_rcu();
> + synchronize_net();
> }
>
> static void remove_nexthop_group(struct nexthop *nh, struct nl_info *nlinfo)
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH net] nexthop: Fix performance regression in nexthop deletion
2020-10-16 17:29 [PATCH net] nexthop: Fix performance regression in nexthop deletion Ido Schimmel
2020-10-16 21:46 ` Jesse Brandeburg
@ 2020-10-17 4:37 ` David Ahern
2020-10-17 9:16 ` Nikolay Aleksandrov
2020-10-20 3:10 ` Jakub Kicinski
3 siblings, 0 replies; 5+ messages in thread
From: David Ahern @ 2020-10-17 4:37 UTC (permalink / raw)
To: Ido Schimmel, netdev; +Cc: davem, kuba, nikolay, mlxsw, Ido Schimmel
On 10/16/20 11:29 AM, Ido Schimmel wrote:
> From: Ido Schimmel <idosch@nvidia.com>
>
> While insertion of 16k nexthops all using the same netdev ('dummy10')
> takes less than a second, deletion takes about 130 seconds:
>
> # time -p ip -b nexthop.batch
> real 0.29
> user 0.01
> sys 0.15
>
> # time -p ip link set dev dummy10 down
> real 131.03
> user 0.06
> sys 0.52
>
> This is because of repeated calls to synchronize_rcu() whenever a
> nexthop is removed from a nexthop group:
>
> # /usr/share/bcc/tools/offcputime -p `pgrep -nx ip` -K
> ...
> b'finish_task_switch'
> b'schedule'
> b'schedule_timeout'
> b'wait_for_completion'
> b'__wait_rcu_gp'
> b'synchronize_rcu.part.0'
> b'synchronize_rcu'
> b'__remove_nexthop'
> b'remove_nexthop'
> b'nexthop_flush_dev'
> b'nh_netdev_event'
> b'raw_notifier_call_chain'
> b'call_netdevice_notifiers_info'
> b'__dev_notify_flags'
> b'dev_change_flags'
> b'do_setlink'
> b'__rtnl_newlink'
> b'rtnl_newlink'
> b'rtnetlink_rcv_msg'
> b'netlink_rcv_skb'
> b'rtnetlink_rcv'
> b'netlink_unicast'
> b'netlink_sendmsg'
> b'____sys_sendmsg'
> b'___sys_sendmsg'
> b'__sys_sendmsg'
> b'__x64_sys_sendmsg'
> b'do_syscall_64'
> b'entry_SYSCALL_64_after_hwframe'
> - ip (277)
> 126554955
>
> Since nexthops are always deleted under RTNL, synchronize_net() can be
> used instead. It will call synchronize_rcu_expedited() which only blocks
> for several microseconds as opposed to multiple milliseconds like
> synchronize_rcu().
>
> With this patch deletion of 16k nexthops takes less than a second:
>
> # time -p ip link set dev dummy10 down
> real 0.12
> user 0.00
> sys 0.04
>
> Tested with fib_nexthops.sh which includes torture tests that prompted
> the initial change:
>
> # ./fib_nexthops.sh
> ...
> Tests passed: 134
> Tests failed: 0
>
> Fixes: 90f33bffa382 ("nexthops: don't modify published nexthop groups")
> Signed-off-by: Ido Schimmel <idosch@nvidia.com>
> ---
> net/ipv4/nexthop.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
Thanks for finding this, Ido.
Reviewed-by: David Ahern <dsahern@gmail.com>
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH net] nexthop: Fix performance regression in nexthop deletion
2020-10-16 17:29 [PATCH net] nexthop: Fix performance regression in nexthop deletion Ido Schimmel
2020-10-16 21:46 ` Jesse Brandeburg
2020-10-17 4:37 ` David Ahern
@ 2020-10-17 9:16 ` Nikolay Aleksandrov
2020-10-20 3:10 ` Jakub Kicinski
3 siblings, 0 replies; 5+ messages in thread
From: Nikolay Aleksandrov @ 2020-10-17 9:16 UTC (permalink / raw)
To: idosch, netdev; +Cc: dsahern, mlxsw, davem, kuba, Ido Schimmel
On Fri, 2020-10-16 at 20:29 +0300, Ido Schimmel wrote:
> From: Ido Schimmel <idosch@nvidia.com>
>
> While insertion of 16k nexthops all using the same netdev ('dummy10')
> takes less than a second, deletion takes about 130 seconds:
>
> # time -p ip -b nexthop.batch
> real 0.29
> user 0.01
> sys 0.15
>
> # time -p ip link set dev dummy10 down
> real 131.03
> user 0.06
> sys 0.52
>
> This is because of repeated calls to synchronize_rcu() whenever a
> nexthop is removed from a nexthop group:
>
> # /usr/share/bcc/tools/offcputime -p `pgrep -nx ip` -K
> ...
> b'finish_task_switch'
> b'schedule'
> b'schedule_timeout'
> b'wait_for_completion'
> b'__wait_rcu_gp'
> b'synchronize_rcu.part.0'
> b'synchronize_rcu'
> b'__remove_nexthop'
> b'remove_nexthop'
> b'nexthop_flush_dev'
> b'nh_netdev_event'
> b'raw_notifier_call_chain'
> b'call_netdevice_notifiers_info'
> b'__dev_notify_flags'
> b'dev_change_flags'
> b'do_setlink'
> b'__rtnl_newlink'
> b'rtnl_newlink'
> b'rtnetlink_rcv_msg'
> b'netlink_rcv_skb'
> b'rtnetlink_rcv'
> b'netlink_unicast'
> b'netlink_sendmsg'
> b'____sys_sendmsg'
> b'___sys_sendmsg'
> b'__sys_sendmsg'
> b'__x64_sys_sendmsg'
> b'do_syscall_64'
> b'entry_SYSCALL_64_after_hwframe'
> - ip (277)
> 126554955
>
> Since nexthops are always deleted under RTNL, synchronize_net() can be
> used instead. It will call synchronize_rcu_expedited() which only blocks
> for several microseconds as opposed to multiple milliseconds like
> synchronize_rcu().
>
> With this patch deletion of 16k nexthops takes less than a second:
>
> # time -p ip link set dev dummy10 down
> real 0.12
> user 0.00
> sys 0.04
>
> Tested with fib_nexthops.sh which includes torture tests that prompted
> the initial change:
>
> # ./fib_nexthops.sh
> ...
> Tests passed: 134
> Tests failed: 0
>
> Fixes: 90f33bffa382 ("nexthops: don't modify published nexthop groups")
> Signed-off-by: Ido Schimmel <idosch@nvidia.com>
> ---
> net/ipv4/nexthop.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
>
Looks good.
Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com>
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH net] nexthop: Fix performance regression in nexthop deletion
2020-10-16 17:29 [PATCH net] nexthop: Fix performance regression in nexthop deletion Ido Schimmel
` (2 preceding siblings ...)
2020-10-17 9:16 ` Nikolay Aleksandrov
@ 2020-10-20 3:10 ` Jakub Kicinski
3 siblings, 0 replies; 5+ messages in thread
From: Jakub Kicinski @ 2020-10-20 3:10 UTC (permalink / raw)
To: Ido Schimmel; +Cc: netdev, davem, dsahern, nikolay, mlxsw, Ido Schimmel
On Fri, 16 Oct 2020 20:29:14 +0300 Ido Schimmel wrote:
> From: Ido Schimmel <idosch@nvidia.com>
>
> While insertion of 16k nexthops all using the same netdev ('dummy10')
> takes less than a second, deletion takes about 130 seconds:
Applied, thank you!
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2020-10-20 3:10 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-10-16 17:29 [PATCH net] nexthop: Fix performance regression in nexthop deletion Ido Schimmel
2020-10-16 21:46 ` Jesse Brandeburg
2020-10-17 4:37 ` David Ahern
2020-10-17 9:16 ` Nikolay Aleksandrov
2020-10-20 3:10 ` Jakub Kicinski
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).