Netdev Archive on lore.kernel.org
From: Vlad Buslov <vladbu@mellanox.com>
To: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Ido Schimmel <idosch@idosch.org>,
	"netdev@vger.kernel.org" <netdev@vger.kernel.org>,
	"jhs@mojatatu.com" <jhs@mojatatu.com>,
	"jiri@resnulli.us" <jiri@resnulli.us>,
	"davem@davemloft.net" <davem@davemloft.net>,
	"ast@kernel.org" <ast@kernel.org>,
	"daniel@iogearbox.net" <daniel@iogearbox.net>
Subject: Re: [PATCH net-next v4 07/17] net: sched: protect filter_chain list with filter_chain_lock mutex
Date: Tue, 19 Feb 2019 15:20:37 +0000
Message-ID: <vbfzhqrq01r.fsf@mellanox.com> (raw)
In-Reply-To: <CAM_iQpU=sFqoSZUZaeRM7d4ZFMqiEaY2mn1G_gB5KvkzLdcQgw@mail.gmail.com>


On Tue 19 Feb 2019 at 05:08, Cong Wang <xiyou.wangcong@gmail.com> wrote:
> On Fri, Feb 15, 2019 at 2:02 AM Vlad Buslov <vladbu@mellanox.com> wrote:
>>
>> I looked at the code and problem seems to be matchall classifier
>> specific. My implementation of unlocked cls API assumes that concurrent
>> insertions are possible and checks for it when deleting "empty" tp.
>> Since classifiers don't expose number of elements, the only way to test
>> this is to do tp->walk() on them and assume that walk callback is called
>> once per filter on every classifier. In your example new tp is created
>> for second filter, filter insertion fails, number of elements on newly
>> created tp is checked with tp->walk() before deleting it. However,
>> matchall classifier always calls the tp->walk() callback once, even when
>> it doesn't have a valid filter (in this case with NULL filter pointer).
>
> Again, this can be eliminated by just switching to normal
> non-retry logic. This is yet another headache to review this
> kind of unlock-and-retry logic, I have no idea why you are such
> a big fan of it.

The retry approach was suggested to me multiple times by Jiri on
previous code reviews, so I assumed it is the preferred approach in
such cases. I don't have a strong preference in this regard, but
locking the whole tp on filter update would remove any parallelism when
updating the same classifier instance concurrently. The goal of these
changes is to allow parallel rule updates, and to achieve that I had to
introduce some complexity into the code.

Now let me explain why these two approaches result in completely
different performance in this case. Let's start with a list of the most
CPU-consuming parts of the new-filter creation process, in descending
order (raw data at the end of this mail):

1) Hardware offload - if available and skip_hw is not set.
2) Exts (actions) initialization - the most expensive part even with a
single action; CPU usage increases with the number of actions per filter.
3) cls API.
4) Flower classifier data structure initialization.

Note that 1)+2) is ~80% of the cost of creating a flower filter. So if
we just lock the whole flower classifier instance during rule update,
we serialize 1, 2 and 4, and only the cls API (~13% of CPU cost) can be
executed concurrently. However, in the proposed flower implementation,
hw offloading and action initialization code is called without any
locks, and tp->lock is only obtained when modifying flower data
structures, which means that only 3) is serialized and everything else
(~87% of CPU cost) can be executed in parallel.

First page of profiling data:

Samples: 100K of event 'cycles:ppp', Event count (approx.): 11191878316
  Children      Self  Command  Shared Object       Symbol
+   84.71%     0.08%  tc       [kernel.vmlinux]    [k] entry_SYSCALL_64_after_hwframe
+   84.62%     0.06%  tc       [kernel.vmlinux]    [k] do_syscall_64
+   82.63%     0.01%  tc       libc-2.25.so        [.] __libc_sendmsg
+   82.37%     0.00%  tc       [kernel.vmlinux]    [k] __sys_sendmsg
+   82.37%     0.00%  tc       [kernel.vmlinux]    [k] ___sys_sendmsg
+   82.34%     0.00%  tc       [kernel.vmlinux]    [k] sock_sendmsg
+   82.34%     0.01%  tc       [kernel.vmlinux]    [k] netlink_sendmsg
+   82.15%     0.15%  tc       [kernel.vmlinux]    [k] netlink_unicast
+   82.10%     0.11%  tc       [kernel.vmlinux]    [k] netlink_rcv_skb
+   80.76%     0.22%  tc       [kernel.vmlinux]    [k] rtnetlink_rcv_msg
+   80.10%     0.24%  tc       [kernel.vmlinux]    [k] tc_new_tfilter
+   69.30%     2.11%  tc       [cls_flower]        [k] fl_change
+   33.56%     0.05%  tc       [kernel.vmlinux]    [k] tcf_exts_validate
+   33.50%     0.12%  tc       [kernel.vmlinux]    [k] tcf_action_init
+   33.30%     0.10%  tc       [kernel.vmlinux]    [k] tcf_action_init_1
+   32.78%     0.11%  tc       [act_gact]          [k] tcf_gact_init
+   30.93%     0.16%  tc       [kernel.vmlinux]    [k] tc_setup_cb_call
+   29.96%     0.60%  tc       [mlx5_core]         [k] mlx5e_configure_flower
+   27.62%     0.23%  tc       [mlx5_core]         [k] mlx5e_tc_add_nic_flow
+   27.31%     0.45%  tc       [kernel.vmlinux]    [k] tcf_idr_create
+   25.45%     1.75%  tc       [kernel.vmlinux]    [k] pcpu_alloc
+   16.33%     0.07%  tc       [mlx5_core]         [k] mlx5_cmd_exec
+   16.26%     1.96%  tc       [mlx5_core]         [k] cmd_exec
+   14.28%     1.05%  tc       [mlx5_core]         [k] mlx5_add_flow_rules
+   14.02%     0.26%  tc       [kernel.vmlinux]    [k] pcpu_alloc_area
+   13.09%     0.13%  tc       [mlx5_core]         [k] mlx5_fc_create
+    9.77%     0.30%  tc       [mlx5_core]         [k] add_rule_fg.isra.28
+    9.08%     0.84%  tc       [mlx5_core]         [k] mlx5_cmd_set_fte
+    8.90%     0.09%  tc       [mlx5_core]         [k] mlx5_cmd_fc_alloc
+    7.90%     0.12%  tc       [kernel.vmlinux]    [k] tfilter_notify
+    7.34%     0.61%  tc       [kernel.vmlinux]    [k] __queue_work
+    7.25%     0.26%  tc       [kernel.vmlinux]    [k] tcf_fill_node
+    6.73%     0.23%  tc       [kernel.vmlinux]    [k] wait_for_completion_timeout
+    6.67%     0.20%  tc       [cls_flower]        [k] fl_dump
+    6.52%     5.93%  tc       [kernel.vmlinux]    [k] memset_erms
+    5.77%     0.49%  tc       [kernel.vmlinux]    [k] schedule_timeout
+    5.57%     1.29%  tc       [kernel.vmlinux]    [k] try_to_wake_up
+    5.50%     0.11%  tc       [kernel.vmlinux]    [k] pcpu_block_update_hint_alloc
+    5.40%     0.85%  tc       [kernel.vmlinux]    [k] pcpu_block_refresh_hint
+    5.28%     0.11%  tc       [kernel.vmlinux]    [k] queue_work_on
+    5.19%     4.96%  tc       [kernel.vmlinux]    [k] find_next_bit
+    4.77%     0.11%  tc       [kernel.vmlinux]    [k] idr_alloc_u32
+    4.71%     0.10%  tc       [kernel.vmlinux]    [k] schedule
+    4.62%     0.30%  tc       [kernel.vmlinux]    [k] __sched_text_start
+    4.48%     4.41%  tc       [kernel.vmlinux]    [k] idr_get_free
+    4.19%     0.04%  tc       [kernel.vmlinux]    [k] tcf_idr_check_alloc
