Netdev Archive on lore.kernel.org
 help / color / Atom feed
From: Vlad Buslov <vladbu@mellanox.com>
To: Ido Schimmel <idosch@idosch.org>
Cc: "netdev@vger.kernel.org" <netdev@vger.kernel.org>,
	"jhs@mojatatu.com" <jhs@mojatatu.com>,
	"xiyou.wangcong@gmail.com" <xiyou.wangcong@gmail.com>,
	"jiri@resnulli.us" <jiri@resnulli.us>,
	"davem@davemloft.net" <davem@davemloft.net>,
	"ast@kernel.org" <ast@kernel.org>,
	"daniel@iogearbox.net" <daniel@iogearbox.net>
Subject: Re: [PATCH net-next v4 07/17] net: sched: protect filter_chain list with filter_chain_lock mutex
Date: Fri, 15 Feb 2019 15:35:10 +0000
Message-ID: <vbfbm3djcd4.fsf@mellanox.com> (raw)
In-Reply-To: <20190215113041.GA10511@splinter>


On Fri 15 Feb 2019 at 11:30, Ido Schimmel <idosch@idosch.org> wrote:
> On Fri, Feb 15, 2019 at 10:02:11AM +0000, Vlad Buslov wrote:
>>
>> On Thu 14 Feb 2019 at 18:24, Ido Schimmel <idosch@idosch.org> wrote:
>> > On Mon, Feb 11, 2019 at 10:55:38AM +0200, Vlad Buslov wrote:
>> >> Extend tcf_chain with new filter_chain_lock mutex. Always lock the chain
>> >> when accessing filter_chain list, instead of relying on rtnl lock.
>> >> Dereference filter_chain with tcf_chain_dereference() lockdep macro to
>> >> verify that all users of chain_list have the lock taken.
>> >>
>> >> Rearrange tp insert/remove code in tc_new_tfilter/tc_del_tfilter to execute
>> >> all necessary code while holding chain lock in order to prevent
>> >> invalidation of chain_info structure by potential concurrent change. This
>> >> also serializes calls to tcf_chain0_head_change(), which allows head change
>> >> callbacks to rely on filter_chain_lock for synchronization instead of rtnl
>> >> mutex.
>> >>
>> >> Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
>> >> Acked-by: Jiri Pirko <jiri@mellanox.com>
>> >
>> > With this sequence [1] I get the following trace [2]. Bisected it to
>> > this patch. Note that second filter is rejected by the device driver
>> > (that's the intention). I guess it is not properly removed from the
>> > filter chain in the error path?
>> >
>> > Thanks
>> >
>> > [1]
>> > # tc qdisc add dev swp3 clsact
>> > # tc filter add dev swp3 ingress pref 1 matchall skip_sw \
>> > 	action mirred egress mirror dev swp7
>> > # tc filter add dev swp3 ingress pref 2 matchall skip_sw \
>> > 	action mirred egress mirror dev swp7
>> > RTNETLINK answers: File exists
>> > We have an error talking to the kernel, -1
>> > # tc qdisc del dev swp3 clsact
>> >
>> > [2]
>> > [   70.545131] kasan: GPF could be caused by NULL-ptr deref or user memory access
>> > [   70.553394] general protection fault: 0000 [#1] PREEMPT SMP KASAN PTI
>> > [   70.560618] CPU: 2 PID: 2268 Comm: tc Not tainted 5.0.0-rc5-custom-02103-g415d39427317 #1304
>> > [   70.570065] Hardware name: Mellanox Technologies Ltd. MSN2100-CB2FO/SA001017, BIOS 5.6.5 06/07/2016
>> > [   70.580204] RIP: 0010:mall_reoffload+0x14a/0x760
>> > [   70.585382] Code: c1 0f 85 ba 05 00 00 31 c0 4d 8d 6c 24 34 b9 06 00 00 00 4c 89 ff f3 48 ab 4c 89 ea 48 b8 00 00 00 00 00 fc ff df 48 c1 ea 03 <0f> b6 14 02 4c 89 e8 83
>> > e0 07 83 c0 03 38 d0 7c 08 84 d2 0f 85 bd
>> > [   70.606382] RSP: 0018:ffff888231faefc0 EFLAGS: 00010207
>> > [   70.612235] RAX: dffffc0000000000 RBX: 1ffff110463f5dfe RCX: 0000000000000000
>> > [   70.620220] RDX: 0000000000000006 RSI: 1ffff110463f5e01 RDI: ffff888231faf040
>> > [   70.628206] RBP: ffff8881ef151a00 R08: 0000000000000000 R09: ffffed10463f5dfa
>> > [   70.636192] R10: ffffed10463f5dfa R11: 0000000000000003 R12: 0000000000000000
>> > [   70.644177] R13: 0000000000000034 R14: 0000000000000000 R15: ffff888231faf010
>> > [   70.652163] FS:  00007f46b5bf0800(0000) GS:ffff888236c00000(0000) knlGS:0000000000000000
>> > [   70.661218] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> > [   70.667649] CR2: 0000000001d590a8 CR3: 0000000231c3c000 CR4: 00000000001006e0
>> > [   70.675633] Call Trace:
>> > [   70.693046]  tcf_block_playback_offloads+0x94/0x230
>> > [   70.710617]  __tcf_block_cb_unregister+0xf7/0x2d0
>> > [   70.734143]  mlxsw_sp_setup_tc+0x20f/0x660
>> > [   70.738739]  tcf_block_offload_unbind+0x22b/0x350
>> > [   70.748898]  __tcf_block_put+0x203/0x630
>> > [   70.769700]  tcf_block_put_ext+0x2f/0x40
>> > [   70.774098]  clsact_destroy+0x7a/0xb0
>> > [   70.782604]  qdisc_destroy+0x11a/0x5c0
>> > [   70.786810]  qdisc_put+0x5a/0x70
>> > [   70.790435]  notify_and_destroy+0x8e/0xa0
>> > [   70.794933]  qdisc_graft+0xbb7/0xef0
>> > [   70.809009]  tc_get_qdisc+0x518/0xa30
>> > [   70.821530]  rtnetlink_rcv_msg+0x397/0xa20
>> > [   70.835510]  netlink_rcv_skb+0x132/0x380
>> > [   70.848419]  netlink_unicast+0x4c0/0x690
>> > [   70.866019]  netlink_sendmsg+0x929/0xe10
>> > [   70.882134]  sock_sendmsg+0xc8/0x110
>> > [   70.886144]  ___sys_sendmsg+0x77a/0x8f0
>> > [   70.924064]  __sys_sendmsg+0xf7/0x250
>> > [   70.955727]  do_syscall_64+0x14d/0x610
>>
>> Hi Ido,
>>
>> Thanks for reporting this.
>>
>> I looked at the code and problem seems to be matchall classifier
>> specific. My implementation of unlocked cls API assumes that concurrent
>> insertions are possible and checks for it when deleting "empty" tp.
>> Since classifiers don't expose number of elements, the only way to test
>> this is to do tp->walk() on them and assume that walk callback is called
>> once per filter on every classifier. In your example new tp is created
>> for second filter, filter insertion fails, number of elements on newly
>> created tp is checked with tp->walk() before deleting it. However,
>> matchall classifier always calls the tp->walk() callback once, even when
>> it doesn't have a valid filter (in this case with NULL filter pointer).
>>
>> Possible ways to fix this:
>>
>> 1) Check for NULL filter pointer in tp->walk() callback and ignore it
>> when counting filters on tp - will work but I don't like it because I
>> don't think it is ever correct to call tp->walk() callback with NULL
>> filter pointer.
>>
>> 2) Fix matchall to not call tp->walk() callback with NULL filter
>> pointers - my preferred simple solution.
>>
>> 3) Extend tp API to have direct way to count filters by implementing
>> tp->count - requires change to every classifier. Or maybe add it as
>> optional API that only unlocked classifiers are required to implement
>> and just delete rtnl lock dependent tp's without checking for concurrent
>> insertions.
>>
>> What do you think?
>
> Since the problem is matchall-specific, then it makes sense to fix it in
> matchall like you suggested in option #2.
>
> Can you please use this opportunity and audit other classifiers to
> confirm problem is indeed specific to matchall?
>

Turns out cls_cgroup has the same problem.

Another problem that I found in cls_fw and cls_route is that they set
arg->stop when empty. Both of them have code unchanged since it was
committed initially in 2005 so I assume this convention is no longer
relevant because all other classifiers don't do that (they only set
arg->stop when arg->fn returns negative value).

I sent fixes for all of those cases.

  parent reply index

Thread overview: 52+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-02-11  8:55 [PATCH net-next v4 00/17] Refactor classifier API to work with chain/classifiers without rtnl lock Vlad Buslov
2019-02-11  8:55 ` [PATCH net-next v4 01/17] net: sched: protect block state with mutex Vlad Buslov
2019-02-11 14:15   ` Jiri Pirko
2019-02-11  8:55 ` [PATCH net-next v4 02/17] net: sched: protect chain->explicitly_created with block->lock Vlad Buslov
2019-02-11 14:15   ` Jiri Pirko
2019-02-11  8:55 ` [PATCH net-next v4 03/17] net: sched: refactor tc_ctl_chain() to use block->lock Vlad Buslov
2019-02-11  8:55 ` [PATCH net-next v4 04/17] net: sched: protect block->chain0 with block->lock Vlad Buslov
2019-02-11  8:55 ` [PATCH net-next v4 05/17] net: sched: traverse chains in block with tcf_get_next_chain() Vlad Buslov
2019-02-15 22:21   ` Cong Wang
2019-02-18 10:07     ` Vlad Buslov
2019-02-18 18:26       ` Cong Wang
2019-02-19 16:04         ` Vlad Buslov
2019-02-11  8:55 ` [PATCH net-next v4 06/17] net: sched: protect chain template accesses with block lock Vlad Buslov
2019-02-11  8:55 ` [PATCH net-next v4 07/17] net: sched: protect filter_chain list with filter_chain_lock mutex Vlad Buslov
2019-02-14 18:24   ` Ido Schimmel
2019-02-15 10:02     ` Vlad Buslov
2019-02-15 11:30       ` Ido Schimmel
2019-02-15 12:11         ` [PATCH] net: sched: matchall: verify that filter is not NULL in mall_walk() Vlad Buslov
2019-02-15 13:47           ` Ido Schimmel
2019-02-16  0:24           ` Cong Wang
2019-02-18 12:00             ` Vlad Buslov
2019-02-17 21:27           ` David Miller
2019-02-15 12:15         ` [PATCH net-next v4 07/17] net: sched: protect filter_chain list with filter_chain_lock mutex Vlad Buslov
2019-02-15 15:35         ` Vlad Buslov [this message]
2019-02-19  5:26           ` Cong Wang
2019-02-19 12:31             ` Vlad Buslov
2019-02-20 22:43               ` Cong Wang
2019-02-21 15:49                 ` Vlad Buslov
2019-02-19  5:08       ` Cong Wang
2019-02-19 15:20         ` Vlad Buslov
2019-02-20 23:00           ` Cong Wang
2019-02-21 17:11             ` Vlad Buslov
2019-02-15 22:35   ` Cong Wang
2019-02-18 11:06     ` Vlad Buslov
2019-02-18 18:31       ` Cong Wang
2019-02-11  8:55 ` [PATCH net-next v4 08/17] net: sched: introduce reference counting for tcf_proto Vlad Buslov
2019-02-11  8:55 ` [PATCH net-next v4 09/17] net: sched: traverse classifiers in chain with tcf_get_next_proto() Vlad Buslov
2019-02-11  8:55 ` [PATCH net-next v4 10/17] net: sched: refactor tp insert/delete for concurrent execution Vlad Buslov
2019-02-15 23:17   ` Cong Wang
2019-02-18 11:19     ` Vlad Buslov
2019-02-18 19:55       ` Cong Wang
2019-02-19 10:25         ` Vlad Buslov
2019-02-18 19:53   ` Cong Wang
2019-02-11  8:55 ` [PATCH net-next v4 11/17] net: sched: prevent insertion of new classifiers during chain flush Vlad Buslov
2019-02-11  8:55 ` [PATCH net-next v4 12/17] net: sched: track rtnl lock status when validating extensions Vlad Buslov
2019-02-11  8:55 ` [PATCH net-next v4 13/17] net: sched: extend proto ops with 'put' callback Vlad Buslov
2019-02-11  8:55 ` [PATCH net-next v4 14/17] net: sched: extend proto ops to support unlocked classifiers Vlad Buslov
2019-02-11  8:55 ` [PATCH net-next v4 15/17] net: sched: add flags to Qdisc class ops struct Vlad Buslov
2019-02-11  8:55 ` [PATCH net-next v4 16/17] net: sched: refactor tcf_block_find() into standalone functions Vlad Buslov
2019-02-11  8:55 ` [PATCH net-next v4 17/17] net: sched: unlock rules update API Vlad Buslov
2019-02-18 18:56   ` Cong Wang
2019-02-12 18:42 ` [PATCH net-next v4 00/17] Refactor classifier API to work with chain/classifiers without rtnl lock David Miller

Reply instructions:

You may reply publically to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=vbfbm3djcd4.fsf@mellanox.com \
    --to=vladbu@mellanox.com \
    --cc=ast@kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=davem@davemloft.net \
    --cc=idosch@idosch.org \
    --cc=jhs@mojatatu.com \
    --cc=jiri@resnulli.us \
    --cc=netdev@vger.kernel.org \
    --cc=xiyou.wangcong@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Netdev Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/netdev/0 netdev/git/0.git
	git clone --mirror https://lore.kernel.org/netdev/1 netdev/git/1.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 netdev netdev/ https://lore.kernel.org/netdev \
		netdev@vger.kernel.org netdev@archiver.kernel.org
	public-inbox-index netdev


Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.netdev


AGPL code for this site: git clone https://public-inbox.org/ public-inbox