From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-8.0 required=3.0 tests=HEADER_FROM_DIFFERENT_DOMAINS, MAILING_LIST_MULTI,MENTIONS_GIT_HOSTING,SPF_PASS,UNPARSEABLE_RELAY, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id E3B33C169C4 for ; Mon, 11 Feb 2019 08:56:57 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id BC95A2146F for ; Mon, 11 Feb 2019 08:56:57 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727273AbfBKI44 (ORCPT ); Mon, 11 Feb 2019 03:56:56 -0500 Received: from mail-il-dmz.mellanox.com ([193.47.165.129]:60036 "EHLO mellanox.co.il" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1726997AbfBKI4h (ORCPT ); Mon, 11 Feb 2019 03:56:37 -0500 Received: from Internal Mail-Server by MTLPINE1 (envelope-from vladbu@mellanox.com) with ESMTPS (AES256-SHA encrypted); 11 Feb 2019 10:56:31 +0200 Received: from reg-r-vrt-018-180.mtr.labs.mlnx. (reg-r-vrt-018-180.mtr.labs.mlnx [10.213.18.180]) by labmailer.mlnx (8.13.8/8.13.8) with ESMTP id x1B8uVBU020653; Mon, 11 Feb 2019 10:56:31 +0200 From: Vlad Buslov To: netdev@vger.kernel.org Cc: jhs@mojatatu.com, xiyou.wangcong@gmail.com, jiri@resnulli.us, davem@davemloft.net, ast@kernel.org, daniel@iogearbox.net, Vlad Buslov Subject: [PATCH net-next v4 00/17] Refactor classifier API to work with chain/classifiers without rtnl lock Date: Mon, 11 Feb 2019 10:55:31 +0200 Message-Id: <20190211085548.7190-1-vladbu@mellanox.com> X-Mailer: git-send-email 2.13.6 Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org Currently, all netlink protocol handlers for updating rules, actions and qdiscs are protected with single global rtnl lock which removes any possibility for parallelism. This patch set is a third step to remove rtnl lock dependency from TC rules update path. Recently, new rtnl registration flag RTNL_FLAG_DOIT_UNLOCKED was added. Handlers registered with this flag are called without RTNL taken. End goal is to have rule update handlers(RTM_NEWTFILTER, RTM_DELTFILTER, etc.) to be registered with UNLOCKED flag to allow parallel execution. However, there is no intention to completely remove or split rtnl lock itself. This patch set addresses specific problems in implementation of classifiers API that prevent its control path from being executed concurrently, and completes refactoring of cls API rules update handlers by removing rtnl lock dependency from code that handles chains and classifiers. Rules update handlers are registered with RTNL_FLAG_DOIT_UNLOCKED flag. This patch set substitutes global rtnl lock dependency on rules update path in cls API by extending its data structures with following locks: - tcf_block with 'lock' mutex. It is used to protect block state and life-time management fields of chains on the block (chain->refcnt, chain->action_refcnt, chain->explicitly crated, etc.). - tcf_chain with 'filter_chain_lock' mutex, that is used to protect list of classifier instances attached to chain. chain0->filter_chain_lock serializes calls to head change callbacks and allows them to rely on filter_chain_lock for serialization instead of rtnl lock. - tcf_proto with 'lock' spinlock that is intended to be used to synchronize access to classifiers that support unlocked execution. Classifiers are extended with reference counting to accommodate parallel access by unlocked cls API. Classifier ops structure is extended with additional 'put' function to allow reference counting of filters and intended to be used by classifiers that implement rtnl-unlocked API. Users of classifiers and individual filter instances are modified to always hold reference while working with them. Classifiers that support unlocked execution still need to know the status of rtnl lock, so their API is extended with additional 'rtnl_held' argument that is used to indicate that caller holds rtnl lock. Cls API propagates rtnl lock status across its helper functions and passes it to classifier. Changes from V3 to V4: - Patch 1: - Extract code that manages chain 'explicitly_created' flag into standalone patch. - Patch 2 - new. Changes from V2 to V3: - Change block->lock and chain->filter_chain_lock type to mutex. This removes the need for async miniqp refactoring and allows calling sleeping functions while holding the block->lock and chain->filter_chain_lock locks. - Previous patch 1 - async miniqp is no longer needed, remove the patch. - Patch 1: - Change block->lock type to mutex. - Implement tcf_block_destroy() helper function that destroys block->lock mutex before deallocating the block. - Revert GFP_KERNEL->GFP_ATOMIC memory allocation flags of tcf_chain which is no longer needed after block->lock type change. - Patch 6: - Change chain->filter_chain_lock type to mutex. - Assume chain0->filter_chain_lock synchronizations instead of rtnl lock in mini_qdisc_pair_swap() function that is called from head change callback of ingress Qdisc. With filter_chain_lock type changed to mutex it is now possible to call sleeping function while holding it, so it is now used instead of async implementation from previous versions of this patch set. - Patch 7: - Add local tp_next var to tcf_chain_flush() and use it to store tp->next pointer dereferenced with rcu_dereference_protected() to satisfy kbuild test robot. - Reset tp pointer to NULL at the beginning of tc_new_tfilter() to prevent its uninitialized usage in error handling code. This code was already implemented in patch 10, but must be in patch 8 to preserve code bisectability. - Put parent chain in tcf_proto_destroy(). In previous version this code was implemented in patch 1 which was removed in V3. Changes from V1 to V2: - Patch 1: - Use smp_store_release() instead of xchg() for setting miniqp->tp_head. - Move Qdisc deallocation to tc_proto_wq ordered workqueue that is used to destroy tcf proto instances. This is necessary to ensure that Qdisc is destroyed after all instances of chain/proto that it contains in order to prevent use-after-free error in tc_chain_notify_delete(). - Cache parent net device ifindex in block to prevent use-after-free of dev queue in tc_chain_notify_delete(). - Patch 2: - Use lockdep_assert_held() instead of spin_is_locked() for assertion. - Use 'free_block' argument in tcf_chain_destroy() instead of checking block's reference count and chain_list for second time. - Patch 7: - Refactor tcf_chain0_head_change_cb_add() to not take block->lock and chain0->filter_chain_lock in correct order. - Patch 10: - Always set 'tp_created' flag when creating tp to prevent releasing the chain twice when tp with same priority was inserted concurrently. - Patch 11: - Add additional check to prevent creation of new proto instance when parent chain is being flushed to reduce CPU usage. - Don't call tcf_chain_delete_empty() if tp insertion failed. - Patch 16 - new. - Patch 17: - Refactor to only lock take rtnl lock once (at the beginning of rule update handlers). - Always release rtnl mutex in the same function that locked it. Remove unlock code from tcf_block_release(). Github: [https://github.com/vbuslov/linux/tree/cls_api_rtnl_lock_remove_for_cong_v4] Vlad Buslov (17): net: sched: protect block state with mutex net: sched: protect chain->explicitly_created with block->lock net: sched: refactor tc_ctl_chain() to use block->lock net: sched: protect block->chain0 with block->lock net: sched: traverse chains in block with tcf_get_next_chain() net: sched: protect chain template accesses with block lock net: sched: protect filter_chain list with filter_chain_lock mutex net: sched: introduce reference counting for tcf_proto net: sched: traverse classifiers in chain with tcf_get_next_proto() net: sched: refactor tp insert/delete for concurrent execution net: sched: prevent insertion of new classifiers during chain flush net: sched: track rtnl lock status when validating extensions net: sched: extend proto ops with 'put' callback net: sched: extend proto ops to support unlocked classifiers net: sched: add flags to Qdisc class ops struct net: sched: refactor tcf_block_find() into standalone functions net: sched: unlock rules update API include/net/pkt_cls.h | 6 +- include/net/sch_generic.h | 68 ++- net/sched/cls_api.c | 1214 ++++++++++++++++++++++++++++++++++----------- net/sched/cls_basic.c | 14 +- net/sched/cls_bpf.c | 15 +- net/sched/cls_cgroup.c | 13 +- net/sched/cls_flow.c | 15 +- net/sched/cls_flower.c | 16 +- net/sched/cls_fw.c | 15 +- net/sched/cls_matchall.c | 16 +- net/sched/cls_route.c | 14 +- net/sched/cls_rsvp.h | 16 +- net/sched/cls_tcindex.c | 17 +- net/sched/cls_u32.c | 14 +- net/sched/sch_api.c | 10 +- net/sched/sch_generic.c | 6 +- 16 files changed, 1103 insertions(+), 366 deletions(-) -- 2.13.6