* [PATCH net-next v2 00/17] Refactor classifier API to work with chain/classifiers without rtnl lock
@ 2018-12-11 10:11 Vlad Buslov
  2018-12-11 10:11 ` [PATCH net-next v2 01/17] net: sched: refactor mini_qdisc_pair_swap() to use workqueue Vlad Buslov
                   ` (16 more replies)
  0 siblings, 17 replies; 30+ messages in thread
From: Vlad Buslov @ 2018-12-11 10:11 UTC (permalink / raw)
  To: netdev; +Cc: jhs, xiyou.wangcong, jiri, davem, ast, daniel, Vlad Buslov

Currently, all netlink protocol handlers for updating rules, actions and
qdiscs are protected with a single global rtnl lock, which removes any
possibility for parallelism. This patch set is a third step toward removing
the rtnl lock dependency from the TC rules update path.

Recently, a new rtnl registration flag, RTNL_FLAG_DOIT_UNLOCKED, was added.
Handlers registered with this flag are called without RTNL taken. The end
goal is to have the rule update handlers (RTM_NEWTFILTER, RTM_DELTFILTER,
etc.) registered with this flag to allow parallel execution. However, there
is no intention to completely remove or split the rtnl lock itself. This
patch set addresses specific problems in the implementation of the
classifiers API that prevent its control path from being executed
concurrently, and completes the refactoring of cls API rules update
handlers by removing the rtnl lock dependency from the code that handles
chains and classifiers. The rules update handlers are registered with the
RTNL_FLAG_DOIT_UNLOCKED flag.

This patch set replaces the global rtnl lock dependency on the rules
update path in cls API by extending its data structures with the following
locks:
- tcf_block with a 'lock' spinlock. It is used to protect block state and
  the lifetime-management fields of chains on the block (chain->refcnt,
  chain->action_refcnt, chain->explicitly_created, etc.).
- tcf_chain with a 'filter_chain_lock' spinlock, which is used to protect
  the list of classifier instances attached to the chain.
- tcf_proto with a 'lock' spinlock that is intended to be used to
  synchronize access to classifiers that support unlocked execution.
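
A condensed view of the extended structures, based on the hunks in patches
2 and 7 (fields abridged; the tcf_proto 'lock' is added later in the
series and is not shown):

	struct tcf_block {
		/* Lock protects tcf_block and lifetime-management data of
		 * chains attached to the block (refcnt, action_refcnt,
		 * explicitly_created).
		 */
		spinlock_t lock;
		struct list_head chain_list;
		/* ... */
	};

	struct tcf_chain {
		/* Protects filter_chain and filter_chain_list. */
		spinlock_t filter_chain_lock;
		struct tcf_proto __rcu *filter_chain;
		/* ... */
	};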

Chain0 head change callbacks can sleep and therefore cannot be protected
by the block spinlock. To solve this issue, the sleeping miniqp swap
function (used as a head change callback by the ingress and clsact Qdiscs)
is refactored to offload its sleeping operations to a workqueue. A new
ordered workqueue, 'tc_proto_workqueue', is created in cls_api to be used
by miniqp and for tcf_proto deallocation, which is also moved to the
workqueue to prevent deallocation of tp's that are still in use by the
block. Performing both the miniqp swap and tp deallocation on the same
ordered workqueue ensures that any pending head change requests involving
a tp complete before that tp is deallocated.
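
The guarantee follows from the FIFO, one-work-at-a-time semantics of an
ordered workqueue; a minimal sketch of the resulting sequence (helper
names from patch 1):

	/* Both items are queued on the same ordered workqueue, so the
	 * head change swap queued first completes before the tp destroy
	 * work starts.
	 */
	tc_queue_proto_work(&miniqp->work);	/* miniqp swap involving tp */
	tc_queue_proto_work(&tp->work);		/* tcf_proto_destroy_work() */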

Classifiers are extended with reference counting to accommodate parallel
access by the unlocked cls API. The classifier ops structure is extended
with an additional 'put' function to allow reference counting of filters;
it is intended to be implemented by classifiers that support the
rtnl-unlocked API. Users of classifiers and of individual filter instances
are modified to always hold a reference while working with them.
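
A hypothetical usage sketch (the exact ops signatures are introduced in
patches 13-14, which are not quoted in this excerpt, so the argument lists
below are assumptions):

	/* take a reference to an individual filter instance */
	fh = tp->ops->get(tp, handle);
	if (fh) {
		/* ... work with the filter ... */
		tp->ops->put(tp, fh);	/* release the reference */
	}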

Classifiers that support unlocked execution still need to know the status
of the rtnl lock, so their API is extended with an additional 'rtnl_held'
argument that indicates whether the caller holds the rtnl lock.
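
For example, a classifier callback extended this way might look like the
following sketch (the argument list is illustrative, not the exact
signature from the series):

	static int example_delete(struct tcf_proto *tp, void *arg,
				  bool *last, bool rtnl_held,
				  struct netlink_ext_ack *extack)
	{
		if (rtnl_held)
			ASSERT_RTNL();	/* caller holds rtnl lock */
		/* ... otherwise rely on fine-grained locking ... */
		return 0;
	}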

Changes from V1 to V2:
- Patch 1:
  - Use smp_store_release() instead of xchg() for setting
    miniqp->tp_head.
  - Move Qdisc deallocation to the tc_proto_wq ordered workqueue that is
    used to destroy tcf proto instances. This is necessary to ensure
    that the Qdisc is destroyed after all instances of chain/proto that
    it contains, in order to prevent a use-after-free in
    tc_chain_notify_delete().
  - Cache the parent net device ifindex in the Qdisc to prevent
    use-after-free of the dev queue in tc_chain_notify_delete().
- Patch 2:
  - Use lockdep_assert_held() instead of spin_is_locked() for assertion.
  - Use 'free_block' argument in tcf_chain_destroy() instead of checking
    block's reference count and chain_list for second time.
- Patch 7:
  - Refactor tcf_chain0_head_change_cb_add() to take block->lock and
    chain0->filter_chain_lock in the correct order.
- Patch 10:
  - Always set the 'tp_created' flag when creating a tp to prevent
    releasing the chain twice when a tp with the same priority was
    inserted concurrently.
- Patch 11:
  - Add an additional check to prevent creation of a new proto instance
    when the parent chain is being flushed, to reduce CPU usage.
  - Don't call tcf_chain_delete_empty() if tp insertion failed.
- Patch 16 - new.
- Patch 17:
  - Refactor to only take the rtnl lock once (at the beginning of the
    rule update handlers).
  - Always release the rtnl mutex in the same function that locked it.
    Remove the unlock code from tcf_block_release().

Vlad Buslov (17):
  net: sched: refactor mini_qdisc_pair_swap() to use workqueue
  net: sched: protect block state with spinlock
  net: sched: refactor tc_ctl_chain() to use block->lock
  net: sched: protect block->chain0 with block->lock
  net: sched: traverse chains in block with tcf_get_next_chain()
  net: sched: protect chain template accesses with block lock
  net: sched: lock the chain when accessing filter_chain list
  net: sched: introduce reference counting for tcf proto
  net: sched: traverse classifiers in chain with tcf_get_next_proto()
  net: sched: refactor tp insert/delete for concurrent execution
  net: sched: prevent insertion of new classifiers during chain flush
  net: sched: track rtnl lock status when validating extensions
  net: sched: extend proto ops with 'put' callback
  net: sched: extend proto ops to support unlocked classifiers
  net: sched: add flags to Qdisc class ops struct
  net: sched: refactor tcf_block_find() into standalone functions
  net: sched: unlock rules update API

 include/net/pkt_cls.h     |    6 +-
 include/net/sch_generic.h |   81 ++-
 net/sched/cls_api.c       | 1213 ++++++++++++++++++++++++++++++++++-----------
 net/sched/cls_basic.c     |   14 +-
 net/sched/cls_bpf.c       |   15 +-
 net/sched/cls_cgroup.c    |   13 +-
 net/sched/cls_flow.c      |   15 +-
 net/sched/cls_flower.c    |   16 +-
 net/sched/cls_fw.c        |   15 +-
 net/sched/cls_matchall.c  |   16 +-
 net/sched/cls_route.c     |   14 +-
 net/sched/cls_rsvp.h      |   16 +-
 net/sched/cls_tcindex.c   |   17 +-
 net/sched/cls_u32.c       |   14 +-
 net/sched/sch_api.c       |   10 +-
 net/sched/sch_generic.c   |   56 ++-
 net/sched/sch_ingress.c   |    4 +
 17 files changed, 1175 insertions(+), 360 deletions(-)

-- 
2.7.5


* [PATCH net-next v2 01/17] net: sched: refactor mini_qdisc_pair_swap() to use workqueue
  2018-12-11 10:11 [PATCH net-next v2 00/17] Refactor classifier API to work with chain/classifiers without rtnl lock Vlad Buslov
@ 2018-12-11 10:11 ` Vlad Buslov
  2018-12-12  3:23   ` kbuild test robot
                     ` (2 more replies)
  2018-12-11 10:11 ` [PATCH net-next v2 02/17] net: sched: protect block state with spinlock Vlad Buslov
                   ` (15 subsequent siblings)
  16 siblings, 3 replies; 30+ messages in thread
From: Vlad Buslov @ 2018-12-11 10:11 UTC (permalink / raw)
  To: netdev; +Cc: jhs, xiyou.wangcong, jiri, davem, ast, daniel, Vlad Buslov

As a part of the effort to remove the dependency on the rtnl lock, cls API
is being converted to use fine-grained locking mechanisms instead of the
global rtnl lock. However, the chain_head_change callback for the ingress
Qdisc is a sleeping function and cannot be executed while holding a
spinlock.

Extend cls API with a new workqueue intended to be used for tcf_proto
lifetime management. Modify tcf_proto_destroy() to deallocate the proto
asynchronously on the workqueue in order to ensure that all
chain_head_change callbacks involving the proto complete before it is
freed. Convert mini_qdisc_pair_swap(), which is used as a
chain_head_change callback by the ingress and clsact Qdiscs, to use the
workqueue. Move Qdisc deallocation to the same tc_proto_wq ordered
workqueue that is used to destroy tcf proto instances. This is necessary
to ensure that the Qdisc is destroyed after all instances of chain/proto
that it contains, in order to prevent a use-after-free in
tc_chain_notify_delete().
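
The release/consume pairing between the swap request and the work function
can be summarized as follows (condensed from the hunks below):

	/* mini_qdisc_pair_swap(): publish the new head */
	smp_store_release(&miniqp->tp_head, tp_head);
	tc_queue_proto_work(&miniqp->work);

	/* mini_qdisc_pair_swap_work(): consume it and mark the slot empty */
	tp_head = xchg(&miniqp->tp_head, ERR_PTR(-ENOENT));
	if (!IS_ERR(tp_head))
		__mini_qdisc_pair_swap(miniqp, tp_head);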

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Suggested-by: Jiri Pirko <jiri@mellanox.com>
---

Changes from V1 to V2:
  - Use smp_store_release() instead of xchg() for setting
    miniqp->tp_head.
  - Move Qdisc deallocation to the tc_proto_wq ordered workqueue that is
    used to destroy tcf proto instances. This is necessary to ensure
    that the Qdisc is destroyed after all instances of chain/proto that
    it contains, in order to prevent a use-after-free in
    tc_chain_notify_delete().
  - Cache the parent net device ifindex in the Qdisc to prevent
    use-after-free of the dev queue in tc_chain_notify_delete().

 include/net/sch_generic.h | 14 +++++++++++-
 net/sched/cls_api.c       | 52 ++++++++++++++++++++++++++++++++++++-------
 net/sched/sch_generic.c   | 56 +++++++++++++++++++++++++++++++++++++++++++----
 net/sched/sch_ingress.c   |  4 ++++
 4 files changed, 113 insertions(+), 13 deletions(-)

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 9481f2c142e2..80fc06468c79 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -109,7 +109,13 @@ struct Qdisc {
 
 	spinlock_t		busylock ____cacheline_aligned_in_smp;
 	spinlock_t		seqlock;
-	struct rcu_head		rcu;
+	int			ifindex; /* cached interface index of parent net
+					  * device
+					  */
+	union {
+		struct rcu_head		rcu;
+		struct work_struct	work;
+	};
 };
 
 static inline void qdisc_refcount_inc(struct Qdisc *qdisc)
@@ -322,6 +328,7 @@ struct tcf_proto {
 	const struct tcf_proto_ops	*ops;
 	struct tcf_chain	*chain;
 	struct rcu_head		rcu;
+	struct work_struct	work;
 };
 
 struct qdisc_skb_cb {
@@ -400,6 +407,8 @@ tc_cls_offload_cnt_update(struct tcf_block *block, u32 *cnt,
 	}
 }
 
+bool tc_queue_proto_work(struct work_struct *work);
+
 static inline void qdisc_cb_private_validate(const struct sk_buff *skb, int sz)
 {
 	struct qdisc_skb_cb *qcb;
@@ -1175,12 +1184,15 @@ struct mini_Qdisc_pair {
 	struct mini_Qdisc miniq1;
 	struct mini_Qdisc miniq2;
 	struct mini_Qdisc __rcu **p_miniq;
+	struct tcf_proto *tp_head;
+	struct work_struct work;
 };
 
 void mini_qdisc_pair_swap(struct mini_Qdisc_pair *miniqp,
 			  struct tcf_proto *tp_head);
 void mini_qdisc_pair_init(struct mini_Qdisc_pair *miniqp, struct Qdisc *qdisc,
 			  struct mini_Qdisc __rcu **p_miniq);
+void mini_qdisc_pair_cleanup(struct mini_Qdisc_pair *miniqp);
 
 static inline void skb_tc_reinsert(struct sk_buff *skb, struct tcf_result *res)
 {
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index d92f44ac4c39..08f245abf04a 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -108,12 +108,14 @@ int register_tcf_proto_ops(struct tcf_proto_ops *ops)
 EXPORT_SYMBOL(register_tcf_proto_ops);
 
 static struct workqueue_struct *tc_filter_wq;
+static struct workqueue_struct *tc_proto_wq;
 
 int unregister_tcf_proto_ops(struct tcf_proto_ops *ops)
 {
 	struct tcf_proto_ops *t;
 	int rc = -ENOENT;
 
+	flush_workqueue(tc_proto_wq);
 	/* Wait for outstanding call_rcu()s, if any, from a
 	 * tcf_proto_ops's destroy() handler.
 	 */
@@ -140,6 +142,12 @@ bool tcf_queue_work(struct rcu_work *rwork, work_func_t func)
 }
 EXPORT_SYMBOL(tcf_queue_work);
 
+bool tc_queue_proto_work(struct work_struct *work)
+{
+	return queue_work(tc_proto_wq, work);
+}
+EXPORT_SYMBOL(tc_queue_proto_work);
+
 /* Select new prio value from the range, managed by kernel. */
 
 static inline u32 tcf_auto_prio(struct tcf_proto *tp)
@@ -152,6 +160,23 @@ static inline u32 tcf_auto_prio(struct tcf_proto *tp)
 	return TC_H_MAJ(first);
 }
 
+static void tcf_chain_put(struct tcf_chain *chain);
+
+static void tcf_proto_destroy_work(struct work_struct *work)
+{
+	struct tcf_proto *tp = container_of(work, struct tcf_proto, work);
+	struct tcf_chain *chain = tp->chain;
+
+	rtnl_lock();
+
+	tp->ops->destroy(tp, NULL);
+	module_put(tp->ops->owner);
+	kfree_rcu(tp, rcu);
+	tcf_chain_put(chain);
+
+	rtnl_unlock();
+}
+
 static struct tcf_proto *tcf_proto_create(const char *kind, u32 protocol,
 					  u32 prio, struct tcf_chain *chain,
 					  struct netlink_ext_ack *extack)
@@ -172,6 +197,7 @@ static struct tcf_proto *tcf_proto_create(const char *kind, u32 protocol,
 	tp->protocol = protocol;
 	tp->prio = prio;
 	tp->chain = chain;
+	INIT_WORK(&tp->work, tcf_proto_destroy_work);
 
 	err = tp->ops->init(tp);
 	if (err) {
@@ -188,9 +214,7 @@ static struct tcf_proto *tcf_proto_create(const char *kind, u32 protocol,
 static void tcf_proto_destroy(struct tcf_proto *tp,
 			      struct netlink_ext_ack *extack)
 {
-	tp->ops->destroy(tp, extack);
-	module_put(tp->ops->owner);
-	kfree_rcu(tp, rcu);
+	tc_queue_proto_work(&tp->work);
 }
 
 struct tcf_filter_chain_list_item {
@@ -362,7 +386,6 @@ static void tcf_chain_flush(struct tcf_chain *chain)
 		RCU_INIT_POINTER(chain->filter_chain, tp->next);
 		tcf_proto_destroy(tp, NULL);
 		tp = rtnl_dereference(chain->filter_chain);
-		tcf_chain_put(chain);
 	}
 }
 
@@ -1378,7 +1401,6 @@ static void tcf_chain_tp_remove(struct tcf_chain *chain,
 	if (tp == chain->filter_chain)
 		tcf_chain0_head_change(chain, next);
 	RCU_INIT_POINTER(*chain_info->pprev, next);
-	tcf_chain_put(chain);
 }
 
 static struct tcf_proto *tcf_chain_tp_find(struct tcf_chain *chain,
@@ -1425,7 +1447,7 @@ static int tcf_fill_node(struct net *net, struct sk_buff *skb,
 	tcm->tcm__pad1 = 0;
 	tcm->tcm__pad2 = 0;
 	if (q) {
-		tcm->tcm_ifindex = qdisc_dev(q)->ifindex;
+		tcm->tcm_ifindex = q->ifindex;
 		tcm->tcm_parent = parent;
 	} else {
 		tcm->tcm_ifindex = TCM_IFINDEX_MAGIC_BLOCK;
@@ -1667,8 +1689,14 @@ static int tc_new_tfilter(struct sk_buff *skb, struct nlmsghdr *n,
 		tfilter_notify(net, skb, n, tp, block, q, parent, fh,
 			       RTM_NEWTFILTER, false);
 	} else {
-		if (tp_created)
+		if (tp_created) {
+			/* tp wasn't inserted to chain tp list. Take reference
+			 * to chain manually for tcf_proto_destroy() to
+			 * release.
+			 */
+			tcf_chain_hold(chain);
 			tcf_proto_destroy(tp, NULL);
+		}
 	}
 
 errout:
@@ -2074,7 +2102,7 @@ static int tc_chain_fill_node(struct tcf_chain *chain, struct net *net,
 	tcm->tcm__pad2 = 0;
 	tcm->tcm_handle = 0;
 	if (block->q) {
-		tcm->tcm_ifindex = qdisc_dev(block->q)->ifindex;
+		tcm->tcm_ifindex = block->q->ifindex;
 		tcm->tcm_parent = block->q->handle;
 	} else {
 		tcm->tcm_ifindex = TCM_IFINDEX_MAGIC_BLOCK;
@@ -2598,6 +2626,12 @@ static int __init tc_filter_init(void)
 	if (!tc_filter_wq)
 		return -ENOMEM;
 
+	tc_proto_wq = alloc_ordered_workqueue("tc_proto_workqueue", 0);
+	if (!tc_proto_wq) {
+		err = -ENOMEM;
+		goto err_proto_wq;
+	}
+
 	err = register_pernet_subsys(&tcf_net_ops);
 	if (err)
 		goto err_register_pernet_subsys;
@@ -2621,6 +2655,8 @@ static int __init tc_filter_init(void)
 err_rhash_setup_block_ht:
 	unregister_pernet_subsys(&tcf_net_ops);
 err_register_pernet_subsys:
+	destroy_workqueue(tc_proto_wq);
+err_proto_wq:
 	destroy_workqueue(tc_filter_wq);
 	return err;
 }
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index de1663f7d3ad..617b1dbceb01 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -28,6 +28,7 @@
 #include <linux/if_vlan.h>
 #include <linux/skb_array.h>
 #include <linux/if_macvlan.h>
+#include <asm/barrier.h>
 #include <net/sch_generic.h>
 #include <net/pkt_sched.h>
 #include <net/dst.h>
@@ -881,6 +882,7 @@ struct Qdisc *qdisc_alloc(struct netdev_queue *dev_queue,
 	sch->enqueue = ops->enqueue;
 	sch->dequeue = ops->dequeue;
 	sch->dev_queue = dev_queue;
+	sch->ifindex = dev->ifindex;
 	dev_hold(dev);
 	refcount_set(&sch->refcnt, 1);
 
@@ -953,11 +955,19 @@ void qdisc_free(struct Qdisc *qdisc)
 	kfree((char *) qdisc - qdisc->padded);
 }
 
+static void qdisc_free_work(struct work_struct *work)
+{
+	struct Qdisc *qdisc = container_of(work, struct Qdisc, work);
+
+	qdisc_free(qdisc);
+}
+
 static void qdisc_free_cb(struct rcu_head *head)
 {
 	struct Qdisc *q = container_of(head, struct Qdisc, rcu);
 
-	qdisc_free(q);
+	INIT_WORK(&q->work, qdisc_free_work);
+	tc_queue_proto_work(&q->work);
 }
 
 static void qdisc_destroy(struct Qdisc *qdisc)
@@ -1363,10 +1373,14 @@ static void mini_qdisc_rcu_func(struct rcu_head *head)
 {
 }
 
-void mini_qdisc_pair_swap(struct mini_Qdisc_pair *miniqp,
-			  struct tcf_proto *tp_head)
+void __mini_qdisc_pair_swap(struct mini_Qdisc_pair *miniqp,
+			    struct tcf_proto *tp_head)
 {
-	struct mini_Qdisc *miniq_old = rtnl_dereference(*miniqp->p_miniq);
+	/* p_miniq is either changed on ordered workqueue or during miniqp
+	 * cleanup. In both cases no concurrent modification is possible.
+	 */
+	struct mini_Qdisc *miniq_old =
+		rcu_dereference_protected(*miniqp->p_miniq, 1);
 	struct mini_Qdisc *miniq;
 
 	if (!tp_head) {
@@ -1394,6 +1408,32 @@ void mini_qdisc_pair_swap(struct mini_Qdisc_pair *miniqp,
 		 */
 		call_rcu_bh(&miniq_old->rcu, mini_qdisc_rcu_func);
 }
+
+void mini_qdisc_pair_swap_work(struct work_struct *work)
+{
+	struct mini_Qdisc_pair *miniqp = container_of(work,
+						      struct mini_Qdisc_pair,
+						      work);
+	struct tcf_proto *tp_head;
+
+	/* Reset tp_head pointer to error in order to differentiate it from NULL
+	 * pointer, which is a valid value. Paired with smp_store_release().
+	 */
+	tp_head = xchg(&miniqp->tp_head, ERR_PTR(-ENOENT));
+	if (!IS_ERR(tp_head))
+		__mini_qdisc_pair_swap(miniqp, tp_head);
+}
+
+void mini_qdisc_pair_swap(struct mini_Qdisc_pair *miniqp,
+			  struct tcf_proto *tp_head)
+{
+	/* Set tp_head to new value. We don't care if tp_head already contains
+	 * some valid pointer, we just need to set miniq->filter_list to current
+	 * tp_head.
+	 */
+	smp_store_release(&miniqp->tp_head, tp_head);
+	tc_queue_proto_work(&miniqp->work);
+}
 EXPORT_SYMBOL(mini_qdisc_pair_swap);
 
 void mini_qdisc_pair_init(struct mini_Qdisc_pair *miniqp, struct Qdisc *qdisc,
@@ -1404,5 +1444,13 @@ void mini_qdisc_pair_init(struct mini_Qdisc_pair *miniqp, struct Qdisc *qdisc,
 	miniqp->miniq2.cpu_bstats = qdisc->cpu_bstats;
 	miniqp->miniq2.cpu_qstats = qdisc->cpu_qstats;
 	miniqp->p_miniq = p_miniq;
+	INIT_WORK(&miniqp->work, mini_qdisc_pair_swap_work);
 }
 EXPORT_SYMBOL(mini_qdisc_pair_init);
+
+void mini_qdisc_pair_cleanup(struct mini_Qdisc_pair *miniqp)
+{
+	cancel_work_sync(&miniqp->work);
+	__mini_qdisc_pair_swap(miniqp, NULL);
+}
+EXPORT_SYMBOL(mini_qdisc_pair_cleanup);
diff --git a/net/sched/sch_ingress.c b/net/sched/sch_ingress.c
index ce3f55259d0d..8edc3c46bd9c 100644
--- a/net/sched/sch_ingress.c
+++ b/net/sched/sch_ingress.c
@@ -99,6 +99,7 @@ static void ingress_destroy(struct Qdisc *sch)
 	struct ingress_sched_data *q = qdisc_priv(sch);
 
 	tcf_block_put_ext(q->block, sch, &q->block_info);
+	mini_qdisc_pair_cleanup(&q->miniqp);
 	net_dec_ingress_queue();
 }
 
@@ -245,6 +246,9 @@ static void clsact_destroy(struct Qdisc *sch)
 	tcf_block_put_ext(q->egress_block, sch, &q->egress_block_info);
 	tcf_block_put_ext(q->ingress_block, sch, &q->ingress_block_info);
 
+	mini_qdisc_pair_cleanup(&q->miniqp_ingress);
+	mini_qdisc_pair_cleanup(&q->miniqp_egress);
+
 	net_dec_ingress_queue();
 	net_dec_egress_queue();
 }
-- 
2.7.5


* [PATCH net-next v2 02/17] net: sched: protect block state with spinlock
  2018-12-11 10:11 [PATCH net-next v2 00/17] Refactor classifier API to work with chain/classifiers without rtnl lock Vlad Buslov
  2018-12-11 10:11 ` [PATCH net-next v2 01/17] net: sched: refactor mini_qdisc_pair_swap() to use workqueue Vlad Buslov
@ 2018-12-11 10:11 ` Vlad Buslov
  2018-12-11 10:11 ` [PATCH net-next v2 03/17] net: sched: refactor tc_ctl_chain() to use block->lock Vlad Buslov
                   ` (14 subsequent siblings)
  16 siblings, 0 replies; 30+ messages in thread
From: Vlad Buslov @ 2018-12-11 10:11 UTC (permalink / raw)
  To: netdev; +Cc: jhs, xiyou.wangcong, jiri, davem, ast, daniel, Vlad Buslov

Currently, tcf_block doesn't use any synchronization mechanisms to protect
code that manages the lifetime of its chains. block->chain_list and
multiple variables in tcf_chain that control its lifetime assume external
synchronization provided by the global rtnl lock. Converting chain
reference counting to atomic reference counters is not possible because
cls API uses multiple counters and flags to control chain lifetime, so all
of them must be synchronized in the chain get/put code.

Use a single per-block lock to protect block data and manage the lifetime
of all chains on the block. Always take block->lock when accessing
chain_list. Chain get and put modify chain lifetime-management data and
the parent block's chain_list, so take the lock in these functions. Verify
block->lock state with assertions in functions that expect to be called
with the lock taken and are called from multiple places. Take block->lock
when accessing filter_chain_list.

block->lock is a spinlock, which means that blocking functions, like
classifier ops callbacks, cannot be called while holding it. Rearrange the
chain get and put function code to only access protected chain data while
holding the block lock, and move blocking calls outside the critical
section:
- Check whether the chain was explicitly created inside the put function
  while holding the block lock. Add an additional argument to
  __tcf_chain_put() to only put an explicitly created chain.
- Rearrange code to only access the chain reference counter and chain
  action reference counter while holding the block lock.
- Split the tcf_chain_destroy() helper into two functions: one that
  requires block->lock, and another that needs to call sleeping functions
  and can be executed after the lock is released. The first helper is used
  to detach the chain from the block and make it inaccessible to
  concurrent users; the second actually deallocates the chain memory (and
  the parent block, if applicable).
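
The resulting put path, condensed from the hunks below:

	spin_lock(&block->lock);
	/* ... refcnt/action_refcnt bookkeeping under the lock ... */
	if (refcnt == 0)
		free_block = tcf_chain_detach(chain);	/* unlink only */
	spin_unlock(&block->lock);

	if (is_last && !by_act)
		tc_chain_notify(chain, NULL, 0, 0, RTM_DELCHAIN, false);
	if (refcnt == 0) {
		tc_chain_tmplt_del(chain);		/* may sleep */
		tcf_chain_destroy(chain, free_block);	/* frees memory */
	}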

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
---
Changes from V1 to V2:
  - Use lockdep_assert_held() instead of spin_is_locked() for assertion.
  - Use 'free_block' argument in tcf_chain_destroy() instead of checking
    block's reference count and chain_list for second time.

 include/net/sch_generic.h |  4 ++
 net/sched/cls_api.c       | 94 ++++++++++++++++++++++++++++++++++++++---------
 2 files changed, 81 insertions(+), 17 deletions(-)

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 80fc06468c79..61efc4fd469a 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -359,6 +359,10 @@ struct tcf_chain {
 };
 
 struct tcf_block {
+	/* Lock protects tcf_block and lifetime-management data of chains
+	 * attached to the block (refcnt, action_refcnt, explicitly_created).
+	 */
+	spinlock_t lock;
 	struct list_head chain_list;
 	u32 index; /* block index for shared blocks */
 	refcount_t refcnt;
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index 08f245abf04a..a5aed85f0e5a 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -217,6 +217,9 @@ static void tcf_proto_destroy(struct tcf_proto *tp,
 	tc_queue_proto_work(&tp->work);
 }
 
+#define ASSERT_BLOCK_LOCKED(block)					\
+	lockdep_assert_held(&(block)->lock)
+
 struct tcf_filter_chain_list_item {
 	struct list_head list;
 	tcf_chain_head_change_t *chain_head_change;
@@ -228,7 +231,9 @@ static struct tcf_chain *tcf_chain_create(struct tcf_block *block,
 {
 	struct tcf_chain *chain;
 
-	chain = kzalloc(sizeof(*chain), GFP_KERNEL);
+	ASSERT_BLOCK_LOCKED(block);
+
+	chain = kzalloc(sizeof(*chain), GFP_ATOMIC);
 	if (!chain)
 		return NULL;
 	list_add_tail(&chain->list, &block->chain_list);
@@ -259,25 +264,45 @@ static void tcf_chain0_head_change(struct tcf_chain *chain,
 		tcf_chain_head_change_item(item, tp_head);
 }
 
-static void tcf_chain_destroy(struct tcf_chain *chain)
+/* Returns true if block can be safely freed. */
+
+static bool tcf_chain_detach(struct tcf_chain *chain)
 {
 	struct tcf_block *block = chain->block;
 
+	ASSERT_BLOCK_LOCKED(block);
+
 	list_del(&chain->list);
 	if (!chain->index)
 		block->chain0.chain = NULL;
+
+	if (list_empty(&block->chain_list) &&
+	    refcount_read(&block->refcnt) == 0)
+		return true;
+
+	return false;
+}
+
+static void tcf_chain_destroy(struct tcf_chain *chain, bool free_block)
+{
+	struct tcf_block *block = chain->block;
+
 	kfree(chain);
-	if (list_empty(&block->chain_list) && !refcount_read(&block->refcnt))
+	if (free_block)
 		kfree_rcu(block, rcu);
 }
 
 static void tcf_chain_hold(struct tcf_chain *chain)
 {
+	ASSERT_BLOCK_LOCKED(chain->block);
+
 	++chain->refcnt;
 }
 
 static bool tcf_chain_held_by_acts_only(struct tcf_chain *chain)
 {
+	ASSERT_BLOCK_LOCKED(chain->block);
+
 	/* In case all the references are action references, this
 	 * chain should not be shown to the user.
 	 */
@@ -289,6 +314,8 @@ static struct tcf_chain *tcf_chain_lookup(struct tcf_block *block,
 {
 	struct tcf_chain *chain;
 
+	ASSERT_BLOCK_LOCKED(block);
+
 	list_for_each_entry(chain, &block->chain_list, list) {
 		if (chain->index == chain_index)
 			return chain;
@@ -303,31 +330,40 @@ static struct tcf_chain *__tcf_chain_get(struct tcf_block *block,
 					 u32 chain_index, bool create,
 					 bool by_act)
 {
-	struct tcf_chain *chain = tcf_chain_lookup(block, chain_index);
+	struct tcf_chain *chain = NULL;
+	bool is_first_reference;
 
+	spin_lock(&block->lock);
+	chain = tcf_chain_lookup(block, chain_index);
 	if (chain) {
 		tcf_chain_hold(chain);
 	} else {
 		if (!create)
-			return NULL;
+			goto errout;
 		chain = tcf_chain_create(block, chain_index);
 		if (!chain)
-			return NULL;
+			goto errout;
 	}
 
 	if (by_act)
 		++chain->action_refcnt;
+	is_first_reference = chain->refcnt - chain->action_refcnt == 1;
+	spin_unlock(&block->lock);
 
 	/* Send notification only in case we got the first
 	 * non-action reference. Until then, the chain acts only as
 	 * a placeholder for actions pointing to it and user ought
 	 * not know about them.
 	 */
-	if (chain->refcnt - chain->action_refcnt == 1 && !by_act)
+	if (is_first_reference && !by_act)
 		tc_chain_notify(chain, NULL, 0, NLM_F_CREATE | NLM_F_EXCL,
 				RTM_NEWCHAIN, false);
 
 	return chain;
+
+errout:
+	spin_unlock(&block->lock);
+	return chain;
 }
 
 static struct tcf_chain *tcf_chain_get(struct tcf_block *block, u32 chain_index,
@@ -344,37 +380,59 @@ EXPORT_SYMBOL(tcf_chain_get_by_act);
 
 static void tc_chain_tmplt_del(struct tcf_chain *chain);
 
-static void __tcf_chain_put(struct tcf_chain *chain, bool by_act)
+static void __tcf_chain_put(struct tcf_chain *chain, bool by_act,
+			    bool explicitly_created)
 {
+	struct tcf_block *block = chain->block;
+	bool is_last, free_block = false;
+	unsigned int refcnt;
+
+	spin_lock(&block->lock);
+	if (explicitly_created) {
+		if (!chain->explicitly_created) {
+			spin_unlock(&block->lock);
+			return;
+		}
+		chain->explicitly_created = false;
+	}
+
 	if (by_act)
 		chain->action_refcnt--;
-	chain->refcnt--;
+
+	/* tc_chain_notify_delete can't be called while holding block lock.
+	 * However, when block is unlocked chain can be changed concurrently, so
+	 * save these to temporary variables.
+	 */
+	refcnt = --chain->refcnt;
+	is_last = refcnt - chain->action_refcnt == 0;
+	if (refcnt == 0)
+		free_block = tcf_chain_detach(chain);
+	spin_unlock(&block->lock);
 
 	/* The last dropped non-action reference will trigger notification. */
-	if (chain->refcnt - chain->action_refcnt == 0 && !by_act)
+	if (is_last && !by_act)
 		tc_chain_notify(chain, NULL, 0, 0, RTM_DELCHAIN, false);
 
-	if (chain->refcnt == 0) {
+	if (refcnt == 0) {
 		tc_chain_tmplt_del(chain);
-		tcf_chain_destroy(chain);
+		tcf_chain_destroy(chain, free_block);
 	}
 }
 
 static void tcf_chain_put(struct tcf_chain *chain)
 {
-	__tcf_chain_put(chain, false);
+	__tcf_chain_put(chain, false, false);
 }
 
 void tcf_chain_put_by_act(struct tcf_chain *chain)
 {
-	__tcf_chain_put(chain, true);
+	__tcf_chain_put(chain, true, false);
 }
 EXPORT_SYMBOL(tcf_chain_put_by_act);
 
 static void tcf_chain_put_explicitly_created(struct tcf_chain *chain)
 {
-	if (chain->explicitly_created)
-		tcf_chain_put(chain);
+	__tcf_chain_put(chain, false, true);
 }
 
 static void tcf_chain_flush(struct tcf_chain *chain)
@@ -787,6 +845,7 @@ static struct tcf_block *tcf_block_create(struct net *net, struct Qdisc *q,
 		NL_SET_ERR_MSG(extack, "Memory allocation for block failed");
 		return ERR_PTR(-ENOMEM);
 	}
+	spin_lock_init(&block->lock);
 	INIT_LIST_HEAD(&block->chain_list);
 	INIT_LIST_HEAD(&block->cb_list);
 	INIT_LIST_HEAD(&block->owner_list);
@@ -850,7 +909,7 @@ static void tcf_block_put_all_chains(struct tcf_block *block)
 static void __tcf_block_put(struct tcf_block *block, struct Qdisc *q,
 			    struct tcf_block_ext_info *ei)
 {
-	if (refcount_dec_and_test(&block->refcnt)) {
+	if (refcount_dec_and_lock(&block->refcnt, &block->lock)) {
 		/* Flushing/putting all chains will cause the block to be
 		 * deallocated when last chain is freed. However, if chain_list
 		 * is empty, block has to be manually deallocated. After block
@@ -859,6 +918,7 @@ static void __tcf_block_put(struct tcf_block *block, struct Qdisc *q,
 		 */
 		bool free_block = list_empty(&block->chain_list);
 
+		spin_unlock(&block->lock);
 		if (tcf_block_shared(block))
 			tcf_block_remove(block, block->net);
 		if (!free_block)
-- 
2.7.5


* [PATCH net-next v2 03/17] net: sched: refactor tc_ctl_chain() to use block->lock
  2018-12-11 10:11 [PATCH net-next v2 00/17] Refactor classifier API to work with chain/classifiers without rtnl lock Vlad Buslov
  2018-12-11 10:11 ` [PATCH net-next v2 01/17] net: sched: refactor mini_qdisc_pair_swap() to use workqueue Vlad Buslov
  2018-12-11 10:11 ` [PATCH net-next v2 02/17] net: sched: protect block state with spinlock Vlad Buslov
@ 2018-12-11 10:11 ` Vlad Buslov
  2018-12-11 10:11 ` [PATCH net-next v2 04/17] net: sched: protect block->chain0 with block->lock Vlad Buslov
                   ` (13 subsequent siblings)
  16 siblings, 0 replies; 30+ messages in thread
From: Vlad Buslov @ 2018-12-11 10:11 UTC (permalink / raw)
  To: netdev; +Cc: jhs, xiyou.wangcong, jiri, davem, ast, daniel, Vlad Buslov

In order to remove the dependency on the rtnl lock, modify the chain API
to use block->lock to protect chains from concurrent modification.
Rearrange the tc_ctl_chain() code to call tcf_chain_hold() while holding
block->lock.
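
The pattern, condensed from the hunks below: take the reference while the
block lock is held, then drop the lock before any blocking work:

	spin_lock(&block->lock);
	chain = tcf_chain_lookup(block, chain_index);
	/* ... create or validate the chain ... */
	tcf_chain_hold(chain);	/* reference taken under block->lock */
	spin_unlock(&block->lock);
	/* blocking operations (template add, notifications) follow */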

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
---
 net/sched/cls_api.c | 36 +++++++++++++++++++++++++-----------
 1 file changed, 25 insertions(+), 11 deletions(-)

diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index a5aed85f0e5a..0e1e7f26ee68 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -2293,6 +2293,8 @@ static int tc_ctl_chain(struct sk_buff *skb, struct nlmsghdr *n,
 		err = -EINVAL;
 		goto errout_block;
 	}
+
+	spin_lock(&block->lock);
 	chain = tcf_chain_lookup(block, chain_index);
 	if (n->nlmsg_type == RTM_NEWCHAIN) {
 		if (chain) {
@@ -2304,41 +2306,49 @@ static int tc_ctl_chain(struct sk_buff *skb, struct nlmsghdr *n,
 			} else {
 				NL_SET_ERR_MSG(extack, "Filter chain already exists");
 				err = -EEXIST;
-				goto errout_block;
+				goto errout_block_locked;
 			}
 		} else {
 			if (!(n->nlmsg_flags & NLM_F_CREATE)) {
 				NL_SET_ERR_MSG(extack, "Need both RTM_NEWCHAIN and NLM_F_CREATE to create a new chain");
 				err = -ENOENT;
-				goto errout_block;
+				goto errout_block_locked;
 			}
 			chain = tcf_chain_create(block, chain_index);
 			if (!chain) {
 				NL_SET_ERR_MSG(extack, "Failed to create filter chain");
 				err = -ENOMEM;
-				goto errout_block;
+				goto errout_block_locked;
 			}
 		}
 	} else {
 		if (!chain || tcf_chain_held_by_acts_only(chain)) {
 			NL_SET_ERR_MSG(extack, "Cannot find specified filter chain");
 			err = -EINVAL;
-			goto errout_block;
+			goto errout_block_locked;
 		}
 		tcf_chain_hold(chain);
 	}
 
+	if (n->nlmsg_type == RTM_NEWCHAIN) {
+		/* Modifying chain requires holding parent block lock. In case
+		 * the chain was successfully added, take a reference to the
+		 * chain. This ensures that an empty chain does not disappear at
+		 * the end of this function.
+		 */
+		tcf_chain_hold(chain);
+		chain->explicitly_created = true;
+	}
+	spin_unlock(&block->lock);
+
 	switch (n->nlmsg_type) {
 	case RTM_NEWCHAIN:
 		err = tc_chain_tmplt_add(chain, net, tca, extack);
-		if (err)
+		if (err) {
+			tcf_chain_put_explicitly_created(chain);
 			goto errout;
-		/* In case the chain was successfully added, take a reference
-		 * to the chain. This ensures that an empty chain
-		 * does not disappear at the end of this function.
-		 */
-		tcf_chain_hold(chain);
-		chain->explicitly_created = true;
+		}
+
 		tc_chain_notify(chain, NULL, 0, NLM_F_CREATE | NLM_F_EXCL,
 				RTM_NEWCHAIN, false);
 		break;
@@ -2373,6 +2383,10 @@ static int tc_ctl_chain(struct sk_buff *skb, struct nlmsghdr *n,
 		/* Replay the request. */
 		goto replay;
 	return err;
+
+errout_block_locked:
+	spin_unlock(&block->lock);
+	goto errout_block;
 }
 
 /* called with RTNL */
-- 
2.7.5


* [PATCH net-next v2 04/17] net: sched: protect block->chain0 with block->lock
  2018-12-11 10:11 [PATCH net-next v2 00/17] Refactor classifier API to work with chain/classifiers without rtnl lock Vlad Buslov
                   ` (2 preceding siblings ...)
  2018-12-11 10:11 ` [PATCH net-next v2 03/17] net: sched: refactor tc_ctl_chain() to use block->lock Vlad Buslov
@ 2018-12-11 10:11 ` Vlad Buslov
  2018-12-11 10:11 ` [PATCH net-next v2 05/17] net: sched: traverse chains in block with tcf_get_next_chain() Vlad Buslov
                   ` (12 subsequent siblings)
  16 siblings, 0 replies; 30+ messages in thread
From: Vlad Buslov @ 2018-12-11 10:11 UTC (permalink / raw)
  To: netdev; +Cc: jhs, xiyou.wangcong, jiri, davem, ast, daniel, Vlad Buslov

In order to remove the dependency on the rtnl lock, use block->lock to
protect the chain0 struct from concurrent modification. Rearrange the code
in the chain0 callback add and del functions to only access chain0 while
block->lock is held.
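
For example, the chain0 head change walk now runs entirely under
block->lock (condensed from the first hunk below):

	spin_lock(&block->lock);
	list_for_each_entry(item, &block->chain0.filter_chain_list, list)
		tcf_chain_head_change_item(item, tp_head);
	spin_unlock(&block->lock);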

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
---
 net/sched/cls_api.c | 17 ++++++++++++++---
 1 file changed, 14 insertions(+), 3 deletions(-)

diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index 0e1e7f26ee68..c93570384967 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -260,8 +260,11 @@ static void tcf_chain0_head_change(struct tcf_chain *chain,
 
 	if (chain->index)
 		return;
+
+	spin_lock(&block->lock);
 	list_for_each_entry(item, &block->chain0.filter_chain_list, list)
 		tcf_chain_head_change_item(item, tp_head);
+	spin_unlock(&block->lock);
 }
 
 /* Returns true if block can be safely freed. */
@@ -765,8 +768,8 @@ tcf_chain0_head_change_cb_add(struct tcf_block *block,
 			      struct tcf_block_ext_info *ei,
 			      struct netlink_ext_ack *extack)
 {
-	struct tcf_chain *chain0 = block->chain0.chain;
 	struct tcf_filter_chain_list_item *item;
+	struct tcf_chain *chain0;
 
 	item = kmalloc(sizeof(*item), GFP_KERNEL);
 	if (!item) {
@@ -775,9 +778,14 @@ tcf_chain0_head_change_cb_add(struct tcf_block *block,
 	}
 	item->chain_head_change = ei->chain_head_change;
 	item->chain_head_change_priv = ei->chain_head_change_priv;
+
+	spin_lock(&block->lock);
+	chain0 = block->chain0.chain;
 	if (chain0 && chain0->filter_chain)
 		tcf_chain_head_change_item(item, chain0->filter_chain);
 	list_add(&item->list, &block->chain0.filter_chain_list);
+	spin_unlock(&block->lock);
+
 	return 0;
 }
 
@@ -785,20 +793,23 @@ static void
 tcf_chain0_head_change_cb_del(struct tcf_block *block,
 			      struct tcf_block_ext_info *ei)
 {
-	struct tcf_chain *chain0 = block->chain0.chain;
 	struct tcf_filter_chain_list_item *item;
 
+	spin_lock(&block->lock);
 	list_for_each_entry(item, &block->chain0.filter_chain_list, list) {
 		if ((!ei->chain_head_change && !ei->chain_head_change_priv) ||
 		    (item->chain_head_change == ei->chain_head_change &&
 		     item->chain_head_change_priv == ei->chain_head_change_priv)) {
-			if (chain0)
+			if (block->chain0.chain)
 				tcf_chain_head_change_item(item, NULL);
 			list_del(&item->list);
+			spin_unlock(&block->lock);
+
 			kfree(item);
 			return;
 		}
 	}
+	spin_unlock(&block->lock);
 	WARN_ON(1);
 }
 
-- 
2.7.5


* [PATCH net-next v2 05/17] net: sched: traverse chains in block with tcf_get_next_chain()
  2018-12-11 10:11 [PATCH net-next v2 00/17] Refactor classifier API to work with chain/classifiers without rtnl lock Vlad Buslov
                   ` (3 preceding siblings ...)
  2018-12-11 10:11 ` [PATCH net-next v2 04/17] net: sched: protect block->chain0 with block->lock Vlad Buslov
@ 2018-12-11 10:11 ` Vlad Buslov
  2018-12-11 10:11 ` [PATCH net-next v2 06/17] net: sched: protect chain template accesses with block lock Vlad Buslov
                   ` (11 subsequent siblings)
  16 siblings, 0 replies; 30+ messages in thread
From: Vlad Buslov @ 2018-12-11 10:11 UTC (permalink / raw)
  To: netdev; +Cc: jhs, xiyou.wangcong, jiri, davem, ast, daniel, Vlad Buslov

All users of block->chain_list rely on the rtnl lock and assume that no
new chains are added while the list is being traversed. Use
tcf_get_next_chain() to traverse the chain list without relying on the
rtnl mutex. This function iterates over chains by taking a reference only
to the current iterator chain and doesn't assume external synchronization
of the chain list.

Don't take a reference to all chains in the block when flushing; use
tcf_get_next_chain() to safely iterate over the chain list instead. Remove
tcf_block_put_all_chains(), which is no longer used.
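
The resulting iteration idiom (as used in the hunks below); each call
releases the reference to the previous chain and returns the next one with
a reference taken:

	for (chain = tcf_get_next_chain(block, NULL);
	     chain;
	     chain = tcf_get_next_chain(block, chain)) {
		/* chain is guaranteed to stay alive in the loop body */
	}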

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
---
 include/net/pkt_cls.h |  2 ++
 net/sched/cls_api.c   | 96 +++++++++++++++++++++++++++++++++++++--------------
 net/sched/sch_api.c   |  4 ++-
 3 files changed, 76 insertions(+), 26 deletions(-)

diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
index ea191d8cfcc9..cde724c46ab5 100644
--- a/include/net/pkt_cls.h
+++ b/include/net/pkt_cls.h
@@ -43,6 +43,8 @@ bool tcf_queue_work(struct rcu_work *rwork, work_func_t func);
 struct tcf_chain *tcf_chain_get_by_act(struct tcf_block *block,
 				       u32 chain_index);
 void tcf_chain_put_by_act(struct tcf_chain *chain);
+struct tcf_chain *tcf_get_next_chain(struct tcf_block *block,
+				     struct tcf_chain *chain);
 void tcf_block_netif_keep_dst(struct tcf_block *block);
 int tcf_block_get(struct tcf_block **p_block,
 		  struct tcf_proto __rcu **p_filter_chain, struct Qdisc *q,
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index c93570384967..4900ac659b58 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -892,28 +892,62 @@ static struct tcf_block *tcf_block_refcnt_get(struct net *net, u32 block_index)
 	return block;
 }
 
-static void tcf_block_flush_all_chains(struct tcf_block *block)
+static struct tcf_chain *
+__tcf_get_next_chain(struct tcf_block *block, struct tcf_chain *chain)
 {
-	struct tcf_chain *chain;
+	spin_lock(&block->lock);
+	if (chain)
+		chain = list_is_last(&chain->list, &block->chain_list) ?
+			NULL : list_next_entry(chain, list);
+	else
+		chain = list_first_entry_or_null(&block->chain_list,
+						 struct tcf_chain, list);
 
-	/* Hold a refcnt for all chains, so that they don't disappear
-	 * while we are iterating.
-	 */
-	list_for_each_entry(chain, &block->chain_list, list)
+	/* skip all action-only chains */
+	while (chain && tcf_chain_held_by_acts_only(chain))
+		chain = list_is_last(&chain->list, &block->chain_list) ?
+			NULL : list_next_entry(chain, list);
+
+	if (chain)
 		tcf_chain_hold(chain);
+	spin_unlock(&block->lock);
 
-	list_for_each_entry(chain, &block->chain_list, list)
-		tcf_chain_flush(chain);
+	return chain;
 }
 
-static void tcf_block_put_all_chains(struct tcf_block *block)
+/* Function to be used by all clients that want to iterate over all chains on
+ * block. It properly obtains block->lock and takes reference to chain before
+ * returning it. Users of this function must be tolerant to concurrent chain
+ * insertion/deletion or ensure that no concurrent chain modification is
+ * possible. Note that all netlink dump callbacks cannot guarantee to provide
+ * consistent dump because rtnl lock is released each time skb is filled with
+ * data and sent to user-space.
+ */
+
+struct tcf_chain *
+tcf_get_next_chain(struct tcf_block *block, struct tcf_chain *chain)
 {
-	struct tcf_chain *chain, *tmp;
+	struct tcf_chain *chain_next = __tcf_get_next_chain(block, chain);
 
-	/* At this point, all the chains should have refcnt >= 1. */
-	list_for_each_entry_safe(chain, tmp, &block->chain_list, list) {
-		tcf_chain_put_explicitly_created(chain);
+	if (chain)
 		tcf_chain_put(chain);
+
+	return chain_next;
+}
+EXPORT_SYMBOL(tcf_get_next_chain);
+
+static void tcf_block_flush_all_chains(struct tcf_block *block)
+{
+	struct tcf_chain *chain;
+
+	/* Last reference to block. At this point chains cannot be added or
+	 * removed concurrently.
+	 */
+	for (chain = tcf_get_next_chain(block, NULL);
+	     chain;
+	     chain = tcf_get_next_chain(block, chain)) {
+		tcf_chain_put_explicitly_created(chain);
+		tcf_chain_flush(chain);
 	}
 }
 
@@ -932,8 +966,6 @@ static void __tcf_block_put(struct tcf_block *block, struct Qdisc *q,
 		spin_unlock(&block->lock);
 		if (tcf_block_shared(block))
 			tcf_block_remove(block, block->net);
-		if (!free_block)
-			tcf_block_flush_all_chains(block);
 
 		if (q)
 			tcf_block_offload_unbind(block, q, ei);
@@ -941,7 +973,7 @@ static void __tcf_block_put(struct tcf_block *block, struct Qdisc *q,
 		if (free_block)
 			kfree_rcu(block, rcu);
 		else
-			tcf_block_put_all_chains(block);
+			tcf_block_flush_all_chains(block);
 	} else if (q) {
 		tcf_block_offload_unbind(block, q, ei);
 	}
@@ -1275,11 +1307,15 @@ tcf_block_playback_offloads(struct tcf_block *block, tc_setup_cb_t *cb,
 			    void *cb_priv, bool add, bool offload_in_use,
 			    struct netlink_ext_ack *extack)
 {
-	struct tcf_chain *chain;
+	struct tcf_chain *chain, *chain_prev;
 	struct tcf_proto *tp;
 	int err;
 
-	list_for_each_entry(chain, &block->chain_list, list) {
+	for (chain = __tcf_get_next_chain(block, NULL);
+	     chain;
+	     chain_prev = chain,
+		     chain = __tcf_get_next_chain(block, chain),
+		     tcf_chain_put(chain_prev)) {
 		for (tp = rtnl_dereference(chain->filter_chain); tp;
 		     tp = rtnl_dereference(tp->next)) {
 			if (tp->ops->reoffload) {
@@ -1298,6 +1334,7 @@ tcf_block_playback_offloads(struct tcf_block *block, tc_setup_cb_t *cb,
 	return 0;
 
 err_playback_remove:
+	tcf_chain_put(chain);
 	tcf_block_playback_offloads(block, cb, cb_priv, false, offload_in_use,
 				    extack);
 	return err;
@@ -2061,11 +2098,11 @@ static bool tcf_chain_dump(struct tcf_chain *chain, struct Qdisc *q, u32 parent,
 /* called with RTNL */
 static int tc_dump_tfilter(struct sk_buff *skb, struct netlink_callback *cb)
 {
+	struct tcf_chain *chain, *chain_prev;
 	struct net *net = sock_net(skb->sk);
 	struct nlattr *tca[TCA_MAX + 1];
 	struct Qdisc *q = NULL;
 	struct tcf_block *block;
-	struct tcf_chain *chain;
 	struct tcmsg *tcm = nlmsg_data(cb->nlh);
 	long index_start;
 	long index;
@@ -2129,12 +2166,17 @@ static int tc_dump_tfilter(struct sk_buff *skb, struct netlink_callback *cb)
 	index_start = cb->args[0];
 	index = 0;
 
-	list_for_each_entry(chain, &block->chain_list, list) {
+	for (chain = __tcf_get_next_chain(block, NULL);
+	     chain;
+	     chain_prev = chain,
+		     chain = __tcf_get_next_chain(block, chain),
+		     tcf_chain_put(chain_prev)) {
 		if (tca[TCA_CHAIN] &&
 		    nla_get_u32(tca[TCA_CHAIN]) != chain->index)
 			continue;
 		if (!tcf_chain_dump(chain, q, parent, skb, cb,
 				    index_start, &index)) {
+			tcf_chain_put(chain);
 			err = -EMSGSIZE;
 			break;
 		}
@@ -2403,11 +2445,11 @@ static int tc_ctl_chain(struct sk_buff *skb, struct nlmsghdr *n,
 /* called with RTNL */
 static int tc_dump_chain(struct sk_buff *skb, struct netlink_callback *cb)
 {
+	struct tcf_chain *chain, *chain_prev;
 	struct net *net = sock_net(skb->sk);
 	struct nlattr *tca[TCA_MAX + 1];
 	struct Qdisc *q = NULL;
 	struct tcf_block *block;
-	struct tcf_chain *chain;
 	struct tcmsg *tcm = nlmsg_data(cb->nlh);
 	long index_start;
 	long index;
@@ -2471,7 +2513,11 @@ static int tc_dump_chain(struct sk_buff *skb, struct netlink_callback *cb)
 	index_start = cb->args[0];
 	index = 0;
 
-	list_for_each_entry(chain, &block->chain_list, list) {
+	for (chain = __tcf_get_next_chain(block, NULL);
+	     chain;
+	     chain_prev = chain,
+		     chain = __tcf_get_next_chain(block, chain),
+		     tcf_chain_put(chain_prev)) {
 		if ((tca[TCA_CHAIN] &&
 		     nla_get_u32(tca[TCA_CHAIN]) != chain->index))
 			continue;
@@ -2479,14 +2525,14 @@ static int tc_dump_chain(struct sk_buff *skb, struct netlink_callback *cb)
 			index++;
 			continue;
 		}
-		if (tcf_chain_held_by_acts_only(chain))
-			continue;
 		err = tc_chain_fill_node(chain, net, skb, block,
 					 NETLINK_CB(cb->skb).portid,
 					 cb->nlh->nlmsg_seq, NLM_F_MULTI,
 					 RTM_NEWCHAIN);
-		if (err <= 0)
+		if (err <= 0) {
+			tcf_chain_put(chain);
 			break;
+		}
 		index++;
 	}
 
diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
index 9c88cec7e8a2..f80fd2d1b7e6 100644
--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@ -1912,7 +1912,9 @@ static void tc_bind_tclass(struct Qdisc *q, u32 portid, u32 clid,
 	block = cops->tcf_block(q, cl, NULL);
 	if (!block)
 		return;
-	list_for_each_entry(chain, &block->chain_list, list) {
+	for (chain = tcf_get_next_chain(block, NULL);
+	     chain;
+	     chain = tcf_get_next_chain(block, chain)) {
 		struct tcf_proto *tp;
 
 		for (tp = rtnl_dereference(chain->filter_chain);
-- 
2.7.5


* [PATCH net-next v2 06/17] net: sched: protect chain template accesses with block lock
  2018-12-11 10:11 [PATCH net-next v2 00/17] Refactor classifier API to work with chain/classifiers without rtnl lock Vlad Buslov
                   ` (4 preceding siblings ...)
  2018-12-11 10:11 ` [PATCH net-next v2 05/17] net: sched: traverse chains in block with tcf_get_next_chain() Vlad Buslov
@ 2018-12-11 10:11 ` Vlad Buslov
  2018-12-11 10:11 ` [PATCH net-next v2 07/17] net: sched: lock the chain when accessing filter_chain list Vlad Buslov
                   ` (10 subsequent siblings)
  16 siblings, 0 replies; 30+ messages in thread
From: Vlad Buslov @ 2018-12-11 10:11 UTC (permalink / raw)
  To: netdev; +Cc: jhs, xiyou.wangcong, jiri, davem, ast, daniel, Vlad Buslov

When the cls API is called without the protection of the rtnl lock,
parallel modification of a chain is possible, which means that the chain
template can be changed concurrently in certain circumstances. For
example, when a chain is 'deleted' by the new user-space chain API, it
might continue to be used if it is referenced by actions, and can then be
're-created' by the user. In that case the same chain structure is reused
and its template is changed. To protect against this scenario, cache the
chain template values while holding the block lock. Introduce a standalone
tc_chain_notify_delete() function that works with the cached template
values instead of the chains themselves.
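
Condensed from the __tcf_chain_put() hunk below: the template values are
snapshotted under the lock and used after it is released:

	spin_lock(&block->lock);
	tmplt_ops = chain->tmplt_ops;	/* cached under block->lock */
	tmplt_priv = chain->tmplt_priv;
	chain_index = chain->index;
	/* ... */
	spin_unlock(&block->lock);

	if (is_last && !by_act)
		tc_chain_notify_delete(tmplt_ops, tmplt_priv, chain_index,
				       block, NULL, 0, 0, false);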

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
---
 net/sched/cls_api.c | 73 +++++++++++++++++++++++++++++++++++++++++------------
 1 file changed, 57 insertions(+), 16 deletions(-)

diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index 4900ac659b58..f0c0fcd8339e 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -381,14 +381,22 @@ struct tcf_chain *tcf_chain_get_by_act(struct tcf_block *block, u32 chain_index)
 }
 EXPORT_SYMBOL(tcf_chain_get_by_act);
 
-static void tc_chain_tmplt_del(struct tcf_chain *chain);
+static void tc_chain_tmplt_del(const struct tcf_proto_ops *tmplt_ops,
+			       void *tmplt_priv);
+static int tc_chain_notify_delete(const struct tcf_proto_ops *tmplt_ops,
+				  void *tmplt_priv, u32 chain_index,
+				  struct tcf_block *block, struct sk_buff *oskb,
+				  u32 seq, u16 flags, bool unicast);
 
 static void __tcf_chain_put(struct tcf_chain *chain, bool by_act,
 			    bool explicitly_created)
 {
 	struct tcf_block *block = chain->block;
+	const struct tcf_proto_ops *tmplt_ops;
 	bool is_last, free_block = false;
 	unsigned int refcnt;
+	void *tmplt_priv;
+	u32 chain_index;
 
 	spin_lock(&block->lock);
 	if (explicitly_created) {
@@ -408,16 +416,21 @@ static void __tcf_chain_put(struct tcf_chain *chain, bool by_act,
 	 */
 	refcnt = --chain->refcnt;
 	is_last = refcnt - chain->action_refcnt == 0;
+	tmplt_ops = chain->tmplt_ops;
+	tmplt_priv = chain->tmplt_priv;
+	chain_index = chain->index;
+
 	if (refcnt == 0)
 		free_block = tcf_chain_detach(chain);
 	spin_unlock(&block->lock);
 
 	/* The last dropped non-action reference will trigger notification. */
 	if (is_last && !by_act)
-		tc_chain_notify(chain, NULL, 0, 0, RTM_DELCHAIN, false);
+		tc_chain_notify_delete(tmplt_ops, tmplt_priv, chain_index,
+				       block, NULL, 0, 0, false);
 
 	if (refcnt == 0) {
-		tc_chain_tmplt_del(chain);
+		tc_chain_tmplt_del(tmplt_ops, tmplt_priv);
 		tcf_chain_destroy(chain, free_block);
 	}
 }
@@ -2193,8 +2206,10 @@ static int tc_dump_tfilter(struct sk_buff *skb, struct netlink_callback *cb)
 	return skb->len;
 }
 
-static int tc_chain_fill_node(struct tcf_chain *chain, struct net *net,
-			      struct sk_buff *skb, struct tcf_block *block,
+static int tc_chain_fill_node(const struct tcf_proto_ops *tmplt_ops,
+			      void *tmplt_priv, u32 chain_index,
+			      struct net *net, struct sk_buff *skb,
+			      struct tcf_block *block,
 			      u32 portid, u32 seq, u16 flags, int event)
 {
 	unsigned char *b = skb_tail_pointer(skb);
@@ -2203,8 +2218,8 @@ static int tc_chain_fill_node(struct tcf_chain *chain, struct net *net,
 	struct tcmsg *tcm;
 	void *priv;
 
-	ops = chain->tmplt_ops;
-	priv = chain->tmplt_priv;
+	ops = tmplt_ops;
+	priv = tmplt_priv;
 
 	nlh = nlmsg_put(skb, portid, seq, event, sizeof(*tcm), flags);
 	if (!nlh)
@@ -2222,7 +2237,7 @@ static int tc_chain_fill_node(struct tcf_chain *chain, struct net *net,
 		tcm->tcm_block_index = block->index;
 	}
 
-	if (nla_put_u32(skb, TCA_CHAIN, chain->index))
+	if (nla_put_u32(skb, TCA_CHAIN, chain_index))
 		goto nla_put_failure;
 
 	if (ops) {
@@ -2253,7 +2268,8 @@ static int tc_chain_notify(struct tcf_chain *chain, struct sk_buff *oskb,
 	if (!skb)
 		return -ENOBUFS;
 
-	if (tc_chain_fill_node(chain, net, skb, block, portid,
+	if (tc_chain_fill_node(chain->tmplt_ops, chain->tmplt_priv,
+			       chain->index, net, skb, block, portid,
 			       seq, flags, event) <= 0) {
 		kfree_skb(skb);
 		return -EINVAL;
@@ -2265,6 +2281,31 @@ static int tc_chain_notify(struct tcf_chain *chain, struct sk_buff *oskb,
 	return rtnetlink_send(skb, net, portid, RTNLGRP_TC, flags & NLM_F_ECHO);
 }
 
+static int tc_chain_notify_delete(const struct tcf_proto_ops *tmplt_ops,
+				  void *tmplt_priv, u32 chain_index,
+				  struct tcf_block *block, struct sk_buff *oskb,
+				  u32 seq, u16 flags, bool unicast)
+{
+	u32 portid = oskb ? NETLINK_CB(oskb).portid : 0;
+	struct net *net = block->net;
+	struct sk_buff *skb;
+
+	skb = alloc_skb(NLMSG_GOODSIZE, GFP_KERNEL);
+	if (!skb)
+		return -ENOBUFS;
+
+	if (tc_chain_fill_node(tmplt_ops, tmplt_priv, chain_index, net, skb,
+			       block, portid, seq, flags, RTM_DELCHAIN) <= 0) {
+		kfree_skb(skb);
+		return -EINVAL;
+	}
+
+	if (unicast)
+		return netlink_unicast(net->rtnl, skb, portid, MSG_DONTWAIT);
+
+	return rtnetlink_send(skb, net, portid, RTNLGRP_TC, flags & NLM_F_ECHO);
+}
+
 static int tc_chain_tmplt_add(struct tcf_chain *chain, struct net *net,
 			      struct nlattr **tca,
 			      struct netlink_ext_ack *extack)
@@ -2294,16 +2335,15 @@ static int tc_chain_tmplt_add(struct tcf_chain *chain, struct net *net,
 	return 0;
 }
 
-static void tc_chain_tmplt_del(struct tcf_chain *chain)
+static void tc_chain_tmplt_del(const struct tcf_proto_ops *tmplt_ops,
+			       void *tmplt_priv)
 {
-	const struct tcf_proto_ops *ops = chain->tmplt_ops;
-
 	/* If template ops are set, no work to do for us. */
-	if (!ops)
+	if (!tmplt_ops)
 		return;
 
-	ops->tmplt_destroy(chain->tmplt_priv);
-	module_put(ops->owner);
+	tmplt_ops->tmplt_destroy(tmplt_priv);
+	module_put(tmplt_ops->owner);
 }
 
 /* Add/delete/get a chain */
@@ -2525,7 +2565,8 @@ static int tc_dump_chain(struct sk_buff *skb, struct netlink_callback *cb)
 			index++;
 			continue;
 		}
-		err = tc_chain_fill_node(chain, net, skb, block,
+		err = tc_chain_fill_node(chain->tmplt_ops, chain->tmplt_priv,
+					 chain->index, net, skb, block,
 					 NETLINK_CB(cb->skb).portid,
 					 cb->nlh->nlmsg_seq, NLM_F_MULTI,
 					 RTM_NEWCHAIN);
-- 
2.7.5


* [PATCH net-next v2 07/17] net: sched: lock the chain when accessing filter_chain list
  2018-12-11 10:11 [PATCH net-next v2 00/17] Refactor classifier API to work with chain/classifiers without rtnl lock Vlad Buslov
                   ` (5 preceding siblings ...)
  2018-12-11 10:11 ` [PATCH net-next v2 06/17] net: sched: protect chain template accesses with block lock Vlad Buslov
@ 2018-12-11 10:11 ` Vlad Buslov
  2018-12-11 10:11 ` [PATCH net-next v2 08/17] net: sched: introduce reference counting for tcf proto Vlad Buslov
                   ` (9 subsequent siblings)
  16 siblings, 0 replies; 30+ messages in thread
From: Vlad Buslov @ 2018-12-11 10:11 UTC (permalink / raw)
  To: netdev; +Cc: jhs, xiyou.wangcong, jiri, davem, ast, daniel, Vlad Buslov

Always lock the chain when accessing the filter_chain list, instead of
relying on the rtnl lock. Dereference filter_chain with the
tcf_chain_dereference() lockdep macro to verify that all users of the
filter_chain list have the lock taken.

Rearrange the tp insert/remove code in tc_new_tfilter()/tc_del_tfilter()
to execute all necessary code while holding the chain lock, in order to
prevent invalidation of the chain_info structure by a potential concurrent
change.
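
The access pattern, condensed from the hunks below:

	spin_lock(&chain->filter_chain_lock);
	tp = tcf_chain_dereference(chain->filter_chain, chain);
	/* ... traverse or modify the filter chain ... */
	spin_unlock(&chain->filter_chain_lock);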

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
---
Changes from V1 to V2:
  - Refactor tcf_chain0_head_change_cb_add() to take block->lock and
    chain0->filter_chain_lock in the correct order.

 include/net/sch_generic.h |  17 +++++++
 net/sched/cls_api.c       | 116 +++++++++++++++++++++++++++++++---------------
 2 files changed, 95 insertions(+), 38 deletions(-)

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 61efc4fd469a..03c0f8325091 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -347,6 +347,8 @@ struct qdisc_skb_cb {
 typedef void tcf_chain_head_change_t(struct tcf_proto *tp_head, void *priv);
 
 struct tcf_chain {
+	/* Protects filter_chain and filter_chain_list. */
+	spinlock_t filter_chain_lock;
 	struct tcf_proto __rcu *filter_chain;
 	struct list_head list;
 	struct tcf_block *block;
@@ -380,6 +382,21 @@ struct tcf_block {
 	struct rcu_head rcu;
 };
 
+#ifdef CONFIG_PROVE_LOCKING
+static inline bool lockdep_tcf_chain_is_locked(struct tcf_chain *chain)
+{
+	return lockdep_is_held(&chain->filter_chain_lock);
+}
+#else
+static inline bool lockdep_tcf_chain_is_locked(struct tcf_chain *chain)
+{
+	return true;
+}
+#endif /* #ifdef CONFIG_PROVE_LOCKING */
+
+#define tcf_chain_dereference(p, chain)					\
+	rcu_dereference_protected(p, lockdep_tcf_chain_is_locked(chain))
+
 static inline void tcf_block_offload_inc(struct tcf_block *block, u32 *flags)
 {
 	if (*flags & TCA_CLS_FLAGS_IN_HW)
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index f0c0fcd8339e..9a77ef0c18fa 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -237,6 +237,7 @@ static struct tcf_chain *tcf_chain_create(struct tcf_block *block,
 	if (!chain)
 		return NULL;
 	list_add_tail(&chain->list, &block->chain_list);
+	spin_lock_init(&chain->filter_chain_lock);
 	chain->block = block;
 	chain->index = chain_index;
 	chain->refcnt = 1;
@@ -453,9 +454,13 @@ static void tcf_chain_put_explicitly_created(struct tcf_chain *chain)
 
 static void tcf_chain_flush(struct tcf_chain *chain)
 {
-	struct tcf_proto *tp = rtnl_dereference(chain->filter_chain);
+	struct tcf_proto *tp;
 
+	spin_lock(&chain->filter_chain_lock);
+	tp = tcf_chain_dereference(chain->filter_chain, chain);
 	tcf_chain0_head_change(chain, NULL);
+	spin_unlock(&chain->filter_chain_lock);
+
 	while (tp) {
 		RCU_INIT_POINTER(chain->filter_chain, tp->next);
 		tcf_proto_destroy(tp, NULL);
@@ -794,11 +799,29 @@ tcf_chain0_head_change_cb_add(struct tcf_block *block,
 
 	spin_lock(&block->lock);
 	chain0 = block->chain0.chain;
-	if (chain0 && chain0->filter_chain)
-		tcf_chain_head_change_item(item, chain0->filter_chain);
-	list_add(&item->list, &block->chain0.filter_chain_list);
+	if (chain0)
+		tcf_chain_hold(chain0);
+	else
+		list_add(&item->list, &block->chain0.filter_chain_list);
 	spin_unlock(&block->lock);
 
+	if (chain0) {
+		struct tcf_proto *tp_head;
+
+		spin_lock(&chain0->filter_chain_lock);
+
+		tp_head = tcf_chain_dereference(chain0->filter_chain, chain0);
+		if (tp_head)
+			tcf_chain_head_change_item(item, tp_head);
+
+		spin_lock(&block->lock);
+		list_add(&item->list, &block->chain0.filter_chain_list);
+		spin_unlock(&block->lock);
+
+		spin_unlock(&chain0->filter_chain_lock);
+		tcf_chain_put(chain0);
+	}
+
 	return 0;
 }
 
@@ -1497,9 +1520,10 @@ struct tcf_chain_info {
 	struct tcf_proto __rcu *next;
 };
 
-static struct tcf_proto *tcf_chain_tp_prev(struct tcf_chain_info *chain_info)
+static struct tcf_proto *tcf_chain_tp_prev(struct tcf_chain *chain,
+					   struct tcf_chain_info *chain_info)
 {
-	return rtnl_dereference(*chain_info->pprev);
+	return tcf_chain_dereference(*chain_info->pprev, chain);
 }
 
 static void tcf_chain_tp_insert(struct tcf_chain *chain,
@@ -1508,7 +1532,7 @@ static void tcf_chain_tp_insert(struct tcf_chain *chain,
 {
 	if (*chain_info->pprev == chain->filter_chain)
 		tcf_chain0_head_change(chain, tp);
-	RCU_INIT_POINTER(tp->next, tcf_chain_tp_prev(chain_info));
+	RCU_INIT_POINTER(tp->next, tcf_chain_tp_prev(chain, chain_info));
 	rcu_assign_pointer(*chain_info->pprev, tp);
 	tcf_chain_hold(chain);
 }
@@ -1517,7 +1541,7 @@ static void tcf_chain_tp_remove(struct tcf_chain *chain,
 				struct tcf_chain_info *chain_info,
 				struct tcf_proto *tp)
 {
-	struct tcf_proto *next = rtnl_dereference(chain_info->next);
+	struct tcf_proto *next = tcf_chain_dereference(chain_info->next, chain);
 
 	if (tp == chain->filter_chain)
 		tcf_chain0_head_change(chain, next);
@@ -1534,7 +1558,8 @@ static struct tcf_proto *tcf_chain_tp_find(struct tcf_chain *chain,
 
 	/* Check the chain for existence of proto-tcf with this priority */
 	for (pprev = &chain->filter_chain;
-	     (tp = rtnl_dereference(*pprev)); pprev = &tp->next) {
+	     (tp = tcf_chain_dereference(*pprev, chain));
+	     pprev = &tp->next) {
 		if (tp->prio >= prio) {
 			if (tp->prio == prio) {
 				if (prio_allocate ||
@@ -1742,12 +1767,13 @@ static int tc_new_tfilter(struct sk_buff *skb, struct nlmsghdr *n,
 		goto errout;
 	}
 
+	spin_lock(&chain->filter_chain_lock);
 	tp = tcf_chain_tp_find(chain, &chain_info, protocol,
 			       prio, prio_allocate);
 	if (IS_ERR(tp)) {
 		NL_SET_ERR_MSG(extack, "Filter with specified priority/protocol not found");
 		err = PTR_ERR(tp);
-		goto errout;
+		goto errout_locked;
 	}
 
 	if (tp == NULL) {
@@ -1756,29 +1782,37 @@ static int tc_new_tfilter(struct sk_buff *skb, struct nlmsghdr *n,
 		if (tca[TCA_KIND] == NULL || !protocol) {
 			NL_SET_ERR_MSG(extack, "Filter kind and protocol must be specified");
 			err = -EINVAL;
-			goto errout;
+			goto errout_locked;
 		}
 
 		if (!(n->nlmsg_flags & NLM_F_CREATE)) {
 			NL_SET_ERR_MSG(extack, "Need both RTM_NEWTFILTER and NLM_F_CREATE to create a new filter");
 			err = -ENOENT;
-			goto errout;
+			goto errout_locked;
 		}
 
 		if (prio_allocate)
-			prio = tcf_auto_prio(tcf_chain_tp_prev(&chain_info));
+			prio = tcf_auto_prio(tcf_chain_tp_prev(chain,
+							       &chain_info));
 
+		spin_unlock(&chain->filter_chain_lock);
 		tp = tcf_proto_create(nla_data(tca[TCA_KIND]),
 				      protocol, prio, chain, extack);
 		if (IS_ERR(tp)) {
 			err = PTR_ERR(tp);
 			goto errout;
 		}
+
+		spin_lock(&chain->filter_chain_lock);
+		tcf_chain_tp_insert(chain, &chain_info, tp);
+		spin_unlock(&chain->filter_chain_lock);
 		tp_created = 1;
 	} else if (tca[TCA_KIND] && nla_strcmp(tca[TCA_KIND], tp->ops->kind)) {
 		NL_SET_ERR_MSG(extack, "Specified filter kind does not match existing one");
 		err = -EINVAL;
-		goto errout;
+		goto errout_locked;
+	} else {
+		spin_unlock(&chain->filter_chain_lock);
 	}
 
 	fh = tp->ops->get(tp, t->tcm_handle);
@@ -1804,21 +1838,11 @@ static int tc_new_tfilter(struct sk_buff *skb, struct nlmsghdr *n,
 	err = tp->ops->change(net, skb, tp, cl, t->tcm_handle, tca, &fh,
 			      n->nlmsg_flags & NLM_F_CREATE ? TCA_ACT_NOREPLACE : TCA_ACT_REPLACE,
 			      extack);
-	if (err == 0) {
-		if (tp_created)
-			tcf_chain_tp_insert(chain, &chain_info, tp);
+	if (err == 0)
 		tfilter_notify(net, skb, n, tp, block, q, parent, fh,
 			       RTM_NEWTFILTER, false);
-	} else {
-		if (tp_created) {
-			/* tp wasn't inserted to chain tp list. Take reference
-			 * to chain manually for tcf_proto_destroy() to
-			 * release.
-			 */
-			tcf_chain_hold(chain);
-			tcf_proto_destroy(tp, NULL);
-		}
-	}
+	else if (tp_created)
+		tcf_proto_destroy(tp, NULL);
 
 errout:
 	if (chain)
@@ -1828,6 +1852,10 @@ static int tc_new_tfilter(struct sk_buff *skb, struct nlmsghdr *n,
 		/* Replay the request. */
 		goto replay;
 	return err;
+
+errout_locked:
+	spin_unlock(&chain->filter_chain_lock);
+	goto errout;
 }
 
 static int tc_del_tfilter(struct sk_buff *skb, struct nlmsghdr *n,
@@ -1903,31 +1931,34 @@ static int tc_del_tfilter(struct sk_buff *skb, struct nlmsghdr *n,
 		goto errout;
 	}
 
+	spin_lock(&chain->filter_chain_lock);
 	tp = tcf_chain_tp_find(chain, &chain_info, protocol,
 			       prio, false);
 	if (!tp || IS_ERR(tp)) {
 		NL_SET_ERR_MSG(extack, "Filter with specified priority/protocol not found");
 		err = tp ? PTR_ERR(tp) : -ENOENT;
-		goto errout;
+		goto errout_locked;
 	} else if (tca[TCA_KIND] && nla_strcmp(tca[TCA_KIND], tp->ops->kind)) {
 		NL_SET_ERR_MSG(extack, "Specified filter kind does not match existing one");
 		err = -EINVAL;
+		goto errout_locked;
+	} else if (t->tcm_handle == 0) {
+		tcf_chain_tp_remove(chain, &chain_info, tp);
+		spin_unlock(&chain->filter_chain_lock);
+
+		tfilter_notify(net, skb, n, tp, block, q, parent, fh,
+			       RTM_DELTFILTER, false);
+		tcf_proto_destroy(tp, extack);
+		err = 0;
 		goto errout;
 	}
+	spin_unlock(&chain->filter_chain_lock);
 
 	fh = tp->ops->get(tp, t->tcm_handle);
 
 	if (!fh) {
-		if (t->tcm_handle == 0) {
-			tcf_chain_tp_remove(chain, &chain_info, tp);
-			tfilter_notify(net, skb, n, tp, block, q, parent, fh,
-				       RTM_DELTFILTER, false);
-			tcf_proto_destroy(tp, extack);
-			err = 0;
-		} else {
-			NL_SET_ERR_MSG(extack, "Specified filter handle not found");
-			err = -ENOENT;
-		}
+		NL_SET_ERR_MSG(extack, "Specified filter handle not found");
+		err = -ENOENT;
 	} else {
 		bool last;
 
@@ -1937,7 +1968,10 @@ static int tc_del_tfilter(struct sk_buff *skb, struct nlmsghdr *n,
 		if (err)
 			goto errout;
 		if (last) {
+			spin_lock(&chain->filter_chain_lock);
 			tcf_chain_tp_remove(chain, &chain_info, tp);
+			spin_unlock(&chain->filter_chain_lock);
+
 			tcf_proto_destroy(tp, extack);
 		}
 	}
@@ -1947,6 +1981,10 @@ static int tc_del_tfilter(struct sk_buff *skb, struct nlmsghdr *n,
 		tcf_chain_put(chain);
 	tcf_block_release(q, block);
 	return err;
+
+errout_locked:
+	spin_unlock(&chain->filter_chain_lock);
+	goto errout;
 }
 
 static int tc_get_tfilter(struct sk_buff *skb, struct nlmsghdr *n,
@@ -2004,8 +2042,10 @@ static int tc_get_tfilter(struct sk_buff *skb, struct nlmsghdr *n,
 		goto errout;
 	}
 
+	spin_lock(&chain->filter_chain_lock);
 	tp = tcf_chain_tp_find(chain, &chain_info, protocol,
 			       prio, false);
+	spin_unlock(&chain->filter_chain_lock);
 	if (!tp || IS_ERR(tp)) {
 		NL_SET_ERR_MSG(extack, "Filter with specified priority/protocol not found");
 		err = tp ? PTR_ERR(tp) : -ENOENT;
-- 
2.7.5

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH net-next v2 08/17] net: sched: introduce reference counting for tcf proto
  2018-12-11 10:11 [PATCH net-next v2 00/17] Refactor classifier API to work with chain/classifiers without rtnl lock Vlad Buslov
                   ` (6 preceding siblings ...)
  2018-12-11 10:11 ` [PATCH net-next v2 07/17] net: sched: lock the chain when accessing filter_chain list Vlad Buslov
@ 2018-12-11 10:11 ` Vlad Buslov
  2018-12-11 10:11 ` [PATCH net-next v2 09/17] net: sched: traverse classifiers in chain with tcf_get_next_proto() Vlad Buslov
                   ` (8 subsequent siblings)
  16 siblings, 0 replies; 30+ messages in thread
From: Vlad Buslov @ 2018-12-11 10:11 UTC (permalink / raw)
  To: netdev; +Cc: jhs, xiyou.wangcong, jiri, davem, ast, daniel, Vlad Buslov

Add a reference counter to tcf proto. Use it to manage tcf proto life
cycle in cls API.

Implement helper get/put functions for tcf proto and use them to modify
cls API to always take a reference to a tcf proto while using it. This
change allows protos to be modified concurrently, instead of relying on
the rtnl lock for protection.
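
The resulting usage pattern is roughly the following (a simplified sketch;
error handling omitted):

	/* tcf_chain_tp_find() now returns a referenced tp */
	tp = tcf_chain_tp_find(chain, &chain_info, protocol, prio, false);
	if (tp && !IS_ERR(tp)) {
		/* ... use tp ... */
		tcf_proto_put(tp, NULL);	/* last put schedules destroy */
	}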

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
---
 include/net/sch_generic.h |  1 +
 net/sched/cls_api.c       | 44 +++++++++++++++++++++++++++++++++++++-------
 2 files changed, 38 insertions(+), 7 deletions(-)

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 03c0f8325091..a6dade2c22c8 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -327,6 +327,7 @@ struct tcf_proto {
 	void			*data;
 	const struct tcf_proto_ops	*ops;
 	struct tcf_chain	*chain;
+	refcount_t		refcnt;
 	struct rcu_head		rcu;
 	struct work_struct	work;
 };
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index 9a77ef0c18fa..9a09926538ef 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -198,6 +198,7 @@ static struct tcf_proto *tcf_proto_create(const char *kind, u32 protocol,
 	tp->prio = prio;
 	tp->chain = chain;
 	INIT_WORK(&tp->work, tcf_proto_destroy_work);
+	refcount_set(&tp->refcnt, 1);
 
 	err = tp->ops->init(tp);
 	if (err) {
@@ -211,12 +212,24 @@ static struct tcf_proto *tcf_proto_create(const char *kind, u32 protocol,
 	return ERR_PTR(err);
 }
 
+static void tcf_proto_get(struct tcf_proto *tp)
+{
+	refcount_inc(&tp->refcnt);
+}
+
 static void tcf_proto_destroy(struct tcf_proto *tp,
 			      struct netlink_ext_ack *extack)
 {
 	tc_queue_proto_work(&tp->work);
 }
 
+static void tcf_proto_put(struct tcf_proto *tp,
+			  struct netlink_ext_ack *extack)
+{
+	if (refcount_dec_and_test(&tp->refcnt))
+		tcf_proto_destroy(tp, extack);
+}
+
 #define ASSERT_BLOCK_LOCKED(block)					\
 	lockdep_assert_held(&(block)->lock)
 
@@ -458,13 +471,13 @@ static void tcf_chain_flush(struct tcf_chain *chain)
 
 	spin_lock(&chain->filter_chain_lock);
 	tp = tcf_chain_dereference(chain->filter_chain, chain);
+	RCU_INIT_POINTER(chain->filter_chain, NULL);
 	tcf_chain0_head_change(chain, NULL);
 	spin_unlock(&chain->filter_chain_lock);
 
 	while (tp) {
-		RCU_INIT_POINTER(chain->filter_chain, tp->next);
-		tcf_proto_destroy(tp, NULL);
-		tp = rtnl_dereference(chain->filter_chain);
+		struct tcf_proto *tp_next = rcu_dereference_protected(tp->next, 1);
+
+		tcf_proto_put(tp, NULL);
+		tp = tp_next;
 	}
 }
 
@@ -1532,9 +1545,9 @@ static void tcf_chain_tp_insert(struct tcf_chain *chain,
 {
 	if (*chain_info->pprev == chain->filter_chain)
 		tcf_chain0_head_change(chain, tp);
+	tcf_proto_get(tp);
 	RCU_INIT_POINTER(tp->next, tcf_chain_tp_prev(chain, chain_info));
 	rcu_assign_pointer(*chain_info->pprev, tp);
-	tcf_chain_hold(chain);
 }
 
 static void tcf_chain_tp_remove(struct tcf_chain *chain,
@@ -1572,7 +1585,12 @@ static struct tcf_proto *tcf_chain_tp_find(struct tcf_chain *chain,
 		}
 	}
 	chain_info->pprev = pprev;
-	chain_info->next = tp ? tp->next : NULL;
+	if (tp) {
+		chain_info->next = tp->next;
+		tcf_proto_get(tp);
+	} else {
+		chain_info->next = NULL;
+	}
 	return tp;
 }
 
@@ -1847,6 +1865,12 @@ static int tc_new_tfilter(struct sk_buff *skb, struct nlmsghdr *n,
 errout:
 	if (chain)
 		tcf_chain_put(chain);
+	if (chain) {
+		if (tp && !IS_ERR(tp))
+			tcf_proto_put(tp, NULL);
+		if (!tp_created)
+			tcf_chain_put(chain);
+	}
 	tcf_block_release(q, block);
 	if (err == -EAGAIN)
 		/* Replay the request. */
@@ -1977,8 +2001,11 @@ static int tc_del_tfilter(struct sk_buff *skb, struct nlmsghdr *n,
 	}
 
 errout:
-	if (chain)
+	if (chain) {
+		if (tp && !IS_ERR(tp))
+			tcf_proto_put(tp, NULL);
 		tcf_chain_put(chain);
+	}
 	tcf_block_release(q, block);
 	return err;
 
@@ -2069,8 +2096,11 @@ static int tc_get_tfilter(struct sk_buff *skb, struct nlmsghdr *n,
 	}
 
 errout:
-	if (chain)
+	if (chain) {
+		if (tp && !IS_ERR(tp))
+			tcf_proto_put(tp, NULL);
 		tcf_chain_put(chain);
+	}
 	tcf_block_release(q, block);
 	return err;
 }
-- 
2.7.5

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH net-next v2 09/17] net: sched: traverse classifiers in chain with tcf_get_next_proto()
  2018-12-11 10:11 [PATCH net-next v2 00/17] Refactor classifier API to work with chain/classifiers without rtnl lock Vlad Buslov
                   ` (7 preceding siblings ...)
  2018-12-11 10:11 ` [PATCH net-next v2 08/17] net: sched: introduce reference counting for tcf proto Vlad Buslov
@ 2018-12-11 10:11 ` Vlad Buslov
  2018-12-14 10:15   ` kbuild test robot
  2018-12-11 10:11 ` [PATCH net-next v2 10/17] net: sched: refactor tp insert/delete for concurrent execution Vlad Buslov
                   ` (7 subsequent siblings)
  16 siblings, 1 reply; 30+ messages in thread
From: Vlad Buslov @ 2018-12-11 10:11 UTC (permalink / raw)
  To: netdev; +Cc: jhs, xiyou.wangcong, jiri, davem, ast, daniel, Vlad Buslov

All users of chain->filter_chain rely on rtnl lock and assume that no new
classifier instances are added while the list is being traversed. Use
tcf_get_next_proto() to traverse the filters list without relying on the
rtnl mutex. This function iterates over classifiers by taking a reference
only to the classifier at the current iterator position, and doesn't
assume external synchronization of the filters list.
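
The iteration pattern for converted callers (see the sch_api.c hunk below)
is roughly:

	for (tp = tcf_get_next_proto(chain, NULL); tp;
	     tp = tcf_get_next_proto(chain, tp)) {
		/* tp is referenced here; the reference is released by the
		 * next tcf_get_next_proto() call
		 */
	}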

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
---
 include/net/pkt_cls.h |  2 ++
 net/sched/cls_api.c   | 70 +++++++++++++++++++++++++++++++++++++++++++--------
 net/sched/sch_api.c   |  4 +--
 3 files changed, 64 insertions(+), 12 deletions(-)

diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
index cde724c46ab5..411ac4334942 100644
--- a/include/net/pkt_cls.h
+++ b/include/net/pkt_cls.h
@@ -45,6 +45,8 @@ struct tcf_chain *tcf_chain_get_by_act(struct tcf_block *block,
 void tcf_chain_put_by_act(struct tcf_chain *chain);
 struct tcf_chain *tcf_get_next_chain(struct tcf_block *block,
 				     struct tcf_chain *chain);
+struct tcf_proto *tcf_get_next_proto(struct tcf_chain *chain,
+				     struct tcf_proto *tp);
 void tcf_block_netif_keep_dst(struct tcf_block *block);
 int tcf_block_get(struct tcf_block **p_block,
 		  struct tcf_proto __rcu **p_filter_chain, struct Qdisc *q,
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index 9a09926538ef..d5026525b94c 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -985,6 +985,45 @@ tcf_get_next_chain(struct tcf_block *block, struct tcf_chain *chain)
 }
 EXPORT_SYMBOL(tcf_get_next_chain);
 
+static struct tcf_proto *
+__tcf_get_next_proto(struct tcf_chain *chain, struct tcf_proto *tp)
+{
+	ASSERT_RTNL();
+	spin_lock(&chain->filter_chain_lock);
+
+	if (!tp)
+		tp = tcf_chain_dereference(chain->filter_chain, chain);
+	else
+		tp = tcf_chain_dereference(tp->next, chain);
+
+	if (tp)
+		tcf_proto_get(tp);
+
+	spin_unlock(&chain->filter_chain_lock);
+
+	return tp;
+}
+
+/* Function to be used by all clients that want to iterate over all tp's on
+ * chain. Users of this function must be tolerant to concurrent tp
+ * insertion/deletion or ensure that no concurrent chain modification is
+ * possible. Note that netlink dump callbacks cannot guarantee a consistent
+ * dump because the rtnl lock is released each time the skb is filled with
+ * data and sent to user-space.
+ */
+
+struct tcf_proto *
+tcf_get_next_proto(struct tcf_chain *chain, struct tcf_proto *tp)
+{
+	struct tcf_proto *tp_next = __tcf_get_next_proto(chain, tp);
+
+	if (tp)
+		tcf_proto_put(tp, NULL);
+
+	return tp_next;
+}
+EXPORT_SYMBOL(tcf_get_next_proto);
+
 static void tcf_block_flush_all_chains(struct tcf_block *block)
 {
 	struct tcf_chain *chain;
@@ -1357,7 +1396,7 @@ tcf_block_playback_offloads(struct tcf_block *block, tc_setup_cb_t *cb,
 			    struct netlink_ext_ack *extack)
 {
 	struct tcf_chain *chain, *chain_prev;
-	struct tcf_proto *tp;
+	struct tcf_proto *tp, *tp_prev;
 	int err;
 
 	for (chain = __tcf_get_next_chain(block, NULL);
@@ -1365,8 +1404,10 @@ tcf_block_playback_offloads(struct tcf_block *block, tc_setup_cb_t *cb,
 	     chain_prev = chain,
 		     chain = __tcf_get_next_chain(block, chain),
 		     tcf_chain_put(chain_prev)) {
-		for (tp = rtnl_dereference(chain->filter_chain); tp;
-		     tp = rtnl_dereference(tp->next)) {
+		for (tp = __tcf_get_next_proto(chain, NULL); tp;
+		     tp_prev = tp,
+			     tp = __tcf_get_next_proto(chain, tp),
+			     tcf_proto_put(tp_prev, NULL)) {
 			if (tp->ops->reoffload) {
 				err = tp->ops->reoffload(tp, add, cb, cb_priv,
 							 extack);
@@ -1383,6 +1424,7 @@ tcf_block_playback_offloads(struct tcf_block *block, tc_setup_cb_t *cb,
 	return 0;
 
 err_playback_remove:
+	tcf_proto_put(tp, NULL);
 	tcf_chain_put(chain);
 	tcf_block_playback_offloads(block, cb, cb_priv, false, offload_in_use,
 				    extack);
@@ -1706,8 +1748,8 @@ static void tfilter_notify_chain(struct net *net, struct sk_buff *oskb,
 {
 	struct tcf_proto *tp;
 
-	for (tp = rtnl_dereference(chain->filter_chain);
-	     tp; tp = rtnl_dereference(tp->next))
+	for (tp = tcf_get_next_proto(chain, NULL);
+	     tp; tp = tcf_get_next_proto(chain, tp))
 		tfilter_notify(net, oskb, n, tp, block,
 			       q, parent, NULL, event, false);
 }
@@ -2132,11 +2174,15 @@ static bool tcf_chain_dump(struct tcf_chain *chain, struct Qdisc *q, u32 parent,
 	struct net *net = sock_net(skb->sk);
 	struct tcf_block *block = chain->block;
 	struct tcmsg *tcm = nlmsg_data(cb->nlh);
+	struct tcf_proto *tp, *tp_prev;
 	struct tcf_dump_args arg;
-	struct tcf_proto *tp;
 
-	for (tp = rtnl_dereference(chain->filter_chain);
-	     tp; tp = rtnl_dereference(tp->next), (*p_index)++) {
+	for (tp = __tcf_get_next_proto(chain, NULL);
+	     tp;
+	     tp_prev = tp,
+		     tp = __tcf_get_next_proto(chain, tp),
+		     tcf_proto_put(tp_prev, NULL),
+		     (*p_index)++) {
 		if (*p_index < index_start)
 			continue;
 		if (TC_H_MAJ(tcm->tcm_info) &&
@@ -2153,7 +2199,7 @@ static bool tcf_chain_dump(struct tcf_chain *chain, struct Qdisc *q, u32 parent,
 					  NETLINK_CB(cb->skb).portid,
 					  cb->nlh->nlmsg_seq, NLM_F_MULTI,
 					  RTM_NEWTFILTER) <= 0)
-				return false;
+				goto errout;
 
 			cb->args[1] = 1;
 		}
@@ -2173,9 +2219,13 @@ static bool tcf_chain_dump(struct tcf_chain *chain, struct Qdisc *q, u32 parent,
 		cb->args[2] = arg.w.cookie;
 		cb->args[1] = arg.w.count + 1;
 		if (arg.w.stop)
-			return false;
+			goto errout;
 	}
 	return true;
+
+errout:
+	tcf_proto_put(tp, NULL);
+	return false;
 }
 
 /* called with RTNL */
diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
index f80fd2d1b7e6..f472d3cf1ccb 100644
--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@ -1917,8 +1917,8 @@ static void tc_bind_tclass(struct Qdisc *q, u32 portid, u32 clid,
 	     chain = tcf_get_next_chain(block, chain)) {
 		struct tcf_proto *tp;
 
-		for (tp = rtnl_dereference(chain->filter_chain);
-		     tp; tp = rtnl_dereference(tp->next)) {
+		for (tp = tcf_get_next_proto(chain, NULL);
+		     tp; tp = tcf_get_next_proto(chain, tp)) {
 			struct tcf_bind_args arg = {};
 
 			arg.w.fn = tcf_node_bind;
-- 
2.7.5

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH net-next v2 10/17] net: sched: refactor tp insert/delete for concurrent execution
  2018-12-11 10:11 [PATCH net-next v2 00/17] Refactor classifier API to work with chain/classifiers without rtnl lock Vlad Buslov
                   ` (8 preceding siblings ...)
  2018-12-11 10:11 ` [PATCH net-next v2 09/17] net: sched: traverse classifiers in chain with tcf_get_next_proto() Vlad Buslov
@ 2018-12-11 10:11 ` Vlad Buslov
  2018-12-11 10:11 ` [PATCH net-next v2 11/17] net: sched: prevent insertion of new classifiers during chain flush Vlad Buslov
                   ` (6 subsequent siblings)
  16 siblings, 0 replies; 30+ messages in thread
From: Vlad Buslov @ 2018-12-11 10:11 UTC (permalink / raw)
  To: netdev; +Cc: jhs, xiyou.wangcong, jiri, davem, ast, daniel, Vlad Buslov

Implement unique insertion function to atomically attach tcf proto to
chain after verifying that no other tcf proto with the specified priority
exists. Implement delete function that verifies that tp is actually empty
before deleting it. Use these functions to refactor cls API to account for
concurrent tp and rule updates instead of relying on rtnl lock. Add new
'deleting' flag to tcf proto. Use it to restart the search when iterating
over tp's on chain, to prevent accessing a potentially invalid tp->next
pointer.

Extend tcf proto with a spinlock that is intended to be used to protect
its data from concurrent modification instead of relying on rtnl mutex.
Use it to protect the 'deleting' flag. Add lockdep macros to validate that
the lock is held when accessing protected fields.
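
As a sketch of the intended behavior, a concurrent attempt to create two
protos with the same priority now resolves to a single instance:

	tp = tcf_chain_tp_insert_unique(chain, tp_new, protocol, prio);
	/* if a proto with this prio was inserted concurrently, tp_new is
	 * freed and tp points to the existing (referenced) instance
	 */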

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
---
Changes from V1 to V2:
  - Always set 'tp_created' flag when creating tp to prevent releasing
    the chain twice when a tp with the same priority was inserted
    concurrently.

 include/net/sch_generic.h |  18 +++++
 net/sched/cls_api.c       | 178 +++++++++++++++++++++++++++++++++++++++-------
 2 files changed, 171 insertions(+), 25 deletions(-)

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index a6dade2c22c8..53e5d5515f2a 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -327,6 +327,11 @@ struct tcf_proto {
 	void			*data;
 	const struct tcf_proto_ops	*ops;
 	struct tcf_chain	*chain;
+	/* Lock protects tcf_proto shared state and can be used by unlocked
+	 * classifiers to protect their private data.
+	 */
+	spinlock_t		lock;
+	bool			deleting;
 	refcount_t		refcnt;
 	struct rcu_head		rcu;
 	struct work_struct	work;
@@ -388,16 +393,29 @@ static inline bool lockdep_tcf_chain_is_locked(struct tcf_chain *chain)
 {
 	return lockdep_is_held(&chain->filter_chain_lock);
 }
+
+static inline bool lockdep_tcf_proto_is_locked(struct tcf_proto *tp)
+{
+	return lockdep_is_held(&tp->lock);
+}
 #else
 static inline bool lockdep_tcf_chain_is_locked(struct tcf_chain *chain)
 {
 	return true;
 }
+
+static inline bool lockdep_tcf_proto_is_locked(struct tcf_proto *tp)
+{
+	return true;
+}
 #endif /* #ifdef CONFIG_PROVE_LOCKING */
 
 #define tcf_chain_dereference(p, chain)					\
 	rcu_dereference_protected(p, lockdep_tcf_chain_is_locked(chain))
 
+#define tcf_proto_dereference(p, tp)					\
+	rcu_dereference_protected(p, lockdep_tcf_proto_is_locked(tp))
+
 static inline void tcf_block_offload_inc(struct tcf_block *block, u32 *flags)
 {
 	if (*flags & TCA_CLS_FLAGS_IN_HW)
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index d5026525b94c..ecb83a304dfc 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -198,6 +198,7 @@ static struct tcf_proto *tcf_proto_create(const char *kind, u32 protocol,
 	tp->prio = prio;
 	tp->chain = chain;
 	INIT_WORK(&tp->work, tcf_proto_destroy_work);
+	spin_lock_init(&tp->lock);
 	refcount_set(&tp->refcnt, 1);
 
 	err = tp->ops->init(tp);
@@ -230,6 +231,49 @@ static void tcf_proto_put(struct tcf_proto *tp,
 		tcf_proto_destroy(tp, extack);
 }
 
+static int walker_noop(struct tcf_proto *tp, void *d, struct tcf_walker *arg)
+{
+	return -1;
+}
+
+static bool tcf_proto_is_empty(struct tcf_proto *tp)
+{
+	struct tcf_walker walker = { .fn = walker_noop, };
+
+	if (tp->ops->walk) {
+		tp->ops->walk(tp, &walker);
+		return !walker.stop;
+	}
+	return true;
+}
+
+static bool tcf_proto_check_delete(struct tcf_proto *tp)
+{
+	spin_lock(&tp->lock);
+	if (tcf_proto_is_empty(tp))
+		tp->deleting = true;
+	spin_unlock(&tp->lock);
+	return tp->deleting;
+}
+
+static void tcf_proto_mark_delete(struct tcf_proto *tp)
+{
+	spin_lock(&tp->lock);
+	tp->deleting = true;
+	spin_unlock(&tp->lock);
+}
+
+static bool tcf_proto_is_deleting(struct tcf_proto *tp)
+{
+	bool deleting;
+
+	spin_lock(&tp->lock);
+	deleting = tp->deleting;
+	spin_unlock(&tp->lock);
+
+	return deleting;
+}
+
 #define ASSERT_BLOCK_LOCKED(block)					\
 	lockdep_assert_held(&(block)->lock)
 
@@ -988,13 +1032,27 @@ EXPORT_SYMBOL(tcf_get_next_chain);
 static struct tcf_proto *
 __tcf_get_next_proto(struct tcf_chain *chain, struct tcf_proto *tp)
 {
+	u32 prio = 0;
+
 	ASSERT_RTNL();
 	spin_lock(&chain->filter_chain_lock);
 
-	if (!tp)
+	if (!tp) {
 		tp = tcf_chain_dereference(chain->filter_chain, chain);
-	else
+	} else if (tcf_proto_is_deleting(tp)) {
+		/* 'deleting' flag is set and chain->filter_chain_lock was
+		 * unlocked, which means next pointer could be invalid. Restart
+		 * search.
+		 */
+		prio = tp->prio + 1;
+		tp = tcf_chain_dereference(chain->filter_chain, chain);
+
+		for (; tp; tp = tcf_chain_dereference(tp->next, chain))
+			if (!tp->deleting && tp->prio >= prio)
+				break;
+	} else {
 		tp = tcf_chain_dereference(tp->next, chain);
+	}
 
 	if (tp)
 		tcf_proto_get(tp);
@@ -1598,6 +1656,7 @@ static void tcf_chain_tp_remove(struct tcf_chain *chain,
 {
 	struct tcf_proto *next = tcf_chain_dereference(chain_info->next, chain);
 
+	tcf_proto_mark_delete(tp);
 	if (tp == chain->filter_chain)
 		tcf_chain0_head_change(chain, next);
 	RCU_INIT_POINTER(*chain_info->pprev, next);
@@ -1606,6 +1665,79 @@ static void tcf_chain_tp_remove(struct tcf_chain *chain,
 static struct tcf_proto *tcf_chain_tp_find(struct tcf_chain *chain,
 					   struct tcf_chain_info *chain_info,
 					   u32 protocol, u32 prio,
+					   bool prio_allocate);
+
+/* Try to insert new proto.
+ * If proto with specified priority already exists, free new proto
+ * and return existing one.
+ */
+
+static struct tcf_proto *tcf_chain_tp_insert_unique(struct tcf_chain *chain,
+						    struct tcf_proto *tp_new,
+						    u32 protocol, u32 prio)
+{
+	struct tcf_chain_info chain_info;
+	struct tcf_proto *tp;
+
+	spin_lock(&chain->filter_chain_lock);
+
+	tp = tcf_chain_tp_find(chain, &chain_info,
+			       protocol, prio, false);
+	if (!tp)
+		tcf_chain_tp_insert(chain, &chain_info, tp_new);
+	spin_unlock(&chain->filter_chain_lock);
+
+	if (tp) {
+		tcf_proto_destroy(tp_new, NULL);
+		tp_new = tp;
+	}
+
+	return tp_new;
+}
+
+static void tcf_chain_tp_delete_empty(struct tcf_chain *chain,
+				      struct tcf_proto *tp,
+				      struct netlink_ext_ack *extack)
+{
+	struct tcf_chain_info chain_info;
+	struct tcf_proto *tp_iter;
+	struct tcf_proto **pprev;
+	struct tcf_proto *next;
+
+	spin_lock(&chain->filter_chain_lock);
+
+	/* Atomically find and remove tp from chain. */
+	for (pprev = &chain->filter_chain;
+	     (tp_iter = tcf_chain_dereference(*pprev, chain));
+	     pprev = &tp_iter->next) {
+		if (tp_iter == tp) {
+			chain_info.pprev = pprev;
+			chain_info.next = tp_iter->next;
+			WARN_ON(tp_iter->deleting);
+			break;
+		}
+	}
+	/* Verify that tp still exists and no new filters were inserted
+	 * concurrently.
+	 * Mark tp for deletion if it is empty.
+	 */
+	if (!tp_iter || !tcf_proto_check_delete(tp)) {
+		spin_unlock(&chain->filter_chain_lock);
+		return;
+	}
+
+	next = tcf_chain_dereference(chain_info.next, chain);
+	if (tp == chain->filter_chain)
+		tcf_chain0_head_change(chain, next);
+	RCU_INIT_POINTER(*chain_info.pprev, next);
+	spin_unlock(&chain->filter_chain_lock);
+
+	tcf_proto_put(tp, extack);
+}
+
+static struct tcf_proto *tcf_chain_tp_find(struct tcf_chain *chain,
+					   struct tcf_chain_info *chain_info,
+					   u32 protocol, u32 prio,
 					   bool prio_allocate)
 {
 	struct tcf_proto **pprev;
@@ -1790,6 +1922,7 @@ static int tc_new_tfilter(struct sk_buff *skb, struct nlmsghdr *n,
 	prio = TC_H_MAJ(t->tcm_info);
 	prio_allocate = false;
 	parent = t->tcm_parent;
+	tp = NULL;
 	cl = 0;
 
 	if (prio == 0) {
@@ -1837,6 +1970,8 @@ static int tc_new_tfilter(struct sk_buff *skb, struct nlmsghdr *n,
 	}
 
 	if (tp == NULL) {
+		struct tcf_proto *tp_new = NULL;
+
 		/* Proto-tcf does not exist, create new one */
 
 		if (tca[TCA_KIND] == NULL || !protocol) {
@@ -1856,25 +1991,25 @@ static int tc_new_tfilter(struct sk_buff *skb, struct nlmsghdr *n,
 							       &chain_info));
 
 		spin_unlock(&chain->filter_chain_lock);
-		tp = tcf_proto_create(nla_data(tca[TCA_KIND]),
-				      protocol, prio, chain, extack);
-		if (IS_ERR(tp)) {
-			err = PTR_ERR(tp);
+		tp_new = tcf_proto_create(nla_data(tca[TCA_KIND]),
+					  protocol, prio, chain, extack);
+		if (IS_ERR(tp_new)) {
+			err = PTR_ERR(tp_new);
 			goto errout;
 		}
 
-		spin_lock(&chain->filter_chain_lock);
-		tcf_chain_tp_insert(chain, &chain_info, tp);
-		spin_unlock(&chain->filter_chain_lock);
 		tp_created = 1;
-	} else if (tca[TCA_KIND] && nla_strcmp(tca[TCA_KIND], tp->ops->kind)) {
-		NL_SET_ERR_MSG(extack, "Specified filter kind does not match existing one");
-		err = -EINVAL;
-		goto errout_locked;
+		tp = tcf_chain_tp_insert_unique(chain, tp_new, protocol, prio);
 	} else {
 		spin_unlock(&chain->filter_chain_lock);
 	}
 
+	if (tca[TCA_KIND] && nla_strcmp(tca[TCA_KIND], tp->ops->kind)) {
+		NL_SET_ERR_MSG(extack, "Specified filter kind does not match existing one");
+		err = -EINVAL;
+		goto errout;
+	}
+
 	fh = tp->ops->get(tp, t->tcm_handle);
 
 	if (!fh) {
@@ -1901,12 +2036,10 @@ static int tc_new_tfilter(struct sk_buff *skb, struct nlmsghdr *n,
 	if (err == 0)
 		tfilter_notify(net, skb, n, tp, block, q, parent, fh,
 			       RTM_NEWTFILTER, false);
-	else if (tp_created)
-		tcf_proto_destroy(tp, NULL);
 
 errout:
-	if (chain)
-		tcf_chain_put(chain);
+	if (err && tp_created)
+		tcf_chain_tp_delete_empty(chain, tp, NULL);
 	if (chain) {
 		if (tp && !IS_ERR(tp))
 			tcf_proto_put(tp, NULL);
@@ -2012,9 +2145,9 @@ static int tc_del_tfilter(struct sk_buff *skb, struct nlmsghdr *n,
 		tcf_chain_tp_remove(chain, &chain_info, tp);
 		spin_unlock(&chain->filter_chain_lock);
 
+		tcf_proto_put(tp, NULL);
 		tfilter_notify(net, skb, n, tp, block, q, parent, fh,
 			       RTM_DELTFILTER, false);
-		tcf_proto_destroy(tp, extack);
 		err = 0;
 		goto errout;
 	}
@@ -2033,13 +2166,8 @@ static int tc_del_tfilter(struct sk_buff *skb, struct nlmsghdr *n,
 					 extack);
 		if (err)
 			goto errout;
-		if (last) {
-			spin_lock(&chain->filter_chain_lock);
-			tcf_chain_tp_remove(chain, &chain_info, tp);
-			spin_unlock(&chain->filter_chain_lock);
-
-			tcf_proto_destroy(tp, extack);
-		}
+		if (last)
+			tcf_chain_tp_delete_empty(chain, tp, extack);
 	}
 
 errout:
-- 
2.7.5

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH net-next v2 11/17] net: sched: prevent insertion of new classifiers during chain flush
  2018-12-11 10:11 [PATCH net-next v2 00/17] Refactor classifier API to work with chain/classifiers without rtnl lock Vlad Buslov
                   ` (9 preceding siblings ...)
  2018-12-11 10:11 ` [PATCH net-next v2 10/17] net: sched: refactor tp insert/delete for concurrent execution Vlad Buslov
@ 2018-12-11 10:11 ` Vlad Buslov
  2018-12-11 10:11 ` [PATCH net-next v2 12/17] net: sched: track rtnl lock status when validating extensions Vlad Buslov
                   ` (5 subsequent siblings)
  16 siblings, 0 replies; 30+ messages in thread
From: Vlad Buslov @ 2018-12-11 10:11 UTC (permalink / raw)
  To: netdev; +Cc: jhs, xiyou.wangcong, jiri, davem, ast, daniel, Vlad Buslov

Extend tcf_chain with a 'flushing' flag. Use the flag to prevent insertion
of new classifier instances while a chain flush is in progress, in order
to avoid a resource leak when concurrent unlocked users create tcf_proto
instances.

Return an EAGAIN error from tcf_chain_tp_insert_unique() to restart
tc_new_tfilter() and look up the chain/proto again.
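
A minimal sketch of the resulting check in tc_new_tfilter():

	if (chain->flushing) {		/* tested under filter_chain_lock */
		err = -EAGAIN;
		goto errout_locked;	/* request is replayed via 'replay' */
	}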

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
---
Changes from V1 to V2:
  - Add an additional check to prevent creation of a new proto instance
    when the parent chain is being flushed, to reduce CPU usage.
  - Don't call tcf_chain_tp_delete_empty() if tp insertion failed.

 include/net/sch_generic.h |  1 +
 net/sched/cls_api.c       | 35 +++++++++++++++++++++++++++++------
 2 files changed, 30 insertions(+), 6 deletions(-)

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 53e5d5515f2a..51f0c812667b 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -362,6 +362,7 @@ struct tcf_chain {
 	unsigned int refcnt;
 	unsigned int action_refcnt;
 	bool explicitly_created;
+	bool flushing;
 	const struct tcf_proto_ops *tmplt_ops;
 	void *tmplt_priv;
 };
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index ecb83a304dfc..bbcac772cc21 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -483,9 +483,12 @@ static void __tcf_chain_put(struct tcf_chain *chain, bool by_act,
 	spin_unlock(&block->lock);
 
 	/* The last dropped non-action reference will trigger notification. */
-	if (is_last && !by_act)
+	if (is_last && !by_act) {
 		tc_chain_notify_delete(tmplt_ops, tmplt_priv, chain_index,
 				       block, NULL, 0, 0, false);
+		/* Last reference to chain, no need to lock. */
+		chain->flushing = false;
+	}
 
 	if (refcnt == 0) {
 		tc_chain_tmplt_del(tmplt_ops, tmplt_priv);
@@ -517,6 +520,7 @@ static void tcf_chain_flush(struct tcf_chain *chain)
 	tp = tcf_chain_dereference(chain->filter_chain, chain);
 	RCU_INIT_POINTER(chain->filter_chain, NULL);
 	tcf_chain0_head_change(chain, NULL);
+	chain->flushing = true;
 	spin_unlock(&chain->filter_chain_lock);
 
 	while (tp) {
@@ -1639,15 +1643,20 @@ static struct tcf_proto *tcf_chain_tp_prev(struct tcf_chain *chain,
 	return tcf_chain_dereference(*chain_info->pprev, chain);
 }
 
-static void tcf_chain_tp_insert(struct tcf_chain *chain,
-				struct tcf_chain_info *chain_info,
-				struct tcf_proto *tp)
+static int tcf_chain_tp_insert(struct tcf_chain *chain,
+			       struct tcf_chain_info *chain_info,
+			       struct tcf_proto *tp)
 {
+	if (chain->flushing)
+		return -EAGAIN;
+
 	if (*chain_info->pprev == chain->filter_chain)
 		tcf_chain0_head_change(chain, tp);
 	tcf_proto_get(tp);
 	RCU_INIT_POINTER(tp->next, tcf_chain_tp_prev(chain, chain_info));
 	rcu_assign_pointer(*chain_info->pprev, tp);
+
+	return 0;
 }
 
 static void tcf_chain_tp_remove(struct tcf_chain *chain,
@@ -1678,18 +1687,22 @@ static struct tcf_proto *tcf_chain_tp_insert_unique(struct tcf_chain *chain,
 {
 	struct tcf_chain_info chain_info;
 	struct tcf_proto *tp;
+	int err = 0;
 
 	spin_lock(&chain->filter_chain_lock);
 
 	tp = tcf_chain_tp_find(chain, &chain_info,
 			       protocol, prio, false);
 	if (!tp)
-		tcf_chain_tp_insert(chain, &chain_info, tp_new);
+		err = tcf_chain_tp_insert(chain, &chain_info, tp_new);
 	spin_unlock(&chain->filter_chain_lock);
 
 	if (tp) {
 		tcf_proto_destroy(tp_new, NULL);
 		tp_new = tp;
+	} else if (err) {
+		tcf_proto_destroy(tp_new, NULL);
+		tp_new = ERR_PTR(err);
 	}
 
 	return tp_new;
@@ -1972,6 +1985,11 @@ static int tc_new_tfilter(struct sk_buff *skb, struct nlmsghdr *n,
 	if (tp == NULL) {
 		struct tcf_proto *tp_new = NULL;
 
+		if (chain->flushing) {
+			err = -EAGAIN;
+			goto errout_locked;
+		}
+
 		/* Proto-tcf does not exist, create new one */
 
 		if (tca[TCA_KIND] == NULL || !protocol) {
@@ -1995,11 +2013,15 @@ static int tc_new_tfilter(struct sk_buff *skb, struct nlmsghdr *n,
 					  protocol, prio, chain, extack);
 		if (IS_ERR(tp_new)) {
 			err = PTR_ERR(tp_new);
-			goto errout;
+			goto errout_tp;
 		}
 
 		tp_created = 1;
 		tp = tcf_chain_tp_insert_unique(chain, tp_new, protocol, prio);
+		if (IS_ERR(tp)) {
+			err = PTR_ERR(tp);
+			goto errout_tp;
+		}
 	} else {
 		spin_unlock(&chain->filter_chain_lock);
 	}
@@ -2040,6 +2062,7 @@ static int tc_new_tfilter(struct sk_buff *skb, struct nlmsghdr *n,
 errout:
 	if (err && tp_created)
 		tcf_chain_tp_delete_empty(chain, tp, NULL);
+errout_tp:
 	if (chain) {
 		if (tp && !IS_ERR(tp))
 			tcf_proto_put(tp, NULL);
-- 
2.7.5

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH net-next v2 12/17] net: sched: track rtnl lock status when validating extensions
  2018-12-11 10:11 [PATCH net-next v2 00/17] Refactor classifier API to work with chain/classifiers without rtnl lock Vlad Buslov
                   ` (10 preceding siblings ...)
  2018-12-11 10:11 ` [PATCH net-next v2 11/17] net: sched: prevent insertion of new classifiers during chain flush Vlad Buslov
@ 2018-12-11 10:11 ` Vlad Buslov
  2018-12-11 10:11 ` [PATCH net-next v2 13/17] net: sched: extend proto ops with 'put' callback Vlad Buslov
                   ` (4 subsequent siblings)
  16 siblings, 0 replies; 30+ messages in thread
From: Vlad Buslov @ 2018-12-11 10:11 UTC (permalink / raw)
  To: netdev; +Cc: jhs, xiyou.wangcong, jiri, davem, ast, daniel, Vlad Buslov

Actions API is already updated to not rely on rtnl lock for
synchronization. However, it needs to be provided with the rtnl lock
status when called from classifiers API in order to correctly release the
lock when loading a kernel module.

Extend extension validation function with 'rtnl_held' flag which is passed
to actions API. Add new 'rtnl_held' parameter to tcf_exts_validate() in
cls API. No classifier is currently updated to support unlocked execution,
so pass a hardcoded 'true' flag parameter value.
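
Converted call sites therefore just pass true for now, e.g. in cls_basic
(taken from the hunk below):

	err = tcf_exts_validate(net, tp, tb, est, &f->exts, ovr, true, extack);

A classifier converted to unlocked execution would forward its actual rtnl
lock status here instead of the hardcoded true.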

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
---
 include/net/pkt_cls.h    | 2 +-
 net/sched/cls_api.c      | 9 +++++----
 net/sched/cls_basic.c    | 2 +-
 net/sched/cls_bpf.c      | 3 ++-
 net/sched/cls_cgroup.c   | 2 +-
 net/sched/cls_flow.c     | 2 +-
 net/sched/cls_flower.c   | 3 ++-
 net/sched/cls_fw.c       | 2 +-
 net/sched/cls_matchall.c | 3 ++-
 net/sched/cls_route.c    | 2 +-
 net/sched/cls_rsvp.h     | 3 ++-
 net/sched/cls_tcindex.c  | 2 +-
 net/sched/cls_u32.c      | 2 +-
 13 files changed, 21 insertions(+), 16 deletions(-)

diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
index 411ac4334942..5c3cd5b96ab3 100644
--- a/include/net/pkt_cls.h
+++ b/include/net/pkt_cls.h
@@ -415,7 +415,7 @@ tcf_exts_exec(struct sk_buff *skb, struct tcf_exts *exts,
 
 int tcf_exts_validate(struct net *net, struct tcf_proto *tp,
 		      struct nlattr **tb, struct nlattr *rate_tlv,
-		      struct tcf_exts *exts, bool ovr,
+		      struct tcf_exts *exts, bool ovr, bool rtnl_held,
 		      struct netlink_ext_ack *extack);
 void tcf_exts_destroy(struct tcf_exts *exts);
 void tcf_exts_change(struct tcf_exts *dst, struct tcf_exts *src);
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index bbcac772cc21..bc8d1d1600a4 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -2871,7 +2871,7 @@ EXPORT_SYMBOL(tcf_exts_destroy);
 
 int tcf_exts_validate(struct net *net, struct tcf_proto *tp, struct nlattr **tb,
 		      struct nlattr *rate_tlv, struct tcf_exts *exts, bool ovr,
-		      struct netlink_ext_ack *extack)
+		      bool rtnl_held, struct netlink_ext_ack *extack)
 {
 #ifdef CONFIG_NET_CLS_ACT
 	{
@@ -2881,7 +2881,8 @@ int tcf_exts_validate(struct net *net, struct tcf_proto *tp, struct nlattr **tb,
 		if (exts->police && tb[exts->police]) {
 			act = tcf_action_init_1(net, tp, tb[exts->police],
 						rate_tlv, "police", ovr,
-						TCA_ACT_BIND, true, extack);
+						TCA_ACT_BIND, rtnl_held,
+						extack);
 			if (IS_ERR(act))
 				return PTR_ERR(act);
 
@@ -2893,8 +2894,8 @@ int tcf_exts_validate(struct net *net, struct tcf_proto *tp, struct nlattr **tb,
 
 			err = tcf_action_init(net, tp, tb[exts->action],
 					      rate_tlv, NULL, ovr, TCA_ACT_BIND,
-					      exts->actions, &attr_size, true,
-					      extack);
+					      exts->actions, &attr_size,
+					      rtnl_held, extack);
 			if (err < 0)
 				return err;
 			exts->nr_actions = err;
diff --git a/net/sched/cls_basic.c b/net/sched/cls_basic.c
index 6a5dce8baf19..90e44888f85d 100644
--- a/net/sched/cls_basic.c
+++ b/net/sched/cls_basic.c
@@ -148,7 +148,7 @@ static int basic_set_parms(struct net *net, struct tcf_proto *tp,
 {
 	int err;
 
-	err = tcf_exts_validate(net, tp, tb, est, &f->exts, ovr, extack);
+	err = tcf_exts_validate(net, tp, tb, est, &f->exts, ovr, true, extack);
 	if (err < 0)
 		return err;
 
diff --git a/net/sched/cls_bpf.c b/net/sched/cls_bpf.c
index fa6fe2fe0f32..9c0ac7c23ad7 100644
--- a/net/sched/cls_bpf.c
+++ b/net/sched/cls_bpf.c
@@ -417,7 +417,8 @@ static int cls_bpf_set_parms(struct net *net, struct tcf_proto *tp,
 	if ((!is_bpf && !is_ebpf) || (is_bpf && is_ebpf))
 		return -EINVAL;
 
-	ret = tcf_exts_validate(net, tp, tb, est, &prog->exts, ovr, extack);
+	ret = tcf_exts_validate(net, tp, tb, est, &prog->exts, ovr, true,
+				extack);
 	if (ret < 0)
 		return ret;
 
diff --git a/net/sched/cls_cgroup.c b/net/sched/cls_cgroup.c
index 3bc01bdde165..663ee1c6d606 100644
--- a/net/sched/cls_cgroup.c
+++ b/net/sched/cls_cgroup.c
@@ -110,7 +110,7 @@ static int cls_cgroup_change(struct net *net, struct sk_buff *in_skb,
 		goto errout;
 
 	err = tcf_exts_validate(net, tp, tb, tca[TCA_RATE], &new->exts, ovr,
-				extack);
+				true, extack);
 	if (err < 0)
 		goto errout;
 
diff --git a/net/sched/cls_flow.c b/net/sched/cls_flow.c
index 2bb043cd436b..39a6407d4832 100644
--- a/net/sched/cls_flow.c
+++ b/net/sched/cls_flow.c
@@ -445,7 +445,7 @@ static int flow_change(struct net *net, struct sk_buff *in_skb,
 		goto err2;
 
 	err = tcf_exts_validate(net, tp, tb, tca[TCA_RATE], &fnew->exts, ovr,
-				extack);
+				true, extack);
 	if (err < 0)
 		goto err2;
 
diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
index 85e9f8e1da10..c4e28e5c8582 100644
--- a/net/sched/cls_flower.c
+++ b/net/sched/cls_flower.c
@@ -1261,7 +1261,8 @@ static int fl_set_parms(struct net *net, struct tcf_proto *tp,
 {
 	int err;
 
-	err = tcf_exts_validate(net, tp, tb, est, &f->exts, ovr, extack);
+	err = tcf_exts_validate(net, tp, tb, est, &f->exts, ovr, true,
+				extack);
 	if (err < 0)
 		return err;
 
diff --git a/net/sched/cls_fw.c b/net/sched/cls_fw.c
index 29eeeaf3ea44..c8173ebb69f2 100644
--- a/net/sched/cls_fw.c
+++ b/net/sched/cls_fw.c
@@ -217,7 +217,7 @@ static int fw_set_parms(struct net *net, struct tcf_proto *tp,
 	int err;
 
 	err = tcf_exts_validate(net, tp, tb, tca[TCA_RATE], &f->exts, ovr,
-				extack);
+				true, extack);
 	if (err < 0)
 		return err;
 
diff --git a/net/sched/cls_matchall.c b/net/sched/cls_matchall.c
index 856fa79d4ffd..5621c38a303a 100644
--- a/net/sched/cls_matchall.c
+++ b/net/sched/cls_matchall.c
@@ -142,7 +142,8 @@ static int mall_set_parms(struct net *net, struct tcf_proto *tp,
 {
 	int err;
 
-	err = tcf_exts_validate(net, tp, tb, est, &head->exts, ovr, extack);
+	err = tcf_exts_validate(net, tp, tb, est, &head->exts, ovr, true,
+				extack);
 	if (err < 0)
 		return err;
 
diff --git a/net/sched/cls_route.c b/net/sched/cls_route.c
index 0404aa5fa7cb..44b26038c4c4 100644
--- a/net/sched/cls_route.c
+++ b/net/sched/cls_route.c
@@ -393,7 +393,7 @@ static int route4_set_parms(struct net *net, struct tcf_proto *tp,
 	struct route4_bucket *b;
 	int err;
 
-	err = tcf_exts_validate(net, tp, tb, est, &f->exts, ovr, extack);
+	err = tcf_exts_validate(net, tp, tb, est, &f->exts, ovr, true, extack);
 	if (err < 0)
 		return err;
 
diff --git a/net/sched/cls_rsvp.h b/net/sched/cls_rsvp.h
index e9ccf7daea7d..9dd9530e6a52 100644
--- a/net/sched/cls_rsvp.h
+++ b/net/sched/cls_rsvp.h
@@ -502,7 +502,8 @@ static int rsvp_change(struct net *net, struct sk_buff *in_skb,
 	err = tcf_exts_init(&e, TCA_RSVP_ACT, TCA_RSVP_POLICE);
 	if (err < 0)
 		return err;
-	err = tcf_exts_validate(net, tp, tb, tca[TCA_RATE], &e, ovr, extack);
+	err = tcf_exts_validate(net, tp, tb, tca[TCA_RATE], &e, ovr, true,
+				extack);
 	if (err < 0)
 		goto errout2;
 
diff --git a/net/sched/cls_tcindex.c b/net/sched/cls_tcindex.c
index 9ccc93f257db..b7dc667b6ec0 100644
--- a/net/sched/cls_tcindex.c
+++ b/net/sched/cls_tcindex.c
@@ -314,7 +314,7 @@ tcindex_set_parms(struct net *net, struct tcf_proto *tp, unsigned long base,
 	err = tcf_exts_init(&e, TCA_TCINDEX_ACT, TCA_TCINDEX_POLICE);
 	if (err < 0)
 		return err;
-	err = tcf_exts_validate(net, tp, tb, est, &e, ovr, extack);
+	err = tcf_exts_validate(net, tp, tb, est, &e, ovr, true, extack);
 	if (err < 0)
 		goto errout;
 
diff --git a/net/sched/cls_u32.c b/net/sched/cls_u32.c
index 4c54bc440798..6305b6758b98 100644
--- a/net/sched/cls_u32.c
+++ b/net/sched/cls_u32.c
@@ -726,7 +726,7 @@ static int u32_set_parms(struct net *net, struct tcf_proto *tp,
 {
 	int err;
 
-	err = tcf_exts_validate(net, tp, tb, est, &n->exts, ovr, extack);
+	err = tcf_exts_validate(net, tp, tb, est, &n->exts, ovr, true, extack);
 	if (err < 0)
 		return err;
 
-- 
2.7.5

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH net-next v2 13/17] net: sched: extend proto ops with 'put' callback
  2018-12-11 10:11 [PATCH net-next v2 00/17] Refactor classifier API to work with chain/classifiers without rtnl lock Vlad Buslov
                   ` (11 preceding siblings ...)
  2018-12-11 10:11 ` [PATCH net-next v2 12/17] net: sched: track rtnl lock status when validating extensions Vlad Buslov
@ 2018-12-11 10:11 ` Vlad Buslov
  2018-12-11 10:11 ` [PATCH net-next v2 14/17] net: sched: extend proto ops to support unlocked classifiers Vlad Buslov
                   ` (3 subsequent siblings)
  16 siblings, 0 replies; 30+ messages in thread
From: Vlad Buslov @ 2018-12-11 10:11 UTC (permalink / raw)
  To: netdev; +Cc: jhs, xiyou.wangcong, jiri, davem, ast, daniel, Vlad Buslov

Add an optional tp->ops->put() API to be implemented for filter reference
counting. This new function is called by cls API to release the filter
reference for filters returned by the tp->ops->change() or tp->ops->get()
functions. Implement a tfilter_put() helper that calls tp->ops->put() only
for classifiers that implement it.
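
A hypothetical classifier that reference-counts its filters might
implement the callback roughly as follows (names and the
__foo_filter_put() helper are illustrative only, not part of this series):

	static void foo_put(struct tcf_proto *tp, void *arg)
	{
		struct cls_foo_filter *f = arg;

		/* release the reference taken by ->get() or ->change() */
		__foo_filter_put(f);
	}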

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
---
 include/net/sch_generic.h |  1 +
 net/sched/cls_api.c       | 12 +++++++++++-
 2 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 51f0c812667b..f2e765efa38f 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -282,6 +282,7 @@ struct tcf_proto_ops {
 					   struct netlink_ext_ack *extack);
 
 	void*			(*get)(struct tcf_proto*, u32 handle);
+	void			(*put)(struct tcf_proto *tp, void *f);
 	int			(*change)(struct net *net, struct sk_buff *,
 					struct tcf_proto*, unsigned long,
 					u32 handle, struct nlattr **,
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index bc8d1d1600a4..2f181ca69a41 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -1899,6 +1899,12 @@ static void tfilter_notify_chain(struct net *net, struct sk_buff *oskb,
 			       q, parent, NULL, event, false);
 }
 
+static void tfilter_put(struct tcf_proto *tp, void *fh)
+{
+	if (tp->ops->put && fh)
+		tp->ops->put(tp, fh);
+}
+
 static int tc_new_tfilter(struct sk_buff *skb, struct nlmsghdr *n,
 			  struct netlink_ext_ack *extack)
 {
@@ -2041,6 +2047,7 @@ static int tc_new_tfilter(struct sk_buff *skb, struct nlmsghdr *n,
 			goto errout;
 		}
 	} else if (n->nlmsg_flags & NLM_F_EXCL) {
+		tfilter_put(tp, fh);
 		NL_SET_ERR_MSG(extack, "Filter already exists");
 		err = -EEXIST;
 		goto errout;
@@ -2055,9 +2062,11 @@ static int tc_new_tfilter(struct sk_buff *skb, struct nlmsghdr *n,
 	err = tp->ops->change(net, skb, tp, cl, t->tcm_handle, tca, &fh,
 			      n->nlmsg_flags & NLM_F_CREATE ? TCA_ACT_NOREPLACE : TCA_ACT_REPLACE,
 			      extack);
-	if (err == 0)
+	if (err == 0) {
 		tfilter_notify(net, skb, n, tp, block, q, parent, fh,
 			       RTM_NEWTFILTER, false);
+		tfilter_put(tp, fh);
+	}
 
 errout:
 	if (err && tp_created)
@@ -2288,6 +2297,7 @@ static int tc_get_tfilter(struct sk_buff *skb, struct nlmsghdr *n,
 			NL_SET_ERR_MSG(extack, "Failed to send filter notify message");
 	}
 
+	tfilter_put(tp, fh);
 errout:
 	if (chain) {
 		if (tp && !IS_ERR(tp))
-- 
2.7.5

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH net-next v2 14/17] net: sched: extend proto ops to support unlocked classifiers
  2018-12-11 10:11 [PATCH net-next v2 00/17] Refactor classifier API to work with chain/classifiers without rtnl lock Vlad Buslov
                   ` (12 preceding siblings ...)
  2018-12-11 10:11 ` [PATCH net-next v2 13/17] net: sched: extend proto ops with 'put' callback Vlad Buslov
@ 2018-12-11 10:11 ` Vlad Buslov
  2018-12-11 10:11 ` [PATCH net-next v2 15/17] net: sched: add flags to Qdisc class ops struct Vlad Buslov
                   ` (2 subsequent siblings)
  16 siblings, 0 replies; 30+ messages in thread
From: Vlad Buslov @ 2018-12-11 10:11 UTC (permalink / raw)
  To: netdev; +Cc: jhs, xiyou.wangcong, jiri, davem, ast, daniel, Vlad Buslov

Add 'rtnl_held' flag to tcf proto change, delete, destroy, dump, and walk
functions to track rtnl lock status. This allows classifiers to release
the rtnl lock when necessary and to pass the rtnl lock status to
extensions and driver offload callbacks.

Add a flags field to tcf proto ops, and a flag value to indicate that a
classifier doesn't require the rtnl lock.
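
A classifier that is safe to run without rtnl would then advertise it in
its ops, e.g. (hypothetical "foo" classifier, shown only to illustrate the
flag):

	static struct tcf_proto_ops cls_foo_ops __read_mostly = {
		.kind		= "foo",
		/* ... */
		.flags		= TCF_PROTO_OPS_DOIT_UNLOCKED,
		.owner		= THIS_MODULE,
	};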

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
---
 include/net/sch_generic.h | 17 +++++++---
 net/sched/cls_api.c       | 84 +++++++++++++++++++++++++++--------------------
 net/sched/cls_basic.c     | 12 ++++---
 net/sched/cls_bpf.c       | 12 ++++---
 net/sched/cls_cgroup.c    | 11 ++++---
 net/sched/cls_flow.c      | 13 +++++---
 net/sched/cls_flower.c    | 13 +++++---
 net/sched/cls_fw.c        | 13 +++++---
 net/sched/cls_matchall.c  | 13 +++++---
 net/sched/cls_route.c     | 12 ++++---
 net/sched/cls_rsvp.h      | 13 +++++---
 net/sched/cls_tcindex.c   | 15 +++++----
 net/sched/cls_u32.c       | 12 ++++---
 net/sched/sch_api.c       |  2 +-
 14 files changed, 144 insertions(+), 98 deletions(-)

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index f2e765efa38f..0bb69fefb188 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -278,7 +278,7 @@ struct tcf_proto_ops {
 					    const struct tcf_proto *,
 					    struct tcf_result *);
 	int			(*init)(struct tcf_proto*);
-	void			(*destroy)(struct tcf_proto *tp,
+	void			(*destroy)(struct tcf_proto *tp, bool rtnl_held,
 					   struct netlink_ext_ack *extack);
 
 	void*			(*get)(struct tcf_proto*, u32 handle);
@@ -286,12 +286,13 @@ struct tcf_proto_ops {
 	int			(*change)(struct net *net, struct sk_buff *,
 					struct tcf_proto*, unsigned long,
 					u32 handle, struct nlattr **,
-					void **, bool,
+					void **, bool, bool,
 					struct netlink_ext_ack *);
 	int			(*delete)(struct tcf_proto *tp, void *arg,
-					  bool *last,
+					  bool *last, bool rtnl_held,
 					  struct netlink_ext_ack *);
-	void			(*walk)(struct tcf_proto*, struct tcf_walker *arg);
+	void			(*walk)(struct tcf_proto *tp,
+					struct tcf_walker *arg, bool rtnl_held);
 	int			(*reoffload)(struct tcf_proto *tp, bool add,
 					     tc_setup_cb_t *cb, void *cb_priv,
 					     struct netlink_ext_ack *extack);
@@ -304,12 +305,18 @@ struct tcf_proto_ops {
 
 	/* rtnetlink specific */
 	int			(*dump)(struct net*, struct tcf_proto*, void *,
-					struct sk_buff *skb, struct tcmsg*);
+					struct sk_buff *skb, struct tcmsg*,
+					bool);
 	int			(*tmplt_dump)(struct sk_buff *skb,
 					      struct net *net,
 					      void *tmplt_priv);
 
 	struct module		*owner;
+	int			flags;
+};
+
+enum tcf_proto_ops_flags {
+	TCF_PROTO_OPS_DOIT_UNLOCKED = 1,
 };
 
 struct tcf_proto {
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index 2f181ca69a41..c63673b7500c 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -61,7 +61,8 @@ static const struct tcf_proto_ops *__tcf_proto_lookup_ops(const char *kind)
 }
 
 static const struct tcf_proto_ops *
-tcf_proto_lookup_ops(const char *kind, struct netlink_ext_ack *extack)
+tcf_proto_lookup_ops(const char *kind, bool rtnl_held,
+		     struct netlink_ext_ack *extack)
 {
 	const struct tcf_proto_ops *ops;
 
@@ -69,9 +70,11 @@ tcf_proto_lookup_ops(const char *kind, struct netlink_ext_ack *extack)
 	if (ops)
 		return ops;
 #ifdef CONFIG_MODULES
-	rtnl_unlock();
+	if (rtnl_held)
+		rtnl_unlock();
 	request_module("cls_%s", kind);
-	rtnl_lock();
+	if (rtnl_held)
+		rtnl_lock();
 	ops = __tcf_proto_lookup_ops(kind);
 	/* We dropped the RTNL semaphore in order to perform
 	 * the module load. So, even if we succeeded in loading
@@ -169,7 +172,7 @@ static void tcf_proto_destroy_work(struct work_struct *work)
 
 	rtnl_lock();
 
-	tp->ops->destroy(tp, NULL);
+	tp->ops->destroy(tp, true, NULL);
 	module_put(tp->ops->owner);
 	kfree_rcu(tp, rcu);
 	tcf_chain_put(chain);
@@ -179,6 +182,7 @@ static void tcf_proto_destroy_work(struct work_struct *work)
 
 static struct tcf_proto *tcf_proto_create(const char *kind, u32 protocol,
 					  u32 prio, struct tcf_chain *chain,
+					  bool rtnl_held,
 					  struct netlink_ext_ack *extack)
 {
 	struct tcf_proto *tp;
@@ -188,7 +192,7 @@ static struct tcf_proto *tcf_proto_create(const char *kind, u32 protocol,
 	if (!tp)
 		return ERR_PTR(-ENOBUFS);
 
-	tp->ops = tcf_proto_lookup_ops(kind, extack);
+	tp->ops = tcf_proto_lookup_ops(kind, rtnl_held, extack);
 	if (IS_ERR(tp->ops)) {
 		err = PTR_ERR(tp->ops);
 		goto errout;
@@ -236,21 +240,21 @@ static int walker_noop(struct tcf_proto *tp, void *d, struct tcf_walker *arg)
 	return -1;
 }
 
-static bool tcf_proto_is_empty(struct tcf_proto *tp)
+static bool tcf_proto_is_empty(struct tcf_proto *tp, bool rtnl_held)
 {
 	struct tcf_walker walker = { .fn = walker_noop, };
 
 	if (tp->ops->walk) {
-		tp->ops->walk(tp, &walker);
+		tp->ops->walk(tp, &walker, rtnl_held);
 		return !walker.stop;
 	}
 	return true;
 }
 
-static bool tcf_proto_check_delete(struct tcf_proto *tp)
+static bool tcf_proto_check_delete(struct tcf_proto *tp, bool rtnl_held)
 {
 	spin_lock(&tp->lock);
-	if (tcf_proto_is_empty(tp))
+	if (tcf_proto_is_empty(tp, rtnl_held))
 		tp->deleting = true;
 	spin_unlock(&tp->lock);
 	return tp->deleting;
@@ -1709,7 +1713,7 @@ static struct tcf_proto *tcf_chain_tp_insert_unique(struct tcf_chain *chain,
 }
 
 static void tcf_chain_tp_delete_empty(struct tcf_chain *chain,
-				      struct tcf_proto *tp,
+				      struct tcf_proto *tp, bool rtnl_held,
 				      struct netlink_ext_ack *extack)
 {
 	struct tcf_chain_info chain_info;
@@ -1734,7 +1738,7 @@ static void tcf_chain_tp_delete_empty(struct tcf_chain *chain,
 	 * concurrently.
 	 * Mark tp for deletion if it is empty.
 	 */
-	if (!tp_iter || !tcf_proto_check_delete(tp)) {
+	if (!tp_iter || !tcf_proto_check_delete(tp, rtnl_held)) {
 		spin_unlock(&chain->filter_chain_lock);
 		return;
 	}
@@ -1784,7 +1788,8 @@ static struct tcf_proto *tcf_chain_tp_find(struct tcf_chain *chain,
 static int tcf_fill_node(struct net *net, struct sk_buff *skb,
 			 struct tcf_proto *tp, struct tcf_block *block,
 			 struct Qdisc *q, u32 parent, void *fh,
-			 u32 portid, u32 seq, u16 flags, int event)
+			 u32 portid, u32 seq, u16 flags, int event,
+			 bool rtnl_held)
 {
 	struct tcmsg *tcm;
 	struct nlmsghdr  *nlh;
@@ -1812,7 +1817,8 @@ static int tcf_fill_node(struct net *net, struct sk_buff *skb,
 	if (!fh) {
 		tcm->tcm_handle = 0;
 	} else {
-		if (tp->ops->dump && tp->ops->dump(net, tp, fh, skb, tcm) < 0)
+		if (tp->ops->dump &&
+		    tp->ops->dump(net, tp, fh, skb, tcm, rtnl_held) < 0)
 			goto nla_put_failure;
 	}
 	nlh->nlmsg_len = skb_tail_pointer(skb) - b;
@@ -1827,7 +1833,8 @@ static int tcf_fill_node(struct net *net, struct sk_buff *skb,
 static int tfilter_notify(struct net *net, struct sk_buff *oskb,
 			  struct nlmsghdr *n, struct tcf_proto *tp,
 			  struct tcf_block *block, struct Qdisc *q,
-			  u32 parent, void *fh, int event, bool unicast)
+			  u32 parent, void *fh, int event, bool unicast,
+			  bool rtnl_held)
 {
 	struct sk_buff *skb;
 	u32 portid = oskb ? NETLINK_CB(oskb).portid : 0;
@@ -1837,7 +1844,8 @@ static int tfilter_notify(struct net *net, struct sk_buff *oskb,
 		return -ENOBUFS;
 
 	if (tcf_fill_node(net, skb, tp, block, q, parent, fh, portid,
-			  n->nlmsg_seq, n->nlmsg_flags, event) <= 0) {
+			  n->nlmsg_seq, n->nlmsg_flags, event,
+			  rtnl_held) <= 0) {
 		kfree_skb(skb);
 		return -EINVAL;
 	}
@@ -1853,7 +1861,7 @@ static int tfilter_del_notify(struct net *net, struct sk_buff *oskb,
 			      struct nlmsghdr *n, struct tcf_proto *tp,
 			      struct tcf_block *block, struct Qdisc *q,
 			      u32 parent, void *fh, bool unicast, bool *last,
-			      struct netlink_ext_ack *extack)
+			      bool rtnl_held, struct netlink_ext_ack *extack)
 {
 	struct sk_buff *skb;
 	u32 portid = oskb ? NETLINK_CB(oskb).portid : 0;
@@ -1864,13 +1872,14 @@ static int tfilter_del_notify(struct net *net, struct sk_buff *oskb,
 		return -ENOBUFS;
 
 	if (tcf_fill_node(net, skb, tp, block, q, parent, fh, portid,
-			  n->nlmsg_seq, n->nlmsg_flags, RTM_DELTFILTER) <= 0) {
+			  n->nlmsg_seq, n->nlmsg_flags, RTM_DELTFILTER,
+			  rtnl_held) <= 0) {
 		NL_SET_ERR_MSG(extack, "Failed to build del event notification");
 		kfree_skb(skb);
 		return -EINVAL;
 	}
 
-	err = tp->ops->delete(tp, fh, last, extack);
+	err = tp->ops->delete(tp, fh, last, rtnl_held, extack);
 	if (err) {
 		kfree_skb(skb);
 		return err;
@@ -1889,14 +1898,15 @@ static int tfilter_del_notify(struct net *net, struct sk_buff *oskb,
 static void tfilter_notify_chain(struct net *net, struct sk_buff *oskb,
 				 struct tcf_block *block, struct Qdisc *q,
 				 u32 parent, struct nlmsghdr *n,
-				 struct tcf_chain *chain, int event)
+				 struct tcf_chain *chain, int event,
+				 bool rtnl_held)
 {
 	struct tcf_proto *tp;
 
 	for (tp = tcf_get_next_proto(chain, NULL);
 	     tp; tp = tcf_get_next_proto(chain, tp))
 		tfilter_notify(net, oskb, n, tp, block,
-			       q, parent, NULL, event, false);
+			       q, parent, NULL, event, false, rtnl_held);
 }
 
 static void tfilter_put(struct tcf_proto *tp, void *fh)
@@ -1925,6 +1935,7 @@ static int tc_new_tfilter(struct sk_buff *skb, struct nlmsghdr *n,
 	void *fh;
 	int err;
 	int tp_created;
+	bool rtnl_held = true;
 
 	if (!netlink_ns_capable(skb, net->user_ns, CAP_NET_ADMIN))
 		return -EPERM;
@@ -2016,7 +2027,8 @@ static int tc_new_tfilter(struct sk_buff *skb, struct nlmsghdr *n,
 
 		spin_unlock(&chain->filter_chain_lock);
 		tp_new = tcf_proto_create(nla_data(tca[TCA_KIND]),
-					  protocol, prio, chain, extack);
+					  protocol, prio, chain, rtnl_held,
+					  extack);
 		if (IS_ERR(tp_new)) {
 			err = PTR_ERR(tp_new);
 			goto errout_tp;
@@ -2061,16 +2073,16 @@ static int tc_new_tfilter(struct sk_buff *skb, struct nlmsghdr *n,
 
 	err = tp->ops->change(net, skb, tp, cl, t->tcm_handle, tca, &fh,
 			      n->nlmsg_flags & NLM_F_CREATE ? TCA_ACT_NOREPLACE : TCA_ACT_REPLACE,
-			      extack);
+			      rtnl_held, extack);
 	if (err == 0) {
 		tfilter_notify(net, skb, n, tp, block, q, parent, fh,
-			       RTM_NEWTFILTER, false);
+			       RTM_NEWTFILTER, false, rtnl_held);
 		tfilter_put(tp, fh);
 	}
 
 errout:
 	if (err && tp_created)
-		tcf_chain_tp_delete_empty(chain, tp, NULL);
+		tcf_chain_tp_delete_empty(chain, tp, rtnl_held, NULL);
 errout_tp:
 	if (chain) {
 		if (tp && !IS_ERR(tp))
@@ -2107,6 +2119,7 @@ static int tc_del_tfilter(struct sk_buff *skb, struct nlmsghdr *n,
 	unsigned long cl = 0;
 	void *fh = NULL;
 	int err;
+	bool rtnl_held = true;
 
 	if (!netlink_ns_capable(skb, net->user_ns, CAP_NET_ADMIN))
 		return -EPERM;
@@ -2156,7 +2169,7 @@ static int tc_del_tfilter(struct sk_buff *skb, struct nlmsghdr *n,
 
 	if (prio == 0) {
 		tfilter_notify_chain(net, skb, block, q, parent, n,
-				     chain, RTM_DELTFILTER);
+				     chain, RTM_DELTFILTER, rtnl_held);
 		tcf_chain_flush(chain);
 		err = 0;
 		goto errout;
@@ -2179,7 +2192,7 @@ static int tc_del_tfilter(struct sk_buff *skb, struct nlmsghdr *n,
 
 		tcf_proto_put(tp, NULL);
 		tfilter_notify(net, skb, n, tp, block, q, parent, fh,
-			       RTM_DELTFILTER, false);
+			       RTM_DELTFILTER, false, rtnl_held);
 		err = 0;
 		goto errout;
 	}
@@ -2195,11 +2208,12 @@ static int tc_del_tfilter(struct sk_buff *skb, struct nlmsghdr *n,
 
 		err = tfilter_del_notify(net, skb, n, tp, block,
 					 q, parent, fh, false, &last,
-					 extack);
+					 rtnl_held, extack);
+
 		if (err)
 			goto errout;
 		if (last)
-			tcf_chain_tp_delete_empty(chain, tp, extack);
+			tcf_chain_tp_delete_empty(chain, tp, rtnl_held, extack);
 	}
 
 errout:
@@ -2234,6 +2248,7 @@ static int tc_get_tfilter(struct sk_buff *skb, struct nlmsghdr *n,
 	unsigned long cl = 0;
 	void *fh = NULL;
 	int err;
+	bool rtnl_held = true;
 
 	err = nlmsg_parse(n, sizeof(*t), tca, TCA_MAX, rtm_tca_policy, extack);
 	if (err < 0)
@@ -2292,7 +2307,7 @@ static int tc_get_tfilter(struct sk_buff *skb, struct nlmsghdr *n,
 		err = -ENOENT;
 	} else {
 		err = tfilter_notify(net, skb, n, tp, block, q, parent,
-				     fh, RTM_NEWTFILTER, true);
+				     fh, RTM_NEWTFILTER, true, rtnl_held);
 		if (err < 0)
 			NL_SET_ERR_MSG(extack, "Failed to send filter notify message");
 	}
@@ -2325,7 +2340,7 @@ static int tcf_node_dump(struct tcf_proto *tp, void *n, struct tcf_walker *arg)
 	return tcf_fill_node(net, a->skb, tp, a->block, a->q, a->parent,
 			     n, NETLINK_CB(a->cb->skb).portid,
 			     a->cb->nlh->nlmsg_seq, NLM_F_MULTI,
-			     RTM_NEWTFILTER);
+			     RTM_NEWTFILTER, true);
 }
 
 static bool tcf_chain_dump(struct tcf_chain *chain, struct Qdisc *q, u32 parent,
@@ -2359,9 +2374,8 @@ static bool tcf_chain_dump(struct tcf_chain *chain, struct Qdisc *q, u32 parent,
 			if (tcf_fill_node(net, skb, tp, block, q, parent, NULL,
 					  NETLINK_CB(cb->skb).portid,
 					  cb->nlh->nlmsg_seq, NLM_F_MULTI,
-					  RTM_NEWTFILTER) <= 0)
+					  RTM_NEWTFILTER, true) <= 0)
 				goto errout;
-
 			cb->args[1] = 1;
 		}
 		if (!tp->ops->walk)
@@ -2376,7 +2390,7 @@ static bool tcf_chain_dump(struct tcf_chain *chain, struct Qdisc *q, u32 parent,
 		arg.w.skip = cb->args[1] - 1;
 		arg.w.count = 0;
 		arg.w.cookie = cb->args[2];
-		tp->ops->walk(tp, &arg.w);
+		tp->ops->walk(tp, &arg.w, true);
 		cb->args[2] = arg.w.cookie;
 		cb->args[1] = arg.w.count + 1;
 		if (arg.w.stop)
@@ -2598,7 +2612,7 @@ static int tc_chain_tmplt_add(struct tcf_chain *chain, struct net *net,
 	if (!tca[TCA_KIND])
 		return 0;
 
-	ops = tcf_proto_lookup_ops(nla_data(tca[TCA_KIND]), extack);
+	ops = tcf_proto_lookup_ops(nla_data(tca[TCA_KIND]), true, extack);
 	if (IS_ERR(ops))
 		return PTR_ERR(ops);
 	if (!ops->tmplt_create || !ops->tmplt_destroy || !ops->tmplt_dump) {
@@ -2728,7 +2742,7 @@ static int tc_ctl_chain(struct sk_buff *skb, struct nlmsghdr *n,
 		break;
 	case RTM_DELCHAIN:
 		tfilter_notify_chain(net, skb, block, q, parent, n,
-				     chain, RTM_DELTFILTER);
+				     chain, RTM_DELTFILTER, true);
 		/* Flush the chain first as the user requested chain removal. */
 		tcf_chain_flush(chain);
 		/* In case the chain was successfully deleted, put a reference
diff --git a/net/sched/cls_basic.c b/net/sched/cls_basic.c
index 90e44888f85d..7bf9e4335d74 100644
--- a/net/sched/cls_basic.c
+++ b/net/sched/cls_basic.c
@@ -102,7 +102,8 @@ static void basic_delete_filter_work(struct work_struct *work)
 	rtnl_unlock();
 }
 
-static void basic_destroy(struct tcf_proto *tp, struct netlink_ext_ack *extack)
+static void basic_destroy(struct tcf_proto *tp, bool rtnl_held,
+			  struct netlink_ext_ack *extack)
 {
 	struct basic_head *head = rtnl_dereference(tp->root);
 	struct basic_filter *f, *n;
@@ -121,7 +122,7 @@ static void basic_destroy(struct tcf_proto *tp, struct netlink_ext_ack *extack)
 }
 
 static int basic_delete(struct tcf_proto *tp, void *arg, bool *last,
-			struct netlink_ext_ack *extack)
+			bool rtnl_held, struct netlink_ext_ack *extack)
 {
 	struct basic_head *head = rtnl_dereference(tp->root);
 	struct basic_filter *f = arg;
@@ -168,7 +169,7 @@ static int basic_set_parms(struct net *net, struct tcf_proto *tp,
 static int basic_change(struct net *net, struct sk_buff *in_skb,
 			struct tcf_proto *tp, unsigned long base, u32 handle,
 			struct nlattr **tca, void **arg, bool ovr,
-			struct netlink_ext_ack *extack)
+			bool rtnl_held, struct netlink_ext_ack *extack)
 {
 	int err;
 	struct basic_head *head = rtnl_dereference(tp->root);
@@ -236,7 +237,8 @@ static int basic_change(struct net *net, struct sk_buff *in_skb,
 	return err;
 }
 
-static void basic_walk(struct tcf_proto *tp, struct tcf_walker *arg)
+static void basic_walk(struct tcf_proto *tp, struct tcf_walker *arg,
+		       bool rtnl_held)
 {
 	struct basic_head *head = rtnl_dereference(tp->root);
 	struct basic_filter *f;
@@ -263,7 +265,7 @@ static void basic_bind_class(void *fh, u32 classid, unsigned long cl)
 }
 
 static int basic_dump(struct net *net, struct tcf_proto *tp, void *fh,
-		      struct sk_buff *skb, struct tcmsg *t)
+		      struct sk_buff *skb, struct tcmsg *t, bool rtnl_held)
 {
 	struct basic_filter *f = fh;
 	struct nlattr *nest;
diff --git a/net/sched/cls_bpf.c b/net/sched/cls_bpf.c
index 9c0ac7c23ad7..455bb0bca959 100644
--- a/net/sched/cls_bpf.c
+++ b/net/sched/cls_bpf.c
@@ -298,7 +298,7 @@ static void __cls_bpf_delete(struct tcf_proto *tp, struct cls_bpf_prog *prog,
 }
 
 static int cls_bpf_delete(struct tcf_proto *tp, void *arg, bool *last,
-			  struct netlink_ext_ack *extack)
+			  bool rtnl_held, struct netlink_ext_ack *extack)
 {
 	struct cls_bpf_head *head = rtnl_dereference(tp->root);
 
@@ -307,7 +307,7 @@ static int cls_bpf_delete(struct tcf_proto *tp, void *arg, bool *last,
 	return 0;
 }
 
-static void cls_bpf_destroy(struct tcf_proto *tp,
+static void cls_bpf_destroy(struct tcf_proto *tp, bool rtnl_held,
 			    struct netlink_ext_ack *extack)
 {
 	struct cls_bpf_head *head = rtnl_dereference(tp->root);
@@ -456,7 +456,8 @@ static int cls_bpf_set_parms(struct net *net, struct tcf_proto *tp,
 static int cls_bpf_change(struct net *net, struct sk_buff *in_skb,
 			  struct tcf_proto *tp, unsigned long base,
 			  u32 handle, struct nlattr **tca,
-			  void **arg, bool ovr, struct netlink_ext_ack *extack)
+			  void **arg, bool ovr, bool rtnl_held,
+			  struct netlink_ext_ack *extack)
 {
 	struct cls_bpf_head *head = rtnl_dereference(tp->root);
 	struct cls_bpf_prog *oldprog = *arg;
@@ -576,7 +577,7 @@ static int cls_bpf_dump_ebpf_info(const struct cls_bpf_prog *prog,
 }
 
 static int cls_bpf_dump(struct net *net, struct tcf_proto *tp, void *fh,
-			struct sk_buff *skb, struct tcmsg *tm)
+			struct sk_buff *skb, struct tcmsg *tm, bool rtnl_held)
 {
 	struct cls_bpf_prog *prog = fh;
 	struct nlattr *nest;
@@ -636,7 +637,8 @@ static void cls_bpf_bind_class(void *fh, u32 classid, unsigned long cl)
 		prog->res.class = cl;
 }
 
-static void cls_bpf_walk(struct tcf_proto *tp, struct tcf_walker *arg)
+static void cls_bpf_walk(struct tcf_proto *tp, struct tcf_walker *arg,
+			 bool rtnl_held)
 {
 	struct cls_bpf_head *head = rtnl_dereference(tp->root);
 	struct cls_bpf_prog *prog;
diff --git a/net/sched/cls_cgroup.c b/net/sched/cls_cgroup.c
index 663ee1c6d606..1cef3b416094 100644
--- a/net/sched/cls_cgroup.c
+++ b/net/sched/cls_cgroup.c
@@ -78,7 +78,7 @@ static void cls_cgroup_destroy_work(struct work_struct *work)
 static int cls_cgroup_change(struct net *net, struct sk_buff *in_skb,
 			     struct tcf_proto *tp, unsigned long base,
 			     u32 handle, struct nlattr **tca,
-			     void **arg, bool ovr,
+			     void **arg, bool ovr, bool rtnl_held,
 			     struct netlink_ext_ack *extack)
 {
 	struct nlattr *tb[TCA_CGROUP_MAX + 1];
@@ -130,7 +130,7 @@ static int cls_cgroup_change(struct net *net, struct sk_buff *in_skb,
 	return err;
 }
 
-static void cls_cgroup_destroy(struct tcf_proto *tp,
+static void cls_cgroup_destroy(struct tcf_proto *tp, bool rtnl_held,
 			       struct netlink_ext_ack *extack)
 {
 	struct cls_cgroup_head *head = rtnl_dereference(tp->root);
@@ -145,12 +145,13 @@ static void cls_cgroup_destroy(struct tcf_proto *tp,
 }
 
 static int cls_cgroup_delete(struct tcf_proto *tp, void *arg, bool *last,
-			     struct netlink_ext_ack *extack)
+			     bool rtnl_held, struct netlink_ext_ack *extack)
 {
 	return -EOPNOTSUPP;
 }
 
-static void cls_cgroup_walk(struct tcf_proto *tp, struct tcf_walker *arg)
+static void cls_cgroup_walk(struct tcf_proto *tp, struct tcf_walker *arg,
+			    bool rtnl_held)
 {
 	struct cls_cgroup_head *head = rtnl_dereference(tp->root);
 
@@ -166,7 +167,7 @@ static void cls_cgroup_walk(struct tcf_proto *tp, struct tcf_walker *arg)
 }
 
 static int cls_cgroup_dump(struct net *net, struct tcf_proto *tp, void *fh,
-			   struct sk_buff *skb, struct tcmsg *t)
+			   struct sk_buff *skb, struct tcmsg *t, bool rtnl_held)
 {
 	struct cls_cgroup_head *head = rtnl_dereference(tp->root);
 	struct nlattr *nest;
diff --git a/net/sched/cls_flow.c b/net/sched/cls_flow.c
index 39a6407d4832..204e2edae8d5 100644
--- a/net/sched/cls_flow.c
+++ b/net/sched/cls_flow.c
@@ -391,7 +391,8 @@ static void flow_destroy_filter_work(struct work_struct *work)
 static int flow_change(struct net *net, struct sk_buff *in_skb,
 		       struct tcf_proto *tp, unsigned long base,
 		       u32 handle, struct nlattr **tca,
-		       void **arg, bool ovr, struct netlink_ext_ack *extack)
+		       void **arg, bool ovr, bool rtnl_held,
+		       struct netlink_ext_ack *extack)
 {
 	struct flow_head *head = rtnl_dereference(tp->root);
 	struct flow_filter *fold, *fnew;
@@ -566,7 +567,7 @@ static int flow_change(struct net *net, struct sk_buff *in_skb,
 }
 
 static int flow_delete(struct tcf_proto *tp, void *arg, bool *last,
-		       struct netlink_ext_ack *extack)
+		       bool rtnl_held, struct netlink_ext_ack *extack)
 {
 	struct flow_head *head = rtnl_dereference(tp->root);
 	struct flow_filter *f = arg;
@@ -590,7 +591,8 @@ static int flow_init(struct tcf_proto *tp)
 	return 0;
 }
 
-static void flow_destroy(struct tcf_proto *tp, struct netlink_ext_ack *extack)
+static void flow_destroy(struct tcf_proto *tp, bool rtnl_held,
+			 struct netlink_ext_ack *extack)
 {
 	struct flow_head *head = rtnl_dereference(tp->root);
 	struct flow_filter *f, *next;
@@ -617,7 +619,7 @@ static void *flow_get(struct tcf_proto *tp, u32 handle)
 }
 
 static int flow_dump(struct net *net, struct tcf_proto *tp, void *fh,
-		     struct sk_buff *skb, struct tcmsg *t)
+		     struct sk_buff *skb, struct tcmsg *t, bool rtnl_held)
 {
 	struct flow_filter *f = fh;
 	struct nlattr *nest;
@@ -677,7 +679,8 @@ static int flow_dump(struct net *net, struct tcf_proto *tp, void *fh,
 	return -1;
 }
 
-static void flow_walk(struct tcf_proto *tp, struct tcf_walker *arg)
+static void flow_walk(struct tcf_proto *tp, struct tcf_walker *arg,
+		      bool rtnl_held)
 {
 	struct flow_head *head = rtnl_dereference(tp->root);
 	struct flow_filter *f;
diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
index c4e28e5c8582..72cb116cc30e 100644
--- a/net/sched/cls_flower.c
+++ b/net/sched/cls_flower.c
@@ -454,7 +454,8 @@ static void fl_destroy_sleepable(struct work_struct *work)
 	module_put(THIS_MODULE);
 }
 
-static void fl_destroy(struct tcf_proto *tp, struct netlink_ext_ack *extack)
+static void fl_destroy(struct tcf_proto *tp, bool rtnl_held,
+		       struct netlink_ext_ack *extack)
 {
 	struct cls_fl_head *head = rtnl_dereference(tp->root);
 	struct fl_flow_mask *mask, *next_mask;
@@ -1289,7 +1290,8 @@ static int fl_set_parms(struct net *net, struct tcf_proto *tp,
 static int fl_change(struct net *net, struct sk_buff *in_skb,
 		     struct tcf_proto *tp, unsigned long base,
 		     u32 handle, struct nlattr **tca,
-		     void **arg, bool ovr, struct netlink_ext_ack *extack)
+		     void **arg, bool ovr, bool rtnl_held,
+		     struct netlink_ext_ack *extack)
 {
 	struct cls_fl_head *head = rtnl_dereference(tp->root);
 	struct cls_fl_filter *fold = *arg;
@@ -1416,7 +1418,7 @@ static int fl_change(struct net *net, struct sk_buff *in_skb,
 }
 
 static int fl_delete(struct tcf_proto *tp, void *arg, bool *last,
-		     struct netlink_ext_ack *extack)
+		     bool rtnl_held, struct netlink_ext_ack *extack)
 {
 	struct cls_fl_head *head = rtnl_dereference(tp->root);
 	struct cls_fl_filter *f = arg;
@@ -1429,7 +1431,8 @@ static int fl_delete(struct tcf_proto *tp, void *arg, bool *last,
 	return 0;
 }
 
-static void fl_walk(struct tcf_proto *tp, struct tcf_walker *arg)
+static void fl_walk(struct tcf_proto *tp, struct tcf_walker *arg,
+		    bool rtnl_held)
 {
 	struct cls_fl_head *head = rtnl_dereference(tp->root);
 	struct cls_fl_filter *f;
@@ -2005,7 +2008,7 @@ static int fl_dump_key(struct sk_buff *skb, struct net *net,
 }
 
 static int fl_dump(struct net *net, struct tcf_proto *tp, void *fh,
-		   struct sk_buff *skb, struct tcmsg *t)
+		   struct sk_buff *skb, struct tcmsg *t, bool rtnl_held)
 {
 	struct cls_fl_filter *f = fh;
 	struct nlattr *nest;
diff --git a/net/sched/cls_fw.c b/net/sched/cls_fw.c
index c8173ebb69f2..317151bae73b 100644
--- a/net/sched/cls_fw.c
+++ b/net/sched/cls_fw.c
@@ -139,7 +139,8 @@ static void fw_delete_filter_work(struct work_struct *work)
 	rtnl_unlock();
 }
 
-static void fw_destroy(struct tcf_proto *tp, struct netlink_ext_ack *extack)
+static void fw_destroy(struct tcf_proto *tp, bool rtnl_held,
+		       struct netlink_ext_ack *extack)
 {
 	struct fw_head *head = rtnl_dereference(tp->root);
 	struct fw_filter *f;
@@ -163,7 +164,7 @@ static void fw_destroy(struct tcf_proto *tp, struct netlink_ext_ack *extack)
 }
 
 static int fw_delete(struct tcf_proto *tp, void *arg, bool *last,
-		     struct netlink_ext_ack *extack)
+		     bool rtnl_held, struct netlink_ext_ack *extack)
 {
 	struct fw_head *head = rtnl_dereference(tp->root);
 	struct fw_filter *f = arg;
@@ -250,7 +251,8 @@ static int fw_set_parms(struct net *net, struct tcf_proto *tp,
 static int fw_change(struct net *net, struct sk_buff *in_skb,
 		     struct tcf_proto *tp, unsigned long base,
 		     u32 handle, struct nlattr **tca, void **arg,
-		     bool ovr, struct netlink_ext_ack *extack)
+		     bool ovr, bool rtnl_held,
+		     struct netlink_ext_ack *extack)
 {
 	struct fw_head *head = rtnl_dereference(tp->root);
 	struct fw_filter *f = *arg;
@@ -354,7 +356,8 @@ static int fw_change(struct net *net, struct sk_buff *in_skb,
 	return err;
 }
 
-static void fw_walk(struct tcf_proto *tp, struct tcf_walker *arg)
+static void fw_walk(struct tcf_proto *tp, struct tcf_walker *arg,
+		    bool rtnl_held)
 {
 	struct fw_head *head = rtnl_dereference(tp->root);
 	int h;
@@ -384,7 +387,7 @@ static void fw_walk(struct tcf_proto *tp, struct tcf_walker *arg)
 }
 
 static int fw_dump(struct net *net, struct tcf_proto *tp, void *fh,
-		   struct sk_buff *skb, struct tcmsg *t)
+		   struct sk_buff *skb, struct tcmsg *t, bool rtnl_held)
 {
 	struct fw_head *head = rtnl_dereference(tp->root);
 	struct fw_filter *f = fh;
diff --git a/net/sched/cls_matchall.c b/net/sched/cls_matchall.c
index 5621c38a303a..4e6d1139434a 100644
--- a/net/sched/cls_matchall.c
+++ b/net/sched/cls_matchall.c
@@ -106,7 +106,8 @@ static int mall_replace_hw_filter(struct tcf_proto *tp,
 	return 0;
 }
 
-static void mall_destroy(struct tcf_proto *tp, struct netlink_ext_ack *extack)
+static void mall_destroy(struct tcf_proto *tp, bool rtnl_held,
+			 struct netlink_ext_ack *extack)
 {
 	struct cls_mall_head *head = rtnl_dereference(tp->root);
 
@@ -157,7 +158,8 @@ static int mall_set_parms(struct net *net, struct tcf_proto *tp,
 static int mall_change(struct net *net, struct sk_buff *in_skb,
 		       struct tcf_proto *tp, unsigned long base,
 		       u32 handle, struct nlattr **tca,
-		       void **arg, bool ovr, struct netlink_ext_ack *extack)
+		       void **arg, bool ovr, bool rtnl_held,
+		       struct netlink_ext_ack *extack)
 {
 	struct cls_mall_head *head = rtnl_dereference(tp->root);
 	struct nlattr *tb[TCA_MATCHALL_MAX + 1];
@@ -223,12 +225,13 @@ static int mall_change(struct net *net, struct sk_buff *in_skb,
 }
 
 static int mall_delete(struct tcf_proto *tp, void *arg, bool *last,
-		       struct netlink_ext_ack *extack)
+		       bool rtnl_held, struct netlink_ext_ack *extack)
 {
 	return -EOPNOTSUPP;
 }
 
-static void mall_walk(struct tcf_proto *tp, struct tcf_walker *arg)
+static void mall_walk(struct tcf_proto *tp, struct tcf_walker *arg,
+		      bool rtnl_held)
 {
 	struct cls_mall_head *head = rtnl_dereference(tp->root);
 
@@ -270,7 +273,7 @@ static int mall_reoffload(struct tcf_proto *tp, bool add, tc_setup_cb_t *cb,
 }
 
 static int mall_dump(struct net *net, struct tcf_proto *tp, void *fh,
-		     struct sk_buff *skb, struct tcmsg *t)
+		     struct sk_buff *skb, struct tcmsg *t, bool rtnl_held)
 {
 	struct cls_mall_head *head = fh;
 	struct nlattr *nest;
diff --git a/net/sched/cls_route.c b/net/sched/cls_route.c
index 44b26038c4c4..e590c3a2999d 100644
--- a/net/sched/cls_route.c
+++ b/net/sched/cls_route.c
@@ -276,7 +276,8 @@ static void route4_queue_work(struct route4_filter *f)
 	tcf_queue_work(&f->rwork, route4_delete_filter_work);
 }
 
-static void route4_destroy(struct tcf_proto *tp, struct netlink_ext_ack *extack)
+static void route4_destroy(struct tcf_proto *tp, bool rtnl_held,
+			   struct netlink_ext_ack *extack)
 {
 	struct route4_head *head = rtnl_dereference(tp->root);
 	int h1, h2;
@@ -312,7 +313,7 @@ static void route4_destroy(struct tcf_proto *tp, struct netlink_ext_ack *extack)
 }
 
 static int route4_delete(struct tcf_proto *tp, void *arg, bool *last,
-			 struct netlink_ext_ack *extack)
+			 bool rtnl_held, struct netlink_ext_ack *extack)
 {
 	struct route4_head *head = rtnl_dereference(tp->root);
 	struct route4_filter *f = arg;
@@ -468,7 +469,7 @@ static int route4_set_parms(struct net *net, struct tcf_proto *tp,
 static int route4_change(struct net *net, struct sk_buff *in_skb,
 			 struct tcf_proto *tp, unsigned long base, u32 handle,
 			 struct nlattr **tca, void **arg, bool ovr,
-			 struct netlink_ext_ack *extack)
+			 bool rtnl_held, struct netlink_ext_ack *extack)
 {
 	struct route4_head *head = rtnl_dereference(tp->root);
 	struct route4_filter __rcu **fp;
@@ -560,7 +561,8 @@ static int route4_change(struct net *net, struct sk_buff *in_skb,
 	return err;
 }
 
-static void route4_walk(struct tcf_proto *tp, struct tcf_walker *arg)
+static void route4_walk(struct tcf_proto *tp, struct tcf_walker *arg,
+			bool rtnl_held)
 {
 	struct route4_head *head = rtnl_dereference(tp->root);
 	unsigned int h, h1;
@@ -597,7 +599,7 @@ static void route4_walk(struct tcf_proto *tp, struct tcf_walker *arg)
 }
 
 static int route4_dump(struct net *net, struct tcf_proto *tp, void *fh,
-		       struct sk_buff *skb, struct tcmsg *t)
+		       struct sk_buff *skb, struct tcmsg *t, bool rtnl_held)
 {
 	struct route4_filter *f = fh;
 	struct nlattr *nest;
diff --git a/net/sched/cls_rsvp.h b/net/sched/cls_rsvp.h
index 9dd9530e6a52..4d3836178fa5 100644
--- a/net/sched/cls_rsvp.h
+++ b/net/sched/cls_rsvp.h
@@ -312,7 +312,8 @@ static void rsvp_delete_filter(struct tcf_proto *tp, struct rsvp_filter *f)
 		__rsvp_delete_filter(f);
 }
 
-static void rsvp_destroy(struct tcf_proto *tp, struct netlink_ext_ack *extack)
+static void rsvp_destroy(struct tcf_proto *tp, bool rtnl_held,
+			 struct netlink_ext_ack *extack)
 {
 	struct rsvp_head *data = rtnl_dereference(tp->root);
 	int h1, h2;
@@ -341,7 +342,7 @@ static void rsvp_destroy(struct tcf_proto *tp, struct netlink_ext_ack *extack)
 }
 
 static int rsvp_delete(struct tcf_proto *tp, void *arg, bool *last,
-		       struct netlink_ext_ack *extack)
+		       bool rtnl_held, struct netlink_ext_ack *extack)
 {
 	struct rsvp_head *head = rtnl_dereference(tp->root);
 	struct rsvp_filter *nfp, *f = arg;
@@ -477,7 +478,8 @@ static int rsvp_change(struct net *net, struct sk_buff *in_skb,
 		       struct tcf_proto *tp, unsigned long base,
 		       u32 handle,
 		       struct nlattr **tca,
-		       void **arg, bool ovr, struct netlink_ext_ack *extack)
+		       void **arg, bool ovr, bool rtnl_held,
+		       struct netlink_ext_ack *extack)
 {
 	struct rsvp_head *data = rtnl_dereference(tp->root);
 	struct rsvp_filter *f, *nfp;
@@ -655,7 +657,8 @@ static int rsvp_change(struct net *net, struct sk_buff *in_skb,
 	return err;
 }
 
-static void rsvp_walk(struct tcf_proto *tp, struct tcf_walker *arg)
+static void rsvp_walk(struct tcf_proto *tp, struct tcf_walker *arg,
+		      bool rtnl_held)
 {
 	struct rsvp_head *head = rtnl_dereference(tp->root);
 	unsigned int h, h1;
@@ -689,7 +692,7 @@ static void rsvp_walk(struct tcf_proto *tp, struct tcf_walker *arg)
 }
 
 static int rsvp_dump(struct net *net, struct tcf_proto *tp, void *fh,
-		     struct sk_buff *skb, struct tcmsg *t)
+		     struct sk_buff *skb, struct tcmsg *t, bool rtnl_held)
 {
 	struct rsvp_filter *f = fh;
 	struct rsvp_session *s;
diff --git a/net/sched/cls_tcindex.c b/net/sched/cls_tcindex.c
index b7dc667b6ec0..14d6b4058045 100644
--- a/net/sched/cls_tcindex.c
+++ b/net/sched/cls_tcindex.c
@@ -173,7 +173,7 @@ static void tcindex_destroy_fexts_work(struct work_struct *work)
 }
 
 static int tcindex_delete(struct tcf_proto *tp, void *arg, bool *last,
-			  struct netlink_ext_ack *extack)
+			  bool rtnl_held, struct netlink_ext_ack *extack)
 {
 	struct tcindex_data *p = rtnl_dereference(tp->root);
 	struct tcindex_filter_result *r = arg;
@@ -226,7 +226,7 @@ static int tcindex_destroy_element(struct tcf_proto *tp,
 {
 	bool last;
 
-	return tcindex_delete(tp, arg, &last, NULL);
+	return tcindex_delete(tp, arg, &last, false, NULL);
 }
 
 static void __tcindex_destroy(struct rcu_head *head)
@@ -499,7 +499,7 @@ static int
 tcindex_change(struct net *net, struct sk_buff *in_skb,
 	       struct tcf_proto *tp, unsigned long base, u32 handle,
 	       struct nlattr **tca, void **arg, bool ovr,
-	       struct netlink_ext_ack *extack)
+	       bool rtnl_held, struct netlink_ext_ack *extack)
 {
 	struct nlattr *opt = tca[TCA_OPTIONS];
 	struct nlattr *tb[TCA_TCINDEX_MAX + 1];
@@ -522,7 +522,8 @@ tcindex_change(struct net *net, struct sk_buff *in_skb,
 				 tca[TCA_RATE], ovr, extack);
 }
 
-static void tcindex_walk(struct tcf_proto *tp, struct tcf_walker *walker)
+static void tcindex_walk(struct tcf_proto *tp, struct tcf_walker *walker,
+			 bool rtnl_held)
 {
 	struct tcindex_data *p = rtnl_dereference(tp->root);
 	struct tcindex_filter *f, *next;
@@ -558,7 +559,7 @@ static void tcindex_walk(struct tcf_proto *tp, struct tcf_walker *walker)
 	}
 }
 
-static void tcindex_destroy(struct tcf_proto *tp,
+static void tcindex_destroy(struct tcf_proto *tp, bool rtnl_held,
 			    struct netlink_ext_ack *extack)
 {
 	struct tcindex_data *p = rtnl_dereference(tp->root);
@@ -568,14 +569,14 @@ static void tcindex_destroy(struct tcf_proto *tp,
 	walker.count = 0;
 	walker.skip = 0;
 	walker.fn = tcindex_destroy_element;
-	tcindex_walk(tp, &walker);
+	tcindex_walk(tp, &walker, true);
 
 	call_rcu(&p->rcu, __tcindex_destroy);
 }
 
 
 static int tcindex_dump(struct net *net, struct tcf_proto *tp, void *fh,
-			struct sk_buff *skb, struct tcmsg *t)
+			struct sk_buff *skb, struct tcmsg *t, bool rtnl_held)
 {
 	struct tcindex_data *p = rtnl_dereference(tp->root);
 	struct tcindex_filter_result *r = fh;
diff --git a/net/sched/cls_u32.c b/net/sched/cls_u32.c
index 6305b6758b98..edb64ed94692 100644
--- a/net/sched/cls_u32.c
+++ b/net/sched/cls_u32.c
@@ -629,7 +629,8 @@ static int u32_destroy_hnode(struct tcf_proto *tp, struct tc_u_hnode *ht,
 	return -ENOENT;
 }
 
-static void u32_destroy(struct tcf_proto *tp, struct netlink_ext_ack *extack)
+static void u32_destroy(struct tcf_proto *tp, bool rtnl_held,
+			struct netlink_ext_ack *extack)
 {
 	struct tc_u_common *tp_c = tp->data;
 	struct tc_u_hnode *root_ht = rtnl_dereference(tp->root);
@@ -663,7 +664,7 @@ static void u32_destroy(struct tcf_proto *tp, struct netlink_ext_ack *extack)
 }
 
 static int u32_delete(struct tcf_proto *tp, void *arg, bool *last,
-		      struct netlink_ext_ack *extack)
+		      bool rtnl_held, struct netlink_ext_ack *extack)
 {
 	struct tc_u_hnode *ht = arg;
 	struct tc_u_common *tp_c = tp->data;
@@ -858,7 +859,7 @@ static struct tc_u_knode *u32_init_knode(struct tcf_proto *tp,
 
 static int u32_change(struct net *net, struct sk_buff *in_skb,
 		      struct tcf_proto *tp, unsigned long base, u32 handle,
-		      struct nlattr **tca, void **arg, bool ovr,
+		      struct nlattr **tca, void **arg, bool ovr, bool rtnl_held,
 		      struct netlink_ext_ack *extack)
 {
 	struct tc_u_common *tp_c = tp->data;
@@ -1123,7 +1124,8 @@ static int u32_change(struct net *net, struct sk_buff *in_skb,
 	return err;
 }
 
-static void u32_walk(struct tcf_proto *tp, struct tcf_walker *arg)
+static void u32_walk(struct tcf_proto *tp, struct tcf_walker *arg,
+		     bool rtnl_held)
 {
 	struct tc_u_common *tp_c = tp->data;
 	struct tc_u_hnode *ht;
@@ -1281,7 +1283,7 @@ static void u32_bind_class(void *fh, u32 classid, unsigned long cl)
 }
 
 static int u32_dump(struct net *net, struct tcf_proto *tp, void *fh,
-		    struct sk_buff *skb, struct tcmsg *t)
+		    struct sk_buff *skb, struct tcmsg *t, bool rtnl_held)
 {
 	struct tc_u_knode *n = fh;
 	struct tc_u_hnode *ht_up, *ht_down;
diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
index f472d3cf1ccb..06481071fc33 100644
--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@ -1924,7 +1924,7 @@ static void tc_bind_tclass(struct Qdisc *q, u32 portid, u32 clid,
 			arg.w.fn = tcf_node_bind;
 			arg.classid = clid;
 			arg.cl = new_cl;
-			tp->ops->walk(tp, &arg.w);
+			tp->ops->walk(tp, &arg.w, true);
 		}
 	}
 }
-- 
2.7.5

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH net-next v2 15/17] net: sched: add flags to Qdisc class ops struct
  2018-12-11 10:11 [PATCH net-next v2 00/17] Refactor classifier API to work with chain/classifiers without rtnl lock Vlad Buslov
                   ` (13 preceding siblings ...)
  2018-12-11 10:11 ` [PATCH net-next v2 14/17] net: sched: extend proto ops to support unlocked classifiers Vlad Buslov
@ 2018-12-11 10:11 ` Vlad Buslov
  2018-12-11 10:11 ` [PATCH net-next v2 16/17] net: sched: refactor tcf_block_find() into standalone functions Vlad Buslov
  2018-12-11 10:12 ` [PATCH net-next v2 17/17] net: sched: unlock rules update API Vlad Buslov
  16 siblings, 0 replies; 30+ messages in thread
From: Vlad Buslov @ 2018-12-11 10:11 UTC (permalink / raw)
  To: netdev; +Cc: jhs, xiyou.wangcong, jiri, davem, ast, daniel, Vlad Buslov

Extend Qdisc_class_ops with a flags field. Create an enum to hold possible
class ops flag values. Add the first flag value,
QDISC_CLASS_OPS_DOIT_UNLOCKED, to indicate that class ops functions can be
called without taking rtnl lock.
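
For illustration, a classful Qdisc whose class ops are safe to call without
rtnl could advertise the flag in its class ops like this (a hypothetical
sketch; 'foo' is a made-up Qdisc and only the flags field is the new API):

	static const struct Qdisc_class_ops foo_class_ops = {
		.flags		= QDISC_CLASS_OPS_DOIT_UNLOCKED,
		.find		= foo_find,
		.tcf_block	= foo_tcf_block,
		/* ... remaining callbacks ... */
	};

Rule update handlers can then test this flag to decide whether rtnl lock
must be taken before calling into the Qdisc.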

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
---
 include/net/sch_generic.h | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 0bb69fefb188..8541d6a036d9 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -184,6 +184,7 @@ static inline int qdisc_avail_bulklimit(const struct netdev_queue *txq)
 }
 
 struct Qdisc_class_ops {
+	unsigned int		flags;
 	/* Child qdisc manipulation */
 	struct netdev_queue *	(*select_queue)(struct Qdisc *, struct tcmsg *);
 	int			(*graft)(struct Qdisc *, unsigned long cl,
@@ -215,6 +216,13 @@ struct Qdisc_class_ops {
 					struct gnet_dump *);
 };
 
+/* Qdisc_class_ops flag values */
+
+/* Implements API that doesn't require rtnl lock */
+enum qdisc_class_ops_flags {
+	QDISC_CLASS_OPS_DOIT_UNLOCKED = 1,
+};
+
 struct Qdisc_ops {
 	struct Qdisc_ops	*next;
 	const struct Qdisc_class_ops	*cl_ops;
-- 
2.7.5

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH net-next v2 16/17] net: sched: refactor tcf_block_find() into standalone functions
  2018-12-11 10:11 [PATCH net-next v2 00/17] Refactor classifier API to work with chain/classifiers without rtnl lock Vlad Buslov
                   ` (14 preceding siblings ...)
  2018-12-11 10:11 ` [PATCH net-next v2 15/17] net: sched: add flags to Qdisc class ops struct Vlad Buslov
@ 2018-12-11 10:11 ` Vlad Buslov
  2018-12-11 10:12 ` [PATCH net-next v2 17/17] net: sched: unlock rules update API Vlad Buslov
  16 siblings, 0 replies; 30+ messages in thread
From: Vlad Buslov @ 2018-12-11 10:11 UTC (permalink / raw)
  To: netdev; +Cc: jhs, xiyou.wangcong, jiri, davem, ast, daniel, Vlad Buslov

Refactor tcf_block_find() code into three standalone functions:
- __tcf_qdisc_find() to look up the Qdisc and increment its reference counter.
- __tcf_qdisc_cl_find() to look up the class.
- __tcf_block_find() to look up the block and increment its reference counter.

This change is necessary to allow netlink tc rule update handlers to call
these functions directly in order to conditionally take rtnl lock, according
to Qdisc class ops flags, before calling any of the class ops callbacks.
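
For example, a rule update handler can chain the three helpers roughly as
follows (a condensed sketch based on the follow-up patch in this series;
error unwinding is abbreviated):

	err = __tcf_qdisc_find(net, &q, &parent, t->tcm_ifindex, false, extack);
	if (err)
		return err;

	/* ... inspect q->ops->cl_ops->flags and take rtnl lock if needed ... */

	err = __tcf_qdisc_cl_find(q, parent, &cl, t->tcm_ifindex, extack);
	if (err)
		goto errout;

	block = __tcf_block_find(net, q, cl, t->tcm_ifindex,
				 t->tcm_block_index, extack);
	if (IS_ERR(block)) {
		err = PTR_ERR(block);
		goto errout;
	}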

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
---
 net/sched/cls_api.c | 241 ++++++++++++++++++++++++++++++++--------------------
 1 file changed, 149 insertions(+), 92 deletions(-)

diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index c63673b7500c..0a841f65e518 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -1105,6 +1105,142 @@ static void tcf_block_flush_all_chains(struct tcf_block *block)
 	}
 }
 
+/* Lookup Qdisc and increment its reference counter.
+ * Set parent, if necessary.
+ */
+
+static int __tcf_qdisc_find(struct net *net, struct Qdisc **q,
+			    u32 *parent, int ifindex, bool rtnl_held,
+			    struct netlink_ext_ack *extack)
+{
+	const struct Qdisc_class_ops *cops;
+	struct net_device *dev;
+	int err = 0;
+
+	if (ifindex == TCM_IFINDEX_MAGIC_BLOCK)
+		return 0;
+
+	rcu_read_lock();
+
+	/* Find link */
+	dev = dev_get_by_index_rcu(net, ifindex);
+	if (!dev) {
+		rcu_read_unlock();
+		return -ENODEV;
+	}
+
+	/* Find qdisc */
+	if (!*parent) {
+		*q = dev->qdisc;
+		*parent = (*q)->handle;
+	} else {
+		*q = qdisc_lookup_rcu(dev, TC_H_MAJ(*parent));
+		if (!*q) {
+			NL_SET_ERR_MSG(extack, "Parent Qdisc doesn't exists");
+			err = -EINVAL;
+			goto errout_rcu;
+		}
+	}
+
+	*q = qdisc_refcount_inc_nz(*q);
+	if (!*q) {
+		NL_SET_ERR_MSG(extack, "Parent Qdisc doesn't exists");
+		err = -EINVAL;
+		goto errout_rcu;
+	}
+
+	/* Is it classful? */
+	cops = (*q)->ops->cl_ops;
+	if (!cops) {
+		NL_SET_ERR_MSG(extack, "Qdisc not classful");
+		err = -EINVAL;
+		goto errout_qdisc;
+	}
+
+	if (!cops->tcf_block) {
+		NL_SET_ERR_MSG(extack, "Class doesn't support blocks");
+		err = -EOPNOTSUPP;
+		goto errout_qdisc;
+	}
+
+errout_rcu:
+	/* At this point we know that qdisc is not noop_qdisc,
+	 * which means that qdisc holds a reference to net_device
+	 * and we hold a reference to qdisc, so it is safe to release
+	 * rcu read lock.
+	 */
+	rcu_read_unlock();
+	return err;
+
+errout_qdisc:
+	rcu_read_unlock();
+
+	if (rtnl_held)
+		qdisc_put(*q);
+	else
+		qdisc_put_unlocked(*q);
+	*q = NULL;
+
+	return err;
+}
+
+static int __tcf_qdisc_cl_find(struct Qdisc *q, u32 parent, unsigned long *cl,
+			       int ifindex, struct netlink_ext_ack *extack)
+{
+	if (ifindex == TCM_IFINDEX_MAGIC_BLOCK)
+		return 0;
+
+	/* Do we search for filter, attached to class? */
+	if (TC_H_MIN(parent)) {
+		const struct Qdisc_class_ops *cops = q->ops->cl_ops;
+
+		*cl = cops->find(q, parent);
+		if (*cl == 0) {
+			NL_SET_ERR_MSG(extack, "Specified class doesn't exist");
+			return -ENOENT;
+		}
+	}
+
+	return 0;
+}
+
+static struct tcf_block *__tcf_block_find(struct net *net, struct Qdisc *q,
+					  unsigned long cl, int ifindex,
+					  u32 block_index,
+					  struct netlink_ext_ack *extack)
+{
+	struct tcf_block *block;
+
+	if (ifindex == TCM_IFINDEX_MAGIC_BLOCK) {
+		block = tcf_block_refcnt_get(net, block_index);
+		if (!block) {
+			NL_SET_ERR_MSG(extack, "Block of given index was not found");
+			return ERR_PTR(-EINVAL);
+		}
+	} else {
+		const struct Qdisc_class_ops *cops = q->ops->cl_ops;
+
+		block = cops->tcf_block(q, cl, extack);
+		if (!block)
+			return ERR_PTR(-EINVAL);
+
+		if (tcf_block_shared(block)) {
+			NL_SET_ERR_MSG(extack, "This filter block is shared. Please use the block index to manipulate the filters");
+			return ERR_PTR(-EOPNOTSUPP);
+		}
+
+		/* Always take reference to block in order to support execution
+		 * of rules update path of cls API without rtnl lock. Caller
+		 * must release block when it is finished using it. 'if' block
+		 * of this conditional obtain reference to block by calling
+		 * tcf_block_refcnt_get().
+		 */
+		refcount_inc(&block->refcnt);
+	}
+
+	return block;
+}
+
 static void __tcf_block_put(struct tcf_block *block, struct Qdisc *q,
 			    struct tcf_block_ext_info *ei)
 {
@@ -1150,106 +1286,27 @@ static struct tcf_block *tcf_block_find(struct net *net, struct Qdisc **q,
 	struct tcf_block *block;
 	int err = 0;
 
-	if (ifindex == TCM_IFINDEX_MAGIC_BLOCK) {
-		block = tcf_block_refcnt_get(net, block_index);
-		if (!block) {
-			NL_SET_ERR_MSG(extack, "Block of given index was not found");
-			return ERR_PTR(-EINVAL);
-		}
-	} else {
-		const struct Qdisc_class_ops *cops;
-		struct net_device *dev;
-
-		rcu_read_lock();
-
-		/* Find link */
-		dev = dev_get_by_index_rcu(net, ifindex);
-		if (!dev) {
-			rcu_read_unlock();
-			return ERR_PTR(-ENODEV);
-		}
-
-		/* Find qdisc */
-		if (!*parent) {
-			*q = dev->qdisc;
-			*parent = (*q)->handle;
-		} else {
-			*q = qdisc_lookup_rcu(dev, TC_H_MAJ(*parent));
-			if (!*q) {
-				NL_SET_ERR_MSG(extack, "Parent Qdisc doesn't exists");
-				err = -EINVAL;
-				goto errout_rcu;
-			}
-		}
-
-		*q = qdisc_refcount_inc_nz(*q);
-		if (!*q) {
-			NL_SET_ERR_MSG(extack, "Parent Qdisc doesn't exists");
-			err = -EINVAL;
-			goto errout_rcu;
-		}
-
-		/* Is it classful? */
-		cops = (*q)->ops->cl_ops;
-		if (!cops) {
-			NL_SET_ERR_MSG(extack, "Qdisc not classful");
-			err = -EINVAL;
-			goto errout_rcu;
-		}
-
-		if (!cops->tcf_block) {
-			NL_SET_ERR_MSG(extack, "Class doesn't support blocks");
-			err = -EOPNOTSUPP;
-			goto errout_rcu;
-		}
-
-		/* At this point we know that qdisc is not noop_qdisc,
-		 * which means that qdisc holds a reference to net_device
-		 * and we hold a reference to qdisc, so it is safe to release
-		 * rcu read lock.
-		 */
-		rcu_read_unlock();
+	ASSERT_RTNL();
 
-		/* Do we search for filter, attached to class? */
-		if (TC_H_MIN(*parent)) {
-			*cl = cops->find(*q, *parent);
-			if (*cl == 0) {
-				NL_SET_ERR_MSG(extack, "Specified class doesn't exist");
-				err = -ENOENT;
-				goto errout_qdisc;
-			}
-		}
+	err = __tcf_qdisc_find(net, q, parent, ifindex, true, extack);
+	if (err)
+		goto errout;
 
-		/* And the last stroke */
-		block = cops->tcf_block(*q, *cl, extack);
-		if (!block) {
-			err = -EINVAL;
-			goto errout_qdisc;
-		}
-		if (tcf_block_shared(block)) {
-			NL_SET_ERR_MSG(extack, "This filter block is shared. Please use the block index to manipulate the filters");
-			err = -EOPNOTSUPP;
-			goto errout_qdisc;
-		}
+	err = __tcf_qdisc_cl_find(*q, *parent, cl, ifindex, extack);
+	if (err)
+		goto errout_qdisc;
 
-		/* Always take reference to block in order to support execution
-		 * of rules update path of cls API without rtnl lock. Caller
-		 * must release block when it is finished using it. 'if' block
-		 * of this conditional obtain reference to block by calling
-		 * tcf_block_refcnt_get().
-		 */
-		refcount_inc(&block->refcnt);
-	}
+	block = __tcf_block_find(net, *q, *cl, ifindex, block_index, extack);
+	if (IS_ERR(block)) {
+		err = PTR_ERR(block);
+		goto errout_qdisc;
+	}
 
 	return block;
 
-errout_rcu:
-	rcu_read_unlock();
 errout_qdisc:
-	if (*q) {
+	if (*q)
 		qdisc_put(*q);
-		*q = NULL;
-	}
+errout:
+	*q = NULL;
 	return ERR_PTR(err);
 }
 
-- 
2.7.5

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH net-next v2 17/17] net: sched: unlock rules update API
  2018-12-11 10:11 [PATCH net-next v2 00/17] Refactor classifier API to work with chain/classifiers without rtnl lock Vlad Buslov
                   ` (15 preceding siblings ...)
  2018-12-11 10:11 ` [PATCH net-next v2 16/17] net: sched: refactor tcf_block_find() into standalone functions Vlad Buslov
@ 2018-12-11 10:12 ` Vlad Buslov
  2018-12-20  0:15   ` kbuild test robot
  16 siblings, 1 reply; 30+ messages in thread
From: Vlad Buslov @ 2018-12-11 10:12 UTC (permalink / raw)
  To: netdev; +Cc: jhs, xiyou.wangcong, jiri, davem, ast, daniel, Vlad Buslov

Register netlink protocol handlers for message types RTM_NEWTFILTER,
RTM_DELTFILTER, RTM_GETTFILTER as unlocked. Set the rtnl_held variable that
tracks rtnl mutex state to false by default.

Introduce the tcf_proto_is_unlocked() helper, which checks
tcf_proto_ops->flags to determine if the ops can be called without taking
rtnl lock. Manually look up Qdisc, class and block in the rule update
handlers. Verify that both Qdisc ops and proto ops are unlocked before using
any of their callbacks, and obtain rtnl lock otherwise.
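
The resulting pattern in each of the three handlers looks roughly as follows
(condensed from the diff below; the exact locking conditions differ slightly
per handler):

	bool rtnl_held = false;

	err = __tcf_qdisc_find(net, &q, &parent, t->tcm_ifindex, false, extack);
	if (err)
		return err;

	/* Fall back to rtnl unless both Qdisc and classifier are unlocked */
	if ((q && !(q->ops->cl_ops->flags & QDISC_CLASS_OPS_DOIT_UNLOCKED)) ||
	    !tca[TCA_KIND] || !tcf_proto_is_unlocked(nla_data(tca[TCA_KIND]))) {
		rtnl_held = true;
		rtnl_lock();
	}

	/* ... class and block lookup, then the actual filter update ... */

	if (rtnl_held)
		rtnl_unlock();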

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
---
Changes from V1 to V2:
  - Refactor to only take rtnl lock once (at the beginning of rule
    update handlers).
  - Always release rtnl mutex in the same function that locked it.
    Remove unlock code from tcf_block_release().

 net/sched/cls_api.c | 142 ++++++++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 120 insertions(+), 22 deletions(-)

diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index 0a841f65e518..33992cc1326f 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -180,6 +180,23 @@ static void tcf_proto_destroy_work(struct work_struct *work)
 	rtnl_unlock();
 }
 
+static bool tcf_proto_is_unlocked(const char *kind)
+{
+	const struct tcf_proto_ops *ops;
+	bool ret;
+
+	ops = tcf_proto_lookup_ops(kind, false, NULL);
+	/* On error return false to take rtnl lock. Proto lookup/create
+	 * functions will perform lookup again and properly handle errors.
+	 */
+	if (IS_ERR(ops))
+		return false;
+
+	ret = !!(ops->flags & TCF_PROTO_OPS_DOIT_UNLOCKED);
+	module_put(ops->owner);
+	return ret;
+}
+
 static struct tcf_proto *tcf_proto_create(const char *kind, u32 protocol,
 					  u32 prio, struct tcf_chain *chain,
 					  bool rtnl_held,
@@ -1310,13 +1327,18 @@ static struct tcf_block *tcf_block_find(struct net *net, struct Qdisc **q,
 	return ERR_PTR(err);
 }
 
-static void tcf_block_release(struct Qdisc *q, struct tcf_block *block)
+static void tcf_block_release(struct Qdisc *q, struct tcf_block *block,
+			      bool rtnl_held)
 {
 	if (!IS_ERR_OR_NULL(block))
 		tcf_block_refcnt_put(block);
 
-	if (q)
-		qdisc_put(q);
+	if (q) {
+		if (rtnl_held)
+			qdisc_put(q);
+		else
+			qdisc_put_unlocked(q);
+	}
 }
 
 struct tcf_block_owner_item {
@@ -1992,7 +2014,7 @@ static int tc_new_tfilter(struct sk_buff *skb, struct nlmsghdr *n,
 	void *fh;
 	int err;
 	int tp_created;
-	bool rtnl_held = true;
+	bool rtnl_held = false;
 
 	if (!netlink_ns_capable(skb, net->user_ns, CAP_NET_ADMIN))
 		return -EPERM;
@@ -2011,6 +2033,7 @@ static int tc_new_tfilter(struct sk_buff *skb, struct nlmsghdr *n,
 	parent = t->tcm_parent;
 	tp = NULL;
 	cl = 0;
+	block = NULL;
 
 	if (prio == 0) {
 		/* If no priority is provided by the user,
@@ -2027,8 +2050,27 @@ static int tc_new_tfilter(struct sk_buff *skb, struct nlmsghdr *n,
 
 	/* Find head of filter chain. */
 
-	block = tcf_block_find(net, &q, &parent, &cl,
-			       t->tcm_ifindex, t->tcm_block_index, extack);
+	err = __tcf_qdisc_find(net, &q, &parent, t->tcm_ifindex, false, extack);
+	if (err)
+		return err;
+
+	/* Take rtnl mutex if rtnl_held was set to true on previous iteration,
+	 * block is shared (no qdisc found), qdisc is not unlocked, classifier
+	 * type is not specified, classifier is not unlocked.
+	 */
+	if (rtnl_held ||
+	    (q && !(q->ops->cl_ops->flags & QDISC_CLASS_OPS_DOIT_UNLOCKED)) ||
+	    !tca[TCA_KIND] || !tcf_proto_is_unlocked(nla_data(tca[TCA_KIND]))) {
+		rtnl_held = true;
+		rtnl_lock();
+	}
+
+	err = __tcf_qdisc_cl_find(q, parent, &cl, t->tcm_ifindex, extack);
+	if (err)
+		goto errout;
+
+	block = __tcf_block_find(net, q, cl, t->tcm_ifindex, t->tcm_block_index,
+				 extack);
 	if (IS_ERR(block)) {
 		err = PTR_ERR(block);
 		goto errout;
@@ -2147,10 +2189,19 @@ static int tc_new_tfilter(struct sk_buff *skb, struct nlmsghdr *n,
 		if (!tp_created)
 			tcf_chain_put(chain);
 	}
-	tcf_block_release(q, block);
-	if (err == -EAGAIN)
+	tcf_block_release(q, block, rtnl_held);
+
+	if (rtnl_held)
+		rtnl_unlock();
+
+	if (err == -EAGAIN) {
+		/* Take rtnl lock in case EAGAIN is caused by concurrent flush
+		 * of target chain.
+		 */
+		rtnl_held = true;
 		/* Replay the request. */
 		goto replay;
+	}
 	return err;
 
 errout_locked:
@@ -2171,12 +2222,12 @@ static int tc_del_tfilter(struct sk_buff *skb, struct nlmsghdr *n,
 	struct Qdisc *q = NULL;
 	struct tcf_chain_info chain_info;
 	struct tcf_chain *chain = NULL;
-	struct tcf_block *block;
+	struct tcf_block *block = NULL;
 	struct tcf_proto *tp = NULL;
 	unsigned long cl = 0;
 	void *fh = NULL;
 	int err;
-	bool rtnl_held = true;
+	bool rtnl_held = false;
 
 	if (!netlink_ns_capable(skb, net->user_ns, CAP_NET_ADMIN))
 		return -EPERM;
@@ -2197,8 +2248,27 @@ static int tc_del_tfilter(struct sk_buff *skb, struct nlmsghdr *n,
 
 	/* Find head of filter chain. */
 
-	block = tcf_block_find(net, &q, &parent, &cl,
-			       t->tcm_ifindex, t->tcm_block_index, extack);
+	err = __tcf_qdisc_find(net, &q, &parent, t->tcm_ifindex, false, extack);
+	if (err)
+		return err;
+
+	/* Take rtnl mutex if flushing whole chain, block is shared (no qdisc
+	 * found), qdisc is not unlocked, classifier type is not specified,
+	 * classifier is not unlocked.
+	 */
+	if (!prio ||
+	    (q && !(q->ops->cl_ops->flags & QDISC_CLASS_OPS_DOIT_UNLOCKED)) ||
+	    !tca[TCA_KIND] || !tcf_proto_is_unlocked(nla_data(tca[TCA_KIND]))) {
+		rtnl_held = true;
+		rtnl_lock();
+	}
+
+	err = __tcf_qdisc_cl_find(q, parent, &cl, t->tcm_ifindex, extack);
+	if (err)
+		goto errout;
+
+	block = __tcf_block_find(net, q, cl, t->tcm_ifindex, t->tcm_block_index,
+				 extack);
 	if (IS_ERR(block)) {
 		err = PTR_ERR(block);
 		goto errout;
@@ -2279,7 +2349,11 @@ static int tc_del_tfilter(struct sk_buff *skb, struct nlmsghdr *n,
 			tcf_proto_put(tp, NULL);
 		tcf_chain_put(chain);
 	}
-	tcf_block_release(q, block);
+	tcf_block_release(q, block, rtnl_held);
+
+	if (rtnl_held)
+		rtnl_unlock();
+
 	return err;
 
 errout_locked:
@@ -2300,12 +2374,12 @@ static int tc_get_tfilter(struct sk_buff *skb, struct nlmsghdr *n,
 	struct Qdisc *q = NULL;
 	struct tcf_chain_info chain_info;
 	struct tcf_chain *chain = NULL;
-	struct tcf_block *block;
+	struct tcf_block *block = NULL;
 	struct tcf_proto *tp = NULL;
 	unsigned long cl = 0;
 	void *fh = NULL;
 	int err;
-	bool rtnl_held = true;
+	bool rtnl_held = false;
 
 	err = nlmsg_parse(n, sizeof(*t), tca, TCA_MAX, rtm_tca_policy, extack);
 	if (err < 0)
@@ -2323,8 +2397,26 @@ static int tc_get_tfilter(struct sk_buff *skb, struct nlmsghdr *n,
 
 	/* Find head of filter chain. */
 
-	block = tcf_block_find(net, &q, &parent, &cl,
-			       t->tcm_ifindex, t->tcm_block_index, extack);
+	err = __tcf_qdisc_find(net, &q, &parent, t->tcm_ifindex, false, extack);
+	if (err)
+		return err;
+
+	/* Take rtnl mutex if block is shared (no qdisc found), qdisc is not
+	 * unlocked, classifier type is not specified, classifier is not
+	 * unlocked.
+	 */
+	if ((q && !(q->ops->cl_ops->flags & QDISC_CLASS_OPS_DOIT_UNLOCKED)) ||
+	    !tca[TCA_KIND] || !tcf_proto_is_unlocked(nla_data(tca[TCA_KIND]))) {
+		rtnl_held = true;
+		rtnl_lock();
+	}
+
+	err = __tcf_qdisc_cl_find(q, parent, &cl, t->tcm_ifindex, extack);
+	if (err)
+		goto errout;
+
+	block = __tcf_block_find(net, q, cl, t->tcm_ifindex, t->tcm_block_index,
+				 extack);
 	if (IS_ERR(block)) {
 		err = PTR_ERR(block);
 		goto errout;
@@ -2376,7 +2468,11 @@ static int tc_get_tfilter(struct sk_buff *skb, struct nlmsghdr *n,
 			tcf_proto_put(tp, NULL);
 		tcf_chain_put(chain);
 	}
-	tcf_block_release(q, block);
+	tcf_block_release(q, block, rtnl_held);
+
+	if (rtnl_held)
+		rtnl_unlock();
+
 	return err;
 }
 
@@ -2823,7 +2919,7 @@ static int tc_ctl_chain(struct sk_buff *skb, struct nlmsghdr *n,
 errout:
 	tcf_chain_put(chain);
 errout_block:
-	tcf_block_release(q, block);
+	tcf_block_release(q, block, true);
 	if (err == -EAGAIN)
 		/* Replay the request. */
 		goto replay;
@@ -3166,10 +3262,12 @@ static int __init tc_filter_init(void)
 	if (err)
 		goto err_rhash_setup_block_ht;
 
-	rtnl_register(PF_UNSPEC, RTM_NEWTFILTER, tc_new_tfilter, NULL, 0);
-	rtnl_register(PF_UNSPEC, RTM_DELTFILTER, tc_del_tfilter, NULL, 0);
+	rtnl_register(PF_UNSPEC, RTM_NEWTFILTER, tc_new_tfilter, NULL,
+		      RTNL_FLAG_DOIT_UNLOCKED);
+	rtnl_register(PF_UNSPEC, RTM_DELTFILTER, tc_del_tfilter, NULL,
+		      RTNL_FLAG_DOIT_UNLOCKED);
 	rtnl_register(PF_UNSPEC, RTM_GETTFILTER, tc_get_tfilter,
-		      tc_dump_tfilter, 0);
+		      tc_dump_tfilter, RTNL_FLAG_DOIT_UNLOCKED);
 	rtnl_register(PF_UNSPEC, RTM_NEWCHAIN, tc_ctl_chain, NULL, 0);
 	rtnl_register(PF_UNSPEC, RTM_DELCHAIN, tc_ctl_chain, NULL, 0);
 	rtnl_register(PF_UNSPEC, RTM_GETCHAIN, tc_ctl_chain,
-- 
2.7.5

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH net-next v2 01/17] net: sched: refactor mini_qdisc_pair_swap() to use workqueue
  2018-12-11 10:11 ` [PATCH net-next v2 01/17] net: sched: refactor mini_qdisc_pair_swap() to use workqueue Vlad Buslov
@ 2018-12-12  3:23   ` kbuild test robot
  2018-12-13 23:32   ` Cong Wang
  2018-12-19  8:12   ` kbuild test robot
  2 siblings, 0 replies; 30+ messages in thread
From: kbuild test robot @ 2018-12-12  3:23 UTC (permalink / raw)
  To: Vlad Buslov
  Cc: kbuild-all, netdev, jhs, xiyou.wangcong, jiri, davem, ast,
	daniel, Vlad Buslov

[-- Attachment #1: Type: text/plain, Size: 1104 bytes --]

Hi Vlad,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on net-next/master]

url:    https://github.com/0day-ci/linux/commits/Vlad-Buslov/Refactor-classifier-API-to-work-with-chain-classifiers-without-rtnl-lock/20181211-231158
config: mips-fuloong2e_defconfig (attached as .config)
compiler: mips64el-linux-gnuabi64-gcc (Debian 7.2.0-11) 7.2.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        GCC_VERSION=7.2.0 make.cross ARCH=mips 

All errors (new ones prefixed by >>):

   net/sched/sch_generic.o: In function `qdisc_free_cb':
>> sch_generic.c:(.text+0xd34): undefined reference to `tc_queue_proto_work'
   net/sched/sch_generic.o: In function `mini_qdisc_pair_swap':
   sch_generic.c:(.text+0xd48): undefined reference to `tc_queue_proto_work'

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 17988 bytes --]

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH net-next v2 01/17] net: sched: refactor mini_qdisc_pair_swap() to use workqueue
  2018-12-11 10:11 ` [PATCH net-next v2 01/17] net: sched: refactor mini_qdisc_pair_swap() to use workqueue Vlad Buslov
  2018-12-12  3:23   ` kbuild test robot
@ 2018-12-13 23:32   ` Cong Wang
  2018-12-16 16:32     ` Vlad Buslov
  2018-12-19  8:12   ` kbuild test robot
  2 siblings, 1 reply; 30+ messages in thread
From: Cong Wang @ 2018-12-13 23:32 UTC (permalink / raw)
  To: Vlad Buslov
  Cc: Linux Kernel Network Developers, Jamal Hadi Salim, Jiri Pirko,
	David Miller, Alexei Starovoitov, Daniel Borkmann

On Tue, Dec 11, 2018 at 2:19 AM Vlad Buslov <vladbu@mellanox.com> wrote:
>
> As a part of the effort to remove dependency on rtnl lock, cls API is being
> converted to use fine-grained locking mechanisms instead of global rtnl
> lock. However, chain_head_change callback for ingress Qdisc is a sleeping
> function and cannot be executed while holding a spinlock.


Why does it have to be a spinlock not a mutex?

I've read your cover letter and this changelog, I don't find any answer.

>
> Extend cls API with new workqueue intended to be used for tcf_proto
> lifetime management. Modify tcf_proto_destroy() to deallocate proto
> asynchronously on workqueue in order to ensure that all chain_head_change
> callbacks involving the proto complete before it is freed. Convert
> mini_qdisc_pair_swap(), that is used as a chain_head_change callback for
> ingress and clsact Qdiscs, to use a workqueue. Move Qdisc deallocation to
> tc_proto_wq ordered workqueue that is used to destroy tcf proto instances.
> This is necessary to ensure that Qdisc is destroyed after all instances of
> chain/proto that it contains in order to prevent use-after-free error in
> tc_chain_notify_delete().


Please avoid async unless you have to, there are almost always bugs
when playing with deferred workqueue or any other callbacks.

Thanks.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH net-next v2 09/17] net: sched: traverse classifiers in chain with tcf_get_next_proto()
  2018-12-11 10:11 ` [PATCH net-next v2 09/17] net: sched: traverse classifiers in chain with tcf_get_next_proto() Vlad Buslov
@ 2018-12-14 10:15   ` kbuild test robot
  0 siblings, 0 replies; 30+ messages in thread
From: kbuild test robot @ 2018-12-14 10:15 UTC (permalink / raw)
  To: Vlad Buslov
  Cc: kbuild-all, netdev, jhs, xiyou.wangcong, jiri, davem, ast,
	daniel, Vlad Buslov

[-- Attachment #1: Type: text/plain, Size: 1063 bytes --]

Hi Vlad,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on net-next/master]

url:    https://github.com/0day-ci/linux/commits/Vlad-Buslov/Refactor-classifier-API-to-work-with-chain-classifiers-without-rtnl-lock/20181211-231158
config: microblaze-mmu_defconfig (attached as .config)
compiler: microblaze-linux-gcc (GCC) 8.1.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        GCC_VERSION=8.1.0 make.cross ARCH=microblaze 

All errors (new ones prefixed by >>):

   net/sched/sch_generic.o: In function `qdisc_free_cb':
>> (.text+0xe00): undefined reference to `tc_queue_proto_work'
   net/sched/sch_generic.o: In function `mini_qdisc_pair_swap':
   (.text+0xe24): undefined reference to `tc_queue_proto_work'

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 12856 bytes --]

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH net-next v2 01/17] net: sched: refactor mini_qdisc_pair_swap() to use workqueue
  2018-12-13 23:32   ` Cong Wang
@ 2018-12-16 16:32     ` Vlad Buslov
  2018-12-16 18:52       ` Cong Wang
  0 siblings, 1 reply; 30+ messages in thread
From: Vlad Buslov @ 2018-12-16 16:32 UTC (permalink / raw)
  To: Cong Wang, Jiri Pirko
  Cc: Linux Kernel Network Developers, Jamal Hadi Salim, Jiri Pirko,
	David Miller, Alexei Starovoitov, Daniel Borkmann

On Thu 13 Dec 2018 at 23:32, Cong Wang <xiyou.wangcong@gmail.com> wrote:
> On Tue, Dec 11, 2018 at 2:19 AM Vlad Buslov <vladbu@mellanox.com> wrote:
>>
>> As a part of the effort to remove dependency on rtnl lock, cls API is being
>> converted to use fine-grained locking mechanisms instead of global rtnl
>> lock. However, chain_head_change callback for ingress Qdisc is a sleeping
>> function and cannot be executed while holding a spinlock.
>
>
> Why does it have to be a spinlock not a mutex?
>
> I've read your cover letter and this changelog, I don't find any
> answer.

My initial implementation used a mutex. However, it was changed to a
spinlock at Jiri's request during internal review.

>
>>
>> Extend cls API with new workqueue intended to be used for tcf_proto
>> lifetime management. Modify tcf_proto_destroy() to deallocate proto
>> asynchronously on workqueue in order to ensure that all chain_head_change
>> callbacks involving the proto complete before it is freed. Convert
>> mini_qdisc_pair_swap(), that is used as a chain_head_change callback for
>> ingress and clsact Qdiscs, to use a workqueue. Move Qdisc deallocation to
>> tc_proto_wq ordered workqueue that is used to destroy tcf proto instances.
>> This is necessary to ensure that Qdisc is destroyed after all instances of
>> chain/proto that it contains in order to prevent use-after-free error in
>> tc_chain_notify_delete().
>
>
> Please avoid async unless you have to, there are almost always bugs
> when playing with deferred workqueue or any other callbacks.

Indeed, async Qdisc and tp deallocation introduces additional
complexity. What approach would you recommend to make the chain_head_change
callback atomic?

>
> Thanks.

Thank you for reviewing my code!

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH net-next v2 01/17] net: sched: refactor mini_qdisc_pair_swap() to use workqueue
  2018-12-16 16:32     ` Vlad Buslov
@ 2018-12-16 18:52       ` Cong Wang
  2018-12-17 10:22         ` Jiri Pirko
  0 siblings, 1 reply; 30+ messages in thread
From: Cong Wang @ 2018-12-16 18:52 UTC (permalink / raw)
  To: Vlad Buslov
  Cc: Jiri Pirko, Linux Kernel Network Developers, Jamal Hadi Salim,
	David Miller, Alexei Starovoitov, Daniel Borkmann

On Sun, Dec 16, 2018 at 8:32 AM Vlad Buslov <vladbu@mellanox.com> wrote:
>
> On Thu 13 Dec 2018 at 23:32, Cong Wang <xiyou.wangcong@gmail.com> wrote:
> > On Tue, Dec 11, 2018 at 2:19 AM Vlad Buslov <vladbu@mellanox.com> wrote:
> >>
> >> As a part of the effort to remove dependency on rtnl lock, cls API is being
> >> converted to use fine-grained locking mechanisms instead of global rtnl
> >> lock. However, chain_head_change callback for ingress Qdisc is a sleeping
> >> function and cannot be executed while holding a spinlock.
> >
> >
> > Why does it have to be a spinlock not a mutex?
> >
> > I've read your cover letter and this changelog, but I don't find any
> > answer.
>
> My initial implementation used a mutex. However, it was changed to a
> spinlock at Jiri's request during internal review.
>

So what's the answer to my question? :)


> >
> >>
> >> Extend cls API with new workqueue intended to be used for tcf_proto
> >> lifetime management. Modify tcf_proto_destroy() to deallocate proto
> >> asynchronously on workqueue in order to ensure that all chain_head_change
> >> callbacks involving the proto complete before it is freed. Convert
> >> mini_qdisc_pair_swap(), that is used as a chain_head_change callback for
> >> ingress and clsact Qdiscs, to use a workqueue. Move Qdisc deallocation to
> >> tc_proto_wq ordered workqueue that is used to destroy tcf proto instances.
> >> This is necessary to ensure that Qdisc is destroyed after all instances of
> >> chain/proto that it contains in order to prevent use-after-free error in
> >> tc_chain_notify_delete().
> >
> >
> > Please avoid async unless you have to; there are almost always bugs
> > when playing with deferred workqueues or any other callbacks.
>
> Indeed, async Qdisc and tp deallocation introduces additional
> complexity. What approach would you recommend to make the
> chain_head_change callback atomic?

I haven't looked into any of your code yet, but from my understanding of
your changelog, it seems all this workqueue stuff can go away if you
make it a mutex instead of a spinlock.

This is why I stopped here and am waiting for your answer to my question
above.

Thanks.
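
Concretely, what goes away in the mutex variant: mini_qdisc_pair_swap()
sleeps only in rcu_barrier(), so under a sleeping lock the head-change
callback could simply run inline and the deferral machinery disappears.
A rough sketch, with the chain field names assumed from the patchset:

mutex_lock(&chain->filter_chain_lock);
RCU_INIT_POINTER(chain->filter_chain, tp_new);
/* May sleep (rcu_barrier() inside mini_qdisc_pair_swap()); that is
 * fine under a mutex but not under a spinlock.
 */
tcf_chain0_head_change(chain, tp_new);
mutex_unlock(&chain->filter_chain_lock);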

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH net-next v2 01/17] net: sched: refactor mini_qdisc_pair_swap() to use workqueue
  2018-12-16 18:52       ` Cong Wang
@ 2018-12-17 10:22         ` Jiri Pirko
  2018-12-19  4:27           ` Cong Wang
  0 siblings, 1 reply; 30+ messages in thread
From: Jiri Pirko @ 2018-12-17 10:22 UTC (permalink / raw)
  To: Cong Wang
  Cc: Vlad Buslov, Linux Kernel Network Developers, Jamal Hadi Salim,
	David Miller, Alexei Starovoitov, Daniel Borkmann

Sun, Dec 16, 2018 at 07:52:18PM CET, xiyou.wangcong@gmail.com wrote:
>On Sun, Dec 16, 2018 at 8:32 AM Vlad Buslov <vladbu@mellanox.com> wrote:
>>
>> On Thu 13 Dec 2018 at 23:32, Cong Wang <xiyou.wangcong@gmail.com> wrote:
>> > On Tue, Dec 11, 2018 at 2:19 AM Vlad Buslov <vladbu@mellanox.com> wrote:
>> >>
>> >> As a part of the effort to remove dependency on rtnl lock, cls API is being
>> >> converted to use fine-grained locking mechanisms instead of global rtnl
>> >> lock. However, chain_head_change callback for ingress Qdisc is a sleeping
>> >> function and cannot be executed while holding a spinlock.
>> >
>> >
>> > Why does it have to be a spinlock not a mutex?
>> >
>> > I've read your cover letter and this changelog, but I don't find any
>> > answer.
>>
>> My initial implementation used a mutex. However, it was changed to a
>> spinlock at Jiri's request during internal review.
>>
>
>So what's the answer to my question? :)

Yeah, my concern against mutexes was that we would need one per block
and per chain. I find that quite heavy, and I believe it is better to
use a spinlock in those cases. This patch is a side effect of that. Do
you think it would be better to have mutexes instead of spinlocks?
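
Concretely, the footprint in question is one embedded lock per block and
per chain. A sketch of where the locks land, with surrounding fields
elided and names following those used elsewhere in the thread:

struct tcf_block {
	spinlock_t lock;		/* or struct mutex: one per block */
	struct list_head chain_list;
};

struct tcf_chain {
	spinlock_t filter_chain_lock;	/* protects the tp list, one per chain */
	struct tcf_proto __rcu *filter_chain;
	struct list_head list;
	u32 index;
};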


>
>
>> >
>> >>
>> >> Extend cls API with new workqueue intended to be used for tcf_proto
>> >> lifetime management. Modify tcf_proto_destroy() to deallocate proto
>> >> asynchronously on workqueue in order to ensure that all chain_head_change
>> >> callbacks involving the proto complete before it is freed. Convert
>> >> mini_qdisc_pair_swap(), that is used as a chain_head_change callback for
>> >> ingress and clsact Qdiscs, to use a workqueue. Move Qdisc deallocation to
>> >> tc_proto_wq ordered workqueue that is used to destroy tcf proto instances.
>> >> This is necessary to ensure that Qdisc is destroyed after all instances of
>> >> chain/proto that it contains in order to prevent use-after-free error in
>> >> tc_chain_notify_delete().
>> >
>> >
>> > Please avoid async unless you have to; there are almost always bugs
>> > when playing with deferred workqueues or any other callbacks.
>>
>> Indeed, async Qdisc and tp deallocation introduces additional
>> complexity. What approach would you recommend to make the
>> chain_head_change callback atomic?
>
>I haven't looked into any of your code yet, but from my understanding of
>your changelog, it seems all this workqueue stuff can go away if you
>make it a mutex instead of a spinlock.
>
>This is why I stopped here and am waiting for your answer to my question
>above.
>
>Thanks.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH net-next v2 01/17] net: sched: refactor mini_qdisc_pair_swap() to use workqueue
  2018-12-17 10:22         ` Jiri Pirko
@ 2018-12-19  4:27           ` Cong Wang
  2018-12-20 13:55             ` Vlad Buslov
  0 siblings, 1 reply; 30+ messages in thread
From: Cong Wang @ 2018-12-19  4:27 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: Vlad Buslov, Linux Kernel Network Developers, Jamal Hadi Salim,
	David Miller, Alexei Starovoitov, Daniel Borkmann

On Mon, Dec 17, 2018 at 2:30 AM Jiri Pirko <jiri@resnulli.us> wrote:
>
> Sun, Dec 16, 2018 at 07:52:18PM CET, xiyou.wangcong@gmail.com wrote:
> >On Sun, Dec 16, 2018 at 8:32 AM Vlad Buslov <vladbu@mellanox.com> wrote:
> >>
> >> On Thu 13 Dec 2018 at 23:32, Cong Wang <xiyou.wangcong@gmail.com> wrote:
> >> > On Tue, Dec 11, 2018 at 2:19 AM Vlad Buslov <vladbu@mellanox.com> wrote:
> >> >>
> >> >> As a part of the effort to remove dependency on rtnl lock, cls API is being
> >> >> converted to use fine-grained locking mechanisms instead of global rtnl
> >> >> lock. However, chain_head_change callback for ingress Qdisc is a sleeping
> >> >> function and cannot be executed while holding a spinlock.
> >> >
> >> >
> >> > Why does it have to be a spinlock not a mutex?
> >> >
> >> > I've read your cover letter and this changelog, but I don't find any
> >> > answer.
> >>
> >> My initial implementation used a mutex. However, it was changed to a
> >> spinlock at Jiri's request during internal review.
> >>
> >
> >So what's the answer to my question? :)
>
> Yeah, my concern against mutexes was that we would need one per block
> and per chain. I find that quite heavy, and I believe it is better to
> use a spinlock in those cases. This patch is a side effect of that. Do
> you think it would be better to have mutexes instead of spinlocks?

My only concern with a spinlock is that we have to go async since we
can't block. This is almost always error-prone, especially when dealing
with a tcf block. I had to give up on a spinlock for idrinfo->lock;
please take a look at:

commit 95278ddaa15cfa23e4a06ee9ed7b6ee0197c500b
Author: Cong Wang <xiyou.wangcong@gmail.com>
Date:   Tue Oct 2 12:50:19 2018 -0700

    net_sched: convert idrinfo->lock from spinlock to a mutex


There are indeed some cases in the kernel where we do take multiple
mutexes, for example:

/*
 * bd_mutex locking:
 *
 *  mutex_lock(part->bd_mutex)
 *    mutex_lock_nested(whole->bd_mutex, 1)
 */

So, how heavy are they compared with spinlocks?

Thanks.
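
For a rough sense of scale: on a typical x86_64 build without lock
debugging, spinlock_t is 4 bytes while struct mutex is around 32; both
grow considerably under lockdep. A trivial module sketch to check the
numbers for a given config:

#include <linux/module.h>
#include <linux/spinlock.h>
#include <linux/mutex.h>

static int __init locksize_init(void)
{
	/* Sizes are config- and arch-dependent; print what this build uses. */
	pr_info("spinlock_t: %zu bytes, struct mutex: %zu bytes\n",
		sizeof(spinlock_t), sizeof(struct mutex));
	return 0;
}
module_init(locksize_init);

MODULE_LICENSE("GPL");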

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH net-next v2 01/17] net: sched: refactor mini_qdisc_pair_swap() to use workqueue
  2018-12-11 10:11 ` [PATCH net-next v2 01/17] net: sched: refactor mini_qdisc_pair_swap() to use workqueue Vlad Buslov
  2018-12-12  3:23   ` kbuild test robot
  2018-12-13 23:32   ` Cong Wang
@ 2018-12-19  8:12   ` kbuild test robot
  2 siblings, 0 replies; 30+ messages in thread
From: kbuild test robot @ 2018-12-19  8:12 UTC (permalink / raw)
  To: Vlad Buslov
  Cc: kbuild-all, netdev, jhs, xiyou.wangcong, jiri, davem, ast,
	daniel, Vlad Buslov

[-- Attachment #1: Type: text/plain, Size: 1056 bytes --]

Hi Vlad,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on net-next/master]

url:    https://github.com/0day-ci/linux/commits/Vlad-Buslov/Refactor-classifier-API-to-work-with-chain-classifiers-without-rtnl-lock/20181211-231158
config: parisc-c3000_defconfig (attached as .config)
compiler: hppa-linux-gnu-gcc (Debian 7.2.0-11) 7.2.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        GCC_VERSION=7.2.0 make.cross ARCH=parisc 

All errors (new ones prefixed by >>):

   net/sched/sch_generic.o: In function `.LC17':
>> (.data.rel.ro+0x38): undefined reference to `tc_queue_proto_work'
   net/sched/sch_generic.o: In function `.LC18':
   (.data.rel.ro+0x3c): undefined reference to `tc_queue_proto_work'

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 14742 bytes --]

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH net-next v2 17/17] net: sched: unlock rules update API
  2018-12-11 10:12 ` [PATCH net-next v2 17/17] net: sched: unlock rules update API Vlad Buslov
@ 2018-12-20  0:15   ` kbuild test robot
  0 siblings, 0 replies; 30+ messages in thread
From: kbuild test robot @ 2018-12-20  0:15 UTC (permalink / raw)
  To: Vlad Buslov
  Cc: kbuild-all, netdev, jhs, xiyou.wangcong, jiri, davem, ast,
	daniel, Vlad Buslov

[-- Attachment #1: Type: text/plain, Size: 4807 bytes --]

Hi Vlad,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on net-next/master]

url:    https://github.com/0day-ci/linux/commits/Vlad-Buslov/Refactor-classifier-API-to-work-with-chain-classifiers-without-rtnl-lock/20181211-231158
config: x86_64-allmodconfig (attached as .config)
compiler: gcc-7 (Debian 7.3.0-1) 7.3.0
reproduce:
        # save the attached .config to linux build tree
        make ARCH=x86_64 

All warnings (new ones prefixed by >>):

   net/sched/sch_generic.c:563:34: warning: incorrect type in initializer (different address spaces)
   net/sched/sch_generic.c:563:34:    expected struct Qdisc [noderef] <asn:4> *qdisc
   net/sched/sch_generic.c:563:34:    got struct Qdisc *<noident>
   include/linux/slab.h:332:43: warning: dubious: x & !y
   include/linux/slab.h:332:43: warning: dubious: x & !y
   include/linux/slab.h:332:43: warning: dubious: x & !y
>> net/sched/sch_generic.c:1376:6: warning: symbol '__mini_qdisc_pair_swap' was not declared. Should it be static?
>> net/sched/sch_generic.c:1412:6: warning: symbol 'mini_qdisc_pair_swap_work' was not declared. Should it be static?
   net/sched/sch_generic.c:207:50: warning: context imbalance in 'try_bulk_dequeue_skb_slow' - different lock contexts for basic block
   net/sched/sch_generic.c:266:17: warning: context imbalance in 'dequeue_skb' - different lock contexts for basic block
   net/sched/sch_generic.c:309:28: warning: context imbalance in 'sch_direct_xmit' - unexpected unlock
   net/sched/sch_generic.c:1151:9: warning: context imbalance in 'dev_deactivate_queue' - different lock contexts for basic block
--
   include/linux/slab.h:332:43: warning: dubious: x & !y
   net/sched/cls_api.c:218:22: warning: incorrect type in assignment (different base types)
   net/sched/cls_api.c:218:22:    expected restricted __be16 [usertype] protocol
   net/sched/cls_api.c:218:22:    got unsigned int [unsigned] [usertype] protocol
   include/linux/slab.h:332:43: warning: dubious: x & !y
>> net/sched/cls_api.c:549:20: warning: incorrect type in assignment (different address spaces)
   net/sched/cls_api.c:549:20:    expected struct tcf_proto *[assigned] tp
   net/sched/cls_api.c:549:20:    got struct tcf_proto [noderef] <asn:4> *next
   include/linux/slab.h:332:43: warning: dubious: x & !y
   include/linux/slab.h:332:43: warning: dubious: x & !y
   include/linux/slab.h:332:43: warning: dubious: x & !y
   include/linux/slab.h:332:43: warning: dubious: x & !y
   include/linux/slab.h:332:43: warning: dubious: x & !y
   include/linux/slab.h:332:43: warning: dubious: x & !y
   net/sched/cls_api.c:1752:16: error: incompatible types in comparison expression (different address spaces)
   net/sched/cls_api.c:1844:20: error: incompatible types in comparison expression (different address spaces)
   net/sched/cls_api.c:1807:25: error: incompatible types in comparison expression (different address spaces)
   net/sched/cls_api.c:1826:16: error: incompatible types in comparison expression (different address spaces)
   net/sched/cls_api.c:1891:25: warning: restricted __be16 degrades to integer
   net/sched/cls_api.c:2521:50: warning: restricted __be16 degrades to integer
>> net/sched/cls_api.c:1271:45: warning: context imbalance in '__tcf_block_put' - unexpected unlock

vim +549 net/sched/cls_api.c

32a4f5ec Jiri Pirko  2018-07-23  535  
290b1c8b Jiri Pirko  2018-08-01  536  static void tcf_chain_flush(struct tcf_chain *chain)
290b1c8b Jiri Pirko  2018-08-01  537  {
11ccabb2 Vlad Buslov 2018-12-11  538  	struct tcf_proto *tp;
290b1c8b Jiri Pirko  2018-08-01  539  
11ccabb2 Vlad Buslov 2018-12-11  540  	spin_lock(&chain->filter_chain_lock);
11ccabb2 Vlad Buslov 2018-12-11  541  	tp = tcf_chain_dereference(chain->filter_chain, chain);
eb733349 Vlad Buslov 2018-12-11  542  	RCU_INIT_POINTER(chain->filter_chain, NULL);
290b1c8b Jiri Pirko  2018-08-01  543  	tcf_chain0_head_change(chain, NULL);
dc5255f2 Vlad Buslov 2018-12-11  544  	chain->flushing = true;
11ccabb2 Vlad Buslov 2018-12-11  545  	spin_unlock(&chain->filter_chain_lock);
11ccabb2 Vlad Buslov 2018-12-11  546  
290b1c8b Jiri Pirko  2018-08-01  547  	while (tp) {
eb733349 Vlad Buslov 2018-12-11  548  		tcf_proto_put(tp, NULL);
eb733349 Vlad Buslov 2018-12-11 @549  		tp = tp->next;
290b1c8b Jiri Pirko  2018-08-01  550  	}
290b1c8b Jiri Pirko  2018-08-01  551  }
290b1c8b Jiri Pirko  2018-08-01  552  

:::::: The code at line 549 was first introduced by commit
:::::: eb733349a2b09f803d821c70862067ca1a632bbb net: sched: introduce reference counting for tcf proto

:::::: TO: Vlad Buslov <vladbu@mellanox.com>
:::::: CC: 0day robot <lkp@intel.com>
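
Two issues meet at the flagged line: tp->next is __rcu-annotated, so the
plain load trips sparse's address-space check, and the load also happens
after the reference on tp has been dropped. The conventional shape of
the fix is to fetch the next pointer before the put; a sketch (the
lockdep condition passed here is an assumption):

	while (tp) {
		struct tcf_proto *next;

		/* The list was already unlinked from the chain above,
		 * so a protected dereference is sufficient here.
		 */
		next = rcu_dereference_protected(tp->next, 1);
		tcf_proto_put(tp, NULL);
		tp = next;
	}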

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 66632 bytes --]

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH net-next v2 01/17] net: sched: refactor mini_qdisc_pair_swap() to use workqueue
  2018-12-19  4:27           ` Cong Wang
@ 2018-12-20 13:55             ` Vlad Buslov
  2019-01-14 19:00               ` Vlad Buslov
  0 siblings, 1 reply; 30+ messages in thread
From: Vlad Buslov @ 2018-12-20 13:55 UTC (permalink / raw)
  To: Cong Wang, Jiri Pirko
  Cc: Jiri Pirko, Linux Kernel Network Developers, Jamal Hadi Salim,
	David Miller, Alexei Starovoitov, Daniel Borkmann


On Wed 19 Dec 2018 at 04:27, Cong Wang <xiyou.wangcong@gmail.com> wrote:
> On Mon, Dec 17, 2018 at 2:30 AM Jiri Pirko <jiri@resnulli.us> wrote:
>>
>> Sun, Dec 16, 2018 at 07:52:18PM CET, xiyou.wangcong@gmail.com wrote:
>> >On Sun, Dec 16, 2018 at 8:32 AM Vlad Buslov <vladbu@mellanox.com> wrote:
>> >>
>> >> On Thu 13 Dec 2018 at 23:32, Cong Wang <xiyou.wangcong@gmail.com> wrote:
>> >> > On Tue, Dec 11, 2018 at 2:19 AM Vlad Buslov <vladbu@mellanox.com> wrote:
>> >> >>
>> >> >> As a part of the effort to remove dependency on rtnl lock, cls API is being
>> >> >> converted to use fine-grained locking mechanisms instead of global rtnl
>> >> >> lock. However, chain_head_change callback for ingress Qdisc is a sleeping
>> >> >> function and cannot be executed while holding a spinlock.
>> >> >
>> >> >
>> >> > Why does it have to be a spinlock not a mutex?
>> >> >
>> >> > I've read your cover letter and this changelog, but I don't find any
>> >> > answer.
>> >>
>> >> My initial implementation used a mutex. However, it was changed to a
>> >> spinlock at Jiri's request during internal review.
>> >>
>> >
>> >So what's the answer to my question? :)
>>
>> Yeah, my concern against mutexes was that we would need one per block
>> and per chain. I find that quite heavy, and I believe it is better to
>> use a spinlock in those cases. This patch is a side effect of that. Do
>> you think it would be better to have mutexes instead of spinlocks?
>
> My only concern with a spinlock is that we have to go async since we
> can't block. This is almost always error-prone, especially when dealing
> with a tcf block. I had to give up on a spinlock for idrinfo->lock;
> please take a look at:
>
> commit 95278ddaa15cfa23e4a06ee9ed7b6ee0197c500b
> Author: Cong Wang <xiyou.wangcong@gmail.com>
> Date:   Tue Oct 2 12:50:19 2018 -0700
>
>     net_sched: convert idrinfo->lock from spinlock to a mutex
>
>
> There are indeed some cases in the kernel where we do take multiple
> mutexes, for example:
>
> /*
>  * bd_mutex locking:
>  *
>  *  mutex_lock(part->bd_mutex)
>  *    mutex_lock_nested(whole->bd_mutex, 1)
>  */
>
> So, how heavy are they compared with spinlocks?
>
> Thanks.

Hi Cong,

I did a quick-and-dirty mutex-based implementation of my cls API
patchset (removed the async miniqp refactoring, changed the block and
chain lock types to mutex) and ran some benchmarks to determine the
exact impact of the mutex on cls API performance (both the rule and
chain APIs).

1. Parallel flower insertion rate (create 5 million flower filters on
same chain/prio with 10 tc instances) - same performance:

SPINLOCK
$ time ls add_sw* | xargs -n 1 -P 100 sudo tc -force -b

real    0m12.183s
user    0m35.013s
sys     1m24.947s

MUTEX
$ time ls add_sw* | xargs -n 1 -P 100 sudo tc -force -b

real    0m12.246s
user    0m35.417s
sys     1m25.330s

2. Parallel flower insertion rate (create 100k flower filters, each on
new instance of chain/tp, 10 tc instances) - mutex implementation is
~17% slower:

SPINLOCK:
$ time ls add_sw* | xargs -n 1 -P 100 sudo tc -force -b

real    2m19.285s
user    0m1.863s
sys     5m41.157s

MUTEX:
multichain$ time ls add_sw* | xargs -n 1 -P 100 sudo tc -force -b

real    2m43.495s
user    0m1.664s
sys     8m32.483s

3. Chains insertion rate with cls chain API (create 100k empty chains, 1
tc instance) - ~3% difference:

SPINLOCK:
$ time sudo tc -b add_chains

real    0m46.822s
user    0m0.239s
sys     0m46.437s

MUTEX:
$ time sudo tc -b add_chains

real    0m48.600s
user    0m0.230s
sys     0m48.227s


The only case where performance is significantly impacted by the mutex
is the second test. This happens because chain lookup is a linear search
performed while holding the highly contended block lock. The perf
profile for the mutex version confirms this:

+   91.83%     0.00%  tc               [kernel.vmlinux]            [k] entry_SYSCALL_64_after_hwframe
+   91.83%     0.00%  tc               [kernel.vmlinux]            [k] do_syscall_64
+   91.80%     0.00%  tc               libc-2.25.so                [.] __libc_sendmsg
+   91.78%     0.00%  tc               [kernel.vmlinux]            [k] __sys_sendmsg
+   91.77%     0.00%  tc               [kernel.vmlinux]            [k] ___sys_sendmsg
+   91.77%     0.00%  tc               [kernel.vmlinux]            [k] sock_sendmsg
+   91.77%     0.00%  tc               [kernel.vmlinux]            [k] netlink_sendmsg
+   91.77%     0.01%  tc               [kernel.vmlinux]            [k] netlink_unicast
+   91.77%     0.00%  tc               [kernel.vmlinux]            [k] netlink_rcv_skb
+   91.71%     0.01%  tc               [kernel.vmlinux]            [k] rtnetlink_rcv_msg
+   91.69%     0.03%  tc               [kernel.vmlinux]            [k] tc_new_tfilter
+   90.96%    26.45%  tc               [kernel.vmlinux]            [k] __tcf_chain_get
+   64.30%     0.01%  tc               [kernel.vmlinux]            [k] __mutex_lock.isra.7
+   39.36%    39.18%  tc               [kernel.vmlinux]            [k] osq_lock
+   24.92%    24.81%  tc               [kernel.vmlinux]            [k] mutex_spin_on_owner

Based on these results we can conclude that use cases with a
significant number of chains on a single block will be affected by using
a mutex in the cls API.
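
The pattern behind the regression is essentially the linear chain lookup
done under the block lock on every filter create; a simplified sketch
modeled on cls_api.c's tcf_chain_lookup(), for illustration rather than
the patched code:

/* O(n) in the number of chains, executed with block->lock held by every
 * tc_new_tfilter() that references a chain; with 100k chains the walk
 * itself plus optimistic spinning on the mutex (osq_lock and
 * mutex_spin_on_owner above) dominates the profile.
 */
static struct tcf_chain *tcf_chain_lookup(struct tcf_block *block,
					  u32 chain_index)
{
	struct tcf_chain *chain;

	list_for_each_entry(chain, &block->chain_list, list)
		if (chain->index == chain_index)
			return chain;
	return NULL;
}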

Regards,
Vlad

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH net-next v2 01/17] net: sched: refactor mini_qdisc_pair_swap() to use workqueue
  2018-12-20 13:55             ` Vlad Buslov
@ 2019-01-14 19:00               ` Vlad Buslov
  2019-01-14 23:00                 ` Cong Wang
  0 siblings, 1 reply; 30+ messages in thread
From: Vlad Buslov @ 2019-01-14 19:00 UTC (permalink / raw)
  To: Cong Wang, Jiri Pirko
  Cc: Jiri Pirko, Linux Kernel Network Developers, Jamal Hadi Salim,
	David Miller, Alexei Starovoitov, Daniel Borkmann


On Thu 20 Dec 2018 at 13:55, Vlad Buslov <vladbu@mellanox.com> wrote:
> On Wed 19 Dec 2018 at 04:27, Cong Wang <xiyou.wangcong@gmail.com> wrote:
>> On Mon, Dec 17, 2018 at 2:30 AM Jiri Pirko <jiri@resnulli.us> wrote:
>>>
>>> Sun, Dec 16, 2018 at 07:52:18PM CET, xiyou.wangcong@gmail.com wrote:
>>> >On Sun, Dec 16, 2018 at 8:32 AM Vlad Buslov <vladbu@mellanox.com> wrote:
>>> >>
>>> >> On Thu 13 Dec 2018 at 23:32, Cong Wang <xiyou.wangcong@gmail.com> wrote:
>>> >> > On Tue, Dec 11, 2018 at 2:19 AM Vlad Buslov <vladbu@mellanox.com> wrote:
>>> >> >>
>>> >> >> As a part of the effort to remove dependency on rtnl lock, cls API is being
>>> >> >> converted to use fine-grained locking mechanisms instead of global rtnl
>>> >> >> lock. However, chain_head_change callback for ingress Qdisc is a sleeping
>>> >> >> function and cannot be executed while holding a spinlock.
>>> >> >
>>> >> >
>>> >> > Why does it have to be a spinlock not a mutex?
>>> >> >
>>> >> > I've read your cover letter and this changelog, but I don't find any
>>> >> > answer.
>>> >>
>>> >> My initial implementation used a mutex. However, it was changed to a
>>> >> spinlock at Jiri's request during internal review.
>>> >>
>>> >
>>> >So what's the answer to my question? :)
>>>
>>> Yeah, my concern against mutexes was that we would need one per block
>>> and per chain. I find that quite heavy, and I believe it is better to
>>> use a spinlock in those cases. This patch is a side effect of that. Do
>>> you think it would be better to have mutexes instead of spinlocks?
>>
>> My only concern with a spinlock is that we have to go async since we
>> can't block. This is almost always error-prone, especially when dealing
>> with a tcf block. I had to give up on a spinlock for idrinfo->lock;
>> please take a look at:
>>
>> commit 95278ddaa15cfa23e4a06ee9ed7b6ee0197c500b
>> Author: Cong Wang <xiyou.wangcong@gmail.com>
>> Date:   Tue Oct 2 12:50:19 2018 -0700
>>
>>     net_sched: convert idrinfo->lock from spinlock to a mutex
>>
>>
>> There are indeed some cases in the kernel where we do take multiple
>> mutexes, for example:
>>
>> /*
>>  * bd_mutex locking:
>>  *
>>  *  mutex_lock(part->bd_mutex)
>>  *    mutex_lock_nested(whole->bd_mutex, 1)
>>  */
>>
>> So, how heavy are they compared with spinlocks?
>>
>> Thanks.
>
> Hi Cong,
>
> I did a quick-and-dirty mutex-based implementation of my cls API
> patchset (removed the async miniqp refactoring, changed the block and
> chain lock types to mutex) and ran some benchmarks to determine the
> exact impact of the mutex on cls API performance (both the rule and
> chain APIs).
>
> 1. Parallel flower insertion rate (create 5 million flower filters on
> same chain/prio with 10 tc instances) - same performance:
>
> SPINLOCK
> $ time ls add_sw* | xargs -n 1 -P 100 sudo tc -force -b
>
> real    0m12.183s
> user    0m35.013s
> sys     1m24.947s
>
> MUTEX
> $ time ls add_sw* | xargs -n 1 -P 100 sudo tc -force -b
>
> real    0m12.246s
> user    0m35.417s
> sys     1m25.330s
>
> 2. Parallel flower insertion rate (create 100k flower filters, each on
> new instance of chain/tp, 10 tc instances) - mutex implementation is
> ~17% slower:
>
> SPINLOCK:
> $ time ls add_sw* | xargs -n 1 -P 100 sudo tc -force -b
>
> real    2m19.285s
> user    0m1.863s
> sys     5m41.157s
>
> MUTEX:
> multichain$ time ls add_sw* | xargs -n 1 -P 100 sudo tc -force -b
>
> real    2m43.495s
> user    0m1.664s
> sys     8m32.483s
>
> 3. Chains insertion rate with cls chain API (create 100k empty chains, 1
> tc instance) - ~3% difference:
>
> SPINLOCK:
> $ time sudo tc -b add_chains
>
> real    0m46.822s
> user    0m0.239s
> sys     0m46.437s
>
> MUTEX:
> $ time sudo tc -b add_chains
>
> real    0m48.600s
> user    0m0.230s
> sys     0m48.227s
>
>
> The only case where performance is significantly impacted by the mutex
> is the second test. This happens because chain lookup is a linear search
> performed while holding the highly contended block lock. The perf
> profile for the mutex version confirms this:
>
> +   91.83%     0.00%  tc               [kernel.vmlinux]            [k] entry_SYSCALL_64_after_hwframe
> +   91.83%     0.00%  tc               [kernel.vmlinux]            [k] do_syscall_64
> +   91.80%     0.00%  tc               libc-2.25.so                [.] __libc_sendmsg
> +   91.78%     0.00%  tc               [kernel.vmlinux]            [k] __sys_sendmsg
> +   91.77%     0.00%  tc               [kernel.vmlinux]            [k] ___sys_sendmsg
> +   91.77%     0.00%  tc               [kernel.vmlinux]            [k] sock_sendmsg
> +   91.77%     0.00%  tc               [kernel.vmlinux]            [k] netlink_sendmsg
> +   91.77%     0.01%  tc               [kernel.vmlinux]            [k] netlink_unicast
> +   91.77%     0.00%  tc               [kernel.vmlinux]            [k] netlink_rcv_skb
> +   91.71%     0.01%  tc               [kernel.vmlinux]            [k] rtnetlink_rcv_msg
> +   91.69%     0.03%  tc               [kernel.vmlinux]            [k] tc_new_tfilter
> +   90.96%    26.45%  tc               [kernel.vmlinux]            [k] __tcf_chain_get
> +   64.30%     0.01%  tc               [kernel.vmlinux]            [k] __mutex_lock.isra.7
> +   39.36%    39.18%  tc               [kernel.vmlinux]            [k] osq_lock
> +   24.92%    24.81%  tc               [kernel.vmlinux]            [k] mutex_spin_on_owner
>
> Based on these results we can conclude that use cases with a
> significant number of chains on a single block will be affected by using
> a mutex in the cls API.
>
> Regards,
> Vlad

Hi Cong,

Any comments regarding the benchmark results? Are you okay with the
spinlock-based implementation?

Thanks,
Vlad

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH net-next v2 01/17] net: sched: refactor mini_qdisc_pair_swap() to use workqueue
  2019-01-14 19:00               ` Vlad Buslov
@ 2019-01-14 23:00                 ` Cong Wang
  0 siblings, 0 replies; 30+ messages in thread
From: Cong Wang @ 2019-01-14 23:00 UTC (permalink / raw)
  To: Vlad Buslov
  Cc: Jiri Pirko, Linux Kernel Network Developers, Jamal Hadi Salim,
	David Miller, Alexei Starovoitov, Daniel Borkmann

On Mon, Jan 14, 2019 at 11:00 AM Vlad Buslov <vladbu@mellanox.com> wrote:
> Hi Cong,
>
> Any comments regarding the benchmark results? Are you okay with the
> spinlock-based implementation?
>

Hi,

Sorry for somehow forgetting to respond to your last email.

I suggested sticking with a mutex not because it is better than a
spinlock, but because its non-atomic context makes the code easier to
understand and review.

If a mutex is really slower for you, you have to make a trade-off
between performance and code complexity. You can also choose to use a
mutex as a starting point here, then try to move to a spinlock later as
a second step. You don't have to break RTNL down into a small-scope lock
and convert it to a spinlock in one patchset; splitting it into two
patchsets makes more sense to me and is much easier to review.

And, unlike the RX/TX paths, these control paths are not critical, so I
don't know if we care that much about the ~17% difference. I'm not
saying your work is not important; I'm saying you are the first one who
wants to improve the performance of these control paths, so I guess
others may not care much.

BTW, as always, please provide a git branch for your patches; I
definitely don't want to apply 17 patches manually one by one here. :)

Thanks!

^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2019-01-14 23:00 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-12-11 10:11 [PATCH net-next v2 00/17] Refactor classifier API to work with chain/classifiers without rtnl lock Vlad Buslov
2018-12-11 10:11 ` [PATCH net-next v2 01/17] net: sched: refactor mini_qdisc_pair_swap() to use workqueue Vlad Buslov
2018-12-12  3:23   ` kbuild test robot
2018-12-13 23:32   ` Cong Wang
2018-12-16 16:32     ` Vlad Buslov
2018-12-16 18:52       ` Cong Wang
2018-12-17 10:22         ` Jiri Pirko
2018-12-19  4:27           ` Cong Wang
2018-12-20 13:55             ` Vlad Buslov
2019-01-14 19:00               ` Vlad Buslov
2019-01-14 23:00                 ` Cong Wang
2018-12-19  8:12   ` kbuild test robot
2018-12-11 10:11 ` [PATCH net-next v2 02/17] net: sched: protect block state with spinlock Vlad Buslov
2018-12-11 10:11 ` [PATCH net-next v2 03/17] net: sched: refactor tc_ctl_chain() to use block->lock Vlad Buslov
2018-12-11 10:11 ` [PATCH net-next v2 04/17] net: sched: protect block->chain0 with block->lock Vlad Buslov
2018-12-11 10:11 ` [PATCH net-next v2 05/17] net: sched: traverse chains in block with tcf_get_next_chain() Vlad Buslov
2018-12-11 10:11 ` [PATCH net-next v2 06/17] net: sched: protect chain template accesses with block lock Vlad Buslov
2018-12-11 10:11 ` [PATCH net-next v2 07/17] net: sched: lock the chain when accessing filter_chain list Vlad Buslov
2018-12-11 10:11 ` [PATCH net-next v2 08/17] net: sched: introduce reference counting for tcf proto Vlad Buslov
2018-12-11 10:11 ` [PATCH net-next v2 09/17] net: sched: traverse classifiers in chain with tcf_get_next_proto() Vlad Buslov
2018-12-14 10:15   ` kbuild test robot
2018-12-11 10:11 ` [PATCH net-next v2 10/17] net: sched: refactor tp insert/delete for concurrent execution Vlad Buslov
2018-12-11 10:11 ` [PATCH net-next v2 11/17] net: sched: prevent insertion of new classifiers during chain flush Vlad Buslov
2018-12-11 10:11 ` [PATCH net-next v2 12/17] net: sched: track rtnl lock status when validating extensions Vlad Buslov
2018-12-11 10:11 ` [PATCH net-next v2 13/17] net: sched: extend proto ops with 'put' callback Vlad Buslov
2018-12-11 10:11 ` [PATCH net-next v2 14/17] net: sched: extend proto ops to support unlocked classifiers Vlad Buslov
2018-12-11 10:11 ` [PATCH net-next v2 15/17] net: sched: add flags to Qdisc class ops struct Vlad Buslov
2018-12-11 10:11 ` [PATCH net-next v2 16/17] net: sched: refactor tcf_block_find() into standalone functions Vlad Buslov
2018-12-11 10:12 ` [PATCH net-next v2 17/17] net: sched: unlock rules update API Vlad Buslov
2018-12-20  0:15   ` kbuild test robot

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).