From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <SRS0=iwK6=QZ=vger.kernel.org=netdev-owner@kernel.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level: 
X-Spam-Status: No, score=-0.9 required=3.0 tests=DKIM_SIGNED,DKIM_VALID,
	DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM,
	HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,SPF_PASS autolearn=ham
	autolearn_force=no version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 43873C43381
	for <netdev@archiver.kernel.org>; Mon, 18 Feb 2019 18:26:18 +0000 (UTC)
Received: from vger.kernel.org (vger.kernel.org [209.132.180.67])
	by mail.kernel.org (Postfix) with ESMTP id 163C5217D7
	for <netdev@archiver.kernel.org>; Mon, 18 Feb 2019 18:26:18 +0000 (UTC)
Authentication-Results: mail.kernel.org;
	dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="X+Dp+/Fy"
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1732957AbfBRS0R (ORCPT <rfc822;netdev@archiver.kernel.org>);
        Mon, 18 Feb 2019 13:26:17 -0500
Received: from mail-pf1-f196.google.com ([209.85.210.196]:34830 "EHLO
        mail-pf1-f196.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S1731147AbfBRS0Q (ORCPT
        <rfc822;netdev@vger.kernel.org>); Mon, 18 Feb 2019 13:26:16 -0500
Received: by mail-pf1-f196.google.com with SMTP id j5so4478854pfa.2
        for <netdev@vger.kernel.org>; Mon, 18 Feb 2019 10:26:16 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=gmail.com; s=20161025;
        h=mime-version:references:in-reply-to:from:date:message-id:subject:to
         :cc;
        bh=wy3pM4PW2bKRTcSXw52TtPj6wA756bGq47svwZ9OKaw=;
        b=X+Dp+/FyH1bSrYE3aDtzQ1hVdbezlfzHBY/hWFzCj+hyYiHH3e5w7Z4l4w+SKl3EMQ
         WGe080BdtC1KTpXnmTTsVAFxRiX8dpwxc4j9wnPi5MCVlg3SlX7fGyLeSoQLyBxJslHr
         qZ8pXMfzOFpH533typWGffDpv4sa8dU9U+YOSeb3T2bRsUSfoeJxzT/ULm+HJixOlb1r
         /PbWyNj17T456G2ludIBrY9QFICtaAsj3/kUlqTKgTMoCWVDhZpcS8fRoBOWwuvWALHm
         QXdejvl+hk0c4uI5Sna/hXc309ZOBkfQf5X5N85xwTQPQFRVVnpHZWeBMmZjM4y1yB0w
         qMNQ==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20161025;
        h=x-gm-message-state:mime-version:references:in-reply-to:from:date
         :message-id:subject:to:cc;
        bh=wy3pM4PW2bKRTcSXw52TtPj6wA756bGq47svwZ9OKaw=;
        b=qKmq2Ug86r87iUhF5/0V2SMXH3FuwXm/QqVNljAaaoZ7d87ritcNMugpoUC/zyK4Xp
         QwcmTiC9kmI8O/JRJGW0I4GayV5HPgqRWB/B6kE4zLY+VdGZ4F623JxFki464ONPK0Ro
         g466ikrSFcgrkYF5X9ch2jfG15eOLRkg182U2/8xz7Q611cKzLqXl2+JeL6JTROJZdiz
         2LyTMnppKNQWE+66zSKddoYRYQeQz5NwsJJuNwNpN8ztnOv/UHdSvya5AcZrEUkRXf+W
         CiOpMfEkuHKB3csKxhhEUE2r4Rx1N/hgkboM/S/oY54MZ6B4LqC2dnGqUi6vhSDedwx5
         UmRg==
X-Gm-Message-State: AHQUAua6Z/PJgiyh7KTUfccdQ6jI3JPi6cy0urWIqeV5hGBb5btVQOc7
        QiV/BkjtYVQ7+YLTWvxB6UPgTPVzqTqJmxpyiE/drnXZ
X-Google-Smtp-Source: AHgI3IYklV/T6t2xbEXupBDqmK5KBCPioEj8oNWmwTXAGfzn/leb5y+fBHv7wRkmDkGNsCyLMSeNVZXMVfCVGtsQ1xM=
X-Received: by 2002:a62:70c9:: with SMTP id l192mr16327700pfc.207.1550514375633;
 Mon, 18 Feb 2019 10:26:15 -0800 (PST)
MIME-Version: 1.0
References: <20190211085548.7190-1-vladbu@mellanox.com> <20190211085548.7190-6-vladbu@mellanox.com>
 <CAM_iQpUnmKPh+RCi-JyouKz7PrAiWPp30Ro2qZFrR=MjVHogHA@mail.gmail.com> <vbflg2dqukb.fsf@mellanox.com>
In-Reply-To: <vbflg2dqukb.fsf@mellanox.com>
From:   Cong Wang <xiyou.wangcong@gmail.com>
Date:   Mon, 18 Feb 2019 10:26:03 -0800
Message-ID: <CAM_iQpVCP=6f4iRVqbgHxZcNHgmsdDJmuUQLk-9uPZar2xTGfw@mail.gmail.com>
Subject: Re: [PATCH net-next v4 05/17] net: sched: traverse chains in block
 with tcf_get_next_chain()
To:     Vlad Buslov <vladbu@mellanox.com>
Cc:     Linux Kernel Network Developers <netdev@vger.kernel.org>,
        Jamal Hadi Salim <jhs@mojatatu.com>,
        Jiri Pirko <jiri@resnulli.us>,
        David Miller <davem@davemloft.net>,
        Alexei Starovoitov <ast@kernel.org>,
        Daniel Borkmann <daniel@iogearbox.net>
Content-Type: text/plain; charset="UTF-8"
Sender: netdev-owner@vger.kernel.org
Precedence: bulk
List-ID: <netdev.vger.kernel.org>
X-Mailing-List: netdev@vger.kernel.org

On Mon, Feb 18, 2019 at 2:07 AM Vlad Buslov <vladbu@mellanox.com> wrote:
>
> Hi Cong,
>
> Thanks for reviewing!
>
> On Fri 15 Feb 2019 at 22:21, Cong Wang <xiyou.wangcong@gmail.com> wrote:
> > (Sorry for joining this late.)
> >
> > On Mon, Feb 11, 2019 at 12:56 AM Vlad Buslov <vladbu@mellanox.com> wrote:
> >> @@ -2432,7 +2474,11 @@ static int tc_dump_chain(struct sk_buff *skb, struct netlink_callback *cb)
> >>         index_start = cb->args[0];
> >>         index = 0;
> >>
> >> -       list_for_each_entry(chain, &block->chain_list, list) {
> >> +       for (chain = __tcf_get_next_chain(block, NULL);
> >> +            chain;
> >> +            chain_prev = chain,
> >> +                    chain = __tcf_get_next_chain(block, chain),
> >> +                    tcf_chain_put(chain_prev)) {
> >
> > Why do you want to take the block->lock in each iteration
> > of the loop rather than taking once for the whole loop?
>
> This loop calls classifier ops callback in tc_chain_fill_node(). I don't
> call any classifier ops callbacks while holding block or chain lock in
> this change because the goal is to achieve fine-grained locking for data
> structures used by filter update path. Locking per-block or per-chain is
> much coarser than taking reference counters to parent structures and
> allowing classifiers to implement their own locking.

That is exactly the problem: when a block has N filter chains, you
lock and unlock the mutex N times. All __tcf_get_next_chain() does
is retrieve the next entry in the list, so under contention the
mutex overhead is likely larger than the list operation itself.
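
As a rough illustration of the two patterns being compared, here is a
userspace sketch that counts lock acquisitions. Everything in it
(struct chain, struct block, lock_ops, the helper names) is a
hypothetical stand-in built on pthreads, not the kernel code, and the
reference put is simplified to a plain decrement.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins, not the kernel structures. */
struct chain {
	struct chain *next;
	int refcnt;
};

struct block {
	pthread_mutex_t lock;
	struct chain *head;
	unsigned long lock_ops;	/* lock acquisitions, illustration only */
};

static void block_lock(struct block *b)
{
	pthread_mutex_lock(&b->lock);
	b->lock_ops++;
}

static void block_unlock(struct block *b)
{
	pthread_mutex_unlock(&b->lock);
}

/* Per-step pattern: take the lock only to fetch the next entry. */
static struct chain *get_next_chain(struct block *b, struct chain *prev)
{
	struct chain *next;

	block_lock(b);
	next = prev ? prev->next : b->head;
	if (next)
		next->refcnt++;	/* keep the entry alive once unlocked */
	block_unlock(b);
	return next;
}

/* Simplified put: a plain decrement, enough for a one-thread sketch. */
static void chain_put(struct chain *c)
{
	assert(c->refcnt > 0);
	c->refcnt--;
}

static int walk_lock_per_step(struct block *b)
{
	struct chain *c, *prev = NULL;
	int visited = 0;

	for (c = get_next_chain(b, NULL); c;
	     prev = c, c = get_next_chain(b, c), chain_put(prev))
		visited++;	/* a callback would run here, unlocked */
	return visited;
}

static int walk_lock_once(struct block *b)
{
	struct chain *c;
	int visited = 0;

	block_lock(b);
	for (c = b->head; c; c = c->next)
		visited++;	/* a callback would run here, locked */
	block_unlock(b);
	return visited;
}
```

In this sketch the per-step walk takes the lock once per entry plus
once more to discover the end of the list, while the single-lock walk
takes it once regardless of N.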

Now I can see why you complained about the mutex before: the problem
is how you use it, not the mutex itself. :)

>
> In this case call to ops->tmplt_dump() is probably quite fast and its
> execution time doesn't depend on number of filters on the classifier, so
> releasing block->lock on each iteration doesn't provide much benefit, if
> at all. However, it is easier for me to reason about locking correctness
> in this refactoring by following a simple rule that no locks (besides
> rtnl mutex) can be held when calling classifier ops callbacks.

Well, for me, hierarchical locking is always simple as long as you
take the locks in the right order, that is, the larger-scope lock
first and then the smaller-scope one.
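
A minimal userspace sketch of that ordering rule, assuming a
hypothetical two-level hierarchy (struct block_s, struct chain_s and
update_filters() are invented for illustration; only the lock names
mirror the ones discussed in this thread):

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins for a block that owns a chain. */
struct chain_s {
	pthread_mutex_t filter_chain_lock;	/* smaller scope */
	int nfilters;
};

struct block_s {
	pthread_mutex_t lock;			/* larger scope */
	struct chain_s *chain;
};

/* Always take the larger-scope lock first, release in reverse order. */
static int update_filters(struct block_s *b, int delta)
{
	int n;

	assert(b != NULL && b->chain != NULL);

	pthread_mutex_lock(&b->lock);
	pthread_mutex_lock(&b->chain->filter_chain_lock);

	b->chain->nfilters += delta;
	n = b->chain->nfilters;

	pthread_mutex_unlock(&b->chain->filter_chain_lock);
	pthread_mutex_unlock(&b->lock);
	return n;
}
```

As long as every path follows the same outer-then-inner order, two
threads cannot each hold one of the locks while waiting for the other.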

The way you use locking here is actually harder for me to review,
because it is hard to validate atomicity when you unlock the
larger-scope lock and then retake the smaller-scope one. You use a
refcnt to ensure the object will not go away, but that is still far
from a guarantee of atomicity.
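
To make the refcnt-versus-atomicity point concrete, here is a small
userspace sketch. The struct obj type and its generation counter are
hypothetical, added only to make a change between two locked sections
observable; nothing here is taken from the patch.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical refcounted object. */
struct obj {
	pthread_mutex_t lock;
	int refcnt;
	int generation;	/* bumped on every modification */
};

/* Step 1: take a reference and look at the state, then drop the lock. */
static int obj_get_and_snapshot(struct obj *o)
{
	int gen;

	pthread_mutex_lock(&o->lock);
	o->refcnt++;
	gen = o->generation;
	pthread_mutex_unlock(&o->lock);
	return gen;
}

/* Another updater that runs while nobody holds the lock. */
static void obj_modify(struct obj *o)
{
	pthread_mutex_lock(&o->lock);
	o->generation++;
	pthread_mutex_unlock(&o->lock);
}

/* Step 2: retake the lock; report whether the state is still the same. */
static bool obj_unchanged_and_put(struct obj *o, int seen_gen)
{
	bool same;

	pthread_mutex_lock(&o->lock);
	assert(o->refcnt > 0);
	same = (o->generation == seen_gen);
	o->refcnt--;
	pthread_mutex_unlock(&o->lock);
	return same;
}
```

If obj_modify() runs between the two steps, the object is still alive
thanks to the reference, but the second step no longer sees the state
the first step observed, so the pair is not atomic.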

For example, take tp->ops->change(), which changes an existing
filter: I don't see you taking either block->lock or
chain->filter_chain_lock when calling it. How does it even work?

Thanks.