From mboxrd@z Thu Jan  1 00:00:00 1970
From: Vlad Buslov <vladbu@mellanox.com>
Subject: Re: [PATCH net-next v6 10/11] net: sched: atomically check-allocate action
Date: Fri, 10 Aug 2018 13:29:11 +0300
Message-ID: <vbfeff6bjm0.fsf@reg-r-vrt-018-180.mtr.labs.mlnx>
References: <1530800673-12280-1-git-send-email-vladbu@mellanox.com> <1530800673-12280-11-git-send-email-vladbu@mellanox.com> <CAM_iQpU+tWo4HVmQV+UCcLCmULd=kfsPZg+M+2rVvMbHN=2jww@mail.gmail.com> <vbfpnyt2hbw.fsf@reg-r-vrt-018-180.mtr.labs.mlnx> <CAM_iQpU=+gSrmV5+bSekr5SD+PgrYHN6YXsLpRC=_p09m0a0ew@mail.gmail.com>
Mime-Version: 1.0
Content-Type: text/plain
Cc: Linux Kernel Network Developers <netdev@vger.kernel.org>,
        David Miller <davem@davemloft.net>,
        Jamal Hadi Salim <jhs@mojatatu.com>,
        Jiri Pirko <jiri@resnulli.us>,
        Alexei Starovoitov <ast@kernel.org>,
        Daniel Borkmann <daniel@iogearbox.net>,
        Yevgeny Kliteynik <kliteyn@mellanox.com>,
        Jiri Pirko <jiri@mellanox.com>
To: Cong Wang <xiyou.wangcong@gmail.com>
Return-path: <netdev-owner@vger.kernel.org>
Received: from mail-eopbgr00041.outbound.protection.outlook.com ([40.107.0.41]:11440
        "EHLO EUR02-AM5-obe.outbound.protection.outlook.com"
        rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP
        id S1727209AbeHJM6n (ORCPT <rfc822;netdev@vger.kernel.org>);
        Fri, 10 Aug 2018 08:58:43 -0400
In-reply-to: <CAM_iQpU=+gSrmV5+bSekr5SD+PgrYHN6YXsLpRC=_p09m0a0ew@mail.gmail.com>
Sender: netdev-owner@vger.kernel.org
List-ID: <netdev.vger.kernel.org>


On Thu 09 Aug 2018 at 23:43, Cong Wang <xiyou.wangcong@gmail.com> wrote:
> On Wed, Aug 8, 2018 at 5:06 AM Vlad Buslov <vladbu@mellanox.com> wrote:
>>
>>
>> On Wed 08 Aug 2018 at 01:20, Cong Wang <xiyou.wangcong@gmail.com> wrote:
>> > On Thu, Jul 5, 2018 at 7:24 AM Vlad Buslov <vladbu@mellanox.com> wrote:
>> >>
>> >> Implement function that atomically checks if action exists and either takes
>> >> reference to it, or allocates idr slot for action index to prevent
>> >> concurrent allocations of actions with same index. Use EBUSY error pointer
>> >> to indicate that idr slot is reserved.
>> >
>> > A dumb question:
>> >
>> > How could "concurrent allocations of actions with same index" happen
>> > as you already take idrinfo->lock for the whole
>> > tcf_idr_check_alloc()??
>>
>> I guess my changelog is not precise enough in this description.
>> Let look into sequence of events of initialization of new action:
>> 1) tcf_idr_check_alloc() is called by action init.
>> 2) idrinfo->lock is taken.
>> 3) Lookup in idr is performed to determine if action with specified
>> index already exists.
>> 4) EBUSY pointer is inserted to indicate that id is taken.
>> 5) idrinfo->lock is released.
>> 6) tcf_idr_check_alloc() returns to action init code.
>> 7) New action is allocated and initialized.
>> 8) tcf_idr_insert() is called.
>> 9) idrinfo->lock is taken.
>> 10) EBUSY pointer is substituted with pointer to new action.
>> 11) idrinfo->lock is released.
>> 12) tcf_idr_insert() returns.
>>
>> So in this case "concurrent allocations of actions with same index"
>> means not the allocation with same index during tcf_idr_check_alloc(),
>> but during the period when idrinfo->lock was released(6-8).
>
> Yes but it is unnecessary:
>
> a) When adding a new action, you can actually allocate and init it before
> touching idrinfo, therefore the check and insert can be done in one step
> instead of breaking down it into multiple steps, which means you can
> acquire idrinfo->lock once.
>
> b) When updating an existing action, it is slightly complicated.
> However, you can still allocate a new one first, then find the old one
> and copy it into the new one and finally replace it.
>
> In summary, we can do the following:
>
> 1. always allocate a new action
> 2. acquire idrinfo->lock
> 3a. if it is an add operation: allocate a new ID and insert the new action
> 3b. if it is a replace operation: find the old one with ID, copy it into the
> new one and replace it
> 4. release idrinfo->lock
> 5. If 3a or 3b fails, free the allocation. Otherwise succeed.
>
> I know, the locking scope is now per netns rather than per action,
> but this can be optimized for replacing, you can hold the old action
> and then release the idrinfo->lock, as idr_replace() later doesn't
> require idrinfo->lock AFAIK.
>
> Is there anything I miss here?

Approach you suggest is valid, but has its own trade-offs:

- As you noted, lock granularity becomes coarse-grained due to per-netns
scope.

- I am not sure it is possible to call idr_replace() without obtaining
idrinfo->lock in this particular case. Concurrent delete of action with
same id is possible and, according to idr_replace() description,
unlocked execution is not supported for such use-case:

/**
 * idr_replace() - replace pointer for given ID.
 * @idr: IDR handle.
 * @ptr: New pointer to associate with the ID.
 * @id: ID to change.
 *
 * Replace the pointer registered with an ID and return the old value.
 * This function can be called under the RCU read lock concurrently with
 * idr_alloc() and idr_remove() (as long as the ID being removed is not
 * the one being replaced!).
 *
 * Returns: the old value on success.  %-ENOENT indicates that @id was not
 * found.  %-EINVAL indicates that @ptr was not valid.
 */

- High rate or replace request will generate a lot of unnecessary memory
allocations and deallocations.

>
>
>>
>> >
>> > For me, it should be only one allocation could succeed, all others
>> > should fail.
>>
>> Correct! And this change is made specifically to enforce that rule.
>>
>> Otherwise, multiple processes could try to create new action with same
>> id at the same time, and all processes that executed 3, before any
>> process reached 10, will "succeed" by overwriting each others action in
>> idr. (and leak memory while doing so)
>
> I know but again it doesn't look necessary to achieve a same goal.
>
>
>>
>> >
>> > Maybe you are trying to prevent others treat it like existing one,
>> > but in that case you can just hold the idinfo->lock for all idr operations.
>> >
>> > And more importantly, upper layer is able to tell it is a creation or
>> > just replace, you don't have to check this in this complicated way.
>> >
>> > IOW, all of these complicated code should not exist.
>>
>> Original code was simpler and didn't involve temporary EBUSY pointer.
>> This change was made according to Jiri's request. He wanted to have
>> unified API to be used by all actions and suggested this approach
>> specifically.
>
> I will work on this, as this is aligned to my work to make
> it RCU-complete.