From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124]) by smtp.lore.kernel.org (Postfix) with ESMTP id 5FE72C433F5 for ; Thu, 5 May 2022 01:16:36 +0000 (UTC) Received: from [217.70.189.124] (localhost [127.0.0.1]) by mails.dpdk.org (Postfix) with ESMTP id 890C54068A; Thu, 5 May 2022 03:16:35 +0200 (CEST) Received: from szxga03-in.huawei.com (szxga03-in.huawei.com [45.249.212.189]) by mails.dpdk.org (Postfix) with ESMTP id 262F940689; Thu, 5 May 2022 03:16:34 +0200 (CEST) Received: from kwepemi500012.china.huawei.com (unknown [172.30.72.55]) by szxga03-in.huawei.com (SkyGuard) with ESMTP id 4Ktwdf56kBzCsXZ; Thu, 5 May 2022 09:11:50 +0800 (CST) Received: from [10.67.103.128] (10.67.103.128) by kwepemi500012.china.huawei.com (7.221.188.12) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2375.24; Thu, 5 May 2022 09:16:31 +0800 Subject: Re: [PATCH V2 1/4] net/bonding: fix non-active slaves aren't stopped To: Ferruh Yigit , CC: Huisong Li , , Chas Williams , Radu Nicolau , Andrew Rybchenko , Thomas Monjalon References: <20211025063922.34066-1-humin29@huawei.com> <20220324030036.4761-1-humin29@huawei.com> <20220324030036.4761-2-humin29@huawei.com> <212e85ef-228a-d31e-e5a1-8b2331c7eef9@xilinx.com> <447752cb-d01b-df08-b5b9-0d331a20bf26@huawei.com> <0b879860-1465-4bd2-a15a-530306202586@xilinx.com> From: "Min Hu (Connor)" Message-ID: Date: Thu, 5 May 2022 09:16:30 +0800 User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:68.0) Gecko/20100101 Thunderbird/68.3.1 MIME-Version: 1.0 In-Reply-To: <0b879860-1465-4bd2-a15a-530306202586@xilinx.com> Content-Type: text/plain; charset="utf-8"; format=flowed Content-Transfer-Encoding: 8bit X-Originating-IP: [10.67.103.128] X-ClientProxiedBy: dggems704-chm.china.huawei.com (10.3.19.181) To kwepemi500012.china.huawei.com (7.221.188.12) X-CFilter-Loop: Reflected X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Hi, Ferruh, 在 2022/5/4 3:04, Ferruh Yigit 写道: > On 5/3/2022 7:54 AM, Min Hu (Connor) wrote: >> Hi, Ferruh, >> >> 在 2022/4/29 21:31, Ferruh Yigit 写道: >>> On 4/29/2022 7:45 AM, Min Hu (Connor) wrote: >>>> Hi, Ferruh, >>>> >>>> 在 2022/4/27 2:19, Ferruh Yigit 写道: >>>>> On 3/24/2022 3:00 AM, Min Hu (Connor) wrote: >>>>>> From: Huisong Li >>>>>> >>>>>> When stopping a bonded port, all slaves should be deactivated. But >>>>>> only >>>>> >>>>> s/deactivated/stopped/ ? >>>> not agreed. deactivated and stopped are different state for slave. >>>> >>> >>> Just to clarify the sentences, otherwise I see the 'stopped' and >>> 'deactivated' states are different. >>> Next sentences complains that not all ports are stopped: "But only >>> active slaves are stopped.", so I thought intention in this sentences >>> to claim that all slaves should be stopped (but it mentions all >>> slaves should be 'deactivated'). >>> As long as you address the disconnection between two sentences, I >>> don't mind the wording. >> Actually, there is something wrong with the wording. >> Yes, I should take your advice. >> >>> >>>>> >>>>>> active slaves are stopped. So fix it and do "deactivae_slave()" >>>>>> for active >>>>> >>>>> s/deactivae_slave()/deactivate_slave()/ >>>>> >>>> agreed. >>>> >>>>>> slaves. >>>>> >>>>> Hi Connor, >>>>> >>>>> When a bonding port is closed, is it clear if all slave ports or >>>>> active slave ports should be stopped? >>>> Yes, I think all the slave ports should be stopped(or try to be >>>> stopped). >>>>> >>>>>> >>>>>> Fixes: 0911d4ec0183 ("net/bonding: fix crash when stopping mode 4 >>>>>> port") >>>>>> Cc: stable@dpdk.org >>>>>> >>>>>> Signed-off-by: Huisong Li >>>>>> Signed-off-by: Min Hu (Connor) >>>>>> --- >>>>>>   drivers/net/bonding/rte_eth_bond_pmd.c | 20 +++++++++++--------- >>>>>>   1 file changed, 11 insertions(+), 9 deletions(-) >>>>>> >>>>>> diff --git a/drivers/net/bonding/rte_eth_bond_pmd.c >>>>>> b/drivers/net/bonding/rte_eth_bond_pmd.c >>>>>> index b305b6a35b..469dc71170 100644 >>>>>> --- a/drivers/net/bonding/rte_eth_bond_pmd.c >>>>>> +++ b/drivers/net/bonding/rte_eth_bond_pmd.c >>>>>> @@ -2118,18 +2118,20 @@ bond_ethdev_stop(struct rte_eth_dev *eth_dev) >>>>>>       internals->link_status_polling_enabled = 0; >>>>>>       for (i = 0; i < internals->slave_count; i++) { >>>>>>           uint16_t slave_id = internals->slaves[i].port_id; >>>>>> + >>>>>> +        internals->slaves[i].last_link_status = 0; >>>>>> +        ret = rte_eth_dev_stop(slave_id); >>>>>> +        if (ret != 0) { >>>>>> +            RTE_BOND_LOG(ERR, "Failed to stop device on port %u", >>>>>> +                     slave_id); >>>>>> +            return ret; >>>>> >>>>> Should it return here or try to stop all ports? >>>>> What about to record the return status, but keep continue to stop >>>>> all ports. And return error if any of the stop failed? >> Well, I am glad you have found something unreasaonable about 'stop'. >> Let us see API 'rte_eth_dev_stop' >> >> rte_eth_dev_stop(dev) >> { >>      .... >>      dev->data->dev_started = 0; >>      ret = (*dev->dev_ops->dev_stop)(dev) >>      retur ret; >> } >> This is unreasaonable. No matter 'dev_ops->dev_stop' succeed or fail, >> the state 'dev_started ' will always set to be '0'. >> >> But this does not only influence bonding device but other devices like >> eth dev or vdev. >> This is the bug in rte ethdev level. I will send another patch to fix >> it. >> > > Hi Connor, > > I agree this is an issue in the API, cc'ed Andrew and Thomas. > > I vaguely remember that "dev_started = 0" was required for some dev_ops, > but not quite sure, let me check this. > At worst we can do as following to be sure: > >   dev->data->dev_started = 0; >   ret = (*dev->dev_ops->dev_stop)(dev) >   if (ret) >     dev->data->dev_started = 1; > > Also we need to clarify in the API documentation (.h file), what is the > status of the device if 'rte_eth_dev_stop()' returned error. > > > Btw, would you be OK to separate this ethdev patch from your bonding > patch, to not stuck your series because of ethdev one. Yes, this patch can be abandoned from this set. > > >> >>>> I think no need to do this. APP only see the bonded device. If bonded >>>> device stop failed, it means it works failed. And the number of >>>> "stopped" successfully slave does not make any sense. >>>> >>> >>> OK if trying to stop as much as possible 'slave' devices doesn't make >>> sense, we can keep as it is. >>> >>> Btw, when functions fails at this point, bonding device itself >>> already marked as stopped, right? And some of the slave devices may >>> be stopped already before failure. >>> I don't know how confusing this is for the user, that stop() function >>> is failed but bonding device state is 'stopped'. I don't know if >>> function should recover at least bonding device status (back to >>> started) on failure, what do you think? >>> >>>>> >>>>>> +        } >>>>>> + >>>>>> +        /* active slaves need to deactivate. */ >>>>> >>>>> " active slaves need to be deactivated. " ? >>>> agreed. >>>>> >>>>>>           if (find_slave_by_id(internals->active_slaves, >>>>>>                   internals->active_slave_count, slave_id) != >>>>>> -                        internals->active_slave_count) { >>>>>> -            internals->slaves[i].last_link_status = 0; >>>>>> -            ret = rte_eth_dev_stop(slave_id); >>>>>> -            if (ret != 0) { >>>>>> -                RTE_BOND_LOG(ERR, "Failed to stop device on port >>>>>> %u", >>>>>> -                         slave_id); >>>>>> -                return ret; >>>>>> -            } >>>>>> +                internals->active_slave_count) >>>>> >>>>> I think original indentation for this line is better. >>>>> >>>> agreed. >>>>>>               deactivate_slave(eth_dev, slave_id); >>>>>> -        } >>>>>>       } >>>>>>       return 0; >>>>> >>>>> . >>> >>> . > > .