From: Matan Azrad
Subject: Re: [PATCH v2 4/4] net/failsafe: fix removed device handling
Date: Thu, 14 Dec 2017 13:07:31 +0000
To: Gaëtan Rivet
Cc: Adrien Mazarguil, Thomas Monjalon, "dev@dpdk.org", "stable@dpdk.org"

Hi Gaetan

> -----Original Message-----
> From: Gaëtan Rivet [mailto:gaetan.rivet@6wind.com]
> Sent: Thursday, December 14, 2017 12:49 PM
> To: Matan Azrad
> Cc: Adrien Mazarguil; Thomas Monjalon; dev@dpdk.org; stable@dpdk.org
> Subject: Re: [PATCH v2 4/4] net/failsafe: fix removed device handling
>
> On Thu, Dec 14, 2017 at 10:40:22AM +0000, Matan Azrad wrote:
> > Hi Gaetan
> >
> > > >
> > > > If you add this check in the iterator itself, you would skip
> > > > removed devices before attempting operating upon them, right?
> > > >
> > > > Then it should probably help with your issue, unless you tested it
> > > > and verified that it didnt?
> > > >
> > > > Something like this:
> > > >
> > > > ---8<---
> > > >
> > > > diff --git a/drivers/net/failsafe/failsafe_private.h b/drivers/net/failsafe/failsafe_private.h
> > > > index d81cc3ca6..62ddc0689 100644
> > > > --- a/drivers/net/failsafe/failsafe_private.h
> > > > +++ b/drivers/net/failsafe/failsafe_private.h
> > > > @@ -316,8 +316,12 @@ fs_find_next(struct rte_eth_dev *dev,
> > > >  	subs = PRIV(dev)->subs;
> > > >  	tail = PRIV(dev)->subs_tail;
> > > >  	while (sid < tail) {
> > > > +		if (min_state > DEV_PROBED &&
> > > > +		    fs_is_removed(&sub[sid]))
> > > > +			goto next;
> > > >  		if (subs[sid].state >= min_state)
> > > >  			break;
> > > > +next:
> > > >  		sid++;
> > > >  	}
> > > >  	*sid_out = sid;
> > > >
> > > > --->8---
> > > >
> > > > Only issue being that it is completely racy, but as this MT-unsafe
> > > > property is inescapable we might as well ignore it and go for KISS.
> > > >
> > > > If that's enough, I would prefer instead of having this additional
> > > > check added to all rte_eth operations.
> > > >
> > >
> > > Ok, actually you were right here to do it this way. The "is_removed"
> > > check needs to happen after the operation attempt to effectively
> > > mitigate the possible race. Checking before attempting the call will
> > > be much less effective.
> > >
> > > That being said, would it be cleaner to have eth_dev ops return
> > > -ENODEV directly, and check against it within fail-safe?
> > >
> >
> > I think that according to "is_removed" semantic we must return a Boolean
> > value (Each value different from '0' means that the device is removed) like
> > other functions in c library (for example isspace()).
> >
>
> Sure, I wasn't discussing the interface proposed by
> rte_eth_dev_is_removed().
>
> What I meant was to ask whether checking rte_eth_dev_is_removed()
> would be more interesting in the ethdev layer, making the eth_dev_ops
> return -ENODEV regardless of the previous error if this check is supported by
> the driver and signal that the port is removed.
>
> I think this information could be interesting to other systems, not just
> fail-safe.
>

Ok. Got you now.
Interesting approach - plan:
1. update fs_link_update to use rte_eth* functions.
2. maybe -EIO is preferred because -ENODEV is used for no port error?
3. update all relevant rte_eth* to use "is_removed" in error flows (1 patch
   for flow APIs and 1 for the others).
4. Change fs checks in error flows to check rte_eth* return values.
5. Remove CC stable from commit message.

What do you think?

> --
> Gaëtan Rivet
> 6WIND
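
For illustration, steps 2 to 4 of the plan could look roughly like the sketch below. It is self-contained C using mock types only: mock_port, mock_is_removed(), mock_eth_err(), mock_eth_op() and mock_fs_op() are hypothetical names standing in for an ethdev port, rte_eth_dev_is_removed(), the error-flow check inside the rte_eth* functions, and the fail-safe caller. None of it is existing DPDK code, and the way fail-safe reacts to -EIO here (skipping the sub-device) is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an ethdev port; not the real struct rte_eth_dev. */
struct mock_port {
	bool removed; /* what the driver's removal check would report */
	int op_ret;   /* what the driver operation itself returns */
};

/* Stand-in for rte_eth_dev_is_removed(): non-zero means removed. */
static int
mock_is_removed(const struct mock_port *port)
{
	return port->removed;
}

/*
 * Error-flow helper (step 3): the removal check runs only after the
 * operation has failed, and a removed port is reported as -EIO
 * regardless of the original error (step 2).
 */
static int
mock_eth_err(const struct mock_port *port, int ret)
{
	assert(port != NULL);
	if (ret == 0)
		return 0;
	if (mock_is_removed(port))
		return -EIO;
	return ret;
}

/* Stand-in for an rte_eth* function wrapping a driver operation. */
static int
mock_eth_op(const struct mock_port *port)
{
	return mock_eth_err(port, port->op_ret);
}

/*
 * Fail-safe side (step 4): look only at the rte_eth* return value.
 * Assumption: -EIO from a sub-device means "removed", so it is skipped
 * instead of failing the whole operation.
 */
static int
mock_fs_op(const struct mock_port *sub)
{
	int ret = mock_eth_op(sub);

	if (ret == -EIO)
		return 0;
	return ret;
}
```

Checking only in the error path keeps the successful path free of the extra removal query, which matches the point above that the check is most useful after the operation attempt.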