From mboxrd@z Thu Jan 1 00:00:00 1970
From: Neil Horman
Subject: Re: [PATCH v2] Change alarm cancel function to thread-safe:
Date: Mon, 29 Sep 2014 07:35:59 -0400
Message-ID: <20140929113559.GA26483@hmsreliant.think-freely.org>
References: <20140926150156.GB5619@hmsreliant.think-freely.org>
 <2601191342CEEE43887BDE71AB9772582137D88E@IRSMSX104.ger.corp.intel.com>
 <20140926162134.GE5619@hmsreliant.think-freely.org>
 <2601191342CEEE43887BDE71AB9772582137D95F@IRSMSX104.ger.corp.intel.com>
 <20140926193905.GH5619@hmsreliant.think-freely.org>
 <2601191342CEEE43887BDE71AB9772582138410B@IRSMSX104.ger.corp.intel.com>
 <20140928204754.GC4012@localhost.localdomain>
 <2601191342CEEE43887BDE71AB977258213874C5@IRSMSX104.ger.corp.intel.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Cc: "dev-VfR2kkLFssw@public.gmane.org"
To: "Wodkowski, PawelX"
Content-Disposition: inline
List-Id: patches and discussions about DPDK
Errors-To: dev-bounces-VfR2kkLFssw@public.gmane.org
Sender: "dev"

On Mon, Sep 29, 2014 at 10:11:38AM +0000, Wodkowski, PawelX wrote:
> > >
> > > Image how you will be damned by someone that not even notice you change
> > > and he Is managing some kind of resource based on returned number of
> > > set/canceled timers. If you suddenly start returning negative values how those
> > > application will behave? Silently changing returned value domain is evil in its
> > > pure form.
> >
> > As I can see the impact is very limited.
>
> It is small impact to DPDK but can be huge to user application:
> Ex:
> If someone use this kind of expression in callback (skipping user app serialization part):
> callback () {
>     ...
>     some_simple_semaphore += rte_alarm_cancel(...));
This code would be broken to begin with, as rte_eal_alarm_cancel is already
written to return negative return codes.  It's not documented as such, but it's
still the case.
Note that if you run an application built against a shared library on BSD, the
definition of rte_eal_alarm_cancel returns -ENOTSUP.  The above code would be
broken because it doesn't account for that.  You can argue that the
documentation should be updated, but the DPDK in the wild already conforms to
the model Konstantin and I are proposing.

>     ...
> }
>
> Anywhere in the code:
> ...
> If (some_simple_semapore) {
>     some_simple_semapore --;
>     if (rte_eal_alarm_set(...) != 0)
>         some_simple_semapore ++;
> }
> ...
>
> 1. Do you notice the change in cancel function?
The application crashes, or otherwise misbehaves.

> 2. How many hours you spend to find this issue in case of big app/system?
You don't.  Such a problem as you describe would very likely result in a
semaphore deadlock, as the count would be incorrectly lowered.  So you put
watches on the variable, note that sometimes the count goes down on a cancel,
which is completely counter-intuitive, read the updated documentation that
indicates error codes are possible (which you should have been prepared for
anyway), and move on with your day.

>
> > Only code that does check for (rte_alarm_cancel(...) == 0/ != 0) inside alarm
> > callback function might be affected.
> > From other side, indeed, there could exist situations, when the caller needs to
> > know
> > was the alarm successfully cancelled or not.
> > And if not by what reason.
> >
>
> I can extend API of rte alarms to add alarm state checking in next patch, but for
> now, since this is not urgent I think original patch v2 should be enough.
I re-assert my original argument here: without the above change, you haven't
really fixed the race.  If you can find another way to do it, that's fine with
me, but keep in mind once again that some implementations of rte_eal_alarm_set
already do what's being proposed.

Neil

>
> Pawel
>
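
For illustration, a minimal sketch in C of a caller that is prepared for a
negative return from the cancel call, as argued above.  It does not link
against DPDK: fake_alarm_cancel() is a hypothetical stand-in for
rte_eal_alarm_cancel(), and credit_canceled_alarms() and the counter are
names made up for this example.  The only assumption carried over from the
thread is the return convention: a count of canceled alarms (>= 0) on
success, a negative errno value such as -ENOTSUP on failure.

```c
#include <errno.h>

/*
 * Hypothetical stand-in for rte_eal_alarm_cancel(); NOT the DPDK function.
 * It simply hands back the value the caller asks it to simulate:
 * a count of canceled alarms (>= 0) or a negative errno value.
 */
static int fake_alarm_cancel(int simulated_result)
{
	return simulated_result;
}

/*
 * Credit canceled alarms to a resource counter only when the cancel call
 * succeeded.  A negative return is passed back to the caller unchanged and
 * the counter is left untouched, so an error can never lower the count.
 */
static int credit_canceled_alarms(int *counter, int cancel_ret)
{
	if (cancel_ret < 0)
		return cancel_ret;	/* error: do not touch the counter */

	*counter += cancel_ret;
	return 0;
}
```

A caller would write something like
credit_canceled_alarms(&count, fake_alarm_cancel(...)) and check the result,
rather than adding the raw return value to the counter as in the quoted
example.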