From mboxrd@z Thu Jan  1 00:00:00 1970
From: "Wodkowski, PawelX" <pawelx.wodkowski-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
Subject: Re: [PATCH v2] Change alarm cancel function to
 thread-safe:
Date: Fri, 26 Sep 2014 14:01:05 +0000
Message-ID: <F6F2A6264E145F47A18AB6DF8E87425D12B39F15@IRSMSX102.ger.corp.intel.com>
References: <1411649768-8084-1-git-send-email-michalx.k.jastrzebski@intel.com>
 <20140925150807.GD32725@hmsreliant.think-freely.org>
 <2601191342CEEE43887BDE71AB977258213769DE@IRSMSX105.ger.corp.intel.com>
 <20140925172358.GG32725@hmsreliant.think-freely.org>
 <2601191342CEEE43887BDE71AB97725821378B50@IRSMSX104.ger.corp.intel.com>
 <20140926114630.GA3930@hmsreliant.think-freely.org>
 <F6F2A6264E145F47A18AB6DF8E87425D12B39EB5@IRSMSX102.ger.corp.intel.com>
 <20140926134014.GB3930@hmsreliant.think-freely.org>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable
Cc: "dev-VfR2kkLFssw@public.gmane.org" <dev-VfR2kkLFssw@public.gmane.org>
To: Neil Horman <nhorman-2XuSBdqkA4R54TAoqtyWWQ@public.gmane.org>
Return-path: <dev-bounces-VfR2kkLFssw@public.gmane.org>
In-Reply-To: <20140926134014.GB3930-B26myB8xz7F8NnZeBjwnZQMhkBWG/bsMQH7oEaQurus@public.gmane.org>
Content-Language: en-US
List-Id: patches and discussions about DPDK <dev.dpdk.org>
List-Unsubscribe: <http://dpdk.org/ml/options/dev>,
 <mailto:dev-request-VfR2kkLFssw@public.gmane.org?subject=unsubscribe>
List-Archive: <http://dpdk.org/ml/archives/dev/>
List-Post: <mailto:dev-VfR2kkLFssw@public.gmane.org>
List-Help: <mailto:dev-request-VfR2kkLFssw@public.gmane.org?subject=help>
List-Subscribe: <http://dpdk.org/ml/listinfo/dev>,
 <mailto:dev-request-VfR2kkLFssw@public.gmane.org?subject=subscribe>
Errors-To: dev-bounces-VfR2kkLFssw@public.gmane.org
Sender: "dev" <dev-bounces-VfR2kkLFssw@public.gmane.org>

> > > Maybe I don't see something obvious? :)
>=20
> I think you're missing the fact that your patch doesn't do what you asser=
t above
> either :)

Issue is not in setting alarms but canceling it. If you look closer to my p=
atch you
see that it address this issue (look at added *do { lock(); ....; unlock();=
 } while( )*=20
part).

>=20
> First, lets address rte_alarm_set.  There is no notion of "re-arming" in =
this
> alarm implementation, because theres no ability to refer to a specific al=
arm
> from the callers perspective.  When you call rte_eal_alarm_set you get a =
new
> alarm every time.  So I don't really see a race there.  It might not be e=
xactly
> the behavior you want, but its not a race, becuase you're not modifying a=
n
> alarm
> in the middle of execution, you're just creating a new alarm, which is sa=
fe.

OK, it is safe, but this is not the case.

>=20
> There is a race in what you describe above, insofar as its possible that =
you
> might call rte_eal_alarm_cancel and return without having canceled all th=
e
> matching alarms.  I don't see any clear documentation on what the behavio=
r is
> supposed to be, but if you want to ensure that all matching alarms are ca=
ncelled
> or complete on return from rte_eal_alarm_cancel, thats perfectly fine (in=
 linux
> API parlance, thats usually denoted as a cancel_sync operation).

Again, look at the patch. I changed documentation to inform about this beha=
vior.

>=20
> For that race condition, you're correct, my patch doesn't address it, I s=
ee that
> now.  Though your patch doesn't either.  If you call rte_eal_alarm_cancel=
 from
> within a callback function, then, by definition, you can't wait on the
> completion of the active alarm, because thats a deadlock.  Its a necessec=
ary
> evil, I grant you, but it means that you can't be guaranteed the cancelle=
d and
> complete (cancel_sync) behavior that you want, at least not with the curr=
ent
> api.  If you want that behavior, you need to do one of two things:

This patch does not break any API. It only removes undefined behavior.

>=20
> 1) Modify the api to allow callers to individually reference timer instan=
ces, so
> that when cancelling, we can return an appropriate return code to indicat=
e to
> the caller that this alarm is in-progress.  That way you can guarantee th=
e
> caller that the specific alarm that you cancelled is either complete and =
cancelled
> or currently executing.  Add an api to expicitly wait on a referenced ala=
rm as
> well.  This allows developers to know that, when executing an alarm callb=
ack, an
> -ECURRENTLYEXECUTING return code is ok, because they are in the currently
> executing context.

This would brake API for sure.