From mboxrd@z Thu Jan  1 00:00:00 1970
From: "Carrillo, Erik G" <erik.g.carrillo@intel.com>
Subject: Re: [PATCH 0/3] *** timer library enhancements ***
Date: Wed, 23 Aug 2017 19:28:23 +0000
Message-ID: <BE54F058557D9A4FAC1D84E2FC6D87570D52F407@fmsmsx115.amr.corp.intel.com>
References: <1503499644-29432-1-git-send-email-erik.g.carrillo@intel.com>
 <3F9B5E47-8083-443E-96EE-CBC41695BE43@intel.com>
 <BE54F058557D9A4FAC1D84E2FC6D87570D52F2CE@fmsmsx115.amr.corp.intel.com>
 <28C555FD-9BAB-4A6D-BB9B-37BD42B750AD@intel.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable
Cc: "rsanford@akamai.com" <rsanford@akamai.com>, "dev@dpdk.org" <dev@dpdk.org>
To: "Wiles, Keith" <keith.wiles@intel.com>
Return-path: <dev-bounces@dpdk.org>
Received: from mga09.intel.com (mga09.intel.com [134.134.136.24])
 by dpdk.org (Postfix) with ESMTP id DA6557D4F
 for <dev@dpdk.org>; Wed, 23 Aug 2017 21:28:26 +0200 (CEST)
In-Reply-To: <28C555FD-9BAB-4A6D-BB9B-37BD42B750AD@intel.com>
Content-Language: en-US
List-Id: DPDK patches and discussions <dev.dpdk.org>
List-Unsubscribe: <http://dpdk.org/ml/options/dev>,
 <mailto:dev-request@dpdk.org?subject=unsubscribe>
List-Archive: <http://dpdk.org/ml/archives/dev/>
List-Post: <mailto:dev@dpdk.org>
List-Help: <mailto:dev-request@dpdk.org?subject=help>
List-Subscribe: <http://dpdk.org/ml/listinfo/dev>,
 <mailto:dev-request@dpdk.org?subject=subscribe>
Errors-To: dev-bounces@dpdk.org
Sender: "dev" <dev-bounces@dpdk.org>


> -----Original Message-----
> From: Wiles, Keith
> Sent: Wednesday, August 23, 2017 11:50 AM
> To: Carrillo, Erik G <erik.g.carrillo@intel.com>
> Cc: rsanford@akamai.com; dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH 0/3] *** timer library enhancements ***
>
>
> > On Aug 23, 2017, at 11:19 AM, Carrillo, Erik G <erik.g.carrillo@intel.com> wrote:
> >
> >
> >
> >> -----Original Message-----
> >> From: Wiles, Keith
> >> Sent: Wednesday, August 23, 2017 10:02 AM
> >> To: Carrillo, Erik G <erik.g.carrillo@intel.com>
> >> Cc: rsanford@akamai.com; dev@dpdk.org
> >> Subject: Re: [dpdk-dev] [PATCH 0/3] *** timer library enhancements
> >> ***
> >>
> >>
> >>> On Aug 23, 2017, at 9:47 AM, Gabriel Carrillo
> >>> <erik.g.carrillo@intel.com>
> >> wrote:
> >>>
> >>> In the current implementation of the DPDK timer library, timers can
> >>> be created and set to be handled by a target lcore by adding them to a
> >>> skiplist that corresponds to that lcore.  However, if an application
> >>> enables multiple lcores, and each of these lcores repeatedly
> >>> attempts to install timers on the same target lcore, overall
> >>> application throughput will be reduced as all lcores contend to
> >>> acquire the lock guarding the single skiplist of pending timers.
> >>>
> >>> This patchset addresses this scenario by adding an array of
> >>> skiplists to each lcore's priv_timer struct, such that when lcore i
> >>> installs a timer on lcore k, the timer will be added to the ith
> >>> skiplist for lcore k.  If lcore j installs a timer on lcore k
> >>> simultaneously, lcores i and j can both proceed since they will be
> >>> acquiring different locks for different lists.
> >>>
> >>> When lcore k processes its pending timers, it will traverse each
> >>> skiplist in its array and acquire a skiplist's lock while a run list
> >>> is broken out; meanwhile, all other lists can continue to be modified.
> >>> Then, all run lists for lcore k are collected and traversed together
> >>> so timers are executed in their global order.
> >>
> >> What is the performance and/or latency added to the timeout now?
> >>
> >> I worry about the case when just about all of the cores are enabled,
> >> which could be as high as 128 or more now.
> >
> > There is a case in the timer_perf_autotest that runs rte_timer_manage
> > with zero timers that can give a sense of the added latency.  When run
> > with one lcore, it completes in around 25 cycles.  When run with 43
> > lcores (the highest I have access to at the moment), rte_timer_manage
> > completes in around 155 cycles.  So it looks like each added lcore adds
> > around 3 cycles of overhead for checking empty lists in my testing.
>
> Does this mean we have only 25 cycles on the current design or is the 25
> cycles for the new design?
>

Both - when run with one lcore, the new design becomes equivalent to the
original one.  I tested the current design to confirm.

> If for the new design, then what is the old design cost compared to the
> new cost?
>
> I also think we need the call to a timer function in the calculation, just
> to make sure we have at least one timer in the list and we account for any
> shortcuts in the code for no timers active.
>

Looking at the numbers for non-empty lists in timer_perf_autotest, the
overhead appears to fall away.  Here are some representative runs for
timer_perf_autotest:

43 lcores enabled, installing 1M timers on an lcore and processing them
with the current design:

<...snipped...>
Appending 1000000 timers
Time for 1000000 timers: 424066294 (193ms), Time per timer: 424 (0us)
Time for 1000000 callbacks: 73124504 (33ms), Time per callback: 73 (0us)
Resetting 1000000 timers
Time for 1000000 timers: 1406756396 (641ms), Time per timer: 1406 (1us)
<...snipped...>

43 lcores enabled, installing 1M timers on an lcore and processing them
with the proposed design:

<...snipped...>
Appending 1000000 timers
Time for 1000000 timers: 382912762 (174ms), Time per timer: 382 (0us)
Time for 1000000 callbacks: 79194418 (36ms), Time per callback: 79 (0us)
Resetting 1000000 timers
Time for 1000000 timers: 1427189116 (650ms), Time per timer: 1427 (1us)
<...snipped...>

The above are not averages, so the numbers don't really indicate which is
faster, but they show that the overhead of the proposed design should not
be appreciable.

> >
> >>
> >> One option is to have the lcore j that wants to install a timer on
> >> lcore k to pass a message via a ring to lcore k to add that timer. We
> >> could even add that logic into setting a timer on a different lcore
> >> than the caller in the current API. The ring would be multi-producer
> >> and single-consumer; we still have the lock.
> >> What am I missing here?
> >>
> >
> > I did try this approach: initially I had a multi-producer single-consumer
> > ring that would hold requests to add or delete a timer from lcore k's
> > skiplist, but it didn't really give an appreciable increase in my test
> > application throughput.  In profiling this solution, the hotspot had moved
> > from acquiring the skiplist's spinlock to the rte_atomic32_cmpset that the
> > multiple-producer ring code uses to manipulate the head pointer.
> >
> > Then, I tried multiple single-producer single-consumer rings per target
> > lcore.  This removed the ring hotspot, but the performance didn't increase
> > as much as with the proposed solution. These solutions also add overhead
> > to rte_timer_manage, as it would have to process the rings and then
> > process the skiplists.
> >
> > One other thing to note is that a solution that uses such messages
> > changes the use models for the timer.  One interesting example is:
> > - lcore i enqueues a message to install a timer on lcore k
> > - lcore k runs rte_timer_manage, processes its messages and adds the
> > timer to its list
> > - lcore i then enqueues a message to stop the same timer, now owned by
> > lcore k
> > - lcore k does not run rte_timer_manage again
> > - lcore i wants to free the timer but it might not be safe
>
> This case seems like a mistake to me as lcore k should continue to call
> rte_timer_manage() to process any new timers from other lcores, not just
> the case where the list becomes empty and lcore k does not add a timer to
> its list.
>
> >
> > Even though lcore i has successfully enqueued the request to stop the
> > timer (and delete it from lcore k's pending list), it hasn't actually
> > been deleted from the list yet, so freeing it could corrupt the list.
> > This case exists in the existing timer stress tests.
> >
> > Another interesting scenario is:
> > - lcore i resets a timer to install it on lcore k
> > - lcore j resets the same timer to install it on lcore k
> > - then, lcore k runs timer_manage
>
> This one also seems like a mistake; more than one lcore setting the same
> timer seems like a problem and should not be done. An lcore should own a
> timer and no other lcore should be able to change that timer. If multiple
> lcores need a timer then they should not share the same timer structure.
>

Both of the above cases exist in the timer library stress tests, so a
solution would presumably need to address them or it would be less
flexible.  The original design passed these tests, as does the proposed one.

> >
> > Lcore j's message obviates lcore i's message, and it would be wasted
> > work for lcore k to process it, so we should mark it to be skipped over.
> > Handling all the edge cases was more complex than the solution proposed.
>
> Hmmm, to me it seems simple here as long as the lcores follow the same
> rules and sharing a timer structure is very risky and avoidable IMO.
>
> Once you have lcores adding timers to another lcore then all accesses to
> that skip list must be serialized or you get unpredictable results. This
> should also fix most of the edge cases you are talking about.
>
> Also it seems to me the case with an lcore adding timers to another lcore
> timer list is a specific use case and could be handled by a different set
> of APIs for that specific use case. Then we do not need to change the
> current design and all of the overhead is placed on the new APIs/design.
> IMO we are turning the current timer design into a global timer design as
> it really is a per lcore design today and I believe that is a mistake.
>

Well, the original API explicitly supports installing a timer to be
executed on a different lcore, and there are no API changes in the
patchset.  Also, the proposed design keeps the per-lcore design intact; it
only takes what used to be one large skiplist that held timers for all
installing lcores, and separates it into N skiplists that correspond 1:1 to
an installing lcore.  When an lcore processes timers on its lists it will
still only be managing timers it owns, and no others.

> >
> >>>
> >>> Gabriel Carrillo (3):
> >>> timer: add per-installer pending lists for each lcore
> >>> timer: handle timers installed from non-EAL threads
> >>> doc: update timer lib docs
> >>>
> >>> doc/guides/prog_guide/timer_lib.rst |  19 ++-
> >>> lib/librte_timer/rte_timer.c        | 329 +++++++++++++++++++++++-------------
> >>> lib/librte_timer/rte_timer.h        |   9 +-
> >>> 3 files changed, 231 insertions(+), 126 deletions(-)
> >>>
> >>> --
> >>> 2.6.4
> >>>
> >>
> >> Regards,
> >> Keith
>
> Regards,
> Keith