From mboxrd@z Thu Jan  1 00:00:00 1970
From: "Van Haaren, Harry" <harry.van.haaren@intel.com>
Subject: Re: Service lcores and Application lcores
Date: Thu, 29 Jun 2017 16:35:05 +0000
Message-ID: <E923DB57A917B54B9182A2E928D00FA640C33FD9@IRSMSX102.ger.corp.intel.com>
References: <E923DB57A917B54B9182A2E928D00FA640C33E88@IRSMSX102.ger.corp.intel.com>
 <7268949.8nYIVvgy1g@xps>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable
Cc: "dev@dpdk.org" <dev@dpdk.org>, 'Jerin Jacob'
 <jerin.jacob@caviumnetworks.com>, "Wiles, Keith" <keith.wiles@intel.com>,
 "Richardson, Bruce" <bruce.richardson@intel.com>
To: Thomas Monjalon <thomas@monjalon.net>
Return-path: <dev-bounces@dpdk.org>
Received: from mga07.intel.com (mga07.intel.com [134.134.136.100])
 by dpdk.org (Postfix) with ESMTP id 923542C2A
 for <dev@dpdk.org>; Thu, 29 Jun 2017 18:35:09 +0200 (CEST)
In-Reply-To: <7268949.8nYIVvgy1g@xps>
Content-Language: en-US
List-Id: DPDK patches and discussions <dev.dpdk.org>
List-Unsubscribe: <http://dpdk.org/ml/options/dev>,
 <mailto:dev-request@dpdk.org?subject=unsubscribe>
List-Archive: <http://dpdk.org/ml/archives/dev/>
List-Post: <mailto:dev@dpdk.org>
List-Help: <mailto:dev-request@dpdk.org?subject=help>
List-Subscribe: <http://dpdk.org/ml/listinfo/dev>,
 <mailto:dev-request@dpdk.org?subject=subscribe>
Errors-To: dev-bounces@dpdk.org
Sender: "dev" <dev-bounces@dpdk.org>

> From: Thomas Monjalon [mailto:thomas@monjalon.net]
> Sent: Thursday, June 29, 2017 4:16 PM
> To: Van Haaren, Harry <harry.van.haaren@intel.com>
> Cc: dev@dpdk.org; 'Jerin Jacob' <jerin.jacob@caviumnetworks.com>; Wiles, Keith
> <keith.wiles@intel.com>; Richardson, Bruce <bruce.richardson@intel.com>
> Subject: Re: Service lcores and Application lcores
>
> 29/06/2017 16:36, Van Haaren, Harry:
> > The topic of discussion in this thread is how we can ensure
> > that application lcores do not interfere with service cores.
>
> Please could you give more details on the issue?

Sure, hope I didn't write too much!


> I think you need to clearly explain the 3 types of cores:
> 	- DPDK mainloop
> 	- DPDK services
> 	- not used by DPDK


DPDK cores continue to function as they currently do, with the exception that service-cores are removed from the coremask. Details in 0) below.

DPDK service cores run services - they are not visible to the application. (That is, the application does not perform any remote_launch() on these cores; it is handled internally in EAL.) Service lcores are just normal lcores, except that
lcore_config[lcore_id].core_role == ROLE_SERVICE instead of ROLE_RTE.

Non-DPDK cores are not changed.


I'll run through the following scenarios to detail the problem;

0) Explain where service cores come from in relation to non-DPDK cores
1) Describe the current usage of DPDK cores, and how the eventdev scheduler is used
2) Introduce a service-core-only usage of eventdev
3) Introduce the problem: service cores and application cores concurrently run a multi-thread unsafe service
4) The proposed solution


0) At application startup, EAL uses the coremask to identify the DPDK cores, and "brings them up".
   Service cores are "stolen" from the previous core-mask, so the service-core mask is a subset of the EAL coremask.
   Service cores are marked as ROLE_SERVICE, and the application "FOR_EACH_LCORE" will not use them.
   Non-DPDK cores are not affected - they remain as they were.
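
A minimal sketch of the role split described in 0), assuming lcores are represented as bits in a 64-bit mask. The sketch_* names and the role enum are hypothetical and exist only for illustration; they are not the EAL implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_LCORES 64

/* Hypothetical role names for this sketch; they mirror the three core types. */
enum sketch_role {
	SKETCH_ROLE_OFF,     /* not used by DPDK */
	SKETCH_ROLE_RTE,     /* application (mainloop) lcore */
	SKETCH_ROLE_SERVICE  /* lcore reserved for running services */
};

/*
 * Assign a role to every lcore from the two masks.
 * Returns 0 on success, or -1 if the service mask is not a subset of
 * the EAL coremask (service cores can only be taken from DPDK cores).
 */
static int
sketch_assign_roles(uint64_t eal_mask, uint64_t service_mask,
		enum sketch_role roles[SKETCH_MAX_LCORES])
{
	unsigned int i;

	assert(roles != NULL);
	if ((service_mask & ~eal_mask) != 0)
		return -1;

	for (i = 0; i < SKETCH_MAX_LCORES; i++) {
		uint64_t bit = UINT64_C(1) << i;

		if (service_mask & bit)
			roles[i] = SKETCH_ROLE_SERVICE;
		else if (eal_mask & bit)
			roles[i] = SKETCH_ROLE_RTE;
		else
			roles[i] = SKETCH_ROLE_OFF;
	}
	return 0;
}

/* Count the lcores an application-side "for each lcore" loop would visit. */
static unsigned int
sketch_count_app_lcores(const enum sketch_role roles[SKETCH_MAX_LCORES])
{
	unsigned int i, count = 0;

	for (i = 0; i < SKETCH_MAX_LCORES; i++)
		if (roles[i] == SKETCH_ROLE_RTE)
			count++;
	return count;
}
```

With an EAL mask of 0xF and a service mask of 0x8, lcores 0-2 stay application lcores and lcore 3 becomes a service lcore, so an application loop over its own lcores skips it.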


1) Currently, a DPDK application directly invokes rte_eventdev_schedule() using an ordinary app lcore.
   The application is responsible for multiplexing work on cores (assuming multiple PMDs are running on one core).
   The app logic *must* be updated if we wish to add more SW workloads (e.g. using a SW PMD instead of HW acceleration).
   This change in app logic is a workaround for DPDK failing to abstract HW / SW PMD requirements.
   Service cores provide an abstraction that hides the environment difference (SW/HW PMD) from the application.


2) In a service core "only" situation, the SW PMD registers a service. This service is run on a service core.
   The application logic does not have to change, as the service-core running the service is not under app control.
   Note that the application does NOT call rte_eventdev_schedule() as in 1) above, since the service core now performs this.


3) The problem;
   If a service core runs the SW PMD schedule() function (option 2) *AND*
   the application lcore runs the schedule() function (option 1), the result is that
   two threads are concurrently running a multi-thread unsafe function.

   The issue is not that the application is wrong: it correctly called rte_eventdev_schedule().
   It is also not that the service core infra is wrong: it correctly ran the service.
   It is the combination of both (and their unawareness of each other) that causes the issue.


4) The proposed solution;
   In order to ensure that multiple threads do not concurrently run a multi-thread unsafe service function,
   each thread must know whether another thread is currently running that service. The service core code handles this
   using an atomic operation per service; multiple service cores operate correctly, no problem.
   The root cause of the issue is that the application cores are not using the service atomic.
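
A minimal sketch of a per-service atomic guard, assuming C11 atomics and a try-run (skip if busy) policy. The structure and function names are hypothetical; the text does not specify how the service core code implements its atomic, so this only illustrates the idea that whoever wins the atomic runs the service and everyone else backs off.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical service descriptor for this sketch; not the DPDK structure. */
struct sketch_guarded_service {
	atomic_int running;     /* 1 while some thread is inside cb */
	void (*cb)(void *arg);  /* multi-thread unsafe work function */
	void *arg;
};

static void
sketch_guarded_init(struct sketch_guarded_service *s,
		void (*cb)(void *), void *arg)
{
	assert(s != NULL && cb != NULL);
	atomic_init(&s->running, 0);
	s->cb = cb;
	s->arg = arg;
}

/*
 * Run the service only if no other thread is currently running it.
 * Returns 0 if this thread ran the service, -1 if it was busy.
 */
static int
sketch_guarded_try_run(struct sketch_guarded_service *s)
{
	int expected = 0;

	/* Only one thread can flip running from 0 to 1. */
	if (!atomic_compare_exchange_strong_explicit(&s->running, &expected, 1,
			memory_order_acquire, memory_order_relaxed))
		return -1;

	s->cb(s->arg);

	/* Publish the work done in cb before letting another thread in. */
	atomic_store_explicit(&s->running, 0, memory_order_release);
	return 0;
}

/* Example work function: counts how many times it was invoked. */
static void
sketch_guarded_count_cb(void *arg)
{
	(*(unsigned int *)arg)++;
}
```

Any thread that calls the unsafe function directly, without going through the guard, defeats it; that is the situation of an application lcore calling schedule() on its own.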

   The rte_service_run() function allows the application to be aware of service-core runtime habits,
   because it calls into the service library and runs the service from there. With this additional rule,
   all cores (service and application owned) will be aware of each other, and can run multi-thread unsafe
   services safely, in a co-operative manner.

   In order to allow the application core to still run the eventdev PMD "manually" if it insists,
   I am proposing to allow it to invoke a specific function, which is aware of the service
   atomic. This will result in all cores "playing nice", regardless of whether they are app or service owned.

   The rte_service_run() function (which allows an application-lcore to run a service) allows
   much easier porting of applications to the service-core infrastructure. It is easier because
   the threading model of the application does not have to change: it looks up the service it
   would like to run, and can repeatedly call the rte_service_run() function to have the application
   behave in the same way as before the service core addition.
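
The porting pattern above (look up the service, then call the run function repeatedly from the existing application loop) could look roughly like the sketch below. Everything here is hypothetical: the registry, the sketch_* functions and the service name are stand-ins, and sketch_run() only plays the role of the proposed rte_service_run(), whose actual signature is not given in this mail.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_MAX_SERVICES 8

/* Hypothetical registry entry; not the DPDK service structure. */
struct sketch_named_service {
	const char *name;
	atomic_int running;     /* the per-service atomic */
	void (*cb)(void *arg);  /* multi-thread unsafe work, e.g. a SW scheduler */
	void *arg;
};

static struct sketch_named_service sketch_services[SKETCH_MAX_SERVICES];
static unsigned int sketch_nb_services;

/* Called by a (hypothetical) SW PMD to register its work function. */
static int
sketch_register(const char *name, void (*cb)(void *), void *arg)
{
	struct sketch_named_service *s;

	assert(name != NULL && cb != NULL);
	if (sketch_nb_services == SKETCH_MAX_SERVICES)
		return -1;
	s = &sketch_services[sketch_nb_services++];
	s->name = name;
	atomic_init(&s->running, 0);
	s->cb = cb;
	s->arg = arg;
	return 0;
}

/* Find a registered service by name; returns NULL if there is none. */
static struct sketch_named_service *
sketch_lookup(const char *name)
{
	unsigned int i;

	for (i = 0; i < sketch_nb_services; i++)
		if (strcmp(sketch_services[i].name, name) == 0)
			return &sketch_services[i];
	return NULL;
}

/* Guarded run: returns 0 if this thread ran the service, -1 if busy. */
static int
sketch_run(struct sketch_named_service *s)
{
	int expected = 0;

	if (!atomic_compare_exchange_strong_explicit(&s->running, &expected, 1,
			memory_order_acquire, memory_order_relaxed))
		return -1;
	s->cb(s->arg);
	atomic_store_explicit(&s->running, 0, memory_order_release);
	return 0;
}

/*
 * The application's polling loop, with the direct schedule call swapped
 * for the guarded run. Returns how many iterations ran the service.
 */
static unsigned int
sketch_app_loop(const char *name, unsigned int iterations)
{
	struct sketch_named_service *s = sketch_lookup(name);
	unsigned int i, ran = 0;

	if (s == NULL)
		return 0;
	for (i = 0; i < iterations; i++)
		if (sketch_run(s) == 0)
			ran++;
	return ran;
}

/* Example work function: counts how many times it was invoked. */
static void
sketch_count_cb(void *arg)
{
	(*(unsigned int *)arg)++;
}
```

The application keeps its own loop and threading model; the only change is that the call goes through the guarded run, so it backs off whenever a service core already holds the atomic.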


Ok, this got longer than intended - but hopefully it clearly articulates the motivation for the rte_service_run() concept.

Regards, -Harry