From: Stephen Hemminger
Subject: Re: [RFC 0/3] tqs: add thread quiescent state library
Date: Fri, 30 Nov 2018 15:44:17 -0800
Message-ID: <20181130154417.7cf7349b@xeon-e3>
In-Reply-To: <5c5db46f-e154-7932-0905-031e153c6016@ericsson.com>
References: <20181122033055.3431-1-honnappa.nagarahalli@arm.com> <20181127142803.423c9b00@xeon-e3> <20181128152351.27fdebe3@xeon-e3> <5c5db46f-e154-7932-0905-031e153c6016@ericsson.com>
To: Mattias Rönnblom
Cc: Honnappa Nagarahalli, "Van Haaren, Harry", dev@dpdk.org, nd, Dharmik Thakkar, Malvika Gupta, "Gavin Hu (Arm Technology China)"

On Fri, 30 Nov 2018 21:56:30 +0100
Mattias Rönnblom wrote:

> On 2018-11-30 03:13, Honnappa Nagarahalli wrote:
> >>
> >> Reinventing RCU is not helping anyone.
> > IMO, this depends on what rte_tqs has to offer and what the requirements are. Before starting this patch, I looked at the liburcu APIs. I have to say, fairly quickly (no offense) I concluded that they do not address DPDK's needs. I took a deeper look at the APIs/code over the past day and came to the same conclusion. My partial analysis (more APIs could be analyzed, but I do not have the cycles at this point) is as follows:
> >
> > The reader threads' information is maintained in a linked list [1]. This linked list is protected by a mutex lock [2]. Any additions/deletions/traversals of this list are blocking and cannot happen in parallel.
> >
> > The API 'synchronize_rcu' [3] (similar in functionality to the rte_tqs_check call) is a blocking call. There is no option to make it non-blocking. The writer burns cycles while waiting for the grace period to end.
> >
>
> Wouldn't the options be call_rcu, which rarely blocks, or defer_rcu(), which never does? Why would the average application want to wait for the grace period to be over anyway?
>
> > 'synchronize_rcu' also takes a grace-period lock [4]. If I have multiple writers running on data plane threads, I cannot call this API to reclaim memory in the worker threads, as it will block the other worker threads. This means an extra thread is required (on the control plane?) to do garbage collection, along with a method to push pointers from the worker threads to that garbage-collection thread. It also means the time from delete to free increases, putting pressure on the amount of memory held up.
> > Since this API cannot be called concurrently by multiple writers, each writer has to wait for the other writers' grace periods to end (i.e. multiple writer threads cannot overlap their grace periods).
>
> "Real" DPDK applications typically have to interact with the outside world using interfaces beyond DPDK packet I/O, and this is best done via an intermediate "control plane" thread running in the DPDK application. Typically, this thread would also be the RCU writer and "garbage collector", I would say.
>
> >
> > This API also has to traverse the linked list, which is not well suited for calling on the data plane.
> >
> > I have not gone too much into the rcu_thread_offline [5] API. This again needs to be used on the worker cores and does not look very optimal.
> >
> > I have glanced at rcu_quiescent_state [6]; it wakes up the thread calling 'synchronize_rcu', which seems like a good amount of code for the data plane.
> >
>
> Wouldn't the typical DPDK lcore worker call rcu_quiescent_state() after processing a burst of packets? If so, I would lean more toward "negligible overhead" than "a good amount of code".
>
> I must admit I didn't look at your library in detail, but I must still ask: if TQS is basically RCU, why isn't it called RCU? And why aren't the API calls named in a similar manner?

We used liburcu at Brocade with DPDK. It was just a case of putting rcu_quiescent_state in the packet handling loop.

There were a bunch more cases where the control thread needed to register/unregister as part of RCU. I think any library would have that issue with user-supplied threads. You need a "worry about me" and a "don't worry about me" API in the library.

There is also a tradeoff between call_rcu and defer_rcu over what context the RCU callback runs in. You really need a control thread to handle the RCU cleanup.

The point is that RCU steps into the application design, and liburcu seems to be flexible enough and well documented enough to allow for more options.
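
To make the reader side concrete, here is a rough sketch of that pattern with liburcu's QSBR flavor on a DPDK worker core. The run flag, port/queue numbers and handle_pkt() are made up for the example, and the include paths differ a little between liburcu versions:

/* Worker core reports a quiescent state once per burst. */
#include <stdbool.h>
#include <urcu-qsbr.h>      /* liburcu QSBR flavor */
#include <rte_ethdev.h>
#include <rte_mbuf.h>

static volatile bool run = true;        /* example stop flag */

static void handle_pkt(struct rte_mbuf *m)
{
        /* placeholder for the real lookup/forwarding work,
         * done under rcu_read_lock(), which is a no-op in QSBR */
        rte_pktmbuf_free(m);
}

static int worker_main(void *arg)
{
        struct rte_mbuf *pkts[32];
        (void)arg;

        rcu_register_thread();          /* "worry about me" */

        while (run) {
                uint16_t n = rte_eth_rx_burst(0, 0, pkts, 32);
                for (uint16_t i = 0; i < n; i++)
                        handle_pkt(pkts[i]);

                rcu_quiescent_state();  /* this core holds no RCU references here */
        }

        rcu_unregister_thread();        /* "don't worry about me" */
        return 0;
}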
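
And a rough sketch of the writer side, where the control thread publishes the new object and hands the old one to call_rcu(), so the callback runs later in the call_rcu helper thread rather than on a worker core. The route struct, current_route and update_route() are invented for the example:

/* Control-plane writer: swap in the new entry, defer freeing the old one. */
#include <stdlib.h>
#include <urcu-qsbr.h>
#include <urcu-pointer.h>    /* rcu_xchg_pointer() */
#include <urcu/compiler.h>   /* caa_container_of() */

struct route {
        long next_hop;           /* example payload */
        struct rcu_head rcu;     /* storage call_rcu() needs */
};

static struct route *current_route;     /* workers read this via rcu_dereference() */

static void free_route(struct rcu_head *head)
{
        /* Runs in the call_rcu helper thread, after a grace period. */
        free(caa_container_of(head, struct route, rcu));
}

static void update_route(struct route *new_route)
{
        struct route *old = rcu_xchg_pointer(&current_route, new_route);

        if (old)
                call_rcu(&old->rcu, free_route);   /* does not block the writer */
}

As I recall, defer_rcu() differs mainly in that it can fall back to running callbacks (including a full synchronize_rcu) from the calling thread when its queue fills up, which is the context tradeoff mentioned above and why the control thread is the right place for this.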