From: Stephen Hemminger
Subject: Re: [RFC 0/3] tqs: add thread quiescent state library
Date: Fri, 30 Nov 2018 15:44:17 -0800
Message-ID: <20181130154417.7cf7349b@xeon-e3>
In-Reply-To: <5c5db46f-e154-7932-0905-031e153c6016@ericsson.com>
References: <20181122033055.3431-1-honnappa.nagarahalli@arm.com> <20181127142803.423c9b00@xeon-e3> <20181128152351.27fdebe3@xeon-e3> <5c5db46f-e154-7932-0905-031e153c6016@ericsson.com>
To: Mattias Rönnblom
Cc: Honnappa Nagarahalli, "Van Haaren, Harry", dev@dpdk.org, nd, Dharmik Thakkar, Malvika Gupta, "Gavin Hu (Arm Technology China)"

On Fri, 30 Nov 2018 21:56:30 +0100
Mattias Rönnblom wrote:

> On 2018-11-30 03:13, Honnappa Nagarahalli wrote:
> >>
> >> Reinventing RCU is not helping anyone.
> > IMO, this depends on what rte_tqs has to offer and what the requirements are. Before starting this patch, I looked at the liburcu APIs. I have to say, fairly quickly (no offense) I concluded that they do not address DPDK's needs. I took a deeper look at the APIs/code over the past day and came to the same conclusion. My partial analysis (more APIs could be analyzed, but I do not have the cycles at this point) is as follows:
> >
> > The reader threads' information is maintained in a linked list [1]. This linked list is protected by a mutex lock [2]. Any additions/deletions/traversals of this list are blocking and cannot happen in parallel.
> >
> > The API 'synchronize_rcu' [3] (similar in functionality to the rte_tqs_check call) is a blocking call. There is no option to make it non-blocking. The writer burns cycles while waiting for the grace period to end.
> >
>
> Wouldn't the options be call_rcu, which rarely blocks, or defer_rcu(), which never does? Why would the average application want to wait for the grace period to be over anyway?
>
> > 'synchronize_rcu' also takes a grace-period lock [4]. If I have multiple writers running on data plane threads, I cannot call this API to reclaim memory in the worker threads, as it will block the other worker threads. This means an extra thread is required (on the control plane?) to do garbage collection, along with a method to push pointers from the worker threads to that garbage-collection thread. It also means the time from delete to free increases, putting pressure on the amount of memory held up.
> > Since this API cannot be called concurrently by multiple writers, each writer has to wait for the other writers' grace periods to end (i.e. multiple writer threads cannot overlap their grace periods).
>
> "Real" DPDK applications typically have to interact with the outside world using interfaces beyond DPDK packet I/O, and this is best done via an intermediate "control plane" thread running in the DPDK application. Typically, this thread would also be the RCU writer and "garbage collector", I would say.
>
> >
> > This API also has to traverse the linked list, which is not well suited for calling on the data plane.
> >
> > I have not gone too much into the rcu_thread_offline [5] API. This again needs to be used on the worker cores and does not look very optimal.
> >
> > I have glanced at rcu_quiescent_state [6]; it wakes up the thread calling 'synchronize_rcu', which seems like a good amount of code for the data plane.
> >
>
> Wouldn't the typical DPDK lcore worker call rcu_quiescent_state() after processing a burst of packets? If so, I would lean more toward "negligible overhead" than "a good amount of code".
>
> I must admit I didn't look at your library in detail, but I must still ask: if TQS is basically RCU, why isn't it called RCU? And why aren't the API calls named in a similar manner?

We used liburcu at Brocade with DPDK. It was just a case of putting rcu_quiescent_state in the packet handling loop.

There were a bunch more cases where the control thread needed to register/unregister as part of RCU. I think any library would have that issue with user-supplied threads. You need a "worry about me" and a "don't worry about me" API in the library.

There is also a tradeoff between call_rcu and defer_rcu over what context the RCU callback runs in. You really need a control thread to handle the RCU cleanup.

The point is that RCU steps into the application design, and liburcu seems to be flexible enough and well documented enough to allow for more options.
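
To make the reader side concrete, here is a rough sketch of that pattern with liburcu's QSBR flavor on a DPDK worker core. The run flag, port/queue numbers and handle_pkt() are made up for the example, and the include paths differ a little between liburcu versions:

/* Worker core reports a quiescent state once per burst. */
#include <stdbool.h>
#include <urcu-qsbr.h>      /* liburcu QSBR flavor */
#include <rte_ethdev.h>
#include <rte_mbuf.h>

static volatile bool run = true;        /* example stop flag */

static void handle_pkt(struct rte_mbuf *m)
{
        /* placeholder for the real lookup/forwarding work,
         * done under rcu_read_lock(), which is a no-op in QSBR */
        rte_pktmbuf_free(m);
}

static int worker_main(void *arg)
{
        struct rte_mbuf *pkts[32];
        (void)arg;

        rcu_register_thread();          /* "worry about me" */

        while (run) {
                uint16_t n = rte_eth_rx_burst(0, 0, pkts, 32);
                for (uint16_t i = 0; i < n; i++)
                        handle_pkt(pkts[i]);

                rcu_quiescent_state();  /* this core holds no RCU references here */
        }

        rcu_unregister_thread();        /* "don't worry about me" */
        return 0;
}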
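
And a rough sketch of the writer side, where the control thread publishes the new object and hands the old one to call_rcu(), so the callback runs later in the call_rcu helper thread rather than on a worker core. The route struct, current_route and update_route() are invented for the example:

/* Control-plane writer: swap in the new entry, defer freeing the old one. */
#include <stdlib.h>
#include <urcu-qsbr.h>
#include <urcu-pointer.h>    /* rcu_xchg_pointer() */
#include <urcu/compiler.h>   /* caa_container_of() */

struct route {
        long next_hop;           /* example payload */
        struct rcu_head rcu;     /* storage call_rcu() needs */
};

static struct route *current_route;     /* workers read this via rcu_dereference() */

static void free_route(struct rcu_head *head)
{
        /* Runs in the call_rcu helper thread, after a grace period. */
        free(caa_container_of(head, struct route, rcu));
}

static void update_route(struct route *new_route)
{
        struct route *old = rcu_xchg_pointer(&current_route, new_route);

        if (old)
                call_rcu(&old->rcu, free_route);   /* does not block the writer */
}

As I recall, defer_rcu() differs mainly in that it can fall back to running callbacks (including a full synchronize_rcu) from the calling thread when its queue fills up, which is the context tradeoff mentioned above and why the control thread is the right place for this.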