From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <SRS0=CkOB=YD=dpdk.org=dev-bounces@kernel.org>
From: "Ananyev, Konstantin" <konstantin.ananyev@intel.com>
To: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>,
 "stephen@networkplumber.org" <stephen@networkplumber.org>,
 "paulmck@linux.ibm.com" <paulmck@linux.ibm.com>
CC: "Wang, Yipeng1" <yipeng1.wang@intel.com>, "Medvedkin, Vladimir"
 <vladimir.medvedkin@intel.com>, "Ruifeng Wang (Arm Technology China)"
 <Ruifeng.Wang@arm.com>, Dharmik Thakkar <Dharmik.Thakkar@arm.com>,
 "dev@dpdk.org" <dev@dpdk.org>, nd <nd@arm.com>, nd <nd@arm.com>
Thread-Topic: [PATCH v3 1/3] lib/ring: add peek API
Thread-Index: AQHVeCGUIgBwOBQCI0OY47a7OfyurKdHoInwgAGlroCABaC3MIACyz6AgAJU8SA=
Date: Thu, 10 Oct 2019 15:09:55 +0000
Message-ID: <2601191342CEEE43887BDE71AB9772580191975145@irsmsx105.ger.corp.intel.com>
References: <20190906094534.36060-1-ruifeng.wang@arm.com>
 <20191001062917.35578-1-honnappa.nagarahalli@arm.com>
 <20191001062917.35578-2-honnappa.nagarahalli@arm.com>
 <2601191342CEEE43887BDE71AB9772580191970014@irsmsx105.ger.corp.intel.com>
 <VE1PR08MB514927AFB3192DCBA63F6B82989F0@VE1PR08MB5149.eurprd08.prod.outlook.com>
 <2601191342CEEE43887BDE71AB9772580191971EBE@irsmsx105.ger.corp.intel.com>
 <VE1PR08MB5149988C9B8B2E9A556C5EAE98950@VE1PR08MB5149.eurprd08.prod.outlook.com>
In-Reply-To: <VE1PR08MB5149988C9B8B2E9A556C5EAE98950@VE1PR08MB5149.eurprd08.prod.outlook.com>
Accept-Language: en-IE, en-US
Content-Language: en-US
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable
MIME-Version: 1.0
Subject: Re: [dpdk-dev] [PATCH v3 1/3] lib/ring: add peek API
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: DPDK patches and discussions <dev.dpdk.org>
List-Unsubscribe: <https://mails.dpdk.org/options/dev>,
 <mailto:dev-request@dpdk.org?subject=unsubscribe>
List-Archive: <http://mails.dpdk.org/archives/dev/>
List-Post: <mailto:dev@dpdk.org>
List-Help: <mailto:dev-request@dpdk.org?subject=help>
List-Subscribe: <https://mails.dpdk.org/listinfo/dev>,
 <mailto:dev-request@dpdk.org?subject=subscribe>
Errors-To: dev-bounces@dpdk.org
Sender: "dev" <dev-bounces@dpdk.org>


> <snip>
>
> >
> > >
> > > > > Subject: [PATCH v3 1/3] lib/ring: add peek API
> > > > >
> > > > > From: Ruifeng Wang <ruifeng.wang@arm.com>
> > > > >
> > > > > > The peek API allows fetching the next available object in the ring
> > > > > without dequeuing it. This helps in scenarios where dequeuing of
> > > > > > objects depends on their value.
> > > > >
> > > > > Signed-off-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
> > > > > Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
> > > > > Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> > > > > Reviewed-by: Gavin Hu <gavin.hu@arm.com>
> > > > > ---
> > > > >  lib/librte_ring/rte_ring.h | 30 ++++++++++++++++++++++++++++++
> > > > >  1 file changed, 30 insertions(+)
> > > > >
> > > > > > diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
> > > > > > index 2a9f768a1..d3d0d5e18 100644
> > > > > --- a/lib/librte_ring/rte_ring.h
> > > > > +++ b/lib/librte_ring/rte_ring.h
> > > > > @@ -953,6 +953,36 @@ rte_ring_dequeue_burst(struct rte_ring *r,
> > > > > void
> > > > **obj_table,
> > > > >  				r->cons.single, available);
> > > > >  }
> > > > >
> > > > > +/**
> > > > > + * Peek one object from a ring.
> > > > > + *
> > > > > > + * The peek API allows fetching the next available object in the ring
> > > > > > + * without dequeuing it. This API is not multi-thread safe with respect
> > > > > > + * to other consumer threads.
> > > > > + *
> > > > > + * @param r
> > > > > + *   A pointer to the ring structure.
> > > > > + * @param obj_p
> > > > > + *   A pointer to a void * pointer (object) that will be filled.
> > > > > + * @return
> > > > > + *   - 0: Success, object available
> > > > > + *   - -ENOENT: Not enough entries in the ring.
> > > > > + */
> > > > > +__rte_experimental
> > > > > +static __rte_always_inline int
> > > > > +rte_ring_peek(struct rte_ring *r, void **obj_p)
> > > >
> > > > As it is not MT safe, then I think we need _sc_ in the name, to
> > > > follow other rte_ring functions naming conventions
> > > > (rte_ring_sc_peek() or so).
> > > Agree
> > >
> > > >
> > > > > As a better alternative, what do you think about introducing
> > > > > serialized versions of the DPDK rte_ring dequeue functions?
> > > > Something like that:
> > > >
> > > > > /* same as original ring dequeue, but:
> > > > >   * 1) move cons.head only if cons.head == cons.tail
> > > > >   * 2) don't update cons.tail
> > > > >   */
> > > > > unsigned int
> > > > > rte_ring_serial_dequeue_bulk(struct rte_ring *r, void **obj_table,
> > > > >                 unsigned int n, unsigned int *available);
> > > > >
> > > > > /* sets both cons.head and cons.tail to cons.head + num */
> > > > > void
> > > > > rte_ring_serial_dequeue_finish(struct rte_ring *r, uint32_t num);
> > > > >
> > > > > /* resets cons.head to cons.tail value */
> > > > > void
> > > > > rte_ring_serial_dequeue_abort(struct rte_ring *r);
> > > >
> > > > Then your dq_reclaim cycle function will look like that:
> > > >
> > > > > const uint32_t nb_elt = dq->elt_size/8 + 1;
> > > > > uint32_t avl, n;
> > > > > uintptr_t elt[nb_elt];
> > > > > ...
> > > > >
> > > > > do {
> > > > >
> > > > >   /* read next elem from the queue */
> > > > >   n = rte_ring_serial_dequeue_bulk(dq->r, elt, nb_elt, &avl);
> > > > >   if (n == 0)
> > > > >       break;
> > > > >
> > > > >   /* wrong period, keep elem in the queue */
> > > > >   if (rte_rcu_qsbr_check(dq->v, elt[0]) != 1) {
> > > > >      rte_ring_serial_dequeue_abort(dq->r);
> > > > >      break;
> > > > >   }
> > > > >
> > > > >   /* can reclaim, remove elem from the queue */
> > > > >   rte_ring_serial_dequeue_finish(dq->r, nb_elt);
> > > > >
> > > > >   /* call reclaim function */
> > > > >   dq->f(dq->p, elt);
> > > > >
> > > > > } while (avl >= nb_elt);
> > > >
> > > > That way, I think even rte_rcu_qsbr_dq_reclaim() can be MT safe.
> > > > As long as actual reclamation callback itself is MT safe of course.
> > >
> > > > I think it is a great idea. The other writers would still be polling
> > > > for the current writer to update the tail or update the head. This
> > > > makes it a blocking solution.
> >
> > Yep, it is a blocking one.
> >
> > > > We can make the other threads not poll i.e. they will quit reclaiming
> > > > if they see that other writers are dequeuing from the queue.
> >
> > > Actually didn't think about that possibility, but yes should be
> > > possible to have _try_ semantics too.
> >
> > > > The other way is to use per thread queues.
> > >
> > > > The other requirement I see is to support unbounded-size data
> > > > structures where in the data structures do not have a pre-determined
> > > > number of entries. Also, currently the defer queue size is equal to
> > > > the total number of entries in a given data structure. There are
> > > > plans to support dynamically resizable defer queue. This means,
> > > > memory allocation which will affect the lock-free-ness of the
> > > > solution.
> > >
> > > So, IMO:
> > > > 1) The API should provide the capability to support different
> > > > algorithms - may be through some flags?
> > > > 2) The requirements for the ring are pretty unique to the problem we
> > > > have here (for ex: move the cons-head only if cons-tail is also the
> > > > same, skip polling). So, we should probably implement a ring with-in
> > > > the RCU library?
> >
> > > Personally, I think such serialization ring API would be useful for
> > > other cases too.
> > > There are few cases when user need to read contents of the queue
> > > without removing elements from it.
> > > Let say we do use similar approach inside TLDK to implement TCP
> > > transmit queue.
> > > If such API would exist in DPDK we can just use it straightway, without
> > > maintaining a separate one.
> ok
>
> >
> > >
> > > From the timeline perspective, adding all these capabilities would be
> > > difficult to get done with in 19.11 timeline. What I have here
> > > > satisfies my current needs. I suggest that we make provisions in APIs
> > > > now to support all these features, but do the implementation in the
> > > > coming releases. Does this sound ok for you?
> >
> > Not sure I understand your suggestion here...
> > > Could you explain it a bit more - how new API will look like and what
> > > would be left for the future.
> For this patch, I suggest we do not add any more complexity. If someone
> wants a lock-free/block-free mechanism, it is available by creating
> per thread defer queues.
>
> We push the following to the future:
> 1) Dynamically size adjustable defer queue. IMO, with this, the
> lock-free/block-free reclamation will not be available (memory allocation
> requires locking). The memory for the defer queue will be allocated/freed
> in chunks of 'size' elements as the queue grows/shrinks.

That one is fine by me.
In fact, I don't know whether there would be a real use case for a dynamic
defer queue for an rcu var...
But I suppose that's a subject for another discussion.

>
> 2) Constant size defer queue with lock-free and block-free reclamation
> (single option). The defer queue will be of fixed length 'size'. If the
> queue gets full an error is returned. The user could provide a 'size'
> equal to the number of elements in a data structure to ensure queue
> never gets full.

Ok so for 19.11 what enqueue/dequeue model do you plan to support?
- MP/MC
- MP/SC
- SP/SC
- non MT at all (only same single thread can do enqueue and dequeue)

And a related question:
What additional rte_ring API do you plan to introduce in that case?
- None
- rte_ring_sc_peek()
- rte_ring_serial_dequeue()
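
To make the last option a bit more concrete, here is a rough sketch of
the serialized dequeue semantics (start / finish / abort) on a toy ring.
It is only an illustration under my own assumptions: struct toy_ring and
all toy_ring_* names are hypothetical, it does not use struct rte_ring or
any existing DPDK call, and it uses plain C11 atomics with the default
ordering instead of the barriers the real ring would need. The start
function has _try_ semantics: it returns 0 instead of polling when
another dequeue is in flight.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Toy ring, NOT struct rte_ring: every name below is hypothetical. */
#define TOY_RING_SIZE 8u
#define TOY_RING_MASK (TOY_RING_SIZE - 1u)
static_assert((TOY_RING_SIZE & TOY_RING_MASK) == 0, "size must be a power of two");

struct toy_ring {
	_Atomic uint32_t prod_tail; /* producer has published up to here */
	_Atomic uint32_t cons_head; /* consumer has reserved up to here */
	_Atomic uint32_t cons_tail; /* consumer has released up to here */
	void *slots[TOY_RING_SIZE];
};

/* Single-producer enqueue, only here to feed the example. 1 on success. */
static inline int
toy_ring_sp_enqueue(struct toy_ring *r, void *obj)
{
	uint32_t tail = atomic_load(&r->prod_tail);

	/* Slots are free only once cons_tail has moved past them. */
	if (tail - atomic_load(&r->cons_tail) >= TOY_RING_SIZE)
		return 0;
	r->slots[tail & TOY_RING_MASK] = obj;
	atomic_store(&r->prod_tail, tail + 1);
	return 1;
}

/*
 * Start of a serialized dequeue:
 * 1) cons_head moves only if cons_head == cons_tail (nothing in flight);
 * 2) cons_tail is not updated, so the objects stay in the ring.
 * Returns n on success, 0 otherwise (try semantics, no polling).
 */
static inline unsigned int
toy_ring_serial_dequeue_bulk(struct toy_ring *r, void **obj_table,
		unsigned int n, unsigned int *available)
{
	uint32_t tail = atomic_load(&r->cons_tail);
	uint32_t entries = atomic_load(&r->prod_tail) - tail;
	uint32_t expected = tail;
	unsigned int i;

	if (available != NULL)
		*available = entries;
	if (n == 0 || entries < n)
		return 0;
	/* Reserve [tail, tail + n); fails if cons_head != cons_tail. */
	if (!atomic_compare_exchange_strong(&r->cons_head, &expected, tail + n))
		return 0;
	for (i = 0; i < n; i++)
		obj_table[i] = r->slots[(tail + i) & TOY_RING_MASK];
	if (available != NULL)
		*available = entries - n;
	return n;
}

/* Release num reserved objects: head and tail both become cons_tail + num. */
static inline void
toy_ring_serial_dequeue_finish(struct toy_ring *r, uint32_t num)
{
	uint32_t tail = atomic_load(&r->cons_tail);

	assert(num <= atomic_load(&r->cons_head) - tail);
	/* Head first, so head == tail only once both are final. */
	atomic_store(&r->cons_head, tail + num);
	atomic_store(&r->cons_tail, tail + num);
}

/* Give up the reservation: cons_head goes back to cons_tail. */
static inline void
toy_ring_serial_dequeue_abort(struct toy_ring *r)
{
	atomic_store(&r->cons_head, atomic_load(&r->cons_tail));
}
```

With something like that, a second consumer calling the start function
while the first one is between start and finish/abort simply gets 0 back
and can quit reclaiming.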

>
> I would add a 'flags' field in rte_rcu_qsbr_dq_parameters and provide 2
> #defines, one for dynamically variable size defer queue and the
> other for constant size defer queue.
>
> However, IMO, using per thread defer queue is a much simpler way to
> achieve 2. It does not add any significant burden to the user either.
>
> >
> > >
> > > >
> > > > > +{
> > > > > +	uint32_t prod_tail =3D r->prod.tail;
> > > > > +	uint32_t cons_head =3D r->cons.head;
> > > > > +	uint32_t count =3D (prod_tail - cons_head) & r->mask;
> > > > > +	unsigned int n =3D 1;
> > > > > +	if (count) {
> > > > > +		DEQUEUE_PTRS(r, &r[1], cons_head, obj_p, n, void *);
> > > > > +		return 0;
> > > > > +	}
> > > > > +	return -ENOENT;
> > > > > +}
> > > > > +
> > > > >  #ifdef __cplusplus
> > > > >  }
> > > > >  #endif
> > > > > --
> > > > > 2.17.1