All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Ananyev, Konstantin" <konstantin.ananyev@intel.com>
To: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>,
	Olivier Matz <olivier.matz@6wind.com>
Cc: "dev@dpdk.org" <dev@dpdk.org>,
	"david.marchand@redhat.com" <david.marchand@redhat.com>,
	nd <nd@arm.com>, nd <nd@arm.com>
Subject: Re: [dpdk-dev] [RFC v2 1/1] lib/ring: add scatter gather APIs
Date: Mon, 12 Oct 2020 17:06:30 +0000	[thread overview]
Message-ID: <BYAPR11MB3301191B27890F387D48D34B9A070@BYAPR11MB3301.namprd11.prod.outlook.com> (raw)
In-Reply-To: <DBAPR08MB581406C02A163D1F2601841D98080@DBAPR08MB5814.eurprd08.prod.outlook.com>



 
> <snip>
> 
> > > > > Hi Honnappa,
> > > > >
> > > > > From a quick walkthrough, I have some questions/comments, please
> > > > > see below.
> > > > Hi Olivier, appreciate your input.
> > > >
> > > > >
> > > > > On Tue, Oct 06, 2020 at 08:29:05AM -0500, Honnappa Nagarahalli wrote:
> > > > > > Add scatter gather APIs to avoid intermediate memcpy. Use cases
> > > > > > that involve copying large amount of data to/from the ring can
> > > > > > benefit from these APIs.
> > > > > >
> > > > > > Signed-off-by: Honnappa Nagarahalli
> > > > > > <honnappa.nagarahalli@arm.com>
> > > > > > ---
> > > > > >  lib/librte_ring/meson.build        |   3 +-
> > > > > >  lib/librte_ring/rte_ring_elem.h    |   1 +
> > > > > >  lib/librte_ring/rte_ring_peek_sg.h | 552
> > > > > > +++++++++++++++++++++++++++++
> > > > > >  3 files changed, 555 insertions(+), 1 deletion(-)  create mode
> > > > > > 100644 lib/librte_ring/rte_ring_peek_sg.h
> > > > > >
> > > > > > diff --git a/lib/librte_ring/meson.build
> > > > > > b/lib/librte_ring/meson.build index 31c0b4649..377694713 100644
> > > > > > --- a/lib/librte_ring/meson.build
> > > > > > +++ b/lib/librte_ring/meson.build
> > > > > > @@ -12,4 +12,5 @@ headers = files('rte_ring.h',
> > > > > >  		'rte_ring_peek.h',
> > > > > >  		'rte_ring_peek_c11_mem.h',
> > > > > >  		'rte_ring_rts.h',
> > > > > > -		'rte_ring_rts_c11_mem.h')
> > > > > > +		'rte_ring_rts_c11_mem.h',
> > > > > > +		'rte_ring_peek_sg.h')
> > > > > > diff --git a/lib/librte_ring/rte_ring_elem.h
> > > > > > b/lib/librte_ring/rte_ring_elem.h index 938b398fc..7d3933f15
> > > > > > 100644
> > > > > > --- a/lib/librte_ring/rte_ring_elem.h
> > > > > > +++ b/lib/librte_ring/rte_ring_elem.h
> > > > > > @@ -1079,6 +1079,7 @@ rte_ring_dequeue_burst_elem(struct
> > > > > > rte_ring *r, void *obj_table,
> > > > > >
> > > > > >  #ifdef ALLOW_EXPERIMENTAL_API
> > > > > >  #include <rte_ring_peek.h>
> > > > > > +#include <rte_ring_peek_sg.h>
> > > > > >  #endif
> > > > > >
> > > > > >  #include <rte_ring.h>
> > > > > > diff --git a/lib/librte_ring/rte_ring_peek_sg.h
> > > > > > b/lib/librte_ring/rte_ring_peek_sg.h
> > > > > > new file mode 100644
> > > > > > index 000000000..97d5764a6
> > > > > > --- /dev/null
> > > > > > +++ b/lib/librte_ring/rte_ring_peek_sg.h
> > > > > > @@ -0,0 +1,552 @@
> > > > > > +/* SPDX-License-Identifier: BSD-3-Clause
> > > > > > + *
> > > > > > + * Copyright (c) 2020 Arm
> > > > > > + * Copyright (c) 2007-2009 Kip Macy kmacy@freebsd.org
> > > > > > + * All rights reserved.
> > > > > > + * Derived from FreeBSD's bufring.h
> > > > > > + * Used as BSD-3 Licensed with permission from Kip Macy.
> > > > > > + */
> > > > > > +
> > > > > > +#ifndef _RTE_RING_PEEK_SG_H_
> > > > > > +#define _RTE_RING_PEEK_SG_H_
> > > > > > +
> > > > > > +/**
> > > > > > + * @file
> > > > > > + * @b EXPERIMENTAL: this API may change without prior notice
> > > > > > + * It is not recommended to include this file directly.
> > > > > > + * Please include <rte_ring_elem.h> instead.
> > > > > > + *
> > > > > > + * Ring Peek Scatter Gather APIs
> > > > >
> > > > > I am not fully convinced by the API name. To me, "scatter/gather"
> > > > > is associated to iovecs, like for instance in [1]. The wikipedia
> > > > > definition [2] may be closer though.
> > > > >
> > > > > [1]
> > > > >
> > https://www.gnu.org/software/libc/manual/html_node/Scatter_002dGat
> > > > > he
> > > > > r.html
> > > > > [2]
> > > > > https://en.wikipedia.org/wiki/Gather-scatter_(vector_addressing)
> > > > The way I understand scatter-gather is, the data to be sent to
> > > > something (like a device) is coming from multiple sources. It would
> > > > require
> > > copying to put the data together to be contiguous. If the device
> > > supports scatter-gather, such copying is not required. The device can
> > collect data from multiple locations and make it contiguous.
> > > >
> > > > In the case I was looking at, one part of the data was coming from
> > > > the user of the API and another was generated by the API itself. If
> > > these two pieces of information have to be enqueued as a single object
> > > on the ring, I had to create an intermediate copy. But by exposing the ring
> > memory to the user, the intermediate copy is avoided. Hence I called it
> > scatter-gather.
> > > >
> > > > >
> > > > > What about "zero-copy"?
> > > > I think no-copy (nc for short) or user-copy (uc for short) would
> > > > convey the meaning better. These would indicate that the rte_ring
> > > > APIs are
> > > not copying the objects and it is left to the user to do the actual copy.


+1 for _ZC_ in naming.
_NC_ is probably ok too, but sounds really strange to me.

> > > >
> > > > >
> > > > > Also, the "peek" term looks also a bit confusing to me, but it
> > > > > existed before your patch. I understand it for dequeue, but not for
> > enqueue.
> > > > I kept 'peek' there because the API still offers the 'peek' API
> > > > capabilities. I am also not sure on what 'peek' means for enqueue
> > > > API. The
> > > enqueue 'peek' API was provided to be symmetric with dequeue peek API.
> > > >
> > > > >
> > > > > Or, what about replacing the existing experimental peek API by this one?
> > > > > They look quite similar to me.
> > > > I agree, scatter gather APIs provide the peek capability and the no-copy
> > benefits.
> > > > Konstantin, any opinions here?

I am still not very comfortable with API that allows users to access
elems locations directly. I do understand that it could be beneficial in some
special cases (you provided some good examples below), so I don't object to
have it as addon.
But I still think it shouldn't be the _only_ API. 

> >
> > Sorry, didn't have time yet, to look at this RFC properly.
> > Will try to do it next week, as I understand that's for 21.02 anyway?
> This is committed for 20.11. We should be able to get into RC2.

Sounds really tight..., but ok, let's see how it goes.
 
> >
> > > > >
> > > > > > + * Introduction of rte_ring with scatter gather serialized
> > > > > > + producer/consumer
> > > > > > + * (HTS sync mode) makes it possible to split public
> > > > > > + enqueue/dequeue API
> > > > > > + * into 3 phases:
> > > > > > + * - enqueue/dequeue start
> > > > > > + * - copy data to/from the ring
> > > > > > + * - enqueue/dequeue finish
> > > > > > + * Along with the advantages of the peek APIs, these APIs
> > > > > > + provide the ability
> > > > > > + * to avoid copying of the data to temporary area.
> > > > > > + *
> > > > > > + * Note that right now this new API is available only for two sync
> > modes:
> > > > > > + * 1) Single Producer/Single Consumer (RTE_RING_SYNC_ST)
> > > > > > + * 2) Serialized Producer/Serialized Consumer
> > (RTE_RING_SYNC_MT_HTS).
> > > > > > + * It is a user responsibility to create/init ring with
> > > > > > + appropriate sync
> > > > > > + * modes selected.
> > > > > > + *
> > > > > > + * Example usage:
> > > > > > + * // read 1 elem from the ring:
> > > > >
> > > > > Comment should be "prepare enqueuing 32 objects"
> > > > >
> > > > > > + * n = rte_ring_enqueue_sg_bulk_start(ring, 32, &sgd, NULL);
> > > > > > + * if (n != 0) {
> > > > > > + *	//Copy objects in the ring
> > > > > > + *	memcpy (sgd->ptr1, obj, sgd->n1 * sizeof(uintptr_t));
> > > > > > + *	if (n != sgd->n1)
> > > > > > + *		//Second memcpy because of wrapround
> > > > > > + *		n2 = n - sgd->n1;
> > > > > > + *		memcpy (sgd->ptr2, obj[n2], n2 * sizeof(uintptr_t));
> > > > >
> > > > > Missing { }
> > > > >
> > > > > > + *	rte_ring_dequeue_sg_finish(ring, n);
> > > > >
> > > > > Should be enqueue
> > > > >
> > > > Thanks, will correct them.
> > > >
> > > > > > + * }
> > > > > > + *
> > > > > > + * Note that between _start_ and _finish_ none other thread can
> > > > > > + proceed
> > > > > > + * with enqueue(/dequeue) operation till _finish_ completes.
> > > > > > + */
> > > > > > +
> > > > > > +#ifdef __cplusplus
> > > > > > +extern "C" {
> > > > > > +#endif
> > > > > > +
> > > > > > +#include <rte_ring_peek_c11_mem.h>
> > > > > > +
> > > > > > +/* Rock that needs to be passed between reserve and commit APIs
> > > > > > +*/ struct rte_ring_sg_data {
> > > > > > +	/* Pointer to the first space in the ring */
> > > > > > +	void **ptr1;
> > > > > > +	/* Pointer to the second space in the ring if there is wrap-
> > around */
> > > > > > +	void **ptr2;
> > > > > > +	/* Number of elements in the first pointer. If this is equal to
> > > > > > +	 * the number of elements requested, then ptr2 is NULL.
> > > > > > +	 * Otherwise, subtracting n1 from number of elements
> > requested
> > > > > > +	 * will give the number of elements available at ptr2.
> > > > > > +	 */
> > > > > > +	unsigned int n1;
> > > > > > +};
> > > > >
> > > > > Would it be possible to simply return the offset instead of this structure?
> > > > > The wrap could be managed by a __rte_ring_enqueue_elems()
> > function.
> > > > Trying to use __rte_ring_enqueue_elems() will force temporary copy.
> > See below.
> > > >
> > > > >
> > > > > I mean something like this:
> > > > >
> > > > > 	uint32_t start;
> > > > > 	n = rte_ring_enqueue_sg_bulk_start(ring, 32, &start, NULL);
> > > > > 	if (n != 0) {
> > > > > 		/* Copy objects in the ring. */
> > > > > 		__rte_ring_enqueue_elems(ring, start, obj, sizeof(uintptr_t),
> > > > > n);
> > > > For ex: 'obj' here is temporary copy.
> > > >
> > > > > 		rte_ring_enqueue_sg_finish(ring, n);
> > > > > 	}
> > > > >
> > > > > It would require to slightly change __rte_ring_enqueue_elems() to
> > > > > support to be called with prod_head >= size, and wrap in that case.
> > > > >
> > > > The alternate solution I can think of requires 3 things 1) the base
> > > > address of the ring 2) Index to where to copy 3) the mask. With
> > > > these 3
> > > things one could write the code like below:
> > > > for (i = 0; i < n; i++) {
> > > > 	ring_addr[(index + i) & mask] = obj[i]; // ANDing with mask will take
> > care of wrap-around.
> > > > }
> > > >
> > > > However, I think this does not allow for passing the address in the
> > > > ring to another function/API to copy the data (It is possible, but
> > > > the user
> > > has to calculate the actual address, worry about the wrap-around, second
> > pointer etc).
> > > >
> > > > The current approach hides some details and provides flexibility to the
> > application to use the pointer the way it wants.
> > >
> > > I agree that doing the access + masking manually looks too complex.
> > >
> > > However I'm not sure to get why using __rte_ring_enqueue_elems()
> > would
> > > result in an additional copy. I suppose the object that you want to
> > > enqueue is already stored somewhere?
> I think this is the key. The object is not stored any where (yet), it is getting generated. When it is generated, it should get stored directly into
> the ring. I have provided some examples below.
> 
> > >
> > > For instance, let's say you have 10 objects to enqueue, located at
> > > different places:
> > >
> > > 	void *obj_0_to_3 = <place where objects 0 to 3 are stored>;
> > > 	void *obj_4_to_7 = ...;
> > > 	void *obj_8_to_9 = ...;
> > > 	uint32_t start;
> > > 	n = rte_ring_enqueue_sg_bulk_start(ring, 10, &start, NULL);
> > > 	if (n != 0) {
> > > 		__rte_ring_enqueue_elems(ring, start, obj_0_to_3,
> > > 			sizeof(uintptr_t), 4);
> > > 		__rte_ring_enqueue_elems(ring, start + 4, obj_4_to_7,
> > > 			sizeof(uintptr_t), 4);
> > > 		__rte_ring_enqueue_elems(ring, start + 8, obj_8_to_9,
> > > 			sizeof(uintptr_t), 2);
> > > 		rte_ring_enqueue_sg_finish(ring, 10);
> > > 	}
> > >
> >
> >
> > As I understand, It is not about different objects stored in different places, it
> > is about:
> > a) object is relatively big (16B+ ?)
> > b) You compose objects from values stored in few different places.
> >
> > Let say you have:
> > struct elem_obj {uint64_t a; uint32_t b, c;};
> >
> > And then you'd like to copy 'a' value from one location, 'b' from second, and
> > 'c' from third one.
> >
> > Konstantin
> >
> I think there are multiple use cases. Some I have in mind are:
> 
> 1)
> Code without this patch:
> 
> struct rte_mbuf *pkts_burst[32];
> 
> /* Create ring with sync type RTE_RING_SYNC_ST or RTE_RING_SYNC_MT_HTS */
> 
> /* Pkt I/O core polls packets from the NIC, pkts_burst is the temporary store */
> nb_rx = rte_eth_rx_burst(portid, queueid, pkts_burst, 32);
> /* Provide packets to the packet processing cores */
> rte_ring_enqueue_burst(ring, pkts_burst, 32, &free_space);
> 
> Code with the patch:
> 
> /* Create ring with sync type RTE_RING_SYNC_ST or RTE_RING_SYNC_MT_HTS */
> 
> /* Reserve space on the ring */
> n = rte_ring_enqueue_sg_burst_start(ring, 32, &sgd, NULL);
> /* Pkt I/O core polls packets from the NIC */
> if (n == 32)
> 	nb_rx = rte_eth_rx_burst(portid, queueid, sgd->ptr1, 32);
> else
> 	nb_rx = rte_eth_rx_burst(portid, queueid, sgd->ptr1, sgd->n1);
> /* Provide packets to the packet processing cores */
> /* Temporary storage 'pkts_burst' is not required */
> rte_ring_enqueue_sg_finish(ring, nb_rx);
> 
> 
> 2) This is same/similar to what Konstantin mentioned
> 
> Code without this patch:
> 
> struct elem_obj {uint64_t a; uint32_t b, c;};
> struct elem_obj obj;
> 
> /* Create ring with sync type RTE_RING_SYNC_ST or RTE_RING_SYNC_MT_HTS */
> 
> obj.a = rte_get_a();
> obj.b = rte_get_b();
> obj.c = rte_get_c();
> /* obj is the temporary storage and results in memcpy in the following call */
> rte_ring_enqueue_elem(ring, sizeof(struct elem_obj), 1, &obj, NULL);
> 
> Code with the patch:
> 
> struct elem_obj *obj;
> /* Reserve space on the ring */
> n = rte_ring_enqueue_sg_bulk_elem_start(ring, sizeof(elem_obj), 1, &sgd, NULL);
> 
> obj = (struct elem_obj *)sgd->ptr1;
> obj.a = rte_get_a();
> obj.b = rte_get_b();
> obj.c = rte_get_c();
> /* obj is not a temporary storage */
> rte_ring_enqueue_sg_elem_finish(ring, n);

  reply	other threads:[~2020-10-12 17:06 UTC|newest]

Thread overview: 69+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-02-24 20:39 [dpdk-dev] [RFC 0/1] lib/ring: add scatter gather and serial dequeue APIs Honnappa Nagarahalli
2020-02-24 20:39 ` [dpdk-dev] [RFC 1/1] " Honnappa Nagarahalli
2020-02-26 20:38   ` Ananyev, Konstantin
2020-02-26 23:21     ` Ananyev, Konstantin
2020-02-28  0:18     ` Honnappa Nagarahalli
2020-03-02 18:20       ` Ananyev, Konstantin
2020-03-04 23:21         ` Honnappa Nagarahalli
2020-03-05 18:26           ` Ananyev, Konstantin
2020-03-25 20:43             ` Honnappa Nagarahalli
2020-10-06 13:29 ` [dpdk-dev] [RFC v2 0/1] lib/ring: add scatter gather APIs Honnappa Nagarahalli
2020-10-06 13:29   ` [dpdk-dev] [RFC v2 1/1] " Honnappa Nagarahalli
2020-10-07  8:27     ` Olivier Matz
2020-10-08 20:44       ` Honnappa Nagarahalli
2020-10-08 20:47         ` Honnappa Nagarahalli
2020-10-09  7:33         ` Olivier Matz
2020-10-09  8:05           ` Ananyev, Konstantin
2020-10-09 22:54             ` Honnappa Nagarahalli
2020-10-12 17:06               ` Ananyev, Konstantin [this message]
2020-10-12 16:20     ` Ananyev, Konstantin
2020-10-12 22:31       ` Honnappa Nagarahalli
2020-10-13 11:38         ` Ananyev, Konstantin
2020-10-23  4:43 ` [dpdk-dev] [PATCH v3 0/5] lib/ring: add zero copy APIs Honnappa Nagarahalli
2020-10-23  4:43   ` [dpdk-dev] [PATCH v3 1/5] test/ring: fix the memory dump size Honnappa Nagarahalli
2020-10-23 13:24     ` Ananyev, Konstantin
2020-10-23  4:43   ` [dpdk-dev] [PATCH v3 2/5] lib/ring: add zero copy APIs Honnappa Nagarahalli
2020-10-23 13:59     ` Ananyev, Konstantin
2020-10-24 15:45       ` Honnappa Nagarahalli
2020-10-23  4:43   ` [dpdk-dev] [PATCH v3 3/5] test/ring: move common function to header file Honnappa Nagarahalli
2020-10-23 14:22     ` Ananyev, Konstantin
2020-10-23 23:54       ` Honnappa Nagarahalli
2020-10-24  0:29         ` Stephen Hemminger
2020-10-24  0:31           ` Honnappa Nagarahalli
2020-10-23  4:43   ` [dpdk-dev] [PATCH v3 4/5] test/ring: add functional tests for zero copy APIs Honnappa Nagarahalli
2020-10-23 14:20     ` Ananyev, Konstantin
2020-10-23 22:47       ` Honnappa Nagarahalli
2020-10-23  4:43   ` [dpdk-dev] [PATCH v3 5/5] test/ring: add stress " Honnappa Nagarahalli
2020-10-23 14:11     ` Ananyev, Konstantin
2020-10-24 16:11 ` [dpdk-dev] [PATCH v4 0/8] lib/ring: add " Honnappa Nagarahalli
2020-10-24 16:11   ` [dpdk-dev] [PATCH v4 1/8] " Honnappa Nagarahalli
2020-10-24 16:11   ` [dpdk-dev] [PATCH v4 2/8] test/ring: move common function to header file Honnappa Nagarahalli
2020-10-24 16:11   ` [dpdk-dev] [PATCH v4 3/8] test/ring: add functional tests for zero copy APIs Honnappa Nagarahalli
2020-10-24 16:11   ` [dpdk-dev] [PATCH v4 4/8] test/ring: add stress " Honnappa Nagarahalli
2020-10-24 16:11   ` [dpdk-dev] [PATCH v4 5/8] doc/ring: add zero copy peek APIs Honnappa Nagarahalli
2020-10-24 16:11   ` [dpdk-dev] [PATCH v4 6/8] test/ring: fix the memory dump size Honnappa Nagarahalli
2020-10-24 16:11   ` [dpdk-dev] [PATCH v4 7/8] test/ring: remove unnecessary braces Honnappa Nagarahalli
2020-10-24 16:11   ` [dpdk-dev] [PATCH v4 8/8] test/ring: user uintptr_t instead of unsigned long Honnappa Nagarahalli
2020-10-24 16:18   ` [dpdk-dev] [PATCH v4 0/8] lib/ring: add zero copy APIs Honnappa Nagarahalli
2020-10-25  7:16     ` David Marchand
2020-10-25  8:14       ` Thomas Monjalon
2020-10-25  5:45 ` [dpdk-dev] [PATCH v5 " Honnappa Nagarahalli
2020-10-25  5:45   ` [dpdk-dev] [PATCH v5 1/8] " Honnappa Nagarahalli
2020-10-27 14:11     ` Ananyev, Konstantin
2020-10-25  5:45   ` [dpdk-dev] [PATCH v5 2/8] test/ring: move common function to header file Honnappa Nagarahalli
2020-10-27 13:51     ` Ananyev, Konstantin
2020-10-25  5:45   ` [dpdk-dev] [PATCH v5 3/8] test/ring: add functional tests for zero copy APIs Honnappa Nagarahalli
2020-10-25  5:45   ` [dpdk-dev] [PATCH v5 4/8] test/ring: add stress " Honnappa Nagarahalli
2020-10-25  5:45   ` [dpdk-dev] [PATCH v5 5/8] doc/ring: add zero copy peek APIs Honnappa Nagarahalli
2020-10-27 14:11     ` Ananyev, Konstantin
2020-10-29 10:52     ` David Marchand
2020-10-29 11:28       ` Ananyev, Konstantin
2020-10-29 12:35         ` David Marchand
2020-10-29 17:29           ` Honnappa Nagarahalli
2020-10-25  5:45   ` [dpdk-dev] [PATCH v5 6/8] test/ring: fix the memory dump size Honnappa Nagarahalli
2020-10-25  5:45   ` [dpdk-dev] [PATCH v5 7/8] test/ring: remove unnecessary braces Honnappa Nagarahalli
2020-10-27 14:13     ` Ananyev, Konstantin
2020-10-25  5:45   ` [dpdk-dev] [PATCH v5 8/8] test/ring: user uintptr_t instead of unsigned long Honnappa Nagarahalli
2020-10-27 14:14     ` Ananyev, Konstantin
2020-10-29 13:57   ` [dpdk-dev] [PATCH v5 0/8] lib/ring: add zero copy APIs David Marchand
2020-10-29 22:11     ` Honnappa Nagarahalli

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=BYAPR11MB3301191B27890F387D48D34B9A070@BYAPR11MB3301.namprd11.prod.outlook.com \
    --to=konstantin.ananyev@intel.com \
    --cc=Honnappa.Nagarahalli@arm.com \
    --cc=david.marchand@redhat.com \
    --cc=dev@dpdk.org \
    --cc=nd@arm.com \
    --cc=olivier.matz@6wind.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.