From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Trahe, Fiona"
Subject: Re: [RFC v2] doc compression API for DPDK
Date: Thu, 15 Feb 2018 18:47:21 +0000
Message-ID: <348A99DA5F5B7549AA880327E580B4358931F82B@IRSMSX101.ger.corp.intel.com>
To: "Verma, Shally", Ahmed Mansour, "dev@dpdk.org"
Cc: "Athreya, Narayana Prasad", "Gupta, Ashish", "Sahu, Sunila", "De Lara Guarch, Pablo", "Challa, Mahipal", "Jain, Deepak K", Hemant Agrawal, Roy Pledge, Youri Querry, "Trahe, Fiona"
List-Id: DPDK patches and discussions

Hi Shally, Ahmed,
Sorry for the delay in replying, comments below.

> -----Original Message-----
> From: Verma, Shally [mailto:Shally.Verma@cavium.com]
> Sent: Wednesday, February 14, 2018 7:41 AM
> To: Ahmed Mansour; Trahe, Fiona; dev@dpdk.org
> Cc: Athreya, Narayana Prasad; Gupta, Ashish; Sahu, Sunila; De Lara Guarch, Pablo; Challa, Mahipal; Jain, Deepak K; Hemant Agrawal; Roy Pledge; Youri Querry
> Subject: RE: [RFC v2] doc compression API for DPDK
>
> Hi Ahmed,
>
> >-----Original Message-----
> >From: Ahmed Mansour [mailto:ahmed.mansour@nxp.com]
> >Sent: 02 February 2018 01:53
> >To: Trahe, Fiona; Verma, Shally; dev@dpdk.org
> >Cc: Athreya, Narayana Prasad; Gupta, Ashish; Sahu, Sunila; De Lara Guarch, Pablo; Challa, Mahipal; Jain, Deepak K; Hemant Agrawal; Roy Pledge; Youri Querry
> >Subject: Re: [RFC v2] doc
compression API for DPDK
> >
> >On 1/31/2018 2:03 PM, Trahe, Fiona wrote:
> >> Hi Ahmed, Shally,
> >>
> >> ///snip///
> >>>>>>>> D.1.1 Stateless and OUT_OF_SPACE
> >>>>>>>> ------------------------------------------------
> >>>>>>>> OUT_OF_SPACE is a condition where the output buffer runs out of space and where the PMD still has more data to produce. If the PMD runs into such a condition, then it's an error condition in stateless processing.
> >>>>>>>> In such a case, the PMD resets itself and returns with status RTE_COMP_OP_STATUS_OUT_OF_SPACE with produced=consumed=0, i.e. no input read, no output written.
> >>>>>>>> The application can resubmit the full input with a larger output buffer size.
> >>>>>>> [Ahmed] Can we add an option to allow the user to read the data that was produced while still reporting OUT_OF_SPACE? This is mainly useful for decompression applications doing search.
> >>>>>> [Shally] It is there, but applicable for the stateful operation type (please refer to handling out_of_space under "Stateful Section").
> >>>>>> By definition, "stateless" here means that the application (such as IPCOMP) knows the maximum output size with certainty and ensures that the uncompressed data size cannot grow beyond the provided output buffer.
> >>>>>> Such apps can submit an op with type = STATELESS and provide the full input; the PMD then assumes it has sufficient input and output and thus doesn't need to maintain any contexts after the op is processed.
> >>>>>> If the application doesn't know the max output size, then it should process it as a stateful op, i.e. set up the op with type = STATEFUL and attach a stream so that the PMD can maintain relevant context to handle such a condition.
> >>>>> [Fiona] There may be an alternative that's useful for Ahmed, while still respecting the stateless concept.
> >>>>> In the stateless case where a PMD reports OUT_OF_SPACE in the decompression case, it could also return consumed=0, produced=x, where x>0. x indicates the amount of valid data which has been written to the output buffer. It is not complete, but if an application wants to search it, that may be sufficient.
> >>>>> If the application still wants the data, it must resubmit the whole input with a bigger output buffer, and decompression will be repeated from the start; it cannot expect to continue on, as the PMD has not maintained state, history or data.
> >>>>> I don't think there would be any need to indicate this in capabilities; PMDs which cannot provide this functionality would always return produced=consumed=0, while PMDs which can could set produced > 0.
> >>>>> If this works for you both, we could consider a similar case for compression.
> >>>>>
> >>>> [Shally] Sounds fine to me. Though in that case, consumed should also be updated to the amount actually consumed by the PMD.
> >>>> Setting consumed = 0 with produced > 0 doesn't correlate.
> >>> [Ahmed] I like Fiona's suggestion, but I also do not like the implication of returning consumed = 0. At the same time, returning consumed = y implies to the user that it can proceed from the middle. I prefer the consumed = 0 implementation, but I think a different return is needed to distinguish it from an OUT_OF_SPACE that the user can recover from. Perhaps OUT_OF_SPACE_RECOVERABLE and OUT_OF_SPACE_TERMINATED. This also allows future PMD implementations to provide recoverability even in STATELESS mode if they so wish. In this model STATELESS or STATEFUL would be a hint for the PMD implementation to make optimizations for each case, but it does not force the PMD implementation to limit functionality if it can provide recoverability.
> >> [Fiona] So you're suggesting the following:
> >> OUT_OF_SPACE - returned only on stateful operation. Not an error. Op.produced
> >>     can be used and the next op in the stream should continue on from op.consumed+1.
> >> OUT_OF_SPACE_TERMINATED - returned only on stateless operation.
> >>     Error condition, no recovery possible.
> >>     consumed=produced=0. Application must resubmit all input data with a bigger output buffer.
> >> OUT_OF_SPACE_RECOVERABLE - returned only on stateless operation, some recovery possible.
> >>     - consumed = 0, produced > 0. Application must resubmit all input data with a bigger output buffer. However, in the decompression case, data up to produced in the dst buffer may be inspected/searched. Never happens in the compression case, as the output data would be meaningless.
> >>     - consumed > 0, produced > 0. PMD has stored relevant state and history and so can convert to stateful, using op.produced and continuing from consumed+1.
> >> I don't expect our PMDs to use this last case, but maybe this works for others?
> >> I'm not convinced it's not just adding complexity. It sounds like a version of stateful without a stream, and maybe less efficient?
> >> If so, should it respect the FLUSH flag? Which would have been FULL or FINAL in the op. Or treat it as FLUSH_NONE or SYNC? I don't know why an application would not simply have submitted a STATEFUL request if this is the behaviour it wants.
> >[Ahmed] I was actually suggesting the removal of OUT_OF_SPACE entirely and replacing it with:
> >OUT_OF_SPACE_TERMINATED - returned only on stateless operation.
> >    Error condition, no recovery possible.
> >    - consumed=0, produced=amount of data produced. Application must resubmit all input data with a bigger output buffer to process all of the op.
> >OUT_OF_SPACE_RECOVERABLE - Normally returned on stateful operation. Not an error.
> >    Op.produced can be used and the next op in the stream should continue on from op.consumed+1.
> >    - consumed > 0, produced > 0. PMD has stored relevant state and history and so can continue, using op.produced and continuing from consumed+1.
> >
> >We would not return OUT_OF_SPACE_RECOVERABLE in stateless mode in our implementation either.
> >
> >Regardless of speculative future PMDs, the more important aspect of this for today is that the return status clearly determines the meaning of "consumed". If it is RECOVERABLE then consumed is meaningful. If it is TERMINATED then consumed is meaningless.
> >This way we take away the ambiguity of having OUT_OF_SPACE mean two different user work flows.
> >
> >A speculative future PMD may be designed to return RECOVERABLE for stateless ops that are attached to streams.
> >A future PMD may look to see if an op has a stream attached, write out the state there and go into recoverable mode.
> >In essence this leaves the choice up to the implementation and allows the PMD to take advantage of stateless optimizations so long as a "RECOVERABLE" scenario is rarely hit. The PMD will dump context as soon as it fully processes an op. It will only write context out in cases where the op chokes.
> >This futuristic PMD should ignore the FLUSH flag, since this is STATELESS mode as indicated by the user, and optimize.
>
> [Shally] IMO, it looks okay to have two separate return codes TERMINATED and RECOVERABLE with definitions as you mentioned, and it seems doable.
> So then it means all the following conditions:
> a. stateless with flush = full/final, no stream pointer provided: PMD can return TERMINATED, i.e. user has to start all over again; it's a failure (as in current definition)
> b.
stateless with flush = full/final, stream pointer provided: here it's up to the PMD to return either TERMINATED or RECOVERABLE depending upon its ability (note if Recoverable, then the PMD will maintain states in the stream pointer)
> c. stateful with flush = full / NO_SYNC, stream pointer always there: PMD will return TERMINATED/RECOVERABLE depending on whether the STATEFUL_COMPRESSION/DECOMPRESSION feature flag is enabled or not

[Fiona] I don't think the flush flag is relevant - it could be out of space on any flush flag, and if out of space it should ignore the flush flag.
Is there a need for TERMINATED? - I didn't think it would ever need to be returned in the stateful case.
Why the reference to the feature flag? If a PMD doesn't support a feature I think it should fail the op - not with out-of-space, but unsupported or similar. Or it would fail on stream creation.

>
> and one more exception case is:
> d. stateless with flush = full, no stream pointer provided: PMD can return RECOVERABLE, i.e. PMD internally maintained that state somehow and consumed & produced > 0, so the user can start at consumed+1, but there's a restriction on the user not to alter or change the op until it is fully processed?!

[Fiona] Why the need for this case?
There's always a restriction on the user not to alter or change the op until it is fully processed.
If a PMD can do this - why doesn't it create a stream when that API is called - and then it's the same as b?

>
> The API currently takes care of cases a and c, and case b can be supported if the specification accepts another proposal which mentions optional usage of a stream with stateless.

[Fiona] The API has this, but as we agreed, it's not optional to call create_stream() with an op_type parameter (stateful/stateless). The PMD can return NULL or provide a stream; if the latter, then that stream must be attached to ops.

> Until then the API makes no difference between case b and c, i.e.
> we can have an op such as:
> - type = stateful with flush = full/final, stream pointer provided: PMD can return TERMINATED/RECOVERABLE according to its ability
>
> Case d is something exceptional; if there's a requirement in PMDs to support it, then I believe it will be doable with the concept of a different return code.
>

[Fiona] That's not quite how I understood it. Can it be simpler and only the following cases?
a. stateless with flush = full/final, no stream pointer provided: PMD can return TERMINATED, i.e. user has to start all over again; it's a failure (as in current definition).
   consumed = 0, produced = amount of data produced. This is usually 0, but in the decompression case a PMD may return > 0 and the application may find it useful to inspect that data.
b. stateless with flush = full/final, stream pointer provided: here it's up to the PMD to return either TERMINATED or RECOVERABLE depending upon its ability (note if Recoverable, then the PMD will maintain states in the stream pointer)
c. stateful with flush = any, stream pointer always there: PMD will return RECOVERABLE.
   op.produced can be used and the next op in the stream should continue on from op.consumed+1.
   consumed = 0, produced = 0 is an unusual but allowed case. I'm not sure if it could ever happen, but there is no need to change the state to TERMINATED in this case. There may be useful state/history stored in the PMD, even though no output has been produced yet.
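[Editorial note] To make the TERMINATED/RECOVERABLE distinction concrete, the dequeue-side handling being discussed could be sketched roughly as below. The enum, struct and helper here are illustrative stand-ins only; the real rte_comp op layout and status names are still under discussion in this thread:

```c
#include <assert.h>
#include <stddef.h>

/* Status codes as proposed in this thread -- illustrative stand-ins,
 * not a final rte_comp API. */
enum comp_status {
    STATUS_SUCCESS,
    STATUS_OUT_OF_SPACE_TERMINATED,   /* stateless: consumed is meaningless */
    STATUS_OUT_OF_SPACE_RECOVERABLE,  /* stateful: consumed is meaningful   */
};

/* Minimal stand-in for the op structure discussed above. */
struct comp_op {
    enum comp_status status;
    size_t consumed;   /* bytes of src fully processed by the PMD */
    size_t produced;   /* bytes of valid output written to dst    */
};

/* Hypothetical helper: returns the src offset to resubmit from,
 * showing how the status alone decides whether `consumed` may be used. */
static size_t next_src_offset(const struct comp_op *op)
{
    switch (op->status) {
    case STATUS_OUT_OF_SPACE_RECOVERABLE:
        /* Resume at offset `consumed`, i.e. the byte after the last
         * one consumed ("consumed+1" in the thread's phrasing). */
        return op->consumed;
    case STATUS_OUT_OF_SPACE_TERMINATED:
        /* Restart from 0 with a bigger dst buffer; `produced` bytes
         * may still be inspected/searched in the decompression case. */
        return 0;
    default:
        return op->consumed;
    }
}
```

The point of the sketch is that the application never has to guess which workflow applies: the status, not the consumed/produced values, selects it.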
> >>>>>>>> D.2 Compression API Stateful operation
> >>>>>>>> ----------------------------------------------------------
> >>>>>>>> A stateful operation in DPDK compression means the application invokes enqueue_burst() multiple times to process related chunks of data, either because
> >>>>>>>> - the application broke the data into several ops, and/or
> >>>>>>>> - the PMD ran into an out_of_space situation during input processing
> >>>>>>>>
> >>>>>>>> In case of either one or all of the above conditions, the PMD is required to maintain the state of the op across enqueue_burst() calls, and
> >>>>>>>> ops are set up with op_type RTE_COMP_OP_STATEFUL, begin with flush value = RTE_COMP_NO/SYNC_FLUSH and end at flush value RTE_COMP_FULL/FINAL_FLUSH.
> >>>>>>>> D.2.1 Stateful operation state maintenance
> >>>>>>>> ---------------------------------------------------------------
> >>>>>>>> It is always an ideal expectation from the application that it should parse through all related chunks of source data, making its mbuf-chain, and enqueue it for stateless processing.
> >>>>>>>> However, if it needs to break it into several enqueue_burst() calls, then an expected call flow would be something like:
> >>>>>>>> enqueue_burst( |op.no_flush |)
> >>>>>>> [Ahmed] The work is now in flight to the PMD. The user will call dequeue burst in a loop until all ops are received. Is this correct?
> >>>>>>>
> >>>>>>>> deque_burst(op) // should dequeue before we enqueue next
> >>>>>> [Shally] Yes. Ideally every submitted op needs to be dequeued. However, this illustration is specifically in the context of stateful op processing, to reflect that if a stream is broken into chunks, then each chunk should be submitted as one op at-a-time with type = STATEFUL, and needs to be dequeued first before the next chunk is enqueued.
> >>>>>>
> >>>>>>>> enqueue_burst( |op.no_flush |)
> >>>>>>>> deque_burst(op) // should dequeue before we enqueue next
> >>>>>>>> enqueue_burst( |op.full_flush |)
> >>>>>>> [Ahmed] Why not allow multiple work items in flight? I understand that occasionally there will be an OUT_OF_SPACE exception. Can we just distinguish the response in exception cases?
> >>>>>> [Shally] Multiple ops are allowed in flight, however the condition is that each op in such a case is independent of the others, i.e. they belong to different streams altogether.
> >>>>>> Earlier (as part of the RFC v1 doc) we did consider the proposal to process all related chunks of data in a single burst by passing them as an ops array, but later found that not-so-useful for PMD handling, for various reasons. You may please refer to the RFC v1 doc review comments for the same.
> >>>>> [Fiona] Agree with Shally. In summary, as only one op can be processed at a time, since each needs the state of the previous, allowing more than 1 op to be in-flight at a time would force PMDs to implement internal queueing and exception handling for the OUT_OF_SPACE conditions you mention.
> >>> [Ahmed] But we are putting the ops on qps, which would make them sequential. Handling OUT_OF_SPACE conditions would be a little bit more complex, but doable.
> >> [Fiona] In my opinion this is not doable, and could be very inefficient.
> >> There may be many streams.
> >> The PMD would have to have an internal queue per stream so it could adjust the next src offset and length in the OUT_OF_SPACE case.
> >> And this may ripple back through all subsequent ops in the stream as each source len is increased and its dst buffer is not big enough.
> >[Ahmed] Regarding multi-op OUT_OF_SPACE handling:
> >The caller would still need to adjust the src length/output buffer as you say. The PMD cannot handle OUT_OF_SPACE internally.
> >After OUT_OF_SPACE occurs, the PMD should reject all ops in this stream until it gets explicit confirmation from the caller to continue working on this stream. Any ops received by the PMD should be returned to the caller with status STREAM_PAUSED, since the caller did not explicitly acknowledge that it has solved the OUT_OF_SPACE issue.
> >These semantics can be enabled by adding a new function to the API, perhaps stream_resume().
> >This allows the caller to indicate that it acknowledges that it has seen the issue, and this op should be used to resolve the issue. Implementations that do not support this mode of use can push back immediately after one op is in flight. Implementations that support this use mode can allow many ops from the same session.
> >
> [Shally] Is it still in the context of having a single burst where all ops belong to one stream? If yes, I would still say it would add overhead to PMDs, especially if it is expected to work close to HW (which I think is the case with DPDK PMDs).
> Though your approach is doable, why can this all not be in a layer above the PMD? i.e. a layer above the PMD can either pass one op at a time with burst size = 1, OR can make a chained mbuf of input and output and pass that as one op.
> Is it just to ease the application's chained-mbuf burden, or do you see any performance or use-case impacting aspect also?
>
> If it is in the context where each op belongs to a different stream in a burst, then why do we need stream_pause and resume? It is expected of the app to pass a bigger output buffer with consumed + 1 from the next call onwards, as it has already seen OUT_OF_SPACE.
>

[Fiona] I still have concerns with this and would not want to support it in our PMD.
To make sure I understand: you want to send a burst of ops, with several from the same stream.
If one causes OUT_OF_SPACE_RECOVERABLE, then the PMD should not process any subsequent ops in that stream.
Should it return them in a dequeue_burst() with status still NOT_PROCESSED? Or somehow drop them? How? While still processing ops from other streams.
As we want to offload each op to hardware with as little CPU processing as possible, we would not want to open up each op to see which stream it's attached to and make decisions to do per-stream storage, or drop it, or bypass hw and dequeue without processing.
Maybe we could add a capability if this behaviour is important for you?
e.g. ALLOW_ENQUEUE_MULTIPLE_STATEFUL_OPS ?
Our PMD would set this to 0, and expect no more than one op from a stateful stream to be in flight at any time.

> >Regarding the ordering of ops:
> >We do force serialization of ops belonging to a stream in STATEFUL operation. Related ops do not go out of order and are given to available PMDs one at a time.
> >
> >>> The question is: is this mode of use useful for real-life applications, or would we be just adding complexity? The technical advantage of this is that processing of stateful ops is interdependent, and PMDs can take advantage of caching and other optimizations to make processing related ops much faster than switching on every op. PMDs have to maintain more than 32 KB of state for DEFLATE for every stream.
> >>>>> If the application has all the data, it can put it into chained mbufs in a single op rather than multiple ops, which avoids pushing all that complexity down to the PMDs.
> >>> [Ahmed] I think that your suggested scheme of putting all related mbufs into one op may be the best solution without the extra complexity of handling OUT_OF_SPACE cases, while still allowing the enqueuer extra time, if we have a way of marking mbufs as ready for consumption.
> >>> The enqueuer may not have all the data at hand, but can enqueue the op with a couple of empty mbufs marked as not ready for consumption. The enqueuer will then update the rest of the mbufs to ready for consumption once the data is added. This introduces a race condition. A second flag for each mbuf can be updated by the PMD to indicate whether it processed it or not. This way, in cases where the PMD beat the application to the op, the application will just update the op to point to the first unprocessed mbuf and resend it to the PMD.
> >> [Fiona] This doesn't sound safe. You want to add data to a stream after you've enqueued the op. You would have to write to op.src.length at a time when the PMD might be reading it. Sounds like a lock would be necessary.
> >> Once the op has been enqueued, my understanding is that its ownership is handed over to the PMD and the application should not touch it until it has been dequeued.
> >> I don't think it's a good idea to change this model.
> >> Can't the application just collect a stream of data in chained mbufs until it has enough to send an op, then construct the op and, while waiting for that op to complete, accumulate the next batch of chained mbufs? Only construct the next op after the previous one is complete, based on the result of the previous one.
> >>
> >[Ahmed] Fair enough. I agree with you. I imagined it in a different way, in which each mbuf would have its own length.
> >The advantage to gain is in applications where there is one PMD user; the down time between ops can be significant, and setting up a single producer-consumer pair significantly reduces the CPU cycles and PMD down time.
> >
> >////snip////
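[Editorial note] Fiona's suggested pattern - accumulate chained mbufs while one op is in flight, and only construct the next op after the previous one completes - can be sketched as below. Everything here is a toy stand-in (an identity "compression" and a synchronous process_one() instead of the real enqueue/dequeue burst API), just to show the one-op-in-flight call flow for a single stream:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in for the proposed stateful op: the "PMD" below just
 * copies input to output, one op at a time. */
struct op {
    const char *src;
    size_t src_len;
    char *dst;
    size_t dst_len;
    size_t produced;
    int final_flush;          /* 1 on the last chunk of the stream */
};

/* Hypothetical combined enqueue+dequeue of a single op: the app must
 * dequeue each stateful op before enqueuing the next for that stream. */
static void process_one(struct op *o)
{
    assert(o->dst_len >= o->src_len);   /* no OUT_OF_SPACE in this toy */
    memcpy(o->dst, o->src, o->src_len);
    o->produced = o->src_len;
}

/* Drives a stream chunk-by-chunk with exactly one op in flight, as
 * recommended in the thread; returns total bytes produced. */
static size_t run_stream(const char *data, size_t len, size_t chunk,
                         char *out, size_t out_len)
{
    size_t done = 0, produced = 0;
    while (done < len) {
        size_t n = (len - done < chunk) ? len - done : chunk;
        struct op o = { data + done, n, out + produced,
                        out_len - produced, 0, done + n == len };
        process_one(&o);        /* dequeue before the next enqueue */
        produced += o.produced;
        done += n;
    }
    return produced;
}
```

The key design point is in run_stream(): the next chunk is only wrapped in an op after the previous op has been dequeued, so the op's fields are never touched while the PMD owns them and no lock is needed.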