From mboxrd@z Thu Jan  1 00:00:00 1970
From: "Verma, Shally" <Shally.Verma@cavium.com>
Subject: Re: [RFC v2] doc compression API for DPDK
Date: Wed, 14 Feb 2018 07:41:09 +0000
Message-ID: <CY4PR0701MB36346F0F27F97834E2BD1D59F0F50@CY4PR0701MB3634.namprd07.prod.outlook.com>
References: <BY1PR0701MB1111E78D7398ECBFD76E98B3F01F0@BY1PR0701MB1111.namprd07.prod.outlook.com>
 <AM0PR0402MB3842F20147CFD02B62EAA6B5E1100@AM0PR0402MB3842.eurprd04.prod.outlook.com>
 <BY1PR0701MB1111EFBFFF380349DEEE7EF7F0110@BY1PR0701MB1111.namprd07.prod.outlook.com>
 <348A99DA5F5B7549AA880327E580B435892F589D@IRSMSX101.ger.corp.intel.com>
 <BY1PR0701MB11117F9E9B314BA62A7CA6CFF0170@BY1PR0701MB1111.namprd07.prod.outlook.com>
 <VI1PR0402MB38531C5234498E5EE9C2BE62E1E10@VI1PR0402MB3853.eurprd04.prod.outlook.com>
 <348A99DA5F5B7549AA880327E580B43589315232@IRSMSX101.ger.corp.intel.com>
 <DB3PR0402MB3852855639B5464D3AE1A441E1FA0@DB3PR0402MB3852.eurprd04.prod.outlook.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable
Cc: "Athreya, Narayana Prasad" <NarayanaPrasad.Athreya@cavium.com>, "Gupta,
 Ashish" <Ashish.Gupta@cavium.com>, "Sahu, Sunila" <Sunila.Sahu@cavium.com>,
 "De Lara Guarch, Pablo" <pablo.de.lara.guarch@intel.com>, "Challa, Mahipal"
 <Mahipal.Challa@cavium.com>, "Jain, Deepak K" <deepak.k.jain@intel.com>,
 Hemant Agrawal <hemant.agrawal@nxp.com>, Roy Pledge <roy.pledge@nxp.com>,
 Youri Querry <youri.querry_1@nxp.com>
To: Ahmed Mansour <ahmed.mansour@nxp.com>, "Trahe, Fiona"
 <fiona.trahe@intel.com>, "dev@dpdk.org" <dev@dpdk.org>
Return-path: <dev-bounces@dpdk.org>
Received: from NAM03-BY2-obe.outbound.protection.outlook.com
 (mail-by2nam03on0045.outbound.protection.outlook.com [104.47.42.45])
 by dpdk.org (Postfix) with ESMTP id 83FA01B311
 for <dev@dpdk.org>; Wed, 14 Feb 2018 08:41:12 +0100 (CET)
In-Reply-To: <DB3PR0402MB3852855639B5464D3AE1A441E1FA0@DB3PR0402MB3852.eurprd04.prod.outlook.com>
Content-Language: en-US
List-Id: DPDK patches and discussions <dev.dpdk.org>
List-Unsubscribe: <https://dpdk.org/ml/options/dev>,
 <mailto:dev-request@dpdk.org?subject=unsubscribe>
List-Archive: <http://dpdk.org/ml/archives/dev/>
List-Post: <mailto:dev@dpdk.org>
List-Help: <mailto:dev-request@dpdk.org?subject=help>
List-Subscribe: <https://dpdk.org/ml/listinfo/dev>,
 <mailto:dev-request@dpdk.org?subject=subscribe>
Errors-To: dev-bounces@dpdk.org
Sender: "dev" <dev-bounces@dpdk.org>

Hi Ahmed,

>-----Original Message-----
>From: Ahmed Mansour [mailto:ahmed.mansour@nxp.com]
>Sent: 02 February 2018 01:53
>To: Trahe, Fiona <fiona.trahe@intel.com>; Verma, Shally <Shally.Verma@cavi=
um.com>; dev@dpdk.org
>Cc: Athreya, Narayana Prasad <NarayanaPrasad.Athreya@cavium.com>; Gupta, A=
shish <Ashish.Gupta@cavium.com>; Sahu, Sunila
><Sunila.Sahu@cavium.com>; De Lara Guarch, Pablo <pablo.de.lara.guarch@inte=
l.com>; Challa, Mahipal
><Mahipal.Challa@cavium.com>; Jain, Deepak K <deepak.k.jain@intel.com>; Hem=
ant Agrawal <hemant.agrawal@nxp.com>; Roy
>Pledge <roy.pledge@nxp.com>; Youri Querry <youri.querry_1@nxp.com>
>Subject: Re: [RFC v2] doc compression API for DPDK
>
>On 1/31/2018 2:03 PM, Trahe, Fiona wrote:
>> Hi Ahmed, Shally,
>>
>> ///snip///
>>>>>>>> D.1.1 Stateless and OUT_OF_SPACE
>>>>>>>> ------------------------------------------------
>>>>>>>> OUT_OF_SPACE is a condition when output buffer runs out of space
>>>>> and
>>>>>>> where PMD still has more data to produce. If PMD run into such
>>>>> condition,
>>>>>>> then it's an error condition in stateless processing.
>>>>>>>> In such case, PMD resets itself and return with status
>>>>>>> RTE_COMP_OP_STATUS_OUT_OF_SPACE with produced=3Dconsumed=3D0
>>>>> i.e.
>>>>>>> no input read, no output written.
>>>>>>>> Application can resubmit an full input with larger output buffer s=
ize.
>>>>>>> [Ahmed] Can we add an option to allow the user to read the data tha=
t
>>>>> was
>>>>>>> produced while still reporting OUT_OF_SPACE? this is mainly useful =
for
>>>>>>> decompression applications doing search.
>>>>>> [Shally] It is there but applicable for stateful operation type (ple=
ase refer to
>>>>> handling out_of_space under
>>>>>> "Stateful Section").
>>>>>> By definition, "stateless" here means that application (such as IPCO=
MP)
>>>>> knows maximum output size
>>>>>> guaranteedly and ensure that uncompressed data size cannot grow more
>>>>> than provided output buffer.
>>>>>> Such apps can submit an op with type =3D STATELESS and provide full =
input,
>>>>> then PMD assume it has
>>>>>> sufficient input and output and thus doesn't need to maintain any co=
ntexts
>>>>> after op is processed.
>>>>>> If application doesn't know about max output size, then it should pr=
ocess it
>>>>> as stateful op i.e. setup op
>>>>>> with type =3D STATEFUL and attach a stream so that PMD can maintain
>>>>> relevant context to handle such
>>>>>> condition.
>>>>> [Fiona] There may be an alternative that's useful for Ahmed, while st=
ill
>>>>> respecting the stateless concept.
>>>>> In Stateless case where a PMD reports OUT_OF_SPACE in decompression
>>>>> case
>>>>> it could also return consumed=3D0, produced =3D x, where x>0. X indic=
ates the
>>>>> amount of valid data which has
>>>>>  been written to the output buffer. It is not complete, but if an app=
lication
>>>>> wants to search it it may be sufficient.
>>>>> If the application still wants the data it must resubmit the whole in=
put with a
>>>>> bigger output buffer, and
>>>>>  decompression will be repeated from the start, it
>>>>>  cannot expect to continue on as the PMD has not maintained state, hi=
story
>>>>> or data.
>>>>> I don't think there would be any need to indicate this in capabilitie=
s, PMDs
>>>>> which cannot provide this
>>>>> functionality would always return produced=3Dconsumed=3D0, while PMDs=
 which
>>>>> can could set produced > 0.
>>>>> If this works for you both, we could consider a similar case for comp=
ression.
>>>>>
>>>> [Shally] Sounds Fine to me. Though then in that case, consume should a=
lso be updated to actual
>>> consumed by PMD.
>>>> Setting consumed =3D 0 with produced > 0 doesn't correlate.
>>> [Ahmed]I like Fiona's suggestion, but I also do not like the implicatio=
n
>>> of returning consumed =3D 0. At the same time returning consumed =3D y
>>> implies to the user that it can proceed from the middle. I prefer the
>>> consumed =3D 0 implementation, but I think a different return is needed=
 to
>>> distinguish it from OUT_OF_SPACE that the use can recover from. Perhaps
>>> OUT_OF_SPACE_RECOVERABLE and OUT_OF_SPACE_TERMINATED. This also allows
>>> future PMD implementations to provide recover-ability even in STATELESS
>>> mode if they so wish. In this model STATELESS or STATEFUL would be a
>>> hint for the PMD implementation to make optimizations for each case, bu=
t
>>> it does not force the PMD implementation to limit functionality if it
>>> can provide recover-ability.
>> [Fiona] So you're suggesting the following:
>> OUT_OF_SPACE - returned only on stateful operation. Not an error. Op.pro=
duced
>>     can be used and next op in stream should continue on from op.consume=
d+1.
>> OUT_OF_SPACE_TERMINATED - returned only on stateless operation.
>>     Error condition, no recovery possible.
>>     consumed=3Dproduced=3D0. Application must resubmit all input data wi=
th
>>     a bigger output buffer.
>> OUT_OF_SPACE_RECOVERABLE - returned only on stateless operation, some re=
covery possible.
>>      - consumed =3D 0, produced > 0. Application must resubmit all input=
 data with
>>         a bigger output buffer. However in decompression case, data up t=
o produced
>>         in dst buffer may be inspected/searched. Never happens in compre=
ssion
>>         case as output data would be meaningless.
>>      - consumed > 0, produced > 0. PMD has stored relevant state and his=
tory and so
>>         can convert to stateful, using op.produced and continuing from c=
onsumed+1.
>> I don't expect our PMDs to use this last case, but maybe this works for =
others?
>> I'm not convinced it's not just adding complexity. It sounds like a vers=
ion of stateful
>> without a stream, and maybe less efficient?
>> If so should it respect the FLUSH flag? Which would have been FULL or FI=
NAL in the op.
>> Or treat it as FLUSH_NONE or SYNC? I don't know why an application would=
 not
>> simply have submitted a STATEFUL request if this is the behaviour it wan=
ts?
>[Ahmed] I was actually suggesting the removal of OUT_OF_SPACE entirely
>and replacing it with
>OUT_OF_SPACE_TERMINATED - returned only on stateless operation.
>        Error condition, no recovery possible.
>        - consumed=3D0 produced=3Damount of data produced. Application mus=
t
>resubmit all input data with
>          a bigger output buffer to process all of the op
>OUT_OF_SPACE_RECOVERABLE -  Normally returned on stateful operation. Not
>an error. Op.produced
>    can be used and next op in stream should continue on from op.consumed+=
1.
>        -  consumed > 0, produced > 0. PMD has stored relevant state and
>history and so
>            can continue using op.produced and continuing from consumed+1.
>
>We would not return OUT_OF_SPACE_RECOVERABLE in stateless mode in our
>implementation either.
>
>Regardless of speculative future PMDs. The more important aspect of this
>for today is that the return status clearly determines
>the meaning of "consumed". If it is RECOVERABLE then consumed is
>meaningful. if it is TERMINATED then consumed in meaningless.
>This way we take away the ambiguity of having OUT_OF_SPACE mean two
>different user work flows.
>
>A speculative future PMD may be designed to return RECOVERABLE for
>stateless ops that are attached to streams.
>A future PMD may look to see if an op has a stream is attached and write
>out the state there and go into recoverable mode.
>in essence this leaves the choice up to the implementation and allows
>the PMD to take advantage of stateless optimizations
>so long as a "RECOVERABLE" scenario is rarely hit. The PMD will dump
>context as soon as it fully processes an op. It will only
>write context out in cases where the op chokes.
>This futuristic PMD should ignore the FLUSH since this STATELESS mode as
>indicated by the user and optimize

[Shally] IMO, it looks okay to have two separate return code TERMINATED and=
 RECOVERABLE with definition as you mentioned and seem doable.
So then it mean all following conditions:
a. stateless with flush =3D full/final, no stream pointer provided , PMD ca=
n return TERMINATED i.e. user has to start all over again, it's a failure (=
as in current definition)
b. stateless with flush =3D full/final, stream pointer provided, here it's =
up to PMD to return either TERMINATED or RECOVERABLE depending upon its abi=
lity (note if Recoverable, then PMD will maintain states in stream pointer)
c. stateful with flush =3D full / NO_SYNC, stream pointer always there, PMD=
 will TERMINATED/RECOVERABLE depending on STATEFUL_COMPRESSION/DECOMPRESSIO=
N feature flag enabled or not

and one more exception case is:
d. stateless with flush =3D full, no stream pointer provided, PMD can retur=
n RECOVERABLE i.e. PMD internally maintained that state somehow and consume=
d & produced > 0, so user can start consumed+1 but there's restriction on u=
ser not to alter or change op until it is fully processed?!
=20
API currently takes care of case a and c, and case b can be supported if sp=
ecification accept another proposal which mention optional usage of stream =
with stateless. Until then API takes no difference to case b and c i.e. we =
can have op such as,
- type=3D stateful with flush =3D full/final, stream pointer provided, PMD =
can return TERMINATED/RECOVERABLE according to its ability

Case d , is something exceptional, if there's requirement in PMDs to suppor=
t it, then believe it will be doable with concept of different return code.

>>>>>>>> D.2 Compression API Stateful operation
>>>>>>>> ----------------------------------------------------------
>>>>>>>>  A Stateful operation in DPDK compression means application invoke=
s
>>>>>>> enqueue burst() multiple times to process related chunk of data eit=
her
>>>>>>> because
>>>>>>>> - Application broke data into several ops, and/or
>>>>>>>> - PMD ran into out_of_space situation during input processing
>>>>>>>>
>>>>>>>> In case of either one or all of the above conditions, PMD is requi=
red to
>>>>>>> maintain state of op across enque_burst() calls and
>>>>>>>> ops are setup with op_type RTE_COMP_OP_STATEFUL, and begin with
>>>>>>> flush value =3D RTE_COMP_NO/SYNC_FLUSH and end at flush value
>>>>>>> RTE_COMP_FULL/FINAL_FLUSH.
>>>>>>>> D.2.1 Stateful operation state maintenance
>>>>>>>> ---------------------------------------------------------------
>>>>>>>> It is always an ideal expectation from application that it should =
parse
>>>>>>> through all related chunk of source data making its mbuf-chain and
>>>>> enqueue
>>>>>>> it for stateless processing.
>>>>>>>> However, if it need to break it into several enqueue_burst() calls=
, then
>>>>> an
>>>>>>> expected call flow would be something like:
>>>>>>>> enqueue_burst( |op.no_flush |)
>>>>>>> [Ahmed] The work is now in flight to the PMD.The user will call deq=
ueue
>>>>>>> burst in a loop until all ops are received. Is this correct?
>>>>>>>
>>>>>>>> deque_burst(op) // should dequeue before we enqueue next
>>>>>> [Shally] Yes. Ideally every submitted op need to be dequeued. Howeve=
r
>>>>> this illustration is specifically in
>>>>>> context of stateful op processing to reflect if a stream is broken i=
nto
>>>>> chunks, then each chunk should be
>>>>>> submitted as one op at-a-time with type =3D STATEFUL and need to be
>>>>> dequeued first before next chunk is
>>>>>> enqueued.
>>>>>>
>>>>>>>> enqueue_burst( |op.no_flush |)
>>>>>>>> deque_burst(op) // should dequeue before we enqueue next
>>>>>>>> enqueue_burst( |op.full_flush |)
>>>>>>> [Ahmed] Why now allow multiple work items in flight? I understand t=
hat
>>>>>>> occasionaly there will be OUT_OF_SPACE exception. Can we just
>>>>> distinguish
>>>>>>> the response in exception cases?
>>>>>> [Shally] Multiples ops are allowed in flight, however condition is e=
ach op in
>>>>> such case is independent of
>>>>>> each other i.e. belong to different streams altogether.
>>>>>> Earlier (as part of RFC v1 doc) we did consider the proposal to proc=
ess all
>>>>> related chunks of data in single
>>>>>> burst by passing them as ops array but later found that as not-so-us=
eful for
>>>>> PMD handling for various
>>>>>> reasons. You may please refer to RFC v1 doc review comments for same=
.
>>>>> [Fiona] Agree with Shally. In summary, as only one op can be processe=
d at a
>>>>> time, since each needs the
>>>>> state of the previous, to allow more than 1 op to be in-flight at a t=
ime would
>>>>> force PMDs to implement internal queueing and exception handling for
>>>>> OUT_OF_SPACE conditions you mention.
>>> [Ahmed] But we are putting the ops on qps which would make them
>>> sequential. Handling OUT_OF_SPACE conditions would be a little bit more
>>> complex but doable.
>> [Fiona] In my opinion this is not doable, could be very inefficient.
>> There may be many streams.
>> The PMD would have to have an internal queue per stream so
>> it could adjust the next src offset and length in the OUT_OF_SPACE case.
>> And this may ripple back though all subsequent ops in the stream as each
>> source len is increased and its dst buffer is not big enough.
>[Ahmed] Regarding multi op OUT_OF_SPACE handling.
>The caller would still need to adjust
>the src length/output buffer as you say. The PMD cannot handle
>OUT_OF_SPACE internally.
>After OUT_OF_SPACE occurs, the PMD should reject all ops in this stream
>until it gets explicit
>confirmation from the caller to continue working on this stream. Any ops
>received by
>the PMD should be returned to the caller with status STREAM_PAUSED since
>the caller did not
>explicitly acknowledge that it has solved the OUT_OF_SPACE issue.
>These semantics can be enabled by adding a new function to the API
>perhaps stream_resume().
>This allows the caller to indicate that it acknowledges that it has seen
>the issue and this op
>should be used to resolve the issue. Implementations that do not support
>this mode of use
>can push back immediately after one op is in flight. Implementations
>that support this use
>mode can allow many ops from the same session
>
[Shally] Is it still in context of having single burst where all op belongs=
 to one stream? If yes, I would still say it would add an overhead to PMDs =
especially if it is expected to work closer to HW (which I think is the cas=
e with DPDK PMD).
Though your approach is doable but why this all cannot be in a layer above =
PMD? i.e. a layer above PMD can either pass one-op at a time with burst siz=
e =3D 1 OR can make chained mbuf of input and output and pass than as one o=
p.
Is it just to ease applications of chained mbuf burden or do you see any pe=
rformance /use-case impacting aspect also?

if it is in context where each op belong to different stream in a burst, th=
en why do we need stream_pause and resume? It is a expectations from app to=
 pass more output buffer with consumed + 1 from next call onwards as it has=
 already
seen OUT_OF_SPACE.

>Regarding the ordering of ops
>We do force serialization of ops belonging to a stream in STATEFUL
>operation. Related ops do
>not go out of order and are given to available PMDs one at a time.
>
>>> The question is this mode of use useful for real
>>> life applications or would we be just adding complexity? The technical
>>> advantage of this is that processing of Stateful ops is interdependent
>>> and PMDs can take advantage of caching and other optimizations to make
>>> processing related ops much faster than switching on every op. PMDs hav=
e
>>> maintain state of more than 32 KB for DEFLATE for every stream.
>>>>> If the application has all the data, it can put it into chained mbufs=
 in a single
>>>>> op rather than
>>>>> multiple ops, which avoids pushing all that complexity down to the PM=
Ds.
>>> [Ahmed] I think that your suggested scheme of putting all related mbufs
>>> into one op may be the best solution without the extra complexity of
>>> handling OUT_OF_SPACE cases, while still allowing the enqueuer extra
>>> time If we have a way of marking mbufs as ready for consumption. The
>>> enqueuer may not have all the data at hand but can enqueue the op with =
a
>>> couple of empty mbus marked as not ready for consumption. The enqueuer
>>> will then update the rest of the mbufs to ready for consumption once th=
e
>>> data is added. This introduces a race condition. A second flag for each
>>> mbuf can be updated by the PMD to indicate that it processed it or not.
>>> This way in cases where the PMD beat the application to the op, the
>>> application will just update the op to point to the first unprocessed
>>> mbuf and resend it to the PMD.
>> [Fiona] This doesn't sound safe. You want to add data to a stream after =
you've
>> enqueued the op. You would have to write to op.src.length at a time when=
 the PMD
>> might be reading it. Sounds like a lock would be necessary.
>> Once the op has been enqueued, my understanding is its ownership is hand=
ed
>> over to the PMD and the application should not touch it until it has bee=
n dequeued.
>> I don't think it's a good idea to change this model.
>> Can't the application just collect a stream of data in chained mbufs unt=
il it has
>> enough to send an op, then construct the op and while waiting for that o=
p to
>> complete, accumulate the next batch of chained mbufs? Only construct the=
 next op
>> after the previous one is complete, based on the result of the previous =
one.
>>
>[Ahmed] Fair enough. I agree with you. I imagined it in a different way
>in which each mbuf would have its own length.
>The advantage to gain is in applications where there is one PMD user,
>the down time between ops can be significant and setting up a single
>producer consumer pair significantly reduces the CPU cycles and PMD down
>time.
>
>////snip////