From: Jerin Jacob <jerin.jacob@caviumnetworks.com>
To: "Van Haaren, Harry" <harry.van.haaren@intel.com>
Cc: "dev@dpdk.org" <dev@dpdk.org>,
	"Richardson, Bruce" <bruce.richardson@intel.com>
Subject: Re: [PATCH v5 06/20] event/sw: add support for event queues
Date: Tue, 28 Mar 2017 23:06:11 +0530	[thread overview]
Message-ID: <20170328173610.3hi6wyqvdpx2lo7e@localhost.localdomain> (raw)
In-Reply-To: <E923DB57A917B54B9182A2E928D00FA612A20807@IRSMSX102.ger.corp.intel.com>

On Tue, Mar 28, 2017 at 12:42:27PM +0000, Van Haaren, Harry wrote:
> > From: Jerin Jacob [mailto:jerin.jacob@caviumnetworks.com]
> > Sent: Tuesday, March 28, 2017 11:43 AM
> > To: Van Haaren, Harry <harry.van.haaren@intel.com>
> > Cc: dev@dpdk.org; Richardson, Bruce <bruce.richardson@intel.com>
> > Subject: Re: [PATCH v5 06/20] event/sw: add support for event queues
> > 
> > On Mon, Mar 27, 2017 at 03:17:48PM +0000, Van Haaren, Harry wrote:
> > > > From: Jerin Jacob [mailto:jerin.jacob@caviumnetworks.com]
> > > > Sent: Monday, March 27, 2017 8:45 AM
> > > > To: Van Haaren, Harry <harry.van.haaren@intel.com>
> > > > Cc: dev@dpdk.org; Richardson, Bruce <bruce.richardson@intel.com>
> > > > Subject: Re: [PATCH v5 06/20] event/sw: add support for event queues
> 
> <snip code + details>
> 
> > > > Just for my understanding, are 4 (SW_IQS_MAX) IQ rings created to address
> > > > a different priority for each enqueue operation? What is the significance of
> > > > 4 (SW_IQS_MAX) here?
> > >
> > > Yes, each IQ represents a priority level. There is a compile-time define (SW_IQS_MAX) which
> > > allows setting the number of internal queues at each queue stage. The default number of
> > > priorities is currently 4.
> > 
> > OK. The reason why I asked is because, if I understood it correctly,
> > PRIO_TO_IQ is not normalizing correctly if SW_IQS_MAX == 4.
> > 
> > I thought the following mapping would be the correct normalization if
> > SW_IQS_MAX == 4.
> > 
> > What do you think?
> 
> <snip code suggestion + api header>
> 
> Good catch - agreed, will fix.
> 
> 
> > > > > +static int
> > > > > +sw_queue_setup(struct rte_eventdev *dev, uint8_t queue_id,
> > > > > +		const struct rte_event_queue_conf *conf)
> > > > > +{
> > > > > +	int type;
> > > > > +
> > > > > +	switch (conf->event_queue_cfg) {
> > > > > +	case RTE_EVENT_QUEUE_CFG_SINGLE_LINK:
> > > > > +		type = SW_SCHED_TYPE_DIRECT;
> > > > > +		break;
> > > >
> > > > event_queue_cfg is a bitmap. It is valid to have
> > > > RTE_EVENT_QUEUE_CFG_SINGLE_LINK | RTE_EVENT_QUEUE_CFG_ATOMIC_ONLY,
> > > > i.e. an atomic schedule type queue that has only one port linked to
> > > > dequeue the events.
> > > > So in the above context, the switch case is not correct, i.e.
> > > > it goes to the default condition. Right?
> > > > Is this intentional?
> > > >
> > > > If I understand it correctly, based on the use case (grouped based event
> > > > pipelining) you have shared in
> > > > the documentation patch, RTE_EVENT_QUEUE_CFG_SINGLE_LINK is used for the last
> > > > stage (last queue). One option: if the SW PMD cannot support
> > > > RTE_EVENT_QUEUE_CFG_SINGLE_LINK | RTE_EVENT_QUEUE_CFG_ATOMIC_ONLY mode,
> > > > then even though the application sets RTE_EVENT_QUEUE_CFG_SINGLE_LINK |
> > > > RTE_EVENT_QUEUE_CFG_ATOMIC_ONLY, the driver can ignore
> > > > RTE_EVENT_QUEUE_CFG_ATOMIC_ONLY. But I am not sure about the case where the
> > > > application sets RTE_EVENT_QUEUE_CFG_SINGLE_LINK in the middle of the pipeline.
> > > >
> > > > Thoughts?
> > >
> > >
> > > I don't like the idea of the SW PMD ignoring flags for queues - the PMD has no idea if the
> > > queue is the final or middle stage of the pipeline, as it's the application's usage which defines that.
> > >
> > >
> > > Does anybody have a need for a queue to be both Atomic *and* Single-link?  I understand the
> > > current API doesn't prohibit it, but I don't see the actual use-case in which that may be
> > > useful. Atomic implies load-balancing is occurring; single link implies there is only one
> > > consuming core. Those seem like opposites to me?
> > >
> > > Unless anybody sees value in queues having both, I suggest we update the documentation to
> > > specify that a queue is either load-balanced or single-link, and that setting both flags will
> > > result in -ENOTSUP being returned. (This check can be added to the EventDev layer if consistent for
> > > all PMDs).
> > 
> > If I understand it correctly (based on the previous discussions),
> > HW implementations (Cavium or NXP) do not
> > need to use the RTE_EVENT_QUEUE_CFG_* flags for these operations (the sched type
> > will be derived from event.sched_type on enqueue). So that means we are
> > free to tailor the header file based on the SW PMD requirement here.
> > But semantically it has to be in line with the rest of the header file. We can
> > work together to make it happen.
> 
> OK :)
> 
> 
> > A few questions, for everyone's benefit:
> > 
> > 1) Does RTE_EVENT_QUEUE_CFG_SINGLE_LINK have any meaning other than an
> > event queue linked to only a single port?  Based on the discussions, it was
> > added to the header file so that the SW PMD can know upfront that only a single
> > port will be linked to the given event queue. It was added as an optimization
> > for the SW PMD. Does it carry any functional expectation?
> 
> In the context of the SW PMD, SINGLE_LINK means that a specific queue and port have a unique relationship in that there is only one connection. This allows bypassing of the Atomic, Ordering and Load-Balancing code. The result is a good performance increase, particularly if the worker port dequeue depth is large, as then large bursts of packets can be dequeued with little overhead.
> 
> As a result, (ATOMIC | SINGLE_LINK) is not a supported combination for the SW PMD queue types.
> To be more precise, SINGLE_LINK is its own queue type, and cannot be OR-ed with any other type.
> 
> 
> > 2) Based on the following topology given in the documentation patch for
> > queue based event pipelining,
> > 
> >   rx_port    w1_port
> > 	 \     /         \
> > 	  qid0 - w2_port - qid1
> > 	       \         /     \
> > 		    w3_port        tx_port
> > 
> > a) I understand rx_port is feeding events to qid0.
> > b) But do you see any issue with the following model? IMO, it scales
> > linearly based on the number of cores available to work (since it is ATOMIC to
> > ATOMIC). There is nothing wrong with
> > qid1 connecting only to tx_port; I am just trying to understand the rationale
> > behind it.
> > 
> >   rx_port   w1_port         w1_port
> > 	 \     /         \     /
> > 	  qid0 - w2_port - qid1- w2_port
> > 	       \         /     \
> > 		   w3_port         w3_port
> 
> 
> This is also a valid model from the SW eventdev. 

OK. If I understand it correctly: in the above topology, even though you
make qid1 ATOMIC, the SW PMD will not maintain ingress order when events come
out of qid1 on different workers. A SINGLE_LINK queue with one port attached
is required at the end of the pipeline, or wherever ordering has to be
maintained. Is my understanding correct?


> The value of using a SINGLE_LINK at the end of a pipeline is:
> A) all traffic can be TX-ed on a single core (using a single queue)
> B) re-ordering of traffic from the previous stage is possible
> 
> To illustrate (B), a very simple pipeline here
> 
>  RX port -> QID #1 (Ordered) -> workers (e.g. 4 ports) -> QID #2 (SINGLE_LINK to tx) -> TX port
> 
> Here, QID #1 is allowed to send the packets out of order to the 4 worker ports - because they are later passed back to the eventdev for re-ordering before they get to the SINGLE_LINK stage, and then TX in the correct order.
> 
> 
> > 3)
> > > Does anybody have a need for a queue to be both Atomic *and* Single-link?  I understand the
> > > current API doesn't prohibit it, but I don't see the actual use-case in which that may be
> > > useful. Atomic implies load-balancing is occurring; single link implies there is only one
> > > consuming core. Those seem like opposites to me?
> > 
> > I can think of the following use case:
> > 
> > topology:
> > 
> >   rx_port    w1_port
> > 	 \     /         \
> > 	  qid0 - w2_port - qid1
> > 	       \         /     \
> > 		    w3_port        tx_port
> > 
> > Use case:
> > 
> > Queue based event pipelining:
> > ORDERED (Stage 1) to ATOMIC (Stage 2) pipeline:
> > - For ingress order maintenance
> > - For executing Stage 1 in parallel for better scaling
> > i.e. a fat flow can be sprayed over N cores while maintaining the ingress
> > order when it is sent out on the wire (after being consumed from tx_port)
> > 
> > I am not sure how the SW PMD works in the use case of ingress order maintenance.
> 
> I think my illustration of (B) above is the same use-case as you have here. Instead of using an ATOMIC stage2, the SW PMD benefits from using the SINGLE_LINK port/queue, and the SINGLE_LINK queue ensures ingress order is also egress order to the TX port.
> 
> 
> > But the HW and the header file expect this form:
> > Snippet from header file:
> > --
> >  * The source flow ordering from an event queue is maintained when events are
> >  * enqueued to their destination queue within the same ordered flow context.
> >  *
> >  * Events from the source queue appear in their original order when dequeued
> >  * from a destination queue.
> > --
> > Here qid0 is the source queue with ORDERED sched_type and qid1 is the destination
> > queue with ATOMIC sched_type. qid1 can be linked to only one port (tx_port).
> > 
> > Are we on the same page? If not, let me know the differences, and we will try to
> > accommodate them in the header file.
> 
> Yes I think we are saying the same thing, using slightly different words.
> 
> To summarize;
> - The SW PMD sees SINGLE_LINK as its own queue type, which does not support load-balanced (Atomic, Ordered, Parallel) queue functionality.
> - SW PMD would use a SINGLE_LINK queue/port for the final stage of a pipeline
>    A) to allow re-ordering to happen if required
>    B) to merge traffic from multiple ports into a single stream for TX
> 
> A possible solution;
> 1) The application creates a SINGLE_LINK for the purpose of ensuring re-ordering is taking place as expected, and linking only one port for TX.

The only issue is that in the low-end cores case it won't scale. The TX core will become a
bottleneck, and we would need to have different pipelines based on the amount of traffic (40G or 10G)
a core can handle.

> 2) SW PMDs can create a SINGLE_LINK queue type, and benefit from the optimization

Yes.

> 3) HW PMDs can ignore the "SINGLE_LINK" aspect and use an ATOMIC queue instead (as per your example in 3) above)

But then the topology will be fixed for both HW and SW. In the HW case, an extra port and an
extra core need to be wasted on the ordering business. Right?

I think we can roll out something based on capability.

> 
> The application doesn't have to change anything, and just configures its pipeline. The PMD is able to optimize if it makes sense (SW) or just use another queue type to provide the same functionality to the application (HW).
> 
> Thoughts? -Harry

