From: Jakub Kicinski <kuba@kernel.org>
To: Ioana Ciornei <ioana.ciornei@nxp.com>
Cc: "davem@davemloft.net" <davem@davemloft.net>,
	"netdev@vger.kernel.org" <netdev@vger.kernel.org>
Subject: Re: [PATCH v2 net-next 0/7] dpaa2-eth: add support for Rx traffic classes
Date: Thu, 21 May 2020 12:07:52 -0700
Message-ID: <20200521120752.07fd83aa@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com>
In-Reply-To: <VI1PR0402MB38710DAD1F17B80F83403396E0B60@VI1PR0402MB3871.eurprd04.prod.outlook.com>

On Wed, 20 May 2020 20:24:43 +0000 Ioana Ciornei wrote:
> > Subject: Re: [PATCH v2 net-next 0/7] dpaa2-eth: add support for Rx traffic
> > classes
> > 
> > On Wed, 20 May 2020 15:10:42 +0000 Ioana Ciornei wrote:  
> > > DPAA2 has frame queues for each Rx traffic class, and the decision of
> > > which queue to pull frames from is made by the HW based on the queue
> > > priority within a channel (there is one channel per CPU).
> > 
> > IOW you're reading the descriptor from the device memory/iomem address and
> > the HW returns the next descriptor based on the configured priority?
> 
> That's the general idea, but the decision is not made on a frame-by-frame basis
> but rather per dequeue operation, which can return at most
> 16 frame descriptors at a time.

I see!
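
FWIW, here is roughly how I picture that dequeue model from the driver
side; pull_dequeue(), process_rx_fd() and the structs are made-up names
for illustration, not the real QBMan/DPIO interface. The point is that
one pull command returns a burst of up to 16 frame descriptors and the
HW has already chosen which queue to serve based on the configured
priorities:

#define DEQUEUE_BURST_MAX 16	/* HW returns at most 16 FDs per pull */

struct frame_desc;	/* opaque frame descriptor, illustrative */
struct channel;		/* one channel per CPU, illustrative */

/* Hypothetical helpers: issue one pull command / process one frame */
int pull_dequeue(struct channel *ch, struct frame_desc **fds, int max);
void process_rx_fd(struct channel *ch, struct frame_desc *fd);

static int poll_channel(struct channel *ch, int budget)
{
	struct frame_desc *fds[DEQUEUE_BURST_MAX];
	int cleaned = 0;

	while (cleaned < budget) {
		/* One command; the HW picks the queue by priority and
		 * returns up to 16 frame descriptors from it. */
		int n = pull_dequeue(ch, fds, DEQUEUE_BURST_MAX);

		if (!n)
			break;
		for (int i = 0; i < n; i++)
			process_rx_fd(ch, fds[i]);
		cleaned += n;	/* may overshoot budget by a partial burst */
	}
	return cleaned;
}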

> > Presumably strict priority?  
> 
> Only the two highest traffic classes are in strict priority, while the other 6 TCs
> form two priority tiers: medium (4 TCs) and low (the last 2 TCs).
> 
> > > If this should be modeled in software, then I assume there should be a
> > > NAPI instance for each traffic class and the stack should know in
> > > which order to call the poll() callbacks so that the priority is respected.  
> > 
> > Right, something like that. But IMHO not needed if HW can serve the right
> > descriptor upon poll.  
> 
> After thinking this through, I don't actually believe that multiple NAPI instances
> would solve this in any circumstance:
> 
> - If you have hardware prioritization with full scheduling on dequeue, then the job on
> the driver side is already done.
> - If you only have hardware assist for prioritization (i.e. hardware gives you multiple
> rings but doesn't tell you from which one to dequeue), then you can still use a single
> NAPI instance just fine and basically pick the highest priority non-empty ring on the fly.
> 
> What I am having trouble understanding is how the fully software implementation
> of this possible new Rx qdisc should work. Somehow the skb->priority should be taken
> into account as the skb passes through the stack (i.e. a higher priority skb should
> overtake a lower priority skb that was received earlier but whose priority queue
> is congested).

I'd think the SW implementation would come down to which ring to
service first. If there are multiple rings on the host, NAPI can try
to read from the highest priority ring first and then move on to the
next prio. Not sure whether there would be a use case for multiple
NAPIs for busy polling.
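
Something along these lines is what I have in mind for the
hardware-assist-only case; everything here (struct prio_channel,
clean_ring(), enable_ring_irqs(), the assumption that TC 0 is the
highest priority) is an illustrative sketch, not an existing driver:

#include <linux/netdevice.h>

#define NUM_TCS 8	/* 8 Rx traffic classes, TC 0 assumed highest prio */

struct prio_channel {
	struct napi_struct napi;
	/* per-TC ring state would live here; omitted in this sketch */
};

/* Hypothetical helpers */
int clean_ring(struct prio_channel *ch, int tc, int limit);
void enable_ring_irqs(struct prio_channel *ch);

static int prio_napi_poll(struct napi_struct *napi, int budget)
{
	struct prio_channel *ch = container_of(napi, struct prio_channel, napi);
	int cleaned = 0;
	int tc;

	/* Walked in priority order on every poll, so high prio rings always
	 * get looked at first even if a low prio ring still has a backlog. */
	for (tc = 0; tc < NUM_TCS && cleaned < budget; tc++)
		cleaned += clean_ring(ch, tc, budget - cleaned);

	if (cleaned < budget && napi_complete_done(napi, cleaned))
		enable_ring_irqs(ch);

	return cleaned;
}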

I was hoping we could solve this with the new ring config API (which is
coming any day now, ehh), in which I hope user space will be able to
assign rings to NAPI instances; all we would have needed on top of that
is control over the querying order. But that doesn't really work for you,
it seems, since the selection is offloaded to HW :S

> I don't have a very deep understanding of the stack, but I am thinking that the
> enqueue_to_backlog()/process_backlog() area could be a candidate place for sorting out
> bottlenecks. If we do that, I don't see why a qdisc would be necessary at all; everybody
> could benefit from prioritization based on skb->priority.

I think once the driver picks the frame up it should run with it to
completion (+/- GRO). We have natural batching with NAPI processing:
every NAPI budget, high priority rings get a chance to preempt lower
ones.
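
To illustrate, clean_ring() from the sketch above could then hand each
frame straight to GRO within the same poll; build_rx_skb() is again a
made-up helper that dequeues one frame from the TC's ring and builds an
skb, returning NULL when the ring is empty:

struct sk_buff *build_rx_skb(struct prio_channel *ch, int tc);	/* hypothetical */

int clean_ring(struct prio_channel *ch, int tc, int limit)
{
	int cleaned = 0;

	while (cleaned < limit) {
		struct sk_buff *skb = build_rx_skb(ch, tc);

		if (!skb)
			break;			/* ring is empty */
		skb->priority = tc;		/* let the TC travel with the skb */
		napi_gro_receive(&ch->napi, skb);	/* run to completion, +GRO */
		cleaned++;
	}
	return cleaned;
}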
