From: <Daniel.Machon@microchip.com>
To: <petrm@nvidia.com>
Cc: <netdev@vger.kernel.org>, <kuba@kernel.org>,
	<vinicius.gomes@intel.com>, <vladimir.oltean@nxp.com>,
	<thomas.petazzoni@bootlin.com>, <Allan.Nielsen@microchip.com>,
	<maxime.chevallier@bootlin.com>, <nikolay@nvidia.com>,
	<roopa@nvidia.com>
Subject: Re: Basic PCP/DEI-based queue classification
Date: Sun, 21 Aug 2022 20:58:05 +0000
Message-ID: <YwKeVQWtVM9WC9Za@DEN-LT-70577>
In-Reply-To: <874jy8mo0n.fsf@nvidia.com>

Hi Petr,
Thank you for your answer.

> > Hi netdev,
> >
> > I am posting this thread in continuation of:
> >
> > https://lore.kernel.org/netdev/20220415173718.494f5fdb@fedora/
> >
> > and as a new starting point for further discussion of offloading PCP-based
> > queue classification into the classification tables of a switch.
> >
> > Today, we use a proprietary tool to configure the internal switch tables for
> > PCP/DEI and DSCP based queue classification [1]. We are, however, looking for
> > an upstream solution.
> >
> > More specifically we want an upstream solution which allows projects like DENT
> > and others with similar purpose to implement the ieee802-dot1q-bridge.yang [2].
> > As a first step we would like to focus on the priority maps of the "Priority
> > Code Point Decoding Table" and "Priority Code Point Encoding Table" of the
> > 802.1Q-2018 standard. These tables are well defined and map well to the
> > hardware.
> >
> > The purpose is not to create a new kernel interface which looks like what IEEE
> > defines - but rather to do the needed plumbing to allow user-space tools to
> > implement an interface like this.
> >
> > In essence we need an upstream solution that initially supports:
> >
> >  - Per-port mapping of PCP/DEI to QoS class. For both ingress and egress.
> >
> >  - Per-port default priority for frames which are not VLAN tagged.
> 
> This exists in DCB APP. Rules with selector 1 (EtherType) and PID 0
> assign a default priority. iproute2's dcb tool supports this.
> 
> >  - Per-port configuration of "trust" to signal if the VLAN-prio shall be used,
> >    or if port default priority shall be used.
> 
> This would be nice. Currently mlxsw ports are in trust PCP mode until
> the user configures any DSCP rules. Then it switches to trust DSCP.
> There's no way to express "trust both", or to configure the particular
> PCP mapping for trust PCP (it's just hardcoded as 1:1).

Right, so this could be of use to you guys as well.

> 
> Re this "VLAN or default", note it's not (always) either-or. In Spectrum
> switches, the default priority is always applicable. E.g. for a port in
> trust PCP mode, if a packet has no 802.1q header, it gets port-default
> priority. 802.1q describes the default priority as "for use when
> application priority is not otherwise specified", so I think this
> behavior actually matches the standard.
> 
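
If I read this correctly, the ingress decision could be sketched roughly as
below. Python is used only for illustration; classify(), PCP_TO_PRIO and the
trust_pcp flag are hypothetical names, not an existing kernel or mlxsw
interface, and the 1:1 map simply mirrors the hardcoded mapping mentioned
above.

```python
from typing import Optional

# Hypothetical 1:1 PCP-to-priority map (index = PCP value 0..7).
PCP_TO_PRIO = list(range(8))


def classify(pcp: Optional[int], trust_pcp: bool, default_prio: int) -> int:
    """Return the priority of a frame.

    pcp is None when the frame carries no 802.1Q header.
    """
    if trust_pcp and pcp is not None:
        # Tagged frame on a port that trusts PCP: use the PCP map.
        return PCP_TO_PRIO[pcp]
    # Untagged frame, or PCP not trusted: port-default priority applies.
    return default_prio
```

So the default priority is not an alternative to trust PCP; it is the
fallback that applies whenever PCP does not yield a priority.
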
> > In the old thread, Maxime has compiled a list of ways we can possibly offload
> > the queue classification. However none of them is a good match for our purpose,
> > for the following reasons:
> >
> >  - tc-flower / tc-skbedit: The filter and action scheme maps poorly to hardware
> >    and would require one hardware-table entry per rule. Even less of a match
> >    when DEI is also considered. These tools are well suited for advanced
> >    classification, and not so much for basic per-port classification.
> 
> Yeah.
> 
> Offloading this is a pain. You need to parse out the particular shape of
> rules (which is not a big deal honestly), and make sure the ordering of
> the rules is correct and matches what the HW is doing. And tolerate any
> ACL-/TCAM-like rules as well. And there's a mismatch between how a
> missing rule behaves in SW (fall-through) and HW (likely priority 0 gets
> assigned).
> 
> And configuration is a pain as well, because a) it's a whole bunch of
> rules to configure, and b) you need to be aware of all the limitations
> from the previous paragraph and manage the coexistence with ACL/TCAM
> rules.
> 
> It's just not a great story for this functionality.
> 
> I wonder if a specialized filter or action would make things easier to
> work with. Something like "matchall action dcb dscp foo bar priority 7".
> 

I really think that pcp mapping should not go into tc. It is just not
user-friendly at all, and I believe better alternatives exist.

> >  - ip-link: The ingress and egress maps of ip-link are per Linux VLAN
> >    interface; we need per-port mapping. Not possible to map both PCP and DEI.
> >
> >  - dcb-app: Not possible to map PCP/DEI (only DSCP).
> >
> > We have been looking around the kernel to see how other switch driver
> > developers configure basic per-port PCP/DEI-based queue classification,
> > and have not been able to find anything useful in the standard kernel
> > interfaces.  It seems like people use their own out-of-tree tools to configure
> > this (like mlnx_qos from Mellanox [3]).
> >
> > Finally, we would appreciate any input to this, as we are looking for an
> > upstream solution that can be accepted by the community. Hopefully we can
> > arrive at some consensus on whether this is a feature that can be of general
> > use by developers, and furthermore, in which part of the kernel it should
> > reside:
> >
> >  - ethtool: add new setting to configure the pcp tables (seems like a good
> >    candidate to us).
> >
> >  - ip-link: add support for per-port-interface ingress and egress mapping of
> >    pcp/dei
> >
> >  - dcb-*: as an extension or new command to the dcb utilities. The pcp tables
> >    seem to be in line with what dcb-app does with the application priority
> >    table.
> 
> I'm not a fan of DCB, but the TC story is so unconvincing that this
> looks good in comparison.
> 

Agree.

> But note that DCB as such is standardized. I think the dcb-maxrate
> interfaces are not, and the DCB subsystem has a whole bunch of weird
> pre-standard stuff that's not exposed. But what's in iproute2 dcb is
> largely standard. So maybe this should be hidden under some extension
> attribute.
> 

So a pcp mapping functionality could very well go into dcb as an extension,
for the following reasons:

 - dcb already contains a non-standard extension (dcb-maxrate)

 - Adding an extension (dcb-pcp?) for configuring the pcp tables of ieee-802.1q
   seems to be in line with what dcb-app is doing with the app table. Now, the
   app table and the pcp tables are different, but they are both intended to map
   priority to queue (dscp or pcp/dei).

 - default prio for non-tagged frames is already possible in dcb-app

 - dscp priority mapping is also possible in dcb-app

 - dcb already has the necessary data structures for mapping priority to queue 
   (array parameter)

 - Seems convenient to place the priority mapping in one place (dscp and pcp/dei).
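
To make the array idea a bit more concrete, here is a minimal sketch of a
per-port PCP/DEI decoding table stored as a flat 16-entry array (8 PCP values
times 2 DEI values). It is plain Python for illustration, not kernel code, and
all names (identity_table, set_entry, decode) are made up; nothing here is an
existing dcb interface.

```python
from typing import List

NUM_PCP = 8  # PCP is a 3-bit field
NUM_DEI = 2  # DEI is a 1-bit field


def _index(pcp: int, dei: int) -> int:
    """Flatten (pcp, dei) into an index of the 16-entry table."""
    if not (0 <= pcp < NUM_PCP and 0 <= dei < NUM_DEI):
        raise ValueError("pcp must be 0..7 and dei must be 0..1")
    return pcp * NUM_DEI + dei


def identity_table() -> List[int]:
    """Default table: priority equals PCP, regardless of DEI."""
    return [pcp for pcp in range(NUM_PCP) for _ in range(NUM_DEI)]


def set_entry(table: List[int], pcp: int, dei: int, prio: int) -> None:
    """Map frames with the given PCP/DEI to priority prio."""
    table[_index(pcp, dei)] = prio


def decode(table: List[int], pcp: int, dei: int) -> int:
    """Look up the priority for a tagged frame."""
    return table[_index(pcp, dei)]


# Example: send drop-eligible PCP 3 frames to a lower priority.
port_table = identity_table()
set_entry(port_table, 3, 1, 1)
```

One fixed-size array per port covers every PCP/DEI combination, so there is
no rule ordering to get right and no "missing rule" case to define.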

Any thoughts?

> >  - somewhere else
> >
> > In summary:
> >
> >  - We would like feedback from the community on the suggested implementation of
> >    the ieee-802.1Q Priority Code Point encoding and decoding tables.
> >
> >  - And if we can agree that such a solution could and should be implemented,
> >    where should the implementation go?
> >
> >  - Also, should the solution be supported in the sw-bridge as well?
> 
> That would be ideal, yeah.
