All of lore.kernel.org
 help / color / mirror / Atom feed
* Basic PCP/DEI-based queue classification
@ 2022-08-19  9:09 Daniel.Machon
  2022-08-19 10:50 ` Petr Machata
  0 siblings, 1 reply; 22+ messages in thread
From: Daniel.Machon @ 2022-08-19  9:09 UTC (permalink / raw)
  To: netdev
  Cc: kuba, vinicius.gomes, vladimir.oltean, thomas.petazzoni,
	Allan.Nielsen, maxime.chevallier, nikolay, roopa

Hi netdev,

I am posting this thread in continuation of:

https://lore.kernel.org/netdev/20220415173718.494f5fdb@fedora/

and as a new starting point for further discussion of offloading PCP-based
queue classification into the classification tables of a switch.

Today, we use a proprietary tool to configure the internal switch tables for
PCP/DEI and DSCP based queue classification [1]. We are, however, looking for
an upstream solution.

More specifically we want an upstream solution which allows projects like DENT
and others with similar purpose to implement the ieee802-dot1q-bridge.yang [2].
As a first step we would like to focus on the priority maps of the "Priority
Code Point Decoding Table" and "Priority Code Point Enconding table" of the
802.1Q-2018 standard. These tables are well defined and maps well to the
hardware.

The purpose is not to create a new kernel interface which looks like what IEEE
defines - but rather to do the needed plumbing to allow user-space tools to
implement an interface like this.

In essence we need an upstream solution that initially supports:

 - Per-port mapping of PCP/DEI to QoS class. For both ingress and egress.

 - Per-port default priority for frames which are not VLAN tagged.

 - Per-port configuration of "trust" to signal if the VLAN-prio shall be used,
   or if port default priority shall be used.

In the old thread, Maxime has compiled a list of ways we can possibly offload
the queue classification. However none of them is a good match for our purpose,
for the following reasons:

 - tc-flower / tc-skbedit: The filter and action scheme maps poorly to hardware
   and would require one hardware-table entry per rule. Even less of a match
   when DEI is also considered. These tools are well suited for advanced
   classification, and not so much for basic per-port classification.

 - ip-link: The ingress and egress maps of ip-link is per-linux-vlan interface;
   we need per-port mapping. Not possible to map both PCP and DEI.

 - dcb-app: Not possible to map PCP/DEI (only DSCP).

We have been looking around the kernel to snoop what other switch driver
developers do, to configure basic per-port PCP/DEI based queue classification,
and have not been able to find anything useful, in the standard kernel
interfaces.  It seems like people use their own out-of-tree tools to configure
this (like mlnx_qos from Mellanox [3]).

Finally, we would appreciate any input to this, as we are looking for an
upstream solution that can be accepted by the community. Hopefully we can
arrive at some consensus on whether this is a feature that can be of general
use by developers, and furthermore, in which part of the kernel it should
reside:

 - ethtool: add new setting to configure the pcp tables (seems like a good
   candidate to us).

 - ip-link: add support for per-port-interface ingress and egress mapping of
   pcp/dei

 - dcb-*: as an extension or new command to the dcb utilities. The pcp tables
   seems to be in line with what dcb-app does with the application priority
   table.

 - somewhere else

In summary:

 - We would like feedback from the community on the suggested implemenation of
   the ieee-802.1Q Priority Code Point encoding an decoding tables.

 - And if we can agree that such a solution could and should be implemented;
   where should the implemenation go?

 - Also, should the solution be supported in the sw-bridge as well.

Thank you,
Best regards,

Daniel

[1] https://github.com/microchip-ung/qos-utils
[2] https://github.com/YangModels/yang/blob/main/standard/ieee/published/802.1/ieee802-dot1q-bridge.yang
[3] https://github.com/Mellanox/mlnx-tools/blob/master/ofed_scripts/utils/mlnx_qos

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Basic PCP/DEI-based queue classification
  2022-08-19  9:09 Basic PCP/DEI-based queue classification Daniel.Machon
@ 2022-08-19 10:50 ` Petr Machata
  2022-08-21 20:58   ` Daniel.Machon
  0 siblings, 1 reply; 22+ messages in thread
From: Petr Machata @ 2022-08-19 10:50 UTC (permalink / raw)
  To: Daniel.Machon
  Cc: netdev, kuba, vinicius.gomes, vladimir.oltean, thomas.petazzoni,
	Allan.Nielsen, maxime.chevallier, nikolay, roopa


<Daniel.Machon@microchip.com> writes:

> Hi netdev,
>
> I am posting this thread in continuation of:
>
> https://lore.kernel.org/netdev/20220415173718.494f5fdb@fedora/
>
> and as a new starting point for further discussion of offloading PCP-based
> queue classification into the classification tables of a switch.
>
> Today, we use a proprietary tool to configure the internal switch tables for
> PCP/DEI and DSCP based queue classification [1]. We are, however, looking for
> an upstream solution.
>
> More specifically we want an upstream solution which allows projects like DENT
> and others with similar purpose to implement the ieee802-dot1q-bridge.yang [2].
> As a first step we would like to focus on the priority maps of the "Priority
> Code Point Decoding Table" and "Priority Code Point Enconding table" of the
> 802.1Q-2018 standard. These tables are well defined and maps well to the
> hardware.
>
> The purpose is not to create a new kernel interface which looks like what IEEE
> defines - but rather to do the needed plumbing to allow user-space tools to
> implement an interface like this.
>
> In essence we need an upstream solution that initially supports:
>
>  - Per-port mapping of PCP/DEI to QoS class. For both ingress and egress.
>
>  - Per-port default priority for frames which are not VLAN tagged.

This exists in DCB APP. Rules with selector 1 (EtherType) and PID 0
assign a default priority. iproute2's dcb tool supports this.

>  - Per-port configuration of "trust" to signal if the VLAN-prio shall be used,
>    or if port default priority shall be used.

This would be nice. Currently mlxsw ports are in trust PCP mode until
the user configures any DSCP rules. Then it switches to trust DSCP.
There's no way to express "trust both", or to configure the particular
PCP mapping for trust PCP (it's just hardcoded as 1:1).

Re this "VLAN or default", note it's not (always) either-or. In Spectrum
switches, the default priority is always applicable. E.g. for a port in
trust PCP mode, if a packet has no 802.1q header, it gets port-default
priority. 802.1q describes the default priority as "for use when
application priority is not otherwise specified", so I think this
behavior actually matches the standard.

> In the old thread, Maxime has compiled a list of ways we can possibly offload
> the queue classification. However none of them is a good match for our purpose,
> for the following reasons:
>
>  - tc-flower / tc-skbedit: The filter and action scheme maps poorly to hardware
>    and would require one hardware-table entry per rule. Even less of a match
>    when DEI is also considered. These tools are well suited for advanced
>    classification, and not so much for basic per-port classification.

Yeah.

Offloading this is a pain. You need to parse out the particular shape of
rules (which is not a big deal honestly), and make sure the ordering of
the rules is correct and matches what the HW is doing. And tolerate any
ACL-/TCAM- like rules as well. And there's mismatch between how a
missing rule behaves in SW (fall-through) and HW (likely priority 0 gets
assigned).

And configuration is pain as well, because a) it's a whole bunch of
rules to configure, and b) you need to be aware of all the limitations
from the previous paragraph and manage the coexistence with ACL/TCAM
rules.

It's just not a great story for this functionality.

I wonder if a specialized filter or action would make things easier to
work with. Something like "matchall action dcb dscp foo bar priority 7".

>  - ip-link: The ingress and egress maps of ip-link is per-linux-vlan interface;
>    we need per-port mapping. Not possible to map both PCP and DEI.
>
>  - dcb-app: Not possible to map PCP/DEI (only DSCP).
>
> We have been looking around the kernel to snoop what other switch driver
> developers do, to configure basic per-port PCP/DEI based queue classification,
> and have not been able to find anything useful, in the standard kernel
> interfaces.  It seems like people use their own out-of-tree tools to configure
> this (like mlnx_qos from Mellanox [3]).
>
> Finally, we would appreciate any input to this, as we are looking for an
> upstream solution that can be accepted by the community. Hopefully we can
> arrive at some consensus on whether this is a feature that can be of general
> use by developers, and furthermore, in which part of the kernel it should
> reside:
>
>  - ethtool: add new setting to configure the pcp tables (seems like a good
>    candidate to us).
>
>  - ip-link: add support for per-port-interface ingress and egress mapping of
>    pcp/dei
>
>  - dcb-*: as an extension or new command to the dcb utilities. The pcp tables
>    seems to be in line with what dcb-app does with the application priority
>    table.

I'm not a fan of DCB, but the TC story is so unconvincing that this
looks good in comparison.

But note that DCB as such is standardized. I think the dcb-maxrate
interfaces are not, and the DCB subsystem has a whole bunch of weird
pre-standard stuff that's not exposed. But what's in iproute2 dcb is
largely standard. So maybe this should be hidden under some extension
attribute.

>  - somewhere else
>
> In summary:
>
>  - We would like feedback from the community on the suggested implemenation of
>    the ieee-802.1Q Priority Code Point encoding an decoding tables.
>
>  - And if we can agree that such a solution could and should be implemented;
>    where should the implemenation go?
>
>  - Also, should the solution be supported in the sw-bridge as well.

That would be ideal, yeah.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Basic PCP/DEI-based queue classification
  2022-08-19 10:50 ` Petr Machata
@ 2022-08-21 20:58   ` Daniel.Machon
  2022-08-22 10:34     ` Petr Machata
  0 siblings, 1 reply; 22+ messages in thread
From: Daniel.Machon @ 2022-08-21 20:58 UTC (permalink / raw)
  To: petrm
  Cc: netdev, kuba, vinicius.gomes, vladimir.oltean, thomas.petazzoni,
	Allan.Nielsen, maxime.chevallier, nikolay, roopa

Hi Petr,
Thank you for your answer.

> > Hi netdev,
> >
> > I am posting this thread in continuation of:
> >
> > https://lore.kernel.org/netdev/20220415173718.494f5fdb@fedora/
> >
> > and as a new starting point for further discussion of offloading PCP-based
> > queue classification into the classification tables of a switch.
> >
> > Today, we use a proprietary tool to configure the internal switch tables for
> > PCP/DEI and DSCP based queue classification [1]. We are, however, looking for
> > an upstream solution.
> >
> > More specifically we want an upstream solution which allows projects like DENT
> > and others with similar purpose to implement the ieee802-dot1q-bridge.yang [2].
> > As a first step we would like to focus on the priority maps of the "Priority
> > Code Point Decoding Table" and "Priority Code Point Enconding table" of the
> > 802.1Q-2018 standard. These tables are well defined and maps well to the
> > hardware.
> >
> > The purpose is not to create a new kernel interface which looks like what IEEE
> > defines - but rather to do the needed plumbing to allow user-space tools to
> > implement an interface like this.
> >
> > In essence we need an upstream solution that initially supports:
> >
> >  - Per-port mapping of PCP/DEI to QoS class. For both ingress and egress.
> >
> >  - Per-port default priority for frames which are not VLAN tagged.
> 
> This exists in DCB APP. Rules with selector 1 (EtherType) and PID 0
> assign a default priority. iproute2's dcb tool supports this.
> 
> >  - Per-port configuration of "trust" to signal if the VLAN-prio shall be used,
> >    or if port default priority shall be used.
> 
> This would be nice. Currently mlxsw ports are in trust PCP mode until
> the user configures any DSCP rules. Then it switches to trust DSCP.
> There's no way to express "trust both", or to configure the particular
> PCP mapping for trust PCP (it's just hardcoded as 1:1).

Right, so this could be of use by you guys as well.

> 
> Re this "VLAN or default", note it's not (always) either-or. In Spectrum
> switches, the default priority is always applicable. E.g. for a port in
> trust PCP mode, if a packet has no 802.1q header, it gets port-default
> priority. 802.1q describes the default priority as "for use when
> application priority is not otherwise specified", so I think this
> behavior actually matches the standard.
> 
> > In the old thread, Maxime has compiled a list of ways we can possibly offload
> > the queue classification. However none of them is a good match for our purpose,
> > for the following reasons:
> >
> >  - tc-flower / tc-skbedit: The filter and action scheme maps poorly to hardware
> >    and would require one hardware-table entry per rule. Even less of a match
> >    when DEI is also considered. These tools are well suited for advanced
> >    classification, and not so much for basic per-port classification.
> 
> Yeah.
> 
> Offloading this is a pain. You need to parse out the particular shape of
> rules (which is not a big deal honestly), and make sure the ordering of
> the rules is correct and matches what the HW is doing. And tolerate any
> ACL-/TCAM- like rules as well. And there's mismatch between how a
> missing rule behaves in SW (fall-through) and HW (likely priority 0 gets
> assigned).
> 
> And configuration is pain as well, because a) it's a whole bunch of
> rules to configure, and b) you need to be aware of all the limitations
> from the previous paragraph and manage the coexistence with ACL/TCAM
> rules.
> 
> It's just not a great story for this functionality.
> 
> I wonder if a specialized filter or action would make things easier to
> work with. Something like "matchall action dcb dscp foo bar priority 7".
> 

I really think that pcp mapping should not go into tc. It is just not 
user-friendly at all, and I believe better alternatives exists.

> >  - ip-link: The ingress and egress maps of ip-link is per-linux-vlan interface;
> >    we need per-port mapping. Not possible to map both PCP and DEI.
> >
> >  - dcb-app: Not possible to map PCP/DEI (only DSCP).
> >
> > We have been looking around the kernel to snoop what other switch driver
> > developers do, to configure basic per-port PCP/DEI based queue classification,
> > and have not been able to find anything useful, in the standard kernel
> > interfaces.  It seems like people use their own out-of-tree tools to configure
> > this (like mlnx_qos from Mellanox [3]).
> >
> > Finally, we would appreciate any input to this, as we are looking for an
> > upstream solution that can be accepted by the community. Hopefully we can
> > arrive at some consensus on whether this is a feature that can be of general
> > use by developers, and furthermore, in which part of the kernel it should
> > reside:
> >
> >  - ethtool: add new setting to configure the pcp tables (seems like a good
> >    candidate to us).
> >
> >  - ip-link: add support for per-port-interface ingress and egress mapping of
> >    pcp/dei
> >
> >  - dcb-*: as an extension or new command to the dcb utilities. The pcp tables
> >    seems to be in line with what dcb-app does with the application priority
> >    table.
> 
> I'm not a fan of DCB, but the TC story is so unconvincing that this
> looks good in comparison.
> 

Agree.

> But note that DCB as such is standardized. I think the dcb-maxrate
> interfaces are not, and the DCB subsystem has a whole bunch of weird
> pre-standard stuff that's not exposed. But what's in iproute2 dcb is
> largely standard. So maybe this should be hidden under some extension
> attribute.
> 

So a pcp mapping functionality could very well go into dcb as an extension,
for the following reasons:

 - dcb already contains non-standard extension (dcb-maxrate)

 - Adding an extension (dcb-pcp?) for configuring the pcp tables of ieee-802.1q
   seems to be in line with what dcb-app is doing with the app table. Now, the
   app table and the pcp tables are different, but they are both inteded to map
   priority to queue (dscp or pcp/dei).

 - default prio for non-tagged frames is already possible in dcb-app

 - dscp priority mapping is also possible in dcb-app

 - dcb already has the necessary data structures for mapping priority to queue 
   (array parameter)

 - Seems conventient to place the priority mapping in one place (dscp and pcp/dei).

Any thoughts?

> >  - somewhere else
> >
> > In summary:
> >
> >  - We would like feedback from the community on the suggested implemenation of
> >    the ieee-802.1Q Priority Code Point encoding an decoding tables.
> >
> >  - And if we can agree that such a solution could and should be implemented;
> >    where should the implemenation go?
> >
> >  - Also, should the solution be supported in the sw-bridge as well.
> 
> That would be ideal, yeah.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Basic PCP/DEI-based queue classification
  2022-08-21 20:58   ` Daniel.Machon
@ 2022-08-22 10:34     ` Petr Machata
  2022-08-24  7:39       ` Daniel.Machon
  0 siblings, 1 reply; 22+ messages in thread
From: Petr Machata @ 2022-08-22 10:34 UTC (permalink / raw)
  To: Daniel.Machon
  Cc: petrm, netdev, kuba, vinicius.gomes, vladimir.oltean,
	thomas.petazzoni, Allan.Nielsen, maxime.chevallier, nikolay,
	roopa


<Daniel.Machon@microchip.com> writes:

>> > In the old thread, Maxime has compiled a list of ways we can possibly offload
>> > the queue classification. However none of them is a good match for our purpose,
>> > for the following reasons:
>> >
>> >  - tc-flower / tc-skbedit: The filter and action scheme maps poorly to hardware
>> >    and would require one hardware-table entry per rule. Even less of a match
>> >    when DEI is also considered. These tools are well suited for advanced
>> >    classification, and not so much for basic per-port classification.
>> 
>> Yeah.
>> 
>> Offloading this is a pain. You need to parse out the particular shape of
>> rules (which is not a big deal honestly), and make sure the ordering of
>> the rules is correct and matches what the HW is doing. And tolerate any
>> ACL-/TCAM- like rules as well. And there's mismatch between how a
>> missing rule behaves in SW (fall-through) and HW (likely priority 0 gets
>> assigned).
>> 
>> And configuration is pain as well, because a) it's a whole bunch of
>> rules to configure, and b) you need to be aware of all the limitations
>> from the previous paragraph and manage the coexistence with ACL/TCAM
>> rules.
>> 
>> It's just not a great story for this functionality.
>> 
>> I wonder if a specialized filter or action would make things easier to
>> work with. Something like "matchall action dcb dscp foo bar priority 7".
>> 
>
> I really think that pcp mapping should not go into tc. It is just not 
> user-friendly at all, and I believe better alternatives exists.

"better" along the "more specialized, hence easier to configure" axis.
But DCB in particular doesn't address the SW datapath at all, and if you
want this sort of prioritization in SW, you have to fall back on flower
et.al. anyway.

> So a pcp mapping functionality could very well go into dcb as an extension,
> for the following reasons:
>
>  - dcb already contains non-standard extension (dcb-maxrate)

Yeah, and if the standard DCB ever gets maxrate knobs that are not 1:1
compatible with what we now have, we're in trouble.

Whatever new things is getting added, needs to be added in a way that
makes standard collisions very unlikely.

>  - Adding an extension (dcb-pcp?) for configuring the pcp tables of ieee-802.1q
>    seems to be in line with what dcb-app is doing with the app table. Now, the
>    app table and the pcp tables are different, but they are both inteded to map
>    priority to queue (dscp or pcp/dei).

(Not priority to queue, but header field values to priority. Queue
assignment is "dcb buffer prio-buffer" on ingress and "dcb ets prio-tc"
on egress.)

>  - default prio for non-tagged frames is already possible in dcb-app
>
>  - dscp priority mapping is also possible in dcb-app
>
>  - dcb already has the necessary data structures for mapping priority to queue 
>    (array parameter)
>
>  - Seems conventient to place the priority mapping in one place (dscp and pcp/dei).
>
> Any thoughts?

How do the pcp-prio rules work with the APP rules? There's the dscp-prio
sparse table, then there will be the pcp-prio (sparse?) table, what
happens if a packet arrives that has both headers? In Spectrum switches,
DSCP takes precedence, but that may not be universal.

It looks like adding "PCP" to APP would make the integration easiest.
Maybe we could use an out-of-band sel value for the selector, say 256,
to likely avoid incompatible standardization?

Then the trust level can be an array of selectors that shows how the
rules should be applied. E.g. [TCPUDP, DSCP, PCP]. Some of these
configurations are not supported by the HW and will be bounced by the
driver.

(Am I missing something in the standard? It doesn't seem to deal with
how the APP rules are actually applied at all.)


Another issue: DCB APP is a sparse table. There's a question of what
should happen for the e.g. DSCP values that don't have an APP entry.
Logically I think they should "fall through" to other APP rules as per
the selector array.

Thing is, ASICs probably don't support this "fall-through" feature. So I
don't know what to do with this. Kinda brings back some of that TC
complexity, where you need to add all the rules, otherwise the HW can't
be compatibly configured.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Basic PCP/DEI-based queue classification
  2022-08-22 10:34     ` Petr Machata
@ 2022-08-24  7:39       ` Daniel.Machon
  2022-08-24  9:45         ` Petr Machata
  0 siblings, 1 reply; 22+ messages in thread
From: Daniel.Machon @ 2022-08-24  7:39 UTC (permalink / raw)
  To: petrm
  Cc: netdev, kuba, vinicius.gomes, vladimir.oltean, thomas.petazzoni,
	Allan.Nielsen, maxime.chevallier, nikolay, roopa

> How do the pcp-prio rules work with the APP rules? There's the dscp-prio
> sparse table, then there will be the pcp-prio (sparse?) table, what
> happens if a packet arrives that has both headers? In Spectrum switches,
> DSCP takes precedence, but that may not be universal.

In lan966x and sparx5 switches, dscp also takes precendence over pcp, in
default mode. Wrt. trust: DSCP mapping can be enabled/disabled and trusted
per-dscp-value. PCP mapping can be enabled/disabled, but not trusted
per-pcp-value. If DSCP mapping is enabled, and the DSCP value is trusted,
then DSCP mapping is used, otherwise PCP (if tagged).

> 
> It looks like adding "PCP" to APP would make the integration easiest.
> Maybe we could use an out-of-band sel value for the selector, say 256,
> to likely avoid incompatible standardization?
> 
> Then the trust level can be an array of selectors that shows how the
> rules should be applied. E.g. [TCPUDP, DSCP, PCP]. Some of these
> configurations are not supported by the HW and will be bounced by the
> driver.

We also need to consider the DEI bit. And also whether the mapping is for
ingress or egress.

This suddenly becomes quite an intrusive addition to an already standardized
APP interface.

As I hinted earlier, we could also add an entirely new PCP interface 
(like with maxrate), this will give us a bit more flexibility and will 
not crash with anything. This approach will not give is trust for DSCP, 
but maybe we can disregard this and go with a PCP solution initially?

> 
> (Am I missing something in the standard? It doesn't seem to deal with
> how the APP rules are actually applied at all.)

No, this part is somewhat vague.

> 
> 
> Another issue: DCB APP is a sparse table. There's a question of what
> should happen for the e.g. DSCP values that don't have an APP entry.
> Logically I think they should "fall through" to other APP rules as per
> the selector array.
> 
> Thing is, ASICs probably don't support this "fall-through" feature. So I
> don't know what to do with this. Kinda brings back some of that TC
> complexity, where you need to add all the rules, otherwise the HW can't
> be compatibly configured.

True. Let this be a PCP mapping that is inteded for the hw datapath only.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Basic PCP/DEI-based queue classification
  2022-08-24  7:39       ` Daniel.Machon
@ 2022-08-24  9:45         ` Petr Machata
  2022-08-24 17:55           ` Daniel.Machon
  0 siblings, 1 reply; 22+ messages in thread
From: Petr Machata @ 2022-08-24  9:45 UTC (permalink / raw)
  To: Daniel.Machon
  Cc: petrm, netdev, kuba, vinicius.gomes, vladimir.oltean,
	thomas.petazzoni, Allan.Nielsen, maxime.chevallier, roopa


<Daniel.Machon@microchip.com> writes:

>> How do the pcp-prio rules work with the APP rules? There's the dscp-prio
>> sparse table, then there will be the pcp-prio (sparse?) table, what
>> happens if a packet arrives that has both headers? In Spectrum switches,
>> DSCP takes precedence, but that may not be universal.
>
> In lan966x and sparx5 switches, dscp also takes precendence over pcp, in
> default mode. Wrt. trust: DSCP mapping can be enabled/disabled and trusted
> per-dscp-value. PCP mapping can be enabled/disabled, but not trusted
> per-pcp-value. If DSCP mapping is enabled, and the DSCP value is trusted,
> then DSCP mapping is used, otherwise PCP (if tagged).

Nice, so you can actually implement the sparsity of dscp-prio map. And
since PCP is always second in order, you can backfill any unspecified
PCP values from the default priority, or 0, and it will be semantically
the same.

>> It looks like adding "PCP" to APP would make the integration easiest.
>> Maybe we could use an out-of-band sel value for the selector, say 256,
>> to likely avoid incompatible standardization?
>> 
>> Then the trust level can be an array of selectors that shows how the
>> rules should be applied. E.g. [TCPUDP, DSCP, PCP]. Some of these
>> configurations are not supported by the HW and will be bounced by the
>> driver.
>
> We also need to consider the DEI bit. And also whether the mapping is for
> ingress or egress.

Yeah, I keep saying pcp-prio, but actually what I mean is (pcp,
dei)-prio. The standard likewise talks about DEI always in connection to
priority, I believe, nobody prioritizes on DEI alone.

> This suddenly becomes quite an intrusive addition to an already standardized
> APP interface.

The 802.1q DCB has APP selector at three bits. Even if the standard
decides to get more bits somewhere, it seems unlikely that they would
add very many, because how many different fields does one need to
prioritize on? So I would feel safe using a large value internally in
Linux. But yeah, it's a concern.

> As I hinted earlier, we could also add an entirely new PCP interface 
> (like with maxrate), this will give us a bit more flexibility and will 
> not crash with anything. This approach will not give is trust for DSCP, 
> but maybe we can disregard this and go with a PCP solution initially?

I would like to have a line of sight to how things will be done. Not
everything needs to be implemented at once, but we have to understand
how to get there when we need to. At least for issues that we can
already foresee now, such as the DSCP / PCP / default ordering.

Adding the PCP rules as a new APP selector, and then expressing the
ordering as a "selector policy" or whatever, IMHO takes care of this
nicely.

But OK, let's talk about the "flexibility" bit that you mention: what
does this approach make difficult or impossible?

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Basic PCP/DEI-based queue classification
  2022-08-24  9:45         ` Petr Machata
@ 2022-08-24 17:55           ` Daniel.Machon
  2022-08-24 19:36             ` Petr Machata
  0 siblings, 1 reply; 22+ messages in thread
From: Daniel.Machon @ 2022-08-24 17:55 UTC (permalink / raw)
  To: petrm
  Cc: netdev, kuba, vinicius.gomes, vladimir.oltean, thomas.petazzoni,
	Allan.Nielsen, maxime.chevallier, roopa

> >> How do the pcp-prio rules work with the APP rules? There's the dscp-prio
> >> sparse table, then there will be the pcp-prio (sparse?) table, what
> >> happens if a packet arrives that has both headers? In Spectrum switches,
> >> DSCP takes precedence, but that may not be universal.
> >
> > In lan966x and sparx5 switches, dscp also takes precendence over pcp, in
> > default mode. Wrt. trust: DSCP mapping can be enabled/disabled and trusted
> > per-dscp-value. PCP mapping can be enabled/disabled, but not trusted
> > per-pcp-value. If DSCP mapping is enabled, and the DSCP value is trusted,
> > then DSCP mapping is used, otherwise PCP (if tagged).
> 
> Nice, so you can actually implement the sparsity of dscp-prio map. And
> since PCP is always second in order, you can backfill any unspecified
> PCP values from the default priority, or 0, and it will be semantically
> the same.
> 
> >> It looks like adding "PCP" to APP would make the integration easiest.
> >> Maybe we could use an out-of-band sel value for the selector, say 256,
> >> to likely avoid incompatible standardization?
> >>
> >> Then the trust level can be an array of selectors that shows how the
> >> rules should be applied. E.g. [TCPUDP, DSCP, PCP]. Some of these
> >> configurations are not supported by the HW and will be bounced by the
> >> driver.
> >
> > We also need to consider the DEI bit. And also whether the mapping is for
> > ingress or egress.
> 
> Yeah, I keep saying pcp-prio, but actually what I mean is (pcp,
> dei)-prio. The standard likewise talks about DEI always in connection to
> priority, I believe, nobody prioritizes on DEI alone.
> 
> > This suddenly becomes quite an intrusive addition to an already standardized
> > APP interface.
> 
> The 802.1q DCB has APP selector at three bits. Even if the standard
> decides to get more bits somewhere, it seems unlikely that they would
> add very many, because how many different fields does one need to
> prioritize on? So I would feel safe using a large value internally in
> Linux. But yeah, it's a concern.
> 
> > As I hinted earlier, we could also add an entirely new PCP interface
> > (like with maxrate), this will give us a bit more flexibility and will
> > not crash with anything. This approach will not give is trust for DSCP,
> > but maybe we can disregard this and go with a PCP solution initially?
> 
> I would like to have a line of sight to how things will be done. Not
> everything needs to be implemented at once, but we have to understand
> how to get there when we need to. At least for issues that we can
> already foresee now, such as the DSCP / PCP / default ordering.
> 
> Adding the PCP rules as a new APP selector, and then expressing the
> ordering as a "selector policy" or whatever, IMHO takes care of this
> nicely.
> 
> But OK, let's talk about the "flexibility" bit that you mention: what
> does this approach make difficult or impossible?

It was merely a concern of not changing too much on something that is
already standard. Maybe I dont quite see how the APP interface can be
extended to accomodate for: pcp/dei, ingress/egress and trust. Lets
try to break it down:

  - pcp/dei: 
        this *could* be expressed in app->protocol and map 1:1 to the 
        pcp table entrise, so that 8*dei+pcp:priority. If I want to map 
        pcp 3, with dei 1 to priority 2, it would be encoded 11:2.

  - ingress/egress:
        I guess we need a selector for each? I notice that the mellanox
        driver uses the dcb_ieee_getapp_prio_dscp_mask_map and
        dcb_ieee_getapp_dscp_prio_mask_map for priority map and priority
        rewrite map, but these seems to be the same for both ingress and
        egress to me?

So far only subtle changes. Now how do you see trust going in. Can you
elaborate a little on the policy selector you mentioned?

/ Daniel
            


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Basic PCP/DEI-based queue classification
  2022-08-24 17:55           ` Daniel.Machon
@ 2022-08-24 19:36             ` Petr Machata
  2022-08-25  0:54               ` Jakub Kicinski
  2022-08-25 11:31               ` Daniel.Machon
  0 siblings, 2 replies; 22+ messages in thread
From: Petr Machata @ 2022-08-24 19:36 UTC (permalink / raw)
  To: Daniel.Machon
  Cc: petrm, netdev, kuba, vinicius.gomes, vladimir.oltean,
	thomas.petazzoni, Allan.Nielsen, maxime.chevallier, roopa


<Daniel.Machon@microchip.com> writes:

>> > As I hinted earlier, we could also add an entirely new PCP interface
>> > (like with maxrate), this will give us a bit more flexibility and will
>> > not crash with anything. This approach will not give is trust for DSCP,
>> > but maybe we can disregard this and go with a PCP solution initially?
>> 
>> I would like to have a line of sight to how things will be done. Not
>> everything needs to be implemented at once, but we have to understand
>> how to get there when we need to. At least for issues that we can
>> already foresee now, such as the DSCP / PCP / default ordering.
>> 
>> Adding the PCP rules as a new APP selector, and then expressing the
>> ordering as a "selector policy" or whatever, IMHO takes care of this
>> nicely.
>> 
>> But OK, let's talk about the "flexibility" bit that you mention: what
>> does this approach make difficult or impossible?
>
> It was merely a concern of not changing too much on something that is
> already standard. Maybe I dont quite see how the APP interface can be
> extended to accomodate for: pcp/dei, ingress/egress and trust. Lets
> try to break it down:
>
>   - pcp/dei: 
>         this *could* be expressed in app->protocol and map 1:1 to the 
>         pcp table entrise, so that 8*dei+pcp:priority. If I want to map 
>         pcp 3, with dei 1 to priority 2, it would be encoded 11:2.

Yep. In particular something like {sel=255, pid=11, prio=2}.

iproute2 "dcb" would obviously grow brains to let you configure this
stuff semantically, so e.g.:

# dcb app replace dev X pcp-prio 3:3 3de:2 2:2 2de:1

>   - ingress/egress:
>         I guess we need a selector for each? I notice that the mellanox
>         driver uses the dcb_ieee_getapp_prio_dscp_mask_map and
>         dcb_ieee_getapp_dscp_prio_mask_map for priority map and priority
>         rewrite map, but these seems to be the same for both ingress and
>         egress to me?

Ha, I was only thinking about prioritization, not about rewrite at all.

Yeah, mlxsw uses APP rules for rewrite as well. The logic is that if the
network behind port X uses DSCP value D to express priority P, then
packets with priority P leaving that port should have DSCP value of D.
Of course it doesn't work too well, because there are 8 priorities, but
64 DSCP values. So mlxsw arbitrarily chooses the highest DSCP value.

The situation is similar with PCP, where there are 16 PCP+DEI
combinations, but only 8 priorities.

So having a way to configure rewrite would be good. But then we are very
firmly in the extension territory. This would basically need a separate
APP-like object.

> So far only subtle changes. Now how do you see trust going in. Can you
> elaborate a little on the policy selector you mentioned?

Sure. In my mind the policy is a array that describes the order in which
APP rules are applied. "default" is implicitly last.

So "trust DSCP" has a policy of just [DSCP]. "Trust PCP" of [PCP].
"Trust DSCP, then PCP" of [DSCP, PCP]. "Trust port" (i.e. just default)
is simply []. Etc.

Individual drivers validate whether their device can implement the
policy.

I expect most devices to really just support the DSCP and PCP parts, but
this is flexible in allowing more general configuration in devices that
allow it.

ABI-wise it is tempting to reuse APP to assign priority to selectors in
the same way that it currently assigns priority to field values:

# dcb app replace dev X sel-prio dscp:2 pcp:1

But that feels like a hack. It will probably be better to have a
dedicated object for this:

# dcb app-policy set dev X sel-order dscp pcp

This can be sliced in different ways that we can bikeshed to death
later. Does the above basically address your request?

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Basic PCP/DEI-based queue classification
  2022-08-24 19:36             ` Petr Machata
@ 2022-08-25  0:54               ` Jakub Kicinski
  2022-08-26 18:11                 ` Petr Machata
  2022-08-29  7:53                 ` Allan W. Nielsen
  2022-08-25 11:31               ` Daniel.Machon
  1 sibling, 2 replies; 22+ messages in thread
From: Jakub Kicinski @ 2022-08-25  0:54 UTC (permalink / raw)
  To: Petr Machata
  Cc: Daniel.Machon, netdev, vinicius.gomes, vladimir.oltean,
	thomas.petazzoni, Allan.Nielsen, maxime.chevallier, roopa

On Wed, 24 Aug 2022 21:36:54 +0200 Petr Machata wrote:
> > So far only subtle changes. Now how do you see trust going in. Can you
> > elaborate a little on the policy selector you mentioned?  
> 
> Sure. In my mind the policy is a array that describes the order in which
> APP rules are applied. "default" is implicitly last.
> 
> So "trust DSCP" has a policy of just [DSCP]. "Trust PCP" of [PCP].
> "Trust DSCP, then PCP" of [DSCP, PCP]. "Trust port" (i.e. just default)
> is simply []. Etc.
> 
> Individual drivers validate whether their device can implement the
> policy.
> 
> I expect most devices to really just support the DSCP and PCP parts, but
> this is flexible in allowing more general configuration in devices that
> allow it.
> 
> ABI-wise it is tempting to reuse APP to assign priority to selectors in
> the same way that it currently assigns priority to field values:
> 
> # dcb app replace dev X sel-prio dscp:2 pcp:1
> 
> But that feels like a hack. It will probably be better to have a
> dedicated object for this:
> 
> # dcb app-policy set dev X sel-order dscp pcp
> 
> This can be sliced in different ways that we can bikeshed to death
> later. Does the above basically address your request?

For an uneducated maintainer like myself, how do embedded people look
at DCB? Only place I've seen it used is in RDMA clusers. I suggested 
to Vladimir to look at DCBNL for frame preemption because it's the only
existing API we have that's vaguely relevant to HW/prio control but he
ended up going with ethtool.
No preference here, just trying to map it out in my head.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Basic PCP/DEI-based queue classification
  2022-08-24 19:36             ` Petr Machata
  2022-08-25  0:54               ` Jakub Kicinski
@ 2022-08-25 11:31               ` Daniel.Machon
  2022-08-25 13:30                 ` Petr Machata
  1 sibling, 1 reply; 22+ messages in thread
From: Daniel.Machon @ 2022-08-25 11:31 UTC (permalink / raw)
  To: petrm
  Cc: netdev, kuba, vinicius.gomes, vladimir.oltean, thomas.petazzoni,
	Allan.Nielsen, maxime.chevallier, roopa

> > It was merely a concern of not changing too much on something that is
> > already standard. Maybe I dont quite see how the APP interface can be
> > extended to accomodate for: pcp/dei, ingress/egress and trust. Lets
> > try to break it down:
> >
> >   - pcp/dei:
> >         this *could* be expressed in app->protocol and map 1:1 to the
> >         pcp table entrise, so that 8*dei+pcp:priority. If I want to map
> >         pcp 3, with dei 1 to priority 2, it would be encoded 11:2.
> 
> Yep. In particular something like {sel=255, pid=11, prio=2}.
> 
> iproute2 "dcb" would obviously grow brains to let you configure this
> stuff semantically, so e.g.:
> 
> # dcb app replace dev X pcp-prio 3:3 3de:2 2:2 2de:1
> 
> >   - ingress/egress:
> >         I guess we need a selector for each? I notice that the mellanox
> >         driver uses the dcb_ieee_getapp_prio_dscp_mask_map and
> >         dcb_ieee_getapp_dscp_prio_mask_map for priority map and priority
> >         rewrite map, but these seems to be the same for both ingress and
> >         egress to me?
> 
> Ha, I was only thinking about prioritization, not about rewrite at all.
> 
> Yeah, mlxsw uses APP rules for rewrite as well. The logic is that if the
> network behind port X uses DSCP value D to express priority P, then
> packets with priority P leaving that port should have DSCP value of D.
> Of course it doesn't work too well, because there are 8 priorities, but
> 64 DSCP values. So mlxsw arbitrarily chooses the highest DSCP value.
> 
> The situation is similar with PCP, where there are 16 PCP+DEI
> combinations, but only 8 priorities.
> 
> So having a way to configure rewrite would be good. But then we are very
> firmly in the extension territory. This would basically need a separate
> APP-like object.
> 
> > So far only subtle changes. Now how do you see trust going in. Can you
> > elaborate a little on the policy selector you mentioned?
> 
> Sure. In my mind the policy is a array that describes the order in which
> APP rules are applied. "default" is implicitly last.
> 
> So "trust DSCP" has a policy of just [DSCP]. "Trust PCP" of [PCP].
> "Trust DSCP, then PCP" of [DSCP, PCP]. "Trust port" (i.e. just default)
> is simply []. Etc.
> 
> Individual drivers validate whether their device can implement the
> policy.
> 
> I expect most devices to really just support the DSCP and PCP parts, but
> this is flexible in allowing more general configuration in devices that
> allow it.
> 
> ABI-wise it is tempting to reuse APP to assign priority to selectors in
> the same way that it currently assigns priority to field values:
> 
> # dcb app replace dev X sel-prio dscp:2 pcp:1
> 
> But that feels like a hack. It will probably be better to have a
> dedicated object for this:
> 
> # dcb app-policy set dev X sel-order dscp pcp
> 
> This can be sliced in different ways that we can bikeshed to death
> later. Does the above basically address your request?

Yes, thanks for elaborating - I follow you now. Also, I agree that this 
could fit into APP.

I will prepare some patches for this soon and make sure to cc you. 
Initially I would like to add support for:

  - pcp/dei-prio mapping for ingress only. If things look good, we
    can add support for rewrite later. Any objections to this?

  - Support for trust ordering as a new dedicated object.

/ Daniel


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Basic PCP/DEI-based queue classification
  2022-08-25 11:31               ` Daniel.Machon
@ 2022-08-25 13:30                 ` Petr Machata
  0 siblings, 0 replies; 22+ messages in thread
From: Petr Machata @ 2022-08-25 13:30 UTC (permalink / raw)
  To: Daniel.Machon
  Cc: petrm, netdev, kuba, vinicius.gomes, vladimir.oltean,
	thomas.petazzoni, Allan.Nielsen, maxime.chevallier, roopa


<Daniel.Machon@microchip.com> writes:

> I will prepare some patches for this soon and make sure to cc you. 
> Initially I would like to add support for:
>
>   - pcp/dei-prio mapping for ingress only. If things look good, we
>     can add support for rewrite later. Any objections to this?

Rewrite is closely related, but we don't have tools to configure it
anyway, and I don't see how adding a PCP prioritization first makes it
more difficult to add these later.

>   - Support for trust ordering as a new dedicated object.

Sounds good.

But note that I'm not a gatekeeper, I just happened on some opinions :)

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Basic PCP/DEI-based queue classification
  2022-08-25  0:54               ` Jakub Kicinski
@ 2022-08-26 18:11                 ` Petr Machata
  2022-08-29  7:53                 ` Allan W. Nielsen
  1 sibling, 0 replies; 22+ messages in thread
From: Petr Machata @ 2022-08-26 18:11 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Petr Machata, Daniel.Machon, netdev, vinicius.gomes,
	vladimir.oltean, thomas.petazzoni, Allan.Nielsen,
	maxime.chevallier, roopa


Jakub Kicinski <kuba@kernel.org> writes:

> For an uneducated maintainer like myself, how do embedded people look
> at DCB? Only place I've seen it used is in RDMA clusers.

(Not an embedded person, but FWIW our use case for supporting all these
prioritizations and buffer assignments in mlxsw is exactly RDMA / RoCE.)

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Basic PCP/DEI-based queue classification
  2022-08-25  0:54               ` Jakub Kicinski
  2022-08-26 18:11                 ` Petr Machata
@ 2022-08-29  7:53                 ` Allan W. Nielsen
  2022-09-02 13:32                   ` Vladimir Oltean
  1 sibling, 1 reply; 22+ messages in thread
From: Allan W. Nielsen @ 2022-08-29  7:53 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Petr Machata, Daniel.Machon, netdev, vinicius.gomes,
	vladimir.oltean, thomas.petazzoni, maxime.chevallier, roopa

Hi,

On 24.08.2022 17:54, Jakub Kicinski wrote:
>For an uneducated maintainer like myself, how do embedded people look
>at DCB? Only place I've seen it used is in RDMA clusers. I suggested
>to Vladimir to look at DCBNL for frame preemption because it's the only
>existing API we have that's vaguely relevant to HW/prio control but he
>ended up going with ethtool.
>No preference here, just trying to map it out in my head.

As I work toghether with Daniel, I'm biased. But I work a lot with
embedded devices ;-)

I think this feature belong in DCB simply because DCB already have the
related configurations (like the default prio, priority-flow-control
etc), and it will be more logical for the user to find the PCP mapping
in the same place.

I think that framepreemption should be in ethtool as it is partly
defined in IEEE802.3 (the physical layer) and it is ethernet only
technology.

/Allan

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Basic PCP/DEI-based queue classification
  2022-08-29  7:53                 ` Allan W. Nielsen
@ 2022-09-02 13:32                   ` Vladimir Oltean
  2022-09-07 10:41                     ` Daniel.Machon
  0 siblings, 1 reply; 22+ messages in thread
From: Vladimir Oltean @ 2022-09-02 13:32 UTC (permalink / raw)
  To: Allan W. Nielsen
  Cc: Jakub Kicinski, Petr Machata, Daniel.Machon, netdev,
	vinicius.gomes, thomas.petazzoni, maxime.chevallier, roopa

On Mon, Aug 29, 2022 at 09:53:42AM +0200, Allan W. Nielsen wrote:
> Hi,
> 
> On 24.08.2022 17:54, Jakub Kicinski wrote:
> > For an uneducated maintainer like myself, how do embedded people look
> > at DCB? Only place I've seen it used is in RDMA clusers. I suggested
> > to Vladimir to look at DCBNL for frame preemption because it's the only
> > existing API we have that's vaguely relevant to HW/prio control but he
> > ended up going with ethtool.
> > No preference here, just trying to map it out in my head.
> 
> As I work toghether with Daniel, I'm biased. But I work a lot with
> embedded devices ;-)
> 
> I think this feature belong in DCB simply because DCB already have the
> related configurations (like the default prio, priority-flow-control
> etc), and it will be more logical for the user to find the PCP mapping
> in the same place.
> 
> I think that framepreemption should be in ethtool as it is partly
> defined in IEEE802.3 (the physical layer) and it is ethernet only
> technology.
> 
> /Allan

My explanation given to Jakub at the time was that I don't quite
understand what should belong to dcbnl and what shouldn't.
https://lore.kernel.org/netdev/20220523224943.75oyvbxmyp2kjiwi@skbuf/

From my perspective it's an inconvenience that dcbnl exists as a
separate kernel subsystem and isn't converged with the ethtool
genetlink.

Regarding the topic at hand, and the apparent lack of PCP-based
prioritization in the software data path. VLAN devices have an
ingress-qos-map and an egress-qos-map. How would prioritization done via
dcbnl interact with those (who would take precedence)?

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Basic PCP/DEI-based queue classification
  2022-09-02 13:32                   ` Vladimir Oltean
@ 2022-09-07 10:41                     ` Daniel.Machon
  2022-09-07 17:26                       ` Vladimir Oltean
  0 siblings, 1 reply; 22+ messages in thread
From: Daniel.Machon @ 2022-09-07 10:41 UTC (permalink / raw)
  To: vladimir.oltean
  Cc: Allan.Nielsen, kuba, petrm, netdev, vinicius.gomes,
	thomas.petazzoni, maxime.chevallier, roopa

> > I think this feature belong in DCB simply because DCB already have the
> > related configurations (like the default prio, priority-flow-control
> > etc), and it will be more logical for the user to find the PCP mapping
> > in the same place.
> >
> > I think that framepreemption should be in ethtool as it is partly
> > defined in IEEE802.3 (the physical layer) and it is ethernet only
> > technology.
> >
> > /Allan
> 
> My explanation given to Jakub at the time was that I don't quite
> understand what should belong to dcbnl and what shouldn't.
> https://lore.kernel.org/netdev/20220523224943.75oyvbxmyp2kjiwi@skbuf/
> 
> From my perspective it's an inconvenience that dcbnl exists as a
> separate kernel subsystem and isn't converged with the ethtool
> genetlink.
> 
> Regarding the topic at hand, and the apparent lack of PCP-based
> prioritization in the software data path. VLAN devices have an
> ingress-qos-map and an egress-qos-map. How would prioritization done via
> dcbnl interact with those (who would take precedence)?

Hi Vladimir,

They shouldn't interact (at least this is my understanding). 

The ingress and egress maps are for vlan interfaces, and dcb operates
on physical interfaces (dcbx too). You cannot use dcbnl to do
prioritization for vlan interfaces.

Anyway, I think much of the stuff in DCB is for hw offload only, incl. the
topic at hand. Is the APP table even consulted by the sw stack at all - I
dont think so (apart from drivers).

/ Daniel

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Basic PCP/DEI-based queue classification
  2022-09-07 10:41                     ` Daniel.Machon
@ 2022-09-07 17:26                       ` Vladimir Oltean
  2022-09-07 19:57                         ` Daniel.Machon
  2022-09-08  8:27                         ` Petr Machata
  0 siblings, 2 replies; 22+ messages in thread
From: Vladimir Oltean @ 2022-09-07 17:26 UTC (permalink / raw)
  To: Daniel.Machon
  Cc: Allan.Nielsen, kuba, petrm, netdev, vinicius.gomes,
	thomas.petazzoni, maxime.chevallier, roopa

On Wed, Sep 07, 2022 at 10:41:10AM +0000, Daniel.Machon@microchip.com wrote:
> > Regarding the topic at hand, and the apparent lack of PCP-based
> > prioritization in the software data path. VLAN devices have an
> > ingress-qos-map and an egress-qos-map. How would prioritization done via
> > dcbnl interact with those (who would take precedence)?
> 
> Hi Vladimir,
> 
> They shouldn't interact (at least this is my understanding). 
> 
> The ingress and egress maps are for vlan interfaces, and dcb operates
> on physical interfaces (dcbx too). You cannot use dcbnl to do
> prioritization for vlan interfaces.
> 
> Anyway, I think much of the stuff in DCB is for hw offload only, incl. the
> topic at hand. Is the APP table even consulted by the sw stack at all - I
> dont think so (apart from drivers).

Not directly, but at least ocelot (or in fact felix) does set
skb->priority based on the QoS class from the Extraction Frame Header.
So the stack does end up consulting and meaningfully using something
that was set based on the dcbnl APP table.

In this sense, for ocelot, there is a real overlap between skb->priority
being initially set based on ocelot_xfh_get_qos_class(), and later being
overwritten based on the QoS maps of a VLAN interface.

The problem with the ingress-qos-map and egress-qos-map from 802.1Q that
I see is that they allow for per-VID prioritization, which is way more
fine grained than what we need. This, plus the fact that bridge VLANs
don't have this setting, only termination (8021q) VLANs do. How about an
ingress-qos-map and an egress-qos-map per port rather than per VID,
potentially even a bridge_slave netlink attribute, offloadable through
switchdev? We could make the bridge input fast path alter skb->priority
for the VLAN-tagged code paths, and this could give us superior
semantics compared to putting this non-standardized knob in the hardware
only dcbnl.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Basic PCP/DEI-based queue classification
  2022-09-07 17:26                       ` Vladimir Oltean
@ 2022-09-07 19:57                         ` Daniel.Machon
  2022-09-08  8:03                           ` Allan Nielsen - M31684
                                             ` (2 more replies)
  2022-09-08  8:27                         ` Petr Machata
  1 sibling, 3 replies; 22+ messages in thread
From: Daniel.Machon @ 2022-09-07 19:57 UTC (permalink / raw)
  To: vladimir.oltean
  Cc: Allan.Nielsen, kuba, petrm, netdev, vinicius.gomes,
	thomas.petazzoni, maxime.chevallier, roopa

> On Wed, Sep 07, 2022 at 10:41:10AM +0000, Daniel.Machon@microchip.com wrote:
> > > Regarding the topic at hand, and the apparent lack of PCP-based
> > > prioritization in the software data path. VLAN devices have an
> > > ingress-qos-map and an egress-qos-map. How would prioritization done via
> > > dcbnl interact with those (who would take precedence)?
> >
> > Hi Vladimir,
> >
> > They shouldn't interact (at least this is my understanding).
> >
> > The ingress and egress maps are for vlan interfaces, and dcb operates
> > on physical interfaces (dcbx too). You cannot use dcbnl to do
> > prioritization for vlan interfaces.
> >
> > Anyway, I think much of the stuff in DCB is for hw offload only, incl. the
> > topic at hand. Is the APP table even consulted by the sw stack at all - I
> > dont think so (apart from drivers).
> 
> Not directly, but at least ocelot (or in fact felix) does set
> skb->priority based on the QoS class from the Extraction Frame Header.
> So the stack does end up consulting and meaningfully using something
> that was set based on the dcbnl APP table.
> 
> In this sense, for ocelot, there is a real overlap between skb->priority
> being initially set based on ocelot_xfh_get_qos_class(), and later being
> overwritten based on the QoS maps of a VLAN interface.

Right, so VLAN QoS maps currently takes precedence, if somebody would choose
to add a tagged vlan interface on-top of a physical interface with existing
PCP prioritization - is this a real problem, or just a question of configuration?

> The problem with the ingress-qos-map and egress-qos-map from 802.1Q that
> I see is that they allow for per-VID prioritization, which is way more
> fine grained than what we need. This, plus the fact that bridge VLANs
> don't have this setting, only termination (8021q) VLANs do. How about an
> ingress-qos-map and an egress-qos-map per port rather than per VID,
> potentially even a bridge_slave netlink attribute, offloadable through
> switchdev? We could make the bridge input fast path alter skb->priority
> for the VLAN-tagged code paths, and this could give us superior
> semantics compared to putting this non-standardized knob in the hardware
> only dcbnl.

This is a valid alternative solution to dcbnl, although this seems to be a 
much more complex solution, to something that is easily solved in dcbnl,
and actually is in-line with what dcbnl already does. On-top of this, the
notion of 'trust' has also been raised by this topic. It makes a lot of sense
to me to add APP selector trust and trust order to dcbnl. This is the solution 
that I have been implementing lately, and is ready for posting very soon.

/ Daniel

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Basic PCP/DEI-based queue classification
  2022-09-07 19:57                         ` Daniel.Machon
@ 2022-09-08  8:03                           ` Allan Nielsen - M31684
  2022-09-08 11:18                           ` Petr Machata
  2022-09-09 12:11                           ` Vladimir Oltean
  2 siblings, 0 replies; 22+ messages in thread
From: Allan Nielsen - M31684 @ 2022-09-08  8:03 UTC (permalink / raw)
  To: Daniel Machon - M70577
  Cc: Vladimir Oltean, kuba, petrm, netdev, vinicius.gomes,
	thomas.petazzoni, maxime.chevallier, roopa

Hi Vladimir, Daniel,

On 07.09.2022 19:57, Daniel Machon - M70577 wrote:
>> On Wed, Sep 07, 2022 at 10:41:10AM +0000, Daniel.Machon@microchip.com wrote:
>> > > Regarding the topic at hand, and the apparent lack of PCP-based
>> > > prioritization in the software data path. VLAN devices have an
>> > > ingress-qos-map and an egress-qos-map. How would prioritization done via
>> > > dcbnl interact with those (who would take precedence)?
>> >
>> > Hi Vladimir,
>> >
>> > They shouldn't interact (at least this is my understanding).
>> >
>> > The ingress and egress maps are for vlan interfaces, and dcb operates
>> > on physical interfaces (dcbx too). You cannot use dcbnl to do
>> > prioritization for vlan interfaces.
>> >
>> > Anyway, I think much of the stuff in DCB is for hw offload only, incl. the
>> > topic at hand. Is the APP table even consulted by the sw stack at all - I
>> > dont think so (apart from drivers).
>>
>> Not directly, but at least ocelot (or in fact felix) does set
>> skb->priority based on the QoS class from the Extraction Frame Header.
>> So the stack does end up consulting and meaningfully using something
>> that was set based on the dcbnl APP table.
>>
>> In this sense, for ocelot, there is a real overlap between skb->priority
>> being initially set based on ocelot_xfh_get_qos_class(), and later being
>> overwritten based on the QoS maps of a VLAN interface.
I do not think this is an overlap.

The mappings Daniel is proposing will cause the QoS class from the
extraction header to be mapped (actually they are mapped today, the
mapping is just a 1:1). If the frame is extracted by the CPU, the
classified QoS class will (and shall) be used to set the skb->priority.

As long as the frame is considered to belong to the physical interface,
this is correct.

If this frame then at a later point is being processed by a VLAN
interface, then the mapping will be updated in the context if the VLAN
interface.

>Right, so VLAN QoS maps currently takes precedence, if somebody would choose
>to add a tagged vlan interface on-top of a physical interface with existing
>PCP prioritization - is this a real problem, or just a question of configuration?
It is not like VLAN QoS maps takes precedence. The mapping is just
updated if the frame goes through a (SW) VLAN interface.

>> The problem with the ingress-qos-map and egress-qos-map from 802.1Q that
>> I see is that they allow for per-VID prioritization, which is way more
>> fine grained than what we need. This, plus the fact that bridge VLANs
>> don't have this setting, only termination (8021q) VLANs do. How about an
>> ingress-qos-map and an egress-qos-map per port rather than per VID,
>> potentially even a bridge_slave netlink attribute, offloadable through
>> switchdev? We could make the bridge input fast path alter skb->priority
>> for the VLAN-tagged code paths, and this could give us superior
>> semantics compared to putting this non-standardized knob in the hardware
>> only dcbnl.
>
>This is a valid alternative solution to dcbnl, although this seems to be a
>much more complex solution, to something that is easily solved in dcbnl,
>and actually is in-line with what dcbnl already does. On-top of this, the
>notion of 'trust' has also been raised by this topic. It makes a lot of sense
>to me to add APP selector trust and trust order to dcbnl.
I agree that this mapping could be done by allowing to set the
ingress-qos-map/egress-qos-map maps on physical interfaces (and to
handle the trust aspect, we would have to introduce a anoter flag which
does not make sense on VLAN interfaces, but only on physical
interfaces).

Still I think it is better to do it in the dcb-app (and skip the
sw-fallback) for the following reasons:
- This mapping is important to supprot meaning use-cases with PFC
   (Priority Flow Control) Buffer handling and in furture maybe
   FramePreemption. These are all features which cannot be done in a
   meaningfull way in SW. Therefore the SW-path will not bennefit from
   this (but rahter will be be slowed down by supporting more features).
- This is closely related to others features found DCB, and as a user I
   just think it make sense to collect related features in the same
   tool and man-page.

>This is the solution that I have been implementing lately, and is ready
>for posting very soon.
Given that this is mostly implemented in accordance with the initial
discussion with Petr, I think we should see it and evaluate it. If we
think it is better to move the ingress-qos-map/egress-qos-map then we
need to try out that.

/Allan

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Basic PCP/DEI-based queue classification
  2022-09-07 17:26                       ` Vladimir Oltean
  2022-09-07 19:57                         ` Daniel.Machon
@ 2022-09-08  8:27                         ` Petr Machata
  1 sibling, 0 replies; 22+ messages in thread
From: Petr Machata @ 2022-09-08  8:27 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: Daniel.Machon, Allan.Nielsen, kuba, petrm, netdev,
	vinicius.gomes, thomas.petazzoni, maxime.chevallier, roopa


Vladimir Oltean <vladimir.oltean@nxp.com> writes:

> The problem with the ingress-qos-map and egress-qos-map from 802.1Q that
> I see is that they allow for per-VID prioritization, which is way more
> fine grained than what we need. This, plus the fact that bridge VLANs
> don't have this setting, only termination (8021q) VLANs do.
>
> How about an ingress-qos-map and an egress-qos-map per port rather
> than per VID, potentially even a bridge_slave netlink attribute,
> offloadable through switchdev? We could make the bridge input fast
> path alter skb->priority for the VLAN-tagged code paths, and this
> could give us superior semantics compared to putting this
> non-standardized knob in the hardware only dcbnl.

Per-netdevice qos map is exactly what we are looking for. I think it
wasn't even considered because the layering is obviously wrong. Stuff
like this really belongs to TC.

Having them on bridge_slave would IMHO not solve much, besides the
layering violations :) If you use anything besides vlan_filtering
bridges, you have X places to configure compatibly. Which is not great
for either offload or configuration.

Given there's one piece of HW to actually do the prioritization, it
seems obvious to aim at having a single source of the mapping in Linux.
DCB or TC both fit the bill here.

Maybe we need to figure out how to tweak TC to make this stuff easier to
configure and offload...

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Basic PCP/DEI-based queue classification
  2022-09-07 19:57                         ` Daniel.Machon
  2022-09-08  8:03                           ` Allan Nielsen - M31684
@ 2022-09-08 11:18                           ` Petr Machata
  2022-09-08 12:01                             ` Daniel.Machon
  2022-09-09 12:11                           ` Vladimir Oltean
  2 siblings, 1 reply; 22+ messages in thread
From: Petr Machata @ 2022-09-08 11:18 UTC (permalink / raw)
  To: Daniel.Machon
  Cc: vladimir.oltean, Allan.Nielsen, kuba, petrm, netdev,
	vinicius.gomes, thomas.petazzoni, maxime.chevallier, roopa


<Daniel.Machon@microchip.com> writes:

>> On Wed, Sep 07, 2022 at 10:41:10AM +0000, Daniel.Machon@microchip.com wrote:
>> > > Regarding the topic at hand, and the apparent lack of PCP-based
>> > > prioritization in the software data path. VLAN devices have an
>> > > ingress-qos-map and an egress-qos-map. How would prioritization done via
>> > > dcbnl interact with those (who would take precedence)?
>> >
>> > Hi Vladimir,
>> >
>> > They shouldn't interact (at least this is my understanding).
>> >
>> > The ingress and egress maps are for vlan interfaces, and dcb operates
>> > on physical interfaces (dcbx too). You cannot use dcbnl to do
>> > prioritization for vlan interfaces.
>> >
>> > Anyway, I think much of the stuff in DCB is for hw offload only, incl. the
>> > topic at hand. Is the APP table even consulted by the sw stack at all - I
>> > dont think so (apart from drivers).
>> 
>> Not directly, but at least ocelot (or in fact felix) does set
>> skb->priority based on the QoS class from the Extraction Frame Header.
>> So the stack does end up consulting and meaningfully using something
>> that was set based on the dcbnl APP table.
>> 
>> In this sense, for ocelot, there is a real overlap between skb->priority
>> being initially set based on ocelot_xfh_get_qos_class(), and later being
>> overwritten based on the QoS maps of a VLAN interface.
>
> Right, so VLAN QoS maps currently takes precedence, if somebody would choose
> to add a tagged vlan interface on-top of a physical interface with existing
> PCP prioritization - is this a real problem, or just a question of configuration?

We have the same deal going on with TC BTW. Whatever prioritization is
transferred from HW to SW is lost when the packet is passed through the
SW TC rules.

And at least on Spectrum, we don't even have a way to transfer the
priority to SW, as far as I know. So in this sense it's exactly like
DCB, where the in-HW prioritization is completely separate from the SW
one. Except that all DCB APP rules are "skip_sw" and can't not be.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Basic PCP/DEI-based queue classification
  2022-09-08 11:18                           ` Petr Machata
@ 2022-09-08 12:01                             ` Daniel.Machon
  0 siblings, 0 replies; 22+ messages in thread
From: Daniel.Machon @ 2022-09-08 12:01 UTC (permalink / raw)
  To: petrm
  Cc: vladimir.oltean, Allan.Nielsen, kuba, netdev, vinicius.gomes,
	thomas.petazzoni, maxime.chevallier, roopa

FYI:

RFC patch for the kernel dcbnl side has been posted at:

https://lore.kernel.org/netdev/20220908120442.3069771-1-daniel.machon@microchip.com/T/#t

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Basic PCP/DEI-based queue classification
  2022-09-07 19:57                         ` Daniel.Machon
  2022-09-08  8:03                           ` Allan Nielsen - M31684
  2022-09-08 11:18                           ` Petr Machata
@ 2022-09-09 12:11                           ` Vladimir Oltean
  2 siblings, 0 replies; 22+ messages in thread
From: Vladimir Oltean @ 2022-09-09 12:11 UTC (permalink / raw)
  To: Daniel.Machon
  Cc: Allan.Nielsen, kuba, petrm, netdev, vinicius.gomes,
	thomas.petazzoni, maxime.chevallier, roopa

On Wed, Sep 07, 2022 at 07:57:08PM +0000, Daniel.Machon@microchip.com wrote:
> Right, so VLAN QoS maps currently takes precedence, if somebody would choose
> to add a tagged vlan interface on-top of a physical interface with existing
> PCP prioritization - is this a real problem, or just a question of configuration?

No, I don't think it's a real problem. I think it can be reasonably seen
as a per-VLAN override of the port-based default.

> > The problem with the ingress-qos-map and egress-qos-map from 802.1Q that
> > I see is that they allow for per-VID prioritization, which is way more
> > fine grained than what we need. This, plus the fact that bridge VLANs
> > don't have this setting, only termination (8021q) VLANs do. How about an
> > ingress-qos-map and an egress-qos-map per port rather than per VID,
> > potentially even a bridge_slave netlink attribute, offloadable through
> > switchdev? We could make the bridge input fast path alter skb->priority
> > for the VLAN-tagged code paths, and this could give us superior
> > semantics compared to putting this non-standardized knob in the hardware
> > only dcbnl.
> 
> This is a valid alternative solution to dcbnl, although this seems to be a 
> much more complex solution, to something that is easily solved in dcbnl,
> and actually is in-line with what dcbnl already does. On-top of this, the
> notion of 'trust' has also been raised by this topic. It makes a lot of sense
> to me to add APP selector trust and trust order to dcbnl. This is the solution 
> that I have been implementing lately, and is ready for posting very soon.

Ok, I'll take a look at what your implementation proposes.

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2022-09-09 12:15 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-08-19  9:09 Basic PCP/DEI-based queue classification Daniel.Machon
2022-08-19 10:50 ` Petr Machata
2022-08-21 20:58   ` Daniel.Machon
2022-08-22 10:34     ` Petr Machata
2022-08-24  7:39       ` Daniel.Machon
2022-08-24  9:45         ` Petr Machata
2022-08-24 17:55           ` Daniel.Machon
2022-08-24 19:36             ` Petr Machata
2022-08-25  0:54               ` Jakub Kicinski
2022-08-26 18:11                 ` Petr Machata
2022-08-29  7:53                 ` Allan W. Nielsen
2022-09-02 13:32                   ` Vladimir Oltean
2022-09-07 10:41                     ` Daniel.Machon
2022-09-07 17:26                       ` Vladimir Oltean
2022-09-07 19:57                         ` Daniel.Machon
2022-09-08  8:03                           ` Allan Nielsen - M31684
2022-09-08 11:18                           ` Petr Machata
2022-09-08 12:01                             ` Daniel.Machon
2022-09-09 12:11                           ` Vladimir Oltean
2022-09-08  8:27                         ` Petr Machata
2022-08-25 11:31               ` Daniel.Machon
2022-08-25 13:30                 ` Petr Machata

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.