* [RFC PATCH net-next 00/11] ENETC mqprio/taprio cleanup
From: Vladimir Oltean @ 2023-01-20 14:15 UTC (permalink / raw)
  To: netdev, John Fastabend
  Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Claudiu Manoil, Camelia Groza, Xiaoliang Yang, Gerhard Engleder,
	Vinicius Costa Gomes, Alexander Duyck, Kurt Kanzenbach,
	Ferenc Fejes, Tony Nguyen, Jesse Brandeburg, Jacob Keller

I realize that this patch set will start a flame war, but there are
things about the mqprio qdisc that I simply don't understand, so in an
attempt to explain how I think things should be done, I've made some
patches to the code. I hope the reviewers will be patient enough with me :)

I need to touch mqprio because I'm preparing a patch set for Frame
Preemption (an IEEE 802.1Q feature). A disagreement started with
Vinicius here:
https://patchwork.kernel.org/project/netdevbpf/patch/20220816222920.1952936-3-vladimir.oltean@nxp.com/#24976672

regarding how TX packet prioritization should be handled. Vinicius said
that for some Intel NICs, prioritization at the egress scheduler stage
is fundamentally attached to TX queues rather than traffic classes.

In other words, in the "popular" mqprio configuration documented by him:

$ tc qdisc replace dev $IFACE parent root handle 100 mqprio \
      num_tc 3 \
      map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 \
      queues 1@0 1@1 2@2 \
      hw 0

there are 3 Linux traffic classes and 4 TX queues. The TX queues are
organized in strict priority fashion, like this: TXQ 0 has highest prio
(hardware dequeue precedence for TX scheduler), TXQ 3 has lowest prio.
Packets classified by Linux to TC 2 are hashed between TXQ 2 and TXQ 3,
but the hardware gives TXQ 2 higher precedence than TXQ 3, and Linux
doesn't know that.
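
For context, here is roughly how the stack spreads traffic within a TC
today. This is a simplified sketch along the lines of skb_tx_hash() from
net/core/dev.c (not the verbatim kernel code, and it ignores XPS): once
skb->priority is mapped to a TC, the TXQ within that TC's range is
picked by flow hash, with no notion of any per-TXQ hardware precedence.

/* Simplified sketch of skb_tx_hash()-style TXQ selection. The flow hash
 * spreads packets evenly across the TC's [offset, offset + count) TXQ
 * range, so TXQ 2 and TXQ 3 above look interchangeable to Linux.
 */
static u16 sketch_tx_hash(const struct net_device *dev, struct sk_buff *skb)
{
	u16 qoffset = 0, qcount = dev->real_num_tx_queues;

	if (dev->num_tc) {
		u8 tc = netdev_get_prio_tc_map(dev, skb->priority);

		qoffset = dev->tc_to_txq[tc].offset;
		qcount = dev->tc_to_txq[tc].count;
	}

	return (u16)reciprocal_scale(skb_get_hash(skb), qcount) + qoffset;
}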

I am surprised by this fact, and this isn't how ENETC works at all.
For ENETC, we try to prioritize on TCs rather than TXQs, and TC 7 has
higher priority than TC 0. For us, groups of TXQs that map to the same
TC have the same egress scheduling priority. It is possible (and maybe
useful) to have 2 TXQs per TC (one TXQ per CPU). Patch 07/11 tries to
make that clearer.

Furthermore (and this is really the biggest point of contention),
Vinicius and I have a fundamental disagreement about whether the 802.1Qbv
(taprio) gate mask should be passed to the device driver per TXQ or per
TC. This is what patch 11/11 is about.

Again, I'm not *certain* that my opinion on this topic is correct
(and it sure is confusing to see such a different approach for Intel).
But I would appreciate any feedback.

Vladimir Oltean (11):
  net/sched: mqprio: refactor nlattr parsing to a separate function
  net/sched: mqprio: refactor offloading and unoffloading to dedicated
    functions
  net/sched: move struct tc_mqprio_qopt_offload from pkt_cls.h to
    pkt_sched.h
  net/sched: mqprio: allow offloading drivers to request queue count
    validation
  net/sched: mqprio: add extack messages for queue count validation
  net: enetc: request mqprio to validate the queue counts
  net: enetc: act upon the requested mqprio queue configuration
  net/sched: taprio: pass mqprio queue configuration to ndo_setup_tc()
  net: enetc: act upon mqprio queue config in taprio offload
  net/sched: taprio: validate that gate mask does not exceed number of
    TCs
  net/sched: taprio: only calculate gate mask per TXQ for igc

 drivers/net/ethernet/freescale/enetc/enetc.c  |  67 ++--
 .../net/ethernet/freescale/enetc/enetc_qos.c  |  27 +-
 drivers/net/ethernet/intel/igc/igc_main.c     |  17 +
 include/net/pkt_cls.h                         |  10 -
 include/net/pkt_sched.h                       |  16 +
 net/sched/sch_mqprio.c                        | 298 +++++++++++-------
 net/sched/sch_taprio.c                        |  57 ++--
 7 files changed, 310 insertions(+), 182 deletions(-)

-- 
2.34.1



* [RFC PATCH net-next 01/11] net/sched: mqprio: refactor nlattr parsing to a separate function
From: Vladimir Oltean @ 2023-01-20 14:15 UTC (permalink / raw)
  To: netdev, John Fastabend
  Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Claudiu Manoil, Camelia Groza, Xiaoliang Yang, Gerhard Engleder,
	Vinicius Costa Gomes, Alexander Duyck, Kurt Kanzenbach,
	Ferenc Fejes, Tony Nguyen, Jesse Brandeburg, Jacob Keller

mqprio_init() is quite large and unwieldy to add more code to.
Split the netlink attribute parsing into a dedicated function.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
 net/sched/sch_mqprio.c | 114 +++++++++++++++++++++++------------------
 1 file changed, 63 insertions(+), 51 deletions(-)

diff --git a/net/sched/sch_mqprio.c b/net/sched/sch_mqprio.c
index 4c68abaa289b..d2d8a02ded05 100644
--- a/net/sched/sch_mqprio.c
+++ b/net/sched/sch_mqprio.c
@@ -130,6 +130,67 @@ static int parse_attr(struct nlattr *tb[], int maxtype, struct nlattr *nla,
 	return 0;
 }
 
+static int mqprio_parse_nlattr(struct Qdisc *sch, struct tc_mqprio_qopt *qopt,
+			       struct nlattr *opt)
+{
+	struct mqprio_sched *priv = qdisc_priv(sch);
+	struct nlattr *tb[TCA_MQPRIO_MAX + 1];
+	struct nlattr *attr;
+	int i, rem, err;
+
+	err = parse_attr(tb, TCA_MQPRIO_MAX, opt, mqprio_policy,
+			 sizeof(*qopt));
+	if (err < 0)
+		return err;
+
+	if (!qopt->hw)
+		return -EINVAL;
+
+	if (tb[TCA_MQPRIO_MODE]) {
+		priv->flags |= TC_MQPRIO_F_MODE;
+		priv->mode = *(u16 *)nla_data(tb[TCA_MQPRIO_MODE]);
+	}
+
+	if (tb[TCA_MQPRIO_SHAPER]) {
+		priv->flags |= TC_MQPRIO_F_SHAPER;
+		priv->shaper = *(u16 *)nla_data(tb[TCA_MQPRIO_SHAPER]);
+	}
+
+	if (tb[TCA_MQPRIO_MIN_RATE64]) {
+		if (priv->shaper != TC_MQPRIO_SHAPER_BW_RATE)
+			return -EINVAL;
+		i = 0;
+		nla_for_each_nested(attr, tb[TCA_MQPRIO_MIN_RATE64],
+				    rem) {
+			if (nla_type(attr) != TCA_MQPRIO_MIN_RATE64)
+				return -EINVAL;
+			if (i >= qopt->num_tc)
+				break;
+			priv->min_rate[i] = *(u64 *)nla_data(attr);
+			i++;
+		}
+		priv->flags |= TC_MQPRIO_F_MIN_RATE;
+	}
+
+	if (tb[TCA_MQPRIO_MAX_RATE64]) {
+		if (priv->shaper != TC_MQPRIO_SHAPER_BW_RATE)
+			return -EINVAL;
+		i = 0;
+		nla_for_each_nested(attr, tb[TCA_MQPRIO_MAX_RATE64],
+				    rem) {
+			if (nla_type(attr) != TCA_MQPRIO_MAX_RATE64)
+				return -EINVAL;
+			if (i >= qopt->num_tc)
+				break;
+			priv->max_rate[i] = *(u64 *)nla_data(attr);
+			i++;
+		}
+		priv->flags |= TC_MQPRIO_F_MAX_RATE;
+	}
+
+	return 0;
+}
+
 static int mqprio_init(struct Qdisc *sch, struct nlattr *opt,
 		       struct netlink_ext_ack *extack)
 {
@@ -139,9 +200,6 @@ static int mqprio_init(struct Qdisc *sch, struct nlattr *opt,
 	struct Qdisc *qdisc;
 	int i, err = -EOPNOTSUPP;
 	struct tc_mqprio_qopt *qopt = NULL;
-	struct nlattr *tb[TCA_MQPRIO_MAX + 1];
-	struct nlattr *attr;
-	int rem;
 	int len;
 
 	BUILD_BUG_ON(TC_MAX_QUEUE != TC_QOPT_MAX_QUEUE);
@@ -166,55 +224,9 @@ static int mqprio_init(struct Qdisc *sch, struct nlattr *opt,
 
 	len = nla_len(opt) - NLA_ALIGN(sizeof(*qopt));
 	if (len > 0) {
-		err = parse_attr(tb, TCA_MQPRIO_MAX, opt, mqprio_policy,
-				 sizeof(*qopt));
-		if (err < 0)
+		err = mqprio_parse_nlattr(sch, qopt, opt);
+		if (err)
 			return err;
-
-		if (!qopt->hw)
-			return -EINVAL;
-
-		if (tb[TCA_MQPRIO_MODE]) {
-			priv->flags |= TC_MQPRIO_F_MODE;
-			priv->mode = *(u16 *)nla_data(tb[TCA_MQPRIO_MODE]);
-		}
-
-		if (tb[TCA_MQPRIO_SHAPER]) {
-			priv->flags |= TC_MQPRIO_F_SHAPER;
-			priv->shaper = *(u16 *)nla_data(tb[TCA_MQPRIO_SHAPER]);
-		}
-
-		if (tb[TCA_MQPRIO_MIN_RATE64]) {
-			if (priv->shaper != TC_MQPRIO_SHAPER_BW_RATE)
-				return -EINVAL;
-			i = 0;
-			nla_for_each_nested(attr, tb[TCA_MQPRIO_MIN_RATE64],
-					    rem) {
-				if (nla_type(attr) != TCA_MQPRIO_MIN_RATE64)
-					return -EINVAL;
-				if (i >= qopt->num_tc)
-					break;
-				priv->min_rate[i] = *(u64 *)nla_data(attr);
-				i++;
-			}
-			priv->flags |= TC_MQPRIO_F_MIN_RATE;
-		}
-
-		if (tb[TCA_MQPRIO_MAX_RATE64]) {
-			if (priv->shaper != TC_MQPRIO_SHAPER_BW_RATE)
-				return -EINVAL;
-			i = 0;
-			nla_for_each_nested(attr, tb[TCA_MQPRIO_MAX_RATE64],
-					    rem) {
-				if (nla_type(attr) != TCA_MQPRIO_MAX_RATE64)
-					return -EINVAL;
-				if (i >= qopt->num_tc)
-					break;
-				priv->max_rate[i] = *(u64 *)nla_data(attr);
-				i++;
-			}
-			priv->flags |= TC_MQPRIO_F_MAX_RATE;
-		}
 	}
 
 	/* pre-allocate qdisc, attachment can't fail */
-- 
2.34.1



* [RFC PATCH net-next 02/11] net/sched: mqprio: refactor offloading and unoffloading to dedicated functions
From: Vladimir Oltean @ 2023-01-20 14:15 UTC (permalink / raw)
  To: netdev, John Fastabend
  Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Claudiu Manoil, Camelia Groza, Xiaoliang Yang, Gerhard Engleder,
	Vinicius Costa Gomes, Alexander Duyck, Kurt Kanzenbach,
	Ferenc Fejes, Tony Nguyen, Jesse Brandeburg, Jacob Keller

Some more logic will be added to mqprio offloading, so split that code
up from mqprio_init(), which is already large, and create a new
function, mqprio_enable_offload(), similar to taprio_enable_offload().
Also create the opposite function mqprio_disable_offload().

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
 net/sched/sch_mqprio.c | 102 ++++++++++++++++++++++++-----------------
 1 file changed, 59 insertions(+), 43 deletions(-)

diff --git a/net/sched/sch_mqprio.c b/net/sched/sch_mqprio.c
index d2d8a02ded05..3579a64da06e 100644
--- a/net/sched/sch_mqprio.c
+++ b/net/sched/sch_mqprio.c
@@ -27,6 +27,61 @@ struct mqprio_sched {
 	u64 max_rate[TC_QOPT_MAX_QUEUE];
 };
 
+static int mqprio_enable_offload(struct Qdisc *sch,
+				 const struct tc_mqprio_qopt *qopt)
+{
+	struct tc_mqprio_qopt_offload mqprio = {.qopt = *qopt};
+	struct mqprio_sched *priv = qdisc_priv(sch);
+	struct net_device *dev = qdisc_dev(sch);
+	int err, i;
+
+	switch (priv->mode) {
+	case TC_MQPRIO_MODE_DCB:
+		if (priv->shaper != TC_MQPRIO_SHAPER_DCB)
+			return -EINVAL;
+		break;
+	case TC_MQPRIO_MODE_CHANNEL:
+		mqprio.flags = priv->flags;
+		if (priv->flags & TC_MQPRIO_F_MODE)
+			mqprio.mode = priv->mode;
+		if (priv->flags & TC_MQPRIO_F_SHAPER)
+			mqprio.shaper = priv->shaper;
+		if (priv->flags & TC_MQPRIO_F_MIN_RATE)
+			for (i = 0; i < mqprio.qopt.num_tc; i++)
+				mqprio.min_rate[i] = priv->min_rate[i];
+		if (priv->flags & TC_MQPRIO_F_MAX_RATE)
+			for (i = 0; i < mqprio.qopt.num_tc; i++)
+				mqprio.max_rate[i] = priv->max_rate[i];
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	err = dev->netdev_ops->ndo_setup_tc(dev, TC_SETUP_QDISC_MQPRIO,
+					    &mqprio);
+	if (err)
+		return err;
+
+	priv->hw_offload = mqprio.qopt.hw;
+
+	return 0;
+}
+
+static void mqprio_disable_offload(struct Qdisc *sch)
+{
+	struct tc_mqprio_qopt_offload mqprio = { { 0 } };
+	struct mqprio_sched *priv = qdisc_priv(sch);
+	struct net_device *dev = qdisc_dev(sch);
+
+	switch (priv->mode) {
+	case TC_MQPRIO_MODE_DCB:
+	case TC_MQPRIO_MODE_CHANNEL:
+		dev->netdev_ops->ndo_setup_tc(dev, TC_SETUP_QDISC_MQPRIO,
+					      &mqprio);
+		break;
+	}
+}
+
 static void mqprio_destroy(struct Qdisc *sch)
 {
 	struct net_device *dev = qdisc_dev(sch);
@@ -41,22 +96,10 @@ static void mqprio_destroy(struct Qdisc *sch)
 		kfree(priv->qdiscs);
 	}
 
-	if (priv->hw_offload && dev->netdev_ops->ndo_setup_tc) {
-		struct tc_mqprio_qopt_offload mqprio = { { 0 } };
-
-		switch (priv->mode) {
-		case TC_MQPRIO_MODE_DCB:
-		case TC_MQPRIO_MODE_CHANNEL:
-			dev->netdev_ops->ndo_setup_tc(dev,
-						      TC_SETUP_QDISC_MQPRIO,
-						      &mqprio);
-			break;
-		default:
-			return;
-		}
-	} else {
+	if (priv->hw_offload && dev->netdev_ops->ndo_setup_tc)
+		mqprio_disable_offload(sch);
+	else
 		netdev_set_num_tc(dev, 0);
-	}
 }
 
 static int mqprio_parse_opt(struct net_device *dev, struct tc_mqprio_qopt *qopt)
@@ -253,36 +296,9 @@ static int mqprio_init(struct Qdisc *sch, struct nlattr *opt,
 	 * supplied and verified mapping
 	 */
 	if (qopt->hw) {
-		struct tc_mqprio_qopt_offload mqprio = {.qopt = *qopt};
-
-		switch (priv->mode) {
-		case TC_MQPRIO_MODE_DCB:
-			if (priv->shaper != TC_MQPRIO_SHAPER_DCB)
-				return -EINVAL;
-			break;
-		case TC_MQPRIO_MODE_CHANNEL:
-			mqprio.flags = priv->flags;
-			if (priv->flags & TC_MQPRIO_F_MODE)
-				mqprio.mode = priv->mode;
-			if (priv->flags & TC_MQPRIO_F_SHAPER)
-				mqprio.shaper = priv->shaper;
-			if (priv->flags & TC_MQPRIO_F_MIN_RATE)
-				for (i = 0; i < mqprio.qopt.num_tc; i++)
-					mqprio.min_rate[i] = priv->min_rate[i];
-			if (priv->flags & TC_MQPRIO_F_MAX_RATE)
-				for (i = 0; i < mqprio.qopt.num_tc; i++)
-					mqprio.max_rate[i] = priv->max_rate[i];
-			break;
-		default:
-			return -EINVAL;
-		}
-		err = dev->netdev_ops->ndo_setup_tc(dev,
-						    TC_SETUP_QDISC_MQPRIO,
-						    &mqprio);
+		err = mqprio_enable_offload(sch, qopt);
 		if (err)
 			return err;
-
-		priv->hw_offload = mqprio.qopt.hw;
 	} else {
 		netdev_set_num_tc(dev, qopt->num_tc);
 		for (i = 0; i < qopt->num_tc; i++)
-- 
2.34.1



* [RFC PATCH net-next 03/11] net/sched: move struct tc_mqprio_qopt_offload from pkt_cls.h to pkt_sched.h
From: Vladimir Oltean @ 2023-01-20 14:15 UTC (permalink / raw)
  To: netdev, John Fastabend
  Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Claudiu Manoil, Camelia Groza, Xiaoliang Yang, Gerhard Engleder,
	Vinicius Costa Gomes, Alexander Duyck, Kurt Kanzenbach,
	Ferenc Fejes, Tony Nguyen, Jesse Brandeburg, Jacob Keller

Since mqprio is a scheduler and not a classifier, move its offload
structure to pkt_sched.h, where struct tc_taprio_qopt_offload also lies.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
 include/net/pkt_cls.h   | 10 ----------
 include/net/pkt_sched.h | 10 ++++++++++
 2 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
index 4cabb32a2ad9..cd410a87517b 100644
--- a/include/net/pkt_cls.h
+++ b/include/net/pkt_cls.h
@@ -788,16 +788,6 @@ struct tc_cls_bpf_offload {
 	bool exts_integrated;
 };
 
-struct tc_mqprio_qopt_offload {
-	/* struct tc_mqprio_qopt must always be the first element */
-	struct tc_mqprio_qopt qopt;
-	u16 mode;
-	u16 shaper;
-	u32 flags;
-	u64 min_rate[TC_QOPT_MAX_QUEUE];
-	u64 max_rate[TC_QOPT_MAX_QUEUE];
-};
-
 /* This structure holds cookie structure that is passed from user
  * to the kernel for actions and classifiers
  */
diff --git a/include/net/pkt_sched.h b/include/net/pkt_sched.h
index 38207873eda6..6c5e64e0a0bb 100644
--- a/include/net/pkt_sched.h
+++ b/include/net/pkt_sched.h
@@ -160,6 +160,16 @@ struct tc_etf_qopt_offload {
 	s32 queue;
 };
 
+struct tc_mqprio_qopt_offload {
+	/* struct tc_mqprio_qopt must always be the first element */
+	struct tc_mqprio_qopt qopt;
+	u16 mode;
+	u16 shaper;
+	u32 flags;
+	u64 min_rate[TC_QOPT_MAX_QUEUE];
+	u64 max_rate[TC_QOPT_MAX_QUEUE];
+};
+
 struct tc_taprio_caps {
 	bool supports_queue_max_sdu:1;
 };
-- 
2.34.1



* [RFC PATCH net-next 04/11] net/sched: mqprio: allow offloading drivers to request queue count validation
From: Vladimir Oltean @ 2023-01-20 14:15 UTC (permalink / raw)
  To: netdev, John Fastabend
  Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Claudiu Manoil, Camelia Groza, Xiaoliang Yang, Gerhard Engleder,
	Vinicius Costa Gomes, Alexander Duyck, Kurt Kanzenbach,
	Ferenc Fejes, Tony Nguyen, Jesse Brandeburg, Jacob Keller

mqprio_parse_opt() proudly has a comment:

	/* If hardware offload is requested we will leave it to the device
	 * to either populate the queue counts itself or to validate the
	 * provided queue counts.
	 */

Unfortunately some device drivers did not get this memo, and don't
validate the queue counts.

Introduce a tc capability, and make mqprio query it.
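
As a usage sketch, this is how a driver would opt in from its
ndo_setup_tc() handler (hypothetical "foo" driver; the real enetc hookup
follows in patch 06/11). Drivers that never handle TC_QUERY_CAPS receive
a zeroed caps structure from qdisc_offload_query_caps(), so their
behavior is unchanged:

/* Hypothetical driver sketch: opt in to mqprio queue count validation. */
static int foo_setup_tc(struct net_device *dev, enum tc_setup_type type,
			void *type_data)
{
	switch (type) {
	case TC_QUERY_CAPS: {
		struct tc_query_caps_base *base = type_data;

		if (base->type == TC_SETUP_QDISC_MQPRIO) {
			struct tc_mqprio_caps *caps = base->caps;

			caps->validate_queue_counts = true;
			return 0;
		}
		return -EOPNOTSUPP;
	}
	default:
		return -EOPNOTSUPP;
	}
}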

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
 include/net/pkt_sched.h |  4 +++
 net/sched/sch_mqprio.c  | 58 +++++++++++++++++++++++++++--------------
 2 files changed, 42 insertions(+), 20 deletions(-)

diff --git a/include/net/pkt_sched.h b/include/net/pkt_sched.h
index 6c5e64e0a0bb..02e3ccfbc7d1 100644
--- a/include/net/pkt_sched.h
+++ b/include/net/pkt_sched.h
@@ -160,6 +160,10 @@ struct tc_etf_qopt_offload {
 	s32 queue;
 };
 
+struct tc_mqprio_caps {
+	bool validate_queue_counts:1;
+};
+
 struct tc_mqprio_qopt_offload {
 	/* struct tc_mqprio_qopt must always be the first element */
 	struct tc_mqprio_qopt qopt;
diff --git a/net/sched/sch_mqprio.c b/net/sched/sch_mqprio.c
index 3579a64da06e..5fdceab82ea1 100644
--- a/net/sched/sch_mqprio.c
+++ b/net/sched/sch_mqprio.c
@@ -27,14 +27,50 @@ struct mqprio_sched {
 	u64 max_rate[TC_QOPT_MAX_QUEUE];
 };
 
+static int mqprio_validate_queue_counts(struct net_device *dev,
+					const struct tc_mqprio_qopt *qopt)
+{
+	int i, j;
+
+	for (i = 0; i < qopt->num_tc; i++) {
+		unsigned int last = qopt->offset[i] + qopt->count[i];
+
+		/* Verify the queue count is in tx range being equal to the
+		 * real_num_tx_queues indicates the last queue is in use.
+		 */
+		if (qopt->offset[i] >= dev->real_num_tx_queues ||
+		    !qopt->count[i] ||
+		    last > dev->real_num_tx_queues)
+			return -EINVAL;
+
+		/* Verify that the offset and counts do not overlap */
+		for (j = i + 1; j < qopt->num_tc; j++) {
+			if (last > qopt->offset[j])
+				return -EINVAL;
+		}
+	}
+
+	return 0;
+}
+
 static int mqprio_enable_offload(struct Qdisc *sch,
 				 const struct tc_mqprio_qopt *qopt)
 {
 	struct tc_mqprio_qopt_offload mqprio = {.qopt = *qopt};
 	struct mqprio_sched *priv = qdisc_priv(sch);
 	struct net_device *dev = qdisc_dev(sch);
+	struct tc_mqprio_caps caps;
 	int err, i;
 
+	qdisc_offload_query_caps(dev, TC_SETUP_QDISC_MQPRIO,
+				 &caps, sizeof(caps));
+
+	if (caps.validate_queue_counts) {
+		err = mqprio_validate_queue_counts(dev, qopt);
+		if (err)
+			return err;
+	}
+
 	switch (priv->mode) {
 	case TC_MQPRIO_MODE_DCB:
 		if (priv->shaper != TC_MQPRIO_SHAPER_DCB)
@@ -104,7 +140,7 @@ static void mqprio_destroy(struct Qdisc *sch)
 
 static int mqprio_parse_opt(struct net_device *dev, struct tc_mqprio_qopt *qopt)
 {
-	int i, j;
+	int i;
 
 	/* Verify num_tc is not out of max range */
 	if (qopt->num_tc > TC_MAX_QUEUE)
@@ -131,25 +167,7 @@ static int mqprio_parse_opt(struct net_device *dev, struct tc_mqprio_qopt *qopt)
 	if (qopt->hw)
 		return dev->netdev_ops->ndo_setup_tc ? 0 : -EINVAL;
 
-	for (i = 0; i < qopt->num_tc; i++) {
-		unsigned int last = qopt->offset[i] + qopt->count[i];
-
-		/* Verify the queue count is in tx range being equal to the
-		 * real_num_tx_queues indicates the last queue is in use.
-		 */
-		if (qopt->offset[i] >= dev->real_num_tx_queues ||
-		    !qopt->count[i] ||
-		    last > dev->real_num_tx_queues)
-			return -EINVAL;
-
-		/* Verify that the offset and counts do not overlap */
-		for (j = i + 1; j < qopt->num_tc; j++) {
-			if (last > qopt->offset[j])
-				return -EINVAL;
-		}
-	}
-
-	return 0;
+	return mqprio_validate_queue_counts(dev, qopt);
 }
 
 static const struct nla_policy mqprio_policy[TCA_MQPRIO_MAX + 1] = {
-- 
2.34.1



* [RFC PATCH net-next 05/11] net/sched: mqprio: add extack messages for queue count validation
From: Vladimir Oltean @ 2023-01-20 14:15 UTC (permalink / raw)
  To: netdev, John Fastabend
  Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Claudiu Manoil, Camelia Groza, Xiaoliang Yang, Gerhard Engleder,
	Vinicius Costa Gomes, Alexander Duyck, Kurt Kanzenbach,
	Ferenc Fejes, Tony Nguyen, Jesse Brandeburg, Jacob Keller

To make mqprio more user-friendly, create netlink extended ack messages
which say exactly what is wrong with the queue counts. This uses the
new support for printf-formatted extack messages.

Example:

$ tc qdisc add dev eno0 root handle 1: mqprio num_tc 8 \
	map 0 1 2 3 4 5 6 7 queues 3@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 hw 1
Error: sch_mqprio: Queues 1:1 for TC 1 overlap with last TX queue 3 for TC 0.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
 net/sched/sch_mqprio.c | 40 ++++++++++++++++++++++++++++++----------
 1 file changed, 30 insertions(+), 10 deletions(-)

diff --git a/net/sched/sch_mqprio.c b/net/sched/sch_mqprio.c
index 5fdceab82ea1..4cd6d47cc7a1 100644
--- a/net/sched/sch_mqprio.c
+++ b/net/sched/sch_mqprio.c
@@ -28,25 +28,42 @@ struct mqprio_sched {
 };
 
 static int mqprio_validate_queue_counts(struct net_device *dev,
-					const struct tc_mqprio_qopt *qopt)
+					const struct tc_mqprio_qopt *qopt,
+					struct netlink_ext_ack *extack)
 {
 	int i, j;
 
 	for (i = 0; i < qopt->num_tc; i++) {
 		unsigned int last = qopt->offset[i] + qopt->count[i];
 
+		if (!qopt->count[i]) {
+			NL_SET_ERR_MSG_FMT_MOD(extack, "No queues for TC %d",
+					       i);
+			return -EINVAL;
+		}
+
 		/* Verify the queue count is in tx range being equal to the
 		 * real_num_tx_queues indicates the last queue is in use.
 		 */
 		if (qopt->offset[i] >= dev->real_num_tx_queues ||
-		    !qopt->count[i] ||
-		    last > dev->real_num_tx_queues)
+		    last > dev->real_num_tx_queues) {
+			NL_SET_ERR_MSG_FMT_MOD(extack,
+					       "Queues %d:%d for TC %d exceed the %d TX queues available",
+					       qopt->count[i], qopt->offset[i],
+					       i, dev->real_num_tx_queues);
 			return -EINVAL;
+		}
 
 		/* Verify that the offset and counts do not overlap */
 		for (j = i + 1; j < qopt->num_tc; j++) {
-			if (last > qopt->offset[j])
+			if (last > qopt->offset[j]) {
+				NL_SET_ERR_MSG_FMT_MOD(extack,
+						       "Queues %d:%d for TC %d overlap with last TX queue %d for TC %d",
+						       qopt->count[j],
+						       qopt->offset[j],
+						       j, last, i);
 				return -EINVAL;
+			}
 		}
 	}
 
@@ -54,7 +71,8 @@ static int mqprio_validate_queue_counts(struct net_device *dev,
 }
 
 static int mqprio_enable_offload(struct Qdisc *sch,
-				 const struct tc_mqprio_qopt *qopt)
+				 const struct tc_mqprio_qopt *qopt,
+				 struct netlink_ext_ack *extack)
 {
 	struct tc_mqprio_qopt_offload mqprio = {.qopt = *qopt};
 	struct mqprio_sched *priv = qdisc_priv(sch);
@@ -66,7 +84,7 @@ static int mqprio_enable_offload(struct Qdisc *sch,
 				 &caps, sizeof(caps));
 
 	if (caps.validate_queue_counts) {
-		err = mqprio_validate_queue_counts(dev, qopt);
+		err = mqprio_validate_queue_counts(dev, qopt, extack);
 		if (err)
 			return err;
 	}
@@ -138,7 +156,9 @@ static void mqprio_destroy(struct Qdisc *sch)
 		netdev_set_num_tc(dev, 0);
 }
 
-static int mqprio_parse_opt(struct net_device *dev, struct tc_mqprio_qopt *qopt)
+static int mqprio_parse_opt(struct net_device *dev,
+			    struct tc_mqprio_qopt *qopt,
+			    struct netlink_ext_ack *extack)
 {
 	int i;
 
@@ -167,7 +187,7 @@ static int mqprio_parse_opt(struct net_device *dev, struct tc_mqprio_qopt *qopt)
 	if (qopt->hw)
 		return dev->netdev_ops->ndo_setup_tc ? 0 : -EINVAL;
 
-	return mqprio_validate_queue_counts(dev, qopt);
+	return mqprio_validate_queue_counts(dev, qopt, extack);
 }
 
 static const struct nla_policy mqprio_policy[TCA_MQPRIO_MAX + 1] = {
@@ -280,7 +300,7 @@ static int mqprio_init(struct Qdisc *sch, struct nlattr *opt,
 		return -EINVAL;
 
 	qopt = nla_data(opt);
-	if (mqprio_parse_opt(dev, qopt))
+	if (mqprio_parse_opt(dev, qopt, extack))
 		return -EINVAL;
 
 	len = nla_len(opt) - NLA_ALIGN(sizeof(*qopt));
@@ -314,7 +334,7 @@ static int mqprio_init(struct Qdisc *sch, struct nlattr *opt,
 	 * supplied and verified mapping
 	 */
 	if (qopt->hw) {
-		err = mqprio_enable_offload(sch, qopt);
+		err = mqprio_enable_offload(sch, qopt, extack);
 		if (err)
 			return err;
 	} else {
-- 
2.34.1



* [RFC PATCH net-next 06/11] net: enetc: request mqprio to validate the queue counts
From: Vladimir Oltean @ 2023-01-20 14:15 UTC (permalink / raw)
  To: netdev, John Fastabend
  Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Claudiu Manoil, Camelia Groza, Xiaoliang Yang, Gerhard Engleder,
	Vinicius Costa Gomes, Alexander Duyck, Kurt Kanzenbach,
	Ferenc Fejes, Tony Nguyen, Jesse Brandeburg, Jacob Keller

The enetc driver does not validate the mqprio queue configuration, so it
currently allows things like this:

$ tc qdisc add dev swp0 root handle 1: mqprio num_tc 8 \
	map 0 1 2 3 4 5 6 7 queues 3@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 hw 1

By requesting validation via the mqprio capability structure, this is no
longer allowed, and needs no custom code in the driver.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
 drivers/net/ethernet/freescale/enetc/enetc_qos.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/drivers/net/ethernet/freescale/enetc/enetc_qos.c b/drivers/net/ethernet/freescale/enetc/enetc_qos.c
index fcebb54224c0..6e0b4dd91509 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc_qos.c
+++ b/drivers/net/ethernet/freescale/enetc/enetc_qos.c
@@ -1611,6 +1611,13 @@ int enetc_qos_query_caps(struct net_device *ndev, void *type_data)
 	struct enetc_si *si = priv->si;
 
 	switch (base->type) {
+	case TC_SETUP_QDISC_MQPRIO: {
+		struct tc_mqprio_caps *caps = base->caps;
+
+		caps->validate_queue_counts = true;
+
+		return 0;
+	}
 	case TC_SETUP_QDISC_TAPRIO: {
 		struct tc_taprio_caps *caps = base->caps;
 
-- 
2.34.1



* [RFC PATCH net-next 07/11] net: enetc: act upon the requested mqprio queue configuration
From: Vladimir Oltean @ 2023-01-20 14:15 UTC (permalink / raw)
  To: netdev, John Fastabend
  Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Claudiu Manoil, Camelia Groza, Xiaoliang Yang, Gerhard Engleder,
	Vinicius Costa Gomes, Alexander Duyck, Kurt Kanzenbach,
	Ferenc Fejes, Tony Nguyen, Jesse Brandeburg, Jacob Keller

Regardless of the requested queue count per traffic class, the enetc
driver allocates a number of TX rings equal to the number of TCs, and
hardcodes a queue configuration of "1@0 1@1 ... 1@max-tc". Other
configurations are silently ignored and treated the same.

Improve that by allowing what the user requests to be actually
fulfilled. This allows more than one TX ring per traffic class.
For example:

$ tc qdisc add dev eno0 root handle 1: mqprio num_tc 4 \
	map 0 0 1 1 2 2 3 3 queues 2@0 2@2 2@4 2@6
[  146.267648] fsl_enetc 0000:00:00.0 eno0: TX ring 0 prio 0
[  146.273451] fsl_enetc 0000:00:00.0 eno0: TX ring 1 prio 0
[  146.283280] fsl_enetc 0000:00:00.0 eno0: TX ring 2 prio 1
[  146.293987] fsl_enetc 0000:00:00.0 eno0: TX ring 3 prio 1
[  146.300467] fsl_enetc 0000:00:00.0 eno0: TX ring 4 prio 2
[  146.306866] fsl_enetc 0000:00:00.0 eno0: TX ring 5 prio 2
[  146.313261] fsl_enetc 0000:00:00.0 eno0: TX ring 6 prio 3
[  146.319622] fsl_enetc 0000:00:00.0 eno0: TX ring 7 prio 3
$ tc qdisc del dev eno0 root
[  178.238418] fsl_enetc 0000:00:00.0 eno0: TX ring 0 prio 0
[  178.244369] fsl_enetc 0000:00:00.0 eno0: TX ring 1 prio 0
[  178.251486] fsl_enetc 0000:00:00.0 eno0: TX ring 2 prio 0
[  178.258006] fsl_enetc 0000:00:00.0 eno0: TX ring 3 prio 0
[  178.265038] fsl_enetc 0000:00:00.0 eno0: TX ring 4 prio 0
[  178.271557] fsl_enetc 0000:00:00.0 eno0: TX ring 5 prio 0
[  178.277910] fsl_enetc 0000:00:00.0 eno0: TX ring 6 prio 0
[  178.284281] fsl_enetc 0000:00:00.0 eno0: TX ring 7 prio 0
$ tc qdisc add dev eno0 root handle 1: mqprio num_tc 8 \
	map 0 1 2 3 4 5 6 7 queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 hw 1
[  186.113162] fsl_enetc 0000:00:00.0 eno0: TX ring 0 prio 0
[  186.118764] fsl_enetc 0000:00:00.0 eno0: TX ring 1 prio 1
[  186.124374] fsl_enetc 0000:00:00.0 eno0: TX ring 2 prio 2
[  186.130765] fsl_enetc 0000:00:00.0 eno0: TX ring 3 prio 3
[  186.136404] fsl_enetc 0000:00:00.0 eno0: TX ring 4 prio 4
[  186.142049] fsl_enetc 0000:00:00.0 eno0: TX ring 5 prio 5
[  186.147674] fsl_enetc 0000:00:00.0 eno0: TX ring 6 prio 6
[  186.153305] fsl_enetc 0000:00:00.0 eno0: TX ring 7 prio 7

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
 drivers/net/ethernet/freescale/enetc/enetc.c | 67 +++++++++++++-------
 1 file changed, 45 insertions(+), 22 deletions(-)

diff --git a/drivers/net/ethernet/freescale/enetc/enetc.c b/drivers/net/ethernet/freescale/enetc/enetc.c
index ef21d6baed24..062a0ba6c959 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc.c
+++ b/drivers/net/ethernet/freescale/enetc/enetc.c
@@ -2614,6 +2614,15 @@ static int enetc_reconfigure(struct enetc_ndev_priv *priv, bool extended,
 	return err;
 }
 
+static void enetc_debug_tx_ring_prios(struct enetc_ndev_priv *priv)
+{
+	int i;
+
+	for (i = 0; i < priv->num_tx_rings; i++)
+		netdev_dbg(priv->ndev, "TX ring %d prio %d\n", i,
+			   priv->tx_ring[i]->prio);
+}
+
 int enetc_setup_tc_mqprio(struct net_device *ndev, void *type_data)
 {
 	struct enetc_ndev_priv *priv = netdev_priv(ndev);
@@ -2621,8 +2630,10 @@ int enetc_setup_tc_mqprio(struct net_device *ndev, void *type_data)
 	struct enetc_hw *hw = &priv->si->hw;
 	struct enetc_bdr *tx_ring;
 	int num_stack_tx_queues;
+	int offset, count;
+	int tc, i, q;
 	u8 num_tc;
-	int i;
+	int err;
 
 	num_stack_tx_queues = enetc_num_stack_tx_queues(priv);
 	mqprio->hw = TC_MQPRIO_HW_OFFLOAD_TCS;
@@ -2639,34 +2650,46 @@ int enetc_setup_tc_mqprio(struct net_device *ndev, void *type_data)
 			enetc_set_bdr_prio(hw, tx_ring->index, tx_ring->prio);
 		}
 
+		enetc_debug_tx_ring_prios(priv);
+
 		return 0;
 	}
 
-	/* Check if we have enough BD rings available to accommodate all TCs */
-	if (num_tc > num_stack_tx_queues) {
-		netdev_err(ndev, "Max %d traffic classes supported\n",
-			   priv->num_tx_rings);
-		return -EINVAL;
-	}
+	err = netdev_set_num_tc(ndev, num_tc);
+	if (err)
+		return err;
 
-	/* For the moment, we use only one BD ring per TC.
-	 *
-	 * Configure num_tc BD rings with increasing priorities.
-	 */
-	for (i = 0; i < num_tc; i++) {
-		tx_ring = priv->tx_ring[i];
-		tx_ring->prio = i;
-		enetc_set_bdr_prio(hw, tx_ring->index, tx_ring->prio);
-	}
+	num_stack_tx_queues = 0;
+
+	for (tc = 0; tc < num_tc; tc++) {
+		offset = mqprio->offset[tc];
+		count = mqprio->count[tc];
 
-	/* Reset the number of netdev queues based on the TC count */
-	netif_set_real_num_tx_queues(ndev, num_tc);
+		err = netdev_set_tc_queue(ndev, tc, count, offset);
+		if (err)
+			return err;
+
+		for (q = offset; q < offset + count; q++) {
+			tx_ring = priv->tx_ring[q];
+			/* The prio_tc_map is skb_tx_hash()'s way of selecting
+			 * between TX queues based on skb->priority. As such,
+			 * there's nothing to offload based on it.
+			 * Make the mqprio "traffic class" be the priority of
+			 * this ring group, and leave the Tx IPV to traffic
+			 * class mapping as its default mapping value of 1:1.
+			 */
+			tx_ring->prio = tc;
+			enetc_set_bdr_prio(hw, tx_ring->index, tx_ring->prio);
 
-	netdev_set_num_tc(ndev, num_tc);
+			num_stack_tx_queues++;
+		}
+	}
+
+	err = netif_set_real_num_tx_queues(ndev, num_stack_tx_queues);
+	if (err)
+		return err;
 
-	/* Each TC is associated with one netdev queue */
-	for (i = 0; i < num_tc; i++)
-		netdev_set_tc_queue(ndev, i, 1, i);
+	enetc_debug_tx_ring_prios(priv);
 
 	return 0;
 }
-- 
2.34.1



* [RFC PATCH net-next 08/11] net/sched: taprio: pass mqprio queue configuration to ndo_setup_tc()
From: Vladimir Oltean @ 2023-01-20 14:15 UTC (permalink / raw)
  To: netdev, John Fastabend
  Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Claudiu Manoil, Camelia Groza, Xiaoliang Yang, Gerhard Engleder,
	Vinicius Costa Gomes, Alexander Duyck, Kurt Kanzenbach,
	Ferenc Fejes, Tony Nguyen, Jesse Brandeburg, Jacob Keller

For some reason I cannot understand, the taprio offload does not pass
the mqprio queue configuration down to the offloading device driver.
So the driver cannot act upon the TX queue counts and prio->tc map.
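
With the mqprio configuration embedded in struct tc_taprio_qopt_offload,
a driver's taprio handler can read the validated queue layout directly.
A hypothetical sketch (patch 09/11 does the real thing for enetc):

/* Hypothetical sketch: inspect the TC count and per-TC TXQ ranges that
 * taprio now passes down alongside the schedule.
 */
static int foo_setup_taprio(struct net_device *dev,
			    struct tc_taprio_qopt_offload *offload)
{
	struct tc_mqprio_qopt *qopt = &offload->mqprio.qopt;
	int tc;

	for (tc = 0; tc < qopt->num_tc; tc++)
		netdev_dbg(dev, "TC %d: %d TXQs starting at %d\n",
			   tc, qopt->count[tc], qopt->offset[tc]);

	return 0;
}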

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
 include/net/pkt_sched.h | 1 +
 net/sched/sch_taprio.c  | 5 ++++-
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/include/net/pkt_sched.h b/include/net/pkt_sched.h
index 02e3ccfbc7d1..ace8be520fb0 100644
--- a/include/net/pkt_sched.h
+++ b/include/net/pkt_sched.h
@@ -187,6 +187,7 @@ struct tc_taprio_sched_entry {
 };
 
 struct tc_taprio_qopt_offload {
+	struct tc_mqprio_qopt_offload mqprio;
 	u8 enable;
 	ktime_t base_time;
 	u64 cycle_time;
diff --git a/net/sched/sch_taprio.c b/net/sched/sch_taprio.c
index 570389f6cdd7..8f832fa82745 100644
--- a/net/sched/sch_taprio.c
+++ b/net/sched/sch_taprio.c
@@ -1228,6 +1228,7 @@ static void taprio_sched_to_offload(struct net_device *dev,
 static int taprio_enable_offload(struct net_device *dev,
 				 struct taprio_sched *q,
 				 struct sched_gate_list *sched,
+				 const struct tc_mqprio_qopt *mqprio,
 				 struct netlink_ext_ack *extack)
 {
 	const struct net_device_ops *ops = dev->netdev_ops;
@@ -1261,6 +1262,8 @@ static int taprio_enable_offload(struct net_device *dev,
 		return -ENOMEM;
 	}
 	offload->enable = 1;
+	if (mqprio)
+		offload->mqprio.qopt = *mqprio;
 	taprio_sched_to_offload(dev, sched, offload);
 
 	for (tc = 0; tc < TC_MAX_QUEUE; tc++)
@@ -1617,7 +1620,7 @@ static int taprio_change(struct Qdisc *sch, struct nlattr *opt,
 	}
 
 	if (FULL_OFFLOAD_IS_ENABLED(q->flags))
-		err = taprio_enable_offload(dev, q, new_admin, extack);
+		err = taprio_enable_offload(dev, q, new_admin, mqprio, extack);
 	else
 		err = taprio_disable_offload(dev, q, extack);
 	if (err)
-- 
2.34.1



* [RFC PATCH net-next 09/11] net: enetc: act upon mqprio queue config in taprio offload
From: Vladimir Oltean @ 2023-01-20 14:15 UTC (permalink / raw)
  To: netdev, John Fastabend
  Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Claudiu Manoil, Camelia Groza, Xiaoliang Yang, Gerhard Engleder,
	Vinicius Costa Gomes, Alexander Duyck, Kurt Kanzenbach,
	Ferenc Fejes, Tony Nguyen, Jesse Brandeburg, Jacob Keller

We assume that the mqprio queue configuration from taprio has a simple
1:1 mapping between prio and traffic class, and one TX queue per TC.
That might not be the case. Actually parse and act upon the mqprio
config.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
 .../net/ethernet/freescale/enetc/enetc_qos.c  | 20 ++++++-------------
 1 file changed, 6 insertions(+), 14 deletions(-)

diff --git a/drivers/net/ethernet/freescale/enetc/enetc_qos.c b/drivers/net/ethernet/freescale/enetc/enetc_qos.c
index 6e0b4dd91509..130ebf6853e6 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc_qos.c
+++ b/drivers/net/ethernet/freescale/enetc/enetc_qos.c
@@ -136,29 +136,21 @@ int enetc_setup_tc_taprio(struct net_device *ndev, void *type_data)
 {
 	struct tc_taprio_qopt_offload *taprio = type_data;
 	struct enetc_ndev_priv *priv = netdev_priv(ndev);
-	struct enetc_hw *hw = &priv->si->hw;
-	struct enetc_bdr *tx_ring;
-	int err;
-	int i;
+	int err, i;
 
 	/* TSD and Qbv are mutually exclusive in hardware */
 	for (i = 0; i < priv->num_tx_rings; i++)
 		if (priv->tx_ring[i]->tsd_enable)
 			return -EBUSY;
 
-	for (i = 0; i < priv->num_tx_rings; i++) {
-		tx_ring = priv->tx_ring[i];
-		tx_ring->prio = taprio->enable ? i : 0;
-		enetc_set_bdr_prio(hw, tx_ring->index, tx_ring->prio);
-	}
+	err = enetc_setup_tc_mqprio(ndev, &taprio->mqprio);
+	if (err)
+		return err;
 
 	err = enetc_setup_taprio(ndev, taprio);
 	if (err) {
-		for (i = 0; i < priv->num_tx_rings; i++) {
-			tx_ring = priv->tx_ring[i];
-			tx_ring->prio = taprio->enable ? 0 : i;
-			enetc_set_bdr_prio(hw, tx_ring->index, tx_ring->prio);
-		}
+		taprio->mqprio.qopt.num_tc = 0;
+		enetc_setup_tc_mqprio(ndev, &taprio->mqprio);
 	}
 
 	return err;
-- 
2.34.1



* [RFC PATCH net-next 10/11] net/sched: taprio: validate that gate mask does not exceed number of TCs
From: Vladimir Oltean @ 2023-01-20 14:15 UTC (permalink / raw)
  To: netdev, John Fastabend
  Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Claudiu Manoil, Camelia Groza, Xiaoliang Yang, Gerhard Engleder,
	Vinicius Costa Gomes, Alexander Duyck, Kurt Kanzenbach,
	Ferenc Fejes, Tony Nguyen, Jesse Brandeburg, Jacob Keller

"man tc-taprio" says:

| each gate state allows outgoing traffic for a subset (potentially
| empty) of traffic classes.

So it makes sense to not allow gate actions to have bits set for traffic
classes that exceed the number of TCs of the device (according to the
mqprio configuration). Validate precisely that.
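
For example, with num_tc 3, the only meaningful gate masks are 0x1
through 0x7. A schedule entry that opens "gate 3" now fails with an
extack message (hypothetical command on an imaginary interface):

$ tc qdisc replace dev eno0 parent root handle 100 taprio num_tc 3 \
	map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 \
	queues 1@0 1@1 2@2 base-time 0 \
	sched-entry S 0x8 300000
Error: Gate mask 0x8 contains bits for non-existent traffic classes (device has 3).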

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
 net/sched/sch_taprio.c | 41 +++++++++++++++++++++++++----------------
 1 file changed, 25 insertions(+), 16 deletions(-)

diff --git a/net/sched/sch_taprio.c b/net/sched/sch_taprio.c
index 8f832fa82745..a3fa5debe513 100644
--- a/net/sched/sch_taprio.c
+++ b/net/sched/sch_taprio.c
@@ -789,15 +789,24 @@ static int fill_sched_entry(struct taprio_sched *q, struct nlattr **tb,
 			    struct netlink_ext_ack *extack)
 {
 	int min_duration = length_to_duration(q, ETH_ZLEN);
+	struct net_device *dev = qdisc_dev(q->root);
+	int num_tc = netdev_get_num_tc(dev);
 	u32 interval = 0;
 
 	if (tb[TCA_TAPRIO_SCHED_ENTRY_CMD])
 		entry->command = nla_get_u8(
 			tb[TCA_TAPRIO_SCHED_ENTRY_CMD]);
 
-	if (tb[TCA_TAPRIO_SCHED_ENTRY_GATE_MASK])
+	if (tb[TCA_TAPRIO_SCHED_ENTRY_GATE_MASK]) {
 		entry->gate_mask = nla_get_u32(
 			tb[TCA_TAPRIO_SCHED_ENTRY_GATE_MASK]);
+		if (!num_tc || (entry->gate_mask & ~GENMASK(num_tc - 1, 0))) {
+			NL_SET_ERR_MSG_FMT(extack,
+					   "Gate mask 0x%x contains bits for non-existent traffic classes (device has %d)",
+					   entry->gate_mask, num_tc);
+			return -EINVAL;
+		}
+	}
 
 	if (tb[TCA_TAPRIO_SCHED_ENTRY_INTERVAL])
 		interval = nla_get_u32(
@@ -1588,6 +1597,21 @@ static int taprio_change(struct Qdisc *sch, struct nlattr *opt,
 		goto free_sched;
 	}
 
+	if (mqprio) {
+		err = netdev_set_num_tc(dev, mqprio->num_tc);
+		if (err)
+			goto free_sched;
+		for (i = 0; i < mqprio->num_tc; i++)
+			netdev_set_tc_queue(dev, i,
+					    mqprio->count[i],
+					    mqprio->offset[i]);
+
+		/* Always use supplied priority mappings */
+		for (i = 0; i <= TC_BITMASK; i++)
+			netdev_set_prio_tc_map(dev, i,
+					       mqprio->prio_tc_map[i]);
+	}
+
 	err = parse_taprio_schedule(q, tb, new_admin, extack);
 	if (err < 0)
 		goto free_sched;
@@ -1604,21 +1628,6 @@ static int taprio_change(struct Qdisc *sch, struct nlattr *opt,
 
 	taprio_set_picos_per_byte(dev, q);
 
-	if (mqprio) {
-		err = netdev_set_num_tc(dev, mqprio->num_tc);
-		if (err)
-			goto free_sched;
-		for (i = 0; i < mqprio->num_tc; i++)
-			netdev_set_tc_queue(dev, i,
-					    mqprio->count[i],
-					    mqprio->offset[i]);
-
-		/* Always use supplied priority mappings */
-		for (i = 0; i <= TC_BITMASK; i++)
-			netdev_set_prio_tc_map(dev, i,
-					       mqprio->prio_tc_map[i]);
-	}
-
 	if (FULL_OFFLOAD_IS_ENABLED(q->flags))
 		err = taprio_enable_offload(dev, q, new_admin, mqprio, extack);
 	else
-- 
2.34.1



* [RFC PATCH net-next 11/11] net/sched: taprio: only calculate gate mask per TXQ for igc
From: Vladimir Oltean @ 2023-01-20 14:15 UTC (permalink / raw)
  To: netdev, John Fastabend
  Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Claudiu Manoil, Camelia Groza, Xiaoliang Yang, Gerhard Engleder,
	Vinicius Costa Gomes, Alexander Duyck, Kurt Kanzenbach,
	Ferenc Fejes, Tony Nguyen, Jesse Brandeburg, Jacob Keller

Vinicius has repeated a couple of times in our discussion that it was a
mistake for the taprio UAPI to take as input the Qbv gate mask per TC
rather than per TXQ. In the Frame Preemption RFC thread:
https://patchwork.kernel.org/project/netdevbpf/patch/20220816222920.1952936-3-vladimir.oltean@nxp.com/#25011225

I had this unanswered question:

| > And even that it works out because taprio "translates" from traffic
| > classes to queues when it sends the offload information to the driver,
| > i.e. the driver knows the schedule of queues, not traffic classes.
|
| Which is incredibly strange to me, since the standard clearly defines
| Qbv gates to be per traffic class, and in ENETC, even if we have 2 TX
| queues for the same traffic class (one per CPU), the hardware schedule
| is still per traffic class and not per independent TX queue (BD ring).
|
| How does this work for i225/i226, if 2 queues are configured for the
| same dequeue priority? Do the taprio gates still take effect per queue?

I haven't gotten an answer, and some things are still unclear, but I
suspect that igc is the outlier, and all the other hardware actually has
the gate mask per TC and not per TXQ, just like the standard says.

For example, in ENETC up until now, we weren't passed the mqprio queue
configuration via struct tc_taprio_qopt_offload, and hence, we needed to
assume that the TC:TXQ mapping was 1:1. So "per TC" or "per TXQ" did not
make a practical difference. I suspect that other drivers are in the
same position.

Benefit from the TC_QUERY_CAPS feature that Jakub suggested we add:
query the device driver before calling the proper ndo_setup_tc(), and
figure out whether it expects the gate mask to be per TC or per TXQ.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
 drivers/net/ethernet/intel/igc/igc_main.c | 17 +++++++++++++++++
 include/net/pkt_sched.h                   |  1 +
 net/sched/sch_taprio.c                    | 11 ++++++++---
 3 files changed, 26 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/intel/igc/igc_main.c b/drivers/net/ethernet/intel/igc/igc_main.c
index e86b15efaeb8..9b6f2aaf78c2 100644
--- a/drivers/net/ethernet/intel/igc/igc_main.c
+++ b/drivers/net/ethernet/intel/igc/igc_main.c
@@ -6205,12 +6205,29 @@ static int igc_tsn_enable_cbs(struct igc_adapter *adapter,
 	return igc_tsn_offload_apply(adapter);
 }
 
+static int igc_tc_query_caps(struct tc_query_caps_base *base)
+{
+	switch (base->type) {
+	case TC_SETUP_QDISC_TAPRIO: {
+		struct tc_taprio_caps *caps = base->caps;
+
+		caps->gate_mask_per_txq = true;
+
+		return 0;
+	}
+	default:
+		return -EOPNOTSUPP;
+	}
+}
+
 static int igc_setup_tc(struct net_device *dev, enum tc_setup_type type,
 			void *type_data)
 {
 	struct igc_adapter *adapter = netdev_priv(dev);
 
 	switch (type) {
+	case TC_QUERY_CAPS:
+		return igc_tc_query_caps(type_data);
 	case TC_SETUP_QDISC_TAPRIO:
 		return igc_tsn_enable_qbv_scheduling(adapter, type_data);
 
diff --git a/include/net/pkt_sched.h b/include/net/pkt_sched.h
index ace8be520fb0..fd889fc4912b 100644
--- a/include/net/pkt_sched.h
+++ b/include/net/pkt_sched.h
@@ -176,6 +176,7 @@ struct tc_mqprio_qopt_offload {
 
 struct tc_taprio_caps {
 	bool supports_queue_max_sdu:1;
+	bool gate_mask_per_txq:1;
 };
 
 struct tc_taprio_sched_entry {
diff --git a/net/sched/sch_taprio.c b/net/sched/sch_taprio.c
index a3fa5debe513..58efa982db65 100644
--- a/net/sched/sch_taprio.c
+++ b/net/sched/sch_taprio.c
@@ -1212,7 +1212,8 @@ static u32 tc_map_to_queue_mask(struct net_device *dev, u32 tc_mask)
 
 static void taprio_sched_to_offload(struct net_device *dev,
 				    struct sched_gate_list *sched,
-				    struct tc_taprio_qopt_offload *offload)
+				    struct tc_taprio_qopt_offload *offload,
+				    bool gate_mask_per_txq)
 {
 	struct sched_entry *entry;
 	int i = 0;
@@ -1226,7 +1227,11 @@ static void taprio_sched_to_offload(struct net_device *dev,
 
 		e->command = entry->command;
 		e->interval = entry->interval;
-		e->gate_mask = tc_map_to_queue_mask(dev, entry->gate_mask);
+		if (gate_mask_per_txq)
+			e->gate_mask = tc_map_to_queue_mask(dev,
+							    entry->gate_mask);
+		else
+			e->gate_mask = entry->gate_mask;
 
 		i++;
 	}
@@ -1273,7 +1278,7 @@ static int taprio_enable_offload(struct net_device *dev,
 	offload->enable = 1;
 	if (mqprio)
 		offload->mqprio.qopt = *mqprio;
-	taprio_sched_to_offload(dev, sched, offload);
+	taprio_sched_to_offload(dev, sched, offload, caps.gate_mask_per_txq);
 
 	for (tc = 0; tc < TC_MAX_QUEUE; tc++)
 		offload->max_sdu[tc] = q->max_sdu[tc];
-- 
2.34.1



* Re: [RFC PATCH net-next 00/11] ENETC mqprio/taprio cleanup
From: Jacob Keller @ 2023-01-23 18:22 UTC (permalink / raw)
  To: Vladimir Oltean, netdev, John Fastabend
  Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Claudiu Manoil, Camelia Groza, Xiaoliang Yang, Gerhard Engleder,
	Vinicius Costa Gomes, Alexander Duyck, Kurt Kanzenbach,
	Ferenc Fejes, Tony Nguyen, Jesse Brandeburg



On 1/20/2023 6:15 AM, Vladimir Oltean wrote:
> I realize that this patch set will start a flame war, but there are
> things about the mqprio qdisc that I simply don't understand, so in an
> attempt to explain how I think things should be done, I've made some
> patches to the code. I hope the reviewers will be patient enough with me :)
> 
> I need to touch mqprio because I'm preparing a patch set for Frame
> Preemption (an IEEE 802.1Q feature). A disagreement started with
> Vinicius here:
> https://patchwork.kernel.org/project/netdevbpf/patch/20220816222920.1952936-3-vladimir.oltean@nxp.com/#24976672
> 
> regarding how TX packet prioritization should be handled. Vinicius said
> that for some Intel NICs, prioritization at the egress scheduler stage
> is fundamentally attached to TX queues rather than traffic classes.
> 
> In other words, in the "popular" mqprio configuration documented by him:
> 
> $ tc qdisc replace dev $IFACE parent root handle 100 mqprio \
>       num_tc 3 \
>       map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 \
>       queues 1@0 1@1 2@2 \
>       hw 0
> 
> there are 3 Linux traffic classes and 4 TX queues. The TX queues are
> organized in strict priority fashion, like this: TXQ 0 has highest prio
> (hardware dequeue precedence for TX scheduler), TXQ 3 has lowest prio.
> Packets classified by Linux to TC 2 are hashed between TXQ 2 and TXQ 3,
> but the hardware gives TXQ 2 higher precedence than TXQ 3, and Linux
> doesn't know that.
> 
> I am surprised by this fact, and this isn't how ENETC works at all.
> For ENETC, we try to prioritize on TCs rather than TXQs, and TC 7 has
> higher priority than TC 0. For us, groups of TXQs that map to the same
> TC have the same egress scheduling priority. It is possible (and maybe
> useful) to have 2 TXQs per TC (one TXQ per CPU). Patch 07/11 tries to
> make that clearer.
> 
> Furthermore (and this is really the biggest point of contention),
> Vinicius and I have a fundamental disagreement about whether the 802.1Qbv
> (taprio) gate mask should be passed to the device driver per TXQ or per
> TC. This is what patch 11/11 is about.
> 
> Again, I'm not *certain* that my opinion on this topic is correct
> (and it sure is confusing to see such a different approach for Intel).
> But I would appreciate any feedback.
> 
> Vladimir Oltean (11):
>   net/sched: mqprio: refactor nlattr parsing to a separate function
>   net/sched: mqprio: refactor offloading and unoffloading to dedicated
>     functions
>   net/sched: move struct tc_mqprio_qopt_offload from pkt_cls.h to
>     pkt_sched.h
>   net/sched: mqprio: allow offloading drivers to request queue count
>     validation
>   net/sched: mqprio: add extack messages for queue count validation
>   net: enetc: request mqprio to validate the queue counts
>   net: enetc: act upon the requested mqprio queue configuration
>   net/sched: taprio: pass mqprio queue configuration to ndo_setup_tc()
>   net: enetc: act upon mqprio queue config in taprio offload
>   net/sched: taprio: validate that gate mask does not exceed number of
>     TCs
>   net/sched: taprio: only calculate gate mask per TXQ for igc
> 

I don't work on igc or the i225/i226 devices, so I can't speak for
those, but this series looks ok to me.

Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>

>  drivers/net/ethernet/freescale/enetc/enetc.c  |  67 ++--
>  .../net/ethernet/freescale/enetc/enetc_qos.c  |  27 +-
>  drivers/net/ethernet/intel/igc/igc_main.c     |  17 +
>  include/net/pkt_cls.h                         |  10 -
>  include/net/pkt_sched.h                       |  16 +
>  net/sched/sch_mqprio.c                        | 298 +++++++++++-------
>  net/sched/sch_taprio.c                        |  57 ++--
>  7 files changed, 310 insertions(+), 182 deletions(-)
> 


* Re: [RFC PATCH net-next 00/11] ENETC mqprio/taprio cleanup
From: Gerhard Engleder @ 2023-01-23 21:21 UTC (permalink / raw)
  To: Vladimir Oltean, netdev, John Fastabend
  Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Claudiu Manoil, Camelia Groza, Xiaoliang Yang,
	Vinicius Costa Gomes, Alexander Duyck, Kurt Kanzenbach,
	Ferenc Fejes, Tony Nguyen, Jesse Brandeburg, Jacob Keller

On 20.01.23 15:15, Vladimir Oltean wrote:
> I realize that this patch set will start a flame war, but there are
> things about the mqprio qdisc that I simply don't understand, so in an
> attempt to explain how I think things should be done, I've made some
> patches to the code. I hope the reviewers will be patient enough with me :)
> 
> I need to touch mqprio because I'm preparing a patch set for Frame
> Preemption (an IEEE 802.1Q feature). A disagreement started with
> Vinicius here:
> https://patchwork.kernel.org/project/netdevbpf/patch/20220816222920.1952936-3-vladimir.oltean@nxp.com/#24976672
> 
> regarding how TX packet prioritization should be handled. Vinicius said
> that for some Intel NICs, prioritization at the egress scheduler stage
> is fundamentally attached to TX queues rather than traffic classes.
> 
> In other words, in the "popular" mqprio configuration documented by him:
> 
> $ tc qdisc replace dev $IFACE parent root handle 100 mqprio \
>        num_tc 3 \
>        map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 \
>        queues 1@0 1@1 2@2 \
>        hw 0
> 
> there are 3 Linux traffic classes and 4 TX queues. The TX queues are
> organized in strict priority fashion, like this: TXQ 0 has highest prio
> (hardware dequeue precedence for TX scheduler), TXQ 3 has lowest prio.
> Packets classified by Linux to TC 2 are hashed between TXQ 2 and TXQ 3,
> but the hardware has higher precedence for TXQ 2 over TXQ 3, and Linux
> doesn't know that.

For my tsnep IP core it is similar, but with reversed priority: TXQ 0 has
the lowest priority (to be used for non-real-time traffic), TXQ 1 has
priority over TXQ 0, TXQ 2 has priority over TXQ 1, and so on. The number
of TX queues is flexible and depends on the requirements of the real-time
application and the available resources within the FPGA. The priority is
hard-coded to save FPGA resources.
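
For a device like that, the mqprio mapping presumably has to be the
mirror image of the Intel example above, with the highest-priority TC
placed on the highest-numbered TXQ. A hypothetical 3-TC configuration
(illustrative only; the interface name, prio map and queue count are
made up, not a tested tsnep command):

$ tc qdisc replace dev $IFACE parent root handle 100 mqprio \
      num_tc 3 \
      map 0 0 1 2 0 0 0 0 0 0 0 0 0 0 0 0 \
      queues 1@0 1@1 1@2 \
      hw 0

Here best-effort traffic (TC 0) lands on TXQ 0, the lowest-priority
queue, and TC 2 lands on TXQ 2, the highest-priority one.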

> I am surprised by this fact, and this isn't how ENETC works at all.
> For ENETC, we try to prioritize on TCs rather than TXQs, and TC 7 has
> higher priority than TC 0. For us, groups of TXQs that map to the same
> TC have the same egress scheduling priority. It is possible (and maybe
> useful) to have 2 TXQs per TC (one TXQ per CPU). Patch 07/11 tries to
> make that more clear.
> 
> Furthermore (and this is really the biggest point of contention), myself
> and Vinicius have the fundamental disagreement whether the 802.1Qbv
> (taprio) gate mask should be passed to the device driver per TXQ or per
> TC. This is what patch 11/11 is about.

tsnep also expects the gate mask per TXQ; this simplifies the hardware
implementation. But it would be no problem if the gate mask were passed
per TC and the driver transformed it to per TXQ.
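
A driver-side transform like that is short; a minimal sketch in the
style of taprio's tc_map_to_queue_mask(), assuming the driver is handed
the mqprio queue configuration (the helper name is made up):

static u32 gate_mask_tc_to_txq(const struct tc_mqprio_qopt *qopt,
			       u32 tc_mask)
{
	u32 txq_mask = 0;
	int tc, i;

	for (tc = 0; tc < qopt->num_tc; tc++) {
		if (!(tc_mask & BIT(tc)))
			continue;

		/* open the gates of all TXQs belonging to this TC */
		for (i = 0; i < qopt->count[tc]; i++)
			txq_mask |= BIT(qopt->offset[tc] + i);
	}

	return txq_mask;
}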

> Again, I'm not *certain* that my opinion on this topic is correct
> (and it sure is confusing to see such a different approach for Intel).
> But I would appreciate any feedback.

In my opinion it makes sense to add the mqprio queue configuration to
taprio. This allows the driver to check whether the queue assignment and
prioritization make sense for its device. Currently, deep hardware
knowledge is needed to get this right.

Gerhard

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC PATCH net-next 00/11] ENETC mqprio/taprio cleanup
  2023-01-23 21:21 ` Gerhard Engleder
@ 2023-01-23 21:31   ` Vladimir Oltean
  2023-01-23 22:20     ` Gerhard Engleder
  0 siblings, 1 reply; 25+ messages in thread
From: Vladimir Oltean @ 2023-01-23 21:31 UTC (permalink / raw)
  To: Gerhard Engleder
  Cc: netdev, John Fastabend, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Claudiu Manoil, Camelia Groza,
	Xiaoliang Yang, Vinicius Costa Gomes, Alexander Duyck,
	Kurt Kanzenbach, Ferenc Fejes, Tony Nguyen, Jesse Brandeburg,
	Jacob Keller

On Mon, Jan 23, 2023 at 10:21:33PM +0100, Gerhard Engleder wrote:
> For my tsnep IP core it is similar, but with reversed priority: TXQ 0 has
> the lowest priority (to be used for non-real-time traffic), TXQ 1 has
> priority over TXQ 0, TXQ 2 has priority over TXQ 1, and so on. The number
> of TX queues is flexible and depends on the requirements of the real-time
> application and the available resources within the FPGA. The priority is
> hard-coded to save FPGA resources.

But if there's no round robin between queues of equal priority, it means
you can never have more than 1 TXQ per traffic class with this design,
or i.o.w., your best-effort traffic will always be single queue, right?

> > Furthermore (and this is really the biggest point of contention), myself
> > and Vinicius have the fundamental disagreement whether the 802.1Qbv
> > (taprio) gate mask should be passed to the device driver per TXQ or per
> > TC. This is what patch 11/11 is about.
> 
> tsnep also expects the gate mask per TXQ; this simplifies the hardware
> implementation. But it would be no problem if the gate mask were passed
> per TC and the driver transformed it to per TXQ.

If tsnep can only have at most 1 TXQ per TC, then what's the difference
between gate mask per TXQ and gate mask per TC?

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC PATCH net-next 00/11] ENETC mqprio/taprio cleanup
  2023-01-23 21:31   ` Vladimir Oltean
@ 2023-01-23 22:20     ` Gerhard Engleder
  0 siblings, 0 replies; 25+ messages in thread
From: Gerhard Engleder @ 2023-01-23 22:20 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: netdev, John Fastabend, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Claudiu Manoil, Camelia Groza,
	Xiaoliang Yang, Vinicius Costa Gomes, Alexander Duyck,
	Kurt Kanzenbach, Ferenc Fejes, Tony Nguyen, Jesse Brandeburg,
	Jacob Keller

On 23.01.23 22:31, Vladimir Oltean wrote:
> On Mon, Jan 23, 2023 at 10:21:33PM +0100, Gerhard Engleder wrote:
>> For my tsnep IP core it is similar, but with reversed priority: TXQ 0 has
>> the lowest priority (to be used for non-real-time traffic), TXQ 1 has
>> priority over TXQ 0, TXQ 2 has priority over TXQ 1, and so on. The number
>> of TX queues is flexible and depends on the requirements of the real-time
>> application and the available resources within the FPGA. The priority is
>> hard-coded to save FPGA resources.
> 
> But if there's no round robin between queues of equal priority, it means
> you can never have more than 1 TXQ per traffic class with this design,
> or i.o.w., your best-effort traffic will always be single queue, right?

Yes, with the current design only 1 TXQ per traffic class is the goal.

>>> Furthermore (and this is really the biggest point of contention), myself
>>> and Vinicius have the fundamental disagreement whether the 802.1Qbv
>>> (taprio) gate mask should be passed to the device driver per TXQ or per
>>> TC. This is what patch 11/11 is about.
>>
>> tsnep also expects the gate mask per TXQ; this simplifies the hardware
>> implementation. But it would be no problem if the gate mask were passed
>> per TC and the driver transformed it to per TXQ.
> 
> If tsnep can only have at most 1 TXQ per TC, then what's the difference
> between gate mask per TXQ and gate mask per TC?

There can be fewer TCs than TXQs. If only the second TXQ were used, then
the gate mask per TC would be 0x1 and the gate mask per TXQ would be 0x2.
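
(Concretely, with a hypothetical "num_tc 1 ... queues 1@1" setup: bit 0
of the per-TC mask is TC 0, so an open gate reads 0x1, while bit 1 of
the per-TXQ mask is TXQ 1, so the same open gate reads 0x2; the driver
only has to shift the set bits by the TC-to-TXQ offset.)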

If the number of TCs and TXQs were identical, there would be no
difference. The overlap check enforces that TXQs are assigned to TCs in
strict order, so TXQs cannot be assigned to TCs in arbitrary order. At
least that was the result of a quick test; I don't know the reason for
this behavior.

Gerhard

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC PATCH net-next 00/11] ENETC mqprio/taprio cleanup
  2023-01-23 18:22 ` [RFC PATCH net-next 00/11] ENETC mqprio/taprio cleanup Jacob Keller
@ 2023-01-24 14:26   ` Vladimir Oltean
  2023-01-24 22:30     ` Jacob Keller
  0 siblings, 1 reply; 25+ messages in thread
From: Vladimir Oltean @ 2023-01-24 14:26 UTC (permalink / raw)
  To: Jacob Keller
  Cc: netdev, John Fastabend, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Claudiu Manoil, Camelia Groza,
	Xiaoliang Yang, Gerhard Engleder, Vinicius Costa Gomes,
	Alexander Duyck, Kurt Kanzenbach, Ferenc Fejes, Tony Nguyen,
	Jesse Brandeburg

Hi Jacob,

On Mon, Jan 23, 2023 at 10:22:08AM -0800, Jacob Keller wrote:
> I don't work on igc or the i225/i226 devices, so I can't speak for
> those, but this series looks ok to me.
> 
> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>

For clarity, does this mean that I can put your review tag on all
patches in the next version?

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC PATCH net-next 00/11] ENETC mqprio/taprio cleanup
  2023-01-24 14:26   ` Vladimir Oltean
@ 2023-01-24 22:30     ` Jacob Keller
  0 siblings, 0 replies; 25+ messages in thread
From: Jacob Keller @ 2023-01-24 22:30 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: netdev, John Fastabend, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Claudiu Manoil, Camelia Groza,
	Xiaoliang Yang, Gerhard Engleder, Vinicius Costa Gomes,
	Alexander Duyck, Kurt Kanzenbach, Ferenc Fejes, Tony Nguyen,
	Jesse Brandeburg



On 1/24/2023 6:26 AM, Vladimir Oltean wrote:
> Hi Jacob,
> 
> On Mon, Jan 23, 2023 at 10:22:08AM -0800, Jacob Keller wrote:
>> I don't work on igc or the i225/i226 devices, so I can't speak for
>> those, but this series looks ok to me.
>>
>> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
> 
> For clarity, does this mean that I can put your review tag on all
> patches in the next version?

Yes.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC PATCH net-next 00/11] ENETC mqprio/taprio cleanup
  2023-01-20 14:15 [RFC PATCH net-next 00/11] ENETC mqprio/taprio cleanup Vladimir Oltean
                   ` (12 preceding siblings ...)
  2023-01-23 21:21 ` Gerhard Engleder
@ 2023-01-25  1:11 ` Vinicius Costa Gomes
  2023-01-25 13:10   ` Vladimir Oltean
  13 siblings, 1 reply; 25+ messages in thread
From: Vinicius Costa Gomes @ 2023-01-25  1:11 UTC (permalink / raw)
  To: Vladimir Oltean, netdev, John Fastabend
  Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Claudiu Manoil, Camelia Groza, Xiaoliang Yang, Gerhard Engleder,
	Alexander Duyck, Kurt Kanzenbach, Ferenc Fejes, Tony Nguyen,
	Jesse Brandeburg, Jacob Keller

Hi Vladimir,

Sorry for the delay. I had to sleep on this for a bit.

Vladimir Oltean <vladimir.oltean@nxp.com> writes:

> I realize that this patch set will start a flame war, but there are
> things about the mqprio qdisc that I simply don't understand, so in an
> attempt to explain how I think things should be done, I've made some
> patches to the code. I hope the reviewers will be patient enough with me :)
>
> I need to touch mqprio because I'm preparing a patch set for Frame
> Preemption (an IEEE 802.1Q feature). A disagreement started with
> Vinicius here:
> https://patchwork.kernel.org/project/netdevbpf/patch/20220816222920.1952936-3-vladimir.oltean@nxp.com/#24976672
>
> regarding how TX packet prioritization should be handled. Vinicius said
> that for some Intel NICs, prioritization at the egress scheduler stage
> is fundamentally attached to TX queues rather than traffic classes.
>
> In other words, in the "popular" mqprio configuration documented by him:
>
> $ tc qdisc replace dev $IFACE parent root handle 100 mqprio \
>       num_tc 3 \
>       map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 \
>       queues 1@0 1@1 2@2 \
>       hw 0
>
> there are 3 Linux traffic classes and 4 TX queues. The TX queues are
> organized in strict priority fashion, like this: TXQ 0 has highest prio
> (hardware dequeue precedence for TX scheduler), TXQ 3 has lowest prio.
> Packets classified by Linux to TC 2 are hashed between TXQ 2 and TXQ 3,
> but the hardware has higher precedence for TXQ 2 over TXQ 3, and Linux
> doesn't know that.
>
> I am surprised by this fact, and this isn't how ENETC works at all.
> For ENETC, we try to prioritize on TCs rather than TXQs, and TC 7 has
> higher priority than TC 0. For us, groups of TXQs that map to the same
> TC have the same egress scheduling priority. It is possible (and maybe
> useful) to have 2 TXQs per TC (one TXQ per CPU). Patch 07/11 tries to
> make that more clear.
>

That makes me think that making "queues" visible on mqprio/taprio was
perhaps a mistake. If we only had the "prio to tc" map and relied on
drivers implementing .ndo_select_queue(), that would be less
problematic. And for devices with tens or hundreds of queues, this "no
queues exposed to the user" approach sounds like a better model.
Anyway... just wondering.

Perhaps something to think about for mqprio/taprio/etc "the next generation" ;-)

> Furthermore (and this is really the biggest point of contention), myself
> and Vinicius have the fundamental disagreement whether the 802.1Qbv
> (taprio) gate mask should be passed to the device driver per TXQ or per
> TC. This is what patch 11/11 is about.
>

I think that I was being annoying because I believed that some
implementation detail of the netdev prio_tc_map and the way that the
netdev selects TX queues (the "core of how mqprio works") would leak,
and that it would be easier/more correct to make other vendors adapt
themselves to the "Intel"/"queues have priorities" model. But I stand
corrected, as you (and others) have proven.

In short, I am not opposed to this idea. This capability operation
really opens some possibilities. The patches look clean.

I'll play with the patches later in the week, quite swamped at this
point.

> Again, I'm not *certain* that my opinion on this topic is correct
> (and it sure is confusing to see such a different approach for Intel).
> But I would appreciate any feedback.

And that reminds me, I owe you a beverage of your choice. For all your
effort.

>
> Vladimir Oltean (11):
>   net/sched: mqprio: refactor nlattr parsing to a separate function
>   net/sched: mqprio: refactor offloading and unoffloading to dedicated
>     functions
>   net/sched: move struct tc_mqprio_qopt_offload from pkt_cls.h to
>     pkt_sched.h
>   net/sched: mqprio: allow offloading drivers to request queue count
>     validation
>   net/sched: mqprio: add extack messages for queue count validation
>   net: enetc: request mqprio to validate the queue counts
>   net: enetc: act upon the requested mqprio queue configuration
>   net/sched: taprio: pass mqprio queue configuration to ndo_setup_tc()
>   net: enetc: act upon mqprio queue config in taprio offload
>   net/sched: taprio: validate that gate mask does not exceed number of
>     TCs
>   net/sched: taprio: only calculate gate mask per TXQ for igc
>
>  drivers/net/ethernet/freescale/enetc/enetc.c  |  67 ++--
>  .../net/ethernet/freescale/enetc/enetc_qos.c  |  27 +-
>  drivers/net/ethernet/intel/igc/igc_main.c     |  17 +
>  include/net/pkt_cls.h                         |  10 -
>  include/net/pkt_sched.h                       |  16 +
>  net/sched/sch_mqprio.c                        | 298 +++++++++++-------
>  net/sched/sch_taprio.c                        |  57 ++--
>  7 files changed, 310 insertions(+), 182 deletions(-)
>
> -- 
> 2.34.1
>

Cheers,
-- 
Vinicius

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC PATCH net-next 11/11] net/sched: taprio: only calculate gate mask per TXQ for igc
  2023-01-20 14:15 ` [RFC PATCH net-next 11/11] net/sched: taprio: only calculate gate mask per TXQ for igc Vladimir Oltean
@ 2023-01-25  1:11   ` Vinicius Costa Gomes
  0 siblings, 0 replies; 25+ messages in thread
From: Vinicius Costa Gomes @ 2023-01-25  1:11 UTC (permalink / raw)
  To: Vladimir Oltean, netdev, John Fastabend
  Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Claudiu Manoil, Camelia Groza, Xiaoliang Yang, Gerhard Engleder,
	Alexander Duyck, Kurt Kanzenbach, Ferenc Fejes, Tony Nguyen,
	Jesse Brandeburg, Jacob Keller

Vladimir Oltean <vladimir.oltean@nxp.com> writes:

> Vinicius has repeated a couple of times in our discussion that it was a
> mistake for the taprio UAPI to take as input the Qbv gate mask per TC
> rather than per TXQ. In the Frame Preemption RFC thread:
> https://patchwork.kernel.org/project/netdevbpf/patch/20220816222920.1952936-3-vladimir.oltean@nxp.com/#25011225
>
> I had this unanswered question:
>
> | > And even that it works out because taprio "translates" from traffic
> | > classes to queues when it sends the offload information to the driver,
> | > i.e. the driver knows the schedule of queues, not traffic classes.
> |
> | Which is incredibly strange to me, since the standard clearly defines
> | Qbv gates to be per traffic class, and in ENETC, even if we have 2 TX
> | queues for the same traffic class (one per CPU), the hardware schedule
> | is still per traffic class and not per independent TX queue (BD ring).
> |
> | How does this work for i225/i226, if 2 queues are configured for the
> | same dequeue priority? Do the taprio gates still take effect per
> | queue?

Sorry that I haven't answered this before.

Two things, for i225/i226:
  - The gates open/close registers are per-queue, i.e. I control
  explicitly when each gate is going to close/open inside each cycle
  (yes, this design does have limitations);
  - Looking at the datasheet there's also this: "Each queue must be
  assigned with a unique priority level". Not sure what happens if I set
  the same priority on two queues; I would expect the ordering to be
  undefined, but I never tested that.

>
> I haven't gotten an answer, and some things are still unclear, but I
> suspect that igc is the outlier, and all the other hardware actually has
> the gate mask per TC and not per TXQ, just like the standard says.
>
> For example, in ENETC up until now, we weren't passed the mqprio queue
> configuration via struct tc_taprio_qopt_offload, and hence, we needed to
> assume that the TC:TXQ mapping was 1:1. So "per TC" or "per TXQ" did not
> make a practical difference. I suspect that other drivers are in the
> same position.
>
> Benefit from the TC_QUERY_CAPS feature that Jakub suggested we add, and
> query the device driver before calling the proper ndo_setup_tc(), and
> figure out if it expects the gate mask to be per TC or per TXQ.
>
> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
> ---
>  drivers/net/ethernet/intel/igc/igc_main.c | 17 +++++++++++++++++
>  include/net/pkt_sched.h                   |  1 +
>  net/sched/sch_taprio.c                    | 11 ++++++++---
>  3 files changed, 26 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/net/ethernet/intel/igc/igc_main.c b/drivers/net/ethernet/intel/igc/igc_main.c
> index e86b15efaeb8..9b6f2aaf78c2 100644
> --- a/drivers/net/ethernet/intel/igc/igc_main.c
> +++ b/drivers/net/ethernet/intel/igc/igc_main.c
> @@ -6205,12 +6205,29 @@ static int igc_tsn_enable_cbs(struct igc_adapter *adapter,
>  	return igc_tsn_offload_apply(adapter);
>  }
>  
> +static int igc_tc_query_caps(struct tc_query_caps_base *base)
> +{
> +	switch (base->type) {
> +	case TC_SETUP_QDISC_TAPRIO: {
> +		struct tc_taprio_caps *caps = base->caps;
> +
> +		caps->gate_mask_per_txq = true;
> +
> +		return 0;
> +	}
> +	default:
> +		return -EOPNOTSUPP;
> +	}
> +}
> +
>  static int igc_setup_tc(struct net_device *dev, enum tc_setup_type type,
>  			void *type_data)
>  {
>  	struct igc_adapter *adapter = netdev_priv(dev);
>  
>  	switch (type) {
> +	case TC_QUERY_CAPS:
> +		return igc_tc_query_caps(type_data);
>  	case TC_SETUP_QDISC_TAPRIO:
>  		return igc_tsn_enable_qbv_scheduling(adapter, type_data);
>  
> diff --git a/include/net/pkt_sched.h b/include/net/pkt_sched.h
> index ace8be520fb0..fd889fc4912b 100644
> --- a/include/net/pkt_sched.h
> +++ b/include/net/pkt_sched.h
> @@ -176,6 +176,7 @@ struct tc_mqprio_qopt_offload {
>  
>  struct tc_taprio_caps {
>  	bool supports_queue_max_sdu:1;
> +	bool gate_mask_per_txq:1;
>  };
>  
>  struct tc_taprio_sched_entry {
> diff --git a/net/sched/sch_taprio.c b/net/sched/sch_taprio.c
> index a3fa5debe513..58efa982db65 100644
> --- a/net/sched/sch_taprio.c
> +++ b/net/sched/sch_taprio.c
> @@ -1212,7 +1212,8 @@ static u32 tc_map_to_queue_mask(struct net_device *dev, u32 tc_mask)
>  
>  static void taprio_sched_to_offload(struct net_device *dev,
>  				    struct sched_gate_list *sched,
> -				    struct tc_taprio_qopt_offload *offload)
> +				    struct tc_taprio_qopt_offload *offload,
> +				    bool gate_mask_per_txq)
>  {
>  	struct sched_entry *entry;
>  	int i = 0;
> @@ -1226,7 +1227,11 @@ static void taprio_sched_to_offload(struct net_device *dev,
>  
>  		e->command = entry->command;
>  		e->interval = entry->interval;
> -		e->gate_mask = tc_map_to_queue_mask(dev, entry->gate_mask);
> +		if (gate_mask_per_txq)
> +			e->gate_mask = tc_map_to_queue_mask(dev,
> +							    entry->gate_mask);
> +		else
> +			e->gate_mask = entry->gate_mask;
>  
>  		i++;
>  	}
> @@ -1273,7 +1278,7 @@ static int taprio_enable_offload(struct net_device *dev,
>  	offload->enable = 1;
>  	if (mqprio)
>  		offload->mqprio.qopt = *mqprio;
> -	taprio_sched_to_offload(dev, sched, offload);
> +	taprio_sched_to_offload(dev, sched, offload, caps.gate_mask_per_txq);
>  
>  	for (tc = 0; tc < TC_MAX_QUEUE; tc++)
>  		offload->max_sdu[tc] = q->max_sdu[tc];
> -- 
> 2.34.1
>

-- 
Vinicius

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC PATCH net-next 03/11] net/sched: move struct tc_mqprio_qopt_offload from pkt_cls.h to pkt_sched.h
  2023-01-20 14:15 ` [RFC PATCH net-next 03/11] net/sched: move struct tc_mqprio_qopt_offload from pkt_cls.h to pkt_sched.h Vladimir Oltean
@ 2023-01-25 13:09   ` Kurt Kanzenbach
  2023-01-25 13:16     ` Vladimir Oltean
  0 siblings, 1 reply; 25+ messages in thread
From: Kurt Kanzenbach @ 2023-01-25 13:09 UTC (permalink / raw)
  To: Vladimir Oltean, netdev, John Fastabend
  Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Claudiu Manoil, Camelia Groza, Xiaoliang Yang, Gerhard Engleder,
	Vinicius Costa Gomes, Alexander Duyck, Ferenc Fejes, Tony Nguyen,
	Jesse Brandeburg, Jacob Keller

On Fri Jan 20 2023, Vladimir Oltean wrote:
> Since taprio is a scheduler and not a classifier, move its offload

s/taprio/mqprio/

Since mqprio is a .... ?

> structure to pkt_sched.h, where struct tc_taprio_qopt_offload also lies.
>
> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC PATCH net-next 00/11] ENETC mqprio/taprio cleanup
  2023-01-25  1:11 ` Vinicius Costa Gomes
@ 2023-01-25 13:10   ` Vladimir Oltean
  2023-01-25 22:47     ` Vinicius Costa Gomes
  0 siblings, 1 reply; 25+ messages in thread
From: Vladimir Oltean @ 2023-01-25 13:10 UTC (permalink / raw)
  To: Vinicius Costa Gomes
  Cc: netdev, John Fastabend, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Claudiu Manoil, Camelia Groza,
	Xiaoliang Yang, Gerhard Engleder, Alexander Duyck,
	Kurt Kanzenbach, Ferenc Fejes, Tony Nguyen, Jesse Brandeburg,
	Jacob Keller

On Tue, Jan 24, 2023 at 05:11:29PM -0800, Vinicius Costa Gomes wrote:
> Hi Vladimir,
> 
> Sorry for the delay. I had to sleep on this for a bit.

No problem, thanks for responding.

> > Vinicius said that for some Intel NICs, prioritization at the egress
> > scheduler stage is fundamentally attached to TX queues rather than
> > traffic classes.
> >
> > In other words, in the "popular" mqprio configuration documented by him:
> >
> > $ tc qdisc replace dev $IFACE parent root handle 100 mqprio \
> >       num_tc 3 \
> >       map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 \
> >       queues 1@0 1@1 2@2 \
> >       hw 0
> >
> > there are 3 Linux traffic classes and 4 TX queues. The TX queues are
> > organized in strict priority fashion, like this: TXQ 0 has highest prio
> > (hardware dequeue precedence for TX scheduler), TXQ 3 has lowest prio.
> > Packets classified by Linux to TC 2 are hashed between TXQ 2 and TXQ 3,
> > but the hardware has higher precedence for TXQ 2 over TXQ 3, and Linux
> > doesn't know that.
> >
> > I am surprised by this fact, and this isn't how ENETC works at all.
> > For ENETC, we try to prioritize on TCs rather than TXQs, and TC 7 has
> > higher priority than TC 0. For us, groups of TXQs that map to the same
> > TC have the same egress scheduling priority. It is possible (and maybe
> > useful) to have 2 TXQs per TC (one TXQ per CPU). Patch 07/11 tries to
> > make that more clear.
> 
> That makes me think that making "queues" visible on mqprio/taprio was
> perhaps a mistake. If we only had the "prio to tc" map and relied on
> drivers implementing .ndo_select_queue(), that would be less
> problematic. And for devices with tens or hundreds of queues, this "no
> queues exposed to the user" approach sounds like a better model.
> Anyway... just wondering.
> 
> Perhaps something to think about for mqprio/taprio/etc "the next generation" ;-)

Hmm, not sure I wanted to go there with my proposal. I think the fact
that taprio allows specifying how many TXQs are used per TC (and
starting with which TXQ offset) is a direct consequence of the fact that
mqprio had that in its UAPI. Today, mapping TXQs to TCs in a way that
makes software prioritization (netdev_core_pick_tx()) coincide with the
hardware prioritization scheme certainly requires hardware-level
knowledge. That requirement of prior knowledge makes a given
taprio/mqprio configuration less portable across systems/vendors, which
is the problem IMO.

But I wouldn't jump to your conclusion that it was a mistake to even
expose TXQs to user space. I would argue, maybe the problem is that not
*enough* information about TXQs is exposed to user space. I could
imagine it being useful for user space to be able to probe information
such as

- this netdev has strict prioritization (i.e. for this netdev, egress
  scheduling priority is attached directly to TXQs, and each TXQ is
  required to have a unique priority)

- this netdev has round robin egress scheduling between TXQs (or some
  other fairness scheme; which one?)
  - is the round robin scheduling weighted? what are the weights, and
    are they configurable? should skb_tx_hash() take the weights into
    consideration?

- this netdev has 2 layers of egress scheduling, first being strict
  priority and the other being round robin

Based on this kind of information, some kind of automation would become
possible to write an mqprio configuration that maps TCs to TXQs in a
portable way, and user space sockets are just concerned with the packet
priority API.

I guess different people would want the kernel to expose even more, or
slightly different, information for TXQs. I would be interested to know
what that is.
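
Nothing like this exists today, but as a sketch of the kind of
information I mean (all names are hypothetical):

enum netdev_txq_sched_policy {
	NETDEV_TXQ_SCHED_STRICT_PRIO,	/* each TXQ has a unique dequeue priority */
	NETDEV_TXQ_SCHED_ROUND_ROBIN,	/* equal-priority TXQs are served fairly */
	NETDEV_TXQ_SCHED_WRR,		/* weighted round robin */
};

struct netdev_txq_sched_info {
	enum netdev_txq_sched_policy policy;
	u32 prio;			/* dequeue priority, if strict prio */
	u32 weight;			/* if weighted round robin */
	bool weight_configurable;
};

User space (or some automation on top of it) could query this per TXQ
and derive a portable TC-to-TXQ mapping from it.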

> > Furthermore (and this is really the biggest point of contention), myself
> > and Vinicius have the fundamental disagreement whether the 802.1Qbv
> > (taprio) gate mask should be passed to the device driver per TXQ or per
> > TC. This is what patch 11/11 is about.
> 
> I think that I was being annoying because I believed that some
> implementation detail of the netdev prio_tc_map and the way that the
> netdev selects TX queues (the "core of how mqprio works") would leak,
> and that it would be easier/more correct to make other vendors adapt
> themselves to the "Intel"/"queues have priorities" model. But I stand
> corrected, as you (and others) have proven.

The problem with gates per TXQ is that it doesn't answer the obvious
question of how that works out when there is >1 TXQ per TC.
With the clarification that "gates per TXQ" requires that there is a
single TXQ per TC, this effectively becomes just a matter of changing
the indices of set bits in the gate mask (TC 3 may correspond to TXQ
offset 5), which is essentially what Gerhard seems to want to see with
tsnep. That is something I don't have a problem with.

But I may want, as a sanity measure, to enforce that the mqprio queue
count for each TC is no more than 1 ;) Otherwise, we fall into that
problem I keep repeating: skb_tx_hash() arbitrarily hashes between 2
TXQs, both have an open gate in software (allowing traffic to pass),
but in hardware, one TXQ has an open gate and the other has a closed gate.
So half the traffic goes into the bitbucket, because software doesn't
know what hardware does/expects.
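
To make the hazard concrete (hypothetical numbers): assume "queues 2@0",
i.e. TC 0 spans TXQ 0 and TXQ 1, and a schedule entry whose gate mask
the hardware interprets per TXQ:

	sched-entry S 0x1 300000	(bit 0 = TXQ 0 only)

During those 300 us, TXQ 0's gate is open and TXQ 1's is closed, yet
software sees TC 0 as open and skb_tx_hash() keeps spreading packets
across both queues, so the TXQ 1 half hits a closed gate.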

So please ACK this issue and my proposal to break your "popular" mqprio
configuration.

> In short, I am not opposed to this idea. This capability operation
> really opens some possibilities. The patches look clean.

Yeah, gotta thank Jakub for that.

> I'll play with the patches later in the week, quite swamped at this
> point.

Regarding the patches - I plan to send a v2 anyway, because patch 08/11
"net/sched: taprio: pass mqprio queue configuration to ndo_setup_tc()"
doesn't quite work how I'd hoped. Specifically, taprio must hold a
persistent struct tc_mqprio_qopt, rather than just juggling with what it
received last time via TCA_TAPRIO_ATTR_PRIOMAP. This is because taprio
supports some level of dynamic reconfiguration via taprio_change(), and
the TCA_TAPRIO_ATTR_PRIOMAP would be NULL when reconfigured (because the
priomap isn't what has changed). Currently this will result in passing a
NULL (actually disabled) mqprio configuration to ndo_setup_tc(), but
what I really want is to pass the *old* mqprio configuration.
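
Roughly, the fix I have in mind (a sketch, not the actual v2 patch)
amounts to one new member in taprio's private data:

	/* Hypothetical member of struct taprio_sched: the last priomap
	 * accepted via TCA_TAPRIO_ATTR_PRIOMAP, kept around so that a
	 * taprio_change() which doesn't touch the priomap can still pass
	 * a valid mqprio config to ndo_setup_tc() instead of a NULL one.
	 */
	struct tc_mqprio_qopt mqprio;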

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC PATCH net-next 03/11] net/sched: move struct tc_mqprio_qopt_offload from pkt_cls.h to pkt_sched.h
  2023-01-25 13:09   ` Kurt Kanzenbach
@ 2023-01-25 13:16     ` Vladimir Oltean
  0 siblings, 0 replies; 25+ messages in thread
From: Vladimir Oltean @ 2023-01-25 13:16 UTC (permalink / raw)
  To: Kurt Kanzenbach
  Cc: netdev, John Fastabend, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Claudiu Manoil, Camelia Groza,
	Xiaoliang Yang, Gerhard Engleder, Vinicius Costa Gomes,
	Alexander Duyck, Ferenc Fejes, Tony Nguyen, Jesse Brandeburg,
	Jacob Keller

On Wed, Jan 25, 2023 at 02:09:27PM +0100, Kurt Kanzenbach wrote:
> On Fri Jan 20 2023, Vladimir Oltean wrote:
> > Since taprio is a scheduler and not a classifier, move its offload
> 
> s/taprio/mqprio/
> 
> Since mqprio is a .... ?

That would be the muscle memory, I suppose. Thanks for spotting this;
I've made the change in my local patch.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC PATCH net-next 00/11] ENETC mqprio/taprio cleanup
  2023-01-25 13:10   ` Vladimir Oltean
@ 2023-01-25 22:47     ` Vinicius Costa Gomes
  2023-01-26 20:39       ` Vladimir Oltean
  0 siblings, 1 reply; 25+ messages in thread
From: Vinicius Costa Gomes @ 2023-01-25 22:47 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: netdev, John Fastabend, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Claudiu Manoil, Camelia Groza,
	Xiaoliang Yang, Gerhard Engleder, Alexander Duyck,
	Kurt Kanzenbach, Ferenc Fejes, Tony Nguyen, Jesse Brandeburg,
	Jacob Keller

Vladimir Oltean <vladimir.oltean@nxp.com> writes:

> On Tue, Jan 24, 2023 at 05:11:29PM -0800, Vinicius Costa Gomes wrote:
>> Hi Vladimir,
>> 
>> Sorry for the delay. I had to sleep on this for a bit.
>
> No problem, thanks for responding.
>
>> > Vinicius said that for some Intel NICs, prioritization at the egress
>> > scheduler stage is fundamentally attached to TX queues rather than
>> > traffic classes.
>> >
>> > In other words, in the "popular" mqprio configuration documented by him:
>> >
>> > $ tc qdisc replace dev $IFACE parent root handle 100 mqprio \
>> >       num_tc 3 \
>> >       map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 \
>> >       queues 1@0 1@1 2@2 \
>> >       hw 0
>> >
>> > there are 3 Linux traffic classes and 4 TX queues. The TX queues are
>> > organized in strict priority fashion, like this: TXQ 0 has highest prio
>> > (hardware dequeue precedence for TX scheduler), TXQ 3 has lowest prio.
>> > Packets classified by Linux to TC 2 are hashed between TXQ 2 and TXQ 3,
>> > but the hardware has higher precedence for TXQ 2 over TXQ 3, and Linux
>> > doesn't know that.
>> >
>> > I am surprised by this fact, and this isn't how ENETC works at all.
>> > For ENETC, we try to prioritize on TCs rather than TXQs, and TC 7 has
>> > higher priority than TC 0. For us, groups of TXQs that map to the same
>> > TC have the same egress scheduling priority. It is possible (and maybe
>> > useful) to have 2 TXQs per TC (one TXQ per CPU). Patch 07/11 tries to
>> > make that more clear.
>> 
>> That makes me think that making "queues" visible on mqprio/taprio was
>> perhaps a mistake. If we only had the "prio to tc" map and relied on
>> drivers implementing .ndo_select_queue(), that would be less
>> problematic. And for devices with tens or hundreds of queues, this "no
>> queues exposed to the user" approach sounds like a better model.
>> Anyway... just wondering.
>> 
>> Perhaps something to think about for mqprio/taprio/etc "the next generation" ;-)
>
> Hmm, not sure I wanted to go there with my proposal. I think the fact
> that taprio allows specifying how many TXQs are used per TC (and
> starting with which TXQ offset) is a direct consequence of the fact that
> mqprio had that in its UAPI. Today, mapping TXQs to TCs in a way that
> makes software prioritization (netdev_core_pick_tx()) coincide with the
> hardware prioritization scheme certainly requires hardware-level
> knowledge. That requirement of prior knowledge makes a given
> taprio/mqprio configuration less portable across systems/vendors, which
> is the problem IMO.
>
> But I wouldn't jump to your conclusion that it was a mistake to even
> expose TXQs to user space. I would argue, maybe the problem is that not
> *enough* information about TXQs is exposed to user space. I could
> imagine it being useful for user space to be able to probe information
> such as
>
> - this netdev has strict prioritization (i.e. for this netdev, egress
>   scheduling priority is attached directly to TXQs, and each TXQ is
>   required to have a unique priority)
>
> - this netdev has round robin egress scheduling between TXQs (or some
>   other fairness scheme; which one?)
>   - is the round robin scheduling weighted? what are the weights, and
>     are they configurable? should skb_tx_hash() take the weights into
>     consideration?
>
> - this netdev has 2 layers of egress scheduling, first being strict
>   priority and the other being round robin
>
> Based on this kind of information, some kind of automation would become
> possible to write an mqprio configuration that maps TCs to TXQs in a
> portable way, and user space sockets are just concerned with the packet
> priority API.
>
> I guess different people would want the kernel to expose even more, or
> slightly different, information for TXQs. I would be interested to know
> what that is.

Sounds interesting. Instead of "hiding" information from the user and
trusting the driver to do the right thing, we would expose enough
information for the user to configure it correctly. That could work.

>
>> > Furthermore (and this is really the biggest point of contention), myself
>> > and Vinicius have the fundamental disagreement whether the 802.1Qbv
>> > (taprio) gate mask should be passed to the device driver per TXQ or per
>> > TC. This is what patch 11/11 is about.
>> 
>> I think that I was being annoying because I believed that some
>> implementation detail of the netdev prio_tc_map and the way that the
>> netdev selects TX queues (the "core of how mqprio works") would leak,
>> and that it would be easier/more correct to make other vendors adapt
>> themselves to the "Intel"/"queues have priorities" model. But I stand
>> corrected, as you (and others) have proven.
>
> The problem with gates per TXQ is that it doesn't answer the obvious
> question of how that works out when there is >1 TXQ per TC.
> With the clarification that "gates per TXQ" requires that there is a
> single TXQ per TC, this effectively becomes just a matter of changing
> the indices of set bits in the gate mask (TC 3 may correspond to TXQ
> offset 5), which is essentially what Gerhard seems to want to see with
> tsnep. That is something I don't have a problem with.
>
> But I may want, as a sanity measure, to enforce that the mqprio queue
> count for each TC is no more than 1 ;) Otherwise, we fall into that
> problem I keep repeating: skb_tx_hash() arbitrarily hashes between 2
> TXQs, both have an open gate in software (allowing traffic to pass),
> but in hardware, one TXQ has an open gate and the other has a closed gate.
> So half the traffic goes into the bitbucket, because software doesn't
> know what hardware does/expects.
>
> So please ACK this issue and my proposal to break your "popular" mqprio
> configuration.

I am afraid that I cannot give my ACK for that; it is, by some
definition, a breaking change. A config that has been working for many
years would stop working.

I know that is not ideal; perhaps we could use the capabilities "trick"
to help minimize the breakage? I.e., add a capability indicating whether
the device supports (or whether it makes sense to have) multiple TXQs
handling a single TC?

Would it help?
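
In the spirit of the TC_QUERY_CAPS mechanism from patch 11/11, that
could be as simple as one more bit (the field name below is made up):

	struct tc_taprio_caps {
		bool supports_queue_max_sdu:1;
		bool gate_mask_per_txq:1;
		/* hypothetical: set by drivers whose TX scheduler cannot
		 * treat multiple TXQs within one TC equally, so that the
		 * "queues N@X with N > 1" case is rejected only for them
		 */
		bool single_txq_per_tc:1;
	};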

>
>> In short, I am not opposed to this idea. This capability operation
>> really opens some possibilities. The patches look clean.
>
> Yeah, gotta thank Jakub for that.
>
>> I'll play with the patches later in the week, quite swamped at this
>> point.
>
> Regarding the patches - I plan to send a v2 anyway, because patch 08/11
> "net/sched: taprio: pass mqprio queue configuration to ndo_setup_tc()"
> doesn't quite work how I'd hoped. Specifically, taprio must hold a
> persistent struct tc_mqprio_qopt, rather than just juggling with what it
> received last time via TCA_TAPRIO_ATTR_PRIOMAP. This is because taprio
> supports some level of dynamic reconfiguration via taprio_change(), and
> the TCA_TAPRIO_ATTR_PRIOMAP would be NULL when reconfigured (because the
> priomap isn't what has changed). Currently this will result in passing a
> NULL (actually disabled) mqprio configuration to ndo_setup_tc(), but
> what I really want is to pass the *old* mqprio configuration.

Cool. Will play with v2, then.


Cheers,
-- 
Vinicius

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC PATCH net-next 00/11] ENETC mqprio/taprio cleanup
  2023-01-25 22:47     ` Vinicius Costa Gomes
@ 2023-01-26 20:39       ` Vladimir Oltean
  0 siblings, 0 replies; 25+ messages in thread
From: Vladimir Oltean @ 2023-01-26 20:39 UTC (permalink / raw)
  To: Vinicius Costa Gomes
  Cc: netdev, John Fastabend, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Claudiu Manoil, Camelia Groza,
	Xiaoliang Yang, Gerhard Engleder, Alexander Duyck,
	Kurt Kanzenbach, Ferenc Fejes, Tony Nguyen, Jesse Brandeburg,
	Jacob Keller

On Wed, Jan 25, 2023 at 02:47:28PM -0800, Vinicius Costa Gomes wrote:
> > The problem with gates per TXQ is that it doesn't answer the obvious
> > question of how that works out when there is >1 TXQ per TC.
> > With the clarification that "gates per TXQ" requires that there is a
> > single TXQ per TC, this effectively becomes just a matter of changing
> > the indices of set bits in the gate mask (TC 3 may correspond to TXQ
> > offset 5), which is essentially what Gerhard seems to want to see with
> > tsnep. That is something I don't have a problem with.
> >
> > But I may want, as a sanity measure, to enforce that the mqprio queue
> > count for each TC is no more than 1 ;) Otherwise, we fall into that
> > problem I keep repeating: skb_tx_hash() arbitrarily hashes between 2
> > TXQs, both have an open gate in software (allowing traffic to pass),
> > but in hardware, one TXQ has an open gate and the other has a closed gate.
> > So half the traffic goes into the bitbucket, because software doesn't
> > know what hardware does/expects.
> >
> > So please ACK this issue and my proposal to break your "popular" mqprio
> > configuration.
> 
> I am afraid that I cannot give my ACK for that; it is, by some
> definition, a breaking change. A config that has been working for many
> years would stop working.
> 
> I know that is not ideal; perhaps we could use the capabilities "trick"
> to help minimize the breakage? I.e., add a capability indicating whether
> the device supports (or whether it makes sense to have) multiple TXQs
> handling a single TC?
> 
> Would it help?

Not having multiple TXQs handling a single TC (that is fine), but having
multiple TXQs of different priorities handling a single TC...

So how does it work with igc? What exactly are we keeping alive?

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2023-01-26 20:40 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-01-20 14:15 [RFC PATCH net-next 00/11] ENETC mqprio/taprio cleanup Vladimir Oltean
2023-01-20 14:15 ` [RFC PATCH net-next 01/11] net/sched: mqprio: refactor nlattr parsing to a separate function Vladimir Oltean
2023-01-20 14:15 ` [RFC PATCH net-next 02/11] net/sched: mqprio: refactor offloading and unoffloading to dedicated functions Vladimir Oltean
2023-01-20 14:15 ` [RFC PATCH net-next 03/11] net/sched: move struct tc_mqprio_qopt_offload from pkt_cls.h to pkt_sched.h Vladimir Oltean
2023-01-25 13:09   ` Kurt Kanzenbach
2023-01-25 13:16     ` Vladimir Oltean
2023-01-20 14:15 ` [RFC PATCH net-next 04/11] net/sched: mqprio: allow offloading drivers to request queue count validation Vladimir Oltean
2023-01-20 14:15 ` [RFC PATCH net-next 05/11] net/sched: mqprio: add extack messages for " Vladimir Oltean
2023-01-20 14:15 ` [RFC PATCH net-next 06/11] net: enetc: request mqprio to validate the queue counts Vladimir Oltean
2023-01-20 14:15 ` [RFC PATCH net-next 07/11] net: enetc: act upon the requested mqprio queue configuration Vladimir Oltean
2023-01-20 14:15 ` [RFC PATCH net-next 08/11] net/sched: taprio: pass mqprio queue configuration to ndo_setup_tc() Vladimir Oltean
2023-01-20 14:15 ` [RFC PATCH net-next 09/11] net: enetc: act upon mqprio queue config in taprio offload Vladimir Oltean
2023-01-20 14:15 ` [RFC PATCH net-next 10/11] net/sched: taprio: validate that gate mask does not exceed number of TCs Vladimir Oltean
2023-01-20 14:15 ` [RFC PATCH net-next 11/11] net/sched: taprio: only calculate gate mask per TXQ for igc Vladimir Oltean
2023-01-25  1:11   ` Vinicius Costa Gomes
2023-01-23 18:22 ` [RFC PATCH net-next 00/11] ENETC mqprio/taprio cleanup Jacob Keller
2023-01-24 14:26   ` Vladimir Oltean
2023-01-24 22:30     ` Jacob Keller
2023-01-23 21:21 ` Gerhard Engleder
2023-01-23 21:31   ` Vladimir Oltean
2023-01-23 22:20     ` Gerhard Engleder
2023-01-25  1:11 ` Vinicius Costa Gomes
2023-01-25 13:10   ` Vladimir Oltean
2023-01-25 22:47     ` Vinicius Costa Gomes
2023-01-26 20:39       ` Vladimir Oltean
