[PATCH v3 net-next 0/7] tc-taprio offload for SJA1105 DSA

* [PATCH v3 net-next 0/7] tc-taprio offload for SJA1105 DSA
@ 2019-09-15  1:53 Vladimir Oltean
  2019-09-15  1:53 ` [PATCH v3 net-next 1/7] taprio: Add support for hardware offloading Vladimir Oltean
                   ` (7 more replies)
  0 siblings, 8 replies; 9+ messages in thread
From: Vladimir Oltean @ 2019-09-15  1:53 UTC (permalink / raw)
  To: f.fainelli, vivien.didelot, andrew, davem, vinicius.gomes,
	vedang.patel, richardcochran
  Cc: weifeng.voon, jiri, m-karicheri2, Jose.Abreu, ilias.apalodimas,
	jhs, xiyou.wangcong, kurt.kanzenbach, joergen.andreasen, netdev,
	Vladimir Oltean

This is the second attempt to submit the tc-taprio offload model for
inclusion in the net tree. The sja1105 switch driver will provide the
first implementation of the offload. Only the bare minimum is added:

- The offload model and a DSA pass-through
- The hardware implementation
- The interaction with the netdev queues in the tagger code
- Documentation

What has been removed from the first attempt is support for
PTP-as-clocksource in sja1105.  This will be added as soon as the
offload model is settled.

Vinicius Costa Gomes (1):
  taprio: Add support for hardware offloading

Vladimir Oltean (6):
  net: dsa: Pass ndo_setup_tc slave callback to drivers
  net: dsa: sja1105: Add static config tables for scheduling
  net: dsa: sja1105: Advertise the 8 TX queues
  net: dsa: sja1105: Make HOSTPRIO a kernel config
  net: dsa: sja1105: Configure the Time-Aware Scheduler via tc-taprio
    offload
  docs: net: dsa: sja1105: Add info about the time-aware scheduler

 Documentation/networking/dsa/sja1105.rst      |  90 ++++
 drivers/net/dsa/sja1105/Kconfig               |  17 +
 drivers/net/dsa/sja1105/Makefile              |   4 +
 drivers/net/dsa/sja1105/sja1105.h             |   6 +
 .../net/dsa/sja1105/sja1105_dynamic_config.c  |   8 +
 drivers/net/dsa/sja1105/sja1105_main.c        |  28 +-
 .../net/dsa/sja1105/sja1105_static_config.c   | 167 +++++++
 .../net/dsa/sja1105/sja1105_static_config.h   |  48 +-
 drivers/net/dsa/sja1105/sja1105_tas.c         | 427 ++++++++++++++++++
 drivers/net/dsa/sja1105/sja1105_tas.h         |  42 ++
 include/linux/netdevice.h                     |   1 +
 include/net/dsa.h                             |   2 +
 include/net/pkt_sched.h                       |  23 +
 include/uapi/linux/pkt_sched.h                |   3 +-
 net/dsa/slave.c                               |  12 +-
 net/dsa/tag_sja1105.c                         |   3 +-
 net/sched/sch_taprio.c                        | 409 +++++++++++++++--
 17 files changed, 1237 insertions(+), 53 deletions(-)
 create mode 100644 drivers/net/dsa/sja1105/sja1105_tas.c
 create mode 100644 drivers/net/dsa/sja1105/sja1105_tas.h

-- 

For those who want to follow along with the hardware implementation, the
manual is here:
https://www.nxp.com/docs/en/user-guide/UM10944.pdf

Notable changes in v2:
- Made the series independent from PTP (which is temporarily removed)
- Changed the meaning of the gate_mask - it is now acting on traffic
  classes even in the view exposed by taprio to drivers.
- Removed the next_sched hrtimer.
- Summarized one of the responses given to Vinicius into a new
  documentation section.

The first version of the net-next patch series can be found here:

https://www.spinics.net/lists/netdev/msg597214.html

Changes in the first version of the net-next series compared to RFC v2:
- Made "flags 1" and "flags 2" mutually exclusive in the taprio qdisc
- Moved taprio_enable_offload and taprio_disable_offload out of atomic
  context - spin_lock_bh(qdisc_lock(sch)). This allows drivers that
  implement the ndo_setup_tc to sleep and for taprio memory to be
  allocated with GFP_KERNEL. The only thing that was kept under the
  spinlock is the assignment of the q->dequeue and q->peek pointers.
- Finally making proper use of own API - added a taprio_alloc helper to
  avoid passing stack memory to drivers.

The second version of the RFC is at:
https://www.spinics.net/lists/netdev/msg596663.html

Changes in v2 of the RFC since v1:
- Adapted the taprio offload patch to work by specifying "flags 2" to
  the iproute2-next tc. At the moment I don't clearly understand whether
  the full offload and the txtime assist ("flags 1") are mutually
  exclusive or not (i.e. whether a "flags 3" mode should be rejected,
  which it currently isn't).
- Added reference counting to the taprio offload structure. Maybe the
  function names and placement could have been better though. As for the
  other complaint (cycle time calculation) it got fixed in the taprio
  parser in the meantime.
- Converted sja1105 to use the hardware PTP registers, and save/restore
  the PTP time across resets.
- Made the DSA callback for ndo_setup_tc a bit more generic, but I don't
  know whether it fulfills expectations. Drivers still can't do blocking
  operations in its execution context.
- Added a state machine for starting/stopping the scheduler based on the
  last command run on the PTP clock.

The first RFC from July can be seen at:
https://lists.openwall.net/netdev/2019/07/07/81

Original cover letter:

Using Vinicius Costa Gomes' configuration interface for 802.1Qbv (later
resent by Voon Weifeng for the stmmac driver), I am submitting for
review a draft implementation of this offload for a DSA switch.

I don't want to insist too much on the hardware specifics of SJA1105
which isn't otherwise very compliant to the IEEE spec.

In order to be able to test with Vedang Patel's iproute2 patch for
taprio offload (https://www.spinics.net/lists/netdev/msg573072.html)
I had to actually revert the txtime-assist branch as it had changed the
iproute2 interface.

In terms of impact for DSA drivers, I would like to point out that:

- Maybe somebody should pre-populate qopt->cycle_time in case the user
  does not provide one. Otherwise each driver needs to iterate over the
  GCL once, just to set the cycle time (right now stmmac does as well).

- Configuring the switch over SPI cannot apparently be done from this
  ndo_setup_tc callback because it runs in atomic context. I also have
  some downstream patches to offload tc clsact matchall with mirred
  action, but in that case it looks like the atomic context restriction
  does not apply.

- I had to copy the struct tc_taprio_qopt_offload to driver private
  memory because a static config needs to be constructed every time a
  change takes place, and there are up to 4 switch ports that may take a
  TAS configuration. I have created a private
  tc_taprio_qopt_offload_copy() helper for this - I don't know whether
  it's of any help in the general case.

There is more to be done however. The TAS needs to be integrated with
the PTP driver. This is because with a PTP clock source, the base time
is written dynamically to the PTPSCHTM (PTP schedule time) register and
must be a time in the future. Then the "real" base time of each port's
TAS config can be offset by at most ~50 ms (the DELTA field from the
Schedule Entry Points Table) relative to PTPSCHTM.
Because base times in the past are completely ignored by this hardware,
we need to decide if it's ok behaviorally for a driver to "roll" a past
base time into the immediate future by incrementally adding the cycle
time (so the phase doesn't change). If it is, then decide by how long in
the future it is ok to do so. Or alternatively, is it preferable if the
driver errors out if the user-supplied base time is in the past and the
hardware doesn't like it? But even then, there might be fringe cases
when the base time becomes a past PTP time right as the driver tries to
apply the config.
Also applying a tc-taprio offload to a second SJA1105 switch port will
inevitably need to roll the first port's (now past) base time into an
equivalent future time.
All of this is going to be complicated even further by the fact that
resetting the switch (to apply the tc-taprio offload) makes it reset its
PTP time.

2.17.1

^ permalink raw reply	[flat|nested] 9+ messages in thread