Netdev Archive on lore.kernel.org
* [PATCH v1 net-next 00/15] tc-taprio offload for SJA1105 DSA
@ 2019-09-02 16:25 Vladimir Oltean
From: Vladimir Oltean @ 2019-09-02 16:25 UTC (permalink / raw)
  To: f.fainelli, vivien.didelot, andrew, davem, vinicius.gomes,
	vedang.patel, richardcochran
  Cc: weifeng.voon, jiri, m-karicheri2, Jose.Abreu, ilias.apalodimas,
	jhs, xiyou.wangcong, kurt.kanzenbach, netdev, Vladimir Oltean

This is the first attempt to submit the tc-taprio offload model for
inclusion in the net-next tree.

Changes in this version:
- Made "flags 1" and "flags 2" mutually exclusive in the taprio qdisc
- Moved taprio_enable_offload and taprio_disable_offload out of the
  atomic context created by spin_lock_bh(qdisc_lock(sch)). This allows
  drivers that implement ndo_setup_tc to sleep, and allows taprio memory
  to be allocated with GFP_KERNEL. Only the assignment of the q->dequeue
  and q->peek pointers is still done under the spinlock.
- Made proper use of our own API: added a taprio_alloc helper to avoid
  passing stack memory to drivers.
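As a rough illustration of making "flags 1" and "flags 2" mutually exclusive, here is a minimal userspace sketch of the kind of check a qdisc could apply. The macro names and the taprio_flags_valid() helper are hypothetical; only the values 1 (txtime assist) and 2 (full offload) come from the text above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names; values follow tc's "flags 1" and "flags 2". */
#define TAPRIO_FLAG_TXTIME_ASSIST	(1u << 0)	/* "flags 1" */
#define TAPRIO_FLAG_FULL_OFFLOAD	(1u << 1)	/* "flags 2" */
#define TAPRIO_FLAGS_KNOWN \
	(TAPRIO_FLAG_TXTIME_ASSIST | TAPRIO_FLAG_FULL_OFFLOAD)

static_assert((TAPRIO_FLAG_TXTIME_ASSIST & TAPRIO_FLAG_FULL_OFFLOAD) == 0,
	      "each mode must use its own bit");

/* Returns true if the requested flags form a valid combination. */
static bool taprio_flags_valid(uint32_t flags)
{
	/* Unknown bits are rejected outright. */
	if (flags & ~TAPRIO_FLAGS_KNOWN)
		return false;

	/* "flags 3" asks for txtime assist and full offload at once. */
	if ((flags & TAPRIO_FLAGS_KNOWN) == TAPRIO_FLAGS_KNOWN)
		return false;

	return true;
}
```

With this check, 0 (pure software), 1 and 2 are accepted, while 3 and any unknown bit are rejected.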

The first RFC from July can be seen at:
https://lists.openwall.net/netdev/2019/07/07/81

The second version of the RFC is at:
https://www.spinics.net/lists/netdev/msg596663.html

Changes in v2 of the RFC since v1:
- Adapted the taprio offload patch so that the offload is requested by
  passing "flags 2" to tc from iproute2-next. At the moment I don't
  clearly understand whether the full offload and the txtime assist
  ("flags 1") are mutually exclusive or not (i.e. whether a "flags 3"
  mode should be rejected, which it currently isn't).
- Added reference counting to the taprio offload structure. The function
  names and placement could perhaps have been better, though. The other
  complaint (cycle time calculation) has since been fixed in the taprio
  parser.
- Converted sja1105 to use the hardware PTP registers, and save/restore
  the PTP time across resets.
- Made the DSA callback for ndo_setup_tc a bit more generic, but I don't
  know whether it fulfills expectations. Drivers still can't do blocking
  operations in its execution context.
- Added a state machine for starting/stopping the scheduler based on the
  last command run on the PTP clock.

For those who want to follow along with the hardware implementation, the
manual is here:
https://www.nxp.com/docs/en/user-guide/UM10944.pdf

Original cover letter:

Using Vinicius Costa Gomes' configuration interface for 802.1Qbv (later
resent by Voon Weifeng for the stmmac driver), I am submitting for
review a draft implementation of this offload for a DSA switch.

I don't want to dwell too much on the hardware specifics of the
SJA1105, which is otherwise not very compliant with the IEEE spec.

To be able to test with Vedang Patel's iproute2 patch for taprio
offload (https://www.spinics.net/lists/netdev/msg573072.html), I had to
revert the txtime-assist branch, since it had changed the iproute2
interface.

In terms of impact for DSA drivers, I would like to point out that:

- Maybe somebody should pre-populate qopt->cycle_time in case the user
  does not provide one. Otherwise each driver needs to iterate over the
  GCL once, just to set the cycle time (stmmac currently does this too).

- Apparently, the switch cannot be configured over SPI from this
  ndo_setup_tc callback, because it runs in atomic context. I also have
  some downstream patches to offload tc clsact matchall with the mirred
  action, but in that case the atomic context restriction does not seem
  to apply.

- I had to copy the struct tc_taprio_qopt_offload to driver private
  memory because a static config needs to be constructed every time a
  change takes place, and there are up to 4 switch ports that may take a
  TAS configuration. I have created a private
  tc_taprio_qopt_offload_copy() helper for this - I don't know whether
  it's of any help in the general case.
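A minimal sketch of the cycle time fallback from the first point, assuming a simplified, hypothetical gate control list entry (struct gcl_entry) rather than the real offload structure: when no cycle time is given, it is taken to be the sum of all entry intervals.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified GCL entry (not the kernel's structure). */
struct gcl_entry {
	uint32_t gate_mask;	/* bit N set: traffic class N may transmit */
	uint32_t interval_ns;	/* duration of this entry */
};

/* Return the user-supplied cycle time if there is one (non-zero),
 * otherwise the sum of all GCL entry intervals.
 */
static uint64_t gcl_cycle_time(uint64_t user_cycle_time,
			       const struct gcl_entry *entries,
			       size_t num_entries)
{
	uint64_t sum = 0;
	size_t i;

	if (user_cycle_time)
		return user_cycle_time;

	assert(entries != NULL || num_entries == 0);

	for (i = 0; i < num_entries; i++)
		sum += entries[i].interval_ns;

	return sum;
}
```

Doing this once in the qdisc, instead of in every driver, is what the point above suggests.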

There is more to be done, however. The TAS needs to be integrated with
the PTP driver. This is because, with a PTP clock source, the base time
is written dynamically to the PTPSCHTM (PTP schedule time) register and
must be a time in the future. The "real" base time of each port's TAS
config can then be offset by at most ~50 ms (the DELTA field from the
Schedule Entry Points Table) relative to PTPSCHTM.

Because this hardware completely ignores base times in the past, we
need to decide whether it is behaviorally acceptable for a driver to
"roll" a past base time into the immediate future by repeatedly adding
the cycle time (so the phase doesn't change). If it is, we also need to
decide how far into the future it is acceptable to do so. Alternatively,
would it be preferable for the driver to error out when the
user-supplied base time is in the past and the hardware can't accept
it? But even then, there might be edge cases where the base time becomes
a past PTP time right as the driver tries to apply the config.

Also, applying a tc-taprio offload to a second SJA1105 switch port will
inevitably require rolling the first port's (now past) base time into an
equivalent future time. All of this is going to be complicated even
further by the fact that resetting the switch (to apply the tc-taprio
offload) also resets its PTP time.
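The "roll forward" option can be sketched as follows, assuming all times are nanoseconds on the PTP timescale. roll_base_time() is a hypothetical helper, not driver code: it advances a past base time by the smallest whole number of cycles that brings it to or after `now`, so base_time mod cycle_time is unchanged.

```c
#include <assert.h>
#include <stdint.h>

/* Advance a past base_time by whole cycles until it is >= now.
 * All values are nanoseconds; the phase (base_time mod cycle_time)
 * is preserved. Future base times are returned unchanged.
 */
static int64_t roll_base_time(int64_t base_time, int64_t cycle_time,
			      int64_t now)
{
	int64_t cycles;

	assert(cycle_time > 0);

	if (base_time >= now)
		return base_time;

	/* Smallest n such that base_time + n * cycle_time >= now */
	cycles = (now - base_time + cycle_time - 1) / cycle_time;

	return base_time + cycles * cycle_time;
}
```

It does not address how far into the future the result may be, which the text above leaves open.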

Vinicius Costa Gomes (1):
  taprio: Add support for hardware offloading

Vladimir Oltean (14):
  net: dsa: sja1105: Change the PTP command access pattern
  net: dsa: sja1105: Get rid of global declaration of struct
    ptp_clock_info
  net: dsa: sja1105: Switch to hardware operations for PTP
  net: dsa: sja1105: Implement the .gettimex64 system call for PTP
  net: dsa: sja1105: Restore PTP time after switch reset
  net: dsa: sja1105: Disallow management xmit during switch reset
  net: dsa: sja1105: Move PTP data to its own private structure
  net: dsa: sja1105: Advertise the 8 TX queues
  net: dsa: Pass ndo_setup_tc slave callback to drivers
  net: dsa: sja1105: Add static config tables for scheduling
  net: dsa: sja1105: Configure the Time-Aware Scheduler via tc-taprio
    offload
  net: dsa: sja1105: Make HOSTPRIO a kernel config
  net: dsa: sja1105: Make the PTP command read-write
  net: dsa: sja1105: Implement state machine for TAS with PTP clock
    source

 drivers/net/dsa/sja1105/Kconfig               |  17 +
 drivers/net/dsa/sja1105/Makefile              |   4 +
 drivers/net/dsa/sja1105/sja1105.h             |  36 +-
 .../net/dsa/sja1105/sja1105_dynamic_config.c  |   8 +
 drivers/net/dsa/sja1105/sja1105_main.c        |  94 +-
 drivers/net/dsa/sja1105/sja1105_ptp.c         | 345 ++++----
 drivers/net/dsa/sja1105/sja1105_ptp.h         | 103 ++-
 drivers/net/dsa/sja1105/sja1105_spi.c         |  58 +-
 .../net/dsa/sja1105/sja1105_static_config.c   | 167 ++++
 .../net/dsa/sja1105/sja1105_static_config.h   |  48 +-
 drivers/net/dsa/sja1105/sja1105_tas.c         | 830 ++++++++++++++++++
 drivers/net/dsa/sja1105/sja1105_tas.h         |  69 ++
 include/linux/netdevice.h                     |   1 +
 include/net/dsa.h                             |   2 +
 include/net/pkt_sched.h                       |  33 +
 include/uapi/linux/pkt_sched.h                |   3 +-
 net/dsa/slave.c                               |  12 +-
 net/dsa/tag_sja1105.c                         |   3 +-
 net/sched/sch_taprio.c                        | 278 +++++-
 19 files changed, 1886 insertions(+), 225 deletions(-)
 create mode 100644 drivers/net/dsa/sja1105/sja1105_tas.c
 create mode 100644 drivers/net/dsa/sja1105/sja1105_tas.h

-- 
2.17.1



* [PATCH v1 net-next 01/15] net: dsa: sja1105: Change the PTP command access pattern
From: Vladimir Oltean @ 2019-09-02 16:25 UTC (permalink / raw)
  To: f.fainelli, vivien.didelot, andrew, davem, vinicius.gomes,
	vedang.patel, richardcochran
  Cc: weifeng.voon, jiri, m-karicheri2, Jose.Abreu, ilias.apalodimas,
	jhs, xiyou.wangcong, kurt.kanzenbach, netdev, Vladimir Oltean

The PTP command register contains enable bits for:
- Putting the 64-bit PTPCLKVAL register in add/subtract or write mode
- Taking timestamps from the corrected vs. the free-running clock
- Starting/stopping the TTEthernet scheduling
- Starting/stopping PPS output
- Resetting the switch

When a command needs to be issued (e.g. "change PTPCLKVAL from write
mode to add/subtract mode"), one cannot simply write to the command
register with the PTPCLKADD bit set to 1, because that would zero out
the other settings. One also cannot do a read-modify-write (that would
be too easy for this hardware), because not all bits of the command
register are readable over SPI.

This leaves only one option: keep the value of the PTP command register
in the driver, and operate on that.

Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
---
Changes since RFC:
- None.

 drivers/net/dsa/sja1105/sja1105.h     | 5 +++++
 drivers/net/dsa/sja1105/sja1105_ptp.c | 6 +-----
 2 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/drivers/net/dsa/sja1105/sja1105.h b/drivers/net/dsa/sja1105/sja1105.h
index 78094db32622..d8a92646e80a 100644
--- a/drivers/net/dsa/sja1105/sja1105.h
+++ b/drivers/net/dsa/sja1105/sja1105.h
@@ -50,6 +50,10 @@ struct sja1105_regs {
 	u64 qlevel[SJA1105_NUM_PORTS];
 };
 
+struct sja1105_ptp_cmd {
+	u64 resptp;		/* reset */
+};
+
 struct sja1105_info {
 	u64 device_id;
 	/* Needed for distinction between P and R, and between Q and S
@@ -89,6 +93,7 @@ struct sja1105_private {
 	struct spi_device *spidev;
 	struct dsa_switch *ds;
 	struct sja1105_port ports[SJA1105_NUM_PORTS];
+	struct sja1105_ptp_cmd ptp_cmd;
 	struct ptp_clock_info ptp_caps;
 	struct ptp_clock *clock;
 	/* The cycle counter translates the PTP timestamps (based on
diff --git a/drivers/net/dsa/sja1105/sja1105_ptp.c b/drivers/net/dsa/sja1105/sja1105_ptp.c
index d8e8dd59f3d1..07374ba6b9be 100644
--- a/drivers/net/dsa/sja1105/sja1105_ptp.c
+++ b/drivers/net/dsa/sja1105/sja1105_ptp.c
@@ -54,10 +54,6 @@
 #define cc_to_sja1105(d) container_of((d), struct sja1105_private, tstamp_cc)
 #define dw_to_sja1105(d) container_of((d), struct sja1105_private, refresh_work)
 
-struct sja1105_ptp_cmd {
-	u64 resptp;       /* reset */
-};
-
 int sja1105_get_ts_info(struct dsa_switch *ds, int port,
 			struct ethtool_ts_info *info)
 {
@@ -218,8 +214,8 @@ int sja1105_ptpegr_ts_poll(struct sja1105_private *priv, int port, u64 *ts)
 
 int sja1105_ptp_reset(struct sja1105_private *priv)
 {
+	struct sja1105_ptp_cmd cmd = priv->ptp_cmd;
 	struct dsa_switch *ds = priv->ds;
-	struct sja1105_ptp_cmd cmd = {0};
 	int rc;
 
 	mutex_lock(&priv->ptp_lock);
-- 
2.17.1



* [PATCH v1 net-next 02/15] net: dsa: sja1105: Get rid of global declaration of struct ptp_clock_info
From: Vladimir Oltean @ 2019-09-02 16:25 UTC (permalink / raw)
  To: f.fainelli, vivien.didelot, andrew, davem, vinicius.gomes,
	vedang.patel, richardcochran
  Cc: weifeng.voon, jiri, m-karicheri2, Jose.Abreu, ilias.apalodimas,
	jhs, xiyou.wangcong, kurt.kanzenbach, netdev, Vladimir Oltean

We need priv->ptp_caps to hold a structure and not just a pointer,
because we use container_of in the various PTP callbacks.

Therefore, the sja1105_ptp_caps structure declared in the global memory
of the driver serves no further purpose after copying it into
priv->ptp_caps.

So just populate priv->ptp_caps with the needed operations and remove
sja1105_ptp_caps.
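Why an embedded structure is required can be shown with a userspace sketch. The container_of() macro below is a simplified version of the kernel's (without its type checks), and struct clock_ops / struct priv are hypothetical stand-ins for struct ptp_clock_info and struct sja1105_private. Because the callback receives a pointer to the member embedded in the private structure, subtracting the member's offset recovers the private structure; a pointer to a shared global structure would not allow this.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified userspace container_of(), without type checking. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-in for struct ptp_clock_info. */
struct clock_ops {
	int (*gettime)(struct clock_ops *ops, long long *ns);
};

/* Hypothetical stand-in for struct sja1105_private. */
struct priv {
	long long now_ns;
	struct clock_ops caps;	/* embedded, not a pointer */
};

/* The callback only receives &priv->caps; since caps is embedded,
 * container_of() recovers the enclosing struct priv.
 */
static int priv_gettime(struct clock_ops *ops, long long *ns)
{
	struct priv *p = container_of(ops, struct priv, caps);

	assert(ns != NULL);
	*ns = p->now_ns;

	return 0;
}

/* Populate the embedded ops directly, with no global template. */
static void priv_init(struct priv *p, long long now_ns)
{
	p->now_ns = now_ns;
	p->caps = (struct clock_ops) { .gettime = priv_gettime };
}
```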

Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
---
Changes since RFC:
- None.

 drivers/net/dsa/sja1105/sja1105_ptp.c | 29 +++++++++++++--------------
 1 file changed, 14 insertions(+), 15 deletions(-)

diff --git a/drivers/net/dsa/sja1105/sja1105_ptp.c b/drivers/net/dsa/sja1105/sja1105_ptp.c
index 07374ba6b9be..13f9f5799e46 100644
--- a/drivers/net/dsa/sja1105/sja1105_ptp.c
+++ b/drivers/net/dsa/sja1105/sja1105_ptp.c
@@ -343,29 +343,28 @@ static void sja1105_ptp_overflow_check(struct work_struct *work)
 	schedule_delayed_work(&priv->refresh_work, SJA1105_REFRESH_INTERVAL);
 }
 
-static const struct ptp_clock_info sja1105_ptp_caps = {
-	.owner		= THIS_MODULE,
-	.name		= "SJA1105 PHC",
-	.adjfine	= sja1105_ptp_adjfine,
-	.adjtime	= sja1105_ptp_adjtime,
-	.gettime64	= sja1105_ptp_gettime,
-	.settime64	= sja1105_ptp_settime,
-	.max_adj	= SJA1105_MAX_ADJ_PPB,
-};
-
 int sja1105_ptp_clock_register(struct sja1105_private *priv)
 {
 	struct dsa_switch *ds = priv->ds;
 
 	/* Set up the cycle counter */
 	priv->tstamp_cc = (struct cyclecounter) {
-		.read = sja1105_ptptsclk_read,
-		.mask = CYCLECOUNTER_MASK(64),
-		.shift = SJA1105_CC_SHIFT,
-		.mult = SJA1105_CC_MULT,
+		.read		= sja1105_ptptsclk_read,
+		.mask		= CYCLECOUNTER_MASK(64),
+		.shift		= SJA1105_CC_SHIFT,
+		.mult		= SJA1105_CC_MULT,
+	};
+	priv->ptp_caps = (struct ptp_clock_info) {
+		.owner		= THIS_MODULE,
+		.name		= "SJA1105 PHC",
+		.adjfine	= sja1105_ptp_adjfine,
+		.adjtime	= sja1105_ptp_adjtime,
+		.gettime64	= sja1105_ptp_gettime,
+		.settime64	= sja1105_ptp_settime,
+		.max_adj	= SJA1105_MAX_ADJ_PPB,
 	};
+
 	mutex_init(&priv->ptp_lock);
-	priv->ptp_caps = sja1105_ptp_caps;
 
 	priv->clock = ptp_clock_register(&priv->ptp_caps, ds->dev);
 	if (IS_ERR_OR_NULL(priv->clock))
-- 
2.17.1



* [PATCH v1 net-next 03/15] net: dsa: sja1105: Switch to hardware operations for PTP
From: Vladimir Oltean @ 2019-09-02 16:25 UTC (permalink / raw)
  To: f.fainelli, vivien.didelot, andrew, davem, vinicius.gomes,
	vedang.patel, richardcochran
  Cc: weifeng.voon, jiri, m-karicheri2, Jose.Abreu, ilias.apalodimas,
	jhs, xiyou.wangcong, kurt.kanzenbach, netdev, Vladimir Oltean

Adjusting the hardware clock (PTPCLKVAL, PTPCLKADD, PTPCLKRATE) is a
requirement for the auxiliary PTP functionality of the switch
(TTEthernet, PPS input, PPS output).

Now that the sync precision issues have been identified (and fixed in
the spi-fsl-dspi driver), we can get rid of the timecounter/cyclecounter
implementation, which relies on the free-running PTPTSCLK.
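For reference, the frequency conversion that replaces the cycle counter multiplier can be checked in userspace. This sketch follows the formula in the patch's comment, ptpclkrate = scaled_ppm * 2^31 / (10^6 * 2^16) = scaled_ppm * 2^9 / 5^6, re-centered around 2^31, plus the 8 ns tick conversion. The function names are hypothetical, and the division truncates toward zero, like div_s64().

```c
#include <assert.h>
#include <stdint.h>

#define TICK_NS		8		/* 125 MHz clock, as in the patch */
#define CLKRATE_CENTER	0x80000000LL	/* PTPCLKRATE for 0 ppm */
/* +/- 32000000 ppb = 32000 ppm, expressed as scaled_ppm (ppm * 2^16) */
#define MAX_SCALED_PPM	(32000LL << 16)

static int64_t ns_to_ticks(int64_t ns)
{
	return ns / TICK_NS;
}

static int64_t ticks_to_ns(int64_t ticks)
{
	return ticks * TICK_NS;
}

/* ptpclkrate = scaled_ppm * 2^31 / (10^6 * 2^16)
 *            = scaled_ppm * 2^9 / 5^6, re-centered around 2^31
 */
static uint32_t scaled_ppm_to_ptpclkrate(long scaled_ppm)
{
	int64_t rate;

	assert(scaled_ppm >= -MAX_SCALED_PPM && scaled_ppm <= MAX_SCALED_PPM);

	rate = (int64_t)scaled_ppm * 512 / 15625;

	return (uint32_t)((CLKRATE_CENTER + rate) & 0xffffffffLL);
}
```

For example, an adjustment of +1 ppm (scaled_ppm = 65536) yields 0x80000000 + 2147, since 65536 * 512 / 15625 is about 2147.48.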

Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
---
Changes since RFC:
- None.

 drivers/net/dsa/sja1105/sja1105.h      |  16 +--
 drivers/net/dsa/sja1105/sja1105_main.c |  18 ++-
 drivers/net/dsa/sja1105/sja1105_ptp.c  | 181 ++++++++++++-------------
 drivers/net/dsa/sja1105/sja1105_ptp.h  |  22 +++
 drivers/net/dsa/sja1105/sja1105_spi.c  |   2 -
 5 files changed, 122 insertions(+), 117 deletions(-)

diff --git a/drivers/net/dsa/sja1105/sja1105.h b/drivers/net/dsa/sja1105/sja1105.h
index d8a92646e80a..e4955a025e46 100644
--- a/drivers/net/dsa/sja1105/sja1105.h
+++ b/drivers/net/dsa/sja1105/sja1105.h
@@ -32,7 +32,6 @@ struct sja1105_regs {
 	u64 ptp_control;
 	u64 ptpclk;
 	u64 ptpclkrate;
-	u64 ptptsclk;
 	u64 ptpegr_ts[SJA1105_NUM_PORTS];
 	u64 pad_mii_tx[SJA1105_NUM_PORTS];
 	u64 pad_mii_id[SJA1105_NUM_PORTS];
@@ -50,8 +49,15 @@ struct sja1105_regs {
 	u64 qlevel[SJA1105_NUM_PORTS];
 };
 
+enum sja1105_ptp_clk_mode {
+	PTP_ADD_MODE = 1,
+	PTP_SET_MODE = 0,
+};
+
 struct sja1105_ptp_cmd {
 	u64 resptp;		/* reset */
+	u64 corrclk4ts;		/* use the corrected clock for timestamps */
+	u64 ptpclkadd;		/* enum sja1105_ptp_clk_mode */
 };
 
 struct sja1105_info {
@@ -96,13 +102,7 @@ struct sja1105_private {
 	struct sja1105_ptp_cmd ptp_cmd;
 	struct ptp_clock_info ptp_caps;
 	struct ptp_clock *clock;
-	/* The cycle counter translates the PTP timestamps (based on
-	 * a free-running counter) into a software time domain.
-	 */
-	struct cyclecounter tstamp_cc;
-	struct timecounter tstamp_tc;
-	struct delayed_work refresh_work;
-	/* Serializes all operations on the cycle counter */
+	/* Serializes all operations on the PTP hardware clock */
 	struct mutex ptp_lock;
 	/* Serializes transmission of management frames so that
 	 * the switch doesn't confuse them with one another.
diff --git a/drivers/net/dsa/sja1105/sja1105_main.c b/drivers/net/dsa/sja1105/sja1105_main.c
index d8cff0107ec4..630f7e337fe9 100644
--- a/drivers/net/dsa/sja1105/sja1105_main.c
+++ b/drivers/net/dsa/sja1105/sja1105_main.c
@@ -1813,7 +1813,7 @@ static netdev_tx_t sja1105_port_deferred_xmit(struct dsa_switch *ds, int port,
 	struct skb_shared_hwtstamps shwt = {0};
 	int slot = sp->mgmt_slot;
 	struct sk_buff *clone;
-	u64 now, ts;
+	u64 ticks, ts;
 	int rc;
 
 	/* The tragic fact about the switch having 4x2 slots for installing
@@ -1844,7 +1844,7 @@ static netdev_tx_t sja1105_port_deferred_xmit(struct dsa_switch *ds, int port,
 
 	mutex_lock(&priv->ptp_lock);
 
-	now = priv->tstamp_cc.read(&priv->tstamp_cc);
+	ticks = sja1105_ptpclkval_read(priv);
 
 	rc = sja1105_ptpegr_ts_poll(priv, slot, &ts);
 	if (rc < 0) {
@@ -1853,10 +1853,9 @@ static netdev_tx_t sja1105_port_deferred_xmit(struct dsa_switch *ds, int port,
 		goto out_unlock_ptp;
 	}
 
-	ts = sja1105_tstamp_reconstruct(priv, now, ts);
-	ts = timecounter_cyc2time(&priv->tstamp_tc, ts);
+	ts = sja1105_tstamp_reconstruct(priv, ticks, ts);
 
-	shwt.hwtstamp = ns_to_ktime(ts);
+	shwt.hwtstamp = ns_to_ktime(sja1105_ticks_to_ns(ts));
 	skb_complete_tx_timestamp(clone, &shwt);
 
 out_unlock_ptp:
@@ -1994,11 +1993,11 @@ static void sja1105_rxtstamp_work(struct work_struct *work)
 	struct sja1105_tagger_data *data = to_tagger(work);
 	struct sja1105_private *priv = to_sja1105(data);
 	struct sk_buff *skb;
-	u64 now;
+	u64 ticks;
 
 	mutex_lock(&priv->ptp_lock);
 
-	now = priv->tstamp_cc.read(&priv->tstamp_cc);
+	ticks = sja1105_ptpclkval_read(priv);
 
 	while ((skb = skb_dequeue(&data->skb_rxtstamp_queue)) != NULL) {
 		struct skb_shared_hwtstamps *shwt = skb_hwtstamps(skb);
@@ -2007,10 +2006,9 @@ static void sja1105_rxtstamp_work(struct work_struct *work)
 		*shwt = (struct skb_shared_hwtstamps) {0};
 
 		ts = SJA1105_SKB_CB(skb)->meta_tstamp;
-		ts = sja1105_tstamp_reconstruct(priv, now, ts);
-		ts = timecounter_cyc2time(&priv->tstamp_tc, ts);
+		ts = sja1105_tstamp_reconstruct(priv, ticks, ts);
 
-		shwt->hwtstamp = ns_to_ktime(ts);
+		shwt->hwtstamp = ns_to_ktime(sja1105_ticks_to_ns(ts));
 		netif_rx_ni(skb);
 	}
 
diff --git a/drivers/net/dsa/sja1105/sja1105_ptp.c b/drivers/net/dsa/sja1105/sja1105_ptp.c
index 13f9f5799e46..bcdfdda46b9c 100644
--- a/drivers/net/dsa/sja1105/sja1105_ptp.c
+++ b/drivers/net/dsa/sja1105/sja1105_ptp.c
@@ -13,24 +13,6 @@
 #define SJA1105_MAX_ADJ_PPB		32000000
 #define SJA1105_SIZE_PTP_CMD		4
 
-/* Timestamps are in units of 8 ns clock ticks (equivalent to a fixed
- * 125 MHz clock) so the scale factor (MULT / SHIFT) needs to be 8.
- * Furthermore, wisely pick SHIFT as 28 bits, which translates
- * MULT into 2^31 (0x80000000).  This is the same value around which
- * the hardware PTPCLKRATE is centered, so the same ppb conversion
- * arithmetic can be reused.
- */
-#define SJA1105_CC_SHIFT		28
-#define SJA1105_CC_MULT			(8 << SJA1105_CC_SHIFT)
-
-/* Having 33 bits of cycle counter left until a 64-bit overflow during delta
- * conversion, we multiply this by the 8 ns counter resolution and arrive at
- * a comfortable 68.71 second refresh interval until the delta would cause
- * an integer overflow, in absence of any other readout.
- * Approximate to 1 minute.
- */
-#define SJA1105_REFRESH_INTERVAL	(HZ * 60)
-
 /*            This range is actually +/- SJA1105_MAX_ADJ_PPB
  *            divided by 1000 (ppb -> ppm) and with a 16-bit
  *            "fractional" part (actually fixed point).
@@ -41,7 +23,7 @@
  *
  * This forgoes a "ppb" numeric representation (up to NSEC_PER_SEC)
  * and defines the scaling factor between scaled_ppm and the actual
- * frequency adjustments (both cycle counter and hardware).
+ * frequency adjustments of the PHC.
  *
  *   ptpclkrate = scaled_ppm * 2^31 / (10^6 * 2^16)
  *   simplifies to
@@ -49,10 +31,9 @@
  */
 #define SJA1105_CC_MULT_NUM		(1 << 9)
 #define SJA1105_CC_MULT_DEM		15625
+#define SJA1105_CC_MULT			0x80000000
 
 #define ptp_to_sja1105(d) container_of((d), struct sja1105_private, ptp_caps)
-#define cc_to_sja1105(d) container_of((d), struct sja1105_private, tstamp_cc)
-#define dw_to_sja1105(d) container_of((d), struct sja1105_private, refresh_work)
 
 int sja1105_get_ts_info(struct dsa_switch *ds, int port,
 			struct ethtool_ts_info *info)
@@ -86,6 +67,8 @@ int sja1105et_ptp_cmd(const void *ctx, const void *data)
 
 	sja1105_pack(buf, &valid,           31, 31, size);
 	sja1105_pack(buf, &cmd->resptp,      2,  2, size);
+	sja1105_pack(buf, &cmd->corrclk4ts,  1,  1, size);
+	sja1105_pack(buf, &cmd->ptpclkadd,   0,  0, size);
 
 	return sja1105_spi_send_packed_buf(priv, SPI_WRITE, regs->ptp_control,
 					   buf, SJA1105_SIZE_PTP_CMD);
@@ -103,6 +86,8 @@ int sja1105pqrs_ptp_cmd(const void *ctx, const void *data)
 
 	sja1105_pack(buf, &valid,           31, 31, size);
 	sja1105_pack(buf, &cmd->resptp,      3,  3, size);
+	sja1105_pack(buf, &cmd->corrclk4ts,  2,  2, size);
+	sja1105_pack(buf, &cmd->ptpclkadd,   0,  0, size);
 
 	return sja1105_spi_send_packed_buf(priv, SPI_WRITE, regs->ptp_control,
 					   buf, SJA1105_SIZE_PTP_CMD);
@@ -215,17 +200,14 @@ int sja1105_ptpegr_ts_poll(struct sja1105_private *priv, int port, u64 *ts)
 int sja1105_ptp_reset(struct sja1105_private *priv)
 {
 	struct sja1105_ptp_cmd cmd = priv->ptp_cmd;
-	struct dsa_switch *ds = priv->ds;
 	int rc;
 
 	mutex_lock(&priv->ptp_lock);
 
 	cmd.resptp = 1;
-	dev_dbg(ds->dev, "Resetting PTP clock\n");
-	rc = priv->info->ptp_cmd(priv, &cmd);
 
-	timecounter_init(&priv->tstamp_tc, &priv->tstamp_cc,
-			 ktime_to_ns(ktime_get_real()));
+	dev_dbg(priv->ds->dev, "Resetting PTP clock\n");
+	rc = priv->info->ptp_cmd(priv, &cmd);
 
 	mutex_unlock(&priv->ptp_lock);
 
@@ -236,124 +218,130 @@ static int sja1105_ptp_gettime(struct ptp_clock_info *ptp,
 			       struct timespec64 *ts)
 {
 	struct sja1105_private *priv = ptp_to_sja1105(ptp);
-	u64 ns;
+	u64 ticks;
 
 	mutex_lock(&priv->ptp_lock);
-	ns = timecounter_read(&priv->tstamp_tc);
-	mutex_unlock(&priv->ptp_lock);
 
-	*ts = ns_to_timespec64(ns);
+	ticks = sja1105_ptpclkval_read(priv);
+	*ts = ns_to_timespec64(sja1105_ticks_to_ns(ticks));
+
+	mutex_unlock(&priv->ptp_lock);
 
 	return 0;
 }
 
+/* Caller must hold priv->ptp_lock */
+static int sja1105_ptp_mode_set(struct sja1105_private *priv,
+				enum sja1105_ptp_clk_mode mode)
+{
+	if (priv->ptp_cmd.ptpclkadd == mode)
+		return 0;
+
+	priv->ptp_cmd.ptpclkadd = mode;
+
+	return priv->info->ptp_cmd(priv, &priv->ptp_cmd);
+}
+
+/* Caller must hold priv->ptp_lock */
+static int sja1105_ptpclkval_write(struct sja1105_private *priv, u64 val)
+{
+	const struct sja1105_regs *regs = priv->info->regs;
+
+	return sja1105_spi_send_int(priv, SPI_WRITE, regs->ptpclk, &val, 8);
+}
+
+/* Write to PTPCLKVAL while PTPCLKADD is 0 */
 static int sja1105_ptp_settime(struct ptp_clock_info *ptp,
 			       const struct timespec64 *ts)
 {
+	u64 ticks = ns_to_sja1105_ticks(timespec64_to_ns(ts));
 	struct sja1105_private *priv = ptp_to_sja1105(ptp);
-	u64 ns = timespec64_to_ns(ts);
+	int rc;
 
 	mutex_lock(&priv->ptp_lock);
-	timecounter_init(&priv->tstamp_tc, &priv->tstamp_cc, ns);
+
+	rc = sja1105_ptp_mode_set(priv, PTP_SET_MODE);
+	if (rc < 0) {
+		dev_err(priv->ds->dev, "Failed to put PTPCLK in set mode\n");
+		goto out;
+	}
+
+	rc = sja1105_ptpclkval_write(priv, ticks);
+
+out:
 	mutex_unlock(&priv->ptp_lock);
 
-	return 0;
+	return rc;
 }
 
 static int sja1105_ptp_adjfine(struct ptp_clock_info *ptp, long scaled_ppm)
 {
 	struct sja1105_private *priv = ptp_to_sja1105(ptp);
+	const struct sja1105_regs *regs = priv->info->regs;
 	s64 clkrate;
+	int rc;
 
 	clkrate = (s64)scaled_ppm * SJA1105_CC_MULT_NUM;
 	clkrate = div_s64(clkrate, SJA1105_CC_MULT_DEM);
 
-	mutex_lock(&priv->ptp_lock);
-
-	/* Force a readout to update the timer *before* changing its frequency.
-	 *
-	 * This way, its corrected time curve can at all times be modeled
-	 * as a linear "A * x + B" function, where:
-	 *
-	 * - B are past frequency adjustments and offset shifts, all
-	 *   accumulated into the cycle_last variable.
-	 *
-	 * - A is the new frequency adjustments we're just about to set.
-	 *
-	 * Reading now makes B accumulate the correct amount of time,
-	 * corrected at the old rate, before changing it.
-	 *
-	 * Hardware timestamps then become simple points on the curve and
-	 * are approximated using the above function.  This is still better
-	 * than letting the switch take the timestamps using the hardware
-	 * rate-corrected clock (PTPCLKVAL) - the comparison in this case would
-	 * be that we're shifting the ruler at the same time as we're taking
-	 * measurements with it.
-	 *
-	 * The disadvantage is that it's possible to receive timestamps when
-	 * a frequency adjustment took place in the near past.
-	 * In this case they will be approximated using the new ppb value
-	 * instead of a compound function made of two segments (one at the old
-	 * and the other at the new rate) - introducing some inaccuracy.
-	 */
-	timecounter_read(&priv->tstamp_tc);
-
-	priv->tstamp_cc.mult = SJA1105_CC_MULT + clkrate;
+	/* Take a +/- value and re-center it around 2^31. */
+	clkrate = SJA1105_CC_MULT + clkrate;
+	clkrate &= GENMASK_ULL(31, 0);
 
-	mutex_unlock(&priv->ptp_lock);
-
-	return 0;
-}
+	mutex_lock(&priv->ptp_lock);
 
-static int sja1105_ptp_adjtime(struct ptp_clock_info *ptp, s64 delta)
-{
-	struct sja1105_private *priv = ptp_to_sja1105(ptp);
+	rc = sja1105_spi_send_int(priv, SPI_WRITE, regs->ptpclkrate,
+				  &clkrate, 4);
 
-	mutex_lock(&priv->ptp_lock);
-	timecounter_adjtime(&priv->tstamp_tc, delta);
 	mutex_unlock(&priv->ptp_lock);
 
-	return 0;
+	return rc;
 }
 
-static u64 sja1105_ptptsclk_read(const struct cyclecounter *cc)
+/* Caller must hold priv->ptp_lock */
+u64 sja1105_ptpclkval_read(struct sja1105_private *priv)
 {
-	struct sja1105_private *priv = cc_to_sja1105(cc);
 	const struct sja1105_regs *regs = priv->info->regs;
-	u64 ptptsclk = 0;
+	u64 ptpclkval = 0;
 	int rc;
 
-	rc = sja1105_spi_send_int(priv, SPI_READ, regs->ptptsclk,
-				  &ptptsclk, 8);
+	rc = sja1105_spi_send_int(priv, SPI_READ, regs->ptpclk,
+				  &ptpclkval, 8);
 	if (rc < 0)
 		dev_err_ratelimited(priv->ds->dev,
-				    "failed to read ptp cycle counter: %d\n",
+				    "failed to read ptp time: %d\n",
 				    rc);
-	return ptptsclk;
+
+	return ptpclkval;
 }
 
-static void sja1105_ptp_overflow_check(struct work_struct *work)
+/* Write to PTPCLKVAL while PTPCLKADD is 1 */
+static int sja1105_ptp_adjtime(struct ptp_clock_info *ptp, s64 delta)
 {
-	struct delayed_work *dw = to_delayed_work(work);
-	struct sja1105_private *priv = dw_to_sja1105(dw);
-	struct timespec64 ts;
+	struct sja1105_private *priv = ptp_to_sja1105(ptp);
+	s64 ticks = ns_to_sja1105_ticks(delta);
+	int rc;
 
-	sja1105_ptp_gettime(&priv->ptp_caps, &ts);
+	mutex_lock(&priv->ptp_lock);
 
-	schedule_delayed_work(&priv->refresh_work, SJA1105_REFRESH_INTERVAL);
+	rc = sja1105_ptp_mode_set(priv, PTP_ADD_MODE);
+	if (rc < 0) {
+		dev_err(priv->ds->dev, "Failed to put PTPCLK in add mode\n");
+		goto out;
+	}
+
+	rc = sja1105_ptpclkval_write(priv, ticks);
+
+out:
+	mutex_unlock(&priv->ptp_lock);
+
+	return rc;
 }
 
 int sja1105_ptp_clock_register(struct sja1105_private *priv)
 {
 	struct dsa_switch *ds = priv->ds;
 
-	/* Set up the cycle counter */
-	priv->tstamp_cc = (struct cyclecounter) {
-		.read		= sja1105_ptptsclk_read,
-		.mask		= CYCLECOUNTER_MASK(64),
-		.shift		= SJA1105_CC_SHIFT,
-		.mult		= SJA1105_CC_MULT,
-	};
 	priv->ptp_caps = (struct ptp_clock_info) {
 		.owner		= THIS_MODULE,
 		.name		= "SJA1105 PHC",
@@ -370,8 +358,8 @@ int sja1105_ptp_clock_register(struct sja1105_private *priv)
 	if (IS_ERR_OR_NULL(priv->clock))
 		return PTR_ERR(priv->clock);
 
-	INIT_DELAYED_WORK(&priv->refresh_work, sja1105_ptp_overflow_check);
-	schedule_delayed_work(&priv->refresh_work, SJA1105_REFRESH_INTERVAL);
+	priv->ptp_cmd.corrclk4ts = true;
+	priv->ptp_cmd.ptpclkadd = PTP_SET_MODE;
 
 	return sja1105_ptp_reset(priv);
 }
@@ -381,7 +369,6 @@ void sja1105_ptp_clock_unregister(struct sja1105_private *priv)
 	if (IS_ERR_OR_NULL(priv->clock))
 		return;
 
-	cancel_delayed_work_sync(&priv->refresh_work);
 	ptp_clock_unregister(priv->clock);
 	priv->clock = NULL;
 }
diff --git a/drivers/net/dsa/sja1105/sja1105_ptp.h b/drivers/net/dsa/sja1105/sja1105_ptp.h
index af456b0a4d27..51e21d951548 100644
--- a/drivers/net/dsa/sja1105/sja1105_ptp.h
+++ b/drivers/net/dsa/sja1105/sja1105_ptp.h
@@ -4,6 +4,21 @@
 #ifndef _SJA1105_PTP_H
 #define _SJA1105_PTP_H
 
+/* Timestamps are in units of 8 ns clock ticks (equivalent to
+ * a fixed 125 MHz clock).
+ */
+#define SJA1105_TICK_NS			8
+
+static inline s64 ns_to_sja1105_ticks(s64 ns)
+{
+	return ns / SJA1105_TICK_NS;
+}
+
+static inline s64 sja1105_ticks_to_ns(s64 ticks)
+{
+	return ticks * SJA1105_TICK_NS;
+}
+
 #if IS_ENABLED(CONFIG_NET_DSA_SJA1105_PTP)
 
 int sja1105_ptp_clock_register(struct sja1105_private *priv);
@@ -24,6 +39,8 @@ u64 sja1105_tstamp_reconstruct(struct sja1105_private *priv, u64 now,
 
 int sja1105_ptp_reset(struct sja1105_private *priv);
 
+u64 sja1105_ptpclkval_read(struct sja1105_private *priv);
+
 #else
 
 static inline int sja1105_ptp_clock_register(struct sja1105_private *priv)
@@ -53,6 +70,11 @@ static inline int sja1105_ptp_reset(struct sja1105_private *priv)
 	return 0;
 }
 
+static inline u64 sja1105_ptpclkval_read(struct sja1105_private *priv)
+{
+	return 0;
+}
+
 #define sja1105et_ptp_cmd NULL
 
 #define sja1105pqrs_ptp_cmd NULL
diff --git a/drivers/net/dsa/sja1105/sja1105_spi.c b/drivers/net/dsa/sja1105/sja1105_spi.c
index 84dc603138cf..1953d8c54af6 100644
--- a/drivers/net/dsa/sja1105/sja1105_spi.c
+++ b/drivers/net/dsa/sja1105/sja1105_spi.c
@@ -517,7 +517,6 @@ static struct sja1105_regs sja1105et_regs = {
 	.ptp_control = 0x17,
 	.ptpclk = 0x18, /* Spans 0x18 to 0x19 */
 	.ptpclkrate = 0x1A,
-	.ptptsclk = 0x1B, /* Spans 0x1B to 0x1C */
 };
 
 static struct sja1105_regs sja1105pqrs_regs = {
@@ -548,7 +547,6 @@ static struct sja1105_regs sja1105pqrs_regs = {
 	.ptp_control = 0x18,
 	.ptpclk = 0x19,
 	.ptpclkrate = 0x1B,
-	.ptptsclk = 0x1C,
 };
 
 struct sja1105_info sja1105e_info = {
-- 
2.17.1



* [PATCH v1 net-next 04/15] net: dsa: sja1105: Implement the .gettimex64 system call for PTP
From: Vladimir Oltean @ 2019-09-02 16:25 UTC (permalink / raw)
  To: f.fainelli, vivien.didelot, andrew, davem, vinicius.gomes,
	vedang.patel, richardcochran
  Cc: weifeng.voon, jiri, m-karicheri2, Jose.Abreu, ilias.apalodimas,
	jhs, xiyou.wangcong, kurt.kanzenbach, netdev, Vladimir Oltean

Through the PTP_SYS_OFFSET_EXTENDED ioctl, it is possible for userspace
applications (e.g. phc2sys) to compensate for the delays incurred while
reading the PHC's time.

For now, implement this ioctl in the driver, even though the performance
improvements are minimal. The goal of this patch is to rework the
driver's SPI transfer infrastructure so that transfers can be
timestamped. Other patches depend on this change.

The "performance" implementation of this ioctl will come later, once an
API for it is agreed upon in the SPI subsystem. At that point, the
change needed in the sja1105 driver will be minimal.

Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
---
Changes since RFC:
- None.

 drivers/net/dsa/sja1105/sja1105.h      |  3 ++-
 drivers/net/dsa/sja1105/sja1105_main.c |  8 +++---
 drivers/net/dsa/sja1105/sja1105_ptp.c  | 20 ++++++++------
 drivers/net/dsa/sja1105/sja1105_ptp.h  |  6 +++--
 drivers/net/dsa/sja1105/sja1105_spi.c  | 36 +++++++++++++++++++-------
 5 files changed, 48 insertions(+), 25 deletions(-)

diff --git a/drivers/net/dsa/sja1105/sja1105.h b/drivers/net/dsa/sja1105/sja1105.h
index e4955a025e46..c80be59dafbd 100644
--- a/drivers/net/dsa/sja1105/sja1105.h
+++ b/drivers/net/dsa/sja1105/sja1105.h
@@ -131,7 +131,8 @@ int sja1105_spi_send_packed_buf(const struct sja1105_private *priv,
 				void *packed_buf, size_t size_bytes);
 int sja1105_spi_send_int(const struct sja1105_private *priv,
 			 sja1105_spi_rw_mode_t rw, u64 reg_addr,
-			 u64 *value, u64 size_bytes);
+			 u64 *value, u64 size_bytes,
+			 struct ptp_system_timestamp *ptp_sts);
 int sja1105_spi_send_long_packed_buf(const struct sja1105_private *priv,
 				     sja1105_spi_rw_mode_t rw, u64 base_addr,
 				     void *packed_buf, u64 buf_len);
diff --git a/drivers/net/dsa/sja1105/sja1105_main.c b/drivers/net/dsa/sja1105/sja1105_main.c
index 630f7e337fe9..f7f03d486499 100644
--- a/drivers/net/dsa/sja1105/sja1105_main.c
+++ b/drivers/net/dsa/sja1105/sja1105_main.c
@@ -1844,7 +1844,7 @@ static netdev_tx_t sja1105_port_deferred_xmit(struct dsa_switch *ds, int port,
 
 	mutex_lock(&priv->ptp_lock);
 
-	ticks = sja1105_ptpclkval_read(priv);
+	ticks = sja1105_ptpclkval_read(priv, NULL);
 
 	rc = sja1105_ptpegr_ts_poll(priv, slot, &ts);
 	if (rc < 0) {
@@ -1997,7 +1997,7 @@ static void sja1105_rxtstamp_work(struct work_struct *work)
 
 	mutex_lock(&priv->ptp_lock);
 
-	ticks = sja1105_ptpclkval_read(priv);
+	ticks = sja1105_ptpclkval_read(priv, NULL);
 
 	while ((skb = skb_dequeue(&data->skb_rxtstamp_queue)) != NULL) {
 		struct skb_shared_hwtstamps *shwt = skb_hwtstamps(skb);
@@ -2092,8 +2092,8 @@ static int sja1105_check_device_id(struct sja1105_private *priv)
 	u64 part_no;
 	int rc;
 
-	rc = sja1105_spi_send_int(priv, SPI_READ, regs->device_id,
-				  &device_id, SJA1105_SIZE_DEVICE_ID);
+	rc = sja1105_spi_send_int(priv, SPI_READ, regs->device_id, &device_id,
+				  SJA1105_SIZE_DEVICE_ID, NULL);
 	if (rc < 0)
 		return rc;
 
diff --git a/drivers/net/dsa/sja1105/sja1105_ptp.c b/drivers/net/dsa/sja1105/sja1105_ptp.c
index bcdfdda46b9c..04693c702b09 100644
--- a/drivers/net/dsa/sja1105/sja1105_ptp.c
+++ b/drivers/net/dsa/sja1105/sja1105_ptp.c
@@ -1,6 +1,7 @@
 // SPDX-License-Identifier: GPL-2.0
 /* Copyright (c) 2019, Vladimir Oltean <olteanv@gmail.com>
  */
+#include <linux/spi/spi.h>
 #include "sja1105.h"
 
 /* The adjfine API clamps ppb between [-32,768,000, 32,768,000], and
@@ -214,15 +215,16 @@ int sja1105_ptp_reset(struct sja1105_private *priv)
 	return rc;
 }
 
-static int sja1105_ptp_gettime(struct ptp_clock_info *ptp,
-			       struct timespec64 *ts)
+static int sja1105_ptp_gettimex(struct ptp_clock_info *ptp,
+				struct timespec64 *ts,
+				struct ptp_system_timestamp *sts)
 {
 	struct sja1105_private *priv = ptp_to_sja1105(ptp);
 	u64 ticks;
 
 	mutex_lock(&priv->ptp_lock);
 
-	ticks = sja1105_ptpclkval_read(priv);
+	ticks = sja1105_ptpclkval_read(priv, sts);
 	*ts = ns_to_timespec64(sja1105_ticks_to_ns(ticks));
 
 	mutex_unlock(&priv->ptp_lock);
@@ -247,7 +249,8 @@ static int sja1105_ptpclkval_write(struct sja1105_private *priv, u64 val)
 {
 	const struct sja1105_regs *regs = priv->info->regs;
 
-	return sja1105_spi_send_int(priv, SPI_WRITE, regs->ptpclk, &val, 8);
+	return sja1105_spi_send_int(priv, SPI_WRITE, regs->ptpclk, &val, 8,
+				    NULL);
 }
 
 /* Write to PTPCLKVAL while PTPCLKADD is 0 */
@@ -291,7 +294,7 @@ static int sja1105_ptp_adjfine(struct ptp_clock_info *ptp, long scaled_ppm)
 	mutex_lock(&priv->ptp_lock);
 
 	rc = sja1105_spi_send_int(priv, SPI_WRITE, regs->ptpclkrate,
-				  &clkrate, 4);
+				  &clkrate, 4, NULL);
 
 	mutex_unlock(&priv->ptp_lock);
 
@@ -299,14 +302,15 @@ static int sja1105_ptp_adjfine(struct ptp_clock_info *ptp, long scaled_ppm)
 }
 
 /* Caller must hold priv->ptp_lock */
-u64 sja1105_ptpclkval_read(struct sja1105_private *priv)
+u64 sja1105_ptpclkval_read(struct sja1105_private *priv,
+			   struct ptp_system_timestamp *sts)
 {
 	const struct sja1105_regs *regs = priv->info->regs;
 	u64 ptpclkval = 0;
 	int rc;
 
 	rc = sja1105_spi_send_int(priv, SPI_READ, regs->ptpclk,
-				  &ptpclkval, 8);
+				  &ptpclkval, 8, sts);
 	if (rc < 0)
 		dev_err_ratelimited(priv->ds->dev,
 				    "failed to read ptp time: %d\n",
@@ -347,7 +351,7 @@ int sja1105_ptp_clock_register(struct sja1105_private *priv)
 		.name		= "SJA1105 PHC",
 		.adjfine	= sja1105_ptp_adjfine,
 		.adjtime	= sja1105_ptp_adjtime,
-		.gettime64	= sja1105_ptp_gettime,
+		.gettimex64	= sja1105_ptp_gettimex,
 		.settime64	= sja1105_ptp_settime,
 		.max_adj	= SJA1105_MAX_ADJ_PPB,
 	};
diff --git a/drivers/net/dsa/sja1105/sja1105_ptp.h b/drivers/net/dsa/sja1105/sja1105_ptp.h
index 51e21d951548..80c33e5e4503 100644
--- a/drivers/net/dsa/sja1105/sja1105_ptp.h
+++ b/drivers/net/dsa/sja1105/sja1105_ptp.h
@@ -39,7 +39,8 @@ u64 sja1105_tstamp_reconstruct(struct sja1105_private *priv, u64 now,
 
 int sja1105_ptp_reset(struct sja1105_private *priv);
 
-u64 sja1105_ptpclkval_read(struct sja1105_private *priv);
+u64 sja1105_ptpclkval_read(struct sja1105_private *priv,
+			   struct ptp_system_timestamp *sts);
 
 #else
 
@@ -70,7 +71,8 @@ static inline int sja1105_ptp_reset(struct sja1105_private *priv)
 	return 0;
 }
 
-static inline u64 sja1105_ptpclkval_read(struct sja1105_private *priv)
+static inline u64 sja1105_ptpclkval_read(struct sja1105_private *priv,
+					 struct ptp_system_timestamp *sts)
 {
 	return 0;
 }
diff --git a/drivers/net/dsa/sja1105/sja1105_spi.c b/drivers/net/dsa/sja1105/sja1105_spi.c
index 1953d8c54af6..26985f1209ad 100644
--- a/drivers/net/dsa/sja1105/sja1105_spi.c
+++ b/drivers/net/dsa/sja1105/sja1105_spi.c
@@ -15,7 +15,8 @@
 	(SJA1105_SIZE_SPI_MSG_HEADER + SJA1105_SIZE_SPI_MSG_MAXLEN)
 
 static int sja1105_spi_transfer(const struct sja1105_private *priv,
-				const void *tx, void *rx, int size)
+				const void *tx, void *rx, int size,
+				struct ptp_system_timestamp *ptp_sts)
 {
 	struct spi_device *spi = priv->spidev;
 	struct spi_transfer transfer = {
@@ -35,12 +36,16 @@ static int sja1105_spi_transfer(const struct sja1105_private *priv,
 	spi_message_init(&msg);
 	spi_message_add_tail(&transfer, &msg);
 
+	ptp_read_system_prets(ptp_sts);
+
 	rc = spi_sync(spi, &msg);
 	if (rc < 0) {
 		dev_err(&spi->dev, "SPI transfer failed: %d\n", rc);
 		return rc;
 	}
 
+	ptp_read_system_postts(ptp_sts);
+
 	return rc;
 }
 
@@ -66,9 +71,11 @@ sja1105_spi_message_pack(void *buf, const struct sja1105_spi_message *msg)
  * @size_bytes is smaller than SIZE_SPI_MSG_MAXLEN. Larger packed buffers
  * are chunked in smaller pieces by sja1105_spi_send_long_packed_buf below.
  */
-int sja1105_spi_send_packed_buf(const struct sja1105_private *priv,
-				sja1105_spi_rw_mode_t rw, u64 reg_addr,
-				void *packed_buf, size_t size_bytes)
+static int
+__sja1105_spi_send_packed_buf(const struct sja1105_private *priv,
+			      sja1105_spi_rw_mode_t rw, u64 reg_addr,
+			      void *packed_buf, size_t size_bytes,
+			      struct ptp_system_timestamp *ptp_sts)
 {
 	u8 tx_buf[SJA1105_SIZE_SPI_TRANSFER_MAX] = {0};
 	u8 rx_buf[SJA1105_SIZE_SPI_TRANSFER_MAX] = {0};
@@ -90,7 +97,7 @@ int sja1105_spi_send_packed_buf(const struct sja1105_private *priv,
 		memcpy(tx_buf + SJA1105_SIZE_SPI_MSG_HEADER,
 		       packed_buf, size_bytes);
 
-	rc = sja1105_spi_transfer(priv, tx_buf, rx_buf, msg_len);
+	rc = sja1105_spi_transfer(priv, tx_buf, rx_buf, msg_len, ptp_sts);
 	if (rc < 0)
 		return rc;
 
@@ -101,6 +108,14 @@ int sja1105_spi_send_packed_buf(const struct sja1105_private *priv,
 	return 0;
 }
 
+int sja1105_spi_send_packed_buf(const struct sja1105_private *priv,
+				sja1105_spi_rw_mode_t rw, u64 reg_addr,
+				void *packed_buf, size_t size_bytes)
+{
+	return __sja1105_spi_send_packed_buf(priv, rw, reg_addr, packed_buf,
+					     size_bytes, NULL);
+}
+
 /* If @rw is:
  * - SPI_WRITE: creates and sends an SPI write message at absolute
  *		address reg_addr, taking size_bytes from *packed_buf
@@ -114,7 +129,8 @@ int sja1105_spi_send_packed_buf(const struct sja1105_private *priv,
  */
 int sja1105_spi_send_int(const struct sja1105_private *priv,
 			 sja1105_spi_rw_mode_t rw, u64 reg_addr,
-			 u64 *value, u64 size_bytes)
+			 u64 *value, u64 size_bytes,
+			 struct ptp_system_timestamp *ptp_sts)
 {
 	u8 packed_buf[SJA1105_SIZE_SPI_MSG_MAXLEN];
 	int rc;
@@ -126,8 +142,8 @@ int sja1105_spi_send_int(const struct sja1105_private *priv,
 		sja1105_pack(packed_buf, value, 8 * size_bytes - 1, 0,
 			     size_bytes);
 
-	rc = sja1105_spi_send_packed_buf(priv, rw, reg_addr, packed_buf,
-					 size_bytes);
+	rc = __sja1105_spi_send_packed_buf(priv, rw, reg_addr, packed_buf,
+					   size_bytes, ptp_sts);
 
 	if (rw == SPI_READ)
 		sja1105_unpack(packed_buf, value, 8 * size_bytes - 1, 0,
@@ -291,7 +307,7 @@ int sja1105_inhibit_tx(const struct sja1105_private *priv,
 	int rc;
 
 	rc = sja1105_spi_send_int(priv, SPI_READ, regs->port_control,
-				  &inhibit_cmd, SJA1105_SIZE_PORT_CTRL);
+				  &inhibit_cmd, SJA1105_SIZE_PORT_CTRL, NULL);
 	if (rc < 0)
 		return rc;
 
@@ -301,7 +317,7 @@ int sja1105_inhibit_tx(const struct sja1105_private *priv,
 		inhibit_cmd &= ~port_bitmap;
 
 	return sja1105_spi_send_int(priv, SPI_WRITE, regs->port_control,
-				    &inhibit_cmd, SJA1105_SIZE_PORT_CTRL);
+				    &inhibit_cmd, SJA1105_SIZE_PORT_CTRL, NULL);
 }
 
 struct sja1105_status {
-- 
2.17.1



* [PATCH v1 net-next 05/15] net: dsa: sja1105: Restore PTP time after switch reset
  2019-09-02 16:25 [PATCH v1 net-next 00/15] tc-taprio offload for SJA1105 DSA Vladimir Oltean
                   ` (3 preceding siblings ...)
  2019-09-02 16:25 ` [PATCH v1 net-next 04/15] net: dsa: sja1105: Implement the .gettimex64 system call " Vladimir Oltean
@ 2019-09-02 16:25 ` Vladimir Oltean
  2019-09-02 16:25 ` [PATCH v1 net-next 06/15] net: dsa: sja1105: Disallow management xmit during " Vladimir Oltean
                   ` (11 subsequent siblings)
  16 siblings, 0 replies; 33+ messages in thread
From: Vladimir Oltean @ 2019-09-02 16:25 UTC (permalink / raw)
  To: f.fainelli, vivien.didelot, andrew, davem, vinicius.gomes,
	vedang.patel, richardcochran
  Cc: weifeng.voon, jiri, m-karicheri2, Jose.Abreu, ilias.apalodimas,
	jhs, xiyou.wangcong, kurt.kanzenbach, netdev, Vladimir Oltean

The PTP time of the switch is not preserved when uploading a new static
configuration. Work around this hardware oddity by reading its PTP time
before a static config upload, and restoring it afterwards.

Static config changes are expected to occur at runtime even in scenarios
directly related to PTP; for example, the Time-Aware Scheduler of the
switch is programmed this way.

Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
---
Changes since RFC:
- None.

 drivers/net/dsa/sja1105/sja1105_main.c | 32 ++++++++++++-
 drivers/net/dsa/sja1105/sja1105_ptp.c  | 66 ++++++++++++++++++--------
 drivers/net/dsa/sja1105/sja1105_ptp.h  | 25 ++++++++++
 drivers/net/dsa/sja1105/sja1105_spi.c  |  4 --
 4 files changed, 101 insertions(+), 26 deletions(-)

diff --git a/drivers/net/dsa/sja1105/sja1105_main.c b/drivers/net/dsa/sja1105/sja1105_main.c
index f7f03d486499..abb22f0a9884 100644
--- a/drivers/net/dsa/sja1105/sja1105_main.c
+++ b/drivers/net/dsa/sja1105/sja1105_main.c
@@ -1382,8 +1382,13 @@ static void sja1105_bridge_leave(struct dsa_switch *ds, int port,
  */
 static int sja1105_static_config_reload(struct sja1105_private *priv)
 {
+	struct ptp_system_timestamp ptp_sts_before;
+	struct ptp_system_timestamp ptp_sts_after;
 	struct sja1105_mac_config_entry *mac;
 	int speed_mbps[SJA1105_NUM_PORTS];
+	s64 t1, t2, t3, t4;
+	s64 ptpclkval;
+	s64 t12, t34;
 	int rc, i;
 
 	mac = priv->static_config.tables[BLK_IDX_MAC_CONFIG].entries;
@@ -1398,10 +1403,35 @@ static int sja1105_static_config_reload(struct sja1105_private *priv)
 		mac[i].speed = SJA1105_SPEED_AUTO;
 	}
 
+	/* No PTP operations can run right now */
+	mutex_lock(&priv->ptp_lock);
+
+	ptpclkval = __sja1105_ptp_gettimex(priv, &ptp_sts_before);
+
 	/* Reset switch and send updated static configuration */
 	rc = sja1105_static_config_upload(priv);
 	if (rc < 0)
-		goto out;
+		goto out_unlock_ptp;
+
+	rc = __sja1105_ptp_settime(priv, 0, &ptp_sts_after);
+	if (rc < 0)
+		goto out_unlock_ptp;
+
+	t1 = timespec64_to_ns(&ptp_sts_before.pre_ts);
+	t2 = timespec64_to_ns(&ptp_sts_before.post_ts);
+	t3 = timespec64_to_ns(&ptp_sts_after.pre_ts);
+	t4 = timespec64_to_ns(&ptp_sts_after.post_ts);
+	/* Mid point, corresponds to pre-reset PTPCLKVAL */
+	t12 = t1 + (t2 - t1) / 2;
+	/* Mid point, corresponds to post-reset PTPCLKVAL, aka 0 */
+	t34 = t3 + (t4 - t3) / 2;
+	/* Advance PTPCLKVAL by the time it took since its readout */
+	ptpclkval += (t34 - t12);
+
+	__sja1105_ptp_adjtime(priv, ptpclkval);
+
+out_unlock_ptp:
+	mutex_unlock(&priv->ptp_lock);
 
 	/* Configure the CGU (PLLs) for MII and RMII PHYs.
 	 * For these interfaces there is no dynamic configuration
diff --git a/drivers/net/dsa/sja1105/sja1105_ptp.c b/drivers/net/dsa/sja1105/sja1105_ptp.c
index 04693c702b09..a7722c0944fb 100644
--- a/drivers/net/dsa/sja1105/sja1105_ptp.c
+++ b/drivers/net/dsa/sja1105/sja1105_ptp.c
@@ -215,17 +215,26 @@ int sja1105_ptp_reset(struct sja1105_private *priv)
 	return rc;
 }
 
+/* Caller must hold priv->ptp_lock */
+u64 __sja1105_ptp_gettimex(struct sja1105_private *priv,
+			   struct ptp_system_timestamp *sts)
+{
+	u64 ticks;
+
+	ticks = sja1105_ptpclkval_read(priv, sts);
+
+	return sja1105_ticks_to_ns(ticks);
+}
+
 static int sja1105_ptp_gettimex(struct ptp_clock_info *ptp,
 				struct timespec64 *ts,
 				struct ptp_system_timestamp *sts)
 {
 	struct sja1105_private *priv = ptp_to_sja1105(ptp);
-	u64 ticks;
 
 	mutex_lock(&priv->ptp_lock);
 
-	ticks = sja1105_ptpclkval_read(priv, sts);
-	*ts = ns_to_timespec64(sja1105_ticks_to_ns(ticks));
+	*ts = ns_to_timespec64(__sja1105_ptp_gettimex(priv, sts));
 
 	mutex_unlock(&priv->ptp_lock);
 
@@ -245,33 +254,42 @@ static int sja1105_ptp_mode_set(struct sja1105_private *priv,
 }
 
 /* Caller must hold priv->ptp_lock */
-static int sja1105_ptpclkval_write(struct sja1105_private *priv, u64 val)
+static int sja1105_ptpclkval_write(struct sja1105_private *priv, u64 val,
+				   struct ptp_system_timestamp *ptp_sts)
 {
 	const struct sja1105_regs *regs = priv->info->regs;
 
 	return sja1105_spi_send_int(priv, SPI_WRITE, regs->ptpclk, &val, 8,
-				    NULL);
+				    ptp_sts);
 }
 
 /* Write to PTPCLKVAL while PTPCLKADD is 0 */
-static int sja1105_ptp_settime(struct ptp_clock_info *ptp,
-			       const struct timespec64 *ts)
+int __sja1105_ptp_settime(struct sja1105_private *priv, u64 ns,
+			  struct ptp_system_timestamp *ptp_sts)
 {
-	u64 ticks = ns_to_sja1105_ticks(timespec64_to_ns(ts));
-	struct sja1105_private *priv = ptp_to_sja1105(ptp);
+	u64 ticks = ns_to_sja1105_ticks(ns);
 	int rc;
 
-	mutex_lock(&priv->ptp_lock);
-
 	rc = sja1105_ptp_mode_set(priv, PTP_SET_MODE);
 	if (rc < 0) {
 		dev_err(priv->ds->dev, "Failed to put PTPCLK in set mode\n");
-		goto out;
+		return rc;
 	}
 
-	rc = sja1105_ptpclkval_write(priv, ticks);
+	return sja1105_ptpclkval_write(priv, ticks, ptp_sts);
+}
+
+static int sja1105_ptp_settime(struct ptp_clock_info *ptp,
+			       const struct timespec64 *ts)
+{
+	struct sja1105_private *priv = ptp_to_sja1105(ptp);
+	u64 ns = timespec64_to_ns(ts);
+	int rc;
+
+	mutex_lock(&priv->ptp_lock);
+
+	rc = __sja1105_ptp_settime(priv, ns, NULL);
 
-out:
 	mutex_unlock(&priv->ptp_lock);
 
 	return rc;
@@ -320,23 +338,29 @@ u64 sja1105_ptpclkval_read(struct sja1105_private *priv,
 }
 
 /* Write to PTPCLKVAL while PTPCLKADD is 1 */
-static int sja1105_ptp_adjtime(struct ptp_clock_info *ptp, s64 delta)
+int __sja1105_ptp_adjtime(struct sja1105_private *priv, s64 delta)
 {
-	struct sja1105_private *priv = ptp_to_sja1105(ptp);
 	s64 ticks = ns_to_sja1105_ticks(delta);
 	int rc;
 
-	mutex_lock(&priv->ptp_lock);
-
 	rc = sja1105_ptp_mode_set(priv, PTP_ADD_MODE);
 	if (rc < 0) {
 		dev_err(priv->ds->dev, "Failed to put PTPCLK in add mode\n");
-		goto out;
+		return rc;
 	}
 
-	rc = sja1105_ptpclkval_write(priv, ticks);
+	return sja1105_ptpclkval_write(priv, ticks, NULL);
+}
+
+static int sja1105_ptp_adjtime(struct ptp_clock_info *ptp, s64 delta)
+{
+	struct sja1105_private *priv = ptp_to_sja1105(ptp);
+	int rc;
+
+	mutex_lock(&priv->ptp_lock);
+
+	rc = __sja1105_ptp_adjtime(priv, delta);
 
-out:
 	mutex_unlock(&priv->ptp_lock);
 
 	return rc;
diff --git a/drivers/net/dsa/sja1105/sja1105_ptp.h b/drivers/net/dsa/sja1105/sja1105_ptp.h
index 80c33e5e4503..c699611e585d 100644
--- a/drivers/net/dsa/sja1105/sja1105_ptp.h
+++ b/drivers/net/dsa/sja1105/sja1105_ptp.h
@@ -42,6 +42,14 @@ int sja1105_ptp_reset(struct sja1105_private *priv);
 u64 sja1105_ptpclkval_read(struct sja1105_private *priv,
 			   struct ptp_system_timestamp *sts);
 
+u64 __sja1105_ptp_gettimex(struct sja1105_private *priv,
+			   struct ptp_system_timestamp *sts);
+
+int __sja1105_ptp_settime(struct sja1105_private *priv, u64 ns,
+			  struct ptp_system_timestamp *ptp_sts);
+
+int __sja1105_ptp_adjtime(struct sja1105_private *priv, s64 delta);
+
 #else
 
 static inline int sja1105_ptp_clock_register(struct sja1105_private *priv)
@@ -77,6 +85,23 @@ static inline u64 sja1105_ptpclkval_read(struct sja1105_private *priv,
 	return 0;
 }
 
+static inline u64 __sja1105_ptp_gettimex(struct sja1105_private *priv,
+					 struct ptp_system_timestamp *sts)
+{
+	return 0;
+}
+
+static inline int __sja1105_ptp_settime(struct sja1105_private *priv, u64 ns,
+					struct ptp_system_timestamp *ptp_sts)
+{
+	return 0;
+}
+
+static inline int __sja1105_ptp_adjtime(struct sja1105_private *priv, s64 delta)
+{
+	return 0;
+}
+
 #define sja1105et_ptp_cmd NULL
 
 #define sja1105pqrs_ptp_cmd NULL
diff --git a/drivers/net/dsa/sja1105/sja1105_spi.c b/drivers/net/dsa/sja1105/sja1105_spi.c
index 26985f1209ad..eae9c9baa189 100644
--- a/drivers/net/dsa/sja1105/sja1105_spi.c
+++ b/drivers/net/dsa/sja1105/sja1105_spi.c
@@ -496,10 +496,6 @@ int sja1105_static_config_upload(struct sja1105_private *priv)
 		dev_info(dev, "Succeeded after %d tried\n", RETRIES - retries);
 	}
 
-	rc = sja1105_ptp_reset(priv);
-	if (rc < 0)
-		dev_err(dev, "Failed to reset PTP clock: %d\n", rc);
-
 	dev_info(dev, "Reset switch and programmed static config\n");
 
 out:
-- 
2.17.1



* [PATCH v1 net-next 06/15] net: dsa: sja1105: Disallow management xmit during switch reset
  2019-09-02 16:25 [PATCH v1 net-next 00/15] tc-taprio offload for SJA1105 DSA Vladimir Oltean
                   ` (4 preceding siblings ...)
  2019-09-02 16:25 ` [PATCH v1 net-next 05/15] net: dsa: sja1105: Restore PTP time after switch reset Vladimir Oltean
@ 2019-09-02 16:25 ` " Vladimir Oltean
  2019-09-02 16:25 ` [PATCH v1 net-next 07/15] net: dsa: sja1105: Move PTP data to its own private structure Vladimir Oltean
                   ` (10 subsequent siblings)
  16 siblings, 0 replies; 33+ messages in thread
From: Vladimir Oltean @ 2019-09-02 16:25 UTC (permalink / raw)
  To: f.fainelli, vivien.didelot, andrew, davem, vinicius.gomes,
	vedang.patel, richardcochran
  Cc: weifeng.voon, jiri, m-karicheri2, Jose.Abreu, ilias.apalodimas,
	jhs, xiyou.wangcong, kurt.kanzenbach, netdev, Vladimir Oltean

The purpose here is to prevent ptp4l from failing with this error:

  timed out while polling for tx timestamp
  increasing tx_timestamp_timeout may correct this issue, but it is likely caused by a driver bug
  port 1: send peer delay request failed

So reset the switch either before the management frame is sent or after
it has been timestamped, but never in between.

The condition can arise either from a genuine timeout (because
re-uploading the static config takes time) or from the TX timestamp
actually being lost during the reset. The former can be addressed by
increasing tx_timestamp_timeout in userspace; the latter requires this
patch.

Locking all traffic during switch reset would not make sense, though.
Forcing all CPU-originated traffic to potentially block while a
sleepable context sends more than 800 bytes over SPI is a bad idea.
Flows that the switch forwards autonomously are dropped during the reset
regardless. So simply let all other CPU-originated traffic be dropped as
well.

Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
---
Changes since RFC:
- None.

 drivers/net/dsa/sja1105/sja1105_main.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/net/dsa/sja1105/sja1105_main.c b/drivers/net/dsa/sja1105/sja1105_main.c
index abb22f0a9884..d92f15b3aea9 100644
--- a/drivers/net/dsa/sja1105/sja1105_main.c
+++ b/drivers/net/dsa/sja1105/sja1105_main.c
@@ -1391,6 +1391,8 @@ static int sja1105_static_config_reload(struct sja1105_private *priv)
 	s64 t12, t34;
 	int rc, i;
 
+	mutex_lock(&priv->mgmt_lock);
+
 	mac = priv->static_config.tables[BLK_IDX_MAC_CONFIG].entries;
 
 	/* Back up the dynamic link speed changed by sja1105_adjust_port_config
@@ -1447,6 +1449,8 @@ static int sja1105_static_config_reload(struct sja1105_private *priv)
 			goto out;
 	}
 out:
+	mutex_unlock(&priv->mgmt_lock);
+
 	return rc;
 }
 
-- 
2.17.1



* [PATCH v1 net-next 07/15] net: dsa: sja1105: Move PTP data to its own private structure
  2019-09-02 16:25 [PATCH v1 net-next 00/15] tc-taprio offload for SJA1105 DSA Vladimir Oltean
                   ` (5 preceding siblings ...)
  2019-09-02 16:25 ` [PATCH v1 net-next 06/15] net: dsa: sja1105: Disallow management xmit during " Vladimir Oltean
@ 2019-09-02 16:25 ` Vladimir Oltean
  2019-09-02 16:25 ` [PATCH v1 net-next 08/15] net: dsa: sja1105: Advertise the 8 TX queues Vladimir Oltean
                   ` (9 subsequent siblings)
  16 siblings, 0 replies; 33+ messages in thread
From: Vladimir Oltean @ 2019-09-02 16:25 UTC (permalink / raw)
  To: f.fainelli, vivien.didelot, andrew, davem, vinicius.gomes,
	vedang.patel, richardcochran
  Cc: weifeng.voon, jiri, m-karicheri2, Jose.Abreu, ilias.apalodimas,
	jhs, xiyou.wangcong, kurt.kanzenbach, netdev, Vladimir Oltean

Reduce the size of the sja1105_private structure when
CONFIG_NET_DSA_SJA1105_PTP is not enabled. Also make the PTP code a
little bit more self-contained.

Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
---
Changes since RFC:
- None.

 drivers/net/dsa/sja1105/sja1105.h      | 20 +------
 drivers/net/dsa/sja1105/sja1105_main.c | 12 ++--
 drivers/net/dsa/sja1105/sja1105_ptp.c  | 81 +++++++++++++++-----------
 drivers/net/dsa/sja1105/sja1105_ptp.h  | 29 +++++++++
 4 files changed, 84 insertions(+), 58 deletions(-)

diff --git a/drivers/net/dsa/sja1105/sja1105.h b/drivers/net/dsa/sja1105/sja1105.h
index c80be59dafbd..3ca0b87aa3e4 100644
--- a/drivers/net/dsa/sja1105/sja1105.h
+++ b/drivers/net/dsa/sja1105/sja1105.h
@@ -20,6 +20,8 @@
  */
 #define SJA1105_AGEING_TIME_MS(ms)	((ms) / 10)
 
+#include "sja1105_ptp.h"
+
 /* Keeps the different addresses between E/T and P/Q/R/S */
 struct sja1105_regs {
 	u64 device_id;
@@ -49,17 +51,6 @@ struct sja1105_regs {
 	u64 qlevel[SJA1105_NUM_PORTS];
 };
 
-enum sja1105_ptp_clk_mode {
-	PTP_ADD_MODE = 1,
-	PTP_SET_MODE = 0,
-};
-
-struct sja1105_ptp_cmd {
-	u64 resptp;		/* reset */
-	u64 corrclk4ts;		/* use the corrected clock for timestamps */
-	u64 ptpclkadd;		/* enum sja1105_ptp_clk_mode */
-};
-
 struct sja1105_info {
 	u64 device_id;
 	/* Needed for distinction between P and R, and between Q and S
@@ -99,20 +90,15 @@ struct sja1105_private {
 	struct spi_device *spidev;
 	struct dsa_switch *ds;
 	struct sja1105_port ports[SJA1105_NUM_PORTS];
-	struct sja1105_ptp_cmd ptp_cmd;
-	struct ptp_clock_info ptp_caps;
-	struct ptp_clock *clock;
-	/* Serializes all operations on the PTP hardware clock */
-	struct mutex ptp_lock;
 	/* Serializes transmission of management frames so that
 	 * the switch doesn't confuse them with one another.
 	 */
 	struct mutex mgmt_lock;
 	struct sja1105_tagger_data tagger_data;
+	struct sja1105_ptp_data ptp_data;
 };
 
 #include "sja1105_dynamic_config.h"
-#include "sja1105_ptp.h"
 
 struct sja1105_spi_message {
 	u64 access;
diff --git a/drivers/net/dsa/sja1105/sja1105_main.c b/drivers/net/dsa/sja1105/sja1105_main.c
index d92f15b3aea9..670c069722d5 100644
--- a/drivers/net/dsa/sja1105/sja1105_main.c
+++ b/drivers/net/dsa/sja1105/sja1105_main.c
@@ -1406,7 +1406,7 @@ static int sja1105_static_config_reload(struct sja1105_private *priv)
 	}
 
 	/* No PTP operations can run right now */
-	mutex_lock(&priv->ptp_lock);
+	mutex_lock(&priv->ptp_data.lock);
 
 	ptpclkval = __sja1105_ptp_gettimex(priv, &ptp_sts_before);
 
@@ -1433,7 +1433,7 @@ static int sja1105_static_config_reload(struct sja1105_private *priv)
 	__sja1105_ptp_adjtime(priv, ptpclkval);
 
 out_unlock_ptp:
-	mutex_unlock(&priv->ptp_lock);
+	mutex_unlock(&priv->ptp_data.lock);
 
 	/* Configure the CGU (PLLs) for MII and RMII PHYs.
 	 * For these interfaces there is no dynamic configuration
@@ -1876,7 +1876,7 @@ static netdev_tx_t sja1105_port_deferred_xmit(struct dsa_switch *ds, int port,
 
 	skb_shinfo(clone)->tx_flags |= SKBTX_IN_PROGRESS;
 
-	mutex_lock(&priv->ptp_lock);
+	mutex_lock(&priv->ptp_data.lock);
 
 	ticks = sja1105_ptpclkval_read(priv, NULL);
 
@@ -1893,7 +1893,7 @@ static netdev_tx_t sja1105_port_deferred_xmit(struct dsa_switch *ds, int port,
 	skb_complete_tx_timestamp(clone, &shwt);
 
 out_unlock_ptp:
-	mutex_unlock(&priv->ptp_lock);
+	mutex_unlock(&priv->ptp_data.lock);
 out:
 	mutex_unlock(&priv->mgmt_lock);
 	return NETDEV_TX_OK;
@@ -2029,7 +2029,7 @@ static void sja1105_rxtstamp_work(struct work_struct *work)
 	struct sk_buff *skb;
 	u64 ticks;
 
-	mutex_lock(&priv->ptp_lock);
+	mutex_lock(&priv->ptp_data.lock);
 
 	ticks = sja1105_ptpclkval_read(priv, NULL);
 
@@ -2046,7 +2046,7 @@ static void sja1105_rxtstamp_work(struct work_struct *work)
 		netif_rx_ni(skb);
 	}
 
-	mutex_unlock(&priv->ptp_lock);
+	mutex_unlock(&priv->ptp_data.lock);
 }
 
 /* Called from dsa_skb_defer_rx_timestamp */
diff --git a/drivers/net/dsa/sja1105/sja1105_ptp.c b/drivers/net/dsa/sja1105/sja1105_ptp.c
index a7722c0944fb..f85f44bdab31 100644
--- a/drivers/net/dsa/sja1105/sja1105_ptp.c
+++ b/drivers/net/dsa/sja1105/sja1105_ptp.c
@@ -34,7 +34,10 @@
 #define SJA1105_CC_MULT_DEM		15625
 #define SJA1105_CC_MULT			0x80000000
 
-#define ptp_to_sja1105(d) container_of((d), struct sja1105_private, ptp_caps)
+#define ptp_to_sja1105_data(d) \
+		container_of((d), struct sja1105_ptp_data, caps)
+#define ptp_data_to_sja1105(d) \
+		container_of((d), struct sja1105_private, ptp_data)
 
 int sja1105_get_ts_info(struct dsa_switch *ds, int port,
 			struct ethtool_ts_info *info)
@@ -42,7 +45,7 @@ int sja1105_get_ts_info(struct dsa_switch *ds, int port,
 	struct sja1105_private *priv = ds->priv;
 
 	/* Called during cleanup */
-	if (!priv->clock)
+	if (!priv->ptp_data.clock)
 		return -ENODEV;
 
 	info->so_timestamping = SOF_TIMESTAMPING_TX_HARDWARE |
@@ -52,7 +55,7 @@ int sja1105_get_ts_info(struct dsa_switch *ds, int port,
 			 (1 << HWTSTAMP_TX_ON);
 	info->rx_filters = (1 << HWTSTAMP_FILTER_NONE) |
 			   (1 << HWTSTAMP_FILTER_PTP_V2_L2_EVENT);
-	info->phc_index = ptp_clock_index(priv->clock);
+	info->phc_index = ptp_clock_index(priv->ptp_data.clock);
 	return 0;
 }
 
@@ -200,22 +203,23 @@ int sja1105_ptpegr_ts_poll(struct sja1105_private *priv, int port, u64 *ts)
 
 int sja1105_ptp_reset(struct sja1105_private *priv)
 {
-	struct sja1105_ptp_cmd cmd = priv->ptp_cmd;
+	struct sja1105_ptp_data *ptp_data = &priv->ptp_data;
+	struct sja1105_ptp_cmd cmd = ptp_data->cmd;
 	int rc;
 
-	mutex_lock(&priv->ptp_lock);
+	mutex_lock(&ptp_data->lock);
 
 	cmd.resptp = 1;
 
 	dev_dbg(priv->ds->dev, "Resetting PTP clock\n");
 	rc = priv->info->ptp_cmd(priv, &cmd);
 
-	mutex_unlock(&priv->ptp_lock);
+	mutex_unlock(&ptp_data->lock);
 
 	return rc;
 }
 
-/* Caller must hold priv->ptp_lock */
+/* Caller must hold priv->ptp_data.lock */
 u64 __sja1105_ptp_gettimex(struct sja1105_private *priv,
 			   struct ptp_system_timestamp *sts)
 {
@@ -230,30 +234,31 @@ static int sja1105_ptp_gettimex(struct ptp_clock_info *ptp,
 				struct timespec64 *ts,
 				struct ptp_system_timestamp *sts)
 {
-	struct sja1105_private *priv = ptp_to_sja1105(ptp);
+	struct sja1105_ptp_data *ptp_data = ptp_to_sja1105_data(ptp);
+	struct sja1105_private *priv = ptp_data_to_sja1105(ptp_data);
 
-	mutex_lock(&priv->ptp_lock);
+	mutex_lock(&ptp_data->lock);
 
 	*ts = ns_to_timespec64(__sja1105_ptp_gettimex(priv, sts));
 
-	mutex_unlock(&priv->ptp_lock);
+	mutex_unlock(&ptp_data->lock);
 
 	return 0;
 }
 
-/* Caller must hold priv->ptp_lock */
+/* Caller must hold priv->ptp_data.lock */
 static int sja1105_ptp_mode_set(struct sja1105_private *priv,
 				enum sja1105_ptp_clk_mode mode)
 {
-	if (priv->ptp_cmd.ptpclkadd == mode)
+	if (priv->ptp_data.cmd.ptpclkadd == mode)
 		return 0;
 
-	priv->ptp_cmd.ptpclkadd = mode;
+	priv->ptp_data.cmd.ptpclkadd = mode;
 
-	return priv->info->ptp_cmd(priv, &priv->ptp_cmd);
+	return priv->info->ptp_cmd(priv, &priv->ptp_data.cmd);
 }
 
-/* Caller must hold priv->ptp_lock */
+/* Caller must hold priv->ptp_data.lock */
 static int sja1105_ptpclkval_write(struct sja1105_private *priv, u64 val,
 				   struct ptp_system_timestamp *ptp_sts)
 {
@@ -282,22 +287,24 @@ int __sja1105_ptp_settime(struct sja1105_private *priv, u64 ns,
 static int sja1105_ptp_settime(struct ptp_clock_info *ptp,
 			       const struct timespec64 *ts)
 {
-	struct sja1105_private *priv = ptp_to_sja1105(ptp);
+	struct sja1105_ptp_data *ptp_data = ptp_to_sja1105_data(ptp);
+	struct sja1105_private *priv = ptp_data_to_sja1105(ptp_data);
 	u64 ns = timespec64_to_ns(ts);
 	int rc;
 
-	mutex_lock(&priv->ptp_lock);
+	mutex_lock(&ptp_data->lock);
 
 	rc = __sja1105_ptp_settime(priv, ns, NULL);
 
-	mutex_unlock(&priv->ptp_lock);
+	mutex_unlock(&ptp_data->lock);
 
 	return rc;
 }
 
 static int sja1105_ptp_adjfine(struct ptp_clock_info *ptp, long scaled_ppm)
 {
-	struct sja1105_private *priv = ptp_to_sja1105(ptp);
+	struct sja1105_ptp_data *ptp_data = ptp_to_sja1105_data(ptp);
+	struct sja1105_private *priv = ptp_data_to_sja1105(ptp_data);
 	const struct sja1105_regs *regs = priv->info->regs;
 	s64 clkrate;
 	int rc;
@@ -309,17 +316,17 @@ static int sja1105_ptp_adjfine(struct ptp_clock_info *ptp, long scaled_ppm)
 	clkrate = SJA1105_CC_MULT + clkrate;
 	clkrate &= GENMASK_ULL(31, 0);
 
-	mutex_lock(&priv->ptp_lock);
+	mutex_lock(&priv->ptp_data.lock);
 
 	rc = sja1105_spi_send_int(priv, SPI_WRITE, regs->ptpclkrate,
 				  &clkrate, 4, NULL);
 
-	mutex_unlock(&priv->ptp_lock);
+	mutex_unlock(&priv->ptp_data.lock);
 
 	return rc;
 }
 
-/* Caller must hold priv->ptp_lock */
+/* Caller must hold priv->ptp_data.lock */
 u64 sja1105_ptpclkval_read(struct sja1105_private *priv,
 			   struct ptp_system_timestamp *sts)
 {
@@ -354,23 +361,25 @@ int __sja1105_ptp_adjtime(struct sja1105_private *priv, s64 delta)
 
 static int sja1105_ptp_adjtime(struct ptp_clock_info *ptp, s64 delta)
 {
-	struct sja1105_private *priv = ptp_to_sja1105(ptp);
+	struct sja1105_ptp_data *ptp_data = ptp_to_sja1105_data(ptp);
+	struct sja1105_private *priv = ptp_data_to_sja1105(ptp_data);
 	int rc;
 
-	mutex_lock(&priv->ptp_lock);
+	mutex_lock(&ptp_data->lock);
 
 	rc = __sja1105_ptp_adjtime(priv, delta);
 
-	mutex_unlock(&priv->ptp_lock);
+	mutex_unlock(&ptp_data->lock);
 
 	return rc;
 }
 
 int sja1105_ptp_clock_register(struct sja1105_private *priv)
 {
+	struct sja1105_ptp_data *ptp_data = &priv->ptp_data;
 	struct dsa_switch *ds = priv->ds;
 
-	priv->ptp_caps = (struct ptp_clock_info) {
+	ptp_data->caps = (struct ptp_clock_info) {
 		.owner		= THIS_MODULE,
 		.name		= "SJA1105 PHC",
 		.adjfine	= sja1105_ptp_adjfine,
@@ -380,23 +389,25 @@ int sja1105_ptp_clock_register(struct sja1105_private *priv)
 		.max_adj	= SJA1105_MAX_ADJ_PPB,
 	};
 
-	mutex_init(&priv->ptp_lock);
+	mutex_init(&ptp_data->lock);
 
-	priv->clock = ptp_clock_register(&priv->ptp_caps, ds->dev);
-	if (IS_ERR_OR_NULL(priv->clock))
-		return PTR_ERR(priv->clock);
+	ptp_data->clock = ptp_clock_register(&ptp_data->caps, ds->dev);
+	if (IS_ERR_OR_NULL(ptp_data->clock))
+		return PTR_ERR(ptp_data->clock);
 
-	priv->ptp_cmd.corrclk4ts = true;
-	priv->ptp_cmd.ptpclkadd = PTP_SET_MODE;
+	ptp_data->cmd.corrclk4ts = true;
+	ptp_data->cmd.ptpclkadd = PTP_SET_MODE;
 
 	return sja1105_ptp_reset(priv);
 }
 
 void sja1105_ptp_clock_unregister(struct sja1105_private *priv)
 {
-	if (IS_ERR_OR_NULL(priv->clock))
+	struct sja1105_ptp_data *ptp_data = &priv->ptp_data;
+
+	if (IS_ERR_OR_NULL(ptp_data->clock))
 		return;
 
-	ptp_clock_unregister(priv->clock);
-	priv->clock = NULL;
+	ptp_clock_unregister(ptp_data->clock);
+	ptp_data->clock = NULL;
 }
diff --git a/drivers/net/dsa/sja1105/sja1105_ptp.h b/drivers/net/dsa/sja1105/sja1105_ptp.h
index c699611e585d..dfe856200394 100644
--- a/drivers/net/dsa/sja1105/sja1105_ptp.h
+++ b/drivers/net/dsa/sja1105/sja1105_ptp.h
@@ -19,8 +19,29 @@ static inline s64 sja1105_ticks_to_ns(s64 ticks)
 	return ticks * SJA1105_TICK_NS;
 }
 
+struct sja1105_private;
+
 #if IS_ENABLED(CONFIG_NET_DSA_SJA1105_PTP)
 
+enum sja1105_ptp_clk_mode {
+	PTP_ADD_MODE = 1,
+	PTP_SET_MODE = 0,
+};
+
+struct sja1105_ptp_cmd {
+	u64 resptp;		/* reset */
+	u64 corrclk4ts;		/* use the corrected clock for timestamps */
+	u64 ptpclkadd;		/* enum sja1105_ptp_clk_mode */
+};
+
+struct sja1105_ptp_data {
+	struct sja1105_ptp_cmd cmd;
+	struct ptp_clock_info caps;
+	struct ptp_clock *clock;
+	/* Serializes all operations on the PTP hardware clock */
+	struct mutex lock;
+};
+
 int sja1105_ptp_clock_register(struct sja1105_private *priv);
 
 void sja1105_ptp_clock_unregister(struct sja1105_private *priv);
@@ -52,6 +73,14 @@ int __sja1105_ptp_adjtime(struct sja1105_private *priv, s64 delta);
 
 #else
 
+/* Structures cannot be empty in C. Bah!
+ * Keep the mutex as the only element, which is a bit more difficult to
+ * refactor out of sja1105_main.c anyway.
+ */
+struct sja1105_ptp_data {
+	struct mutex lock;
+};
+
 static inline int sja1105_ptp_clock_register(struct sja1105_private *priv)
 {
 	return 0;
-- 
2.17.1



* [PATCH v1 net-next 08/15] net: dsa: sja1105: Advertise the 8 TX queues
  2019-09-02 16:25 [PATCH v1 net-next 00/15] tc-taprio offload for SJA1105 DSA Vladimir Oltean
                   ` (6 preceding siblings ...)
  2019-09-02 16:25 ` [PATCH v1 net-next 07/15] net: dsa: sja1105: Move PTP data to its own private structure Vladimir Oltean
@ 2019-09-02 16:25 ` Vladimir Oltean
  2019-09-02 16:25 ` [PATCH v1 net-next 09/15] taprio: Add support for hardware offloading Vladimir Oltean
                   ` (8 subsequent siblings)
  16 siblings, 0 replies; 33+ messages in thread
From: Vladimir Oltean @ 2019-09-02 16:25 UTC (permalink / raw)
  To: f.fainelli, vivien.didelot, andrew, davem, vinicius.gomes,
	vedang.patel, richardcochran
  Cc: weifeng.voon, jiri, m-karicheri2, Jose.Abreu, ilias.apalodimas,
	jhs, xiyou.wangcong, kurt.kanzenbach, netdev, Vladimir Oltean

This is a preparation patch for the tc-taprio offload (and potentially
for other future offloads such as tc-mqprio).

Instead of looking directly at skb->priority during xmit, let's get the
netdev queue and the queue-to-traffic-class mapping, and put the
resulting traffic class into the dsa_8021q PCP field. The switch is
configured with a 1-to-1 PCP-to-ingress-queue-to-egress-queue mapping
(see vlan_pmap in sja1105_main.c), so the effect is that Linux can
select a front-panel port's egress traffic class through VLAN tagging,
completely transparently.
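As an illustration of the queue-to-traffic-class lookup described above, here is a minimal userspace sketch. All names (toy_netdev, toy_txq_to_tc, toy_make_tci) are hypothetical. toy_txq_to_tc follows my reading of the kernel's netdev_txq_to_tc(): class 0 when no traffic classes are configured, -1 when classes exist but none owns the queue. toy_make_tci only shows where the resulting PCP lands in an 802.1Q TCI. This is not the tag_sja1105.c code.

```c
#include <stdint.h>

#define TOY_MAX_TC 16 /* assumption: mirrors TC_MAX_QUEUE */

/* Traffic class i owns TX queues [offset, offset + count). */
struct toy_txq_range {
	uint16_t offset;
	uint16_t count;
};

/* Hypothetical stand-in for the num_tc/tc_to_txq fields of a netdev. */
struct toy_netdev {
	int num_tc;
	struct toy_txq_range tc_to_txq[TOY_MAX_TC];
};

/* Return the traffic class owning TX queue 'txq'. With no traffic
 * classes configured, every queue maps to class 0; if classes exist
 * but none claims the queue, return -1.
 */
static int toy_txq_to_tc(const struct toy_netdev *dev, unsigned int txq)
{
	int i;

	if (!dev->num_tc)
		return 0;

	for (i = 0; i < TOY_MAX_TC; i++) {
		const struct toy_txq_range *r = &dev->tc_to_txq[i];

		/* Unsigned wrap-around makes txq < offset fail this test. */
		if (txq - r->offset < r->count)
			return i;
	}
	return -1;
}

/* Build an 802.1Q TCI: PCP in bits 15:13, DEI in bit 12 (left at 0),
 * VID in bits 11:0.
 */
static uint16_t toy_make_tci(uint8_t pcp, uint16_t vid)
{
	return (uint16_t)(((pcp & 0x7u) << 13) | (vid & 0xfffu));
}
```

With the 1-to-1 mapping described above (TX queue N belongs to traffic class N), the PCP that ends up in the tag is simply the queue index.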

Unfortunately, the switch ignores the VLAN PCP for management traffic
to/from the CPU (link-local frames at 01-80-C2-xx-xx-xx or
01-1B-19-xx-xx-xx), so the transmission queue of this type of traffic
cannot be altered on a per-frame basis. It is selected only through the
"hostprio" setting, which is currently hardcoded to 7 in the driver.
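To make the distinction concrete, here is a sketch of how one might classify a frame into the management ranges quoted above, for which the PCP is ignored and hostprio applies instead. The helper names are hypothetical and the value 7 is taken from the text. This is not the driver's actual classification code, which relies on the switch's MAC filters.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_HOSTPRIO 7 /* value currently hardcoded in the driver */

/* True if the destination MAC starts with 01-80-C2 or 01-1B-19,
 * the two link-local ranges treated as management traffic.
 */
static bool toy_is_mgmt_dmac(const uint8_t dmac[6])
{
	static const uint8_t prefixes[][3] = {
		{ 0x01, 0x80, 0xC2 },
		{ 0x01, 0x1B, 0x19 },
	};
	size_t i;

	for (i = 0; i < sizeof(prefixes) / sizeof(prefixes[0]); i++)
		if (memcmp(dmac, prefixes[i], sizeof(prefixes[i])) == 0)
			return true;
	return false;
}

/* Egress traffic class the switch ends up using: the PCP derived from
 * the queue mapping for regular traffic, hostprio for management.
 */
static int toy_effective_tc(const uint8_t dmac[6], int pcp)
{
	return toy_is_mgmt_dmac(dmac) ? TOY_HOSTPRIO : pcp;
}
```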

Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
---
Changes since RFC:
- None.

 drivers/net/dsa/sja1105/sja1105_main.c | 7 ++++++-
 net/dsa/tag_sja1105.c                  | 3 ++-
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/net/dsa/sja1105/sja1105_main.c b/drivers/net/dsa/sja1105/sja1105_main.c
index 670c069722d5..8b930cc2dabc 100644
--- a/drivers/net/dsa/sja1105/sja1105_main.c
+++ b/drivers/net/dsa/sja1105/sja1105_main.c
@@ -384,7 +384,9 @@ static int sja1105_init_general_params(struct sja1105_private *priv)
 		/* Disallow dynamic changing of the mirror port */
 		.mirr_ptacu = 0,
 		.switchid = priv->ds->index,
-		/* Priority queue for link-local frames trapped to CPU */
+		/* Priority queue for link-local management frames
+		 * (both ingress to and egress from CPU - PTP, STP etc)
+		 */
 		.hostprio = 7,
 		.mac_fltres1 = SJA1105_LINKLOCAL_FILTER_A,
 		.mac_flt1    = SJA1105_LINKLOCAL_FILTER_A_MASK,
@@ -1745,6 +1747,9 @@ static int sja1105_setup(struct dsa_switch *ds)
 	 */
 	ds->vlan_filtering_is_global = true;
 
+	/* Advertise the 8 egress queues */
+	ds->num_tx_queues = SJA1105_NUM_TC;
+
 	/* The DSA/switchdev model brings up switch ports in standalone mode by
 	 * default, and that means vlan_filtering is 0 since they're not under
 	 * a bridge, so it's safe to set up switch tagging at this time.
diff --git a/net/dsa/tag_sja1105.c b/net/dsa/tag_sja1105.c
index 47ee88163a9d..9c9aff3e52cf 100644
--- a/net/dsa/tag_sja1105.c
+++ b/net/dsa/tag_sja1105.c
@@ -89,7 +89,8 @@ static struct sk_buff *sja1105_xmit(struct sk_buff *skb,
 	struct dsa_port *dp = dsa_slave_to_port(netdev);
 	struct dsa_switch *ds = dp->ds;
 	u16 tx_vid = dsa_8021q_tx_vid(ds, dp->index);
-	u8 pcp = skb->priority;
+	u16 queue_mapping = skb_get_queue_mapping(skb);
+	u8 pcp = netdev_txq_to_tc(netdev, queue_mapping);
 
 	/* Transmitting management traffic does not rely upon switch tagging,
 	 * but instead SPI-installed management routes. Part 2 of this
-- 
2.17.1



* [PATCH v1 net-next 09/15] taprio: Add support for hardware offloading
  2019-09-02 16:25 [PATCH v1 net-next 00/15] tc-taprio offload for SJA1105 DSA Vladimir Oltean
                   ` (7 preceding siblings ...)
  2019-09-02 16:25 ` [PATCH v1 net-next 08/15] net: dsa: sja1105: Advertise the 8 TX queues Vladimir Oltean
@ 2019-09-02 16:25 ` Vladimir Oltean
  2019-09-02 16:25 ` [PATCH v1 net-next 10/15] net: dsa: Pass ndo_setup_tc slave callback to drivers Vladimir Oltean
                   ` (7 subsequent siblings)
  16 siblings, 0 replies; 33+ messages in thread
From: Vladimir Oltean @ 2019-09-02 16:25 UTC (permalink / raw)
  To: f.fainelli, vivien.didelot, andrew, davem, vinicius.gomes,
	vedang.patel, richardcochran
  Cc: weifeng.voon, jiri, m-karicheri2, Jose.Abreu, ilias.apalodimas,
	jhs, xiyou.wangcong, kurt.kanzenbach, netdev, Vladimir Oltean

From: Vinicius Costa Gomes <vinicius.gomes@intel.com>

This allows taprio to offload the schedule enforcement to capable
network cards, resulting in more precise windows and less CPU usage.

The important detail here is the difference between the gate_mask in
taprio and the gate_mask passed to the network driver. In taprio, each
bit refers to a traffic class; for the driver, each bit references a
transmission queue: bit 0 for queue 0, bit 1 for queue 1, and so on.
This way the driver doesn't need to know about traffic classes.
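The translation can be illustrated with a small userspace sketch modeled on tc_mask_to_queue_mask() from this patch. TOY_GENMASK stands in for the kernel's GENMASK(), struct toy_mqprio is a hypothetical subset of struct tc_mqprio_qopt, and the count == 0 guard is an addition of this sketch, not part of the patch.

```c
#include <stdint.h>

#define TOY_MAX_TC 16

/* Userspace stand-in for GENMASK(h, l) on 32-bit values (h >= l). */
#define TOY_GENMASK(h, l) ((~0u >> (31 - (h))) & (~0u << (l)))

/* Hypothetical subset of struct tc_mqprio_qopt: traffic class i owns
 * TX queues [offset[i], offset[i] + count[i]).
 */
struct toy_mqprio {
	uint8_t num_tc;
	uint16_t count[TOY_MAX_TC];
	uint16_t offset[TOY_MAX_TC];
};

/* Convert a taprio gate mask (bit i = traffic class i) into the
 * per-queue mask handed to the driver (bit q = TX queue q).
 */
static uint32_t toy_tc_mask_to_queue_mask(const struct toy_mqprio *mqprio,
					  uint32_t tc_mask)
{
	uint32_t queue_mask = 0;
	unsigned int i;

	for (i = 0; i < mqprio->num_tc; i++) {
		unsigned int first = mqprio->offset[i];
		unsigned int last;

		/* Skip closed gates and classes that own no queues. */
		if (!(tc_mask & (1u << i)) || !mqprio->count[i])
			continue;

		last = first + mqprio->count[i] - 1;
		queue_mask |= TOY_GENMASK(last, first);
	}
	return queue_mask;
}
```

For example, with TC 0 on queues 0-1, TC 1 on queue 2 and TC 2 on queues 3-6, a taprio gate_mask of 0x5 (TCs 0 and 2 open) becomes a queue mask of 0x7B.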

Two reference-counting helpers, taprio_get() and taprio_free(), are also
added to support the use case where Ethernet drivers need to keep the
taprio offload structure locally (for example, a multi-port switch
driver, where configuring one port also depends on the settings of the
other ports).
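The intended lifetime can be sketched in userspace with C11 atomics. The toy_* names are hypothetical stand-ins for taprio_alloc(), taprio_get() and taprio_free(); the kernel versions use refcount_t and kzalloc()/kfree(). The sketch shows the key point: the qdisc drops its own reference once ndo_setup_tc returns, so a driver that wants to keep the schedule must take a reference of its own.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

struct toy_sched_entry {
	uint8_t command;
	uint32_t gate_mask;
	uint32_t interval;
};

struct toy_taprio_offload {
	atomic_int users;
	size_t num_entries;
	struct toy_sched_entry entries[]; /* flexible array member */
};

/* Allocate a zeroed offload structure holding one reference. */
static struct toy_taprio_offload *toy_taprio_alloc(size_t num_entries)
{
	struct toy_taprio_offload *t;

	t = calloc(1, sizeof(*t) + num_entries * sizeof(t->entries[0]));
	if (!t)
		return NULL;
	atomic_init(&t->users, 1);
	t->num_entries = num_entries;
	return t;
}

/* Take an extra reference, e.g. from a switch driver. */
static struct toy_taprio_offload *
toy_taprio_get(struct toy_taprio_offload *t)
{
	atomic_fetch_add(&t->users, 1);
	return t;
}

/* Drop a reference and free on the last put. Returns true if freed. */
static bool toy_taprio_put(struct toy_taprio_offload *t)
{
	if (atomic_fetch_sub(&t->users, 1) != 1)
		return false;
	free(t);
	return true;
}
```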

Full offload is requested from the network interface by specifying
"flags 2" in the tc qdisc creation command, which in turn corresponds to
the TCA_TAPRIO_ATTR_FLAG_FULL_OFFLOAD bit.
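A minimal sketch of how the two flag bits relate, mirroring the taprio_flags_valid() check in this patch: bit 0 selects txtime-assist, bit 1 selects full offload, unknown bits are rejected, and setting both ("flags 3") is rejected as well. The TOY_* macro names are hypothetical copies of the uapi values.

```c
#include <stdbool.h>
#include <stdint.h>

/* Same bit positions as the uapi TCA_TAPRIO_ATTR_FLAG_* defines. */
#define TOY_FLAG_TXTIME_ASSIST (1u << 0)
#define TOY_FLAG_FULL_OFFLOAD  (1u << 1)

static bool toy_taprio_flags_valid(uint32_t flags)
{
	const uint32_t known = TOY_FLAG_TXTIME_ASSIST | TOY_FLAG_FULL_OFFLOAD;

	if (flags & ~known)           /* unknown bits set */
		return false;
	if ((flags & known) == known) /* both modes requested at once */
		return false;
	return true;
}
```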

Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: Voon Weifeng <weifeng.voon@intel.com>
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
---
Changes since RFC:
- Made the combination of FULL_OFFLOAD and TXTIME_ASSIST invalid.
- Made ndo_setup_tc be called from sleepable context.
- Added a taprio_alloc helper to avoid passing stack memory to drivers.
- Made taprio_disable_offload take the extack as well.
- Conditioned the setup of the software (and txtime-assisted)
  implementation of taprio on there not being a full offload in place.
- Fixed a lockdep-related compilation bug.

 include/linux/netdevice.h      |   1 +
 include/net/pkt_sched.h        |  33 ++++
 include/uapi/linux/pkt_sched.h |   3 +-
 net/sched/sch_taprio.c         | 278 ++++++++++++++++++++++++++++++++-
 4 files changed, 309 insertions(+), 6 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index b5d28dadf964..8225631b9315 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -847,6 +847,7 @@ enum tc_setup_type {
 	TC_SETUP_QDISC_ETF,
 	TC_SETUP_ROOT_QDISC,
 	TC_SETUP_QDISC_GRED,
+	TC_SETUP_QDISC_TAPRIO,
 };
 
 /* These structures hold the attributes of bpf state that are being passed
diff --git a/include/net/pkt_sched.h b/include/net/pkt_sched.h
index a16fbe9a2a67..bba288f9c98b 100644
--- a/include/net/pkt_sched.h
+++ b/include/net/pkt_sched.h
@@ -161,4 +161,37 @@ struct tc_etf_qopt_offload {
 	s32 queue;
 };
 
+struct tc_taprio_sched_entry {
+	u8 command; /* TC_TAPRIO_CMD_* */
+
+	/* The gate_mask in the offloading side refers to HW queues */
+	u32 gate_mask;
+	u32 interval;
+};
+
+struct tc_taprio_qopt_offload {
+	refcount_t users;
+	u8 enable;
+	ktime_t base_time;
+	u64 cycle_time;
+	u64 cycle_time_extension;
+
+	size_t num_entries;
+	struct tc_taprio_sched_entry entries[0];
+};
+
+static inline struct tc_taprio_qopt_offload *
+taprio_get(struct tc_taprio_qopt_offload *taprio)
+{
+	refcount_inc(&taprio->users);
+	return taprio;
+}
+
+static inline void taprio_free(struct tc_taprio_qopt_offload *taprio)
+{
+	if (!refcount_dec_and_test(&taprio->users))
+		return;
+	kfree(taprio);
+}
+
 #endif
diff --git a/include/uapi/linux/pkt_sched.h b/include/uapi/linux/pkt_sched.h
index 18f185299f47..5011259b8f67 100644
--- a/include/uapi/linux/pkt_sched.h
+++ b/include/uapi/linux/pkt_sched.h
@@ -1160,7 +1160,8 @@ enum {
  *       [TCA_TAPRIO_ATTR_SCHED_ENTRY_INTERVAL]
  */
 
-#define TCA_TAPRIO_ATTR_FLAG_TXTIME_ASSIST 0x1
+#define TCA_TAPRIO_ATTR_FLAG_TXTIME_ASSIST	BIT(0)
+#define TCA_TAPRIO_ATTR_FLAG_FULL_OFFLOAD	BIT(1)
 
 enum {
 	TCA_TAPRIO_ATTR_UNSPEC,
diff --git a/net/sched/sch_taprio.c b/net/sched/sch_taprio.c
index 84b863e2bdbd..f0fa9a47142c 100644
--- a/net/sched/sch_taprio.c
+++ b/net/sched/sch_taprio.c
@@ -29,8 +29,8 @@ static DEFINE_SPINLOCK(taprio_list_lock);
 
 #define TAPRIO_ALL_GATES_OPEN -1
 
-#define FLAGS_VALID(flags) (!((flags) & ~TCA_TAPRIO_ATTR_FLAG_TXTIME_ASSIST))
 #define TXTIME_ASSIST_IS_ENABLED(flags) ((flags) & TCA_TAPRIO_ATTR_FLAG_TXTIME_ASSIST)
+#define FULL_OFFLOAD_IS_ENABLED(flags) ((flags) & TCA_TAPRIO_ATTR_FLAG_FULL_OFFLOAD)
 
 struct sched_entry {
 	struct list_head list;
@@ -75,6 +75,8 @@ struct taprio_sched {
 	struct sched_gate_list __rcu *admin_sched;
 	struct hrtimer advance_timer;
 	struct list_head taprio_list;
+	struct sk_buff *(*dequeue)(struct Qdisc *sch);
+	struct sk_buff *(*peek)(struct Qdisc *sch);
 	u32 txtime_delay;
 };
 
@@ -268,6 +270,19 @@ static bool is_valid_interval(struct sk_buff *skb, struct Qdisc *sch)
 	return entry;
 }
 
+static bool taprio_flags_valid(u32 flags)
+{
+	/* Make sure no other flag bits are set. */
+	if (flags & ~(TCA_TAPRIO_ATTR_FLAG_TXTIME_ASSIST |
+		      TCA_TAPRIO_ATTR_FLAG_FULL_OFFLOAD))
+		return false;
+	/* txtime-assist and full offload are mutually exclusive */
+	if ((flags & TCA_TAPRIO_ATTR_FLAG_TXTIME_ASSIST) &&
+	    (flags & TCA_TAPRIO_ATTR_FLAG_FULL_OFFLOAD))
+		return false;
+	return true;
+}
+
 /* This returns the tstamp value set by TCP in terms of the set clock. */
 static ktime_t get_tcp_tstamp(struct taprio_sched *q, struct sk_buff *skb)
 {
@@ -417,7 +432,7 @@ static int taprio_enqueue(struct sk_buff *skb, struct Qdisc *sch,
 	return qdisc_enqueue(skb, child, to_free);
 }
 
-static struct sk_buff *taprio_peek(struct Qdisc *sch)
+static struct sk_buff *taprio_peek_soft(struct Qdisc *sch)
 {
 	struct taprio_sched *q = qdisc_priv(sch);
 	struct net_device *dev = qdisc_dev(sch);
@@ -461,6 +476,36 @@ static struct sk_buff *taprio_peek(struct Qdisc *sch)
 	return NULL;
 }
 
+static struct sk_buff *taprio_peek_offload(struct Qdisc *sch)
+{
+	struct taprio_sched *q = qdisc_priv(sch);
+	struct net_device *dev = qdisc_dev(sch);
+	struct sk_buff *skb;
+	int i;
+
+	for (i = 0; i < dev->num_tx_queues; i++) {
+		struct Qdisc *child = q->qdiscs[i];
+
+		if (unlikely(!child))
+			continue;
+
+		skb = child->ops->peek(child);
+		if (!skb)
+			continue;
+
+		return skb;
+	}
+
+	return NULL;
+}
+
+static struct sk_buff *taprio_peek(struct Qdisc *sch)
+{
+	struct taprio_sched *q = qdisc_priv(sch);
+
+	return q->peek(sch);
+}
+
 static void taprio_set_budget(struct taprio_sched *q, struct sched_entry *entry)
 {
 	atomic_set(&entry->budget,
@@ -468,7 +513,7 @@ static void taprio_set_budget(struct taprio_sched *q, struct sched_entry *entry)
 			     atomic64_read(&q->picos_per_byte)));
 }
 
-static struct sk_buff *taprio_dequeue(struct Qdisc *sch)
+static struct sk_buff *taprio_dequeue_soft(struct Qdisc *sch)
 {
 	struct taprio_sched *q = qdisc_priv(sch);
 	struct net_device *dev = qdisc_dev(sch);
@@ -550,6 +595,40 @@ static struct sk_buff *taprio_dequeue(struct Qdisc *sch)
 	return skb;
 }
 
+static struct sk_buff *taprio_dequeue_offload(struct Qdisc *sch)
+{
+	struct taprio_sched *q = qdisc_priv(sch);
+	struct net_device *dev = qdisc_dev(sch);
+	struct sk_buff *skb;
+	int i;
+
+	for (i = 0; i < dev->num_tx_queues; i++) {
+		struct Qdisc *child = q->qdiscs[i];
+
+		if (unlikely(!child))
+			continue;
+
+		skb = child->ops->dequeue(child);
+		if (unlikely(!skb))
+			continue;
+
+		qdisc_bstats_update(sch, skb);
+		qdisc_qstats_backlog_dec(sch, skb);
+		sch->q.qlen--;
+
+		return skb;
+	}
+
+	return NULL;
+}
+
+static struct sk_buff *taprio_dequeue(struct Qdisc *sch)
+{
+	struct taprio_sched *q = qdisc_priv(sch);
+
+	return q->dequeue(sch);
+}
+
 static bool should_restart_cycle(const struct sched_gate_list *oper,
 				 const struct sched_entry *entry)
 {
@@ -1011,6 +1090,163 @@ static void setup_txtime(struct taprio_sched *q,
 	}
 }
 
+static u32 tc_mask_to_queue_mask(const struct tc_mqprio_qopt *mqprio,
+				 u32 tc_mask)
+{
+	u32 i, queue_mask = 0;
+
+	for (i = 0; i < mqprio->num_tc; i++) {
+		u32 offset, count;
+
+		if (!(tc_mask & BIT(i)))
+			continue;
+
+		offset = mqprio->offset[i];
+		count = mqprio->count[i];
+
+		queue_mask |= GENMASK(offset + count - 1, offset);
+	}
+
+	return queue_mask;
+}
+
+static void taprio_sched_to_offload(struct taprio_sched *q,
+				    struct sched_gate_list *sched,
+				    const struct tc_mqprio_qopt *mqprio,
+				    struct tc_taprio_qopt_offload *taprio)
+{
+	struct sched_entry *entry;
+	int i = 0;
+
+	taprio->base_time = sched->base_time;
+	taprio->cycle_time = sched->cycle_time;
+	taprio->cycle_time_extension = sched->cycle_time_extension;
+
+	list_for_each_entry(entry, &sched->entries, list) {
+		struct tc_taprio_sched_entry *e = &taprio->entries[i];
+
+		e->command = entry->command;
+		e->interval = entry->interval;
+
+		/* We do this transformation because the NIC
+		 * has no knowledge of traffic classes, but it
+		 * knows about queues.
+		 */
+		e->gate_mask = tc_mask_to_queue_mask(mqprio, entry->gate_mask);
+		i++;
+	}
+
+	taprio->num_entries = i;
+}
+
+static enum hrtimer_restart next_sched(struct hrtimer *timer)
+{
+	struct taprio_sched *q = container_of(timer, struct taprio_sched,
+					      advance_timer);
+	struct sched_gate_list *oper, *admin;
+
+	spin_lock(&q->current_entry_lock);
+	oper = rcu_dereference_protected(q->oper_sched,
+					 lockdep_is_held(&q->current_entry_lock));
+	admin = rcu_dereference_protected(q->admin_sched,
+					  lockdep_is_held(&q->current_entry_lock));
+
+	rcu_assign_pointer(q->oper_sched, admin);
+	rcu_assign_pointer(q->admin_sched, NULL);
+
+	if (oper)
+		call_rcu(&oper->rcu, taprio_free_sched_cb);
+
+	spin_unlock(&q->current_entry_lock);
+
+	return HRTIMER_NORESTART;
+}
+
+static struct tc_taprio_qopt_offload *taprio_alloc(int num_entries)
+{
+	size_t size = sizeof(struct tc_taprio_sched_entry) * num_entries +
+		      sizeof(struct tc_taprio_qopt_offload);
+	struct tc_taprio_qopt_offload *taprio;
+
+	taprio = kzalloc(size, GFP_KERNEL);
+	if (!taprio)
+		return taprio;
+
+	refcount_set(&taprio->users, 1);
+
+	return taprio;
+}
+
+static int taprio_enable_offload(struct net_device *dev,
+				 struct tc_mqprio_qopt *mqprio,
+				 struct taprio_sched *q,
+				 struct sched_gate_list *sched,
+				 struct netlink_ext_ack *extack)
+{
+	const struct net_device_ops *ops = dev->netdev_ops;
+	struct tc_taprio_qopt_offload *taprio;
+	int err = 0;
+
+	if (!ops->ndo_setup_tc) {
+		NL_SET_ERR_MSG(extack,
+			       "Device does not support taprio offload");
+		return -EOPNOTSUPP;
+	}
+
+	taprio = taprio_alloc(sched->num_entries);
+	if (!taprio) {
+		NL_SET_ERR_MSG(extack,
+			       "Not enough memory for enabling offload mode");
+		return -ENOMEM;
+	}
+	taprio->enable = 1;
+	taprio_sched_to_offload(q, sched, mqprio, taprio);
+
+	err = ops->ndo_setup_tc(dev, TC_SETUP_QDISC_TAPRIO, taprio);
+	if (err < 0) {
+		NL_SET_ERR_MSG(extack,
+			       "Device failed to setup taprio offload");
+		goto done;
+	}
+
+done:
+	taprio_free(taprio);
+
+	return err;
+}
+
+static int taprio_disable_offload(struct net_device *dev,
+				  struct taprio_sched *q,
+				  struct netlink_ext_ack *extack)
+{
+	const struct net_device_ops *ops = dev->netdev_ops;
+	struct tc_taprio_qopt_offload *taprio;
+	int err;
+
+	if (!FULL_OFFLOAD_IS_ENABLED(q->flags))
+		return 0;
+
+	if (!ops->ndo_setup_tc)
+		return -EOPNOTSUPP;
+
+	taprio = taprio_alloc(0);
+	if (!taprio) {
+		NL_SET_ERR_MSG(extack,
+			       "Not enough memory to disable offload mode");
+		return -ENOMEM;
+	}
+	taprio->enable = 0;
+
+	err = ops->ndo_setup_tc(dev, TC_SETUP_QDISC_TAPRIO, taprio);
+	if (err < 0)
+		goto out;
+
+out:
+	taprio_free(taprio);
+
+	return err;
+}
+
 static int taprio_change(struct Qdisc *sch, struct nlattr *opt,
 			 struct netlink_ext_ack *extack)
 {
@@ -1038,7 +1274,7 @@ static int taprio_change(struct Qdisc *sch, struct nlattr *opt,
 		if (q->flags != 0 && q->flags != taprio_flags) {
 			NL_SET_ERR_MSG_MOD(extack, "Changing 'flags' of a running schedule is not supported");
 			return -EOPNOTSUPP;
-		} else if (!FLAGS_VALID(taprio_flags)) {
+		} else if (!taprio_flags_valid(taprio_flags)) {
 			NL_SET_ERR_MSG_MOD(extack, "Specified 'flags' are not valid");
 			return -EINVAL;
 		}
@@ -1102,6 +1338,13 @@ static int taprio_change(struct Qdisc *sch, struct nlattr *opt,
 
 	taprio_set_picos_per_byte(dev, q);
 
+	if (FULL_OFFLOAD_IS_ENABLED(taprio_flags))
+		err = taprio_enable_offload(dev, mqprio, q, new_admin, extack);
+	else
+		err = taprio_disable_offload(dev, q, extack);
+	if (err)
+		goto free_sched;
+
 	/* Protects against enqueue()/dequeue() */
 	spin_lock_bh(qdisc_lock(sch));
 
@@ -1153,6 +1396,26 @@ static int taprio_change(struct Qdisc *sch, struct nlattr *opt,
 		goto unlock;
 	}
 
+	if (FULL_OFFLOAD_IS_ENABLED(taprio_flags)) {
+		q->dequeue = taprio_dequeue_offload;
+		q->peek = taprio_peek_offload;
+
+		/* This function will only serve to keep the pointers to the
+		 * "oper" and "admin" schedules valid in relation to their
+		 * base times, so when calling dump() the user looks at the
+		 * right schedules.
+		 */
+		q->advance_timer.function = next_sched;
+	} else {
+		/* Just to be sure to keep the function pointers in a
+		 * consistent state always.
+		 */
+		q->dequeue = taprio_dequeue_soft;
+		q->peek = taprio_peek_soft;
+
+		q->advance_timer.function = advance_sched;
+	}
+
 	err = taprio_get_start_time(sch, new_admin, &start);
 	if (err < 0) {
 		NL_SET_ERR_MSG(extack, "Internal error: failed get start time");
@@ -1172,7 +1435,7 @@ static int taprio_change(struct Qdisc *sch, struct nlattr *opt,
 		rcu_assign_pointer(q->admin_sched, new_admin);
 		if (admin)
 			call_rcu(&admin->rcu, taprio_free_sched_cb);
-	} else {
+	} else if (!FULL_OFFLOAD_IS_ENABLED(taprio_flags)) {
 		setup_first_close_time(q, new_admin, start);
 
 		/* Protects against advance_sched() */
@@ -1212,6 +1475,8 @@ static void taprio_destroy(struct Qdisc *sch)
 
 	hrtimer_cancel(&q->advance_timer);
 
+	taprio_disable_offload(dev, q, NULL);
+
 	if (q->qdiscs) {
 		for (i = 0; i < dev->num_tx_queues && q->qdiscs[i]; i++)
 			qdisc_put(q->qdiscs[i]);
@@ -1241,6 +1506,9 @@ static int taprio_init(struct Qdisc *sch, struct nlattr *opt,
 	hrtimer_init(&q->advance_timer, CLOCK_TAI, HRTIMER_MODE_ABS);
 	q->advance_timer.function = advance_sched;
 
+	q->dequeue = taprio_dequeue_soft;
+	q->peek = taprio_peek_soft;
+
 	q->root = sch;
 
 	/* We only support static clockids. Use an invalid value as default
-- 
2.17.1



* [PATCH v1 net-next 10/15] net: dsa: Pass ndo_setup_tc slave callback to drivers
  2019-09-02 16:25 [PATCH v1 net-next 00/15] tc-taprio offload for SJA1105 DSA Vladimir Oltean
                   ` (8 preceding siblings ...)
  2019-09-02 16:25 ` [PATCH v1 net-next 09/15] taprio: Add support for hardware offloading Vladimir Oltean
@ 2019-09-02 16:25 ` Vladimir Oltean
  2019-09-04  7:50   ` Kurt Kanzenbach
  2019-09-02 16:25 ` [PATCH v1 net-next 11/15] net: dsa: sja1105: Add static config tables for scheduling Vladimir Oltean
                   ` (6 subsequent siblings)
  16 siblings, 1 reply; 33+ messages in thread
From: Vladimir Oltean @ 2019-09-02 16:25 UTC (permalink / raw)
  To: f.fainelli, vivien.didelot, andrew, davem, vinicius.gomes,
	vedang.patel, richardcochran
  Cc: weifeng.voon, jiri, m-karicheri2, Jose.Abreu, ilias.apalodimas,
	jhs, xiyou.wangcong, kurt.kanzenbach, netdev, Vladimir Oltean

DSA currently handles shared block filters (for the classifier-action
qdisc) in the core, for what I believe are simply pragmatic reasons:
hiding the complexity from drivers and offering a simple API for port
mirroring.

Extend dsa_slave_setup_tc() to pass all other qdisc offloads through to
the driver layer via a new port_setup_tc switch operation, where the
driver may choose what it implements and how. DSA is simply a
pass-through in this case.
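The resulting dispatch can be sketched in userspace as follows. All types and names are hypothetical simplifications of dsa_slave_setup_tc() and struct dsa_switch_ops: TC_SETUP_BLOCK stays in the core, any other type is forwarded to the driver's port_setup_tc callback, and a missing callback yields -EOPNOTSUPP.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical subset of enum tc_setup_type. */
enum toy_setup_type {
	TOY_SETUP_BLOCK,
	TOY_SETUP_QDISC_TAPRIO,
};

struct toy_switch;

struct toy_switch_ops {
	int (*port_setup_tc)(struct toy_switch *ds, int port,
			     enum toy_setup_type type, void *type_data);
};

struct toy_switch {
	const struct toy_switch_ops *ops;
};

/* Placeholder for the shared-block handling that stays in the core. */
static int toy_setup_tc_block(void *type_data)
{
	(void)type_data;
	return 0;
}

/* Core-side dispatch: keep TC_SETUP_BLOCK, forward everything else. */
static int toy_slave_setup_tc(struct toy_switch *ds, int port,
			      enum toy_setup_type type, void *type_data)
{
	if (type == TOY_SETUP_BLOCK)
		return toy_setup_tc_block(type_data);

	if (!ds->ops->port_setup_tc)
		return -EOPNOTSUPP;

	return ds->ops->port_setup_tc(ds, port, type, type_data);
}

/* Example driver callback that only implements taprio offload. */
static int toy_driver_setup_tc(struct toy_switch *ds, int port,
			       enum toy_setup_type type, void *type_data)
{
	(void)ds;
	(void)port;
	(void)type_data;
	return type == TOY_SETUP_QDISC_TAPRIO ? 0 : -EOPNOTSUPP;
}
```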

Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
---
Changes since RFC:
- Removed the unused declaration of struct tc_taprio_qopt_offload.

 include/net/dsa.h |  2 ++
 net/dsa/slave.c   | 12 ++++++++----
 2 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/include/net/dsa.h b/include/net/dsa.h
index 96acb14ec1a8..541fb514e31d 100644
--- a/include/net/dsa.h
+++ b/include/net/dsa.h
@@ -515,6 +515,8 @@ struct dsa_switch_ops {
 				   bool ingress);
 	void	(*port_mirror_del)(struct dsa_switch *ds, int port,
 				   struct dsa_mall_mirror_tc_entry *mirror);
+	int	(*port_setup_tc)(struct dsa_switch *ds, int port,
+				 enum tc_setup_type type, void *type_data);
 
 	/*
 	 * Cross-chip operations
diff --git a/net/dsa/slave.c b/net/dsa/slave.c
index 9a88035517a6..75d58229a4bd 100644
--- a/net/dsa/slave.c
+++ b/net/dsa/slave.c
@@ -1035,12 +1035,16 @@ static int dsa_slave_setup_tc_block(struct net_device *dev,
 static int dsa_slave_setup_tc(struct net_device *dev, enum tc_setup_type type,
 			      void *type_data)
 {
-	switch (type) {
-	case TC_SETUP_BLOCK:
+	struct dsa_port *dp = dsa_slave_to_port(dev);
+	struct dsa_switch *ds = dp->ds;
+
+	if (type == TC_SETUP_BLOCK)
 		return dsa_slave_setup_tc_block(dev, type_data);
-	default:
+
+	if (!ds->ops->port_setup_tc)
 		return -EOPNOTSUPP;
-	}
+
+	return ds->ops->port_setup_tc(ds, dp->index, type, type_data);
 }
 
 static void dsa_slave_get_stats64(struct net_device *dev,
-- 
2.17.1



* [PATCH v1 net-next 11/15] net: dsa: sja1105: Add static config tables for scheduling
  2019-09-02 16:25 [PATCH v1 net-next 00/15] tc-taprio offload for SJA1105 DSA Vladimir Oltean
                   ` (9 preceding siblings ...)
  2019-09-02 16:25 ` [PATCH v1 net-next 10/15] net: dsa: Pass ndo_setup_tc slave callback to drivers Vladimir Oltean
@ 2019-09-02 16:25 ` Vladimir Oltean
  2019-09-02 16:25 ` [PATCH v1 net-next 12/15] net: dsa: sja1105: Configure the Time-Aware Scheduler via tc-taprio offload Vladimir Oltean
                   ` (5 subsequent siblings)
  16 siblings, 0 replies; 33+ messages in thread
From: Vladimir Oltean @ 2019-09-02 16:25 UTC (permalink / raw)
  To: f.fainelli, vivien.didelot, andrew, davem, vinicius.gomes,
	vedang.patel, richardcochran
  Cc: weifeng.voon, jiri, m-karicheri2, Jose.Abreu, ilias.apalodimas,
	jhs, xiyou.wangcong, kurt.kanzenbach, netdev, Vladimir Oltean

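In order to support tc-taprio offload, the registers of the TTEthernet
egress scheduling core must be made visible through the static
configuration interface. This adds the schedule, schedule entry points,
schedule parameters and schedule entry points parameters tables.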
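For orientation, here is a userspace sketch of how the schedule entry fields sit in a 64-bit word, using the bit positions from sja1105_schedule_entry_packing() in this patch. It deliberately ignores the byte and word ordering that sja1105_packing() applies to the SPI buffer, and the toy_* names are hypothetical.

```c
#include <stdint.h>

/* Write 'val' into bits start..end (start >= end) of *word. */
static void toy_pack(uint64_t *word, uint64_t val, int start, int end)
{
	int width = start - end + 1;
	uint64_t mask = width == 64 ? ~0ULL : (1ULL << width) - 1;

	*word &= ~(mask << end);
	*word |= (val & mask) << end;
}

/* Read bits start..end (start >= end) of 'word'. */
static uint64_t toy_unpack(uint64_t word, int start, int end)
{
	int width = start - end + 1;
	uint64_t mask = width == 64 ? ~0ULL : (1ULL << width) - 1;

	return (word >> end) & mask;
}

struct toy_schedule_entry {
	uint64_t winstindex, winend, winst, destports, setvalid;
	uint64_t txen, resmedia_en, resmedia, vlindex, delta;
};

/* Field layout taken from sja1105_schedule_entry_packing(). */
static uint64_t toy_schedule_entry_pack(const struct toy_schedule_entry *e)
{
	uint64_t w = 0;

	toy_pack(&w, e->winstindex,  63, 54);
	toy_pack(&w, e->winend,      53, 53);
	toy_pack(&w, e->winst,       52, 52);
	toy_pack(&w, e->destports,   51, 47);
	toy_pack(&w, e->setvalid,    46, 46);
	toy_pack(&w, e->txen,        45, 45);
	toy_pack(&w, e->resmedia_en, 44, 44);
	toy_pack(&w, e->resmedia,    43, 36);
	toy_pack(&w, e->vlindex,     35, 26);
	toy_pack(&w, e->delta,       25, 8);
	return w;
}
```

Values wider than their field are silently truncated by the mask, so an 18-bit delta of 1 << 18 packs to zero.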

Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
---
Changes since RFC:
- None.

 .../net/dsa/sja1105/sja1105_dynamic_config.c  |   8 +
 .../net/dsa/sja1105/sja1105_static_config.c   | 167 ++++++++++++++++++
 .../net/dsa/sja1105/sja1105_static_config.h   |  48 ++++-
 3 files changed, 222 insertions(+), 1 deletion(-)

diff --git a/drivers/net/dsa/sja1105/sja1105_dynamic_config.c b/drivers/net/dsa/sja1105/sja1105_dynamic_config.c
index 9988c9d18567..91da430045ff 100644
--- a/drivers/net/dsa/sja1105/sja1105_dynamic_config.c
+++ b/drivers/net/dsa/sja1105/sja1105_dynamic_config.c
@@ -488,6 +488,8 @@ sja1105et_general_params_entry_packing(void *buf, void *entry_ptr,
 
 /* SJA1105E/T: First generation */
 struct sja1105_dynamic_table_ops sja1105et_dyn_ops[BLK_IDX_MAX_DYN] = {
+	[BLK_IDX_SCHEDULE] = {0},
+	[BLK_IDX_SCHEDULE_ENTRY_POINTS] = {0},
 	[BLK_IDX_L2_LOOKUP] = {
 		.entry_packing = sja1105et_dyn_l2_lookup_entry_packing,
 		.cmd_packing = sja1105et_l2_lookup_cmd_packing,
@@ -529,6 +531,8 @@ struct sja1105_dynamic_table_ops sja1105et_dyn_ops[BLK_IDX_MAX_DYN] = {
 		.packed_size = SJA1105ET_SIZE_MAC_CONFIG_DYN_CMD,
 		.addr = 0x36,
 	},
+	[BLK_IDX_SCHEDULE_PARAMS] = {0},
+	[BLK_IDX_SCHEDULE_ENTRY_POINTS_PARAMS] = {0},
 	[BLK_IDX_L2_LOOKUP_PARAMS] = {
 		.entry_packing = sja1105et_l2_lookup_params_entry_packing,
 		.cmd_packing = sja1105et_l2_lookup_params_cmd_packing,
@@ -552,6 +556,8 @@ struct sja1105_dynamic_table_ops sja1105et_dyn_ops[BLK_IDX_MAX_DYN] = {
 
 /* SJA1105P/Q/R/S: Second generation */
 struct sja1105_dynamic_table_ops sja1105pqrs_dyn_ops[BLK_IDX_MAX_DYN] = {
+	[BLK_IDX_SCHEDULE] = {0},
+	[BLK_IDX_SCHEDULE_ENTRY_POINTS] = {0},
 	[BLK_IDX_L2_LOOKUP] = {
 		.entry_packing = sja1105pqrs_dyn_l2_lookup_entry_packing,
 		.cmd_packing = sja1105pqrs_l2_lookup_cmd_packing,
@@ -593,6 +599,8 @@ struct sja1105_dynamic_table_ops sja1105pqrs_dyn_ops[BLK_IDX_MAX_DYN] = {
 		.packed_size = SJA1105PQRS_SIZE_MAC_CONFIG_DYN_CMD,
 		.addr = 0x4B,
 	},
+	[BLK_IDX_SCHEDULE_PARAMS] = {0},
+	[BLK_IDX_SCHEDULE_ENTRY_POINTS_PARAMS] = {0},
 	[BLK_IDX_L2_LOOKUP_PARAMS] = {
 		.entry_packing = sja1105et_l2_lookup_params_entry_packing,
 		.cmd_packing = sja1105et_l2_lookup_params_cmd_packing,
diff --git a/drivers/net/dsa/sja1105/sja1105_static_config.c b/drivers/net/dsa/sja1105/sja1105_static_config.c
index b31c737dc560..0d03e13e9909 100644
--- a/drivers/net/dsa/sja1105/sja1105_static_config.c
+++ b/drivers/net/dsa/sja1105/sja1105_static_config.c
@@ -371,6 +371,63 @@ size_t sja1105pqrs_mac_config_entry_packing(void *buf, void *entry_ptr,
 	return size;
 }
 
+static size_t
+sja1105_schedule_entry_points_params_entry_packing(void *buf, void *entry_ptr,
+						   enum packing_op op)
+{
+	struct sja1105_schedule_entry_points_params_entry *entry = entry_ptr;
+	const size_t size = SJA1105_SIZE_SCHEDULE_ENTRY_POINTS_PARAMS_ENTRY;
+
+	sja1105_packing(buf, &entry->clksrc,    31, 30, size, op);
+	sja1105_packing(buf, &entry->actsubsch, 29, 27, size, op);
+	return size;
+}
+
+static size_t
+sja1105_schedule_entry_points_entry_packing(void *buf, void *entry_ptr,
+					    enum packing_op op)
+{
+	struct sja1105_schedule_entry_points_entry *entry = entry_ptr;
+	const size_t size = SJA1105_SIZE_SCHEDULE_ENTRY_POINTS_ENTRY;
+
+	sja1105_packing(buf, &entry->subschindx, 31, 29, size, op);
+	sja1105_packing(buf, &entry->delta,      28, 11, size, op);
+	sja1105_packing(buf, &entry->address,    10, 1,  size, op);
+	return size;
+}
+
+static size_t sja1105_schedule_params_entry_packing(void *buf, void *entry_ptr,
+						    enum packing_op op)
+{
+	const size_t size = SJA1105_SIZE_SCHEDULE_PARAMS_ENTRY;
+	struct sja1105_schedule_params_entry *entry = entry_ptr;
+	int offset, i;
+
+	for (i = 0, offset = 16; i < 8; i++, offset += 10)
+		sja1105_packing(buf, &entry->subscheind[i],
+				offset + 9, offset + 0, size, op);
+	return size;
+}
+
+static size_t sja1105_schedule_entry_packing(void *buf, void *entry_ptr,
+					     enum packing_op op)
+{
+	const size_t size = SJA1105_SIZE_SCHEDULE_ENTRY;
+	struct sja1105_schedule_entry *entry = entry_ptr;
+
+	sja1105_packing(buf, &entry->winstindex,  63, 54, size, op);
+	sja1105_packing(buf, &entry->winend,      53, 53, size, op);
+	sja1105_packing(buf, &entry->winst,       52, 52, size, op);
+	sja1105_packing(buf, &entry->destports,   51, 47, size, op);
+	sja1105_packing(buf, &entry->setvalid,    46, 46, size, op);
+	sja1105_packing(buf, &entry->txen,        45, 45, size, op);
+	sja1105_packing(buf, &entry->resmedia_en, 44, 44, size, op);
+	sja1105_packing(buf, &entry->resmedia,    43, 36, size, op);
+	sja1105_packing(buf, &entry->vlindex,     35, 26, size, op);
+	sja1105_packing(buf, &entry->delta,       25, 8,  size, op);
+	return size;
+}
+
 size_t sja1105_vlan_lookup_entry_packing(void *buf, void *entry_ptr,
 					 enum packing_op op)
 {
@@ -447,11 +504,15 @@ static void sja1105_table_write_crc(u8 *table_start, u8 *crc_ptr)
  * before blindly indexing kernel memory with the blk_idx.
  */
 static u64 blk_id_map[BLK_IDX_MAX] = {
+	[BLK_IDX_SCHEDULE] = BLKID_SCHEDULE,
+	[BLK_IDX_SCHEDULE_ENTRY_POINTS] = BLKID_SCHEDULE_ENTRY_POINTS,
 	[BLK_IDX_L2_LOOKUP] = BLKID_L2_LOOKUP,
 	[BLK_IDX_L2_POLICING] = BLKID_L2_POLICING,
 	[BLK_IDX_VLAN_LOOKUP] = BLKID_VLAN_LOOKUP,
 	[BLK_IDX_L2_FORWARDING] = BLKID_L2_FORWARDING,
 	[BLK_IDX_MAC_CONFIG] = BLKID_MAC_CONFIG,
+	[BLK_IDX_SCHEDULE_PARAMS] = BLKID_SCHEDULE_PARAMS,
+	[BLK_IDX_SCHEDULE_ENTRY_POINTS_PARAMS] = BLKID_SCHEDULE_ENTRY_POINTS_PARAMS,
 	[BLK_IDX_L2_LOOKUP_PARAMS] = BLKID_L2_LOOKUP_PARAMS,
 	[BLK_IDX_L2_FORWARDING_PARAMS] = BLKID_L2_FORWARDING_PARAMS,
 	[BLK_IDX_AVB_PARAMS] = BLKID_AVB_PARAMS,
@@ -461,6 +522,13 @@ static u64 blk_id_map[BLK_IDX_MAX] = {
 
 const char *sja1105_static_config_error_msg[] = {
 	[SJA1105_CONFIG_OK] = "",
+	[SJA1105_TTETHERNET_NOT_SUPPORTED] =
+		"schedule-table present, but TTEthernet is "
+		"only supported on T and Q/S",
+	[SJA1105_INCORRECT_TTETHERNET_CONFIGURATION] =
+		"schedule-table present, but one of "
+		"schedule-entry-points-table, schedule-parameters-table or "
+		"schedule-entry-points-parameters table is empty",
 	[SJA1105_MISSING_L2_POLICING_TABLE] =
 		"l2-policing-table needs to have at least one entry",
 	[SJA1105_MISSING_L2_FORWARDING_TABLE] =
@@ -508,6 +576,21 @@ sja1105_static_config_check_valid(const struct sja1105_static_config *config)
 #define IS_FULL(blk_idx) \
 	(tables[blk_idx].entry_count == tables[blk_idx].ops->max_entry_count)
 
+	if (tables[BLK_IDX_SCHEDULE].entry_count) {
+		if (config->device_id != SJA1105T_DEVICE_ID &&
+		    config->device_id != SJA1105QS_DEVICE_ID)
+			return SJA1105_TTETHERNET_NOT_SUPPORTED;
+
+		if (tables[BLK_IDX_SCHEDULE_ENTRY_POINTS].entry_count == 0)
+			return SJA1105_INCORRECT_TTETHERNET_CONFIGURATION;
+
+		if (!IS_FULL(BLK_IDX_SCHEDULE_PARAMS))
+			return SJA1105_INCORRECT_TTETHERNET_CONFIGURATION;
+
+		if (!IS_FULL(BLK_IDX_SCHEDULE_ENTRY_POINTS_PARAMS))
+			return SJA1105_INCORRECT_TTETHERNET_CONFIGURATION;
+	}
+
 	if (tables[BLK_IDX_L2_POLICING].entry_count == 0)
 		return SJA1105_MISSING_L2_POLICING_TABLE;
 
@@ -614,6 +697,8 @@ sja1105_static_config_get_length(const struct sja1105_static_config *config)
 
 /* SJA1105E: First generation, no TTEthernet */
 struct sja1105_table_ops sja1105e_table_ops[BLK_IDX_MAX] = {
+	[BLK_IDX_SCHEDULE] = {0},
+	[BLK_IDX_SCHEDULE_ENTRY_POINTS] = {0},
 	[BLK_IDX_L2_LOOKUP] = {
 		.packing = sja1105et_l2_lookup_entry_packing,
 		.unpacked_entry_size = sizeof(struct sja1105_l2_lookup_entry),
@@ -644,6 +729,8 @@ struct sja1105_table_ops sja1105e_table_ops[BLK_IDX_MAX] = {
 		.packed_entry_size = SJA1105ET_SIZE_MAC_CONFIG_ENTRY,
 		.max_entry_count = SJA1105_MAX_MAC_CONFIG_COUNT,
 	},
+	[BLK_IDX_SCHEDULE_PARAMS] = {0},
+	[BLK_IDX_SCHEDULE_ENTRY_POINTS_PARAMS] = {0},
 	[BLK_IDX_L2_LOOKUP_PARAMS] = {
 		.packing = sja1105et_l2_lookup_params_entry_packing,
 		.unpacked_entry_size = sizeof(struct sja1105_l2_lookup_params_entry),
@@ -678,6 +765,18 @@ struct sja1105_table_ops sja1105e_table_ops[BLK_IDX_MAX] = {
 
 /* SJA1105T: First generation, TTEthernet */
 struct sja1105_table_ops sja1105t_table_ops[BLK_IDX_MAX] = {
+	[BLK_IDX_SCHEDULE] = {
+		.packing = sja1105_schedule_entry_packing,
+		.unpacked_entry_size = sizeof(struct sja1105_schedule_entry),
+		.packed_entry_size = SJA1105_SIZE_SCHEDULE_ENTRY,
+		.max_entry_count = SJA1105_MAX_SCHEDULE_COUNT,
+	},
+	[BLK_IDX_SCHEDULE_ENTRY_POINTS] = {
+		.packing = sja1105_schedule_entry_points_entry_packing,
+		.unpacked_entry_size = sizeof(struct sja1105_schedule_entry_points_entry),
+		.packed_entry_size = SJA1105_SIZE_SCHEDULE_ENTRY_POINTS_ENTRY,
+		.max_entry_count = SJA1105_MAX_SCHEDULE_ENTRY_POINTS_COUNT,
+	},
 	[BLK_IDX_L2_LOOKUP] = {
 		.packing = sja1105et_l2_lookup_entry_packing,
 		.unpacked_entry_size = sizeof(struct sja1105_l2_lookup_entry),
@@ -708,6 +807,18 @@ struct sja1105_table_ops sja1105t_table_ops[BLK_IDX_MAX] = {
 		.packed_entry_size = SJA1105ET_SIZE_MAC_CONFIG_ENTRY,
 		.max_entry_count = SJA1105_MAX_MAC_CONFIG_COUNT,
 	},
+	[BLK_IDX_SCHEDULE_PARAMS] = {
+		.packing = sja1105_schedule_params_entry_packing,
+		.unpacked_entry_size = sizeof(struct sja1105_schedule_params_entry),
+		.packed_entry_size = SJA1105_SIZE_SCHEDULE_PARAMS_ENTRY,
+		.max_entry_count = SJA1105_MAX_SCHEDULE_PARAMS_COUNT,
+	},
+	[BLK_IDX_SCHEDULE_ENTRY_POINTS_PARAMS] = {
+		.packing = sja1105_schedule_entry_points_params_entry_packing,
+		.unpacked_entry_size = sizeof(struct sja1105_schedule_entry_points_params_entry),
+		.packed_entry_size = SJA1105_SIZE_SCHEDULE_ENTRY_POINTS_PARAMS_ENTRY,
+		.max_entry_count = SJA1105_MAX_SCHEDULE_ENTRY_POINTS_PARAMS_COUNT,
+	},
 	[BLK_IDX_L2_LOOKUP_PARAMS] = {
 		.packing = sja1105et_l2_lookup_params_entry_packing,
 		.unpacked_entry_size = sizeof(struct sja1105_l2_lookup_params_entry),
@@ -742,6 +853,8 @@ struct sja1105_table_ops sja1105t_table_ops[BLK_IDX_MAX] = {
 
 /* SJA1105P: Second generation, no TTEthernet, no SGMII */
 struct sja1105_table_ops sja1105p_table_ops[BLK_IDX_MAX] = {
+	[BLK_IDX_SCHEDULE] = {0},
+	[BLK_IDX_SCHEDULE_ENTRY_POINTS] = {0},
 	[BLK_IDX_L2_LOOKUP] = {
 		.packing = sja1105pqrs_l2_lookup_entry_packing,
 		.unpacked_entry_size = sizeof(struct sja1105_l2_lookup_entry),
@@ -772,6 +885,8 @@ struct sja1105_table_ops sja1105p_table_ops[BLK_IDX_MAX] = {
 		.packed_entry_size = SJA1105PQRS_SIZE_MAC_CONFIG_ENTRY,
 		.max_entry_count = SJA1105_MAX_MAC_CONFIG_COUNT,
 	},
+	[BLK_IDX_SCHEDULE_PARAMS] = {0},
+	[BLK_IDX_SCHEDULE_ENTRY_POINTS_PARAMS] = {0},
 	[BLK_IDX_L2_LOOKUP_PARAMS] = {
 		.packing = sja1105pqrs_l2_lookup_params_entry_packing,
 		.unpacked_entry_size = sizeof(struct sja1105_l2_lookup_params_entry),
@@ -806,6 +921,18 @@ struct sja1105_table_ops sja1105p_table_ops[BLK_IDX_MAX] = {
 
 /* SJA1105Q: Second generation, TTEthernet, no SGMII */
 struct sja1105_table_ops sja1105q_table_ops[BLK_IDX_MAX] = {
+	[BLK_IDX_SCHEDULE] = {
+		.packing = sja1105_schedule_entry_packing,
+		.unpacked_entry_size = sizeof(struct sja1105_schedule_entry),
+		.packed_entry_size = SJA1105_SIZE_SCHEDULE_ENTRY,
+		.max_entry_count = SJA1105_MAX_SCHEDULE_COUNT,
+	},
+	[BLK_IDX_SCHEDULE_ENTRY_POINTS] = {
+		.packing = sja1105_schedule_entry_points_entry_packing,
+		.unpacked_entry_size = sizeof(struct sja1105_schedule_entry_points_entry),
+		.packed_entry_size = SJA1105_SIZE_SCHEDULE_ENTRY_POINTS_ENTRY,
+		.max_entry_count = SJA1105_MAX_SCHEDULE_ENTRY_POINTS_COUNT,
+	},
 	[BLK_IDX_L2_LOOKUP] = {
 		.packing = sja1105pqrs_l2_lookup_entry_packing,
 		.unpacked_entry_size = sizeof(struct sja1105_l2_lookup_entry),
@@ -836,6 +963,18 @@ struct sja1105_table_ops sja1105q_table_ops[BLK_IDX_MAX] = {
 		.packed_entry_size = SJA1105PQRS_SIZE_MAC_CONFIG_ENTRY,
 		.max_entry_count = SJA1105_MAX_MAC_CONFIG_COUNT,
 	},
+	[BLK_IDX_SCHEDULE_PARAMS] = {
+		.packing = sja1105_schedule_params_entry_packing,
+		.unpacked_entry_size = sizeof(struct sja1105_schedule_params_entry),
+		.packed_entry_size = SJA1105_SIZE_SCHEDULE_PARAMS_ENTRY,
+		.max_entry_count = SJA1105_MAX_SCHEDULE_PARAMS_COUNT,
+	},
+	[BLK_IDX_SCHEDULE_ENTRY_POINTS_PARAMS] = {
+		.packing = sja1105_schedule_entry_points_params_entry_packing,
+		.unpacked_entry_size = sizeof(struct sja1105_schedule_entry_points_params_entry),
+		.packed_entry_size = SJA1105_SIZE_SCHEDULE_ENTRY_POINTS_PARAMS_ENTRY,
+		.max_entry_count = SJA1105_MAX_SCHEDULE_ENTRY_POINTS_PARAMS_COUNT,
+	},
 	[BLK_IDX_L2_LOOKUP_PARAMS] = {
 		.packing = sja1105pqrs_l2_lookup_params_entry_packing,
 		.unpacked_entry_size = sizeof(struct sja1105_l2_lookup_params_entry),
@@ -870,6 +1009,8 @@ struct sja1105_table_ops sja1105q_table_ops[BLK_IDX_MAX] = {
 
 /* SJA1105R: Second generation, no TTEthernet, SGMII */
 struct sja1105_table_ops sja1105r_table_ops[BLK_IDX_MAX] = {
+	[BLK_IDX_SCHEDULE] = {0},
+	[BLK_IDX_SCHEDULE_ENTRY_POINTS] = {0},
 	[BLK_IDX_L2_LOOKUP] = {
 		.packing = sja1105pqrs_l2_lookup_entry_packing,
 		.unpacked_entry_size = sizeof(struct sja1105_l2_lookup_entry),
@@ -900,6 +1041,8 @@ struct sja1105_table_ops sja1105r_table_ops[BLK_IDX_MAX] = {
 		.packed_entry_size = SJA1105PQRS_SIZE_MAC_CONFIG_ENTRY,
 		.max_entry_count = SJA1105_MAX_MAC_CONFIG_COUNT,
 	},
+	[BLK_IDX_SCHEDULE_PARAMS] = {0},
+	[BLK_IDX_SCHEDULE_ENTRY_POINTS_PARAMS] = {0},
 	[BLK_IDX_L2_LOOKUP_PARAMS] = {
 		.packing = sja1105pqrs_l2_lookup_params_entry_packing,
 		.unpacked_entry_size = sizeof(struct sja1105_l2_lookup_params_entry),
@@ -934,6 +1077,18 @@ struct sja1105_table_ops sja1105r_table_ops[BLK_IDX_MAX] = {
 
 /* SJA1105S: Second generation, TTEthernet, SGMII */
 struct sja1105_table_ops sja1105s_table_ops[BLK_IDX_MAX] = {
+	[BLK_IDX_SCHEDULE] = {
+		.packing = sja1105_schedule_entry_packing,
+		.unpacked_entry_size = sizeof(struct sja1105_schedule_entry),
+		.packed_entry_size = SJA1105_SIZE_SCHEDULE_ENTRY,
+		.max_entry_count = SJA1105_MAX_SCHEDULE_COUNT,
+	},
+	[BLK_IDX_SCHEDULE_ENTRY_POINTS] = {
+		.packing = sja1105_schedule_entry_points_entry_packing,
+		.unpacked_entry_size = sizeof(struct sja1105_schedule_entry_points_entry),
+		.packed_entry_size = SJA1105_SIZE_SCHEDULE_ENTRY_POINTS_ENTRY,
+		.max_entry_count = SJA1105_MAX_SCHEDULE_ENTRY_POINTS_COUNT,
+	},
 	[BLK_IDX_L2_LOOKUP] = {
 		.packing = sja1105pqrs_l2_lookup_entry_packing,
 		.unpacked_entry_size = sizeof(struct sja1105_l2_lookup_entry),
@@ -964,6 +1119,18 @@ struct sja1105_table_ops sja1105s_table_ops[BLK_IDX_MAX] = {
 		.packed_entry_size = SJA1105PQRS_SIZE_MAC_CONFIG_ENTRY,
 		.max_entry_count = SJA1105_MAX_MAC_CONFIG_COUNT,
 	},
+	[BLK_IDX_SCHEDULE_PARAMS] = {
+		.packing = sja1105_schedule_params_entry_packing,
+		.unpacked_entry_size = sizeof(struct sja1105_schedule_params_entry),
+		.packed_entry_size = SJA1105_SIZE_SCHEDULE_PARAMS_ENTRY,
+		.max_entry_count = SJA1105_MAX_SCHEDULE_PARAMS_COUNT,
+	},
+	[BLK_IDX_SCHEDULE_ENTRY_POINTS_PARAMS] = {
+		.packing = sja1105_schedule_entry_points_params_entry_packing,
+		.unpacked_entry_size = sizeof(struct sja1105_schedule_entry_points_params_entry),
+		.packed_entry_size = SJA1105_SIZE_SCHEDULE_ENTRY_POINTS_PARAMS_ENTRY,
+		.max_entry_count = SJA1105_MAX_SCHEDULE_ENTRY_POINTS_PARAMS_COUNT,
+	},
 	[BLK_IDX_L2_LOOKUP_PARAMS] = {
 		.packing = sja1105pqrs_l2_lookup_params_entry_packing,
 		.unpacked_entry_size = sizeof(struct sja1105_l2_lookup_params_entry),
diff --git a/drivers/net/dsa/sja1105/sja1105_static_config.h b/drivers/net/dsa/sja1105/sja1105_static_config.h
index 684465fc0882..7f87022a2d61 100644
--- a/drivers/net/dsa/sja1105/sja1105_static_config.h
+++ b/drivers/net/dsa/sja1105/sja1105_static_config.h
@@ -11,11 +11,15 @@
 
 #define SJA1105_SIZE_DEVICE_ID				4
 #define SJA1105_SIZE_TABLE_HEADER			12
+#define SJA1105_SIZE_SCHEDULE_ENTRY			8
+#define SJA1105_SIZE_SCHEDULE_ENTRY_POINTS_ENTRY	4
 #define SJA1105_SIZE_L2_POLICING_ENTRY			8
 #define SJA1105_SIZE_VLAN_LOOKUP_ENTRY			8
 #define SJA1105_SIZE_L2_FORWARDING_ENTRY		8
 #define SJA1105_SIZE_L2_FORWARDING_PARAMS_ENTRY		12
 #define SJA1105_SIZE_XMII_PARAMS_ENTRY			4
+#define SJA1105_SIZE_SCHEDULE_PARAMS_ENTRY		12
+#define SJA1105_SIZE_SCHEDULE_ENTRY_POINTS_PARAMS_ENTRY	4
 #define SJA1105ET_SIZE_L2_LOOKUP_ENTRY			12
 #define SJA1105ET_SIZE_MAC_CONFIG_ENTRY			28
 #define SJA1105ET_SIZE_L2_LOOKUP_PARAMS_ENTRY		4
@@ -29,11 +33,15 @@
 
 /* UM10944.pdf Page 11, Table 2. Configuration Blocks */
 enum {
+	BLKID_SCHEDULE					= 0x00,
+	BLKID_SCHEDULE_ENTRY_POINTS			= 0x01,
 	BLKID_L2_LOOKUP					= 0x05,
 	BLKID_L2_POLICING				= 0x06,
 	BLKID_VLAN_LOOKUP				= 0x07,
 	BLKID_L2_FORWARDING				= 0x08,
 	BLKID_MAC_CONFIG				= 0x09,
+	BLKID_SCHEDULE_PARAMS				= 0x0A,
+	BLKID_SCHEDULE_ENTRY_POINTS_PARAMS		= 0x0B,
 	BLKID_L2_LOOKUP_PARAMS				= 0x0D,
 	BLKID_L2_FORWARDING_PARAMS			= 0x0E,
 	BLKID_AVB_PARAMS				= 0x10,
@@ -42,11 +50,15 @@ enum {
 };
 
 enum sja1105_blk_idx {
-	BLK_IDX_L2_LOOKUP = 0,
+	BLK_IDX_SCHEDULE = 0,
+	BLK_IDX_SCHEDULE_ENTRY_POINTS,
+	BLK_IDX_L2_LOOKUP,
 	BLK_IDX_L2_POLICING,
 	BLK_IDX_VLAN_LOOKUP,
 	BLK_IDX_L2_FORWARDING,
 	BLK_IDX_MAC_CONFIG,
+	BLK_IDX_SCHEDULE_PARAMS,
+	BLK_IDX_SCHEDULE_ENTRY_POINTS_PARAMS,
 	BLK_IDX_L2_LOOKUP_PARAMS,
 	BLK_IDX_L2_FORWARDING_PARAMS,
 	BLK_IDX_AVB_PARAMS,
@@ -59,11 +71,15 @@ enum sja1105_blk_idx {
 	BLK_IDX_INVAL = -1,
 };
 
+#define SJA1105_MAX_SCHEDULE_COUNT			1024
+#define SJA1105_MAX_SCHEDULE_ENTRY_POINTS_COUNT		2048
 #define SJA1105_MAX_L2_LOOKUP_COUNT			1024
 #define SJA1105_MAX_L2_POLICING_COUNT			45
 #define SJA1105_MAX_VLAN_LOOKUP_COUNT			4096
 #define SJA1105_MAX_L2_FORWARDING_COUNT			13
 #define SJA1105_MAX_MAC_CONFIG_COUNT			5
+#define SJA1105_MAX_SCHEDULE_PARAMS_COUNT		1
+#define SJA1105_MAX_SCHEDULE_ENTRY_POINTS_PARAMS_COUNT	1
 #define SJA1105_MAX_L2_LOOKUP_PARAMS_COUNT		1
 #define SJA1105_MAX_L2_FORWARDING_PARAMS_COUNT		1
 #define SJA1105_MAX_GENERAL_PARAMS_COUNT		1
@@ -83,6 +99,23 @@ enum sja1105_blk_idx {
 #define SJA1105R_PART_NO				0x9A86
 #define SJA1105S_PART_NO				0x9A87
 
+struct sja1105_schedule_entry {
+	u64 winstindex;
+	u64 winend;
+	u64 winst;
+	u64 destports;
+	u64 setvalid;
+	u64 txen;
+	u64 resmedia_en;
+	u64 resmedia;
+	u64 vlindex;
+	u64 delta;
+};
+
+struct sja1105_schedule_params_entry {
+	u64 subscheind[8];
+};
+
 struct sja1105_general_params_entry {
 	u64 vllupformat;
 	u64 mirr_ptacu;
@@ -112,6 +145,17 @@ struct sja1105_general_params_entry {
 	u64 replay_port;
 };
 
+struct sja1105_schedule_entry_points_entry {
+	u64 subschindx;
+	u64 delta;
+	u64 address;
+};
+
+struct sja1105_schedule_entry_points_params_entry {
+	u64 clksrc;
+	u64 actsubsch;
+};
+
 struct sja1105_vlan_lookup_entry {
 	u64 ving_mirr;
 	u64 vegr_mirr;
@@ -256,6 +300,8 @@ sja1105_static_config_get_length(const struct sja1105_static_config *config);
 
 typedef enum {
 	SJA1105_CONFIG_OK = 0,
+	SJA1105_TTETHERNET_NOT_SUPPORTED,
+	SJA1105_INCORRECT_TTETHERNET_CONFIGURATION,
 	SJA1105_MISSING_L2_POLICING_TABLE,
 	SJA1105_MISSING_L2_FORWARDING_TABLE,
 	SJA1105_MISSING_L2_FORWARDING_PARAMS_TABLE,
-- 
2.17.1
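
The bit ranges in sja1105_schedule_entry_packing() in the patch above can be sanity-checked outside the kernel. What follows is a hypothetical user-space sketch, not driver code. pack_field(), unpack_field() and schedule_entry_layout are illustrative names. Each field is placed into a 64-bit word using the same "start bit is the MSB" convention as the sja1105_packing() calls. The byte order the driver uses when writing the SPI buffer is not modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: store @val in bits @start..@end (start = MSB) of a
 * 64-bit word, the same bit-range convention as the sja1105_packing()
 * calls in the patch. Byte order on the SPI buffer is not modelled.
 */
static uint64_t pack_field(uint64_t word, uint64_t val, int start, int end)
{
	int width = start - end + 1;
	uint64_t mask = (width == 64) ? ~0ULL : (1ULL << width) - 1;

	assert(start >= end && end >= 0 && start < 64);
	assert((val & ~mask) == 0); /* value must fit in the field */
	word &= ~(mask << end);     /* clear the field first */
	return word | (val << end);
}

/* Inverse of pack_field(): extract bits @start..@end of @word. */
static uint64_t unpack_field(uint64_t word, int start, int end)
{
	int width = start - end + 1;
	uint64_t mask = (width == 64) ? ~0ULL : (1ULL << width) - 1;

	return (word >> end) & mask;
}

/* Bit ranges copied from sja1105_schedule_entry_packing(). */
struct field {
	const char *name;
	int start, end;
};

static const struct field schedule_entry_layout[] = {
	{ "winstindex",  63, 54 },
	{ "winend",      53, 53 },
	{ "winst",       52, 52 },
	{ "destports",   51, 47 },
	{ "setvalid",    46, 46 },
	{ "txen",        45, 45 },
	{ "resmedia_en", 44, 44 },
	{ "resmedia",    43, 36 },
	{ "vlindex",     35, 26 },
	{ "delta",       25,  8 },
};
```

In this layout the ten fields cover bits 63..8 contiguously and leave bits 7..0 unused. Similarly, the 8 subscheind fields in sja1105_schedule_params_entry_packing() are 10 bits wide and start at bit 16, so the last one ends at bit 16 + 7 * 10 + 9 = 95. That is the top bit of the 12-byte (96-bit) entry. A 10-bit index can address 1024 schedule entries, which is consistent with SJA1105_MAX_SCHEDULE_COUNT.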



* [PATCH v1 net-next 12/15] net: dsa: sja1105: Configure the Time-Aware Scheduler via tc-taprio offload
  2019-09-02 16:25 [PATCH v1 net-next 00/15] tc-taprio offload for SJA1105 DSA Vladimir Oltean
                   ` (10 preceding siblings ...)
  2019-09-02 16:25 ` [PATCH v1 net-next 11/15] net: dsa: sja1105: Add static config tables for scheduling Vladimir Oltean
@ 2019-09-02 16:25 ` Vladimir Oltean
  2019-09-11 19:45   ` Vinicius Costa Gomes
  2019-09-02 16:25 ` [PATCH v1 net-next 13/15] net: dsa: sja1105: Make HOSTPRIO a kernel config Vladimir Oltean
                   ` (4 subsequent siblings)
  16 siblings, 1 reply; 33+ messages in thread
From: Vladimir Oltean @ 2019-09-02 16:25 UTC
  To: f.fainelli, vivien.didelot, andrew, davem, vinicius.gomes,
	vedang.patel, richardcochran
  Cc: weifeng.voon, jiri, m-karicheri2, Jose.Abreu, ilias.apalodimas,
	jhs, xiyou.wangcong, kurt.kanzenbach, netdev, Vladimir Oltean

This qdisc offload is the closest thing to what the SJA1105 supports in
hardware for time-based egress shaping. The switch core really is built
around SAE AS6802/TTEthernet (a TTTech standard) but can be made to
operate similarly to IEEE 802.1Qbv with some constraints:

- The gate control list is a global list for all ports. There are 8
  execution threads that iterate through this global list in parallel.
  I don't know why 8; there are only 4 front-panel ports.

- Care must be taken by the user to make sure that two execution threads
  never get to execute a GCL entry simultaneously. I created an O(n^4)
  checker for this hardware limitation, which runs before a taprio
  offload configuration is accepted as valid.

- The spec says that if a GCL entry's interval is shorter than the time
  needed to transmit a frame, the frame shouldn't be sent (at the cost
  of head-of-line blocking). This switch sends it anyway.

- The switch has no concept of ADMIN and OPER configurations. Because
  it's so simple, the TAS settings are loaded through the static config
  tables interface, so there is no room for any discussion about
  'graceful switchover between ADMIN and OPER'. You just reset the
  switch and upload a new OPER config.

- The switch accepts multiple time sources for the gate events. Right
  now I am using the standalone clock source as opposed to PTP. So the
  base time parameter doesn't really do much. Support for the PTP clock
  source will be added in the next patch.
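
The collision rule above can be modelled in user space. This is a simplified sketch, not the driver code. struct gcl and gcl_conflict() are hypothetical names. Base times are assumed to be non-negative, so that C's % gives the same remainder as the div_s64_rem() calls in the patch. Only one pair of ports is compared. The sketch follows the same steps as sja1105_tas_check_conflicts() below:
- reject cycle times that are not multiples of one another;
- reduce each base time modulo its cycle time;
- compare every occurrence of every GCL entry start, up to the longest cycle time plus the larger reduced base time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified model of one port's taprio schedule. */
struct gcl {
	int64_t base_time;       /* ns, assumed >= 0 in this sketch */
	int64_t cycle_time;      /* ns, assumed > 0 */
	int num_entries;
	const int64_t *interval; /* ns, one per GCL entry */
};

/* Returns true if a gate event of @a and a gate event of @b can start at
 * the same instant (or if their cycle times are not multiples of one
 * another, which the patch treats as a guaranteed collision).
 */
static bool gcl_conflict(const struct gcl *a, const struct gcl *b)
{
	int64_t max_ct = a->cycle_time > b->cycle_time ? a->cycle_time : b->cycle_time;
	int64_t min_ct = a->cycle_time < b->cycle_time ? a->cycle_time : b->cycle_time;
	int64_t rbt1, rbt2, stop_time, delta1, delta2, t1, t2;
	int i, j;

	assert(min_ct > 0);
	if (max_ct % min_ct)
		return true;

	/* "Reduced" base times: the phase of each cycle. */
	rbt1 = a->base_time % a->cycle_time;
	rbt2 = b->base_time % b->cycle_time;
	stop_time = max_ct + (rbt1 > rbt2 ? rbt1 : rbt2);

	/* delta1/delta2: offset of each GCL entry within its own cycle;
	 * t1/t2: every occurrence of that entry up to stop_time.
	 */
	for (i = 0, delta1 = 0; i < a->num_entries; delta1 += a->interval[i], i++)
		for (j = 0, delta2 = 0; j < b->num_entries; delta2 += b->interval[j], j++)
			for (t1 = rbt1 + delta1; t1 <= stop_time; t1 += a->cycle_time)
				for (t2 = rbt2 + delta2; t2 <= stop_time; t2 += b->cycle_time)
					if (t1 == t2)
						return true;
	return false;
}
```

The four nested loops (two over GCL entries, two over their occurrences in time) are where the O(n^4) bound mentioned above comes from.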

Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
---
Changes since RFC:
- Removed the sja1105_tas_config_work workqueue.
- Allocating memory with GFP_KERNEL.
- Made the ASCII art drawing fit in < 80 characters.
- Made most of the time-holding variables s64 instead of u64 (for fear
  of them not holding the result of signed arithmetic properly).
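
The following sketch illustrates how sja1105_init_scheduling() below lays out the global schedule. It is a hypothetical user-space model of that function's index bookkeeping only. struct layout, build_layout() and ns_to_delta() are illustrative names. NUM_PORTS = 5 is an assumption about SJA1105_NUM_PORTS, and MAX_ENTRIES mirrors SJA1105_MAX_SCHEDULE_COUNT. Each offloaded port gets one cycle (subschedule) of consecutive schedule entries, and intervals are converted to 200 ns units. Every unused execution thread ends up padded with the last cycle's end index.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_PORTS   5    /* assumption: value of SJA1105_NUM_PORTS */
#define NUM_THREADS 8    /* subscheind[] has 8 slots in the patch */
#define MAX_ENTRIES 1024 /* mirrors SJA1105_MAX_SCHEDULE_COUNT */

struct layout {
	int num_cycles;
	int address[NUM_PORTS];      /* first schedule index of each cycle */
	int subscheind[NUM_THREADS]; /* last schedule index of each thread */
	int64_t delta[MAX_ENTRIES];  /* entry duration in 200 ns units */
	int dest_port[MAX_ENTRIES];  /* port whose gates the entry drives */
};

/* Same truncating division as ns_to_sja1105_delta() in the patch. */
static int64_t ns_to_delta(int64_t ns)
{
	return ns / 200;
}

/* num_entries[port] == 0 means "no taprio offload on this port".
 * Returns 0 on success, -1 if the schedule table would overflow
 * (a check this sketch adds; it is not taken from the patch).
 */
static int build_layout(struct layout *l, const int num_entries[NUM_PORTS],
			const int64_t *const interval_ns[NUM_PORTS])
{
	int port, i, k = 0, cycle = 0;

	for (i = 0; i < NUM_THREADS; i++)
		l->subscheind[i] = 0;

	for (port = 0; port < NUM_PORTS; port++) {
		int end;

		if (!num_entries[port])
			continue;
		if (k + num_entries[port] > MAX_ENTRIES)
			return -1;
		assert(cycle < NUM_THREADS);

		end = k + num_entries[port] - 1;
		l->address[cycle] = k;
		/* This cycle's end index fills its own thread slot and all
		 * later ones; a later cycle overwrites from its own slot on,
		 * so unused threads keep the last cycle's end index.
		 */
		for (i = cycle; i < NUM_THREADS; i++)
			l->subscheind[i] = end;

		for (i = 0; i < num_entries[port]; i++, k++) {
			l->delta[k] = ns_to_delta(interval_ns[port][i]);
			l->dest_port[k] = port;
		}
		cycle++;
	}
	l->num_cycles = cycle;
	return 0;
}
```

With 3 entries on port 0 and 3 on port 2, this yields entry points 0 and 3 and subscheind = {2, 5, 5, 5, 5, 5, 5, 5}. That matches the two-cycle example drawn in the sja1105_tas.c comment.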

 drivers/net/dsa/sja1105/Kconfig        |   8 +
 drivers/net/dsa/sja1105/Makefile       |   4 +
 drivers/net/dsa/sja1105/sja1105.h      |   5 +
 drivers/net/dsa/sja1105/sja1105_main.c |  19 +-
 drivers/net/dsa/sja1105/sja1105_tas.c  | 420 +++++++++++++++++++++++++
 drivers/net/dsa/sja1105/sja1105_tas.h  |  42 +++
 6 files changed, 497 insertions(+), 1 deletion(-)
 create mode 100644 drivers/net/dsa/sja1105/sja1105_tas.c
 create mode 100644 drivers/net/dsa/sja1105/sja1105_tas.h

diff --git a/drivers/net/dsa/sja1105/Kconfig b/drivers/net/dsa/sja1105/Kconfig
index 770134a66e48..55424f39cb0d 100644
--- a/drivers/net/dsa/sja1105/Kconfig
+++ b/drivers/net/dsa/sja1105/Kconfig
@@ -23,3 +23,11 @@ config NET_DSA_SJA1105_PTP
 	help
 	  This enables support for timestamping and PTP clock manipulations in
 	  the SJA1105 DSA driver.
+
+config NET_DSA_SJA1105_TAS
+	bool "Support for the Time-Aware Scheduler on NXP SJA1105"
+	depends on NET_DSA_SJA1105
+	help
+	  This enables support for the TTEthernet-based egress scheduling
+	  engine in the SJA1105 DSA driver, which is controlled using a
+	  hardware offload of the tc-taprio qdisc.
diff --git a/drivers/net/dsa/sja1105/Makefile b/drivers/net/dsa/sja1105/Makefile
index 4483113e6259..66161e874344 100644
--- a/drivers/net/dsa/sja1105/Makefile
+++ b/drivers/net/dsa/sja1105/Makefile
@@ -12,3 +12,7 @@ sja1105-objs := \
 ifdef CONFIG_NET_DSA_SJA1105_PTP
 sja1105-objs += sja1105_ptp.o
 endif
+
+ifdef CONFIG_NET_DSA_SJA1105_TAS
+sja1105-objs += sja1105_tas.o
+endif
diff --git a/drivers/net/dsa/sja1105/sja1105.h b/drivers/net/dsa/sja1105/sja1105.h
index 3ca0b87aa3e4..d95f9ce3b4f9 100644
--- a/drivers/net/dsa/sja1105/sja1105.h
+++ b/drivers/net/dsa/sja1105/sja1105.h
@@ -21,6 +21,7 @@
 #define SJA1105_AGEING_TIME_MS(ms)	((ms) / 10)
 
 #include "sja1105_ptp.h"
+#include "sja1105_tas.h"
 
 /* Keeps the different addresses between E/T and P/Q/R/S */
 struct sja1105_regs {
@@ -96,6 +97,7 @@ struct sja1105_private {
 	struct mutex mgmt_lock;
 	struct sja1105_tagger_data tagger_data;
 	struct sja1105_ptp_data ptp_data;
+	struct sja1105_tas_data tas_data;
 };
 
 #include "sja1105_dynamic_config.h"
@@ -111,6 +113,9 @@ typedef enum {
 	SPI_WRITE = 1,
 } sja1105_spi_rw_mode_t;
 
+/* From sja1105_main.c */
+int sja1105_static_config_reload(struct sja1105_private *priv);
+
 /* From sja1105_spi.c */
 int sja1105_spi_send_packed_buf(const struct sja1105_private *priv,
 				sja1105_spi_rw_mode_t rw, u64 reg_addr,
diff --git a/drivers/net/dsa/sja1105/sja1105_main.c b/drivers/net/dsa/sja1105/sja1105_main.c
index 8b930cc2dabc..4b393782cc84 100644
--- a/drivers/net/dsa/sja1105/sja1105_main.c
+++ b/drivers/net/dsa/sja1105/sja1105_main.c
@@ -22,6 +22,7 @@
 #include <linux/if_ether.h>
 #include <linux/dsa/8021q.h>
 #include "sja1105.h"
+#include "sja1105_tas.h"
 
 static void sja1105_hw_reset(struct gpio_desc *gpio, unsigned int pulse_len,
 			     unsigned int startup_delay)
@@ -1382,7 +1383,7 @@ static void sja1105_bridge_leave(struct dsa_switch *ds, int port,
  * modify at runtime (currently only MAC) and restore them after uploading,
  * such that this operation is relatively seamless.
  */
-static int sja1105_static_config_reload(struct sja1105_private *priv)
+int sja1105_static_config_reload(struct sja1105_private *priv)
 {
 	struct ptp_system_timestamp ptp_sts_before;
 	struct ptp_system_timestamp ptp_sts_after;
@@ -1761,6 +1762,7 @@ static void sja1105_teardown(struct dsa_switch *ds)
 {
 	struct sja1105_private *priv = ds->priv;
 
+	sja1105_tas_teardown(priv);
 	cancel_work_sync(&priv->tagger_data.rxtstamp_work);
 	skb_queue_purge(&priv->tagger_data.skb_rxtstamp_queue);
 	sja1105_ptp_clock_unregister(priv);
@@ -2088,6 +2090,18 @@ static bool sja1105_port_txtstamp(struct dsa_switch *ds, int port,
 	return true;
 }
 
+static int sja1105_port_setup_tc(struct dsa_switch *ds, int port,
+				 enum tc_setup_type type,
+				 void *type_data)
+{
+	switch (type) {
+	case TC_SETUP_QDISC_TAPRIO:
+		return sja1105_setup_tc_taprio(ds, port, type_data);
+	default:
+		return -EOPNOTSUPP;
+	}
+}
+
 static const struct dsa_switch_ops sja1105_switch_ops = {
 	.get_tag_protocol	= sja1105_get_tag_protocol,
 	.setup			= sja1105_setup,
@@ -2120,6 +2134,7 @@ static const struct dsa_switch_ops sja1105_switch_ops = {
 	.port_hwtstamp_set	= sja1105_hwtstamp_set,
 	.port_rxtstamp		= sja1105_port_rxtstamp,
 	.port_txtstamp		= sja1105_port_txtstamp,
+	.port_setup_tc		= sja1105_port_setup_tc,
 };
 
 static int sja1105_check_device_id(struct sja1105_private *priv)
@@ -2229,6 +2244,8 @@ static int sja1105_probe(struct spi_device *spi)
 	}
 	mutex_init(&priv->mgmt_lock);
 
+	sja1105_tas_setup(priv);
+
 	return dsa_register_switch(priv->ds);
 }
 
diff --git a/drivers/net/dsa/sja1105/sja1105_tas.c b/drivers/net/dsa/sja1105/sja1105_tas.c
new file mode 100644
index 000000000000..769e1d8e5e8f
--- /dev/null
+++ b/drivers/net/dsa/sja1105/sja1105_tas.c
@@ -0,0 +1,420 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2019, Vladimir Oltean <olteanv@gmail.com>
+ */
+#include "sja1105.h"
+
+#define SJA1105_TAS_CLKSRC_DISABLED	0
+#define SJA1105_TAS_CLKSRC_STANDALONE	1
+#define SJA1105_TAS_CLKSRC_AS6802	2
+#define SJA1105_TAS_CLKSRC_PTP		3
+#define SJA1105_GATE_MASK		GENMASK_ULL(SJA1105_NUM_TC - 1, 0)
+#define SJA1105_TAS_MAX_DELTA		BIT(19)
+
+/* This is not a preprocessor macro because the "ns" argument may or may not be
+ * s64 at caller side. This ensures it is properly type-cast before div_s64.
+ */
+static s64 ns_to_sja1105_delta(s64 ns)
+{
+	return div_s64(ns, 200);
+}
+
+/* Lo and behold: the egress scheduler from hell.
+ *
+ * At the hardware level, the Time-Aware Shaper holds a global linear array of
+ * all schedule entries for all ports. These are the Gate Control List (GCL)
+ * entries, let's call them "timeslots" for short. This linear array of
+ * timeslots is held in BLK_IDX_SCHEDULE.
+ *
+ * Then there are a maximum of 8 "execution threads" inside the switch, which
+ * iterate cyclically through the "schedule". Each "cycle" has an entry point
+ * and an exit point, both being timeslot indices in the schedule table. The
+ * hardware calls each cycle a "subschedule".
+ *
+ * Subschedule (cycle) i starts when
+ *   ptpclkval >= ptpschtm + BLK_IDX_SCHEDULE_ENTRY_POINTS[i].delta.
+ *
+ * The hardware scheduler iterates BLK_IDX_SCHEDULE with a k ranging from
+ *   k = BLK_IDX_SCHEDULE_ENTRY_POINTS[i].address to
+ *   k = BLK_IDX_SCHEDULE_PARAMS.subscheind[i]
+ *
+ * For each schedule entry (timeslot) k, the engine executes the gate control
+ * list entry for the duration of BLK_IDX_SCHEDULE[k].delta.
+ *
+ *         +---------+
+ *         |         | BLK_IDX_SCHEDULE_ENTRY_POINTS_PARAMS
+ *         +---------+
+ *              |
+ *              +-----------------+
+ *                                | .actsubsch
+ *  BLK_IDX_SCHEDULE_ENTRY_POINTS v
+ *                 +-------+-------+
+ *                 |cycle 0|cycle 1|
+ *                 +-------+-------+
+ *                   |  |      |  |
+ *  +----------------+  |      |  +-------------------------------------+
+ *  |   .subschindx     |      |             .subschindx                |
+ *  |                   |      +---------------+                        |
+ *  |          .address |        .address      |                        |
+ *  |                   |                      |                        |
+ *  |                   |                      |                        |
+ *  |  BLK_IDX_SCHEDULE v                      v                        |
+ *  |              +-------+-------+-------+-------+-------+------+     |
+ *  |              |entry 0|entry 1|entry 2|entry 3|entry 4|entry5|     |
+ *  |              +-------+-------+-------+-------+-------+------+     |
+ *  |                                  ^                    ^  ^  ^     |
+ *  |                                  |                    |  |  |     |
+ *  |        +-------------------------+                    |  |  |     |
+ *  |        |              +-------------------------------+  |  |     |
+ *  |        |              |              +-------------------+  |     |
+ *  |        |              |              |                      |     |
+ *  | +---------------------------------------------------------------+ |
+ *  | |subscheind[0]<=subscheind[1]<=subscheind[2]<=...<=subscheind[7]| |
+ *  | +---------------------------------------------------------------+ |
+ *  |        ^              ^                BLK_IDX_SCHEDULE_PARAMS    |
+ *  |        |              |                                           |
+ *  +--------+              +-------------------------------------------+
+ *
+ *  In the above picture there are two subschedules (cycles):
+ *
+ *  - cycle 0: iterates the schedule table from 0 to 2 (and back)
+ *  - cycle 1: iterates the schedule table from 3 to 5 (and back)
+ *
+ *  All other possible execution threads must be marked as unused by making
+ *  their "subschedule end index" (subscheind) equal to the last valid
+ *  subschedule's end index (in this case 5).
+ */
+static int sja1105_init_scheduling(struct sja1105_private *priv)
+{
+	struct sja1105_schedule_entry_points_entry *schedule_entry_points;
+	struct sja1105_schedule_entry_points_params_entry
+					*schedule_entry_points_params;
+	struct sja1105_schedule_params_entry *schedule_params;
+	struct sja1105_tas_data *tas_data = &priv->tas_data;
+	struct sja1105_schedule_entry *schedule;
+	struct sja1105_table *table;
+	int subscheind[8] = {0};
+	int schedule_start_idx;
+	s64 entry_point_delta;
+	int schedule_end_idx;
+	int num_entries = 0;
+	int num_cycles = 0;
+	int cycle = 0;
+	int i, k = 0;
+	int port;
+
+	/* Discard previous Schedule Table */
+	table = &priv->static_config.tables[BLK_IDX_SCHEDULE];
+	if (table->entry_count) {
+		kfree(table->entries);
+		table->entry_count = 0;
+	}
+
+	/* Discard previous Schedule Entry Points Parameters Table */
+	table = &priv->static_config.tables[BLK_IDX_SCHEDULE_ENTRY_POINTS_PARAMS];
+	if (table->entry_count) {
+		kfree(table->entries);
+		table->entry_count = 0;
+	}
+
+	/* Discard previous Schedule Parameters Table */
+	table = &priv->static_config.tables[BLK_IDX_SCHEDULE_PARAMS];
+	if (table->entry_count) {
+		kfree(table->entries);
+		table->entry_count = 0;
+	}
+
+	/* Discard previous Schedule Entry Points Table */
+	table = &priv->static_config.tables[BLK_IDX_SCHEDULE_ENTRY_POINTS];
+	if (table->entry_count) {
+		kfree(table->entries);
+		table->entry_count = 0;
+	}
+
+	/* Figure out the dimensioning of the problem */
+	for (port = 0; port < SJA1105_NUM_PORTS; port++) {
+		if (tas_data->config[port]) {
+			num_entries += tas_data->config[port]->num_entries;
+			num_cycles++;
+		}
+	}
+
+	/* Nothing to do */
+	if (!num_cycles)
+		return 0;
+
+	/* Pre-allocate space in the static config tables */
+
+	/* Schedule Table */
+	table = &priv->static_config.tables[BLK_IDX_SCHEDULE];
+	table->entries = kcalloc(num_entries, table->ops->unpacked_entry_size,
+				 GFP_KERNEL);
+	if (!table->entries)
+		return -ENOMEM;
+	table->entry_count = num_entries;
+	schedule = table->entries;
+
+	/* Schedule Entry Points Parameters Table */
+	table = &priv->static_config.tables[BLK_IDX_SCHEDULE_ENTRY_POINTS_PARAMS];
+	table->entries = kcalloc(SJA1105_MAX_SCHEDULE_ENTRY_POINTS_PARAMS_COUNT,
+				 table->ops->unpacked_entry_size, GFP_KERNEL);
+	if (!table->entries)
+		return -ENOMEM;
+	table->entry_count = SJA1105_MAX_SCHEDULE_ENTRY_POINTS_PARAMS_COUNT;
+	schedule_entry_points_params = table->entries;
+
+	/* Schedule Parameters Table */
+	table = &priv->static_config.tables[BLK_IDX_SCHEDULE_PARAMS];
+	table->entries = kcalloc(SJA1105_MAX_SCHEDULE_PARAMS_COUNT,
+				 table->ops->unpacked_entry_size, GFP_KERNEL);
+	if (!table->entries)
+		return -ENOMEM;
+	table->entry_count = SJA1105_MAX_SCHEDULE_PARAMS_COUNT;
+	schedule_params = table->entries;
+
+	/* Schedule Entry Points Table */
+	table = &priv->static_config.tables[BLK_IDX_SCHEDULE_ENTRY_POINTS];
+	table->entries = kcalloc(num_cycles, table->ops->unpacked_entry_size,
+				 GFP_KERNEL);
+	if (!table->entries)
+		return -ENOMEM;
+	table->entry_count = num_cycles;
+	schedule_entry_points = table->entries;
+
+	/* Finally start populating the static config tables */
+	schedule_entry_points_params->clksrc = SJA1105_TAS_CLKSRC_STANDALONE;
+	schedule_entry_points_params->actsubsch = num_cycles - 1;
+
+	for (port = 0; port < SJA1105_NUM_PORTS; port++) {
+		const struct tc_taprio_qopt_offload *tas_config;
+
+		tas_config = tas_data->config[port];
+		if (!tas_config)
+			continue;
+
+		schedule_start_idx = k;
+		schedule_end_idx = k + tas_config->num_entries - 1;
+		/* TODO this is only a relative base time for the subschedule
+		 * (relative to PTPSCHTM). But as we're using standalone and
+		 * not PTP clock as time reference, leave it like this for now.
+		 * Later we'll have to enforce that all ports' base times are
+		 * within SJA1105_TAS_MAX_DELTA 200ns cycles of one another.
+		 */
+		entry_point_delta = ns_to_sja1105_delta(tas_config->base_time);
+
+		schedule_entry_points[cycle].subschindx = cycle;
+		schedule_entry_points[cycle].delta = entry_point_delta;
+		schedule_entry_points[cycle].address = schedule_start_idx;
+
+		for (i = cycle; i < 8; i++)
+			subscheind[i] = schedule_end_idx;
+
+		for (i = 0; i < tas_config->num_entries; i++, k++) {
+			s64 delta_ns = tas_config->entries[i].interval;
+
+			schedule[k].delta = ns_to_sja1105_delta(delta_ns);
+			schedule[k].destports = BIT(port);
+			schedule[k].resmedia_en = true;
+			schedule[k].resmedia = SJA1105_GATE_MASK &
+					~tas_config->entries[i].gate_mask;
+		}
+		cycle++;
+	}
+
+	for (i = 0; i < 8; i++)
+		schedule_params->subscheind[i] = subscheind[i];
+
+	return 0;
+}
+
+/* Consider 2 port subschedules, each cyclically executing an arbitrary
+ * number of gate open/close events.
+ * No two of those gate events may ever occur at the exact same time,
+ * otherwise the switch is known to act in exotically strange ways.
+ * However the hardware doesn't bother performing these integrity checks - the
+ * designers probably said "nah, let's leave that to the experts" - oh well,
+ * now we're the experts.
+ * So here we are with the task of validating whether the new @qopt has any
+ * conflict with the already established TAS configuration in tas_data->config.
+ * We already know the other ports are in harmony with one another, otherwise
+ * we wouldn't have saved them.
+ * Each gate event executes periodically, with a period of @cycle_time and a
+ * phase given by its cycle's @base_time plus its offset within the cycle
+ * (which in turn is given by the length of the events prior to it).
+ * There are two aspects to possible collisions:
+ * - Collisions within one cycle's (actually the longest cycle's) time frame.
+ *   For that, we need to compare the Cartesian product of each possible
+ *   occurrence of each event within one cycle time.
+ * - Collisions in the future. Events may not collide within one cycle time,
+ *   but if two port schedules don't have the same periodicity (aka the cycle
+ *   times aren't multiples of one another), they surely will some time in the
+ *   future (actually they will collide an infinite number of times).
+ */
+static bool
+sja1105_tas_check_conflicts(struct sja1105_private *priv,
+			    const struct tc_taprio_qopt_offload *qopt)
+{
+	struct sja1105_tas_data *tas_data = &priv->tas_data;
+	int port;
+
+	for (port = 0; port < SJA1105_NUM_PORTS; port++) {
+		const struct tc_taprio_qopt_offload *tas_config;
+		s64 max_cycle_time, min_cycle_time;
+		s64 delta1, delta2;
+		s64 rbt1, rbt2;
+		s64 stop_time;
+		s64 t1, t2;
+		int i, j;
+		s32 rem;
+
+		tas_config = tas_data->config[port];
+
+		if (!tas_config)
+			continue;
+
+		/* Check if the two cycle times are multiples of one another.
+		 * If they aren't, then they will surely collide.
+		 */
+		max_cycle_time = max(tas_config->cycle_time, qopt->cycle_time);
+		min_cycle_time = min(tas_config->cycle_time, qopt->cycle_time);
+		div_s64_rem(max_cycle_time, min_cycle_time, &rem);
+		if (rem)
+			return true;
+
+		/* Calculate the "reduced" base time of each of the two cycles
+		 * (transposed back as close to 0 as possible) by taking the
+		 * remainder of the division by the cycle time.
+		 */
+		div_s64_rem(tas_config->base_time, tas_config->cycle_time,
+			    &rem);
+		rbt1 = rem;
+
+		div_s64_rem(qopt->base_time, qopt->cycle_time, &rem);
+		rbt2 = rem;
+
+		stop_time = max_cycle_time + max(rbt1, rbt2);
+
+		/* delta1 is the relative base time of each GCL entry within
+		 * the established port's TAS config.
+		 */
+		for (i = 0, delta1 = 0;
+		     i < tas_config->num_entries;
+		     delta1 += tas_config->entries[i].interval, i++) {
+
+			/* delta2 is the relative base time of each GCL entry
+			 * within the newly added TAS config.
+			 */
+			for (j = 0, delta2 = 0;
+			     j < qopt->num_entries;
+			     delta2 += qopt->entries[j].interval, j++) {
+
+				/* t1 follows all possible occurrences of the
+				 * established port's GCL entry i within the
+				 * longest cycle time.
+				 */
+				for (t1 = rbt1 + delta1;
+				     t1 <= stop_time;
+				     t1 += tas_config->cycle_time) {
+
+					/* t2 follows all possible occurrences
+					 * of the newly added GCL entry j
+					 * within the longest cycle time.
+					 */
+					for (t2 = rbt2 + delta2;
+					     t2 <= stop_time;
+					     t2 += qopt->cycle_time) {
+
+						if (t1 == t2) {
+							dev_warn(priv->ds->dev,
+								 "GCL entry %d collides with entry %d of port %d\n",
+								 j, i, port);
+							return true;
+						}
+					}
+				}
+			}
+		}
+	}
+
+	return false;
+}
+
+int sja1105_setup_tc_taprio(struct dsa_switch *ds, int port,
+			    struct tc_taprio_qopt_offload *tas_config)
+{
+	struct sja1105_private *priv = ds->priv;
+	struct sja1105_tas_data *tas_data = &priv->tas_data;
+	int rc, i;
+
+	/* Can't change an already configured port (must delete qdisc first).
+	 * Can't delete the qdisc from an unconfigured port.
+	 */
+	if (!!tas_data->config[port] == tas_config->enable)
+		return -EINVAL;
+
+	if (!tas_config->enable) {
+		taprio_free(tas_data->config[port]);
+		tas_data->config[port] = NULL;
+
+		rc = sja1105_init_scheduling(priv);
+		if (rc < 0)
+			return rc;
+
+		return sja1105_static_config_reload(priv);
+	}
+
+	/* The cycle time extension is the amount of time the last cycle from
+	 * the old OPER needs to be extended in order to phase-align with the
+	 * base time of the ADMIN when that becomes the new OPER.
+	 * But of course our switch needs to be reset to switch over between
+	 * the ADMIN and the OPER configs - so much for a seamless transition.
+	 * So don't add insult to injury and just say we don't support cycle
+	 * time extension.
+	 */
+	if (tas_config->cycle_time_extension)
+		return -EOPNOTSUPP;
+
+	if (!ns_to_sja1105_delta(tas_config->base_time)) {
+		dev_err(ds->dev, "A base time of zero is not hardware-allowed\n");
+		return -ERANGE;
+	}
+
+	for (i = 0; i < tas_config->num_entries; i++) {
+		s64 delta_ns = tas_config->entries[i].interval;
+		s64 delta_cycles = ns_to_sja1105_delta(delta_ns);
+		bool too_long, too_short;
+
+		too_long = (delta_cycles >= SJA1105_TAS_MAX_DELTA);
+		too_short = (delta_cycles == 0);
+		if (too_long || too_short) {
+			dev_err(priv->ds->dev,
+				"Interval %lld too %s for GCL entry %d\n",
+				delta_ns, too_long ? "long" : "short", i);
+			return -ERANGE;
+		}
+	}
+
+	if (sja1105_tas_check_conflicts(priv, tas_config))
+		return -ERANGE;
+
+	tas_data->config[port] = taprio_get(tas_config);
+
+	rc = sja1105_init_scheduling(priv);
+	if (rc < 0)
+		return rc;
+
+	return sja1105_static_config_reload(priv);
+}
+
+void sja1105_tas_setup(struct sja1105_private *priv)
+{
+}
+
+void sja1105_tas_teardown(struct sja1105_private *priv)
+{
+	struct sja1105_tas_data *tas_data = &priv->tas_data;
+	int port;
+
+	for (port = 0; port < SJA1105_NUM_PORTS; port++)
+		if (tas_data->config[port])
+			taprio_free(tas_data->config[port]);
+}
diff --git a/drivers/net/dsa/sja1105/sja1105_tas.h b/drivers/net/dsa/sja1105/sja1105_tas.h
new file mode 100644
index 000000000000..0ef82810d9d7
--- /dev/null
+++ b/drivers/net/dsa/sja1105/sja1105_tas.h
@@ -0,0 +1,42 @@
+/* SPDX-License-Identifier: GPL-2.0
+ * Copyright (c) 2019, Vladimir Oltean <olteanv@gmail.com>
+ */
+#ifndef _SJA1105_TAS_H
+#define _SJA1105_TAS_H
+
+#include <net/pkt_sched.h>
+
+#if IS_ENABLED(CONFIG_NET_DSA_SJA1105_TAS)
+
+struct sja1105_tas_data {
+	struct tc_taprio_qopt_offload *config[SJA1105_NUM_PORTS];
+};
+
+int sja1105_setup_tc_taprio(struct dsa_switch *ds, int port,
+			    struct tc_taprio_qopt_offload *qopt);
+
+void sja1105_tas_setup(struct sja1105_private *priv);
+
+void sja1105_tas_teardown(struct sja1105_private *priv);
+
+#else
+
+/* C doesn't allow empty structures, bah! */
+struct sja1105_tas_data {
+	u8 dummy;
+};
+
+static inline int
+sja1105_setup_tc_taprio(struct dsa_switch *ds, int port,
+			struct tc_taprio_qopt_offload *qopt)
+{
+	return -EOPNOTSUPP;
+}
+
+static inline void sja1105_tas_setup(struct sja1105_private *priv) { }
+
+static inline void sja1105_tas_teardown(struct sja1105_private *priv) { }
+
+#endif /* IS_ENABLED(CONFIG_NET_DSA_SJA1105_TAS) */
+
+#endif /* _SJA1105_TAS_H */
-- 
2.17.1
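The two collision criteria described in the comment above sja1105_tas_check_conflicts() can be tried out in userspace. The sketch below is a minimal, hypothetical model of the same brute-force check, assuming that only the base time, the cycle time and the per-entry intervals (all in ns) matter. struct sched and sched_conflict() are illustrative names, not driver or taprio APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the few tc_taprio_qopt_offload fields
 * that the conflict check looks at (all times in ns).
 */
struct sched {
	int64_t base_time;
	int64_t cycle_time;
	int num_entries;
	const int64_t *intervals;	/* one interval per GCL entry */
};

/* Returns true if the two schedules are considered to conflict:
 * either their cycle times are not multiples of one another, or some
 * occurrence of a gate event of @a coincides with one of @b within
 * one longest-cycle window.
 */
static bool sched_conflict(const struct sched *a, const struct sched *b)
{
	int64_t max_ct, min_ct, rbt1, rbt2, stop_time, d1 = 0;

	assert(a->cycle_time > 0 && b->cycle_time > 0);

	max_ct = a->cycle_time > b->cycle_time ? a->cycle_time : b->cycle_time;
	min_ct = a->cycle_time < b->cycle_time ? a->cycle_time : b->cycle_time;
	if (max_ct % min_ct)
		return true;

	/* "Reduced" base times: phase of each schedule within its cycle */
	rbt1 = a->base_time % a->cycle_time;
	rbt2 = b->base_time % b->cycle_time;
	stop_time = max_ct + (rbt1 > rbt2 ? rbt1 : rbt2);

	for (int i = 0; i < a->num_entries; d1 += a->intervals[i], i++) {
		int64_t d2 = 0;

		for (int j = 0; j < b->num_entries; d2 += b->intervals[j], j++)
			for (int64_t t1 = rbt1 + d1; t1 <= stop_time;
			     t1 += a->cycle_time)
				for (int64_t t2 = rbt2 + d2; t2 <= stop_time;
				     t2 += b->cycle_time)
					if (t1 == t2)
						return true;
	}

	return false;
}
```

For example, a 500 ns cycle with events at offsets 0 and 100 does not collide with a 1000 ns cycle whose events sit at 50 and 250, but it does collide with a 1000 ns cycle whose first event sits at 100, and any 300 ns cycle is rejected because 500 is not a multiple of 300.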



* [PATCH v1 net-next 13/15] net: dsa: sja1105: Make HOSTPRIO a kernel config
  2019-09-02 16:25 [PATCH v1 net-next 00/15] tc-taprio offload for SJA1105 DSA Vladimir Oltean
                   ` (11 preceding siblings ...)
  2019-09-02 16:25 ` [PATCH v1 net-next 12/15] net: dsa: sja1105: Configure the Time-Aware Scheduler via tc-taprio offload Vladimir Oltean
@ 2019-09-02 16:25 ` Vladimir Oltean
  2019-09-02 16:25 ` [PATCH v1 net-next 14/15] net: dsa: sja1105: Make the PTP command read-write Vladimir Oltean
                   ` (3 subsequent siblings)
  16 siblings, 0 replies; 33+ messages in thread
From: Vladimir Oltean @ 2019-09-02 16:25 UTC (permalink / raw)
  To: f.fainelli, vivien.didelot, andrew, davem, vinicius.gomes,
	vedang.patel, richardcochran
  Cc: weifeng.voon, jiri, m-karicheri2, Jose.Abreu, ilias.apalodimas,
	jhs, xiyou.wangcong, kurt.kanzenbach, netdev, Vladimir Oltean

Unfortunately, with this hardware there is no way to transmit in-band
QoS hints with management frames (i.e. VLAN PCP is ignored). The traffic
class for these is fixed in the static config (which in turn requires a
reset to change).

With the new ability to add time gates for individual traffic classes,
there is a real danger that the user might unknowingly turn off the
traffic class for PTP, BPDUs, LLDP etc.

So we need to manage this situation as best we can. There isn't any
knob in Linux for this, and changing it at runtime probably isn't worth
it either. So just make the setting prominent enough by promoting it to
a Kconfig option, which the user can customize for their particular setup.

Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
---
Changes since RFC:
- None.

 drivers/net/dsa/sja1105/Kconfig        | 9 +++++++++
 drivers/net/dsa/sja1105/sja1105_main.c | 2 +-
 2 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/drivers/net/dsa/sja1105/Kconfig b/drivers/net/dsa/sja1105/Kconfig
index 55424f39cb0d..4dc873e985e6 100644
--- a/drivers/net/dsa/sja1105/Kconfig
+++ b/drivers/net/dsa/sja1105/Kconfig
@@ -17,6 +17,15 @@ tristate "NXP SJA1105 Ethernet switch family support"
 	    - SJA1105R (Gen. 2, SGMII, No TT-Ethernet)
 	    - SJA1105S (Gen. 2, SGMII, TT-Ethernet)
 
+config NET_DSA_SJA1105_HOSTPRIO
+	int "Traffic class for management traffic"
+	range 0 7
+	default 7
+	depends on NET_DSA_SJA1105
+	help
+	  Configure the traffic class which will be used for management
+	  (link-local) traffic sent and received over switch ports.
+
 config NET_DSA_SJA1105_PTP
 	bool "Support for the PTP clock on the NXP SJA1105 Ethernet switch"
 	depends on NET_DSA_SJA1105
diff --git a/drivers/net/dsa/sja1105/sja1105_main.c b/drivers/net/dsa/sja1105/sja1105_main.c
index 4b393782cc84..0c03347b6429 100644
--- a/drivers/net/dsa/sja1105/sja1105_main.c
+++ b/drivers/net/dsa/sja1105/sja1105_main.c
@@ -388,7 +388,7 @@ static int sja1105_init_general_params(struct sja1105_private *priv)
 		/* Priority queue for link-local management frames
 		 * (both ingress to and egress from CPU - PTP, STP etc)
 		 */
-		.hostprio = 7,
+		.hostprio = CONFIG_NET_DSA_SJA1105_HOSTPRIO,
 		.mac_fltres1 = SJA1105_LINKLOCAL_FILTER_A,
 		.mac_flt1    = SJA1105_LINKLOCAL_FILTER_A_MASK,
 		.incl_srcpt1 = false,
-- 
2.17.1



* [PATCH v1 net-next 14/15] net: dsa: sja1105: Make the PTP command read-write
  2019-09-02 16:25 [PATCH v1 net-next 00/15] tc-taprio offload for SJA1105 DSA Vladimir Oltean
                   ` (12 preceding siblings ...)
  2019-09-02 16:25 ` [PATCH v1 net-next 13/15] net: dsa: sja1105: Make HOSTPRIO a kernel config Vladimir Oltean
@ 2019-09-02 16:25 ` Vladimir Oltean
  2019-09-02 16:25 ` [PATCH v1 net-next 15/15] net: dsa: sja1105: Implement state machine for TAS with PTP clock source Vladimir Oltean
                   ` (2 subsequent siblings)
  16 siblings, 0 replies; 33+ messages in thread
From: Vladimir Oltean @ 2019-09-02 16:25 UTC (permalink / raw)
  To: f.fainelli, vivien.didelot, andrew, davem, vinicius.gomes,
	vedang.patel, richardcochran
  Cc: weifeng.voon, jiri, m-karicheri2, Jose.Abreu, ilias.apalodimas,
	jhs, xiyou.wangcong, kurt.kanzenbach, netdev, Vladimir Oltean

The PTPSTRTSCH and PTPSTOPSCH bits are actually readable and indicate
whether the time-aware scheduler is running or not. We will use them to
monitor the scheduler in the next patch, so refactor the PTP command API
to support reading the command register as well as writing it.
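The idea behind the refactor, a single field description shared by the read and write paths, can be illustrated with a small userspace sketch. It assumes a single 32-bit register value instead of the SPI byte buffer and the kernel packing() helper that the driver actually uses; pack_bit(), ptp_cmd_packing(), struct ptp_cmd and enum pack_dir are hypothetical names. The bit positions follow sja1105et_ptp_cmd_packing().

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical direction flag, modelled on the PACK/UNPACK idea;
 * this is not the kernel's enum packing_op.
 */
enum pack_dir { DIR_PACK, DIR_UNPACK };

/* Hypothetical command, using the SJA1105 E/T bit layout */
struct ptp_cmd {
	uint64_t resptp;	/* bit 2 */
	uint64_t corrclk4ts;	/* bit 1 */
	uint64_t ptpclkadd;	/* bit 0 */
};

/* Move a single-bit field between @reg and @val in direction @dir */
static void pack_bit(uint32_t *reg, uint64_t *val, int bit, enum pack_dir dir)
{
	assert(bit >= 0 && bit < 32);

	if (dir == DIR_PACK) {
		*reg &= ~(UINT32_C(1) << bit);
		*reg |= (uint32_t)(*val & 1) << bit;
	} else {
		*val = (*reg >> bit) & 1;
	}
}

/* One field table serves both directions, so the read path cannot
 * drift out of sync with the write path.
 */
static void ptp_cmd_packing(uint32_t *reg, struct ptp_cmd *cmd,
			    enum pack_dir dir)
{
	uint64_t valid = 1;	/* only meaningful when packing */

	pack_bit(reg, &valid, 31, dir);
	pack_bit(reg, &cmd->resptp, 2, dir);
	pack_bit(reg, &cmd->corrclk4ts, 1, dir);
	pack_bit(reg, &cmd->ptpclkadd, 0, dir);
}
```

Packing a command with resptp = 1 and ptpclkadd = 1 yields 0x80000005 (valid bit, bit 2 and bit 0 set), and unpacking that value recovers the same fields.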

Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
---
Changes since RFC:
- None.

 drivers/net/dsa/sja1105/sja1105.h     | 13 +++---
 drivers/net/dsa/sja1105/sja1105_ptp.c | 64 ++++++++++++++++-----------
 drivers/net/dsa/sja1105/sja1105_ptp.h | 12 +++--
 drivers/net/dsa/sja1105/sja1105_spi.c | 12 ++---
 4 files changed, 58 insertions(+), 43 deletions(-)

diff --git a/drivers/net/dsa/sja1105/sja1105.h b/drivers/net/dsa/sja1105/sja1105.h
index d95f9ce3b4f9..44f7385c51b5 100644
--- a/drivers/net/dsa/sja1105/sja1105.h
+++ b/drivers/net/dsa/sja1105/sja1105.h
@@ -20,6 +20,11 @@
  */
 #define SJA1105_AGEING_TIME_MS(ms)	((ms) / 10)
 
+typedef enum {
+	SPI_READ = 0,
+	SPI_WRITE = 1,
+} sja1105_spi_rw_mode_t;
+
 #include "sja1105_ptp.h"
 #include "sja1105_tas.h"
 
@@ -71,7 +76,6 @@ struct sja1105_info {
 	const struct sja1105_dynamic_table_ops *dyn_ops;
 	const struct sja1105_table_ops *static_ops;
 	const struct sja1105_regs *regs;
-	int (*ptp_cmd)(const void *ctx, const void *data);
 	int (*reset_cmd)(const void *ctx, const void *data);
 	int (*setup_rgmii_delay)(const void *ctx, int port);
 	/* Prototypes from include/net/dsa.h */
@@ -79,6 +83,8 @@ struct sja1105_info {
 			   const unsigned char *addr, u16 vid);
 	int (*fdb_del_cmd)(struct dsa_switch *ds, int port,
 			   const unsigned char *addr, u16 vid);
+	void (*ptp_cmd_packing)(u8 *buf, struct sja1105_ptp_cmd *cmd,
+				enum packing_op op);
 	const char *name;
 };
 
@@ -108,11 +114,6 @@ struct sja1105_spi_message {
 	u64 address;
 };
 
-typedef enum {
-	SPI_READ = 0,
-	SPI_WRITE = 1,
-} sja1105_spi_rw_mode_t;
-
 /* From sja1105_main.c */
 int sja1105_static_config_reload(struct sja1105_private *priv);
 
diff --git a/drivers/net/dsa/sja1105/sja1105_ptp.c b/drivers/net/dsa/sja1105/sja1105_ptp.c
index f85f44bdab31..ed80278a3521 100644
--- a/drivers/net/dsa/sja1105/sja1105_ptp.c
+++ b/drivers/net/dsa/sja1105/sja1105_ptp.c
@@ -59,42 +59,50 @@ int sja1105_get_ts_info(struct dsa_switch *ds, int port,
 	return 0;
 }
 
-int sja1105et_ptp_cmd(const void *ctx, const void *data)
+void sja1105et_ptp_cmd_packing(u8 *buf, struct sja1105_ptp_cmd *cmd,
+			       enum packing_op op)
 {
-	const struct sja1105_ptp_cmd *cmd = data;
-	const struct sja1105_private *priv = ctx;
-	const struct sja1105_regs *regs = priv->info->regs;
 	const int size = SJA1105_SIZE_PTP_CMD;
-	u8 buf[SJA1105_SIZE_PTP_CMD] = {0};
 	/* No need to keep this as part of the structure */
 	u64 valid = 1;
 
-	sja1105_pack(buf, &valid,           31, 31, size);
-	sja1105_pack(buf, &cmd->resptp,      2,  2, size);
-	sja1105_pack(buf, &cmd->corrclk4ts,  1,  1, size);
-	sja1105_pack(buf, &cmd->ptpclkadd,   0,  0, size);
-
-	return sja1105_spi_send_packed_buf(priv, SPI_WRITE, regs->ptp_control,
-					   buf, SJA1105_SIZE_PTP_CMD);
+	sja1105_packing(buf, &valid,           31, 31, size, op);
+	sja1105_packing(buf, &cmd->resptp,      2,  2, size, op);
+	sja1105_packing(buf, &cmd->corrclk4ts,  1,  1, size, op);
+	sja1105_packing(buf, &cmd->ptpclkadd,   0,  0, size, op);
 }
 
-int sja1105pqrs_ptp_cmd(const void *ctx, const void *data)
+void sja1105pqrs_ptp_cmd_packing(u8 *buf, struct sja1105_ptp_cmd *cmd,
+				 enum packing_op op)
 {
-	const struct sja1105_ptp_cmd *cmd = data;
-	const struct sja1105_private *priv = ctx;
-	const struct sja1105_regs *regs = priv->info->regs;
 	const int size = SJA1105_SIZE_PTP_CMD;
-	u8 buf[SJA1105_SIZE_PTP_CMD] = {0};
 	/* No need to keep this as part of the structure */
 	u64 valid = 1;
 
-	sja1105_pack(buf, &valid,           31, 31, size);
-	sja1105_pack(buf, &cmd->resptp,      3,  3, size);
-	sja1105_pack(buf, &cmd->corrclk4ts,  2,  2, size);
-	sja1105_pack(buf, &cmd->ptpclkadd,   0,  0, size);
+	sja1105_packing(buf, &valid,           31, 31, size, op);
+	sja1105_packing(buf, &cmd->resptp,      3,  3, size, op);
+	sja1105_packing(buf, &cmd->corrclk4ts,  2,  2, size, op);
+	sja1105_packing(buf, &cmd->ptpclkadd,   0,  0, size, op);
+}
 
-	return sja1105_spi_send_packed_buf(priv, SPI_WRITE, regs->ptp_control,
-					   buf, SJA1105_SIZE_PTP_CMD);
+static int sja1105_ptp_commit(struct sja1105_private *priv,
+			      struct sja1105_ptp_cmd *cmd,
+			      sja1105_spi_rw_mode_t rw)
+{
+	const struct sja1105_regs *regs = priv->info->regs;
+	u8 buf[SJA1105_SIZE_PTP_CMD] = {0};
+	int rc;
+
+	if (rw == SPI_WRITE)
+		priv->info->ptp_cmd_packing(buf, cmd, PACK);
+
+	rc = sja1105_spi_send_packed_buf(priv, rw, regs->ptp_control,
+					 buf, SJA1105_SIZE_PTP_CMD);
+
+	if (rw == SPI_READ)
+		priv->info->ptp_cmd_packing(buf, cmd, UNPACK);
+
+	return rc;
 }
 
 /* The switch returns partial timestamps (24 bits for SJA1105 E/T, which wrap
@@ -212,7 +220,7 @@ int sja1105_ptp_reset(struct sja1105_private *priv)
 	cmd.resptp = 1;
 
 	dev_dbg(priv->ds->dev, "Resetting PTP clock\n");
-	rc = priv->info->ptp_cmd(priv, &cmd);
+	rc = sja1105_ptp_commit(priv, &cmd, SPI_WRITE);
 
 	mutex_unlock(&ptp_data->lock);
 
@@ -250,12 +258,14 @@ static int sja1105_ptp_gettimex(struct ptp_clock_info *ptp,
 static int sja1105_ptp_mode_set(struct sja1105_private *priv,
 				enum sja1105_ptp_clk_mode mode)
 {
-	if (priv->ptp_data.cmd.ptpclkadd == mode)
+	struct sja1105_ptp_data *ptp_data = &priv->ptp_data;
+
+	if (ptp_data->cmd.ptpclkadd == mode)
 		return 0;
 
-	priv->ptp_data.cmd.ptpclkadd = mode;
+	ptp_data->cmd.ptpclkadd = mode;
 
-	return priv->info->ptp_cmd(priv, &priv->ptp_data.cmd);
+	return sja1105_ptp_commit(priv, &ptp_data->cmd, SPI_WRITE);
 }
 
 /* Caller must hold priv->ptp_data.lock */
diff --git a/drivers/net/dsa/sja1105/sja1105_ptp.h b/drivers/net/dsa/sja1105/sja1105_ptp.h
index dfe856200394..c24c40115650 100644
--- a/drivers/net/dsa/sja1105/sja1105_ptp.h
+++ b/drivers/net/dsa/sja1105/sja1105_ptp.h
@@ -48,9 +48,11 @@ void sja1105_ptp_clock_unregister(struct sja1105_private *priv);
 
 int sja1105_ptpegr_ts_poll(struct sja1105_private *priv, int port, u64 *ts);
 
-int sja1105et_ptp_cmd(const void *ctx, const void *data);
+void sja1105et_ptp_cmd_packing(u8 *buf, struct sja1105_ptp_cmd *cmd,
+			       enum packing_op op);
 
-int sja1105pqrs_ptp_cmd(const void *ctx, const void *data);
+void sja1105pqrs_ptp_cmd_packing(u8 *buf, struct sja1105_ptp_cmd *cmd,
+				 enum packing_op op);
 
 int sja1105_get_ts_info(struct dsa_switch *ds, int port,
 			struct ethtool_ts_info *ts);
@@ -73,6 +75,8 @@ int __sja1105_ptp_adjtime(struct sja1105_private *priv, s64 delta);
 
 #else
 
+struct sja1105_ptp_cmd;
+
 /* Structures cannot be empty in C. Bah!
  * Keep the mutex as the only element, which is a bit more difficult to
  * refactor out of sja1105_main.c anyway.
@@ -131,9 +135,9 @@ static inline int __sja1105_ptp_adjtime(struct sja1105_private *priv, s64 delta)
 	return 0;
 }
 
-#define sja1105et_ptp_cmd NULL
+#define sja1105et_ptp_cmd_packing NULL
 
-#define sja1105pqrs_ptp_cmd NULL
+#define sja1105pqrs_ptp_cmd_packing NULL
 
 #define sja1105_get_ts_info NULL
 
diff --git a/drivers/net/dsa/sja1105/sja1105_spi.c b/drivers/net/dsa/sja1105/sja1105_spi.c
index eae9c9baa189..794cc5077565 100644
--- a/drivers/net/dsa/sja1105/sja1105_spi.c
+++ b/drivers/net/dsa/sja1105/sja1105_spi.c
@@ -571,7 +571,7 @@ struct sja1105_info sja1105e_info = {
 	.reset_cmd		= sja1105et_reset_cmd,
 	.fdb_add_cmd		= sja1105et_fdb_add,
 	.fdb_del_cmd		= sja1105et_fdb_del,
-	.ptp_cmd		= sja1105et_ptp_cmd,
+	.ptp_cmd_packing	= sja1105et_ptp_cmd_packing,
 	.regs			= &sja1105et_regs,
 	.name			= "SJA1105E",
 };
@@ -585,7 +585,7 @@ struct sja1105_info sja1105t_info = {
 	.reset_cmd		= sja1105et_reset_cmd,
 	.fdb_add_cmd		= sja1105et_fdb_add,
 	.fdb_del_cmd		= sja1105et_fdb_del,
-	.ptp_cmd		= sja1105et_ptp_cmd,
+	.ptp_cmd_packing	= sja1105et_ptp_cmd_packing,
 	.regs			= &sja1105et_regs,
 	.name			= "SJA1105T",
 };
@@ -600,7 +600,7 @@ struct sja1105_info sja1105p_info = {
 	.reset_cmd		= sja1105pqrs_reset_cmd,
 	.fdb_add_cmd		= sja1105pqrs_fdb_add,
 	.fdb_del_cmd		= sja1105pqrs_fdb_del,
-	.ptp_cmd		= sja1105pqrs_ptp_cmd,
+	.ptp_cmd_packing	= sja1105pqrs_ptp_cmd_packing,
 	.regs			= &sja1105pqrs_regs,
 	.name			= "SJA1105P",
 };
@@ -615,7 +615,7 @@ struct sja1105_info sja1105q_info = {
 	.reset_cmd		= sja1105pqrs_reset_cmd,
 	.fdb_add_cmd		= sja1105pqrs_fdb_add,
 	.fdb_del_cmd		= sja1105pqrs_fdb_del,
-	.ptp_cmd		= sja1105pqrs_ptp_cmd,
+	.ptp_cmd_packing	= sja1105pqrs_ptp_cmd_packing,
 	.regs			= &sja1105pqrs_regs,
 	.name			= "SJA1105Q",
 };
@@ -630,7 +630,7 @@ struct sja1105_info sja1105r_info = {
 	.reset_cmd		= sja1105pqrs_reset_cmd,
 	.fdb_add_cmd		= sja1105pqrs_fdb_add,
 	.fdb_del_cmd		= sja1105pqrs_fdb_del,
-	.ptp_cmd		= sja1105pqrs_ptp_cmd,
+	.ptp_cmd_packing	= sja1105pqrs_ptp_cmd_packing,
 	.regs			= &sja1105pqrs_regs,
 	.name			= "SJA1105R",
 };
@@ -646,6 +646,6 @@ struct sja1105_info sja1105s_info = {
 	.reset_cmd		= sja1105pqrs_reset_cmd,
 	.fdb_add_cmd		= sja1105pqrs_fdb_add,
 	.fdb_del_cmd		= sja1105pqrs_fdb_del,
-	.ptp_cmd		= sja1105pqrs_ptp_cmd,
+	.ptp_cmd_packing	= sja1105pqrs_ptp_cmd_packing,
 	.name			= "SJA1105S",
 };
-- 
2.17.1



* [PATCH v1 net-next 15/15] net: dsa: sja1105: Implement state machine for TAS with PTP clock source
  2019-09-02 16:25 [PATCH v1 net-next 00/15] tc-taprio offload for SJA1105 DSA Vladimir Oltean
                   ` (13 preceding siblings ...)
  2019-09-02 16:25 ` [PATCH v1 net-next 14/15] net: dsa: sja1105: Make the PTP command read-write Vladimir Oltean
@ 2019-09-02 16:25 ` Vladimir Oltean
  2019-09-11 19:43   ` Vinicius Costa Gomes
  2019-09-06 12:54 ` [PATCH v1 net-next 00/15] tc-taprio offload for SJA1105 DSA David Miller
  2019-09-07 13:55 ` David Miller
  16 siblings, 1 reply; 33+ messages in thread
From: Vladimir Oltean @ 2019-09-02 16:25 UTC (permalink / raw)
  To: f.fainelli, vivien.didelot, andrew, davem, vinicius.gomes,
	vedang.patel, richardcochran
  Cc: weifeng.voon, jiri, m-karicheri2, Jose.Abreu, ilias.apalodimas,
	jhs, xiyou.wangcong, kurt.kanzenbach, netdev, Vladimir Oltean

Tested using the following bash script and the tc from iproute2-next:

	#!/bin/bash

	set -e -u -o pipefail

	NSEC_PER_SEC="1000000000"

	gatemask() {
		local tc_list="$1"
		local mask=0

		for tc in ${tc_list}; do
			mask=$((${mask} | (1 << ${tc})))
		done

		printf "%02x" ${mask}
	}

	if ! systemctl is-active --quiet ptp4l; then
		echo "Please start the ptp4l service"
		exit 1
	fi

	now=$(phc_ctl /dev/ptp1 get | gawk '/clock time is/ { print $5; }')
	# Phase-align the base time to the start of the next second.
	sec=$(echo "${now}" | gawk -F. '{ print $1; }')
	base_time="$(((${sec} + 1) * ${NSEC_PER_SEC}))"

	echo 'file drivers/net/dsa/sja1105/sja1105_tas.c +plm' | \
		sudo tee /sys/kernel/debug/dynamic_debug/control

	tc qdisc add dev swp5 parent root handle 100 taprio \
		num_tc 8 \
		map 0 1 2 3 4 5 6 7 \
		queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 \
		base-time ${base_time} \
		sched-entry S $(gatemask 7) 100000 \
		sched-entry S $(gatemask "0 1 2 3 4 5 6") 400000 \
		clockid CLOCK_TAI flags 2
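As a rough cross-check of the schedule in the script, the sketch below recomputes its gate masks (0x80 for traffic class 7, 0x7f for classes 0-6) and applies the interval bounds that sja1105_setup_tc_taprio() enforces: each interval must be at least one 200 ns delta and below SJA1105_TAS_MAX_DELTA deltas. gatemask(), interval_ok() and next_second() are hypothetical helpers written for illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NS_PER_DELTA	200		/* one TAS delta is 200 ns */
#define TAS_MAX_DELTA	(1 << 19)	/* mirrors SJA1105_TAS_MAX_DELTA */

/* Equivalent of the script's gatemask(): one bit per traffic class */
static uint8_t gatemask(const int *tcs, int n)
{
	uint8_t mask = 0;

	for (int i = 0; i < n; i++) {
		assert(tcs[i] >= 0 && tcs[i] < 8);
		mask |= (uint8_t)(1u << tcs[i]);
	}
	return mask;
}

/* Same bounds as the per-entry check in sja1105_setup_tc_taprio():
 * at least one delta, and strictly below the maximum delta count.
 */
static bool interval_ok(int64_t interval_ns)
{
	int64_t deltas = interval_ns / NS_PER_DELTA;

	return deltas > 0 && deltas < TAS_MAX_DELTA;
}

/* Phase-align to the start of the next second, as the script does */
static int64_t next_second(int64_t now_ns)
{
	const int64_t ns_per_sec = 1000000000;

	return (now_ns / ns_per_sec + 1) * ns_per_sec;
}
```

With these values, the 100000 ns and 400000 ns entries correspond to 500 and 2000 deltas, well within the hardware limit.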

The "state machine" is a workqueue invoked after each command that
manipulates the PTP clock (reset, adjust time, set time, adjust
frequency), which checks the state of the time-aware scheduler.
So the scheduler is not monitored periodically, only in reaction to a
PTP command, typically triggered by a userspace daemon (linuxptp).
Otherwise, there is no reason for things to go wrong.

Now that the timecounter/cyclecounter has been replaced with hardware
operations on the PTP clock, the TAS Kconfig option depends on PTP and
the standalone clocksource operating mode has been removed.
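The sja1105_tas.c changes below add future_base_time(), which relies on the identity ceil(a / b) = floor((a + b - 1) / b) for a >= 0 and b > 0. The following userspace copy is a sketch for experimenting with that arithmetic; it replaces div_s64() with plain C division, which gives the same result here because both operands are non-negative.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace copy of future_base_time(): the first
 * base_time + N * cycle_time that is >= now, where
 * N = ceil((now - base_time) / cycle_time), computed as
 * floor((a + b - 1) / b) with integer division only.
 */
static int64_t future_base_time(int64_t base_time, int64_t cycle_time,
				int64_t now)
{
	int64_t a, b, n;

	assert(cycle_time > 0);

	if (base_time >= now)
		return base_time;

	a = now - base_time;	/* a > 0 on this path */
	b = cycle_time;
	n = (a + b - 1) / b;

	return base_time + n * cycle_time;
}
```

In sja1105_tas_set_runtime_params(), this helper rolls the earliest base time forward to the first occurrence at or after the latest one, and a loop then steps it back one cycle at a time until it no longer exceeds the latest base time, so that the two can be compared.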

Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
---
Changes since RFC:
- Used the "delta" terminology instead of "TAS cycle" to be more
  consistent and avoid confusion with the cyclic schedule (of which the
  "delta" is only the most granular unit, there is no other connection).

 drivers/net/dsa/sja1105/Kconfig       |   2 +-
 drivers/net/dsa/sja1105/sja1105.h     |   2 +
 drivers/net/dsa/sja1105/sja1105_ptp.c |  26 +-
 drivers/net/dsa/sja1105/sja1105_ptp.h |  13 +
 drivers/net/dsa/sja1105/sja1105_spi.c |   4 +
 drivers/net/dsa/sja1105/sja1105_tas.c | 426 +++++++++++++++++++++++++-
 drivers/net/dsa/sja1105/sja1105_tas.h |  27 ++
 7 files changed, 486 insertions(+), 14 deletions(-)

diff --git a/drivers/net/dsa/sja1105/Kconfig b/drivers/net/dsa/sja1105/Kconfig
index 4dc873e985e6..9316a23b7c30 100644
--- a/drivers/net/dsa/sja1105/Kconfig
+++ b/drivers/net/dsa/sja1105/Kconfig
@@ -35,7 +35,7 @@ config NET_DSA_SJA1105_PTP
 
 config NET_DSA_SJA1105_TAS
 	bool "Support for the Time-Aware Scheduler on NXP SJA1105"
-	depends on NET_DSA_SJA1105
+	depends on NET_DSA_SJA1105_PTP
 	help
 	  This enables support for the TTEthernet-based egress scheduling
 	  engine in the SJA1105 DSA driver, which is controlled using a
diff --git a/drivers/net/dsa/sja1105/sja1105.h b/drivers/net/dsa/sja1105/sja1105.h
index 44f7385c51b5..e8f95b6fadfa 100644
--- a/drivers/net/dsa/sja1105/sja1105.h
+++ b/drivers/net/dsa/sja1105/sja1105.h
@@ -40,6 +40,8 @@ struct sja1105_regs {
 	u64 ptp_control;
 	u64 ptpclk;
 	u64 ptpclkrate;
+	u64 ptpclkcorp;
+	u64 ptpschtm;
 	u64 ptpegr_ts[SJA1105_NUM_PORTS];
 	u64 pad_mii_tx[SJA1105_NUM_PORTS];
 	u64 pad_mii_id[SJA1105_NUM_PORTS];
diff --git a/drivers/net/dsa/sja1105/sja1105_ptp.c b/drivers/net/dsa/sja1105/sja1105_ptp.c
index ed80278a3521..b037834ff820 100644
--- a/drivers/net/dsa/sja1105/sja1105_ptp.c
+++ b/drivers/net/dsa/sja1105/sja1105_ptp.c
@@ -67,6 +67,8 @@ void sja1105et_ptp_cmd_packing(u8 *buf, struct sja1105_ptp_cmd *cmd,
 	u64 valid = 1;
 
 	sja1105_packing(buf, &valid,           31, 31, size, op);
+	sja1105_packing(buf, &cmd->ptpstrtsch, 30, 30, size, op);
+	sja1105_packing(buf, &cmd->ptpstopsch, 29, 29, size, op);
 	sja1105_packing(buf, &cmd->resptp,      2,  2, size, op);
 	sja1105_packing(buf, &cmd->corrclk4ts,  1,  1, size, op);
 	sja1105_packing(buf, &cmd->ptpclkadd,   0,  0, size, op);
@@ -80,14 +82,16 @@ void sja1105pqrs_ptp_cmd_packing(u8 *buf, struct sja1105_ptp_cmd *cmd,
 	u64 valid = 1;
 
 	sja1105_packing(buf, &valid,           31, 31, size, op);
+	sja1105_packing(buf, &cmd->ptpstrtsch, 30, 30, size, op);
+	sja1105_packing(buf, &cmd->ptpstopsch, 29, 29, size, op);
 	sja1105_packing(buf, &cmd->resptp,      3,  3, size, op);
 	sja1105_packing(buf, &cmd->corrclk4ts,  2,  2, size, op);
 	sja1105_packing(buf, &cmd->ptpclkadd,   0,  0, size, op);
 }
 
-static int sja1105_ptp_commit(struct sja1105_private *priv,
-			      struct sja1105_ptp_cmd *cmd,
-			      sja1105_spi_rw_mode_t rw)
+int sja1105_ptp_commit(struct sja1105_private *priv,
+		       struct sja1105_ptp_cmd *cmd,
+		       sja1105_spi_rw_mode_t rw)
 {
 	const struct sja1105_regs *regs = priv->info->regs;
 	u8 buf[SJA1105_SIZE_PTP_CMD] = {0};
@@ -222,6 +226,8 @@ int sja1105_ptp_reset(struct sja1105_private *priv)
 	dev_dbg(priv->ds->dev, "Resetting PTP clock\n");
 	rc = sja1105_ptp_commit(priv, &cmd, SPI_WRITE);
 
+	sja1105_tas_clockstep(priv);
+
 	mutex_unlock(&ptp_data->lock);
 
 	return rc;
@@ -291,7 +297,11 @@ int __sja1105_ptp_settime(struct sja1105_private *priv, u64 ns,
 		return rc;
 	}
 
-	return sja1105_ptpclkval_write(priv, ticks, ptp_sts);
+	rc = sja1105_ptpclkval_write(priv, ticks, ptp_sts);
+
+	sja1105_tas_clockstep(priv);
+
+	return rc;
 }
 
 static int sja1105_ptp_settime(struct ptp_clock_info *ptp,
@@ -331,6 +341,8 @@ static int sja1105_ptp_adjfine(struct ptp_clock_info *ptp, long scaled_ppm)
 	rc = sja1105_spi_send_int(priv, SPI_WRITE, regs->ptpclkrate,
 				  &clkrate, 4, NULL);
 
+	sja1105_tas_adjfreq(priv);
+
 	mutex_unlock(&priv->ptp_data.lock);
 
 	return rc;
@@ -366,7 +378,11 @@ int __sja1105_ptp_adjtime(struct sja1105_private *priv, s64 delta)
 		return rc;
 	}
 
-	return sja1105_ptpclkval_write(priv, ticks, NULL);
+	rc = sja1105_ptpclkval_write(priv, ticks, NULL);
+
+	sja1105_tas_clockstep(priv);
+
+	return rc;
 }
 
 static int sja1105_ptp_adjtime(struct ptp_clock_info *ptp, s64 delta)
diff --git a/drivers/net/dsa/sja1105/sja1105_ptp.h b/drivers/net/dsa/sja1105/sja1105_ptp.h
index c24c40115650..da68e5881e5f 100644
--- a/drivers/net/dsa/sja1105/sja1105_ptp.h
+++ b/drivers/net/dsa/sja1105/sja1105_ptp.h
@@ -29,6 +29,8 @@ enum sja1105_ptp_clk_mode {
 };
 
 struct sja1105_ptp_cmd {
+	u64 ptpstrtsch;		/* start schedule */
+	u64 ptpstopsch;		/* stop schedule */
 	u64 resptp;		/* reset */
 	u64 corrclk4ts;		/* use the corrected clock for timestamps */
 	u64 ptpclkadd;		/* enum sja1105_ptp_clk_mode */
@@ -73,6 +75,10 @@ int __sja1105_ptp_settime(struct sja1105_private *priv, u64 ns,
 
 int __sja1105_ptp_adjtime(struct sja1105_private *priv, s64 delta);
 
+int sja1105_ptp_commit(struct sja1105_private *priv,
+		       struct sja1105_ptp_cmd *cmd,
+		       sja1105_spi_rw_mode_t rw);
+
 #else
 
 struct sja1105_ptp_cmd;
@@ -135,6 +141,13 @@ static inline int __sja1105_ptp_adjtime(struct sja1105_private *priv, s64 delta)
 	return 0;
 }
 
+static inline int sja1105_ptp_commit(struct sja1105_private *priv,
+				     struct sja1105_ptp_cmd *cmd,
+				     sja1105_spi_rw_mode_t rw)
+{
+	return 0;
+}
+
 #define sja1105et_ptp_cmd_packing NULL
 
 #define sja1105pqrs_ptp_cmd_packing NULL
diff --git a/drivers/net/dsa/sja1105/sja1105_spi.c b/drivers/net/dsa/sja1105/sja1105_spi.c
index 794cc5077565..f6df050c15ec 100644
--- a/drivers/net/dsa/sja1105/sja1105_spi.c
+++ b/drivers/net/dsa/sja1105/sja1105_spi.c
@@ -526,9 +526,11 @@ static struct sja1105_regs sja1105et_regs = {
 	.rmii_ref_clk = {0x100015, 0x10001C, 0x100023, 0x10002A, 0x100031},
 	.rmii_ext_tx_clk = {0x100018, 0x10001F, 0x100026, 0x10002D, 0x100034},
 	.ptpegr_ts = {0xC0, 0xC2, 0xC4, 0xC6, 0xC8},
+	.ptpschtm = 0x12, /* Spans 0x12 to 0x13 */
 	.ptp_control = 0x17,
 	.ptpclk = 0x18, /* Spans 0x18 to 0x19 */
 	.ptpclkrate = 0x1A,
+	.ptpclkcorp = 0x1D,
 };
 
 static struct sja1105_regs sja1105pqrs_regs = {
@@ -556,9 +558,11 @@ static struct sja1105_regs sja1105pqrs_regs = {
 	.rmii_ext_tx_clk = {0x100017, 0x10001D, 0x100023, 0x100029, 0x10002F},
 	.qlevel = {0x604, 0x614, 0x624, 0x634, 0x644},
 	.ptpegr_ts = {0xC0, 0xC4, 0xC8, 0xCC, 0xD0},
+	.ptpschtm = 0x13, /* Spans 0x13 to 0x14 */
 	.ptp_control = 0x18,
 	.ptpclk = 0x19,
 	.ptpclkrate = 0x1B,
+	.ptpclkcorp = 0x1E,
 };
 
 struct sja1105_info sja1105e_info = {
diff --git a/drivers/net/dsa/sja1105/sja1105_tas.c b/drivers/net/dsa/sja1105/sja1105_tas.c
index 769e1d8e5e8f..ed0c3f00c09d 100644
--- a/drivers/net/dsa/sja1105/sja1105_tas.c
+++ b/drivers/net/dsa/sja1105/sja1105_tas.c
@@ -10,6 +10,11 @@
 #define SJA1105_GATE_MASK		GENMASK_ULL(SJA1105_NUM_TC - 1, 0)
 #define SJA1105_TAS_MAX_DELTA		BIT(19)
 
+#define work_to_sja1105_tas(d) \
+	container_of((d), struct sja1105_tas_data, tas_work)
+#define tas_to_sja1105(d) \
+	container_of((d), struct sja1105_private, tas_data)
+
 /* This is not a preprocessor macro because the "ns" argument may or may not be
  * s64 at caller side. This ensures it is properly type-cast before div_s64.
  */
@@ -18,6 +23,102 @@ static s64 ns_to_sja1105_delta(s64 ns)
 	return div_s64(ns, 200);
 }
 
+static s64 sja1105_delta_to_ns(s64 delta)
+{
+	return delta * 200;
+}
+
+/* Calculate the first base_time in the future that satisfies this
+ * relationship:
+ *
+ * future_base_time = base_time + N x cycle_time >= now, or
+ *
+ *      now - base_time
+ * N >= ---------------
+ *         cycle_time
+ *
+ * Because N is an integer, the ceiling value of the above "a / b" ratio
+ * is in fact precisely the floor value of "(a + b - 1) / b" (for a >= 0
+ * and b > 0), which can be computed with integer division alone.
+ */
+static s64 future_base_time(s64 base_time, s64 cycle_time, s64 now)
+{
+	s64 a, b, n;
+
+	if (base_time >= now)
+		return base_time;
+
+	a = now - base_time;
+	b = cycle_time;
+	n = div_s64(a + b - 1, b);
+
+	return base_time + n * cycle_time;
+}
+
+static int sja1105_tas_set_runtime_params(struct sja1105_private *priv)
+{
+	struct sja1105_tas_data *tas_data = &priv->tas_data;
+	s64 earliest_base_time = S64_MAX;
+	s64 latest_base_time = 0;
+	s64 its_cycle_time = 0;
+	s64 max_cycle_time = 0;
+	int port;
+
+	tas_data->enabled = false;
+
+	for (port = 0; port < SJA1105_NUM_PORTS; port++) {
+		const struct tc_taprio_qopt_offload *tas_config;
+
+		tas_config = tas_data->config[port];
+		if (!tas_config)
+			continue;
+
+		tas_data->enabled = true;
+
+		if (max_cycle_time < tas_config->cycle_time)
+			max_cycle_time = tas_config->cycle_time;
+		if (latest_base_time < tas_config->base_time)
+			latest_base_time = tas_config->base_time;
+		if (earliest_base_time > tas_config->base_time) {
+			earliest_base_time = tas_config->base_time;
+			its_cycle_time = tas_config->cycle_time;
+		}
+	}
+
+	if (!tas_data->enabled)
+		return 0;
+
+	/* Roll the earliest base time over until it is in a comparable
+	 * time base with the latest, then compare their deltas.
+	 * We want to enforce that all ports' base times are within
+	 * SJA1105_TAS_MAX_DELTA 200ns cycles of one another.
+	 */
+	earliest_base_time = future_base_time(earliest_base_time,
+					      its_cycle_time,
+					      latest_base_time);
+	while (earliest_base_time > latest_base_time)
+		earliest_base_time -= its_cycle_time;
+	if (latest_base_time - earliest_base_time >
+	    sja1105_delta_to_ns(SJA1105_TAS_MAX_DELTA)) {
+		dev_err(priv->ds->dev,
+			"Base times too far apart: min %llu max %llu\n",
+			earliest_base_time, latest_base_time);
+		return -ERANGE;
+	}
+
+	tas_data->earliest_base_time = earliest_base_time;
+	tas_data->max_cycle_time = max_cycle_time;
+
+	dev_dbg(priv->ds->dev, "earliest base time %lld ns\n",
+		tas_data->earliest_base_time);
+	dev_dbg(priv->ds->dev, "latest base time %lld ns\n",
+		latest_base_time);
+	dev_dbg(priv->ds->dev, "longest cycle time %lld ns\n",
+		tas_data->max_cycle_time);
+
+	return 0;
+}
+
 /* Lo and behold: the egress scheduler from hell.
  *
  * At the hardware level, the Time-Aware Shaper holds a global linear arrray of
@@ -100,7 +201,11 @@ static int sja1105_init_scheduling(struct sja1105_private *priv)
 	int num_cycles = 0;
 	int cycle = 0;
 	int i, k = 0;
-	int port;
+	int port, rc;
+
+	rc = sja1105_tas_set_runtime_params(priv);
+	if (rc < 0)
+		return rc;
 
 	/* Discard previous Schedule Table */
 	table = &priv->static_config.tables[BLK_IDX_SCHEDULE];
@@ -181,11 +286,13 @@ static int sja1105_init_scheduling(struct sja1105_private *priv)
 	schedule_entry_points = table->entries;
 
 	/* Finally start populating the static config tables */
-	schedule_entry_points_params->clksrc = SJA1105_TAS_CLKSRC_STANDALONE;
+	schedule_entry_points_params->clksrc = SJA1105_TAS_CLKSRC_PTP;
 	schedule_entry_points_params->actsubsch = num_cycles - 1;
 
 	for (port = 0; port < SJA1105_NUM_PORTS; port++) {
 		const struct tc_taprio_qopt_offload *tas_config;
+		/* Relative base time */
+		s64 rbt;
 
 		tas_config = tas_data->config[port];
 		if (!tas_config)
@@ -193,13 +300,20 @@ static int sja1105_init_scheduling(struct sja1105_private *priv)
 
 		schedule_start_idx = k;
 		schedule_end_idx = k + tas_config->num_entries - 1;
-		/* TODO this is only a relative base time for the subschedule
-		 * (relative to PTPSCHTM). But as we're using standalone and
-		 * not PTP clock as time reference, leave it like this for now.
-		 * Later we'll have to enforce that all ports' base times are
-		 * within SJA1105_TAS_MAX_DELTA 200ns cycles of one another.
+		/* This is only a relative base time for the subschedule
+		 * (relative to PTPSCHTM - aka the operational base time).
 		 */
-		entry_point_delta = ns_to_sja1105_delta(tas_config->base_time);
+		rbt = future_base_time(tas_config->base_time,
+				       tas_config->cycle_time,
+				       tas_data->earliest_base_time);
+		rbt -= tas_data->earliest_base_time;
+		/* UM10944.pdf 4.2.2. Schedule Entry Points table says that
+		 * delta cannot be zero, which is shitty. Advance all relative
+		 * base times by 1 TAS delta, so that even the earliest base
+		 * time becomes 1 in relative terms. Then start the operational
+		 * base time (PTPSCHTM) one TAS delta earlier than planned.
+		 */
+		entry_point_delta = ns_to_sja1105_delta(rbt) + 1;
 
 		schedule_entry_points[cycle].subschindx = cycle;
 		schedule_entry_points[cycle].delta = entry_point_delta;
@@ -405,8 +519,302 @@ int sja1105_setup_tc_taprio(struct dsa_switch *ds, int port,
 	return sja1105_static_config_reload(priv);
 }
 
+static int sja1105_tas_check_running(struct sja1105_private *priv)
+{
+	struct sja1105_tas_data *tas_data = &priv->tas_data;
+	struct sja1105_ptp_cmd cmd = {0};
+	int rc;
+
+	rc = sja1105_ptp_commit(priv, &cmd, SPI_READ);
+	if (rc < 0)
+		return rc;
+
+	if (cmd.ptpstrtsch == 1)
+		/* Schedule successfully started */
+		tas_data->state = SJA1105_TAS_STATE_RUNNING;
+	else if (cmd.ptpstopsch == 1)
+		/* Schedule is stopped */
+		tas_data->state = SJA1105_TAS_STATE_DISABLED;
+	else
+		/* Schedule is probably not configured with PTP clock source */
+		rc = -EINVAL;
+
+	return rc;
+}
+
+/* Write to PTPCLKCORP */
+static int sja1105_tas_adjust_drift(struct sja1105_private *priv,
+				    u64 correction)
+{
+	const struct sja1105_regs *regs = priv->info->regs;
+	u64 ptpclkcorp = ns_to_sja1105_ticks(correction);
+
+	return sja1105_spi_send_int(priv, SPI_WRITE, regs->ptpclkcorp,
+				    &ptpclkcorp, 4, NULL);
+}
+
+/* Write to PTPSCHTM */
+static int sja1105_tas_set_base_time(struct sja1105_private *priv,
+				     u64 base_time)
+{
+	const struct sja1105_regs *regs = priv->info->regs;
+	u64 ptpschtm = ns_to_sja1105_ticks(base_time);
+
+	return sja1105_spi_send_int(priv, SPI_WRITE, regs->ptpschtm,
+				    &ptpschtm, 8, NULL);
+}
+
+static int sja1105_tas_start(struct sja1105_private *priv)
+{
+	struct sja1105_tas_data *tas_data = &priv->tas_data;
+	struct sja1105_ptp_cmd *cmd = &priv->ptp_data.cmd;
+	int rc;
+
+	dev_dbg(priv->ds->dev, "Starting the TAS\n");
+
+	if (tas_data->state == SJA1105_TAS_STATE_ENABLED_NOT_RUNNING ||
+	    tas_data->state == SJA1105_TAS_STATE_RUNNING) {
+		dev_err(priv->ds->dev, "TAS already started\n");
+		return -EINVAL;
+	}
+
+	cmd->ptpstrtsch = 1;
+	cmd->ptpstopsch = 0;
+
+	rc = sja1105_ptp_commit(priv, cmd, SPI_WRITE);
+	if (rc < 0)
+		return rc;
+
+	tas_data->state = SJA1105_TAS_STATE_ENABLED_NOT_RUNNING;
+
+	return 0;
+}
+
+static int sja1105_tas_stop(struct sja1105_private *priv)
+{
+	struct sja1105_tas_data *tas_data = &priv->tas_data;
+	struct sja1105_ptp_cmd *cmd = &priv->ptp_data.cmd;
+	int rc;
+
+	dev_dbg(priv->ds->dev, "Stopping the TAS\n");
+
+	if (tas_data->state == SJA1105_TAS_STATE_DISABLED) {
+		dev_err(priv->ds->dev, "TAS already disabled\n");
+		return -EINVAL;
+	}
+
+	cmd->ptpstopsch = 1;
+	cmd->ptpstrtsch = 0;
+
+	rc = sja1105_ptp_commit(priv, cmd, SPI_WRITE);
+	if (rc < 0)
+		return rc;
+
+	tas_data->state = SJA1105_TAS_STATE_DISABLED;
+
+	return 0;
+}
+
+/* The schedule engine and the PTP clock are driven by the same oscillator, and
+ * they run in parallel. But whilst the PTP clock can keep an absolute
+ * time-of-day, the schedule engine is only running in 'ticks' (25 ticks make
+ * up a delta, which is 200ns), and wrapping around at the end of each cycle.
+ * The schedule engine is started when the PTP clock reaches the PTPSCHTM time
+ * (in PTP domain).
+ * Because the PTP clock can be rate-corrected (accelerated or slowed down) by
+ * a software servo, and the schedule engine clock runs in parallel to the PTP
+ * clock, there is logic internal to the switch that periodically keeps the
+ * schedule engine from drifting away. The frequency with which this internal
+ * syntonization happens is the PTP clock correction period (PTPCLKCORP). It is
+ * a value also in the PTP clock domain, and is also rate-corrected.
+ * To be precise, during a correction period, there is logic to determine by
+ * how many scheduler clock ticks the PTP clock has drifted. At the end of each
+ * correction period/beginning of new one, the length of a delta is shrunk or
+ * expanded with an integer number of ticks, compared with the typical 25.
+ * So a delta lasts for 200ns (or 25 ticks) only on average.
+ * Sometimes it is longer, sometimes it is shorter. The internal syntonization
+ * logic can adjust for at most 5 ticks each 20 ticks.
+ *
+ * The first implication is that you should choose your schedule correction
+ * period to be an integer multiple of the schedule length. Preferably one.
+ * In case there are schedules of multiple ports active, then the correction
+ * period needs to be a multiple of them all. Given the restriction that the
+ * cycle times have to be multiples of one another anyway, this means the
+ * correction period can simply be the largest cycle time, hence the current
+ * choice. This way, the updates are always synchronous to the transmission
+ * cycle, and therefore predictable.
+ *
+ * The second implication is that at the beginning of a correction period, the
+ * first few deltas will be modulated in time, until the schedule engine is
+ * properly phase-aligned with the PTP clock. For this reason, you should place
+ * your best-effort traffic at the beginning of a cycle, and your
+ * time-triggered traffic afterwards.
+ *
+ * The third implication is that once the schedule engine is started, it can
+ * only adjust for so much drift within a correction period. In the servo you
+ * can only change the PTPCLKRATE, but not step the clock (PTPCLKADD). If you
+ * want to do the latter, you need to stop and restart the schedule engine,
+ * which is what the state machine handles.
+ */
+static void sja1105_tas_state_machine(struct work_struct *work)
+{
+	struct sja1105_tas_data *tas_data = work_to_sja1105_tas(work);
+	struct sja1105_private *priv = tas_to_sja1105(tas_data);
+	struct sja1105_ptp_data *ptp_data = &priv->ptp_data;
+	struct timespec64 base_time_ts, now_ts;
+	struct dsa_switch *ds = priv->ds;
+	struct timespec64 diff;
+	s64 base_time, now;
+	int rc = 0;
+
+	mutex_lock(&ptp_data->lock);
+
+	switch (tas_data->state) {
+	case SJA1105_TAS_STATE_DISABLED:
+
+		dev_dbg(ds->dev, "TAS state: disabled\n");
+		/* Can't do anything at all if clock is still being stepped */
+		if (tas_data->last_op != SJA1105_PTP_ADJUSTFREQ)
+			break;
+
+		rc = sja1105_tas_adjust_drift(priv, tas_data->max_cycle_time);
+		if (rc < 0)
+			break;
+
+		now = __sja1105_ptp_gettimex(priv, NULL);
+
+		/* Plan to start the earliest schedule first. The others
+		 * will be started in hardware, by way of their respective
+		 * entry points delta.
+		 * Try our best to avoid fringe cases (race condition between
+		 * ptpschtm and ptpstrtsch) by pushing the oper_base_time at
+		 * least one second in the future from now. This is not ideal,
+		 * but this only needs to buy us time until the
+		 * sja1105_tas_start command below gets executed.
+		 */
+		base_time = future_base_time(tas_data->earliest_base_time,
+					     tas_data->max_cycle_time,
+					     now + 1ull * NSEC_PER_SEC);
+		base_time -= sja1105_delta_to_ns(1);
+
+		rc = sja1105_tas_set_base_time(priv, base_time);
+		if (rc < 0)
+			break;
+
+		tas_data->oper_base_time = base_time;
+
+		rc = sja1105_tas_start(priv);
+		if (rc < 0)
+			break;
+
+		base_time_ts = ns_to_timespec64(base_time);
+		now_ts = ns_to_timespec64(now);
+
+		dev_dbg(ds->dev, "OPER base time %lld.%09ld (now %lld.%09ld)\n",
+			base_time_ts.tv_sec, base_time_ts.tv_nsec,
+			now_ts.tv_sec, now_ts.tv_nsec);
+
+		break;
+
+	case SJA1105_TAS_STATE_ENABLED_NOT_RUNNING:
+		/* Check if TAS has actually started, by comparing the
+		 * scheduled start time with the SJA1105 PTP clock
+		 */
+		dev_dbg(ds->dev, "TAS state: enabled but not running\n");
+
+		/* Clock was stepped, bad news for TAS */
+		if (tas_data->last_op != SJA1105_PTP_ADJUSTFREQ) {
+			sja1105_tas_stop(priv);
+			break;
+		}
+
+		now = __sja1105_ptp_gettimex(priv, NULL);
+
+		if (now < tas_data->oper_base_time) {
+			/* TAS has not started yet */
+			diff = ns_to_timespec64(tas_data->oper_base_time - now);
+			dev_dbg(ds->dev, "time to start: [%lld.%09ld]\n",
+				diff.tv_sec, diff.tv_nsec);
+			break;
+		}
+
+		/* Time elapsed, what happened? */
+		rc = sja1105_tas_check_running(priv);
+		if (rc < 0)
+			break;
+
+		if (tas_data->state == SJA1105_TAS_STATE_RUNNING)
+			/* TAS has started */
+			dev_dbg(ds->dev, "TAS state: transitioned to running\n");
+		else
+			dev_err(ds->dev, "TAS state: not started despite time elapsed\n");
+
+		break;
+
+	case SJA1105_TAS_STATE_RUNNING:
+		dev_dbg(ds->dev, "TAS state: running\n");
+
+		/* Clock was stepped, bad news for TAS */
+		if (tas_data->last_op != SJA1105_PTP_ADJUSTFREQ) {
+			sja1105_tas_stop(priv);
+			break;
+		}
+
+		rc = sja1105_tas_check_running(priv);
+		if (rc < 0)
+			break;
+
+		if (tas_data->state != SJA1105_TAS_STATE_RUNNING) {
+			dev_err(ds->dev, "TAS surprisingly stopped\n");
+			break;
+		}
+
+		now = __sja1105_ptp_gettimex(priv, NULL);
+
+		diff = ns_to_timespec64(now - tas_data->oper_base_time);
+
+		dev_dbg(ds->dev, "Time since TAS started: [%lld.%09ld]\n",
+			diff.tv_sec, diff.tv_nsec);
+		break;
+
+	default:
+		if (net_ratelimit())
+			dev_err(ds->dev, "TAS in an invalid state (incorrect use of API)!\n");
+	}
+
+	if (rc && net_ratelimit())
+		dev_err(ds->dev, "An operation returned %d\n", rc);
+
+	mutex_unlock(&ptp_data->lock);
+}
+
+void sja1105_tas_clockstep(struct sja1105_private *priv)
+{
+	struct sja1105_tas_data *tas_data = &priv->tas_data;
+
+	if (!tas_data->enabled)
+		return;
+
+	tas_data->last_op = SJA1105_PTP_CLOCKSTEP;
+	schedule_work(&tas_data->tas_work);
+}
+
+void sja1105_tas_adjfreq(struct sja1105_private *priv)
+{
+	struct sja1105_tas_data *tas_data = &priv->tas_data;
+
+	if (!tas_data->enabled)
+		return;
+
+	tas_data->last_op = SJA1105_PTP_ADJUSTFREQ;
+	schedule_work(&tas_data->tas_work);
+}
+
 void sja1105_tas_setup(struct sja1105_private *priv)
 {
+	INIT_WORK(&priv->tas_data.tas_work, sja1105_tas_state_machine);
+	priv->tas_data.state = SJA1105_TAS_STATE_DISABLED;
+	priv->tas_data.last_op = SJA1105_PTP_NONE;
 }
 
 void sja1105_tas_teardown(struct sja1105_private *priv)
@@ -414,6 +822,8 @@ void sja1105_tas_teardown(struct sja1105_private *priv)
 	struct sja1105_tas_data *tas_data = &priv->tas_data;
 	int port;
 
+	cancel_work_sync(&tas_data->tas_work);
+
 	for (port = 0; port < SJA1105_NUM_PORTS; port++)
 		if (tas_data->config[port])
 			taprio_free(tas_data->config[port]);
diff --git a/drivers/net/dsa/sja1105/sja1105_tas.h b/drivers/net/dsa/sja1105/sja1105_tas.h
index 0ef82810d9d7..ecc95624e3f6 100644
--- a/drivers/net/dsa/sja1105/sja1105_tas.h
+++ b/drivers/net/dsa/sja1105/sja1105_tas.h
@@ -8,8 +8,27 @@
 
 #if IS_ENABLED(CONFIG_NET_DSA_SJA1105_TAS)
 
+enum sja1105_tas_state {
+	SJA1105_TAS_STATE_DISABLED,
+	SJA1105_TAS_STATE_ENABLED_NOT_RUNNING,
+	SJA1105_TAS_STATE_RUNNING,
+};
+
+enum sja1105_ptp_op {
+	SJA1105_PTP_NONE,
+	SJA1105_PTP_CLOCKSTEP,
+	SJA1105_PTP_ADJUSTFREQ,
+};
+
 struct sja1105_tas_data {
 	struct tc_taprio_qopt_offload *config[SJA1105_NUM_PORTS];
+	enum sja1105_tas_state state;
+	enum sja1105_ptp_op last_op;
+	struct work_struct tas_work;
+	s64 earliest_base_time;
+	s64 oper_base_time;
+	u64 max_cycle_time;
+	bool enabled;
 };
 
 int sja1105_setup_tc_taprio(struct dsa_switch *ds, int port,
@@ -19,6 +38,10 @@ void sja1105_tas_setup(struct sja1105_private *priv);
 
 void sja1105_tas_teardown(struct sja1105_private *priv);
 
+void sja1105_tas_clockstep(struct sja1105_private *priv);
+
+void sja1105_tas_adjfreq(struct sja1105_private *priv);
+
 #else
 
 /* C doesn't allow empty structures, bah! */
@@ -37,6 +60,10 @@ static inline void sja1105_tas_setup(struct sja1105_private *priv) { }
 
 static inline void sja1105_tas_teardown(struct sja1105_private *priv) { }
 
+static inline void sja1105_tas_clockstep(struct sja1105_private *priv) { }
+
+static inline void sja1105_tas_adjfreq(struct sja1105_private *priv) { }
+
 #endif /* IS_ENABLED(CONFIG_NET_DSA_SJA1105_TAS) */
 
 #endif /* _SJA1105_TAS_H */
-- 
2.17.1
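The base-time consistency check in sja1105_tas_set_runtime_params() can be illustrated in isolation. The following is a minimal userspace sketch, not driver code: future_base_time() is re-implemented here under the assumption that it returns the first base_time + N * cycle_time that is not earlier than "now" (its real definition is not part of this excerpt), and max_spread_ns stands in for sja1105_delta_to_ns(SJA1105_TAS_MAX_DELTA), whose value is not shown either.

```c
#include <stdint.h>

typedef int64_t s64;

/* Hypothetical stand-in for future_base_time(): the first instant
 * base_time + N * cycle_time (N >= 0) that is not earlier than now.
 */
static s64 future_base_time(s64 base_time, s64 cycle_time, s64 now)
{
	s64 n;

	if (base_time >= now)
		return base_time;

	/* Round the number of elapsed cycles up */
	n = (now - base_time + cycle_time - 1) / cycle_time;
	return base_time + n * cycle_time;
}

/* Returns 1 if the earliest port schedule, rolled forward by whole
 * cycles until it sits at or just below the latest base time, lies
 * within max_spread_ns of it; 0 otherwise (the driver returns -ERANGE).
 */
static int base_times_aligned(s64 earliest, s64 its_cycle_time,
			      s64 latest, s64 max_spread_ns)
{
	earliest = future_base_time(earliest, its_cycle_time, latest);
	/* Step back one cycle if rounding up overshot the latest base time */
	while (earliest > latest)
		earliest -= its_cycle_time;

	return latest - earliest <= max_spread_ns;
}
```

For example, with the earliest schedule at 0 ns, a 1000 ns cycle and the latest base time at 10100 ns, the earliest one is rolled to 10000 ns and the 100 ns spread passes a 200 ns limit; with the latest base time at 10500 ns, the 500 ns spread would be rejected.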

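The entry point delta arithmetic in sja1105_init_scheduling() and the operational base time chosen by the state machine (relative base time plus one delta, and PTPSCHTM pulled back by one delta) can be checked with a small sketch. Assumptions: one delta is 200 ns as stated in the driver comments, ns_to_sja1105_delta() is taken to be a plain division by 200, and future_base_time() is a hypothetical re-implementation that rolls a base time forward by whole cycles. None of these names are the driver's own.

```c
#include <stdint.h>

typedef int64_t s64;

#define DELTA_NS	200LL		/* one TAS delta, per the driver comments */
#define NSEC_PER_SEC	1000000000LL

/* Hypothetical: first base_time + N * cycle_time not earlier than now */
static s64 future_base_time(s64 base_time, s64 cycle_time, s64 now)
{
	s64 n;

	if (base_time >= now)
		return base_time;
	n = (now - base_time + cycle_time - 1) / cycle_time;
	return base_time + n * cycle_time;
}

/* Entry point delta of one port: its base time relative to the earliest
 * one, in 200 ns units, plus 1 because the hardware rejects a zero delta.
 */
static s64 entry_point_delta(s64 port_base, s64 port_cycle, s64 earliest)
{
	s64 rbt = future_base_time(port_base, port_cycle, earliest) - earliest;

	return rbt / DELTA_NS + 1;
}

/* Operational base time (PTPSCHTM): the earliest schedule, rolled to at
 * least one second after 'now', then pulled back by one delta to
 * compensate for the +1 added to every entry point delta.
 */
static s64 oper_base_time(s64 earliest, s64 max_cycle, s64 now)
{
	return future_base_time(earliest, max_cycle, now + NSEC_PER_SEC) -
	       DELTA_NS;
}
```

With an earliest base time of 1000 ns and a 1 ms cycle, oper_base_time(1000, 1000000, 0) yields 1000000800 ns. Adding one delta for a port whose entry point delta is 1 lands exactly on 1000001000 ns, the rolled-forward earliest base time, which is the point of the +1/-1 pair. Relative base times that are not multiples of 200 ns are truncated by the division.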

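The comment above sja1105_tas_state_machine() argues that PTPCLKCORP should be a common multiple of all active cycle times, and that the largest cycle time suffices when the cycle times divide one another. A small sketch makes that concrete by computing the least common multiple directly. The 8 ns tick is derived from "25 ticks make up a delta, which is 200ns"; ns_to_ticks() and the other names are hypothetical and are not the driver's ns_to_sja1105_ticks().

```c
#include <stdint.h>

typedef uint64_t u64;

/* Derived from the driver comment: 25 ticks make up a 200 ns delta */
#define TICK_NS 8ULL

static u64 ns_to_ticks(u64 ns)
{
	return ns / TICK_NS;
}

/* Euclid's algorithm */
static u64 gcd_u64(u64 a, u64 b)
{
	while (b) {
		u64 t = a % b;

		a = b;
		b = t;
	}
	return a;
}

/* Smallest correction period that is an integer multiple of every
 * active port's cycle time (their least common multiple).
 */
static u64 correction_period_ns(const u64 *cycle_times, int n)
{
	u64 lcm = 1;
	int i;

	for (i = 0; i < n; i++)
		lcm = lcm / gcd_u64(lcm, cycle_times[i]) * cycle_times[i];
	return lcm;
}
```

For cycle times of 250, 500 and 1000 us the result is 1000 us, the largest one, as the driver assumes by programming max_cycle_time. For 300 and 500 us, which do not divide one another, it would be 1500 us, so that combination would not fit the max_cycle_time shortcut.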
^ permalink raw reply	[flat|nested] 33+ messages in thread
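The transitions implemented by sja1105_tas_state_machine() can be summarized as a pure function. This is a simplified model that assumes every SPI access succeeds; the struct tas_inputs bundle and the shortened enum names are hypothetical, and the side effects (writing PTPCLKCORP and PTPSCHTM, setting ptpstrtsch or ptpstopsch) are only noted in comments.

```c
#include <stdint.h>

enum tas_state { TAS_DISABLED, TAS_ENABLED_NOT_RUNNING, TAS_RUNNING };
enum ptp_op { PTP_NONE, PTP_CLOCKSTEP, PTP_ADJUSTFREQ };

/* Inputs the worker looks at on each run (hypothetical bundle) */
struct tas_inputs {
	enum ptp_op last_op;
	int64_t now;		/* PTP time read from the switch, ns */
	int64_t oper_base_time;	/* value programmed into PTPSCHTM, ns */
	int strtsch;		/* ptpstrtsch bit read back */
	int stopsch;		/* ptpstopsch bit read back */
};

/* Next state, assuming every SPI access succeeds */
static enum tas_state tas_next_state(enum tas_state state,
				     const struct tas_inputs *in)
{
	/* Only frequency adjustments are tolerated; a clock step keeps
	 * (or forces, via ptpstopsch) the TAS disabled.
	 */
	if (in->last_op != PTP_ADJUSTFREQ)
		return TAS_DISABLED;

	switch (state) {
	case TAS_DISABLED:
		/* Program PTPCLKCORP and PTPSCHTM, then set ptpstrtsch */
		return TAS_ENABLED_NOT_RUNNING;
	case TAS_ENABLED_NOT_RUNNING:
		if (in->now < in->oper_base_time)
			return state;	/* start time not reached yet */
		/* fall through */
	case TAS_RUNNING:
		if (in->strtsch)
			return TAS_RUNNING;
		if (in->stopsch)
			return TAS_DISABLED;
		return state;	/* neither bit set: driver reports -EINVAL */
	}
	return state;
}
```

The design choice worth noting is that any PTP operation other than a frequency adjustment drives the TAS back to disabled, and it is the next frequency adjustment that re-arms it with a fresh operational base time.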

* Re: [PATCH v1 net-next 10/15] net: dsa: Pass ndo_setup_tc slave callback to drivers
  2019-09-02 16:25 ` [PATCH v1 net-next 10/15] net: dsa: Pass ndo_setup_tc slave callback to drivers Vladimir Oltean
@ 2019-09-04  7:50   ` Kurt Kanzenbach
  0 siblings, 0 replies; 33+ messages in thread
From: Kurt Kanzenbach @ 2019-09-04  7:50 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: f.fainelli, vivien.didelot, andrew, davem, vinicius.gomes,
	vedang.patel, richardcochran, weifeng.voon, jiri, m-karicheri2,
	Jose.Abreu, ilias.apalodimas, jhs, xiyou.wangcong, netdev

[-- Attachment #1: Type: text/plain, Size: 603 bytes --]

On Mon, Sep 02, 2019 at 07:25:39PM +0300, Vladimir Oltean wrote:
> DSA currently handles shared block filters (for the classifier-action
> qdisc) in the core due to what I believe are simply pragmatic reasons -
> hiding the complexity from drivers and offering a simple API for port
> mirroring.
>
> Extend the dsa_slave_setup_tc function by passing all other qdisc
> offloads to the driver layer, where the driver may choose what it
> implements and how. DSA is simply a pass-through in this case.
>
> Signed-off-by: Vladimir Oltean <olteanv@gmail.com>

Acked-by: Kurt Kanzenbach <kurt@linutronix.de>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

* Re: [PATCH v1 net-next 00/15] tc-taprio offload for SJA1105 DSA
  2019-09-02 16:25 [PATCH v1 net-next 00/15] tc-taprio offload for SJA1105 DSA Vladimir Oltean
                   ` (14 preceding siblings ...)
  2019-09-02 16:25 ` [PATCH v1 net-next 15/15] net: dsa: sja1105: Implement state machine for TAS with PTP clock source Vladimir Oltean
@ 2019-09-06 12:54 ` David Miller
  2019-09-07 14:45   ` Andrew Lunn
  2019-09-07 13:55 ` David Miller
  16 siblings, 1 reply; 33+ messages in thread
From: David Miller @ 2019-09-06 12:54 UTC (permalink / raw)
  To: olteanv
  Cc: f.fainelli, vivien.didelot, andrew, vinicius.gomes, vedang.patel,
	richardcochran, weifeng.voon, jiri, m-karicheri2, Jose.Abreu,
	ilias.apalodimas, jhs, xiyou.wangcong, kurt.kanzenbach, netdev

From: Vladimir Oltean <olteanv@gmail.com>
Date: Mon,  2 Sep 2019 19:25:29 +0300

> This is the first attempt to submit the tc-taprio offload model for
> inclusion in the net tree.

Someone really needs to review this.

I'm not applying this patch series until someone knowledgeable in this
area does some kind of review.

Thanks.

* Re: [PATCH v1 net-next 00/15] tc-taprio offload for SJA1105 DSA
  2019-09-02 16:25 [PATCH v1 net-next 00/15] tc-taprio offload for SJA1105 DSA Vladimir Oltean
                   ` (15 preceding siblings ...)
  2019-09-06 12:54 ` [PATCH v1 net-next 00/15] tc-taprio offload for SJA1105 DSA David Miller
@ 2019-09-07 13:55 ` David Miller
  2019-09-09 23:49   ` Gomes, Vinicius
  16 siblings, 1 reply; 33+ messages in thread
From: David Miller @ 2019-09-07 13:55 UTC (permalink / raw)
  To: olteanv
  Cc: f.fainelli, vivien.didelot, andrew, vinicius.gomes, vedang.patel,
	richardcochran, weifeng.voon, jiri, m-karicheri2, Jose.Abreu,
	ilias.apalodimas, jhs, xiyou.wangcong, kurt.kanzenbach, netdev


This is a warning that I will toss this patch series if it receives no series
review in the next couple of days.

Thank you.

* Re: [PATCH v1 net-next 00/15] tc-taprio offload for SJA1105 DSA
  2019-09-06 12:54 ` [PATCH v1 net-next 00/15] tc-taprio offload for SJA1105 DSA David Miller
@ 2019-09-07 14:45   ` Andrew Lunn
  2019-09-08 11:07     ` Vladimir Oltean
  0 siblings, 1 reply; 33+ messages in thread
From: Andrew Lunn @ 2019-09-07 14:45 UTC (permalink / raw)
  To: David Miller, olteanv
  Cc: olteanv, f.fainelli, vivien.didelot, vinicius.gomes,
	vedang.patel, richardcochran, weifeng.voon, jiri, m-karicheri2,
	Jose.Abreu, ilias.apalodimas, jhs, xiyou.wangcong,
	kurt.kanzenbach, netdev

On Fri, Sep 06, 2019 at 02:54:03PM +0200, David Miller wrote:
> From: Vladimir Oltean <olteanv@gmail.com>
> Date: Mon,  2 Sep 2019 19:25:29 +0300
> 
> > This is the first attempt to submit the tc-taprio offload model for
> > inclusion in the net tree.
> 
> Someone really needs to review this.

Hi Vladimir

You might have more chance getting this reviewed if you split it up
into a number of smaller series. Richard could probably review the
plain PTP changes. Who else has worked on tc-taprio recently? A series
purely about tc-taprio might be more likely reviewed by a tc-taprio
person, if it does not contain PTP changes.

    Andrew

* Re: [PATCH v1 net-next 00/15] tc-taprio offload for SJA1105 DSA
  2019-09-07 14:45   ` Andrew Lunn
@ 2019-09-08 11:07     ` Vladimir Oltean
  2019-09-08 20:42       ` Andrew Lunn
  2019-09-09  7:04       ` Richard Cochran
  0 siblings, 2 replies; 33+ messages in thread
From: Vladimir Oltean @ 2019-09-08 11:07 UTC (permalink / raw)
  To: Andrew Lunn, David Miller
  Cc: f.fainelli, vivien.didelot, vinicius.gomes, vedang.patel,
	richardcochran, weifeng.voon, jiri, m-karicheri2, Jose.Abreu,
	ilias.apalodimas, jhs, xiyou.wangcong, kurt.kanzenbach, netdev

Hi Andrew, David,

On Sep 7, 2019, at 3:46 PM, Andrew Lunn <andrew@lunn.ch> wrote:
>
> On Fri, Sep 06, 2019 at 02:54:03PM +0200, David Miller wrote:
>>
>>  From: Vladimir Oltean <olteanv@gmail.com>
>>  Date: Mon,  2 Sep 2019 19:25:29 +0300
>>
>>>
>>>  This is the first attempt to submit the tc-taprio offload model for
>>>  inclusion in the net tree.
>>
>>
>>  Someone really needs to review this.
>
> Hi Vladimir
>
> You might have more chance getting this reviewed if you split it up
> into a number of smaller series. Richard could probably review the
> plain PTP changes. Who else has worked on tc-taprio recently? A series
> purely about tc-taprio might be more likely reviewed by a tc-taprio
> person, if it does not contain PTP changes.
>
>     Andrew

I think Richard has been there when the taprio, etf qdiscs, SO_TXTIME
were first defined and developed:
https://patchwork.ozlabs.org/cover/808504/
I expect he is capable of delivering a competent review of the entire
series, possibly way more competent than my patch set itself.

The reason why I'm not splitting it up is because I lose around 10 ns
of synchronization offset when using the hardware-corrected PTPCLKVAL
clock for timestamping rather than the PTPTSCLK free-running counter.
This is mostly due to the fact that SPI interaction is reduced to a
minimum when correcting the switch's PHC in software - OTOH when that
correction translates into SPI writes to PTPCLKADD/PTPCLKVAL and
PTPCLKRATE, that's when things go a bit downhill with the precision.
Now the compromise is fully acceptable if the PTP clock is to be used
as the trigger source for the time-aware scheduler, but the conversion
would be quite pointless with no user to really require the hardware
clock.

Additionally, the 802.1AS PTP profile even calls for switches and
end-stations to use timestamping counters that are free-running, and
scale&rate-correct those in software - due to a perceived "double
feedback loop", or "changing the ruler while measuring with it". Now
I'm no expert at all, but it would be interesting if we went on with
the discussion in the direction of what Linux is currently
understanding by a "free-running" PTP counter. On one hand there's the
timecounter/cyclecounter in the kernel which makes for a
software-corrected PHC, and on the other there's the free_running
option in linuxptp which makes for a "nowhere-corrected" PHC that is
only being used in the E2E_TC and P2P_TC profiles. But user space
otherwise has no insight into the PHC implementation from the kernel,
and "free_running" from ptp4l can't really be used to implement the
synchronization mechanism required by 802.1AS.

To me, the most striking aspect is that this particular recommendation
from 802.1AS is at direct odds with 802.1Qbv (time-based egress) /
802.1Qci (time-based ingress policing) which clearly require a PTP
counter in the NIC that ticks to the wall clock, and not to a random
free-running time since boot up. I simply can't seem to reconcile the
two.
What this particular switch does is that it permits RX and TX
timestamps to be taken in either corrected or uncorrected timebases
(but unfortunately not both at the same time). I think the hardware
designers' idea was to take timestamps off the uncorrected clock
(PTPTSCLK) and then do a sort of phc2sys-to-itself: write the
software-corrected value of the timecounter/cyclecounter into the
PTPCLKVAL hardware registers which get used for Qbv/Qci.
Actually I hate to use those terms when talking about SJA1105 hardware
support, since it's more "in the style of" IEEE rather than strict
compliance (timing of the design vs the standard might have played a
role as well).
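To make the distinction between a free-running counter with software correction and a hardware-corrected clock more concrete, here is a minimal sketch of the software side. It is loosely in the spirit of the kernel's timecounter/cyclecounter but is not that API (which converts cycles with a mult/shift pair); the struct and function names are hypothetical, and the raw counter is assumed to already count nanoseconds.

```c
#include <stdint.h>

/* Hypothetical software clock on top of a free-running counter. The raw
 * counter is never written; frequency and phase corrections live only
 * in these fields.
 */
struct sw_clock {
	uint64_t raw_ref;	/* raw counter value at the last adjustment */
	int64_t time_ref;	/* corrected time at raw_ref, ns */
	int64_t ppb;		/* rate correction, parts per billion */
};

/* Corrected time for a given raw counter reading */
static int64_t sw_clock_read(const struct sw_clock *c, uint64_t raw_now)
{
	int64_t elapsed = (int64_t)(raw_now - c->raw_ref);

	return c->time_ref + elapsed + elapsed * c->ppb / 1000000000LL;
}

/* Change the rate: fold the time accumulated so far into time_ref first,
 * so the new rate only applies from raw_now onwards.
 */
static void sw_clock_adjfreq(struct sw_clock *c, uint64_t raw_now,
			     int64_t ppb)
{
	c->time_ref = sw_clock_read(c, raw_now);
	c->raw_ref = raw_now;
	c->ppb = ppb;
}

/* Step the phase: only the offset changes, not the underlying counter */
static void sw_clock_step(struct sw_clock *c, int64_t offset_ns)
{
	c->time_ref += offset_ns;
}
```

The raw counter is never touched, so timestamps taken from it stay on an uncorrected timebase while readers of sw_clock_read() see the corrected time. Since elapsed * ppb is computed in 64 bits, a real implementation would re-anchor periodically to avoid overflow over long intervals.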

But let's leave 802.1AS aside for a second - that's not what the patch
set is about, but rather a bit of background on why there are 2 PTP
clocks in this switch, and why I'm switching from one to the other.
Richard didn't really warm up to the phc2sys-to-itself idea in the
past, and opted for simplicity: just use the hardware-corrected
PTPCLKVAL for everything, which is exactly what I'm doing as of now.

The only people whom I know are working on TSN stuff are mostly
entrenched in papers, standards and generally in the hardware-only
mentality. There is obviously a lot to be done for Linux to be a
proper TSN endpoint, and RT is a big one. For a switch in particular,
things are a bit easier due to the fact that it just needs to ensure
the real-time guarantees of a frame that was supposedly already
delivered in-band with the schedule. And there's no way to do that
other than through a hardware offload - otherwise the software
tc-taprio would only shape the frames egressed by the management CPU
of the switch. The tc-taprio offload for a switch only makes sense
when taken together with the bridging offload, if you will.

I "dared" to submit this for merging maybe because I don't see the
subtleties that prevent it from going in, at least for a switch - it
just works and does the job. I would have loved to see this in 5.4
just so I would have to lug around fewer patches when finally
starting to evaluate the endpoint side of things with the 5.4-rt
patch. But nonetheless, there's no hurry and getting a healthy
discussion going is surely more important than the patches themselves
are. On the other hand there needs to be a balance, and just talking
with no code is no good either - fixes, improvements, rework can
always come later once we commit to the basic offload model.

I happen to be around at Plumbers during the following days to learn
what else is going on in the Linux community, and develop a more
complete mental model for myself for how TSN fits in with all of that.
If anybody happens to also be around, I'd be more than happy to talk.

Regards,
-Vladimir

* Re: [PATCH v1 net-next 00/15] tc-taprio offload for SJA1105 DSA
  2019-09-08 11:07     ` Vladimir Oltean
@ 2019-09-08 20:42       ` Andrew Lunn
  2019-09-09  6:52         ` Richard Cochran
  2019-09-09 12:36         ` Joergen Andreasen
  2019-09-09  7:04       ` Richard Cochran
  1 sibling, 2 replies; 33+ messages in thread
From: Andrew Lunn @ 2019-09-08 20:42 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: David Miller, f.fainelli, vivien.didelot, vinicius.gomes,
	vedang.patel, richardcochran, weifeng.voon, jiri, m-karicheri2,
	Jose.Abreu, ilias.apalodimas, jhs, xiyou.wangcong,
	kurt.kanzenbach, netdev

On Sun, Sep 08, 2019 at 12:07:27PM +0100, Vladimir Oltean wrote:
> I think Richard has been there when the taprio, etf qdiscs, SO_TXTIME
> were first defined and developed:
> https://patchwork.ozlabs.org/cover/808504/
> I expect he is capable of delivering a competent review of the entire
> series, possibly way more competent than my patch set itself.
> 
> The reason why I'm not splitting it up is because I lose around 10 ns
> of synchronization offset when using the hardware-corrected PTPCLKVAL
> clock for timestamping rather than the PTPTSCLK free-running counter.

Hi Vladimir

I'm not suggesting anything is wrong with your concept when I say
split it up. It is more that when somebody sees 15 patches, they
decide they don't have the time at the moment, and put it off until
later. And often later never happens. If however they see a smaller
number of patches, they think that yes they have time now, and do the
review.

So if you are struggling to get something reviewed, make it more
appealing for the reviewer. Salami tactics.

    Andrew

* Re: [PATCH v1 net-next 00/15] tc-taprio offload for SJA1105 DSA
  2019-09-08 20:42       ` Andrew Lunn
@ 2019-09-09  6:52         ` Richard Cochran
  2019-09-09 12:36         ` Joergen Andreasen
  1 sibling, 0 replies; 33+ messages in thread
From: Richard Cochran @ 2019-09-09  6:52 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Vladimir Oltean, David Miller, f.fainelli, vivien.didelot,
	vinicius.gomes, vedang.patel, weifeng.voon, jiri, m-karicheri2,
	Jose.Abreu, ilias.apalodimas, jhs, xiyou.wangcong,
	kurt.kanzenbach, netdev

On Sun, Sep 08, 2019 at 10:42:24PM +0200, Andrew Lunn wrote:
> So if you are struggling to get something reviewed, make it more
> appealing for the reviewer. Salami tactics.

+1

* Re: [PATCH v1 net-next 00/15] tc-taprio offload for SJA1105 DSA
  2019-09-08 11:07     ` Vladimir Oltean
  2019-09-08 20:42       ` Andrew Lunn
@ 2019-09-09  7:04       ` Richard Cochran
  1 sibling, 0 replies; 33+ messages in thread
From: Richard Cochran @ 2019-09-09  7:04 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: Andrew Lunn, David Miller, f.fainelli, vivien.didelot,
	vinicius.gomes, vedang.patel, weifeng.voon, jiri, m-karicheri2,
	Jose.Abreu, ilias.apalodimas, jhs, xiyou.wangcong,
	kurt.kanzenbach, netdev

On Sun, Sep 08, 2019 at 12:07:27PM +0100, Vladimir Oltean wrote:
> I think Richard has been there when the taprio, etf qdiscs, SO_TXTIME
> were first defined and developed:
> https://patchwork.ozlabs.org/cover/808504/
> I expect he is capable of delivering a competent review of the entire
> series, possibly way more competent than my patch set itself.

I am really not familiar with the taprio/qdisc stuff.  Sorry.
 
> Additionally, the 802.1AS PTP profile even calls for switches and
> end-stations to use timestamping counters that are free-running, and
> scale&rate-correct those in software - due to a perceived "double
> feedback loop", or "changing the ruler while measuring with it". Now
> I'm no expert at all, but it would be interesting if we went on with
> the discussion in the direction of what Linux is currently
> understanding by a "free-running" PTP counter. On one hand there's the
> timecounter/cyclecounter in the kernel which makes for a
> software-corrected PHC, and on the other there's the free_running
> option in linuxptp which makes for a "nowhere-corrected" PHC that is
> only being used in the E2E_TC and P2P_TC profiles. But user space
> otherwise has no insight into the PHC implementation from the kernel,
> and "free_running" from ptp4l can't really be used to implement the
> synchronization mechanism required by 802.1AS.

That just isn't true.  We have already done this for end stations.

> To me, the most striking aspect is that this particular recommendation
> from 802.1AS is at direct odds with 802.1Qbv (time-based egress) /
> 802.1Qci (time-based ingress policing) which clearly require a PTP
> counter in the NIC that ticks to the wall clock, and not to a random
> free-running time since boot up. I simply can't seem to reconcile the
> two.

Well, yeah.  The various PTP standards and profiles dream up whatever
they want.  The HW we get dictates what is actually possible.

> But let's leave 802.1AS aside for a second - that's not what the patch
> set is about, but rather a bit of background on why there are 2 PTP
> clocks in this switch, and why I'm switching from one to the other.
> Richard didn't really warm up to the phc2sys-to-itself idea in the
> past, and opted for simplicity: just use the hardware-corrected
> PTPCLKVAL for everything, which is exactly what I'm doing as of now.

If you really want to make an 802.1-AS bridge, then

1. You can leave the clock free running, and

2. you don't need to synchronize the Linux system clock at all.

Thanks,
Richard

* Re: [PATCH v1 net-next 00/15] tc-taprio offload for SJA1105 DSA
  2019-09-08 20:42       ` Andrew Lunn
  2019-09-09  6:52         ` Richard Cochran
@ 2019-09-09 12:36         ` Joergen Andreasen
  2019-09-10  1:46           ` Vladimir Oltean
  1 sibling, 1 reply; 33+ messages in thread
From: Joergen Andreasen @ 2019-09-09 12:36 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Vladimir Oltean, David Miller, f.fainelli, vivien.didelot,
	vinicius.gomes, vedang.patel, richardcochran, weifeng.voon, jiri,
	m-karicheri2, Jose.Abreu, ilias.apalodimas, jhs, xiyou.wangcong,
	kurt.kanzenbach, netdev

The 09/08/2019 22:42, Andrew Lunn wrote:
> On Sun, Sep 08, 2019 at 12:07:27PM +0100, Vladimir Oltean wrote:
> > I think Richard has been there when the taprio, etf qdiscs, SO_TXTIME
> > were first defined and developed:
> > https://patchwork.ozlabs.org/cover/808504/
> > I expect he is capable of delivering a competent review of the entire
> > series, possibly way more competent than my patch set itself.
> > 
> > The reason why I'm not splitting it up is because I lose around 10 ns
> > of synchronization offset when using the hardware-corrected PTPCLKVAL
> > clock for timestamping rather than the PTPTSCLK free-running counter.
> 
> Hi Vladimir
> 
> I'm not suggesting anything is wrong with your concept when I say
> split it up. It is more that when somebody sees 15 patches, they
> decide they don't have the time at the moment, and put it off until
> later. And often later never happens. If however they see a smaller
> number of patches, they think that yes they have time now, and do the
> review.
> 
> So if you are struggling to get something reviewed, make it more
> appealing for the reviewer. Salami tactics.
> 
>     Andrew

I vote for splitting it up.
I don't know enough about PTP and taprio/qdisc to review the entire series
but the interface presented in patch 09/15 fits well with our future TSN
switches.

Joergen Andreasen, Microchip

* RE: [PATCH v1 net-next 00/15] tc-taprio offload for SJA1105 DSA
  2019-09-07 13:55 ` David Miller
@ 2019-09-09 23:49   ` Gomes, Vinicius
  2019-09-10  1:06     ` Vladimir Oltean
  0 siblings, 1 reply; 33+ messages in thread
From: Gomes, Vinicius @ 2019-09-09 23:49 UTC (permalink / raw)
  To: David Miller, olteanv
  Cc: f.fainelli, vivien.didelot, andrew, Patel, Vedang,
	richardcochran, Voon, Weifeng, jiri, m-karicheri2, Jose.Abreu,
	ilias.apalodimas, jhs, xiyou.wangcong, kurt.kanzenbach, netdev

Hi Vladimir,

> This is a warning that I will toss this patch series if it receives no series review in
> the next couple of days.

Sorry about the delay on reviewing this. On top of the usual business, some changes to the
IT infrastructure here have hit my email workflow pretty hard.

I am taking a look at the datasheet in the meantime; it's been a long time since I looked at it.
The idea is to help review the scheduler from hell :-)

One thing that wasn't clear is what you did to test this series.

Cheers,
--
Vinicius

* Re: [PATCH v1 net-next 00/15] tc-taprio offload for SJA1105 DSA
  2019-09-09 23:49   ` Gomes, Vinicius
@ 2019-09-10  1:06     ` Vladimir Oltean
  2019-09-11  0:45       ` Gomes, Vinicius
  0 siblings, 1 reply; 33+ messages in thread
From: Vladimir Oltean @ 2019-09-10  1:06 UTC (permalink / raw)
  To: Gomes, Vinicius
  Cc: David Miller, f.fainelli, vivien.didelot, andrew, Patel, Vedang,
	richardcochran, Voon, Weifeng, jiri, m-karicheri2, Jose.Abreu,
	ilias.apalodimas, jhs, xiyou.wangcong, kurt.kanzenbach, netdev

Hi Vinicius!

On 10/09/2019, Gomes, Vinicius <vinicius.gomes@intel.com> wrote:
> Hi Vladimir,
>
>> This is a warning that I will toss this patch series if it receives no
>> series review in the next couple of days.
>
> Sorry about the delay on reviewing this. On top of the usual business,
> some changes to the IT infrastructure here have hit my email workflow
> pretty hard.
>

No problem, I've also been traveling and hence delaying patching some
taprio issues we discussed last week.

> I am taking a look at the datasheet in the meantime, it's been a long time
> since I looked at it, the idea is to help review the scheduler from hell :-)
>

Ok, but don't get hung up on it :)

> One thing that wasn't clear is what you did to test this series.
>

Right, this is one particular aspect I didn't really insist on a lot,
and I hope I'm not going to lose everybody when explaining it, because
it requires a bit of understanding of how sja1105 integrates with DSA
overall.
The basic idea is that none of the switch's ports is special in any
way from a hardware perspective, and that includes the "CPU port". But
to support the DSA paradigm of annotating frames that go towards the
CPU with information about the source port they came from, I am
repurposing VLAN tags with a customized EtherType (0xdadb instead of
0x8100).
This is relevant because to an 802.1Q bridge, the QoS hints come from:
- The 3-bit PCP field from the VLAN header
- A default, port-based VLAN PCP in case RX traffic is untagged (I
would also like to have a knob to change this, currently hardcoded to
0 in the driver!)
So to inject a frame into a sja1105 TX queue means to annotate it with
a VLAN PCP which maps to that queue. In the datasheet there is a
VLAN_PMAP register that manages the ingress-priority ->
egress-priority -> egress-queue mapping. I'm keeping that hardcoded to
1:1:1 for sanity.
Now back to the driver's use of the VLAN header.
- When the sja1105 operates as a bridge with vlan_filtering=1, the
VLANs are installed by the user (via the bridge command from
iproute2), parsed by the switch and VLAN-tagged traffic is expected to
be received from the connected ports. So it honors the VLAN PCP in
this mode.
- When it isn't (it is either a VLAN-unaware bridge, or 4x standalone
ports), then the VLAN header (with custom EtherType) is used to route
frames from the CPU towards the correct egress switch port. A
consequence of it still being parsed by the switch as VLAN is that the
host Linux system is able to specify the VLAN PCP to mean "inject in
this egress queue".
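A minimal sketch of the tagging idea above, assuming the standard 802.1Q TCI layout (PCP in the top 3 bits) and the 1:1:1 VLAN_PMAP setting, so that the PCP written by the host directly names the egress queue. The helper names and the SKETCH_DSA_TPID constant are hypothetical, not taken from the sja1105 or tag_8021q code; the VID, which carries the routing information, is treated as opaque here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical constant: the custom TPID used in place of 0x8100. */
#define SKETCH_DSA_TPID 0xdadbu

/* 802.1Q TCI layout: PCP in bits 15..13, DEI in bit 12 (left 0), VID in 11..0. */
static uint16_t sketch_tci(uint8_t pcp, uint16_t vid)
{
	assert(pcp < 8 && vid < 4096);
	return (uint16_t)((pcp << 13) | vid);
}

/* Write TPID + TCI (4 bytes, network byte order). With VLAN_PMAP kept 1:1:1,
 * the PCP value names the egress queue. The VID is opaque in this sketch. */
static void sketch_write_tag(uint8_t tag[4], uint8_t egress_queue, uint16_t vid)
{
	uint16_t tci = sketch_tci(egress_queue, vid);

	tag[0] = (uint8_t)(SKETCH_DSA_TPID >> 8);
	tag[1] = (uint8_t)(SKETCH_DSA_TPID & 0xff);
	tag[2] = (uint8_t)(tci >> 8);
	tag[3] = (uint8_t)(tci & 0xff);
}

/* Recover the PCP (and hence the queue) from a tag. */
static uint8_t sketch_tag_pcp(const uint8_t tag[4])
{
	return tag[2] >> 5;
}
```

For example, sketch_write_tag(tag, 5, 1) yields the bytes da db a0 01: PCP 5, which under the 1:1:1 mapping lands in egress queue 5.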

Now because the EtherType changes between these modes of operation,
the switch can either expose the VLAN PCP to (a) the host Linux netdev
queues (as DSA sees them*), or (b) to the devices connected to its
external ports.
* When operating in vlan_filtering=1 mode, technically the sja1105
becomes a "managed dumb switch" (control traffic: PTP, STP etc still
works, but for general purpose traffic you must now open your socket
on the DSA master netdevice, not the switch ports). So the DSA master
netdevice is in fact just another node connected to the switch in this
mode, for all the hardware cares. So technically you _can_ still do
QoS from the host Linux if you put a VLAN sub-interface on top of the
DSA master netdevice.

Now, to finally answer your question. I have used the sja1105 as a
bridge between two endpoints who are sending/receiving VLAN-tagged
traffic in a 3-board network synchronized by PTP. There is a schedule
configured on the switch that is aligned to the beginning of the
second, and the cycle time is known. PTP uses traffic class 7, and the
scheduled traffic uses traffic class 5.
The traffic sender is not too complicated: it's a raw L2 socket that
is sending scheduled traffic based on calls to
clock_nanosleep(CLOCK_REALTIME) and an a-priori knowledge of the
network schedule (it's invoked from a script), minus an advance time.
The reason I'm not sharing too many details about the traffic sender
now is that I just configured the advance time experimentally and
there's no hard guarantee that its egress latency will be smaller than
that advance time, i.e. that the frames will always be sent on time.
But the sender's
CLOCK_REALTIME is in sync with its /dev/ptp0 by phc2sys, that's why I
can poll it instead of polling the hardware clock.
Then I am taking TX and RX timestamps for the scheduled traffic on the
sender and on the receiver. I can do a reasonable diff between the 2
timestamps because the PHCs are kept in sync by PTP, and that is my
path delay. I expect it to be more or less 2x a single link's path
delay (sender -> bridge + bridge -> receiver), and in no case a
multiple of the cycle time (which is a sign that cycles were missed).
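The sender timing and the missed-cycle check described above can be sketched roughly as follows. This assumes CLOCK_REALTIME is kept in sync with the PHC by phc2sys and that the advance time is a fixed, experimentally chosen value, as in the description; next_wakeup_ns(), sleep_until_ns() and missed_cycles() are hypothetical helpers, not the actual test tool.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <time.h>

#define SKETCH_NSEC_PER_SEC 1000000000LL

/* Absolute time (ns) at which to wake up so that, after `advance` ns of
 * assumed egress latency, the frame hits the next window opening at
 * base_time + window_offset + k * cycle_time (k >= 0). */
static int64_t next_wakeup_ns(int64_t now, int64_t base_time, int64_t cycle_time,
			      int64_t window_offset, int64_t advance)
{
	int64_t first = base_time + window_offset;
	int64_t k = 0;

	assert(cycle_time > 0);
	if (now + advance > first)
		k = (now + advance - first + cycle_time - 1) / cycle_time;
	return first + k * cycle_time - advance;
}

/* Sleep on CLOCK_REALTIME (assumed synced to the PHC) until t_ns. */
static int sleep_until_ns(int64_t t_ns)
{
	struct timespec ts = {
		.tv_sec = (time_t)(t_ns / SKETCH_NSEC_PER_SEC),
		.tv_nsec = (long)(t_ns % SKETCH_NSEC_PER_SEC),
	};
	int err;

	do
		err = clock_nanosleep(CLOCK_REALTIME, TIMER_ABSTIME, &ts, NULL);
	while (err == EINTR);
	return err;
}

/* Heuristic: the expected delay is about two link delays, well below one
 * cycle, so each whole cycle in the TX-to-RX delay is a missed window. */
static int64_t missed_cycles(int64_t tx_ts, int64_t rx_ts, int64_t cycle_time)
{
	assert(cycle_time > 0 && rx_ts >= tx_ts);
	return (rx_ts - tx_ts) / cycle_time;
}
```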

As for the sja1105-as-endpoint use case, I checked that I can inject
traffic into each particular queue, but I didn't really explore it
further.

I'll make sure this subtlety is more clearly formulated in the next
version of the patch.

> Cheers,
> --
> Vinicius
>
>
>

Actually let me ask you a few questions as well:

- I'm trying to understand what is the correct use of the tc-mqprio
"queues" argument. I've only tested it with "1@0 1@1 1@2 1@3 1@4 1@5
1@6 1@7", which I believe is equivalent to not specifying it at all? I
believe it should be interpreted as: "allocate this many netdev queues
for each traffic class", where "traffic class" means a group of queues
having the same priority (equal to the traffic class's number), but
engaged in a strict priority scheme with other groups of queues
(traffic classes). Right?

- DSA can only formally support multi-queue, because its connection to
the Linux host is through an Ethernet MAC (FIFO). Even if the DSA
master netdevice may be multi-queue, allocating and separating those
queues for each front-panel switch port is a task best left to the
user/administrator. This means that DSA should reject all other
"queues" mappings except the trivial one I pointed to above?

- I'm looking at the "tc_mask_to_queue_mask" function that I'm
carrying along from your initial offload RFC. Are you sure this is the
right approach? I don't feel a need to translate from traffic class to
netdev queues, considering that in the general case, a traffic class
is a group of queues, and 802.1Qbv doesn't really specify that you can
gate individual queues from a traffic class. In the software
implementation you are only looking at netdev_get_prio_tc_map, which
is not equivalent as far as my understanding goes, but saner.
Actually 802.1Q-2018 does not really clarify this either. It looks to
me like they use the term "queue" and "traffic class" interchangeably.
See two examples below (emphasis mine):

Q.2 Using gate operations to create protected windows
The enhancements for scheduled traffic described in 8.6.8.4 allow
transmission to be switched on and off on a timed basis for each
_traffic class_ that is implemented on a port. This switching is
achieved by means of individual on/off transmission gates associated
with each _traffic class_ and a list of gate operations that control
the gates; an individual SetGateStates operation has a time delay
parameter that indicates the delay after the gate operation is
executed until the next operation is to occur, and a GateState
parameter that defines a vector of up to eight state values (open or
closed) that is to be applied to each gate when the operation is
executed. The gate operations allow any combination of open/closed
states to be defined, and the mechanism makes no assumptions about
which _traffic classes_ are being “protected” and which are
“unprotected”; any such assumptions are left to the designer of the
sequence of gate operations.

Table 8-7—Gate operations
The GateState parameter indicates a value, open or closed, for each of
the Port’s _queues_.

- What happens with the "clockid" argument now that hardware offload
is possible? Do we allow "/dev/ptp0" to be specified as input?
Actually this question is relevant to your txtime-assist mode as well:
doesn't it assume that there is an implicit phc2sys instance running
to keep the system time in sync with the PHC?
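The first two questions can be made concrete with a small userspace sketch: parse a tc-mqprio style "queues" argument (count@offset pairs, one per traffic class) and apply the policy proposed above of accepting only the trivial one-queue-per-class mapping. The names are hypothetical; this is not existing DSA or mqprio code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define SKETCH_MAX_TC 16

struct sketch_txq_range {
	unsigned int count;  /* number of netdev TX queues in this traffic class */
	unsigned int offset; /* first netdev TX queue of this traffic class */
};

/* Parse "count@offset count@offset ...", one pair per traffic class.
 * Returns the number of traffic classes parsed. */
static int sketch_parse_queues(const char *s, struct sketch_txq_range *map,
			       int max_tc)
{
	int tc = 0, used = 0;

	while (tc < max_tc &&
	       sscanf(s, " %u@%u%n", &map[tc].count, &map[tc].offset, &used) == 2) {
		s += used;
		tc++;
	}
	return tc;
}

/* Proposed DSA policy: exactly one queue per traffic class, TC N -> queue N. */
static bool sketch_queues_trivial(const struct sketch_txq_range *map, int num_tc)
{
	for (int tc = 0; tc < num_tc; tc++)
		if (map[tc].count != 1 || map[tc].offset != (unsigned int)tc)
			return false;
	return true;
}
```

Under this policy "1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7" would be accepted, while a grouping such as "2@0 1@2" would be rejected.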

Thanks,
-Vladimir

* Re: [PATCH v1 net-next 00/15] tc-taprio offload for SJA1105 DSA
  2019-09-09 12:36         ` Joergen Andreasen
@ 2019-09-10  1:46           ` Vladimir Oltean
  0 siblings, 0 replies; 33+ messages in thread
From: Vladimir Oltean @ 2019-09-10  1:46 UTC (permalink / raw)
  To: Joergen Andreasen
  Cc: Andrew Lunn, David Miller, f.fainelli, vivien.didelot,
	vinicius.gomes, vedang.patel, richardcochran, weifeng.voon, jiri,
	m-karicheri2, Jose.Abreu, ilias.apalodimas, jhs, xiyou.wangcong,
	kurt.kanzenbach, netdev

Hi Andrew, Joergen, Richard,

On 09/09/2019, Joergen Andreasen <joergen.andreasen@microchip.com> wrote:
> The 09/08/2019 22:42, Andrew Lunn wrote:
>> On Sun, Sep 08, 2019 at 12:07:27PM +0100, Vladimir Oltean wrote:
>> > I think Richard has been there when the taprio, etf qdiscs, SO_TXTIME
>> > were first defined and developed:
>> > https://patchwork.ozlabs.org/cover/808504/
>> > I expect he is capable of delivering a competent review of the entire
>> > series, possibly way more competent than my patch set itself.
>> >
>> > The reason why I'm not splitting it up is because I lose around 10 ns
>> > of synchronization offset when using the hardware-corrected PTPCLKVAL
>> > clock for timestamping rather than the PTPTSCLK free-running counter.
>>
>> Hi Vladimir
>>
>> I'm not suggesting anything is wrong with your concept, when I say
>> split it up. It is more that when somebody sees 15 patches, they
>> decide they don't have the time at the moment, and put it off until
>> later. And often later never happens. If however they see a smaller
>> number of patches, they think that yes they have time now, and do the
>> review.
>>
>> So if you are struggling to get something reviewed, make it more
>> appealing for the reviewer. Salami tactics.
>>
>>     Andrew
>
> I vote for splitting it up.
> I don't know enough about PTP and taprio/qdisc to review the entire series
> but the interface presented in patch 09/15 fits well with our future TSN
> switches.
>
> Joergen Andreasen, Microchip
>

Thanks for the feedback. I split the PTP portion that is loosely
coupled (patches 01-07) into a different series. The rest is qdisc
stuff and hardware implementation details. They belong together
because it would be otherwise strange to provide an interface with no
user. You can still review only the patches you are interested in,
however.

Thanks,
-Vladimir

* RE: [PATCH v1 net-next 00/15] tc-taprio offload for SJA1105 DSA
  2019-09-10  1:06     ` Vladimir Oltean
@ 2019-09-11  0:45       ` Gomes, Vinicius
  2019-09-11 11:51         ` Vladimir Oltean
  0 siblings, 1 reply; 33+ messages in thread
From: Gomes, Vinicius @ 2019-09-11  0:45 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: David Miller, f.fainelli, vivien.didelot, andrew, Patel, Vedang,
	richardcochran, Voon, Weifeng, jiri, m-karicheri2, Jose.Abreu,
	ilias.apalodimas, jhs, xiyou.wangcong, kurt.kanzenbach, netdev

Hi Vladimir,

[...]

> 
> I'll make sure this subtlety is more clearly formulated in the next version of the
> patch.
> 

Ack.

> Actually let me ask you a few questions as well:
> 
> - I'm trying to understand what is the correct use of the tc-mqprio "queues"
> argument. I've only tested it with "1@0 1@1 1@2 1@3 1@4 1@5
> 1@6 1@7", which I believe is equivalent to not specifying it at all? I believe it
> should be interpreted as: "allocate this many netdev queues for each traffic
> class", where "traffic class" means a group of queues having the same priority
> (equal to the traffic class's number), but engaged in a strict priority scheme with
> other groups of queues (traffic classes). Right?

Specifying the "queues" is mandatory, IIRC. Yeah, your reading of those arguments
for your example matches mine.

So you mean that you only tested situations when only one queue is "open" at a time?
I think this is another good thing to test.

> 
> - DSA can only formally support multi-queue, because its connection to the Linux
> host is through an Ethernet MAC (FIFO). Even if the DSA master netdevice may
> be multi-queue, allocating and separating those queues for each front-panel
> switch port is a task best left to the user/administrator. This means that DSA
> should reject all other "queues" mappings except the trivial one I pointed to
> above?
> 
> - I'm looking at the "tc_mask_to_queue_mask" function that I'm carrying along
> from your initial offload RFC. Are you sure this is the right approach? I don't feel
> a need to translate from traffic class to netdev queues, considering that in the
> general case, a traffic class is a group of queues, and 802.1Qbv doesn't really
> specify that you can gate individual queues from a traffic class. In the software
> implementation you are only looking at netdev_get_prio_tc_map, which is not
> equivalent as far as my understanding goes, but saner.
> Actually 802.1Q-2018 does not really clarify this either. It looks to me like they
> use the term "queue" and "traffic class" interchangeably.
> See two examples below (emphasis mine):

I spent quite a long time thinking about this, still not sure that I got it right. Let me begin
with the objective for that "translation". Scheduled traffic only makes sense when
the whole network shares the same schedule, so I wanted a way to minimize the
amount of information in each schedule that's controller-dependent. Linux already
does most of it with the separation of traffic classes and queues (you are right that
802.1Q is confusing on this). The idea is that the only thing that needs to change from
one node to another in the network is the "queues" parameter, because each node might
have a different number of queues, or assign different priorities to different queues.

So, that's the idea of doing that intermediate "transformation" step: taprio knows about
traffic classes and HW queues, but the driver only knows about HW queues. And unless I made
a mistake, tc_mask_to_queue_mask() should be equivalent to:  

netdev_get_prio_tc_map() + scanning the gatemask for BIT(tc).
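A userspace sketch of the equivalence described above, assuming the per-class queue ranges come from the "queues" argument. sketch_tc_mask_to_queue_mask() is a hypothetical re-creation, not the sch_taprio.c function; sketch_gate_open_for_prio() shows the per-frame alternative of looking up the frame's traffic class and testing BIT(tc).

```c
#include <assert.h>
#include <stdint.h>

struct sketch_tc_queues {
	uint16_t count;
	uint16_t offset;
};

/* Set the bit of every TX queue that belongs to a traffic class whose
 * gate is open (bit set) in tc_mask. */
static uint32_t sketch_tc_mask_to_queue_mask(const struct sketch_tc_queues *tc_to_txq,
					     int num_tc, uint32_t tc_mask)
{
	uint32_t queue_mask = 0;

	for (int tc = 0; tc < num_tc; tc++) {
		if (!(tc_mask & (1u << tc)))
			continue;
		assert(tc_to_txq[tc].offset + tc_to_txq[tc].count <= 32);
		for (unsigned int i = 0; i < tc_to_txq[tc].count; i++)
			queue_mask |= 1u << (tc_to_txq[tc].offset + i);
	}
	return queue_mask;
}

/* Per-frame view: map priority to TC (like netdev_get_prio_tc_map()),
 * then test BIT(tc) in the gate mask, with no queue translation. */
static int sketch_gate_open_for_prio(const uint8_t prio_tc_map[16],
				     uint32_t priority, uint32_t tc_mask)
{
	return (tc_mask >> prio_tc_map[priority & 15]) & 1;
}
```

With the trivial 1@0 ... 1@7 mapping both views select the same bits; they only diverge in form when a traffic class owns several queues, e.g. with "2@0 2@2" a TC mask of 0x2 becomes a queue mask of 0xc.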

(Thinking more about this, I am having a few ideas about ways to simplify software mode :-)

> 
> Q.2 Using gate operations to create protected windows
> The enhancements for scheduled traffic described in 8.6.8.4 allow
> transmission to be switched on and off on a timed basis for each
> _traffic class_ that is implemented on a port. This switching is
> achieved by means of individual on/off transmission gates associated
> with each _traffic class_ and a list of gate operations that control
> the gates; an individual SetGateStates operation has a time delay
> parameter that indicates the delay after the gate operation is
> executed until the next operation is to occur, and a GateState
> parameter that defines a vector of up to eight state values (open or
> closed) that is to be applied to each gate when the operation is
> executed. The gate operations allow any combination of open/closed
> states to be defined, and the mechanism makes no assumptions about
> which _traffic classes_ are being “protected” and which are
> “unprotected”; any such assumptions are left to the designer of the
> sequence of gate operations.
> 
> Table 8-7—Gate operations
> The GateState parameter indicates a value, open or closed, for each of the
> Port’s _queues_.
> 
> - What happens with the "clockid" argument now that hardware offload is
> possible? Do we allow "/dev/ptp0" to be specified as input?
> Actually this question is relevant to your txtime-assist mode as well:
> doesn't it assume that there is an implicit phc2sys instance running to keep the
> system time in sync with the PHC?

That's a very interesting question. I think, for now, allowing specifying /dev/ptp* clocks
won't work "always": if the driver or something needs to add a timer to be able to run 
the schedule, it won't be able to use /dev/ptp* clocks (hrtimers and ptp clocks don’t mix).
But for "full" offloads, it should work.

So, you are right, taprio and txtime-assisted (and ETF) require the system clock and phc 
clock to be synchronized, via something like phc2sys.
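For reference, the offset estimate that a tool like phc2sys relies on can be sketched as follows: read the system clock, the PHC and the system clock again, assume the PHC was sampled at the midpoint, and trust the sample with the narrowest window. This is a hypothetical illustration under that assumption (including its PHC-minus-system sign convention), not linuxptp code; on Linux the PHC itself would be read with clock_gettime() on a dynamic clock ID derived from an open /dev/ptpN descriptor.

```c
#include <assert.h>
#include <stdint.h>

/* One measurement: system clock read before and after a PHC read (ns). */
struct sketch_sample {
	int64_t sys_before;
	int64_t phc;
	int64_t sys_after;
};

/* Estimate PHC minus system time from the sample with the narrowest
 * [sys_before, sys_after] window, assuming a midpoint PHC read. */
static int64_t sketch_phc_sys_offset(const struct sketch_sample *s, int n)
{
	int best = 0;

	assert(n > 0);
	for (int i = 1; i < n; i++)
		if (s[i].sys_after - s[i].sys_before <
		    s[best].sys_after - s[best].sys_before)
			best = i;
	return s[best].phc -
	       (s[best].sys_before + (s[best].sys_after - s[best].sys_before) / 2);
}
```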

Hope I got all your questions.

Cheers,
--
Vinicius

* Re: [PATCH v1 net-next 00/15] tc-taprio offload for SJA1105 DSA
  2019-09-11  0:45       ` Gomes, Vinicius
@ 2019-09-11 11:51         ` Vladimir Oltean
  0 siblings, 0 replies; 33+ messages in thread
From: Vladimir Oltean @ 2019-09-11 11:51 UTC (permalink / raw)
  To: Gomes, Vinicius
  Cc: David Miller, f.fainelli, vivien.didelot, andrew, Patel, Vedang,
	richardcochran, Voon, Weifeng, jiri, m-karicheri2, Jose.Abreu,
	ilias.apalodimas, jhs, xiyou.wangcong, kurt.kanzenbach, netdev

Hi Vinicius,

On 11/09/2019, Gomes, Vinicius <vinicius.gomes@intel.com> wrote:
> Hi Vladimir,
>
> [...]
>
>>
>> I'll make sure this subtlety is more clearly formulated in the next
>> version of the patch.
>>
>
> Ack.
>
>> Actually let me ask you a few questions as well:
>>
>> - I'm trying to understand what is the correct use of the tc-mqprio
>> "queues" argument. I've only tested it with "1@0 1@1 1@2 1@3 1@4 1@5
>> 1@6 1@7", which I believe is equivalent to not specifying it at all? I
>> believe it should be interpreted as: "allocate this many netdev queues
>> for each traffic class", where "traffic class" means a group of queues
>> having the same priority (equal to the traffic class's number), but
>> engaged in a strict priority scheme with other groups of queues
>> (traffic classes). Right?
>
> Specifying the "queues" is mandatory, IIRC. Yeah, your reading of those
> arguments for your example matches mine.
>
> So you mean that you only tested situations when only one queue is
> "open" at a time? I think this is another good thing to test.
>

No, I tested (using the "gatemask" shell function I wrote as a wrapper
for the SetGateStates command in tc-taprio) a schedule comprised of:
gatemask 7 # PTP
gatemask 5 # My scheduled traffic with clock_nanosleep()
gatemask "0 1 2 3 4 6" # Everything else
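The "gatemask" helper used for these entries just sets one bit per open traffic class. A C counterpart, assuming 8 traffic classes (the function name is hypothetical), shows the values the three entries evaluate to: 0x80 for TC 7, 0x20 for TC 5 and 0x5f for the remaining classes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Set one bit per traffic class whose gate is open in this entry. */
static uint8_t sketch_gatemask(const unsigned int *tcs, size_t n)
{
	uint8_t mask = 0;

	for (size_t i = 0; i < n; i++) {
		assert(tcs[i] < 8);
		mask |= (uint8_t)(1u << tcs[i]);
	}
	return mask;
}
```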

>>
>> - DSA can only formally support multi-queue, because its connection to
>> the Linux host is through an Ethernet MAC (FIFO). Even if the DSA master
>> netdevice may be multi-queue, allocating and separating those queues for
>> each front-panel switch port is a task best left to the
>> user/administrator. This means that DSA should reject all other "queues"
>> mappings except the trivial one I pointed to above?
>>
>> - I'm looking at the "tc_mask_to_queue_mask" function that I'm carrying
>> along from your initial offload RFC. Are you sure this is the right
>> approach? I don't feel a need to translate from traffic class to netdev
>> queues, considering that in the general case, a traffic class is a group
>> of queues, and 802.1Qbv doesn't really specify that you can gate
>> individual queues from a traffic class. In the software implementation
>> you are only looking at netdev_get_prio_tc_map, which is not equivalent
>> as far as my understanding goes, but saner.
>> Actually 802.1Q-2018 does not really clarify this either. It looks to me
>> like they use the term "queue" and "traffic class" interchangeably.
>> See two examples below (emphasis mine):
>
> I spent quite a long time thinking about this, still not sure that I got
> it right. Let me begin with the objective for that "translation".
> Scheduled traffic only makes sense when the whole network shares the same
> schedule, so I wanted a way to minimize the amount of information in each
> schedule that's controller-dependent. Linux already does most of it with
> the separation of traffic classes and queues (you are right that 802.1Q
> is confusing on this). The idea is that the only thing that needs to
> change from one node to another in the network is the "queues" parameter,
> because each node might have a different number of queues, or assign
> different priorities to different queues.
>
> So, that's the idea of doing that intermediate "transformation" step:
> taprio knows about traffic classes and HW queues, but the driver only
> knows about HW queues.

Not necessarily, I think.
The "other" TSN-capable SoC I know of, the NXP LS1028A, has a
standalone Ethernet controller (drivers/net/ethernet/freescale/enetc)
and an embedded L2 switch (not upstream yet). The ENETC has a
configurable number of TX rings per port. Each TX ring has an
"internal priority value" (IPV) and there is an IPV-to-TC mapping
register. The enetc driver keeps the rings with equal priorities under
normal circumstances (and affines 1 TX ring per core) - the idea being
to spread the load. In ndo_setup_tc for mqprio, they allocate num_tc
TX rings and they put them in strict priority mode by configuring the
IPV (internally mapped 1-to-1 to TC) as increasing values for each
ring.
Then the TSN egress scheduler is wired to look at the traffic class of
each frame, via the TX ring it was enqueued on, mapped to the IPV,
mapped to the TC.
The embedded switch in LS1028A is mostly the same if I just consider
the egress portion.
And the sja1105 doesn't really make a distinction between egress
priority queue and traffic class. They are hardcoded 1-to-1 in the
egress port.
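The ENETC chain described above (TX ring, then IPV, then TC) can be modelled roughly as below. This is a hypothetical illustration of the description, not enetc driver code; the ring count and the IPV given to unused rings are assumptions. The point it makes is that the gate decision needs only the traffic class.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_NUM_RINGS 8
#define SKETCH_NUM_IPV 8

/* Hypothetical model: TX ring -> internal priority value -> traffic class. */
struct sketch_egress {
	uint8_t ring_ipv[SKETCH_NUM_RINGS];
	uint8_t ipv_to_tc[SKETCH_NUM_IPV];
};

/* mqprio-style setup: num_tc rings with increasing IPVs, IPV 1:1 to TC. */
static void sketch_setup_mqprio(struct sketch_egress *e, unsigned int num_tc)
{
	assert(num_tc <= SKETCH_NUM_RINGS);
	for (unsigned int i = 0; i < SKETCH_NUM_IPV; i++)
		e->ipv_to_tc[i] = (uint8_t)i;
	for (unsigned int r = 0; r < SKETCH_NUM_RINGS; r++)
		e->ring_ipv[r] = (uint8_t)(r < num_tc ? r : 0);
}

static uint8_t sketch_ring_to_tc(const struct sketch_egress *e, unsigned int ring)
{
	assert(ring < SKETCH_NUM_RINGS);
	return e->ipv_to_tc[e->ring_ipv[ring] % SKETCH_NUM_IPV];
}

/* The gate is keyed by TC; the ring index is only a way to reach it. */
static int sketch_gate_open(const struct sketch_egress *e, unsigned int ring,
			    uint8_t gate_mask)
{
	return (gate_mask >> sketch_ring_to_tc(e, ring)) & 1;
}
```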

> And unless I made a mistake, tc_mask_to_queue_mask() should be
> equivalent to:
>
> netdev_get_prio_tc_map() + scanning the gatemask for BIT(tc).
>

Yes, but my point is: do you know of any hardware implementation that
schedules traffic per-queue (in a situation where the queue-to-tc
mapping is not 1-to-1)? I know of 3 that don't. So if you translate
traffic class into netdev queue, then these drivers would just need to
translate it back into traffic class for programming the full offload.
The hardware doesn't know anything about the netdev queues.
Or are you saying that the driver doesn't need to care (or may not
care) about the traffic class and you're trying to make their life
easier? But my point is that with an mqprio-type offload, both the
driver and the stack already need to be fully aware of the traffic
class. See for example this snippet from skb_tx_hash, which is called
from netdev_pick_tx:

	if (dev->num_tc) {
		u8 tc = netdev_get_prio_tc_map(dev, skb->priority);

		qoffset = sb_dev->tc_to_txq[tc].offset;
		qcount = sb_dev->tc_to_txq[tc].count;
	}

So the stack does tx hashing to pick a queue only from the queue pool
that the driver is supposed to assign a strict hardware priority. It
has this awareness because it's not supposed to hash between queues of
different priorities (which is akin to playing Russian roulette). And
of course the driver needs to ensure that each netdev queue is
correctly assigned to a traffic class (which may mean something to do,
or not).
My suggestion is: let's keep the SetGateStates semantics operate on
traffic classes for the full offload, just like for the software
implementation. If for whatever reason the driver needs to associate
the tc with a tx queue, let them do it privately and not imprint it
into the qdisc interface.
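A userspace model of the quoted skb_tx_hash() logic, to illustrate that hashing only ever spreads traffic across the queues of one traffic class. The struct and function names are hypothetical; the kernel scales the hash with reciprocal_scale(), whereas this sketch uses a plain modulo.

```c
#include <assert.h>
#include <stdint.h>

struct sketch_tc_txq {
	uint16_t count;
	uint16_t offset;
};

/* Priority selects the TC; the hash only picks among that TC's queues. */
static uint16_t sketch_pick_tx_queue(const uint8_t prio_tc_map[16],
				     const struct sketch_tc_txq *tc_to_txq,
				     uint32_t priority, uint32_t hash)
{
	uint8_t tc = prio_tc_map[priority & 15]; /* like netdev_get_prio_tc_map() */
	uint16_t qoffset = tc_to_txq[tc].offset;
	uint16_t qcount = tc_to_txq[tc].count;

	assert(qcount > 0);
	return (uint16_t)(qoffset + hash % qcount);
}
```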

I think you assume that the driver does not know about the traffic
class because the taprio offload structure does not pass that info
to it, like mqprio does? But you kindly provide that info indirectly
to both the stack and the driver, through the netdev_set_tc_queue and
netdev_set_prio_tc_map calls, so the driver should have all the rope
it needs (except maybe num_tcs). In the future, maybe we can move
those calls before taprio_enable_offload? Right now there would be
no justification to do so. And perhaps there should also be a call
to netdev_reset_tc in case the qdisc is removed?

> (Thinking more about this, I am having a few ideas about ways to simplify
> software mode :-)
>
>>
>> Q.2 Using gate operations to create protected windows
>> The enhancements for scheduled traffic described in 8.6.8.4 allow
>> transmission to be switched on and off on a timed basis for each
>> _traffic class_ that is implemented on a port. This switching is
>> achieved by means of individual on/off transmission gates associated
>> with each _traffic class_ and a list of gate operations that control
>> the gates; an individual SetGateStates operation has a time delay
>> parameter that indicates the delay after the gate operation is
>> executed until the next operation is to occur, and a GateState
>> parameter that defines a vector of up to eight state values (open or
>> closed) that is to be applied to each gate when the operation is
>> executed. The gate operations allow any combination of open/closed
>> states to be defined, and the mechanism makes no assumptions about
>> which _traffic classes_ are being “protected” and which are
>> “unprotected”; any such assumptions are left to the designer of the
>> sequence of gate operations.
>>
>> Table 8-7—Gate operations
>> The GateState parameter indicates a value, open or closed, for each of
>> the Port’s _queues_.
>>
>> - What happens with the "clockid" argument now that hardware offload is
>> possible? Do we allow "/dev/ptp0" to be specified as input?
>> Actually this question is relevant to your txtime-assist mode as well:
>> doesn't it assume that there is an implicit phc2sys instance running to
>> keep the system time in sync with the PHC?
>
> That's a very interesting question. I think, for now, allowing specifying
> /dev/ptp* clocks won't work "always": if the driver or something needs to
> add a timer to be able to run the schedule, it won't be able to use
> /dev/ptp* clocks (hrtimers and ptp clocks don’t mix).
> But for "full" offloads, it should work.
>

But since the full offload could only work with the interface's PHC as
clockid, that kind of makes specifying any clockid redundant, right? I
think the right behavior would be to ignore that parameter (allow the
user to not specify it)?

> So, you are right, taprio and txtime-assisted (and ETF) require the
> system clock and phc clock to be synchronized, via something like phc2sys.
>
> Hope I got all your questions.
>
> Cheers,
> --
> Vinicius
>
>

I only have a very superficial understanding of the qdisc and
discussing these aspects with you helps a lot. There are a lot of
subtleties I'm missing, so I'm looking forward to your response. One
thing I would like to avoid is introducing more complexity than is
needed to solve the task at hand - hopefully I'm not oversimplifying.

Thanks,
-Vladimir

* Re: [PATCH v1 net-next 15/15] net: dsa: sja1105: Implement state machine for TAS with PTP clock source
  2019-09-02 16:25 ` [PATCH v1 net-next 15/15] net: dsa: sja1105: Implement state machine for TAS with PTP clock source Vladimir Oltean
@ 2019-09-11 19:43   ` Vinicius Costa Gomes
  0 siblings, 0 replies; 33+ messages in thread
From: Vinicius Costa Gomes @ 2019-09-11 19:43 UTC (permalink / raw)
  To: Vladimir Oltean, f.fainelli, vivien.didelot, andrew, davem,
	vedang.patel, richardcochran
  Cc: weifeng.voon, jiri, m-karicheri2, Jose.Abreu, ilias.apalodimas,
	jhs, xiyou.wangcong, kurt.kanzenbach, netdev, Vladimir Oltean

Hi Vladimir,

Vladimir Oltean <olteanv@gmail.com> writes:

> Tested using the following bash script and the tc from iproute2-next:
>
> 	#!/bin/bash
>
> 	set -e -u -o pipefail
>
> 	NSEC_PER_SEC="1000000000"
>
> 	gatemask() {
> 		local tc_list="$1"
> 		local mask=0
>
> 		for tc in ${tc_list}; do
> 			mask=$((${mask} | (1 << ${tc})))
> 		done
>
> 		printf "%02x" ${mask}
> 	}
>
> 	if ! systemctl is-active --quiet ptp4l; then
> 		echo "Please start the ptp4l service"
> 		exit 1
> 	fi
>
> 	now=$(phc_ctl /dev/ptp1 get | gawk '/clock time is/ { print $5; }')
> 	# Phase-align the base time to the start of the next second.
> 	sec=$(echo "${now}" | gawk -F. '{ print $1; }')
> 	base_time="$(((${sec} + 1) * ${NSEC_PER_SEC}))"
>
> 	echo 'file drivers/net/dsa/sja1105/sja1105_tas.c +plm' | \
> 		sudo tee /sys/kernel/debug/dynamic_debug/control
>
> 	tc qdisc add dev swp5 parent root handle 100 taprio \
> 		num_tc 8 \
> 		map 0 1 2 3 4 5 6 7 \
> 		queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 \
> 		base-time ${base_time} \
> 		sched-entry S $(gatemask 7) 100000 \
> 		sched-entry S $(gatemask "0 1 2 3 4 5 6") 400000 \
> 		clockid CLOCK_TAI flags 2
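The base-time arithmetic in the script, and the schedule it produces, can be sketched in C. This assumes that, when no cycle-time is given, taprio uses the sum of the sched-entry intervals as the cycle time (500 us for the two entries above); the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_NSEC_PER_SEC 1000000000LL

/* Same phase alignment as the script: the start of the next whole second. */
static int64_t sketch_next_second(int64_t now_ns)
{
	assert(now_ns >= 0);
	return (now_ns / SKETCH_NSEC_PER_SEC + 1) * SKETCH_NSEC_PER_SEC;
}

/* Index of the sched-entry active at time t, with the cycle time taken
 * as the sum of all intervals. */
static int sketch_active_entry(const int64_t *intervals, int n,
			       int64_t base_time, int64_t t)
{
	int64_t cycle = 0, pos;

	for (int i = 0; i < n; i++)
		cycle += intervals[i];
	assert(cycle > 0 && t >= base_time);
	pos = (t - base_time) % cycle;
	for (int i = 0; i < n; i++) {
		if (pos < intervals[i])
			return i;
		pos -= intervals[i];
	}
	return -1; /* not reached when all intervals are positive */
}
```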
>
> The "state machine" is a workqueue invoked after each manipulation
> command on the PTP clock (reset, adjust time, set time, adjust
> frequency) that checks the state of the time-aware scheduler.
> So it is not monitored periodically, only in reaction to a PTP command
> typically triggered from a userspace daemon (linuxptp). Otherwise there
> is no reason for things to go wrong.
>
> Now that the timecounter/cyclecounter has been replaced with hardware
> operations on the PTP clock, the TAS Kconfig now depends upon PTP and
> the standalone clocksource operating mode has been removed.
>
> Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
> ---
> Changes since RFC:
> - Used the "delta" terminology instead of "TAS cycle" to be more
>   consistent and avoid confusion with the cyclic schedule (of which the
>   "delta" is only the most granular unit, there is no other connection).
>
>  drivers/net/dsa/sja1105/Kconfig       |   2 +-
>  drivers/net/dsa/sja1105/sja1105.h     |   2 +
>  drivers/net/dsa/sja1105/sja1105_ptp.c |  26 +-
>  drivers/net/dsa/sja1105/sja1105_ptp.h |  13 +
>  drivers/net/dsa/sja1105/sja1105_spi.c |   4 +
>  drivers/net/dsa/sja1105/sja1105_tas.c | 426 +++++++++++++++++++++++++-
>  drivers/net/dsa/sja1105/sja1105_tas.h |  27 ++
>  7 files changed, 486 insertions(+), 14 deletions(-)
>
> diff --git a/drivers/net/dsa/sja1105/Kconfig b/drivers/net/dsa/sja1105/Kconfig
> index 4dc873e985e6..9316a23b7c30 100644
> --- a/drivers/net/dsa/sja1105/Kconfig
> +++ b/drivers/net/dsa/sja1105/Kconfig
> @@ -35,7 +35,7 @@ config NET_DSA_SJA1105_PTP
>  
>  config NET_DSA_SJA1105_TAS
>  	bool "Support for the Time-Aware Scheduler on NXP SJA1105"
> -	depends on NET_DSA_SJA1105
> +	depends on NET_DSA_SJA1105_PTP
>  	help
>  	  This enables support for the TTEthernet-based egress scheduling
>  	  engine in the SJA1105 DSA driver, which is controlled using a
> diff --git a/drivers/net/dsa/sja1105/sja1105.h b/drivers/net/dsa/sja1105/sja1105.h
> index 44f7385c51b5..e8f95b6fadfa 100644
> --- a/drivers/net/dsa/sja1105/sja1105.h
> +++ b/drivers/net/dsa/sja1105/sja1105.h
> @@ -40,6 +40,8 @@ struct sja1105_regs {
>  	u64 ptp_control;
>  	u64 ptpclk;
>  	u64 ptpclkrate;
> +	u64 ptpclkcorp;
> +	u64 ptpschtm;
>  	u64 ptpegr_ts[SJA1105_NUM_PORTS];
>  	u64 pad_mii_tx[SJA1105_NUM_PORTS];
>  	u64 pad_mii_id[SJA1105_NUM_PORTS];
> diff --git a/drivers/net/dsa/sja1105/sja1105_ptp.c b/drivers/net/dsa/sja1105/sja1105_ptp.c
> index ed80278a3521..b037834ff820 100644
> --- a/drivers/net/dsa/sja1105/sja1105_ptp.c
> +++ b/drivers/net/dsa/sja1105/sja1105_ptp.c
> @@ -67,6 +67,8 @@ void sja1105et_ptp_cmd_packing(u8 *buf, struct sja1105_ptp_cmd *cmd,
>  	u64 valid = 1;
>  
>  	sja1105_packing(buf, &valid,           31, 31, size, op);
> +	sja1105_packing(buf, &cmd->ptpstrtsch, 30, 30, size, op);
> +	sja1105_packing(buf, &cmd->ptpstopsch, 29, 29, size, op);
>  	sja1105_packing(buf, &cmd->resptp,      2,  2, size, op);
>  	sja1105_packing(buf, &cmd->corrclk4ts,  1,  1, size, op);
>  	sja1105_packing(buf, &cmd->ptpclkadd,   0,  0, size, op);
> @@ -80,14 +82,16 @@ void sja1105pqrs_ptp_cmd_packing(u8 *buf, struct sja1105_ptp_cmd *cmd,
>  	u64 valid = 1;
>  
>  	sja1105_packing(buf, &valid,           31, 31, size, op);
> +	sja1105_packing(buf, &cmd->ptpstrtsch, 30, 30, size, op);
> +	sja1105_packing(buf, &cmd->ptpstopsch, 29, 29, size, op);
>  	sja1105_packing(buf, &cmd->resptp,      3,  3, size, op);
>  	sja1105_packing(buf, &cmd->corrclk4ts,  2,  2, size, op);
>  	sja1105_packing(buf, &cmd->ptpclkadd,   0,  0, size, op);
>  }
>  
> -static int sja1105_ptp_commit(struct sja1105_private *priv,
> -			      struct sja1105_ptp_cmd *cmd,
> -			      sja1105_spi_rw_mode_t rw)
> +int sja1105_ptp_commit(struct sja1105_private *priv,
> +		       struct sja1105_ptp_cmd *cmd,
> +		       sja1105_spi_rw_mode_t rw)
>  {
>  	const struct sja1105_regs *regs = priv->info->regs;
>  	u8 buf[SJA1105_SIZE_PTP_CMD] = {0};
> @@ -222,6 +226,8 @@ int sja1105_ptp_reset(struct sja1105_private *priv)
>  	dev_dbg(priv->ds->dev, "Resetting PTP clock\n");
>  	rc = sja1105_ptp_commit(priv, &cmd, SPI_WRITE);
>  
> +	sja1105_tas_clockstep(priv);
> +
>  	mutex_unlock(&ptp_data->lock);
>  
>  	return rc;
> @@ -291,7 +297,11 @@ int __sja1105_ptp_settime(struct sja1105_private *priv, u64 ns,
>  		return rc;
>  	}
>  
> -	return sja1105_ptpclkval_write(priv, ticks, ptp_sts);
> +	rc = sja1105_ptpclkval_write(priv, ticks, ptp_sts);
> +
> +	sja1105_tas_clockstep(priv);
> +
> +	return rc;
>  }
>  
>  static int sja1105_ptp_settime(struct ptp_clock_info *ptp,
> @@ -331,6 +341,8 @@ static int sja1105_ptp_adjfine(struct ptp_clock_info *ptp, long scaled_ppm)
>  	rc = sja1105_spi_send_int(priv, SPI_WRITE, regs->ptpclkrate,
>  				  &clkrate, 4, NULL);
>  
> +	sja1105_tas_adjfreq(priv);
> +
>  	mutex_unlock(&priv->ptp_data.lock);
>  
>  	return rc;
> @@ -366,7 +378,11 @@ int __sja1105_ptp_adjtime(struct sja1105_private *priv, s64 delta)
>  		return rc;
>  	}
>  
> -	return sja1105_ptpclkval_write(priv, ticks, NULL);
> +	rc = sja1105_ptpclkval_write(priv, ticks, NULL);
> +
> +	sja1105_tas_clockstep(priv);
> +
> +	return rc;
>  }
>  
>  static int sja1105_ptp_adjtime(struct ptp_clock_info *ptp, s64 delta)
> diff --git a/drivers/net/dsa/sja1105/sja1105_ptp.h b/drivers/net/dsa/sja1105/sja1105_ptp.h
> index c24c40115650..da68e5881e5f 100644
> --- a/drivers/net/dsa/sja1105/sja1105_ptp.h
> +++ b/drivers/net/dsa/sja1105/sja1105_ptp.h
> @@ -29,6 +29,8 @@ enum sja1105_ptp_clk_mode {
>  };
>  
>  struct sja1105_ptp_cmd {
> +	u64 ptpstrtsch;		/* start schedule */
> +	u64 ptpstopsch;		/* stop schedule */
>  	u64 resptp;		/* reset */
>  	u64 corrclk4ts;		/* use the corrected clock for timestamps */
>  	u64 ptpclkadd;		/* enum sja1105_ptp_clk_mode */
> @@ -73,6 +75,10 @@ int __sja1105_ptp_settime(struct sja1105_private *priv, u64 ns,
>  
>  int __sja1105_ptp_adjtime(struct sja1105_private *priv, s64 delta);
>  
> +int sja1105_ptp_commit(struct sja1105_private *priv,
> +		       struct sja1105_ptp_cmd *cmd,
> +		       sja1105_spi_rw_mode_t rw);
> +
>  #else
>  
>  struct sja1105_ptp_cmd;
> @@ -135,6 +141,13 @@ static inline int __sja1105_ptp_adjtime(struct sja1105_private *priv, s64 delta)
>  	return 0;
>  }
>  
> +static inline int sja1105_ptp_commit(struct sja1105_private *priv,
> +				     struct sja1105_ptp_cmd *cmd,
> +				     sja1105_spi_rw_mode_t rw)
> +{
> +	return 0;
> +}
> +
>  #define sja1105et_ptp_cmd_packing NULL
>  
>  #define sja1105pqrs_ptp_cmd_packing NULL
> diff --git a/drivers/net/dsa/sja1105/sja1105_spi.c b/drivers/net/dsa/sja1105/sja1105_spi.c
> index 794cc5077565..f6df050c15ec 100644
> --- a/drivers/net/dsa/sja1105/sja1105_spi.c
> +++ b/drivers/net/dsa/sja1105/sja1105_spi.c
> @@ -526,9 +526,11 @@ static struct sja1105_regs sja1105et_regs = {
>  	.rmii_ref_clk = {0x100015, 0x10001C, 0x100023, 0x10002A, 0x100031},
>  	.rmii_ext_tx_clk = {0x100018, 0x10001F, 0x100026, 0x10002D, 0x100034},
>  	.ptpegr_ts = {0xC0, 0xC2, 0xC4, 0xC6, 0xC8},
> +	.ptpschtm = 0x12, /* Spans 0x12 to 0x13 */
>  	.ptp_control = 0x17,
>  	.ptpclk = 0x18, /* Spans 0x18 to 0x19 */
>  	.ptpclkrate = 0x1A,
> +	.ptpclkcorp = 0x1D,
>  };
>  
>  static struct sja1105_regs sja1105pqrs_regs = {
> @@ -556,9 +558,11 @@ static struct sja1105_regs sja1105pqrs_regs = {
>  	.rmii_ext_tx_clk = {0x100017, 0x10001D, 0x100023, 0x100029, 0x10002F},
>  	.qlevel = {0x604, 0x614, 0x624, 0x634, 0x644},
>  	.ptpegr_ts = {0xC0, 0xC4, 0xC8, 0xCC, 0xD0},
> +	.ptpschtm = 0x13, /* Spans 0x13 to 0x14 */
>  	.ptp_control = 0x18,
>  	.ptpclk = 0x19,
>  	.ptpclkrate = 0x1B,
> +	.ptpclkcorp = 0x1E,
>  };
>  
>  struct sja1105_info sja1105e_info = {
> diff --git a/drivers/net/dsa/sja1105/sja1105_tas.c b/drivers/net/dsa/sja1105/sja1105_tas.c
> index 769e1d8e5e8f..ed0c3f00c09d 100644
> --- a/drivers/net/dsa/sja1105/sja1105_tas.c
> +++ b/drivers/net/dsa/sja1105/sja1105_tas.c
> @@ -10,6 +10,11 @@
>  #define SJA1105_GATE_MASK		GENMASK_ULL(SJA1105_NUM_TC - 1, 0)
>  #define SJA1105_TAS_MAX_DELTA		BIT(19)
>  
> +#define work_to_sja1105_tas(d) \
> +	container_of((d), struct sja1105_tas_data, tas_work)
> +#define tas_to_sja1105(d) \
> +	container_of((d), struct sja1105_private, tas_data)
> +
>  /* This is not a preprocessor macro because the "ns" argument may or may not be
>   * s64 at caller side. This ensures it is properly type-cast before div_s64.
>   */
> @@ -18,6 +23,102 @@ static s64 ns_to_sja1105_delta(s64 ns)
>  	return div_s64(ns, 200);
>  }
>  
> +static s64 sja1105_delta_to_ns(s64 delta)
> +{
> +	return delta * 200;
> +}
> +
> +/* Calculate the first base_time in the future that satisfies this
> + * relationship:
> + *
> + * future_base_time = base_time + N x cycle_time >= now, or
> + *
> + *      now - base_time
> + * N >= ---------------
> + *         cycle_time
> + *
> + * Because N is an integer, the ceiling value of the above "a / b" ratio
> + * is in fact precisely the floor value of "(a + b - 1) / b", which is
> + * easier to calculate only having integer division tools.
> + */
> +static s64 future_base_time(s64 base_time, s64 cycle_time, s64 now)
> +{
> +	s64 a, b, n;
> +
> +	if (base_time >= now)
> +		return base_time;
> +
> +	a = now - base_time;
> +	b = cycle_time;
> +	n = div_s64(a + b - 1, b);
> +
> +	return base_time + n * cycle_time;
> +}
> +
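The ceiling trick described in the comment above can be checked in user space. The block below is a minimal sketch, not the driver's code. It assumes int64_t stands in for s64 and plain '/' stands in for div_s64(). That substitution is only valid because both operands are positive on the path where the division happens.

```c
#include <assert.h>
#include <stdint.h>

/* User-space model of future_base_time(): returns the first
 * base_time + N * cycle_time (integer N >= 0) that is >= now.
 * Plain '/' replaces the kernel's div_s64(); it truncates toward zero,
 * which equals floor here because a = now - base_time > 0 and b > 0.
 */
static int64_t future_base_time(int64_t base_time, int64_t cycle_time,
				int64_t now)
{
	int64_t a, b, n;

	assert(cycle_time > 0);

	if (base_time >= now)
		return base_time;

	a = now - base_time;
	b = cycle_time;
	n = (a + b - 1) / b;	/* ceil(a / b) using integer division only */

	return base_time + n * cycle_time;
}
```

For example, with base_time 100, cycle_time 30 and now 205, the result is 220, because the previous occurrence (190) is already in the past. When now lands exactly on an occurrence, that occurrence is returned unchanged.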
> +static int sja1105_tas_set_runtime_params(struct sja1105_private *priv)
> +{
> +	struct sja1105_tas_data *tas_data = &priv->tas_data;
> +	s64 earliest_base_time = S64_MAX;
> +	s64 latest_base_time = 0;
> +	s64 its_cycle_time = 0;
> +	s64 max_cycle_time = 0;
> +	int port;
> +
> +	tas_data->enabled = false;
> +
> +	for (port = 0; port < SJA1105_NUM_PORTS; port++) {
> +		const struct tc_taprio_qopt_offload *tas_config;
> +
> +		tas_config = tas_data->config[port];
> +		if (!tas_config)
> +			continue;
> +
> +		tas_data->enabled = true;
> +
> +		if (max_cycle_time < tas_config->cycle_time)
> +			max_cycle_time = tas_config->cycle_time;
> +		if (latest_base_time < tas_config->base_time)
> +			latest_base_time = tas_config->base_time;
> +		if (earliest_base_time > tas_config->base_time) {
> +			earliest_base_time = tas_config->base_time;
> +			its_cycle_time = tas_config->cycle_time;
> +		}
> +	}
> +
> +	if (!tas_data->enabled)
> +		return 0;
> +
> +	/* Roll the earliest base time over until it is in a comparable
> +	 * time base with the latest, then compare their deltas.
> +	 * We want to enforce that all ports' base times are within
> +	 * SJA1105_TAS_MAX_DELTA 200ns cycles of one another.
> +	 */
> +	earliest_base_time = future_base_time(earliest_base_time,
> +					      its_cycle_time,
> +					      latest_base_time);
> +	while (earliest_base_time > latest_base_time)
> +		earliest_base_time -= its_cycle_time;
> +	if (latest_base_time - earliest_base_time >
> +	    sja1105_delta_to_ns(SJA1105_TAS_MAX_DELTA)) {
> +		dev_err(priv->ds->dev,
> +			"Base times too far apart: min %lld max %lld\n",
> +			earliest_base_time, latest_base_time);
> +		return -ERANGE;
> +	}
> +
> +	tas_data->earliest_base_time = earliest_base_time;
> +	tas_data->max_cycle_time = max_cycle_time;
> +
> +	dev_dbg(priv->ds->dev, "earliest base time %lld ns\n",
> +		tas_data->earliest_base_time);
> +	dev_dbg(priv->ds->dev, "latest base time %lld ns\n",
> +		latest_base_time);
> +	dev_dbg(priv->ds->dev, "longest cycle time %lld ns\n",
> +		tas_data->max_cycle_time);
> +
> +	return 0;
> +}
> +
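The range check in sja1105_tas_set_runtime_params() can be modeled in isolation. The sketch below is hypothetical: base_times_in_range() and the macro names are illustrative and are not part of the patch. It shows that the earliest base time is rolled forward to its last occurrence that is not after the latest base time, and the gap must then fit within SJA1105_TAS_MAX_DELTA (2^19) deltas of 200 ns, that is 104,857,600 ns.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TAS_DELTA_NS	200LL		/* one TAS delta is 200 ns */
#define TAS_MAX_DELTA	(1LL << 19)	/* mirrors SJA1105_TAS_MAX_DELTA */

static int64_t future_base_time(int64_t base_time, int64_t cycle_time,
				int64_t now)
{
	assert(cycle_time > 0);
	if (base_time >= now)
		return base_time;
	return base_time +
	       ((now - base_time + cycle_time - 1) / cycle_time) * cycle_time;
}

/* Roll 'earliest' to the last occurrence <= 'latest', then require the
 * remaining gap to be at most TAS_MAX_DELTA deltas.
 */
static bool base_times_in_range(int64_t earliest, int64_t its_cycle_time,
				int64_t latest)
{
	int64_t rolled = future_base_time(earliest, its_cycle_time, latest);

	while (rolled > latest)
		rolled -= its_cycle_time;

	return latest - rolled <= TAS_MAX_DELTA * TAS_DELTA_NS;
}
```

After rolling, the gap is always smaller than its_cycle_time. So in this model the check can only fail when the earliest port's cycle time exceeds roughly 104.9 ms.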
>  /* Lo and behold: the egress scheduler from hell.
>   *
>  * At the hardware level, the Time-Aware Shaper holds a global linear array of
> @@ -100,7 +201,11 @@ static int sja1105_init_scheduling(struct sja1105_private *priv)
>  	int num_cycles = 0;
>  	int cycle = 0;
>  	int i, k = 0;
> -	int port;
> +	int port, rc;
> +
> +	rc = sja1105_tas_set_runtime_params(priv);
> +	if (rc < 0)
> +		return rc;
>  
>  	/* Discard previous Schedule Table */
>  	table = &priv->static_config.tables[BLK_IDX_SCHEDULE];
> @@ -181,11 +286,13 @@ static int sja1105_init_scheduling(struct sja1105_private *priv)
>  	schedule_entry_points = table->entries;
>  
>  	/* Finally start populating the static config tables */
> -	schedule_entry_points_params->clksrc = SJA1105_TAS_CLKSRC_STANDALONE;
> +	schedule_entry_points_params->clksrc = SJA1105_TAS_CLKSRC_PTP;
>  	schedule_entry_points_params->actsubsch = num_cycles - 1;
>  
>  	for (port = 0; port < SJA1105_NUM_PORTS; port++) {
>  		const struct tc_taprio_qopt_offload *tas_config;
> +		/* Relative base time */
> +		s64 rbt;
>  
>  		tas_config = tas_data->config[port];
>  		if (!tas_config)
> @@ -193,13 +300,20 @@ static int sja1105_init_scheduling(struct sja1105_private *priv)
>  
>  		schedule_start_idx = k;
>  		schedule_end_idx = k + tas_config->num_entries - 1;
> -		/* TODO this is only a relative base time for the subschedule
> -		 * (relative to PTPSCHTM). But as we're using standalone and
> -		 * not PTP clock as time reference, leave it like this for now.
> -		 * Later we'll have to enforce that all ports' base times are
> -		 * within SJA1105_TAS_MAX_DELTA 200ns cycles of one another.
> +		/* This is only a relative base time for the subschedule
> +		 * (relative to PTPSCHTM - aka the operational base time).
>  		 */
> -		entry_point_delta = ns_to_sja1105_delta(tas_config->base_time);
> +		rbt = future_base_time(tas_config->base_time,
> +				       tas_config->cycle_time,
> +				       tas_data->earliest_base_time);
> +		rbt -= tas_data->earliest_base_time;
> +		/* UM10944.pdf 4.2.2. Schedule Entry Points table says that
> +		 * delta cannot be zero, which is shitty. Advance all relative
> +		 * base times by 1 TAS delta, so that even the earliest base
> +		 * time becomes 1 in relative terms. Then start the operational
> +		 * base time (PTPSCHTM) one TAS delta earlier than planned.
> +		 */
> +		entry_point_delta = ns_to_sja1105_delta(rbt) + 1;
>  
>  		schedule_entry_points[cycle].subschindx = cycle;
>  		schedule_entry_points[cycle].delta = entry_point_delta;
> @@ -405,8 +519,302 @@ int sja1105_setup_tc_taprio(struct dsa_switch *ds, int port,
>  	return sja1105_static_config_reload(priv);
>  }
>  
> +static int sja1105_tas_check_running(struct sja1105_private *priv)
> +{
> +	struct sja1105_tas_data *tas_data = &priv->tas_data;
> +	struct sja1105_ptp_cmd cmd = {0};
> +	int rc;
> +
> +	rc = sja1105_ptp_commit(priv, &cmd, SPI_READ);
> +	if (rc < 0)
> +		return rc;
> +
> +	if (cmd.ptpstrtsch == 1)
> +		/* Schedule successfully started */
> +		tas_data->state = SJA1105_TAS_STATE_RUNNING;
> +	else if (cmd.ptpstopsch == 1)
> +		/* Schedule is stopped */
> +		tas_data->state = SJA1105_TAS_STATE_DISABLED;
> +	else
> +		/* Schedule is probably not configured with PTP clock source */
> +		rc = -EINVAL;
> +
> +	return rc;
> +}
> +
> +/* Write to PTPCLKCORP */
> +static int sja1105_tas_adjust_drift(struct sja1105_private *priv,
> +				    u64 correction)
> +{
> +	const struct sja1105_regs *regs = priv->info->regs;
> +	u64 ptpclkcorp = ns_to_sja1105_ticks(correction);
> +
> +	return sja1105_spi_send_int(priv, SPI_WRITE, regs->ptpclkcorp,
> +				    &ptpclkcorp, 4, NULL);
> +}
> +
> +/* Write to PTPSCHTM */
> +static int sja1105_tas_set_base_time(struct sja1105_private *priv,
> +				     u64 base_time)
> +{
> +	const struct sja1105_regs *regs = priv->info->regs;
> +	u64 ptpschtm = ns_to_sja1105_ticks(base_time);
> +
> +	return sja1105_spi_send_int(priv, SPI_WRITE, regs->ptpschtm,
> +				    &ptpschtm, 8, NULL);
> +}
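sja1105_tas_set_base_time() converts nanoseconds to PTP ticks with ns_to_sja1105_ticks(), which is not shown in this excerpt. The sketch below assumes 8 ns per tick, derived from the later comment that 25 ticks make up a 200 ns delta. It also follows the state machine's choice of operational base time: the first occurrence of the earliest schedule at least one second after now, moved one delta earlier. ptpschtm_ticks() is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC	1000000000LL
#define TAS_DELTA_NS	200LL
#define TICK_NS		8LL	/* assumption: 200 ns / 25 ticks */

static int64_t future_base_time(int64_t base_time, int64_t cycle_time,
				int64_t now)
{
	assert(cycle_time > 0);
	if (base_time >= now)
		return base_time;
	return base_time +
	       ((now - base_time + cycle_time - 1) / cycle_time) * cycle_time;
}

/* Value that would be written to PTPSCHTM, in PTP clock ticks */
static uint64_t ptpschtm_ticks(int64_t earliest_base_time,
			       int64_t max_cycle_time, int64_t now)
{
	int64_t base_time = future_base_time(earliest_base_time,
					     max_cycle_time,
					     now + NSEC_PER_SEC);

	base_time -= TAS_DELTA_NS;	/* compensates the +1 entry point delta */
	return (uint64_t)(base_time / TICK_NS);
}
```

For example, with an earliest base time of 0, a 500 us cycle and now at 2.3 s, the operational base time is 3.3 s minus 200 ns. Under the 8 ns assumption, that is 412,499,975 ticks.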
> +
> +static int sja1105_tas_start(struct sja1105_private *priv)
> +{
> +	struct sja1105_tas_data *tas_data = &priv->tas_data;
> +	struct sja1105_ptp_cmd *cmd = &priv->ptp_data.cmd;
> +	int rc;
> +
> +	dev_dbg(priv->ds->dev, "Starting the TAS\n");
> +
> +	if (tas_data->state == SJA1105_TAS_STATE_ENABLED_NOT_RUNNING ||
> +	    tas_data->state == SJA1105_TAS_STATE_RUNNING) {
> +		dev_err(priv->ds->dev, "TAS already started\n");
> +		return -EINVAL;
> +	}
> +
> +	cmd->ptpstrtsch = 1;
> +	cmd->ptpstopsch = 0;
> +
> +	rc = sja1105_ptp_commit(priv, cmd, SPI_WRITE);
> +	if (rc < 0)
> +		return rc;
> +
> +	tas_data->state = SJA1105_TAS_STATE_ENABLED_NOT_RUNNING;
> +
> +	return 0;
> +}
> +
> +static int sja1105_tas_stop(struct sja1105_private *priv)
> +{
> +	struct sja1105_tas_data *tas_data = &priv->tas_data;
> +	struct sja1105_ptp_cmd *cmd = &priv->ptp_data.cmd;
> +	int rc;
> +
> +	dev_dbg(priv->ds->dev, "Stopping the TAS\n");
> +
> +	if (tas_data->state == SJA1105_TAS_STATE_DISABLED) {
> +		dev_err(priv->ds->dev, "TAS already disabled\n");
> +		return -EINVAL;
> +	}
> +
> +	cmd->ptpstopsch = 1;
> +	cmd->ptpstrtsch = 0;
> +
> +	rc = sja1105_ptp_commit(priv, cmd, SPI_WRITE);
> +	if (rc < 0)
> +		return rc;
> +
> +	tas_data->state = SJA1105_TAS_STATE_DISABLED;
> +
> +	return 0;
> +}
> +
> +/* The schedule engine and the PTP clock are driven by the same oscillator, and
> + * they run in parallel. But whilst the PTP clock can keep an absolute
> + * time-of-day, the schedule engine is only running in 'ticks' (25 ticks make
> + * up a delta, which is 200ns), and wrapping around at the end of each cycle.
> + * The schedule engine is started when the PTP clock reaches the PTPSCHTM time
> + * (in PTP domain).
> + * Because the PTP clock can be rate-corrected (accelerated or slowed down) by
> + * a software servo, and the schedule engine clock runs in parallel to the PTP
> + * clock, there is logic internal to the switch that periodically keeps the
> + * schedule engine from drifting away. The frequency with which this internal
> + * syntonization happens is the PTP clock correction period (PTPCLKCORP). It is
> + * a value also in the PTP clock domain, and is also rate-corrected.
> + * To be precise, during a correction period, there is logic to determine by
> + * how many scheduler clock ticks the PTP clock has drifted. At the end of each
> + * correction period (i.e. the beginning of a new one), the length of a delta
> + * is shrunk or expanded by an integer number of ticks, compared with the
> + * typical 25. So a delta lasts for 200ns (or 25 ticks) only on average.
> + * Sometimes it is longer, sometimes it is shorter. The internal syntonization
> + * logic can adjust by at most 5 ticks every 20 ticks.
> + *
> + * The first implication is that you should choose your schedule correction
> + * period to be an integer multiple of the schedule length. Preferably one.
> + * In case there are schedules of multiple ports active, then the correction
> + * period needs to be a multiple of them all. Given the restriction that the
> + * cycle times have to be multiples of one another anyway, this means the
> + * correction period can simply be the largest cycle time, hence the current
> + * choice. This way, the updates are always synchronous to the transmission
> + * cycle, and therefore predictable.
> + *
> + * The second implication is that at the beginning of a correction period, the
> + * first few deltas will be modulated in time, until the schedule engine is
> + * properly phase-aligned with the PTP clock. For this reason, you should place
> + * your best-effort traffic at the beginning of a cycle, and your
> + * time-triggered traffic afterwards.
> + *
> + * The third implication is that once the schedule engine is started, it can
> + * only adjust for so much drift within a correction period. In the servo you
> + * can only change the PTPCLKRATE, but not step the clock (PTPCLKADD). If you
> + * want to do the latter, you need to stop and restart the schedule engine,
> + * which is what the state machine handles.
> + */
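The comment above argues that the correction period (PTPCLKCORP) can simply be the longest active cycle time, provided every other cycle time divides it. The following is a hedged illustration of that rule. pick_correction_period() is hypothetical. The quoted driver just passes max_cycle_time to sja1105_tas_adjust_drift() and does not perform this divisibility check here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns the longest cycle time (ns) if every cycle time divides it,
 * otherwise -1. A zero-length input yields 0.
 */
static int64_t pick_correction_period(const int64_t *cycle_times, size_t n)
{
	int64_t max_cycle = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		assert(cycle_times[i] > 0);
		if (cycle_times[i] > max_cycle)
			max_cycle = cycle_times[i];
	}
	for (i = 0; i < n; i++)
		if (max_cycle % cycle_times[i] != 0)
			return -1;	/* not multiples of one another */

	return max_cycle;
}
```

With cycle times of 250 us, 500 us and 1 ms, the period is 1 ms and every schedule is phase-aligned at each correction boundary. A 300 us / 1 ms mix has no such common boundary within one long cycle.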
> +static void sja1105_tas_state_machine(struct work_struct *work)
> +{
> +	struct sja1105_tas_data *tas_data = work_to_sja1105_tas(work);
> +	struct sja1105_private *priv = tas_to_sja1105(tas_data);
> +	struct sja1105_ptp_data *ptp_data = &priv->ptp_data;
> +	struct timespec64 base_time_ts, now_ts;
> +	struct dsa_switch *ds = priv->ds;
> +	struct timespec64 diff;
> +	s64 base_time, now;
> +	int rc = 0;
> +
> +	mutex_lock(&ptp_data->lock);
> +
> +	switch (tas_data->state) {
> +	case SJA1105_TAS_STATE_DISABLED:
> +
> +		dev_dbg(ds->dev, "TAS state: disabled\n");
> +		/* Can't do anything at all if clock is still being stepped */
> +		if (tas_data->last_op != SJA1105_PTP_ADJUSTFREQ)
> +			break;
> +
> +		rc = sja1105_tas_adjust_drift(priv, tas_data->max_cycle_time);
> +		if (rc < 0)
> +			break;
> +
> +		now = __sja1105_ptp_gettimex(priv, NULL);
> +
> +		/* Plan to start the earliest schedule first. The others
> +		 * will be started in hardware, by way of their respective
> +		 * entry points delta.
> +		 * Try our best to avoid fringe cases (race condition between
> +		 * ptpschtm and ptpstrtsch) by pushing the oper_base_time at
> +		 * least one second in the future from now. This is not ideal,
> +		 * but this only needs to buy us time until the
> +		 * sja1105_tas_start command below gets executed.
> +		 */
> +		base_time = future_base_time(tas_data->earliest_base_time,
> +					     tas_data->max_cycle_time,
> +					     now + 1ull * NSEC_PER_SEC);
> +		base_time -= sja1105_delta_to_ns(1);
> +
> +		rc = sja1105_tas_set_base_time(priv, base_time);
> +		if (rc < 0)
> +			break;
> +
> +		tas_data->oper_base_time = base_time;
> +
> +		rc = sja1105_tas_start(priv);
> +		if (rc < 0)
> +			break;
> +
> +		base_time_ts = ns_to_timespec64(base_time);
> +		now_ts = ns_to_timespec64(now);
> +
> +		dev_dbg(ds->dev, "OPER base time %lld.%09ld (now %lld.%09ld)\n",
> +			base_time_ts.tv_sec, base_time_ts.tv_nsec,
> +			now_ts.tv_sec, now_ts.tv_nsec);
> +
> +		break;
> +
> +	case SJA1105_TAS_STATE_ENABLED_NOT_RUNNING:
> +		/* Check if TAS has actually started, by comparing the
> +		 * scheduled start time with the SJA1105 PTP clock
> +		 */
> +		dev_dbg(ds->dev, "TAS state: enabled but not running\n");
> +
> +		/* Clock was stepped.. bad news for TAS */
> +		if (tas_data->last_op != SJA1105_PTP_ADJUSTFREQ) {
> +			sja1105_tas_stop(priv);
> +			break;
> +		}
> +
> +		now = __sja1105_ptp_gettimex(priv, NULL);
> +
> +		if (now < tas_data->oper_base_time) {
> +			/* TAS has not started yet */
> +			diff = ns_to_timespec64(tas_data->oper_base_time - now);
> +			dev_dbg(ds->dev, "time to start: [%lld.%09ld]\n",
> +				diff.tv_sec, diff.tv_nsec);
> +			break;
> +		}
> +
> +		/* Time elapsed, what happened? */
> +		rc = sja1105_tas_check_running(priv);
> +		if (rc < 0)
> +			break;
> +
> +		if (tas_data->state == SJA1105_TAS_STATE_RUNNING)
> +			/* TAS has started */
> +			dev_dbg(ds->dev, "TAS state: transitioned to running\n");
> +		else
> +			dev_err(ds->dev, "TAS state: not started despite time elapsed\n");
> +
> +		break;
> +
> +	case SJA1105_TAS_STATE_RUNNING:
> +		dev_dbg(ds->dev, "TAS state: running\n");
> +
> +		/* Clock was stepped.. bad news for TAS */
> +		if (tas_data->last_op != SJA1105_PTP_ADJUSTFREQ) {
> +			sja1105_tas_stop(priv);
> +			break;
> +		}
> +
> +		rc = sja1105_tas_check_running(priv);
> +		if (rc < 0)
> +			break;
> +
> +		if (tas_data->state != SJA1105_TAS_STATE_RUNNING) {
> +			dev_err(ds->dev, "TAS surprisingly stopped\n");
> +			break;
> +		}
> +
> +		now = __sja1105_ptp_gettimex(priv, NULL);
> +
> +		diff = ns_to_timespec64(now - tas_data->oper_base_time);
> +
> +		dev_dbg(ds->dev, "Time since TAS started: [%lld.%09ld]\n",
> +			diff.tv_sec, diff.tv_nsec);
> +		break;

I get the feeling that some of the debug statements are leftovers
from development rather than things that could help debug issues.

> +
> +	default:
> +		if (net_ratelimit())
> +			dev_err(ds->dev, "TAS in an invalid state (incorrect use of API)!\n");
> +	}
> +
> +	if (rc && net_ratelimit())
> +		dev_err(ds->dev, "An operation returned %d\n", rc);
> +
> +	mutex_unlock(&ptp_data->lock);
> +}
> +
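The transitions in sja1105_tas_state_machine() can be summarized as a pure function. This is a simplified model only. The enum names and the hw_status readback are hypothetical stand-ins for the driver's state, last_op and the PTPSTRTSCH/PTPSTOPSCH bits. SPI errors and the "already started/disabled" checks are ignored.

```c
#include <assert.h>
#include <stdint.h>

enum tas_state { TAS_DISABLED, TAS_ENABLED_NOT_RUNNING, TAS_RUNNING };
enum ptp_op { PTP_NONE, PTP_CLOCKSTEP, PTP_ADJUSTFREQ };
/* What reading back the PTP control register would report */
enum hw_status { HW_STARTED, HW_STOPPED, HW_UNKNOWN };

static enum tas_state tas_next_state(enum tas_state state,
				     enum ptp_op last_op, int64_t now,
				     int64_t oper_base_time,
				     enum hw_status hw)
{
	/* Anything other than a frequency adjustment counts as a step:
	 * a disabled TAS stays disabled, an enabled one is stopped.
	 */
	if (last_op != PTP_ADJUSTFREQ)
		return TAS_DISABLED;

	switch (state) {
	case TAS_DISABLED:
		/* Program PTPSCHTM and set PTPSTRTSCH */
		return TAS_ENABLED_NOT_RUNNING;
	case TAS_ENABLED_NOT_RUNNING:
		if (now < oper_base_time)
			return state;	/* start time not reached yet */
		/* fall through */
	case TAS_RUNNING:
		if (hw == HW_STARTED)
			return TAS_RUNNING;
		if (hw == HW_STOPPED)
			return TAS_DISABLED;
		return state;	/* unexpected readback: report error, keep state */
	}
	return state;
}
```

In this model the TAS is only ever started in response to a frequency adjustment, never directly after a clock step. This matches the comment that a stepped clock requires stopping and restarting the schedule engine.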
> +void sja1105_tas_clockstep(struct sja1105_private *priv)
> +{
> +	struct sja1105_tas_data *tas_data = &priv->tas_data;
> +
> +	if (!tas_data->enabled)
> +		return;
> +
> +	tas_data->last_op = SJA1105_PTP_CLOCKSTEP;
> +	schedule_work(&tas_data->tas_work);
> +}
> +
> +void sja1105_tas_adjfreq(struct sja1105_private *priv)
> +{
> +	struct sja1105_tas_data *tas_data = &priv->tas_data;
> +
> +	if (!tas_data->enabled)
> +		return;
> +
> +	tas_data->last_op = SJA1105_PTP_ADJUSTFREQ;
> +	schedule_work(&tas_data->tas_work);
> +}
> +
>  void sja1105_tas_setup(struct sja1105_private *priv)
>  {
> +	INIT_WORK(&priv->tas_data.tas_work, sja1105_tas_state_machine);
> +	priv->tas_data.state = SJA1105_TAS_STATE_DISABLED;
> +	priv->tas_data.last_op = SJA1105_PTP_NONE;
>  }
>  
>  void sja1105_tas_teardown(struct sja1105_private *priv)
> @@ -414,6 +822,8 @@ void sja1105_tas_teardown(struct sja1105_private *priv)
>  	struct sja1105_tas_data *tas_data = &priv->tas_data;
>  	int port;
>  
> +	cancel_work_sync(&tas_data->tas_work);
> +

I think you should set 'tas_data->enabled' to false somewhere around
here: I'm wondering if it's possible for a PTP function (via an ioctl() or
something) to call sja1105_tas_clockstep() at the very wrong time and
schedule the work again.

>  	for (port = 0; port < SJA1105_NUM_PORTS; port++)
>  		if (tas_data->config[port])
>  			taprio_free(tas_data->config[port]);
> diff --git a/drivers/net/dsa/sja1105/sja1105_tas.h b/drivers/net/dsa/sja1105/sja1105_tas.h
> index 0ef82810d9d7..ecc95624e3f6 100644
> --- a/drivers/net/dsa/sja1105/sja1105_tas.h
> +++ b/drivers/net/dsa/sja1105/sja1105_tas.h
> @@ -8,8 +8,27 @@
>  
>  #if IS_ENABLED(CONFIG_NET_DSA_SJA1105_TAS)
>  
> +enum sja1105_tas_state {
> +	SJA1105_TAS_STATE_DISABLED,
> +	SJA1105_TAS_STATE_ENABLED_NOT_RUNNING,
> +	SJA1105_TAS_STATE_RUNNING,
> +};
> +
> +enum sja1105_ptp_op {
> +	SJA1105_PTP_NONE,
> +	SJA1105_PTP_CLOCKSTEP,
> +	SJA1105_PTP_ADJUSTFREQ,
> +};
> +
>  struct sja1105_tas_data {
>  	struct tc_taprio_qopt_offload *config[SJA1105_NUM_PORTS];
> +	enum sja1105_tas_state state;
> +	enum sja1105_ptp_op last_op;
> +	struct work_struct tas_work;
> +	s64 earliest_base_time;
> +	s64 oper_base_time;
> +	u64 max_cycle_time;
> +	bool enabled;
>  };
>  
>  int sja1105_setup_tc_taprio(struct dsa_switch *ds, int port,
> @@ -19,6 +38,10 @@ void sja1105_tas_setup(struct sja1105_private *priv);
>  
>  void sja1105_tas_teardown(struct sja1105_private *priv);
>  
> +void sja1105_tas_clockstep(struct sja1105_private *priv);
> +
> +void sja1105_tas_adjfreq(struct sja1105_private *priv);
> +
>  #else
>  
>  /* C doesn't allow empty structures, bah! */
> @@ -37,6 +60,10 @@ static inline void sja1105_tas_setup(struct sja1105_private *priv) { }
>  
>  static inline void sja1105_tas_teardown(struct sja1105_private *priv) { }
>  
> +static inline void sja1105_tas_clockstep(struct sja1105_private *priv) { }
> +
> +static inline void sja1105_tas_adjfreq(struct sja1105_private *priv) { }
> +
>  #endif /* IS_ENABLED(CONFIG_NET_DSA_SJA1105_TAS) */
>  
>  #endif /* _SJA1105_TAS_H */
> -- 
> 2.17.1


* Re: [PATCH v1 net-next 12/15] net: dsa: sja1105: Configure the Time-Aware Scheduler via tc-taprio offload
  2019-09-02 16:25 ` [PATCH v1 net-next 12/15] net: dsa: sja1105: Configure the Time-Aware Scheduler via tc-taprio offload Vladimir Oltean
@ 2019-09-11 19:45   ` Vinicius Costa Gomes
  2019-09-12  1:30     ` Vladimir Oltean
  0 siblings, 1 reply; 33+ messages in thread
From: Vinicius Costa Gomes @ 2019-09-11 19:45 UTC
  To: Vladimir Oltean, f.fainelli, vivien.didelot, andrew, davem,
	vedang.patel, richardcochran
  Cc: weifeng.voon, jiri, m-karicheri2, Jose.Abreu, ilias.apalodimas,
	jhs, xiyou.wangcong, kurt.kanzenbach, netdev, Vladimir Oltean

Hi,

Vladimir Oltean <olteanv@gmail.com> writes:

> This qdisc offload is the closest thing to what the SJA1105 supports in
> hardware for time-based egress shaping. The switch core really is built
> around SAE AS6802/TTEthernet (a TTTech standard) but can be made to
> operate similarly to IEEE 802.1Qbv with some constraints:
>
> - The gate control list is a global list for all ports. There are 8
>   execution threads that iterate through this global list in parallel.
>   I don't know why 8, there are only 4 front-panel ports.
>
> - Care must be taken by the user to make sure that two execution threads
>   never get to execute a GCL entry simultaneously. I created a O(n^4)
>   checker for this hardware limitation, prior to accepting a taprio
>   offload configuration as valid.
>
> - The spec says that if a GCL entry's interval is shorter than the frame
>   length, you shouldn't send it (and end up in head-of-line blocking).
>   Well, this switch does anyway.
>
> - The switch has no concept of ADMIN and OPER configurations. Because
>   it's so simple, the TAS settings are loaded through the static config
>   tables interface, so there isn't even place for any discussion about
>   'graceful switchover between ADMIN and OPER'. You just reset the
>   switch and upload a new OPER config.
>
> - The switch accepts multiple time sources for the gate events. Right
>   now I am using the standalone clock source as opposed to PTP. So the
>   base time parameter doesn't really do much. Support for the PTP clock
>   source will be added in the next patch.
>
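The constraint in the second bullet can be illustrated with a brute-force check. The block below is a simplified sketch, not the O(n^4) checker from this patch, which is not shown here. It assumes that "executing a GCL entry simultaneously" means two ports' entries starting at the same instant. It works in nanoseconds rather than 200 ns deltas, and it checks one hyperperiod (the LCM of both cycle times). struct gcl and gcl_conflict() are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct gcl {
	int64_t base_time;		/* ns, assumed non-negative */
	int64_t cycle_time;		/* ns, sum of intervals */
	const int64_t *intervals;	/* ns, one per entry */
	size_t num_entries;
};

static int64_t gcd64(int64_t a, int64_t b)
{
	while (b) {
		int64_t t = a % b;

		a = b;
		b = t;
	}
	return a;
}

/* True if any entry of x starts at the same instant as any entry of y */
static bool gcl_conflict(const struct gcl *x, const struct gcl *y)
{
	int64_t lcm, ox, oy, kx, ky;
	size_t i, j;

	assert(x->cycle_time > 0 && y->cycle_time > 0);
	lcm = x->cycle_time / gcd64(x->cycle_time, y->cycle_time) *
	      y->cycle_time;

	ox = x->base_time % x->cycle_time;	/* phase of x's first entry */
	for (i = 0; i < x->num_entries; i++) {
		for (kx = 0; kx < lcm; kx += x->cycle_time) {
			int64_t tx = (ox + kx) % lcm;

			oy = y->base_time % y->cycle_time;
			for (j = 0; j < y->num_entries; j++) {
				for (ky = 0; ky < lcm; ky += y->cycle_time)
					if ((oy + ky) % lcm == tx)
						return true;
				oy += y->intervals[j];
			}
		}
		ox += x->intervals[i];
	}
	return false;
}
```

Consider port A with base 0, a 1000 ns cycle and intervals {300, 700}, and port B with base 100, a 500 ns cycle and intervals {200, 300}. Both have an entry starting at 300 ns, so they conflict. Shifting B's base time to 150 ns removes the conflict.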
> Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
> ---
> Changes since RFC:
> - Removed the sja1105_tas_config_work workqueue.
> - Allocating memory with GFP_KERNEL.
> - Made the ASCII art drawing fit in < 80 characters.
> - Made most of the time-holding variables s64 instead of u64 (for fear
>   of them not holding the result of signed arithmetic properly).
>
>  drivers/net/dsa/sja1105/Kconfig        |   8 +
>  drivers/net/dsa/sja1105/Makefile       |   4 +
>  drivers/net/dsa/sja1105/sja1105.h      |   5 +
>  drivers/net/dsa/sja1105/sja1105_main.c |  19 +-
>  drivers/net/dsa/sja1105/sja1105_tas.c  | 420 +++++++++++++++++++++++++
>  drivers/net/dsa/sja1105/sja1105_tas.h  |  42 +++
>  6 files changed, 497 insertions(+), 1 deletion(-)
>  create mode 100644 drivers/net/dsa/sja1105/sja1105_tas.c
>  create mode 100644 drivers/net/dsa/sja1105/sja1105_tas.h
>
> diff --git a/drivers/net/dsa/sja1105/Kconfig b/drivers/net/dsa/sja1105/Kconfig
> index 770134a66e48..55424f39cb0d 100644
> --- a/drivers/net/dsa/sja1105/Kconfig
> +++ b/drivers/net/dsa/sja1105/Kconfig
> @@ -23,3 +23,11 @@ config NET_DSA_SJA1105_PTP
>  	help
>  	  This enables support for timestamping and PTP clock manipulations in
>  	  the SJA1105 DSA driver.
> +
> +config NET_DSA_SJA1105_TAS
> +	bool "Support for the Time-Aware Scheduler on NXP SJA1105"
> +	depends on NET_DSA_SJA1105
> +	help
> +	  This enables support for the TTEthernet-based egress scheduling
> +	  engine in the SJA1105 DSA driver, which is controlled using a
> +	  hardware offload of the tc-taprio qdisc.
> diff --git a/drivers/net/dsa/sja1105/Makefile b/drivers/net/dsa/sja1105/Makefile
> index 4483113e6259..66161e874344 100644
> --- a/drivers/net/dsa/sja1105/Makefile
> +++ b/drivers/net/dsa/sja1105/Makefile
> @@ -12,3 +12,7 @@ sja1105-objs := \
>  ifdef CONFIG_NET_DSA_SJA1105_PTP
>  sja1105-objs += sja1105_ptp.o
>  endif
> +
> +ifdef CONFIG_NET_DSA_SJA1105_TAS
> +sja1105-objs += sja1105_tas.o
> +endif
> diff --git a/drivers/net/dsa/sja1105/sja1105.h b/drivers/net/dsa/sja1105/sja1105.h
> index 3ca0b87aa3e4..d95f9ce3b4f9 100644
> --- a/drivers/net/dsa/sja1105/sja1105.h
> +++ b/drivers/net/dsa/sja1105/sja1105.h
> @@ -21,6 +21,7 @@
>  #define SJA1105_AGEING_TIME_MS(ms)	((ms) / 10)
>  
>  #include "sja1105_ptp.h"
> +#include "sja1105_tas.h"
>  
>  /* Keeps the different addresses between E/T and P/Q/R/S */
>  struct sja1105_regs {
> @@ -96,6 +97,7 @@ struct sja1105_private {
>  	struct mutex mgmt_lock;
>  	struct sja1105_tagger_data tagger_data;
>  	struct sja1105_ptp_data ptp_data;
> +	struct sja1105_tas_data tas_data;
>  };
>  
>  #include "sja1105_dynamic_config.h"
> @@ -111,6 +113,9 @@ typedef enum {
>  	SPI_WRITE = 1,
>  } sja1105_spi_rw_mode_t;
>  
> +/* From sja1105_main.c */
> +int sja1105_static_config_reload(struct sja1105_private *priv);
> +
>  /* From sja1105_spi.c */
>  int sja1105_spi_send_packed_buf(const struct sja1105_private *priv,
>  				sja1105_spi_rw_mode_t rw, u64 reg_addr,
> diff --git a/drivers/net/dsa/sja1105/sja1105_main.c b/drivers/net/dsa/sja1105/sja1105_main.c
> index 8b930cc2dabc..4b393782cc84 100644
> --- a/drivers/net/dsa/sja1105/sja1105_main.c
> +++ b/drivers/net/dsa/sja1105/sja1105_main.c
> @@ -22,6 +22,7 @@
>  #include <linux/if_ether.h>
>  #include <linux/dsa/8021q.h>
>  #include "sja1105.h"
> +#include "sja1105_tas.h"
>  
>  static void sja1105_hw_reset(struct gpio_desc *gpio, unsigned int pulse_len,
>  			     unsigned int startup_delay)
> @@ -1382,7 +1383,7 @@ static void sja1105_bridge_leave(struct dsa_switch *ds, int port,
>   * modify at runtime (currently only MAC) and restore them after uploading,
>   * such that this operation is relatively seamless.
>   */
> -static int sja1105_static_config_reload(struct sja1105_private *priv)
> +int sja1105_static_config_reload(struct sja1105_private *priv)
>  {
>  	struct ptp_system_timestamp ptp_sts_before;
>  	struct ptp_system_timestamp ptp_sts_after;
> @@ -1761,6 +1762,7 @@ static void sja1105_teardown(struct dsa_switch *ds)
>  {
>  	struct sja1105_private *priv = ds->priv;
>  
> +	sja1105_tas_teardown(priv);
>  	cancel_work_sync(&priv->tagger_data.rxtstamp_work);
>  	skb_queue_purge(&priv->tagger_data.skb_rxtstamp_queue);
>  	sja1105_ptp_clock_unregister(priv);
> @@ -2088,6 +2090,18 @@ static bool sja1105_port_txtstamp(struct dsa_switch *ds, int port,
>  	return true;
>  }
>  
> +static int sja1105_port_setup_tc(struct dsa_switch *ds, int port,
> +				 enum tc_setup_type type,
> +				 void *type_data)
> +{
> +	switch (type) {
> +	case TC_SETUP_QDISC_TAPRIO:
> +		return sja1105_setup_tc_taprio(ds, port, type_data);
> +	default:
> +		return -EOPNOTSUPP;
> +	}
> +}
> +
>  static const struct dsa_switch_ops sja1105_switch_ops = {
>  	.get_tag_protocol	= sja1105_get_tag_protocol,
>  	.setup			= sja1105_setup,
> @@ -2120,6 +2134,7 @@ static const struct dsa_switch_ops sja1105_switch_ops = {
>  	.port_hwtstamp_set	= sja1105_hwtstamp_set,
>  	.port_rxtstamp		= sja1105_port_rxtstamp,
>  	.port_txtstamp		= sja1105_port_txtstamp,
> +	.port_setup_tc		= sja1105_port_setup_tc,
>  };
>  
>  static int sja1105_check_device_id(struct sja1105_private *priv)
> @@ -2229,6 +2244,8 @@ static int sja1105_probe(struct spi_device *spi)
>  	}
>  	mutex_init(&priv->mgmt_lock);
>  
> +	sja1105_tas_setup(priv);
> +
>  	return dsa_register_switch(priv->ds);
>  }
>  
> diff --git a/drivers/net/dsa/sja1105/sja1105_tas.c b/drivers/net/dsa/sja1105/sja1105_tas.c
> new file mode 100644
> index 000000000000..769e1d8e5e8f
> --- /dev/null
> +++ b/drivers/net/dsa/sja1105/sja1105_tas.c
> @@ -0,0 +1,420 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/* Copyright (c) 2019, Vladimir Oltean <olteanv@gmail.com>
> + */
> +#include "sja1105.h"
> +
> +#define SJA1105_TAS_CLKSRC_DISABLED	0
> +#define SJA1105_TAS_CLKSRC_STANDALONE	1
> +#define SJA1105_TAS_CLKSRC_AS6802	2
> +#define SJA1105_TAS_CLKSRC_PTP		3
> +#define SJA1105_GATE_MASK		GENMASK_ULL(SJA1105_NUM_TC - 1, 0)
> +#define SJA1105_TAS_MAX_DELTA		BIT(19)
> +
> +/* This is not a preprocessor macro because the "ns" argument may or may not be
> + * s64 at caller side. This ensures it is properly type-cast before div_s64.
> + */
> +static s64 ns_to_sja1105_delta(s64 ns)
> +{
> +	return div_s64(ns, 200);
> +}
> +
> +/* Lo and behold: the egress scheduler from hell.
> + *
> + * At the hardware level, the Time-Aware Shaper holds a global linear array of
> + * all schedule entries for all ports. These are the Gate Control List (GCL)
> + * entries, let's call them "timeslots" for short. This linear array of
> + * timeslots is held in BLK_IDX_SCHEDULE.
> + *
> + * Then there are a maximum of 8 "execution threads" inside the switch, which
> + * iterate cyclically through the "schedule". Each "cycle" has an entry point
> + * and an exit point, both being timeslot indices in the schedule table. The
> + * hardware calls each cycle a "subschedule".
> + *
> + * Subschedule (cycle) i starts when
> + *   ptpclkval >= ptpschtm + BLK_IDX_SCHEDULE_ENTRY_POINTS[i].delta.
> + *
> + * The hardware scheduler iterates BLK_IDX_SCHEDULE with a k ranging from
> + *   k = BLK_IDX_SCHEDULE_ENTRY_POINTS[i].address to
> + *   k = BLK_IDX_SCHEDULE_PARAMS.subscheind[i]
> + *
> + * For each schedule entry (timeslot) k, the engine executes the gate control
> + * list entry for the duration of BLK_IDX_SCHEDULE[k].delta.
> + *
> + *         +---------+
> + *         |         | BLK_IDX_SCHEDULE_ENTRY_POINTS_PARAMS
> + *         +---------+
> + *              |
> + *              +-----------------+
> + *                                | .actsubsch
> + *  BLK_IDX_SCHEDULE_ENTRY_POINTS v
> + *                 +-------+-------+
> + *                 |cycle 0|cycle 1|
> + *                 +-------+-------+
> + *                   |  |      |  |
> + *  +----------------+  |      |  +-------------------------------------+
> + *  |   .subschindx     |      |             .subschindx                |
> + *  |                   |      +---------------+                        |
> + *  |          .address |        .address      |                        |
> + *  |                   |                      |                        |
> + *  |                   |                      |                        |
> + *  |  BLK_IDX_SCHEDULE v                      v                        |
> + *  |              +-------+-------+-------+-------+-------+------+     |
> + *  |              |entry 0|entry 1|entry 2|entry 3|entry 4|entry5|     |
> + *  |              +-------+-------+-------+-------+-------+------+     |
> + *  |                                  ^                    ^  ^  ^     |
> + *  |                                  |                    |  |  |     |
> + *  |        +-------------------------+                    |  |  |     |
> + *  |        |              +-------------------------------+  |  |     |
> + *  |        |              |              +-------------------+  |     |
> + *  |        |              |              |                      |     |
> + *  | +---------------------------------------------------------------+ |
> + *  | |subscheind[0]<=subscheind[1]<=subscheind[2]<=...<=subscheind[7]| |
> + *  | +---------------------------------------------------------------+ |
> + *  |        ^              ^                BLK_IDX_SCHEDULE_PARAMS    |
> + *  |        |              |                                           |
> + *  +--------+              +-------------------------------------------+
> + *
> + *  In the above picture there are two subschedules (cycles):
> + *
> + *  - cycle 0: iterates the schedule table from 0 to 2 (and back)
> + *  - cycle 1: iterates the schedule table from 3 to 5 (and back)
> + *
> + *  All other possible execution threads must be marked as unused by making
> + *  their "subschedule end index" (subscheind) equal to the last valid
> + *  subschedule's end index (in this case 5).
> + */
> +static int sja1105_init_scheduling(struct sja1105_private *priv)
> +{
> +	struct sja1105_schedule_entry_points_entry *schedule_entry_points;
> +	struct sja1105_schedule_entry_points_params_entry
> +					*schedule_entry_points_params;
> +	struct sja1105_schedule_params_entry *schedule_params;
> +	struct sja1105_tas_data *tas_data = &priv->tas_data;
> +	struct sja1105_schedule_entry *schedule;
> +	struct sja1105_table *table;
> +	int subscheind[8] = {0};
> +	int schedule_start_idx;
> +	s64 entry_point_delta;
> +	int schedule_end_idx;
> +	int num_entries = 0;
> +	int num_cycles = 0;
> +	int cycle = 0;
> +	int i, k = 0;
> +	int port;
> +
> +	/* Discard previous Schedule Table */
> +	table = &priv->static_config.tables[BLK_IDX_SCHEDULE];
> +	if (table->entry_count) {
> +		kfree(table->entries);
> +		table->entry_count = 0;
> +	}
> +
> +	/* Discard previous Schedule Entry Points Parameters Table */
> +	table = &priv->static_config.tables[BLK_IDX_SCHEDULE_ENTRY_POINTS_PARAMS];
> +	if (table->entry_count) {
> +		kfree(table->entries);
> +		table->entry_count = 0;
> +	}
> +
> +	/* Discard previous Schedule Parameters Table */
> +	table = &priv->static_config.tables[BLK_IDX_SCHEDULE_PARAMS];
> +	if (table->entry_count) {
> +		kfree(table->entries);
> +		table->entry_count = 0;
> +	}
> +
> +	/* Discard previous Schedule Entry Points Table */
> +	table = &priv->static_config.tables[BLK_IDX_SCHEDULE_ENTRY_POINTS];
> +	if (table->entry_count) {
> +		kfree(table->entries);
> +		table->entry_count = 0;
> +	}
> +
> +	/* Figure out the dimensioning of the problem */
> +	for (port = 0; port < SJA1105_NUM_PORTS; port++) {
> +		if (tas_data->config[port]) {
> +			num_entries += tas_data->config[port]->num_entries;
> +			num_cycles++;
> +		}
> +	}
> +
> +	/* Nothing to do */
> +	if (!num_cycles)
> +		return 0;
> +
> +	/* Pre-allocate space in the static config tables */
> +
> +	/* Schedule Table */
> +	table = &priv->static_config.tables[BLK_IDX_SCHEDULE];
> +	table->entries = kcalloc(num_entries, table->ops->unpacked_entry_size,
> +				 GFP_KERNEL);
> +	if (!table->entries)
> +		return -ENOMEM;
> +	table->entry_count = num_entries;
> +	schedule = table->entries;
> +
> +	/* Schedule Points Parameters Table */
> +	table = &priv->static_config.tables[BLK_IDX_SCHEDULE_ENTRY_POINTS_PARAMS];
> +	table->entries = kcalloc(SJA1105_MAX_SCHEDULE_ENTRY_POINTS_PARAMS_COUNT,
> +				 table->ops->unpacked_entry_size, GFP_KERNEL);
> +	if (!table->entries)
> +		return -ENOMEM;

Should this free the previous allocation, in case this one fails?
(also applies to the statements below)

> +	table->entry_count = SJA1105_MAX_SCHEDULE_ENTRY_POINTS_PARAMS_COUNT;
> +	schedule_entry_points_params = table->entries;
> +
> +	/* Schedule Parameters Table */
> +	table = &priv->static_config.tables[BLK_IDX_SCHEDULE_PARAMS];
> +	table->entries = kcalloc(SJA1105_MAX_SCHEDULE_PARAMS_COUNT,
> +				 table->ops->unpacked_entry_size, GFP_KERNEL);
> +	if (!table->entries)
> +		return -ENOMEM;
> +	table->entry_count = SJA1105_MAX_SCHEDULE_PARAMS_COUNT;
> +	schedule_params = table->entries;
> +
> +	/* Schedule Entry Points Table */
> +	table = &priv->static_config.tables[BLK_IDX_SCHEDULE_ENTRY_POINTS];
> +	table->entries = kcalloc(num_cycles, table->ops->unpacked_entry_size,
> +				 GFP_KERNEL);
> +	if (!table->entries)
> +		return -ENOMEM;
> +	table->entry_count = num_cycles;
> +	schedule_entry_points = table->entries;
> +
> +	/* Finally start populating the static config tables */
> +	schedule_entry_points_params->clksrc = SJA1105_TAS_CLKSRC_STANDALONE;
> +	schedule_entry_points_params->actsubsch = num_cycles - 1;
> +
> +	for (port = 0; port < SJA1105_NUM_PORTS; port++) {
> +		const struct tc_taprio_qopt_offload *tas_config;
> +
> +		tas_config = tas_data->config[port];
> +		if (!tas_config)
> +			continue;
> +
> +		schedule_start_idx = k;
> +		schedule_end_idx = k + tas_config->num_entries - 1;
> +		/* TODO this is only a relative base time for the subschedule
> +		 * (relative to PTPSCHTM). But as we're using standalone and
> +		 * not PTP clock as time reference, leave it like this for now.
> +		 * Later we'll have to enforce that all ports' base times are
> +		 * within SJA1105_TAS_MAX_DELTA 200ns cycles of one another.
> +		 */
> +		entry_point_delta = ns_to_sja1105_delta(tas_config->base_time);
> +
> +		schedule_entry_points[cycle].subschindx = cycle;
> +		schedule_entry_points[cycle].delta = entry_point_delta;
> +		schedule_entry_points[cycle].address = schedule_start_idx;
> +
> +		for (i = cycle; i < 8; i++)
> +			subscheind[i] = schedule_end_idx;
> +
> +		for (i = 0; i < tas_config->num_entries; i++, k++) {
> +			s64 delta_ns = tas_config->entries[i].interval;
> +
> +			schedule[k].delta = ns_to_sja1105_delta(delta_ns);
> +			schedule[k].destports = BIT(port);
> +			schedule[k].resmedia_en = true;
> +			schedule[k].resmedia = SJA1105_GATE_MASK &
> +					~tas_config->entries[i].gate_mask;
> +		}
> +		cycle++;
> +	}
> +
> +	for (i = 0; i < 8; i++)
> +		schedule_params->subscheind[i] = subscheind[i];
> +
> +	return 0;
> +}
> +
> +/* Suppose there are 2 port subschedules, each cyclically executing an
> + * arbitrary number of gate open/close events.
> + * None of those gate events must ever occur at the exact same time, otherwise
> + * the switch is known to act in exotically strange ways.
> + * However the hardware doesn't bother performing these integrity checks - the
> + * designers probably said "nah, let's leave that to the experts" - oh well,
> + * now we're the experts.
> + * So here we are with the task of validating whether the new @qopt has any
> + * conflict with the already established TAS configuration in tas_data->config.
> + * We already know the other ports are in harmony with one another, otherwise
> + * we wouldn't have saved them.
> + * Each gate event executes periodically, with a period of @cycle_time and a
> + * phase given by its cycle's @base_time plus its offset within the cycle
> + * (which in turn is given by the length of the events prior to it).
> + * There are two aspects to possible collisions:
> + * - Collisions within one cycle's (actually the longest cycle's) time frame.
> + *   For that, we need to compare every pair in the Cartesian product of
> + *   the occurrences of each event within one cycle time.
> + * - Collisions in the future. Events may not collide within one cycle time,
> + *   but if two port schedules don't have the same periodicity (i.e. the
> + *   cycle times aren't multiples of one another), they can collide some
> + *   time in the future (and if they do, they collide an infinite number
> + *   of times).
> + */
> +static bool
> +sja1105_tas_check_conflicts(struct sja1105_private *priv,
> +			    const struct tc_taprio_qopt_offload *qopt)
> +{
> +	struct sja1105_tas_data *tas_data = &priv->tas_data;
> +	int port;
> +
> +	for (port = 0; port < SJA1105_NUM_PORTS; port++) {
> +		const struct tc_taprio_qopt_offload *tas_config;
> +		s64 max_cycle_time, min_cycle_time;
> +		s64 delta1, delta2;
> +		s64 rbt1, rbt2;
> +		s64 stop_time;
> +		s64 t1, t2;
> +		int i, j;
> +		s32 rem;
> +
> +		tas_config = tas_data->config[port];
> +
> +		if (!tas_config)
> +			continue;
> +
> +		/* Check if the two cycle times are multiples of one another.
> +		 * If they aren't, they can collide at some point, so treat that
> +		 * as a conflict.
> +		 */
> +		max_cycle_time = max(tas_config->cycle_time, qopt->cycle_time);
> +		min_cycle_time = min(tas_config->cycle_time, qopt->cycle_time);
> +		div_s64_rem(max_cycle_time, min_cycle_time, &rem);
> +		if (rem)
> +			return true;
> +
> +		/* Calculate the "reduced" base time of each of the two cycles
> +		 * (shifted back as close to 0 as possible) as the remainder of
> +		 * its division by the cycle time.
> +		 */
> +		div_s64_rem(tas_config->base_time, tas_config->cycle_time,
> +			    &rem);
> +		rbt1 = rem;
> +
> +		div_s64_rem(qopt->base_time, qopt->cycle_time, &rem);
> +		rbt2 = rem;
> +
> +		stop_time = max_cycle_time + max(rbt1, rbt2);
> +
> +		/* delta1 is the relative base time of each GCL entry within
> +		 * the established ports' TAS config.
> +		 */
> +		for (i = 0, delta1 = 0;
> +		     i < tas_config->num_entries;
> +		     delta1 += tas_config->entries[i].interval, i++) {
> +
> +			/* delta2 is the relative base time of each GCL entry
> +			 * within the newly added TAS config.
> +			 */
> +			for (j = 0, delta2 = 0;
> +			     j < qopt->num_entries;
> +			     delta2 += qopt->entries[j].interval, j++) {
> +
> +				/* t1 follows all possible occurrences of the
> +				 * established ports' GCL entry i within the
> +				 * first cycle time.
> +				 */
> +				for (t1 = rbt1 + delta1;
> +				     t1 <= stop_time;
> +				     t1 += tas_config->cycle_time) {
> +
> +					/* t2 follows all possible occurrences
> +					 * of the newly added GCL entry j
> +					 * within the first cycle time.
> +					 */
> +					for (t2 = rbt2 + delta2;
> +					     t2 <= stop_time;
> +					     t2 += qopt->cycle_time) {
> +
> +						if (t1 == t2) {
> +							dev_warn(priv->ds->dev,
> +								 "GCL entry %d collides with entry %d of port %d\n",
> +								 j, i, port);
> +							return true;
> +						}
> +					}
> +				}
> +			}
> +		}
> +	}
> +
> +	return false;
> +}
> +
> +int sja1105_setup_tc_taprio(struct dsa_switch *ds, int port,
> +			    struct tc_taprio_qopt_offload *tas_config)
> +{
> +	struct sja1105_private *priv = ds->priv;
> +	struct sja1105_tas_data *tas_data = &priv->tas_data;
> +	int rc, i;
> +
> +	/* Can't change an already configured port (must delete qdisc first).
> +	 * Can't delete the qdisc from an unconfigured port.
> +	 */
> +	if (!!tas_data->config[port] == tas_config->enable)
> +		return -EINVAL;
> +
> +	if (!tas_config->enable) {
> +		taprio_free(tas_data->config[port]);
> +		tas_data->config[port] = NULL;
> +
> +		rc = sja1105_init_scheduling(priv);
> +		if (rc < 0)
> +			return rc;
> +
> +		return sja1105_static_config_reload(priv);
> +	}
> +
> +	/* The cycle time extension is the amount of time the last cycle from
> +	 * the old OPER needs to be extended in order to phase-align with the
> +	 * base time of the ADMIN when that becomes the new OPER.
> +	 * But of course our switch needs to be reset to switch-over between
> +	 * the ADMIN and the OPER configs - so much for a seamless transition.
> +	 * So rather than add insult to injury, just say we don't support
> +	 * cycle time extension.
> +	 */
> +	if (tas_config->cycle_time_extension)
> +		return -EOPNOTSUPP;
> +
> +	if (!ns_to_sja1105_delta(tas_config->base_time)) {
> +		dev_err(ds->dev, "A base time of zero is not hardware-allowed\n");
> +		return -ERANGE;
> +	}
> +
> +	for (i = 0; i < tas_config->num_entries; i++) {
> +		s64 delta_ns = tas_config->entries[i].interval;
> +		s64 delta_cycles = ns_to_sja1105_delta(delta_ns);
> +		bool too_long, too_short;
> +
> +		too_long = (delta_cycles >= SJA1105_TAS_MAX_DELTA);
> +		too_short = (delta_cycles == 0);
> +		if (too_long || too_short) {
> +			dev_err(priv->ds->dev,
> +				"Interval %lld too %s for GCL entry %d\n",
> +				delta_ns, too_long ? "long" : "short", i);
> +			return -ERANGE;
> +		}
> +	}
> +
> +	if (sja1105_tas_check_conflicts(priv, tas_config))
> +		return -ERANGE;
> +
> +	tas_data->config[port] = taprio_get(tas_config);
> +
> +	rc = sja1105_init_scheduling(priv);
> +	if (rc < 0)
> +		return rc;
> +
> +	return sja1105_static_config_reload(priv);
> +}
> +
> +void sja1105_tas_setup(struct sja1105_private *priv)
> +{
> +}
> +
> +void sja1105_tas_teardown(struct sja1105_private *priv)
> +{
> +	struct sja1105_tas_data *tas_data = &priv->tas_data;
> +	int port;
> +
> +	for (port = 0; port < SJA1105_NUM_PORTS; port++)
> +		if (tas_data->config[port])
> +			taprio_free(tas_data->config[port]);
> +}
> diff --git a/drivers/net/dsa/sja1105/sja1105_tas.h b/drivers/net/dsa/sja1105/sja1105_tas.h
> new file mode 100644
> index 000000000000..0ef82810d9d7
> --- /dev/null
> +++ b/drivers/net/dsa/sja1105/sja1105_tas.h
> @@ -0,0 +1,42 @@
> +/* SPDX-License-Identifier: GPL-2.0
> + * Copyright (c) 2019, Vladimir Oltean <olteanv@gmail.com>
> + */
> +#ifndef _SJA1105_TAS_H
> +#define _SJA1105_TAS_H
> +
> +#include <net/pkt_sched.h>
> +
> +#if IS_ENABLED(CONFIG_NET_DSA_SJA1105_TAS)
> +
> +struct sja1105_tas_data {
> +	struct tc_taprio_qopt_offload *config[SJA1105_NUM_PORTS];
> +};
> +
> +int sja1105_setup_tc_taprio(struct dsa_switch *ds, int port,
> +			    struct tc_taprio_qopt_offload *qopt);
> +
> +void sja1105_tas_setup(struct sja1105_private *priv);
> +
> +void sja1105_tas_teardown(struct sja1105_private *priv);
> +
> +#else
> +
> +/* C doesn't allow empty structures, bah! */
> +struct sja1105_tas_data {
> +	u8 dummy;
> +};
> +
> +static inline int
> +sja1105_setup_tc_taprio(struct dsa_switch *ds, int port,
> +			struct tc_taprio_qopt_offload *qopt)
> +{
> +	return -EOPNOTSUPP;
> +}
> +
> +static inline void sja1105_tas_setup(struct sja1105_private *priv) { }
> +
> +static inline void sja1105_tas_teardown(struct sja1105_private *priv) { }
> +
> +#endif /* IS_ENABLED(CONFIG_NET_DSA_SJA1105_TAS) */
> +
> +#endif /* _SJA1105_TAS_H */
> -- 
> 2.17.1
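For readers following sja1105_init_scheduling(), the index bookkeeping can be reproduced outside the kernel. The following is a minimal user-space sketch, not driver code: struct sched_layout, layout_schedule() and NUM_PORTS = 5 are assumptions made for illustration. Each configured port gets a contiguous run of timeslots, the start of the run becomes the cycle's entry point address, and the end of the run is copied into every subscheind[] slot from the current cycle onwards, so unused execution threads end up pointing at the last valid index, as the ASCII diagram describes.

```c
#include <assert.h>

#define NUM_PORTS	5	/* assumption: ports modelled by the sketch */
#define NUM_SUBSCHED	8	/* execution threads ("subschedules") */

/* Hypothetical, simplified view of the scheduling tables being filled in */
struct sched_layout {
	int num_cycles;
	int address[NUM_SUBSCHED];	/* first timeslot of each cycle */
	int subscheind[NUM_SUBSCHED];	/* last timeslot of each cycle */
	int actsubsch;			/* index of the last active cycle */
};

/* num_entries[port] is the GCL length of that port, 0 if unconfigured */
static void layout_schedule(const int num_entries[NUM_PORTS],
			    struct sched_layout *l)
{
	int port, i, k = 0, cycle = 0;

	for (i = 0; i < NUM_SUBSCHED; i++) {
		l->address[i] = 0;
		l->subscheind[i] = 0;
	}

	for (port = 0; port < NUM_PORTS; port++) {
		int end_idx;

		if (!num_entries[port])
			continue;

		assert(cycle < NUM_SUBSCHED);
		end_idx = k + num_entries[port] - 1;
		l->address[cycle] = k;

		/* This cycle and all later (possibly unused) threads end here;
		 * a later port overwrites its own slot and the ones after it.
		 */
		for (i = cycle; i < NUM_SUBSCHED; i++)
			l->subscheind[i] = end_idx;

		k = end_idx + 1;
		cycle++;
	}

	l->num_cycles = cycle;
	l->actsubsch = cycle - 1;
}
```

With two ports of three GCL entries each (ports 0 and 2 here), this yields entry point addresses 0 and 3, subscheind = {2, 5, 5, 5, 5, 5, 5, 5} and actsubsch = 1, matching the picture. Unlike the driver, which returns early when no port is configured, the sketch simply leaves actsubsch at -1 in that case.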

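The brute-force search in sja1105_tas_check_conflicts() can likewise be transcribed to user space for experimentation. In this sketch struct gcl and gcl_conflicts() are hypothetical stand-ins for tc_taprio_qopt_offload and the driver function, C's % operator replaces div_s64_rem(), and base times are assumed to be non-negative. It only mirrors the loop structure of the patch (cycle-time divisibility check, reduced base times, then a pairwise comparison of event occurrences up to stop_time); it makes no claim that this window catches every possible collision.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for struct tc_taprio_qopt_offload */
struct gcl {
	int64_t base_time;		/* ns, assumed >= 0 */
	int64_t cycle_time;		/* ns, must be > 0 */
	int num_entries;
	const int64_t *interval;	/* ns, one per GCL entry */
};

static int64_t max64(int64_t a, int64_t b) { return a > b ? a : b; }
static int64_t min64(int64_t a, int64_t b) { return a < b ? a : b; }

/* Returns true if a gate event of @a coincides with one of @b */
static bool gcl_conflicts(const struct gcl *a, const struct gcl *b)
{
	int64_t max_ct, min_ct, rbt1, rbt2, stop_time, d1, d2, t1, t2;
	int i, j;

	assert(a->cycle_time > 0 && b->cycle_time > 0);

	/* Cycle times that aren't multiples of each other: report conflict */
	max_ct = max64(a->cycle_time, b->cycle_time);
	min_ct = min64(a->cycle_time, b->cycle_time);
	if (max_ct % min_ct)
		return true;

	/* Reduced base times: phase of each schedule within its own cycle */
	rbt1 = a->base_time % a->cycle_time;
	rbt2 = b->base_time % b->cycle_time;
	stop_time = max_ct + max64(rbt1, rbt2);

	/* d1/d2: offset of GCL entry i/j from the start of its cycle */
	for (i = 0, d1 = 0; i < a->num_entries; d1 += a->interval[i], i++) {
		for (j = 0, d2 = 0; j < b->num_entries;
		     d2 += b->interval[j], j++) {
			/* Compare every occurrence of entry i with every
			 * occurrence of entry j up to stop_time.
			 */
			for (t1 = rbt1 + d1; t1 <= stop_time;
			     t1 += a->cycle_time) {
				for (t2 = rbt2 + d2; t2 <= stop_time;
				     t2 += b->cycle_time) {
					if (t1 == t2)
						return true;
				}
			}
		}
	}

	return false;
}
```

The four nested loops (entries of each schedule, times occurrences of each entry) are presumably where the O(n^4) figure quoted in the commit message comes from.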

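The range checks in sja1105_setup_tc_taprio() follow from ns_to_sja1105_delta(), which truncates nanoseconds to 200 ns hardware ticks. A small user-space sketch of the same arithmetic (ns_to_delta() and check_interval() are hypothetical names; the 200 ns divisor and the BIT(19) limit are taken from the patch) makes the resulting bounds concrete: an interval below 200 ns truncates to zero ticks and is rejected as too short, while anything from 524288 * 200 ns = 104857600 ns upwards is rejected as too long.

```c
#include <assert.h>
#include <stdint.h>

#define TAS_TICK_NS	200		/* divisor used by ns_to_sja1105_delta() */
#define TAS_MAX_DELTA	(1LL << 19)	/* SJA1105_TAS_MAX_DELTA, i.e. BIT(19) */

/* Same arithmetic as ns_to_sja1105_delta(): division truncating to zero */
static int64_t ns_to_delta(int64_t ns)
{
	return ns / TAS_TICK_NS;
}

/* Hypothetical helper mirroring the per-entry check in
 * sja1105_setup_tc_taprio(): 0 if accepted, -1 if too short or too long.
 */
static int check_interval(int64_t interval_ns)
{
	int64_t delta;

	/* taprio intervals are unsigned, so negative values are not expected */
	assert(interval_ns >= 0);

	delta = ns_to_delta(interval_ns);
	if (delta == 0)			/* under one 200 ns tick: too short */
		return -1;
	if (delta >= TAS_MAX_DELTA)	/* rejected by the patch as too long */
		return -1;
	return 0;
}
```

The same truncation explains the base time check: any base time below 200 ns converts to a delta of zero and is refused with "A base time of zero is not hardware-allowed".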
* Re: [PATCH v1 net-next 12/15] net: dsa: sja1105: Configure the Time-Aware Scheduler via tc-taprio offload
  2019-09-11 19:45   ` Vinicius Costa Gomes
@ 2019-09-12  1:30     ` Vladimir Oltean
  0 siblings, 0 replies; 33+ messages in thread
From: Vladimir Oltean @ 2019-09-12  1:30 UTC (permalink / raw)
  To: Vinicius Costa Gomes
  Cc: f.fainelli, vivien.didelot, andrew, davem, vedang.patel,
	richardcochran, weifeng.voon, jiri, m-karicheri2, Jose.Abreu,
	ilias.apalodimas, jhs, xiyou.wangcong, kurt.kanzenbach, netdev

Hi Vinicius,

On 11/09/2019, Vinicius Costa Gomes <vinicius.gomes@intel.com> wrote:
> Hi,
>
> Vladimir Oltean <olteanv@gmail.com> writes:
>
>> This qdisc offload is the closest thing to what the SJA1105 supports in
>> hardware for time-based egress shaping. The switch core really is built
>> around SAE AS6802/TTEthernet (a TTTech standard) but can be made to
>> operate similarly to IEEE 802.1Qbv with some constraints:
>>
>> - The gate control list is a global list for all ports. There are 8
>>   execution threads that iterate through this global list in parallel.
>>   I don't know why 8, there are only 4 front-panel ports.
>>
>> - Care must be taken by the user to make sure that two execution threads
>>   never get to execute a GCL entry simultaneously. I created an O(n^4)
>>   checker for this hardware limitation, prior to accepting a taprio
>>   offload configuration as valid.
>>
>> - The spec says that if a GCL entry's interval is shorter than the frame
>>   length, you shouldn't send it (and end up in head-of-line blocking).
>>   Well, this switch does anyway.
>>
>> - The switch has no concept of ADMIN and OPER configurations. Because
>>   it's so simple, the TAS settings are loaded through the static config
>>   tables interface, so there isn't even room for any discussion about
>>   'graceful switchover between ADMIN and OPER'. You just reset the
>>   switch and upload a new OPER config.
>>
>> - The switch accepts multiple time sources for the gate events. Right
>>   now I am using the standalone clock source as opposed to PTP. So the
>>   base time parameter doesn't really do much. Support for the PTP clock
>>   source will be added in the next patch.
>>
>> Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
>> ---
>> Changes since RFC:
>> - Removed the sja1105_tas_config_work workqueue.
>> - Allocating memory with GFP_KERNEL.
>> - Made the ASCII art drawing fit in < 80 characters.
>> - Made most of the time-holding variables s64 instead of u64 (for fear
>>   of them not holding the result of signed arithmetic properly).
>>
>>  drivers/net/dsa/sja1105/Kconfig        |   8 +
>>  drivers/net/dsa/sja1105/Makefile       |   4 +
>>  drivers/net/dsa/sja1105/sja1105.h      |   5 +
>>  drivers/net/dsa/sja1105/sja1105_main.c |  19 +-
>>  drivers/net/dsa/sja1105/sja1105_tas.c  | 420 +++++++++++++++++++++++++
>>  drivers/net/dsa/sja1105/sja1105_tas.h  |  42 +++
>>  6 files changed, 497 insertions(+), 1 deletion(-)
>>  create mode 100644 drivers/net/dsa/sja1105/sja1105_tas.c
>>  create mode 100644 drivers/net/dsa/sja1105/sja1105_tas.h
>>
>> diff --git a/drivers/net/dsa/sja1105/Kconfig b/drivers/net/dsa/sja1105/Kconfig
>> index 770134a66e48..55424f39cb0d 100644
>> --- a/drivers/net/dsa/sja1105/Kconfig
>> +++ b/drivers/net/dsa/sja1105/Kconfig
>> @@ -23,3 +23,11 @@ config NET_DSA_SJA1105_PTP
>>  	help
>>  	  This enables support for timestamping and PTP clock manipulations in
>>  	  the SJA1105 DSA driver.
>> +
>> +config NET_DSA_SJA1105_TAS
>> +	bool "Support for the Time-Aware Scheduler on NXP SJA1105"
>> +	depends on NET_DSA_SJA1105
>> +	help
>> +	  This enables support for the TTEthernet-based egress scheduling
>> +	  engine in the SJA1105 DSA driver, which is controlled using a
>> +	  hardware offload of the tc-taprio qdisc.
>> diff --git a/drivers/net/dsa/sja1105/Makefile b/drivers/net/dsa/sja1105/Makefile
>> index 4483113e6259..66161e874344 100644
>> --- a/drivers/net/dsa/sja1105/Makefile
>> +++ b/drivers/net/dsa/sja1105/Makefile
>> @@ -12,3 +12,7 @@ sja1105-objs := \
>>  ifdef CONFIG_NET_DSA_SJA1105_PTP
>>  sja1105-objs += sja1105_ptp.o
>>  endif
>> +
>> +ifdef CONFIG_NET_DSA_SJA1105_TAS
>> +sja1105-objs += sja1105_tas.o
>> +endif
>> diff --git a/drivers/net/dsa/sja1105/sja1105.h b/drivers/net/dsa/sja1105/sja1105.h
>> index 3ca0b87aa3e4..d95f9ce3b4f9 100644
>> --- a/drivers/net/dsa/sja1105/sja1105.h
>> +++ b/drivers/net/dsa/sja1105/sja1105.h
>> @@ -21,6 +21,7 @@
>>  #define SJA1105_AGEING_TIME_MS(ms)	((ms) / 10)
>>
>>  #include "sja1105_ptp.h"
>> +#include "sja1105_tas.h"
>>
>>  /* Keeps the different addresses between E/T and P/Q/R/S */
>>  struct sja1105_regs {
>> @@ -96,6 +97,7 @@ struct sja1105_private {
>>  	struct mutex mgmt_lock;
>>  	struct sja1105_tagger_data tagger_data;
>>  	struct sja1105_ptp_data ptp_data;
>> +	struct sja1105_tas_data tas_data;
>>  };
>>
>>  #include "sja1105_dynamic_config.h"
>> @@ -111,6 +113,9 @@ typedef enum {
>>  	SPI_WRITE = 1,
>>  } sja1105_spi_rw_mode_t;
>>
>> +/* From sja1105_main.c */
>> +int sja1105_static_config_reload(struct sja1105_private *priv);
>> +
>>  /* From sja1105_spi.c */
>>  int sja1105_spi_send_packed_buf(const struct sja1105_private *priv,
>>  				sja1105_spi_rw_mode_t rw, u64 reg_addr,
>> diff --git a/drivers/net/dsa/sja1105/sja1105_tas.c b/drivers/net/dsa/sja1105/sja1105_tas.c
>> new file mode 100644
>> index 000000000000..769e1d8e5e8f
>> --- /dev/null
>> +++ b/drivers/net/dsa/sja1105/sja1105_tas.c
>> @@ -0,0 +1,420 @@
>> [...]
>> +	/* Schedule Points Parameters Table */
>> +	table = &priv->static_config.tables[BLK_IDX_SCHEDULE_ENTRY_POINTS_PARAMS];
>> +	table->entries = kcalloc(SJA1105_MAX_SCHEDULE_ENTRY_POINTS_PARAMS_COUNT,
>> +				 table->ops->unpacked_entry_size, GFP_KERNEL);
>> +	if (!table->entries)
>> +		return -ENOMEM;
>
> Should this free the previous allocation, in case this one fails?
> (also applies to the statements below)
>

I had to take another look at the overall driver code, since it's been a
while since I added it and I couldn't remember exactly.
All memory is freed automagically by sja1105_static_config_free() in
sja1105_static_config.c. That simplifies the driver code considerably,
although it's so generic that I forgot it's there.
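The pattern described here, where the allocation error paths in the quoted code just return -ENOMEM and one generic routine later frees every table, can be sketched in user-space C. This is an illustration under stated assumptions, not the driver's sja1105_static_config_free(): calloc()/free() stand in for kcalloc()/kfree(), and struct table, struct static_config, config_alloc() and config_free() are hypothetical names.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the driver's tables; the real code uses
 * struct sja1105_table with kcalloc()/kfree(). */
#define NUM_TABLES 4

struct table {
	void *entries;
	size_t entry_count;
};

struct static_config {
	struct table tables[NUM_TABLES];
};

/*
 * Allocate each table in turn. cfg must start zeroed (or freshly passed
 * through config_free()). Every successful allocation is stored in cfg
 * immediately, so on failure we simply return: config_free() below will
 * reclaim whatever was allocated before the failure.
 */
static int config_alloc(struct static_config *cfg, const size_t *counts,
			size_t entry_size)
{
	assert(entry_size > 0);

	for (int i = 0; i < NUM_TABLES; i++) {
		if (counts[i] == 0)
			continue;	/* leave empty tables as NULL */
		cfg->tables[i].entries = calloc(counts[i], entry_size);
		if (!cfg->tables[i].entries)
			return -ENOMEM;
		cfg->tables[i].entry_count = counts[i];
	}
	return 0;
}

/*
 * Single, generic teardown path. free(NULL) is a no-op, so tables that
 * were never allocated (or were already freed) are safe to pass here.
 */
static void config_free(struct static_config *cfg)
{
	for (int i = 0; i < NUM_TABLES; i++) {
		free(cfg->tables[i].entries);
		cfg->tables[i].entries = NULL;
		cfg->tables[i].entry_count = 0;
	}
}
```

The trade-off is that the error paths stay one-line returns, but every caller must eventually call config_free(), including after a failed config_alloc().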

Thanks,
-Vladimir

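The comment in the quoted patch describes how the single schedule table is partitioned among cycles (subschedules): each cycle occupies a contiguous run of entries, subscheind[] records the last entry of each cycle, and unused execution threads repeat the last valid end index. A minimal sketch of that bookkeeping follows, assuming cycles are packed back to back in cycle order as in the picture; fill_schedule_layout() and its start_idx[] output are hypothetical and not part of the driver.

```c
#include <assert.h>

/* The quoted patch sizes its subscheind array for 8 execution threads. */
#define NUM_THREADS 8

/*
 * Hypothetical helper (not part of the driver): given the number of
 * schedule entries in each active cycle, record where each cycle starts
 * and ends in the schedule table. Only the first num_cycles elements of
 * start_idx are written; all NUM_THREADS elements of subscheind are.
 * Returns the total number of schedule entries used.
 */
static int fill_schedule_layout(const int *cycle_len, int num_cycles,
				int start_idx[NUM_THREADS],
				int subscheind[NUM_THREADS])
{
	int next = 0;		/* first free slot in the schedule table */
	int last_end = 0;	/* end index of the last valid cycle */
	int c;

	assert(num_cycles > 0 && num_cycles <= NUM_THREADS);

	for (c = 0; c < num_cycles; c++) {
		assert(cycle_len[c] > 0);
		/* Cycle c iterates entries next .. next + cycle_len[c] - 1 */
		start_idx[c] = next;
		last_end = next + cycle_len[c] - 1;
		subscheind[c] = last_end;
		next += cycle_len[c];
	}

	/* Unused threads: repeat the last valid subschedule's end index */
	for (; c < NUM_THREADS; c++)
		subscheind[c] = last_end;

	return next;
}
```

With cycle lengths {3, 3} this yields start indices 0 and 3, returns 6, and fills subscheind with {2, 5, 5, 5, 5, 5, 5, 5}, matching the two-cycle example in the comment.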


Thread overview: 33+ messages
2019-09-02 16:25 [PATCH v1 net-next 00/15] tc-taprio offload for SJA1105 DSA Vladimir Oltean
2019-09-02 16:25 ` [PATCH v1 net-next 01/15] net: dsa: sja1105: Change the PTP command access pattern Vladimir Oltean
2019-09-02 16:25 ` [PATCH v1 net-next 02/15] net: dsa: sja1105: Get rid of global declaration of struct ptp_clock_info Vladimir Oltean
2019-09-02 16:25 ` [PATCH v1 net-next 03/15] net: dsa: sja1105: Switch to hardware operations for PTP Vladimir Oltean
2019-09-02 16:25 ` [PATCH v1 net-next 04/15] net: dsa: sja1105: Implement the .gettimex64 system call for PTP Vladimir Oltean
2019-09-02 16:25 ` [PATCH v1 net-next 05/15] net: dsa: sja1105: Restore PTP time after switch reset Vladimir Oltean
2019-09-02 16:25 ` [PATCH v1 net-next 06/15] net: dsa: sja1105: Disallow management xmit during switch reset Vladimir Oltean
2019-09-02 16:25 ` [PATCH v1 net-next 07/15] net: dsa: sja1105: Move PTP data to its own private structure Vladimir Oltean
2019-09-02 16:25 ` [PATCH v1 net-next 08/15] net: dsa: sja1105: Advertise the 8 TX queues Vladimir Oltean
2019-09-02 16:25 ` [PATCH v1 net-next 09/15] taprio: Add support for hardware offloading Vladimir Oltean
2019-09-02 16:25 ` [PATCH v1 net-next 10/15] net: dsa: Pass ndo_setup_tc slave callback to drivers Vladimir Oltean
2019-09-04  7:50   ` Kurt Kanzenbach
2019-09-02 16:25 ` [PATCH v1 net-next 11/15] net: dsa: sja1105: Add static config tables for scheduling Vladimir Oltean
2019-09-02 16:25 ` [PATCH v1 net-next 12/15] net: dsa: sja1105: Configure the Time-Aware Scheduler via tc-taprio offload Vladimir Oltean
2019-09-11 19:45   ` Vinicius Costa Gomes
2019-09-12  1:30     ` Vladimir Oltean
2019-09-02 16:25 ` [PATCH v1 net-next 13/15] net: dsa: sja1105: Make HOSTPRIO a kernel config Vladimir Oltean
2019-09-02 16:25 ` [PATCH v1 net-next 14/15] net: dsa: sja1105: Make the PTP command read-write Vladimir Oltean
2019-09-02 16:25 ` [PATCH v1 net-next 15/15] net: dsa: sja1105: Implement state machine for TAS with PTP clock source Vladimir Oltean
2019-09-11 19:43   ` Vinicius Costa Gomes
2019-09-06 12:54 ` [PATCH v1 net-next 00/15] tc-taprio offload for SJA1105 DSA David Miller
2019-09-07 14:45   ` Andrew Lunn
2019-09-08 11:07     ` Vladimir Oltean
2019-09-08 20:42       ` Andrew Lunn
2019-09-09  6:52         ` Richard Cochran
2019-09-09 12:36         ` Joergen Andreasen
2019-09-10  1:46           ` Vladimir Oltean
2019-09-09  7:04       ` Richard Cochran
2019-09-07 13:55 ` David Miller
2019-09-09 23:49   ` Gomes, Vinicius
2019-09-10  1:06     ` Vladimir Oltean
2019-09-11  0:45       ` Gomes, Vinicius
2019-09-11 11:51         ` Vladimir Oltean
