linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/7] TC-ETF support PTP clocks series
@ 2020-10-01 20:51 Erez Geva
  2020-10-01 20:51 ` [PATCH 1/7] POSIX clock ID check function Erez Geva
                   ` (7 more replies)
  0 siblings, 8 replies; 21+ messages in thread
From: Erez Geva @ 2020-10-01 20:51 UTC (permalink / raw)
  To: linux-kernel, netdev, Cong Wang, David S . Miller,
	Jakub Kicinski, Jamal Hadi Salim, Jiri Pirko, Andrei Vagin,
	Dmitry Safonov, Eric W . Biederman, Ingo Molnar, John Stultz,
	Michal Kubecek, Oleg Nesterov, Peter Zijlstra, Richard Cochran,
	Stephen Boyd, Thomas Gleixner, Vladis Dronov,
	Sebastian Andrzej Siewior, Frederic Weisbecker, Eric Dumazet
  Cc: Jesus Sanchez-Palencia, Vinicius Costa Gomes, Vedang Patel,
	Simon Sudler, Andreas Meisinger, Andreas Bucher, Henning Schild,
	Jan Kiszka, Andreas Zirkler, Ermin Sakic, An Ninh Nguyen,
	Michael Saenger, Bernd Maehringer, Gisela Greinert, Erez Geva,
	Erez Geva

Add support for using PTP clock with
 Traffic control Earliest TxTime First (ETF) Qdisc.

Why do we need ETF to use PTP clock?
Current ETF requires to synchronization the system clock
 to the PTP Hardware clock (PHC) we want to send through.
But there are cases that we can not synchronize the system clock with
 the desire NIC PHC.
1. We use several NICs with several PTP domains that our device
    is not allowed to be PTP master.
   And we are not allowed to synchronize these PTP domains.
2. We are using another clock source which we need for our system.
   Yet our device is not allowed to be PTP master.
Regardless of the exact topology, as the Linux tradition is to allow
 the user the freedom to choose, we propose a patch that will allow
 the user to configure the TC-ETF with a PTP clock as well as
 using the system clock.
* NOTE: we do encourage the users to synchronize the system clock with
  a PTP clock.
 As the ETF watchdog uses the system clock.
 Synchronizing the system clock with a PTP clock will probably
  reduce the frequency different of the PHC and the system clock.
 As sequence, the user might be able to reduce the ETF delta time
  and the packets latency cross the network.

Follow the decision to derive a dynamic clock ID from the file description
 of an opened PTP clock device file.
We propose a simple way to use the dynamic clock ID with the ETF Qdisc.
We will submit a patch to the "tc" tool from the iproute2 project
 once this patch is accepted.

The patches contain:
1. Add function to verify that a dynamic clock ID is derived
    from file description.
   The function follows the clock ID convention for dynamic clock ID
    for file description.
  The function will be used in the second patch.

2. Function to get main system oscillator calibration state.

3. Functions to get and put POSIX clock reference of
    a PTP Hardware Clock (PHC).
   The get function uses a dynamic clock ID created by application space.
   The purpose is that a module can hold a POSIX clock reference after the
    configuration application closed the PTP clock device file,
    and though the dynamic clock ID can not be used any further.
   The POSIX clock refereces are used by the TC-ETF.

4. A fix of the range check in qdisc_watchdog_schedule_range_ns().

5. During testing of ETF, we notice issue with the high-resolution timer
    the ETF Qdisc watchdog uses.
   The timer was set for a sleep of 300 nanoseconds but
    end up sleeping for 3 milliseconds.
   The problem happens when the timer is already active and
    the current expire is earlier then a new expire.
   So, we add a new TC schedule function that do not reprogram the timer
    under these conditions.
   The use of the function make sense as the Qdisc watchdog does act as
    watchdog.
   The Qdisc watchdog can expire earlier.
   However, if the watchdog is late, packets are dropped.

6. Add kernel configuration for TC-ETF watchdog range.
   As the range is characteristic of Hardware,
   that seems to be the proper way.

7. Add support for using PHC clock with TC-ETF.

Erez Geva (7):
  POSIX clock ID check function
  Function to retrieve main clock state
  Functions to fetch POSIX dynamic clock object
  Fix qdisc_watchdog_schedule_range_ns range check
  Traffic control using high-resolution timer issue
  TC-ETF code improvements
  TC-ETF support PTP clocks

 include/linux/posix-clock.h     |  39 +++++++++
 include/linux/posix-timers.h    |   5 ++
 include/linux/timex.h           |   1 +
 include/net/pkt_sched.h         |   2 +
 include/uapi/linux/net_tstamp.h |   5 ++
 kernel/time/posix-clock.c       |  76 ++++++++++++++++
 kernel/time/posix-timers.c      |   2 +-
 kernel/time/timekeeping.c       |   9 ++
 net/sched/Kconfig               |   8 ++
 net/sched/sch_api.c             |  36 +++++++-
 net/sched/sch_etf.c             | 148 +++++++++++++++++++++++++-------
 11 files changed, 298 insertions(+), 33 deletions(-)


base-commit: a1b8638ba1320e6684aa98233c15255eb803fac7
-- 
2.20.1


^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH 1/7] POSIX clock ID check function
  2020-10-01 20:51 [PATCH 0/7] TC-ETF support PTP clocks series Erez Geva
@ 2020-10-01 20:51 ` Erez Geva
  2020-10-01 21:56   ` Thomas Gleixner
  2020-10-01 20:51 ` [PATCH 2/7] Function to retrieve main clock state Erez Geva
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 21+ messages in thread
From: Erez Geva @ 2020-10-01 20:51 UTC (permalink / raw)
  To: linux-kernel, netdev, Cong Wang, David S . Miller,
	Jakub Kicinski, Jamal Hadi Salim, Jiri Pirko, Andrei Vagin,
	Dmitry Safonov, Eric W . Biederman, Ingo Molnar, John Stultz,
	Michal Kubecek, Oleg Nesterov, Peter Zijlstra, Richard Cochran,
	Stephen Boyd, Thomas Gleixner, Vladis Dronov,
	Sebastian Andrzej Siewior, Frederic Weisbecker, Eric Dumazet
  Cc: Jesus Sanchez-Palencia, Vinicius Costa Gomes, Vedang Patel,
	Simon Sudler, Andreas Meisinger, Andreas Bucher, Henning Schild,
	Jan Kiszka, Andreas Zirkler, Ermin Sakic, An Ninh Nguyen,
	Michael Saenger, Bernd Maehringer, Gisela Greinert, Erez Geva,
	Erez Geva

Add function to check whether a clock ID refer to
a file descriptor of a POSIX dynamic clock.

Signed-off-by: Erez Geva <erez.geva.ext@siemens.com>
---
 include/linux/posix-timers.h | 5 +++++
 kernel/time/posix-timers.c   | 2 +-
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/include/linux/posix-timers.h b/include/linux/posix-timers.h
index 896c16d2c5fb..7cb551bbb763 100644
--- a/include/linux/posix-timers.h
+++ b/include/linux/posix-timers.h
@@ -57,6 +57,11 @@ static inline int clockid_to_fd(const clockid_t clk)
 	return ~(clk >> 3);
 }
 
+static inline bool is_clockid_fd_clock(const clockid_t clk)
+{
+	return (clk < 0) && ((clk & CLOCKFD_MASK) == CLOCKFD);
+}
+
 #ifdef CONFIG_POSIX_TIMERS
 
 /**
diff --git a/kernel/time/posix-timers.c b/kernel/time/posix-timers.c
index bf540f5a4115..806465233303 100644
--- a/kernel/time/posix-timers.c
+++ b/kernel/time/posix-timers.c
@@ -1400,7 +1400,7 @@ static const struct k_clock *clockid_to_kclock(const clockid_t id)
 	clockid_t idx = id;
 
 	if (id < 0) {
-		return (id & CLOCKFD_MASK) == CLOCKFD ?
+		return is_clockid_fd_clock(id) ?
 			&clock_posix_dynamic : &clock_posix_cpu;
 	}
 
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 2/7] Function to retrieve main clock state
  2020-10-01 20:51 [PATCH 0/7] TC-ETF support PTP clocks series Erez Geva
  2020-10-01 20:51 ` [PATCH 1/7] POSIX clock ID check function Erez Geva
@ 2020-10-01 20:51 ` Erez Geva
  2020-10-01 22:05   ` Thomas Gleixner
  2020-10-01 20:51 ` [PATCH 3/7] Functions to fetch POSIX dynamic clock object Erez Geva
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 21+ messages in thread
From: Erez Geva @ 2020-10-01 20:51 UTC (permalink / raw)
  To: linux-kernel, netdev, Cong Wang, David S . Miller,
	Jakub Kicinski, Jamal Hadi Salim, Jiri Pirko, Andrei Vagin,
	Dmitry Safonov, Eric W . Biederman, Ingo Molnar, John Stultz,
	Michal Kubecek, Oleg Nesterov, Peter Zijlstra, Richard Cochran,
	Stephen Boyd, Thomas Gleixner, Vladis Dronov,
	Sebastian Andrzej Siewior, Frederic Weisbecker, Eric Dumazet
  Cc: Jesus Sanchez-Palencia, Vinicius Costa Gomes, Vedang Patel,
	Simon Sudler, Andreas Meisinger, Andreas Bucher, Henning Schild,
	Jan Kiszka, Andreas Zirkler, Ermin Sakic, An Ninh Nguyen,
	Michael Saenger, Bernd Maehringer, Gisela Greinert, Erez Geva,
	Erez Geva

Add kernel function to retrieve main clock oscillator state.
As calibration is done from user space daemon,
the kernel access function permit read only.

Signed-off-by: Erez Geva <erez.geva.ext@siemens.com>
---
 include/linux/timex.h     | 1 +
 kernel/time/timekeeping.c | 9 +++++++++
 2 files changed, 10 insertions(+)

diff --git a/include/linux/timex.h b/include/linux/timex.h
index ce0859763670..03bc63bf3073 100644
--- a/include/linux/timex.h
+++ b/include/linux/timex.h
@@ -153,6 +153,7 @@ extern unsigned long tick_nsec;		/* SHIFTED_HZ period (nsec) */
 
 extern int do_adjtimex(struct __kernel_timex *);
 extern int do_clock_adjtime(const clockid_t which_clock, struct __kernel_timex * ktx);
+extern int adjtimex(struct __kernel_timex *txc);
 
 extern void hardpps(const struct timespec64 *, const struct timespec64 *);
 
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 4c47f388a83f..2248fa257ff8 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -2372,6 +2372,15 @@ int do_adjtimex(struct __kernel_timex *txc)
 	return ret;
 }
 
+int adjtimex(struct __kernel_timex *txc)
+{
+	if (txc->modes != 0)
+		return -EINVAL;
+
+	return do_adjtimex(txc);
+}
+EXPORT_SYMBOL_GPL(adjtimex);
+
 #ifdef CONFIG_NTP_PPS
 /**
  * hardpps() - Accessor function to NTP __hardpps function
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 3/7] Functions to fetch POSIX dynamic clock object
  2020-10-01 20:51 [PATCH 0/7] TC-ETF support PTP clocks series Erez Geva
  2020-10-01 20:51 ` [PATCH 1/7] POSIX clock ID check function Erez Geva
  2020-10-01 20:51 ` [PATCH 2/7] Function to retrieve main clock state Erez Geva
@ 2020-10-01 20:51 ` Erez Geva
  2020-10-01 23:35   ` Thomas Gleixner
  2020-10-01 20:51 ` [PATCH 4/7] Fix qdisc_watchdog_schedule_range_ns range check Erez Geva
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 21+ messages in thread
From: Erez Geva @ 2020-10-01 20:51 UTC (permalink / raw)
  To: linux-kernel, netdev, Cong Wang, David S . Miller,
	Jakub Kicinski, Jamal Hadi Salim, Jiri Pirko, Andrei Vagin,
	Dmitry Safonov, Eric W . Biederman, Ingo Molnar, John Stultz,
	Michal Kubecek, Oleg Nesterov, Peter Zijlstra, Richard Cochran,
	Stephen Boyd, Thomas Gleixner, Vladis Dronov,
	Sebastian Andrzej Siewior, Frederic Weisbecker, Eric Dumazet
  Cc: Jesus Sanchez-Palencia, Vinicius Costa Gomes, Vedang Patel,
	Simon Sudler, Andreas Meisinger, Andreas Bucher, Henning Schild,
	Jan Kiszka, Andreas Zirkler, Ermin Sakic, An Ninh Nguyen,
	Michael Saenger, Bernd Maehringer, Gisela Greinert, Erez Geva,
	Erez Geva

Add kernel functions to fetch a pointer to a POSIX dynamic clock
using a user file description dynamic clock ID.

Signed-off-by: Erez Geva <erez.geva.ext@siemens.com>
---
 include/linux/posix-clock.h | 39 +++++++++++++++++++
 kernel/time/posix-clock.c   | 76 +++++++++++++++++++++++++++++++++++++
 2 files changed, 115 insertions(+)

diff --git a/include/linux/posix-clock.h b/include/linux/posix-clock.h
index 468328b1e1dd..e90bd90d3a01 100644
--- a/include/linux/posix-clock.h
+++ b/include/linux/posix-clock.h
@@ -116,4 +116,43 @@ int posix_clock_register(struct posix_clock *clk, struct device *dev);
  */
 void posix_clock_unregister(struct posix_clock *clk);
 
+/**
+ * posix_clock_get_clock() - get reference to a posix clock
+ * @id: A user clockid that uses a posix clock
+ *
+ * Used by kernel code to get a reference to a posix clock.
+ * Increase the reference count, ensure the referece is not removed.
+ */
+struct posix_clock *posix_clock_get_clock(clockid_t id);
+
+/**
+ * posix_clock_put_clock() - release a reference to a posix clock
+ * @clk: The reference to a posix clock to release
+ *
+ * Release a reference to a posix clock.
+ * Reduce the reference count.
+ */
+int posix_clock_put_clock(struct posix_clock *clk);
+
+/**
+ * posix_clock_gettime() - get time from posix clock
+ * @clk: A reference to a posix clock
+ * @ts: pointer to a time structure used to store the time from the posix clock
+ *
+ * Retrieve the time from a posix clock.
+ * In case the clock device was removed, the function return error.
+ */
+int posix_clock_gettime(struct posix_clock *clk, struct timespec64 *ts);
+
+/**
+ * posix_clock_adjtime() - get tune parameters from posix clock
+ * @clk: A reference to a posix clock
+ * @tx: pointer to a kernel timex structure used to store
+ *      the tune parameters from the posix clock
+ *
+ * Retrieve the tune parameters from a posix clock.
+ * In case the clock device was removed, the function return error.
+ */
+int posix_clock_adjtime(struct posix_clock *clk, struct __kernel_timex *tx);
+
 #endif
diff --git a/kernel/time/posix-clock.c b/kernel/time/posix-clock.c
index 77c0c2370b6d..1e205eea6ebd 100644
--- a/kernel/time/posix-clock.c
+++ b/kernel/time/posix-clock.c
@@ -315,3 +315,79 @@ const struct k_clock clock_posix_dynamic = {
 	.clock_get_timespec	= pc_clock_gettime,
 	.clock_adj		= pc_clock_adjtime,
 };
+
+struct posix_clock *posix_clock_get_clock(clockid_t id)
+{
+	int err;
+	struct posix_clock_desc cd;
+
+	/* Verify we use posix clock ID */
+	if (!is_clockid_fd_clock(id))
+		return ERR_PTR(-EINVAL);
+
+	err = get_clock_desc(id, &cd);
+	if (err)
+		return ERR_PTR(err);
+
+	get_device(cd.clk->dev);
+
+	put_clock_desc(&cd);
+
+	return cd.clk;
+}
+EXPORT_SYMBOL_GPL(posix_clock_get_clock);
+
+int posix_clock_put_clock(struct posix_clock *clk)
+{
+	if (IS_ERR_OR_NULL(clk))
+		return -EINVAL;
+	put_device(clk->dev);
+	return 0;
+}
+EXPORT_SYMBOL_GPL(posix_clock_put_clock);
+
+int posix_clock_gettime(struct posix_clock *clk, struct timespec64 *ts)
+{
+	int err;
+
+	if (IS_ERR_OR_NULL(clk))
+		return -EINVAL;
+
+	down_read(&clk->rwsem);
+
+	if (clk->zombie)
+		err = -ENODEV;
+	else if (clk->ops.clock_gettime)
+		err = clk->ops.clock_gettime(clk, ts);
+	else
+		err = -EOPNOTSUPP;
+
+	up_read(&clk->rwsem);
+	return err;
+}
+EXPORT_SYMBOL_GPL(posix_clock_gettime);
+
+int posix_clock_adjtime(struct posix_clock *clk, struct __kernel_timex *tx)
+{
+	int err;
+
+	/* Allow read only */
+	if (tx->modes != 0)
+		return -EINVAL;
+
+	if (IS_ERR_OR_NULL(clk))
+		return -EINVAL;
+
+	down_read(&clk->rwsem);
+
+	if (clk->zombie)
+		err = -ENODEV;
+	else if (clk->ops.clock_adjtime)
+		err = clk->ops.clock_adjtime(clk, tx);
+	else
+		err = -EOPNOTSUPP;
+
+	up_read(&clk->rwsem);
+	return err;
+}
+EXPORT_SYMBOL_GPL(posix_clock_adjtime);
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 4/7] Fix qdisc_watchdog_schedule_range_ns range check
  2020-10-01 20:51 [PATCH 0/7] TC-ETF support PTP clocks series Erez Geva
                   ` (2 preceding siblings ...)
  2020-10-01 20:51 ` [PATCH 3/7] Functions to fetch POSIX dynamic clock object Erez Geva
@ 2020-10-01 20:51 ` Erez Geva
  2020-10-01 22:44   ` Thomas Gleixner
  2020-10-01 20:51 ` [PATCH 5/7] Traffic control using high-resolution timer issue Erez Geva
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 21+ messages in thread
From: Erez Geva @ 2020-10-01 20:51 UTC (permalink / raw)
  To: linux-kernel, netdev, Cong Wang, David S . Miller,
	Jakub Kicinski, Jamal Hadi Salim, Jiri Pirko, Andrei Vagin,
	Dmitry Safonov, Eric W . Biederman, Ingo Molnar, John Stultz,
	Michal Kubecek, Oleg Nesterov, Peter Zijlstra, Richard Cochran,
	Stephen Boyd, Thomas Gleixner, Vladis Dronov,
	Sebastian Andrzej Siewior, Frederic Weisbecker, Eric Dumazet
  Cc: Jesus Sanchez-Palencia, Vinicius Costa Gomes, Vedang Patel,
	Simon Sudler, Andreas Meisinger, Andreas Bucher, Henning Schild,
	Jan Kiszka, Andreas Zirkler, Ermin Sakic, An Ninh Nguyen,
	Michael Saenger, Bernd Maehringer, Gisela Greinert, Erez Geva,
	Erez Geva

   - As all parameters are unsigned.

   - If 'expires' is bigger than 'last_expires' then the left expression
     overflows.

   - It is better to use addition and check both ends of the range.

Signed-off-by: Erez Geva <erez.geva.ext@siemens.com>
---
 net/sched/sch_api.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
index 2a76a2f5ed88..ebf59ca1faab 100644
--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@ -632,7 +632,8 @@ void qdisc_watchdog_schedule_range_ns(struct qdisc_watchdog *wd, u64 expires,
 		/* If timer is already set in [expires, expires + delta_ns],
 		 * do not reprogram it.
 		 */
-		if (wd->last_expires - expires <= delta_ns)
+		if (wd->last_expires >= expires &&
+		    wd->last_expires <= expires + delta_ns)
 			return;
 	}
 
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 5/7] Traffic control using high-resolution timer issue
  2020-10-01 20:51 [PATCH 0/7] TC-ETF support PTP clocks series Erez Geva
                   ` (3 preceding siblings ...)
  2020-10-01 20:51 ` [PATCH 4/7] Fix qdisc_watchdog_schedule_range_ns range check Erez Geva
@ 2020-10-01 20:51 ` Erez Geva
  2020-10-01 23:07   ` Thomas Gleixner
  2020-10-01 20:51 ` [PATCH 6/7] TC-ETF code improvements Erez Geva
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 21+ messages in thread
From: Erez Geva @ 2020-10-01 20:51 UTC (permalink / raw)
  To: linux-kernel, netdev, Cong Wang, David S . Miller,
	Jakub Kicinski, Jamal Hadi Salim, Jiri Pirko, Andrei Vagin,
	Dmitry Safonov, Eric W . Biederman, Ingo Molnar, John Stultz,
	Michal Kubecek, Oleg Nesterov, Peter Zijlstra, Richard Cochran,
	Stephen Boyd, Thomas Gleixner, Vladis Dronov,
	Sebastian Andrzej Siewior, Frederic Weisbecker, Eric Dumazet
  Cc: Jesus Sanchez-Palencia, Vinicius Costa Gomes, Vedang Patel,
	Simon Sudler, Andreas Meisinger, Andreas Bucher, Henning Schild,
	Jan Kiszka, Andreas Zirkler, Ermin Sakic, An Ninh Nguyen,
	Michael Saenger, Bernd Maehringer, Gisela Greinert, Erez Geva,
	Erez Geva

  - Add new schedule function for Qdisc watchdog.
    The function avoids reprogram if watchdog already expire
    before new expire.

  - Use new schedule function in ETF.

  - Add ETF range value to kernel configuration.
    as the value is characteristic to Hardware.

Signed-off-by: Erez Geva <erez.geva.ext@siemens.com>
---
 include/net/pkt_sched.h |  2 ++
 net/sched/Kconfig       |  8 ++++++++
 net/sched/sch_api.c     | 33 +++++++++++++++++++++++++++++++++
 net/sched/sch_etf.c     | 10 ++++++++--
 4 files changed, 51 insertions(+), 2 deletions(-)

diff --git a/include/net/pkt_sched.h b/include/net/pkt_sched.h
index ac8c890a2657..4306c2773778 100644
--- a/include/net/pkt_sched.h
+++ b/include/net/pkt_sched.h
@@ -78,6 +78,8 @@ void qdisc_watchdog_init(struct qdisc_watchdog *wd, struct Qdisc *qdisc);
 
 void qdisc_watchdog_schedule_range_ns(struct qdisc_watchdog *wd, u64 expires,
 				      u64 delta_ns);
+void qdisc_watchdog_schedule_soon_ns(struct qdisc_watchdog *wd, u64 expires,
+				     u64 delta_ns);
 
 static inline void qdisc_watchdog_schedule_ns(struct qdisc_watchdog *wd,
 					      u64 expires)
diff --git a/net/sched/Kconfig b/net/sched/Kconfig
index a3b37d88800e..0f5261ee9e1b 100644
--- a/net/sched/Kconfig
+++ b/net/sched/Kconfig
@@ -195,6 +195,14 @@ config NET_SCH_ETF
 	  To compile this code as a module, choose M here: the
 	  module will be called sch_etf.
 
+config NET_SCH_ETF_TIMER_RANGE
+	int "ETF Watchdog time range delta in nano seconds"
+	depends on NET_SCH_ETF
+	default 5000
+	help
+	  Specify the time range delta for ETF watchdog
+	  Default is 5 microsecond
+
 config NET_SCH_TAPRIO
 	tristate "Time Aware Priority (taprio) Scheduler"
 	help
diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
index ebf59ca1faab..80bd09555f5e 100644
--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@ -645,6 +645,39 @@ void qdisc_watchdog_schedule_range_ns(struct qdisc_watchdog *wd, u64 expires,
 }
 EXPORT_SYMBOL(qdisc_watchdog_schedule_range_ns);
 
+void qdisc_watchdog_schedule_soon_ns(struct qdisc_watchdog *wd, u64 expires,
+				     u64 delta_ns)
+{
+	if (test_bit(__QDISC_STATE_DEACTIVATED,
+		     &qdisc_root_sleeping(wd->qdisc)->state))
+		return;
+
+	if (wd->last_expires == expires)
+		return;
+
+	/**
+	 * If expires is in [0, now + delta_ns],
+	 * do not program it.
+	 */
+	if (expires <= ktime_to_ns(hrtimer_cb_get_time(&wd->timer)) + delta_ns)
+		return;
+
+	/**
+	 * If timer is already set in [0, expires + delta_ns],
+	 * do not reprogram it.
+	 */
+	if (hrtimer_is_queued(&wd->timer) &&
+	    wd->last_expires <= expires + delta_ns)
+		return;
+
+	wd->last_expires = expires;
+	hrtimer_start_range_ns(&wd->timer,
+			       ns_to_ktime(expires),
+			       delta_ns,
+			       HRTIMER_MODE_ABS_PINNED);
+}
+EXPORT_SYMBOL(qdisc_watchdog_schedule_soon_ns);
+
 void qdisc_watchdog_cancel(struct qdisc_watchdog *wd)
 {
 	hrtimer_cancel(&wd->timer);
diff --git a/net/sched/sch_etf.c b/net/sched/sch_etf.c
index c48f91075b5c..48b2868c4672 100644
--- a/net/sched/sch_etf.c
+++ b/net/sched/sch_etf.c
@@ -20,6 +20,11 @@
 #include <net/pkt_sched.h>
 #include <net/sock.h>
 
+#ifdef CONFIG_NET_SCH_ETF_TIMER_RANGE
+#define NET_SCH_ETF_TIMER_RANGE CONFIG_NET_SCH_ETF_TIMER_RANGE
+#else
+#define NET_SCH_ETF_TIMER_RANGE (5 * NSEC_PER_USEC)
+#endif
 #define DEADLINE_MODE_IS_ON(x) ((x)->flags & TC_ETF_DEADLINE_MODE_ON)
 #define OFFLOAD_IS_ON(x) ((x)->flags & TC_ETF_OFFLOAD_ON)
 #define SKIP_SOCK_CHECK_IS_SET(x) ((x)->flags & TC_ETF_SKIP_SOCK_CHECK)
@@ -128,8 +133,9 @@ static void reset_watchdog(struct Qdisc *sch)
 		return;
 	}
 
-	next = ktime_sub_ns(skb->tstamp, q->delta);
-	qdisc_watchdog_schedule_ns(&q->watchdog, ktime_to_ns(next));
+	next = ktime_sub_ns(skb->tstamp, q->delta + NET_SCH_ETF_TIMER_RANGE);
+	qdisc_watchdog_schedule_soon_ns(&q->watchdog, ktime_to_ns(next),
+					NET_SCH_ETF_TIMER_RANGE);
 }
 
 static void report_sock_error(struct sk_buff *skb, u32 err, u8 code)
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 6/7] TC-ETF code improvements
  2020-10-01 20:51 [PATCH 0/7] TC-ETF support PTP clocks series Erez Geva
                   ` (4 preceding siblings ...)
  2020-10-01 20:51 ` [PATCH 5/7] Traffic control using high-resolution timer issue Erez Geva
@ 2020-10-01 20:51 ` Erez Geva
  2020-10-01 20:51 ` [PATCH 7/7] TC-ETF support PTP clocks Erez Geva
  2020-10-02 19:01 ` [PATCH 0/7] TC-ETF support PTP clocks series Vinicius Costa Gomes
  7 siblings, 0 replies; 21+ messages in thread
From: Erez Geva @ 2020-10-01 20:51 UTC (permalink / raw)
  To: linux-kernel, netdev, Cong Wang, David S . Miller,
	Jakub Kicinski, Jamal Hadi Salim, Jiri Pirko, Andrei Vagin,
	Dmitry Safonov, Eric W . Biederman, Ingo Molnar, John Stultz,
	Michal Kubecek, Oleg Nesterov, Peter Zijlstra, Richard Cochran,
	Stephen Boyd, Thomas Gleixner, Vladis Dronov,
	Sebastian Andrzej Siewior, Frederic Weisbecker, Eric Dumazet
  Cc: Jesus Sanchez-Palencia, Vinicius Costa Gomes, Vedang Patel,
	Simon Sudler, Andreas Meisinger, Andreas Bucher, Henning Schild,
	Jan Kiszka, Andreas Zirkler, Ermin Sakic, An Ninh Nguyen,
	Michael Saenger, Bernd Maehringer, Gisela Greinert, Erez Geva,
	Erez Geva

  - Change clockid to use clockid_t type.

  - Pass pointer to ETF private structure to
    local function instead of retrieving it
    from Qdisc object again.

  - fix comment.

Signed-off-by: Erez Geva <erez.geva.ext@siemens.com>
---
 net/sched/sch_etf.c | 38 +++++++++++++++++---------------------
 1 file changed, 17 insertions(+), 21 deletions(-)

diff --git a/net/sched/sch_etf.c b/net/sched/sch_etf.c
index 48b2868c4672..c0de4c6f9299 100644
--- a/net/sched/sch_etf.c
+++ b/net/sched/sch_etf.c
@@ -33,7 +33,7 @@ struct etf_sched_data {
 	bool offload;
 	bool deadline_mode;
 	bool skip_sock_check;
-	int clockid;
+	clockid_t clockid;
 	int queue;
 	s32 delta; /* in ns */
 	ktime_t last; /* The txtime of the last skb sent to the netdevice. */
@@ -54,7 +54,7 @@ static inline int validate_input_params(struct tc_etf_qopt *qopt,
 	 *
 	 *	* Dynamic clockids are not supported.
 	 *
-	 *	* Delta must be a positive integer.
+	 *	* Delta must be a positive or zero integer.
 	 *
 	 * Also note that for the HW offload case, we must
 	 * expect that system clocks have been synchronized to PHC.
@@ -77,9 +77,9 @@ static inline int validate_input_params(struct tc_etf_qopt *qopt,
 	return 0;
 }
 
-static bool is_packet_valid(struct Qdisc *sch, struct sk_buff *nskb)
+static bool is_packet_valid(struct Qdisc *sch, struct etf_sched_data *q,
+			    struct sk_buff *nskb)
 {
-	struct etf_sched_data *q = qdisc_priv(sch);
 	ktime_t txtime = nskb->tstamp;
 	struct sock *sk = nskb->sk;
 	ktime_t now;
@@ -122,9 +122,8 @@ static struct sk_buff *etf_peek_timesortedlist(struct Qdisc *sch)
 	return rb_to_skb(p);
 }
 
-static void reset_watchdog(struct Qdisc *sch)
+static void reset_watchdog(struct Qdisc *sch, struct etf_sched_data *q)
 {
-	struct etf_sched_data *q = qdisc_priv(sch);
 	struct sk_buff *skb = etf_peek_timesortedlist(sch);
 	ktime_t next;
 
@@ -173,7 +172,7 @@ static int etf_enqueue_timesortedlist(struct sk_buff *nskb, struct Qdisc *sch,
 	ktime_t txtime = nskb->tstamp;
 	bool leftmost = true;
 
-	if (!is_packet_valid(sch, nskb)) {
+	if (!is_packet_valid(sch, q, nskb)) {
 		report_sock_error(nskb, EINVAL,
 				  SO_EE_CODE_TXTIME_INVALID_PARAM);
 		return qdisc_drop(nskb, sch, to_free);
@@ -198,15 +197,14 @@ static int etf_enqueue_timesortedlist(struct sk_buff *nskb, struct Qdisc *sch,
 	sch->q.qlen++;
 
 	/* Now we may need to re-arm the qdisc watchdog for the next packet. */
-	reset_watchdog(sch);
+	reset_watchdog(sch, q);
 
 	return NET_XMIT_SUCCESS;
 }
 
-static void timesortedlist_drop(struct Qdisc *sch, struct sk_buff *skb,
-				ktime_t now)
+static void timesortedlist_drop(struct Qdisc *sch, struct etf_sched_data *q,
+				struct sk_buff *skb, ktime_t now)
 {
-	struct etf_sched_data *q = qdisc_priv(sch);
 	struct sk_buff *to_free = NULL;
 	struct sk_buff *tmp = NULL;
 
@@ -234,10 +232,9 @@ static void timesortedlist_drop(struct Qdisc *sch, struct sk_buff *skb,
 	kfree_skb_list(to_free);
 }
 
-static void timesortedlist_remove(struct Qdisc *sch, struct sk_buff *skb)
+static void timesortedlist_remove(struct Qdisc *sch, struct etf_sched_data *q,
+				  struct sk_buff *skb)
 {
-	struct etf_sched_data *q = qdisc_priv(sch);
-
 	rb_erase_cached(&skb->rbnode, &q->head);
 
 	/* The rbnode field in the skb re-uses these fields, now that
@@ -270,7 +267,7 @@ static struct sk_buff *etf_dequeue_timesortedlist(struct Qdisc *sch)
 
 	/* Drop if packet has expired while in queue. */
 	if (ktime_before(skb->tstamp, now)) {
-		timesortedlist_drop(sch, skb, now);
+		timesortedlist_drop(sch, q, skb, now);
 		skb = NULL;
 		goto out;
 	}
@@ -279,7 +276,7 @@ static struct sk_buff *etf_dequeue_timesortedlist(struct Qdisc *sch)
 	 * txtime from deadline to (now + delta).
 	 */
 	if (q->deadline_mode) {
-		timesortedlist_remove(sch, skb);
+		timesortedlist_remove(sch, q, skb);
 		skb->tstamp = now;
 		goto out;
 	}
@@ -288,13 +285,13 @@ static struct sk_buff *etf_dequeue_timesortedlist(struct Qdisc *sch)
 
 	/* Dequeue only if now is within the [txtime - delta, txtime] range. */
 	if (ktime_after(now, next))
-		timesortedlist_remove(sch, skb);
+		timesortedlist_remove(sch, q, skb);
 	else
 		skb = NULL;
 
 out:
 	/* Now we may need to re-arm the qdisc watchdog for the next packet. */
-	reset_watchdog(sch);
+	reset_watchdog(sch, q);
 
 	return skb;
 }
@@ -423,9 +420,8 @@ static int etf_init(struct Qdisc *sch, struct nlattr *opt,
 	return 0;
 }
 
-static void timesortedlist_clear(struct Qdisc *sch)
+static void timesortedlist_clear(struct Qdisc *sch, struct etf_sched_data *q)
 {
-	struct etf_sched_data *q = qdisc_priv(sch);
 	struct rb_node *p = rb_first_cached(&q->head);
 
 	while (p) {
@@ -448,7 +444,7 @@ static void etf_reset(struct Qdisc *sch)
 		qdisc_watchdog_cancel(&q->watchdog);
 
 	/* No matter which mode we are on, it's safe to clear both lists. */
-	timesortedlist_clear(sch);
+	timesortedlist_clear(sch, q);
 	__qdisc_reset_queue(&sch->q);
 
 	sch->qstats.backlog = 0;
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 7/7] TC-ETF support PTP clocks
  2020-10-01 20:51 [PATCH 0/7] TC-ETF support PTP clocks series Erez Geva
                   ` (5 preceding siblings ...)
  2020-10-01 20:51 ` [PATCH 6/7] TC-ETF code improvements Erez Geva
@ 2020-10-01 20:51 ` Erez Geva
  2020-10-02  0:33   ` Thomas Gleixner
  2020-10-02 19:01 ` [PATCH 0/7] TC-ETF support PTP clocks series Vinicius Costa Gomes
  7 siblings, 1 reply; 21+ messages in thread
From: Erez Geva @ 2020-10-01 20:51 UTC (permalink / raw)
  To: linux-kernel, netdev, Cong Wang, David S . Miller,
	Jakub Kicinski, Jamal Hadi Salim, Jiri Pirko, Andrei Vagin,
	Dmitry Safonov, Eric W . Biederman, Ingo Molnar, John Stultz,
	Michal Kubecek, Oleg Nesterov, Peter Zijlstra, Richard Cochran,
	Stephen Boyd, Thomas Gleixner, Vladis Dronov,
	Sebastian Andrzej Siewior, Frederic Weisbecker, Eric Dumazet
  Cc: Jesus Sanchez-Palencia, Vinicius Costa Gomes, Vedang Patel,
	Simon Sudler, Andreas Meisinger, Andreas Bucher, Henning Schild,
	Jan Kiszka, Andreas Zirkler, Ermin Sakic, An Ninh Nguyen,
	Michael Saenger, Bernd Maehringer, Gisela Greinert, Erez Geva,
	Erez Geva

  - Add support for using a POSIX dynamic clock with
    Traffic control Earliest TxTime First (ETF) Qdisc.

Signed-off-by: Erez Geva <erez.geva.ext@siemens.com>
---
 include/uapi/linux/net_tstamp.h |   5 ++
 net/sched/sch_etf.c             | 100 +++++++++++++++++++++++++++++---
 2 files changed, 97 insertions(+), 8 deletions(-)

diff --git a/include/uapi/linux/net_tstamp.h b/include/uapi/linux/net_tstamp.h
index 7ed0b3d1c00a..ecbaac6230d7 100644
--- a/include/uapi/linux/net_tstamp.h
+++ b/include/uapi/linux/net_tstamp.h
@@ -167,6 +167,11 @@ enum txtime_flags {
 	SOF_TXTIME_FLAGS_MASK = (SOF_TXTIME_FLAGS_LAST - 1) |
 				 SOF_TXTIME_FLAGS_LAST
 };
+/*
+ * Clock ID to use with POSIX clocks
+ * The ID must be u8 to fit in (struct sock)->sk_clockid
+ */
+#define SOF_TXTIME_POSIX_CLOCK_ID (0x77)
 
 struct sock_txtime {
 	__kernel_clockid_t	clockid;/* reference clockid */
diff --git a/net/sched/sch_etf.c b/net/sched/sch_etf.c
index c0de4c6f9299..8e3e0a61fa58 100644
--- a/net/sched/sch_etf.c
+++ b/net/sched/sch_etf.c
@@ -15,6 +15,7 @@
 #include <linux/rbtree.h>
 #include <linux/skbuff.h>
 #include <linux/posix-timers.h>
+#include <linux/posix-clock.h>
 #include <net/netlink.h>
 #include <net/sch_generic.h>
 #include <net/pkt_sched.h>
@@ -40,19 +41,40 @@ struct etf_sched_data {
 	struct rb_root_cached head;
 	struct qdisc_watchdog watchdog;
 	ktime_t (*get_time)(void);
+#ifdef CONFIG_POSIX_TIMERS
+	struct posix_clock *pclock; /* pointer to a posix clock */
+#endif /* CONFIG_POSIX_TIMERS */
 };
 
 static const struct nla_policy etf_policy[TCA_ETF_MAX + 1] = {
 	[TCA_ETF_PARMS]	= { .len = sizeof(struct tc_etf_qopt) },
 };
 
+static inline ktime_t get_now(struct Qdisc *sch, struct etf_sched_data *q)
+{
+#ifdef CONFIG_POSIX_TIMERS
+	if (IS_ERR_OR_NULL(q->get_time)) {
+		struct timespec64 ts;
+		int err = posix_clock_gettime(q->pclock, &ts);
+
+		if (err) {
+			pr_warn("Clock is disabled (%d) for queue %d\n",
+				err, q->queue);
+			return 0;
+		}
+		return timespec64_to_ktime(ts);
+	}
+#endif /* CONFIG_POSIX_TIMERS */
+	return q->get_time();
+}
+
 static inline int validate_input_params(struct tc_etf_qopt *qopt,
 					struct netlink_ext_ack *extack)
 {
 	/* Check if params comply to the following rules:
 	 *	* Clockid and delta must be valid.
 	 *
-	 *	* Dynamic clockids are not supported.
+	 *	* Dynamic CPU clockids are not supported.
 	 *
 	 *	* Delta must be a positive or zero integer.
 	 *
@@ -60,11 +82,22 @@ static inline int validate_input_params(struct tc_etf_qopt *qopt,
 	 * expect that system clocks have been synchronized to PHC.
 	 */
 	if (qopt->clockid < 0) {
+#ifdef CONFIG_POSIX_TIMERS
+		/**
+		 * Use of PTP clock through a posix clock.
+		 * The TC application must open the posix clock device file
+		 * and use the dynamic clockid from the file description.
+		 */
+		if (!is_clockid_fd_clock(qopt->clockid)) {
+			NL_SET_ERR_MSG(extack,
+				       "Dynamic CPU clockids are not supported");
+			return -EOPNOTSUPP;
+		}
+#else /* CONFIG_POSIX_TIMERS */
 		NL_SET_ERR_MSG(extack, "Dynamic clockids are not supported");
 		return -ENOTSUPP;
-	}
-
-	if (qopt->clockid != CLOCK_TAI) {
+#endif /* CONFIG_POSIX_TIMERS */
+	} else if (qopt->clockid != CLOCK_TAI) {
 		NL_SET_ERR_MSG(extack, "Invalid clockid. CLOCK_TAI must be used");
 		return -EINVAL;
 	}
@@ -103,7 +136,7 @@ static bool is_packet_valid(struct Qdisc *sch, struct etf_sched_data *q,
 		return false;
 
 skip:
-	now = q->get_time();
+	now = get_now(sch, q);
 	if (ktime_before(txtime, now) || ktime_before(txtime, q->last))
 		return false;
 
@@ -133,6 +166,27 @@ static void reset_watchdog(struct Qdisc *sch, struct etf_sched_data *q)
 	}
 
 	next = ktime_sub_ns(skb->tstamp, q->delta + NET_SCH_ETF_TIMER_RANGE);
+#ifdef CONFIG_POSIX_TIMERS
+	if (!IS_ERR_OR_NULL(q->pclock)) {
+		s64 now;
+		struct timespec64 ts;
+		int err = posix_clock_gettime(q->pclock, &ts);
+
+		if (err) {
+			pr_warn("Clock is disabled (%d) for queue %d\n",
+				err, q->queue);
+			return;
+		}
+		now = timespec64_to_ns(&ts);
+		if (now == 0) {
+			pr_warn("Clock is not running for queue %d\n",
+				q->queue);
+			return;
+		}
+		/* Convert PHC to CLOCK_TAI as that is the timer clock */
+		next = ktime_add(next, ktime_sub_ns(ktime_get_clocktai(), now));
+	}
+#endif /* CONFIG_POSIX_TIMERS */
 	qdisc_watchdog_schedule_soon_ns(&q->watchdog, ktime_to_ns(next),
 					NET_SCH_ETF_TIMER_RANGE);
 }
@@ -263,7 +317,7 @@ static struct sk_buff *etf_dequeue_timesortedlist(struct Qdisc *sch)
 	if (!skb)
 		return NULL;
 
-	now = q->get_time();
+	now = get_now(sch, q);
 
 	/* Drop if packet has expired while in queue. */
 	if (ktime_before(skb->tstamp, now)) {
@@ -354,6 +408,7 @@ static int etf_init(struct Qdisc *sch, struct nlattr *opt,
 	struct nlattr *tb[TCA_ETF_MAX + 1];
 	struct tc_etf_qopt *qopt;
 	int err;
+	clockid_t clockid;
 
 	if (!opt) {
 		NL_SET_ERR_MSG(extack,
@@ -391,13 +446,17 @@ static int etf_init(struct Qdisc *sch, struct nlattr *opt,
 	}
 
 	/* Everything went OK, save the parameters used. */
+	clockid = qopt->clockid;
 	q->delta = qopt->delta;
 	q->clockid = qopt->clockid;
 	q->offload = OFFLOAD_IS_ON(qopt);
 	q->deadline_mode = DEADLINE_MODE_IS_ON(qopt);
 	q->skip_sock_check = SKIP_SOCK_CHECK_IS_SET(qopt);
+#ifdef CONFIG_POSIX_TIMERS
+	q->pclock = NULL;
+#endif /* CONFIG_POSIX_TIMERS */
 
-	switch (q->clockid) {
+	switch (clockid) {
 	case CLOCK_REALTIME:
 		q->get_time = ktime_get_real;
 		break;
@@ -411,11 +470,31 @@ static int etf_init(struct Qdisc *sch, struct nlattr *opt,
 		q->get_time = ktime_get_clocktai;
 		break;
 	default:
+#ifdef CONFIG_POSIX_TIMERS
+		q->pclock = posix_clock_get_clock(clockid);
+		if (IS_ERR_OR_NULL(q->pclock)) {
+			NL_SET_ERR_MSG(extack, "Clockid does not exist");
+			return PTR_ERR(q->pclock);
+		}
+		q->clockid = SOF_TXTIME_POSIX_CLOCK_ID;
+		/**
+		 * Posix clocks do not support high-resolution timers
+		 * Using the TAI clock is close enough (for relative sleeps)
+		 */
+		clockid = CLOCK_TAI;
+		/**
+		 * Posix clock do not provide direct ktime function
+		 * We need to call posix_clock_gettime()
+		 */
+		q->get_time = NULL;
+		break;
+#else /* CONFIG_POSIX_TIMERS */
 		NL_SET_ERR_MSG(extack, "Clockid is not supported");
 		return -ENOTSUPP;
+#endif /* CONFIG_POSIX_TIMERS */
 	}
 
-	qdisc_watchdog_init_clockid(&q->watchdog, sch, q->clockid);
+	qdisc_watchdog_init_clockid(&q->watchdog, sch, clockid);
 
 	return 0;
 }
@@ -463,6 +542,11 @@ static void etf_destroy(struct Qdisc *sch)
 		qdisc_watchdog_cancel(&q->watchdog);
 
 	etf_disable_offload(dev, q);
+#ifdef CONFIG_POSIX_TIMERS
+	/* Release posix clock */
+	if (!IS_ERR_OR_NULL(q->pclock))
+		posix_clock_put_clock(q->pclock);
+#endif /* CONFIG_POSIX_TIMERS */
 }
 
 static int etf_dump(struct Qdisc *sch, struct sk_buff *skb)
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH 1/7] POSIX clock ID check function
  2020-10-01 20:51 ` [PATCH 1/7] POSIX clock ID check function Erez Geva
@ 2020-10-01 21:56   ` Thomas Gleixner
  0 siblings, 0 replies; 21+ messages in thread
From: Thomas Gleixner @ 2020-10-01 21:56 UTC (permalink / raw)
  To: Erez Geva, linux-kernel, netdev, Cong Wang, David S . Miller,
	Jakub Kicinski, Jamal Hadi Salim, Jiri Pirko, Andrei Vagin,
	Dmitry Safonov, Eric W . Biederman, Ingo Molnar, John Stultz,
	Michal Kubecek, Oleg Nesterov, Peter Zijlstra, Richard Cochran,
	Stephen Boyd, Vladis Dronov, Sebastian Andrzej Siewior,
	Frederic Weisbecker, Eric Dumazet
  Cc: Jesus Sanchez-Palencia, Vinicius Costa Gomes, Vedang Patel,
	Simon Sudler, Andreas Meisinger, Andreas Bucher, Henning Schild,
	Jan Kiszka, Andreas Zirkler, Ermin Sakic, An Ninh Nguyen,
	Michael Saenger, Bernd Maehringer, Gisela Greinert, Erez Geva,
	Erez Geva

Erez,

On Thu, Oct 01 2020 at 22:51, Erez Geva wrote:

thanks for your patches.

First of all subject lines follow the scheme:

 subsystem: Short description

The short description should be a sentence

> Add function to check whether a clock ID refer to
> a file descriptor of a POSIX dynamic clock.

The changelog should not primarily describe what the patch does, it
should describe why or which problem it is solving.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 2/7] Function to retrieve main clock state
  2020-10-01 20:51 ` [PATCH 2/7] Function to retrieve main clock state Erez Geva
@ 2020-10-01 22:05   ` Thomas Gleixner
  2020-10-02  0:19     ` Thomas Gleixner
  0 siblings, 1 reply; 21+ messages in thread
From: Thomas Gleixner @ 2020-10-01 22:05 UTC (permalink / raw)
  To: Erez Geva, linux-kernel, netdev, Cong Wang, David S . Miller,
	Jakub Kicinski, Jamal Hadi Salim, Jiri Pirko, Andrei Vagin,
	Dmitry Safonov, Eric W . Biederman, Ingo Molnar, John Stultz,
	Michal Kubecek, Oleg Nesterov, Peter Zijlstra, Richard Cochran,
	Stephen Boyd, Vladis Dronov, Sebastian Andrzej Siewior,
	Frederic Weisbecker, Eric Dumazet
  Cc: Jesus Sanchez-Palencia, Vinicius Costa Gomes, Vedang Patel,
	Simon Sudler, Andreas Meisinger, Andreas Bucher, Henning Schild,
	Jan Kiszka, Andreas Zirkler, Ermin Sakic, An Ninh Nguyen,
	Michael Saenger, Bernd Maehringer, Gisela Greinert, Erez Geva,
	Erez Geva

Erez,

On Thu, Oct 01 2020 at 22:51, Erez Geva wrote:

same comments as for patch 1 apply.

> Add kernel function to retrieve main clock oscillator state.

The function you are adding is named adjtimex(). adjtimex(2) is a well
known user space interface and naming a read only kernel interface the
same way is misleading.

Thanks,

        tglx



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 4/7] Fix qdisc_watchdog_schedule_range_ns range check
  2020-10-01 20:51 ` [PATCH 4/7] Fix qdisc_watchdog_schedule_range_ns range check Erez Geva
@ 2020-10-01 22:44   ` Thomas Gleixner
  0 siblings, 0 replies; 21+ messages in thread
From: Thomas Gleixner @ 2020-10-01 22:44 UTC (permalink / raw)
  To: Erez Geva, linux-kernel, netdev, Cong Wang, David S . Miller,
	Jakub Kicinski, Jamal Hadi Salim, Jiri Pirko, Andrei Vagin,
	Dmitry Safonov, Eric W . Biederman, Ingo Molnar, John Stultz,
	Michal Kubecek, Oleg Nesterov, Peter Zijlstra, Richard Cochran,
	Stephen Boyd, Vladis Dronov, Sebastian Andrzej Siewior,
	Frederic Weisbecker, Eric Dumazet
  Cc: Jesus Sanchez-Palencia, Vinicius Costa Gomes, Vedang Patel,
	Simon Sudler, Andreas Meisinger, Andreas Bucher, Henning Schild,
	Jan Kiszka, Andreas Zirkler, Ermin Sakic, An Ninh Nguyen,
	Michael Saenger, Bernd Maehringer, Gisela Greinert, Erez Geva,
	Erez Geva

On Thu, Oct 01 2020 at 22:51, Erez Geva wrote:

Fixes should be at the beginning of a patch series and not be hidden
somewhere in the middle.

>    - As all parameters are unsigned.

This is not a sentence and this list style does not make that changelog
more readable.

>    - If 'expires' is bigger than 'last_expires' then the left expression
>      overflows.

This would be the most important information and should be clearly
spelled out as problem description at the very beginning of the change
log.

>    - It is better to use addition and check both ends of the range.

Is better? Either your change is correcting the problem or not. Just
better but incorrect does not cut it.

But let's look at the problem itself. The check is about:

    B <= A <= B + C

A, B, C are all unsigned. So if B > A then the result is false.

Now lets look at the implementation:

    if (A - B <= C)
    	return;

which works correctly due the wonders of unsigned math.

For B <= A the check is obviously correct.

If B > A then the result of the unsigned subtraction A - B is a very
large positive number which is pretty much guaranteed to be larger than
C, i.e. the result is false.

So while not immediately obvious, it's still correct.

Thanks,

        tglx




^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 5/7] Traffic control using high-resolution timer issue
  2020-10-01 20:51 ` [PATCH 5/7] Traffic control using high-resolution timer issue Erez Geva
@ 2020-10-01 23:07   ` Thomas Gleixner
  0 siblings, 0 replies; 21+ messages in thread
From: Thomas Gleixner @ 2020-10-01 23:07 UTC (permalink / raw)
  To: Erez Geva, linux-kernel, netdev, Cong Wang, David S . Miller,
	Jakub Kicinski, Jamal Hadi Salim, Jiri Pirko, Andrei Vagin,
	Dmitry Safonov, Eric W . Biederman, Ingo Molnar, John Stultz,
	Michal Kubecek, Oleg Nesterov, Peter Zijlstra, Richard Cochran,
	Stephen Boyd, Vladis Dronov, Sebastian Andrzej Siewior,
	Frederic Weisbecker, Eric Dumazet
  Cc: Jesus Sanchez-Palencia, Vinicius Costa Gomes, Vedang Patel,
	Simon Sudler, Andreas Meisinger, Andreas Bucher, Henning Schild,
	Jan Kiszka, Andreas Zirkler, Ermin Sakic, An Ninh Nguyen,
	Michael Saenger, Bernd Maehringer, Gisela Greinert, Erez Geva,
	Erez Geva

On Thu, Oct 01 2020 at 22:51, Erez Geva wrote:

Issue? You again fail to decribe the problem.

>   - Add new schedule function for Qdisc watchdog.
>     The function avoids reprogram if watchdog already expire
>     before new expire.

Why can't the existing function not be changed to do that?

>   - Use new schedule function in ETF.
>
>   - Add ETF range value to kernel configuration.
>     as the value is characteristic to Hardware.

No. That's completely wrong. Hardware properties need to be established
at boot/runtime otherwise you can't build a kernel which runs on
different platforms.

> +void qdisc_watchdog_schedule_soon_ns(struct qdisc_watchdog *wd, u64 expires,
> +				     u64 delta_ns)

schedule soon? That sounds like schedule it sooner than later, but I
don't care.

> +{
> +	if (test_bit(__QDISC_STATE_DEACTIVATED,
> +		     &qdisc_root_sleeping(wd->qdisc)->state))
> +		return;
> +
> +	if (wd->last_expires == expires)
> +		return;

How is this supposed to be valid without checking whether the timer is
queued in the first place?

Maybe the function name should be schedule_soon_or_not()

> +	/**

Can you please use proper comment style? This is neither network comment
style nor the regular sane kernel comment style. It's kerneldoc comment
style which is reserved for function and struct documentation.

> +	 * If expires is in [0, now + delta_ns],
> +	 * do not program it.

This range info is just confusing. Just use plain english.

> +	 */
> +	if (expires <= ktime_to_ns(hrtimer_cb_get_time(&wd->timer)) + delta_ns)
> +		return;

So if the watchdog is NOT queued and expiry < now + delta_ns then this
returns and nothing starts the timer ever.

That might all be correct, but without any useful explanation it looks
completely bogus.

> +	/**
> +	 * If timer is already set in [0, expires + delta_ns],
> +	 * do not reprogram it.
> +	 */
> +	if (hrtimer_is_queued(&wd->timer) &&
> +	    wd->last_expires <= expires + delta_ns)
> +		return;
> +
> +	wd->last_expires = expires;
> +	hrtimer_start_range_ns(&wd->timer,
> +			       ns_to_ktime(expires),
> +			       delta_ns,
> +			       HRTIMER_MODE_ABS_PINNED);
> +}
> +EXPORT_SYMBOL(qdisc_watchdog_schedule_soon_ns);
> +
>  void qdisc_watchdog_cancel(struct qdisc_watchdog *wd)
>  {
>  	hrtimer_cancel(&wd->timer);
> diff --git a/net/sched/sch_etf.c b/net/sched/sch_etf.c
> index c48f91075b5c..48b2868c4672 100644
> --- a/net/sched/sch_etf.c
> +++ b/net/sched/sch_etf.c
> @@ -20,6 +20,11 @@
>  #include <net/pkt_sched.h>
>  #include <net/sock.h>
>  
> +#ifdef CONFIG_NET_SCH_ETF_TIMER_RANGE
> +#define NET_SCH_ETF_TIMER_RANGE CONFIG_NET_SCH_ETF_TIMER_RANGE
> +#else
> +#define NET_SCH_ETF_TIMER_RANGE (5 * NSEC_PER_USEC)
> +#endif
>  #define DEADLINE_MODE_IS_ON(x) ((x)->flags & TC_ETF_DEADLINE_MODE_ON)
>  #define OFFLOAD_IS_ON(x) ((x)->flags & TC_ETF_OFFLOAD_ON)
>  #define SKIP_SOCK_CHECK_IS_SET(x) ((x)->flags & TC_ETF_SKIP_SOCK_CHECK)
> @@ -128,8 +133,9 @@ static void reset_watchdog(struct Qdisc *sch)
>  		return;
>  	}
>  
> -	next = ktime_sub_ns(skb->tstamp, q->delta);
> -	qdisc_watchdog_schedule_ns(&q->watchdog, ktime_to_ns(next));
> +	next = ktime_sub_ns(skb->tstamp, q->delta + NET_SCH_ETF_TIMER_RANGE);
> +	qdisc_watchdog_schedule_soon_ns(&q->watchdog, ktime_to_ns(next),
> +					NET_SCH_ETF_TIMER_RANGE);

This is changing 5 things at once. That's just wrong.

patch 1: Add the new function and explain why it's required

patch 2: Make reset_watchdog() use it

patch 3: Add a mechanism to retrieve the magic hardware range from the
         underlying hardware driver and add that value to q->delta or
         set q->delta to it. Whatever makes sense. The current solution
         makes no sense at all

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 3/7] Functions to fetch POSIX dynamic clock object
  2020-10-01 20:51 ` [PATCH 3/7] Functions to fetch POSIX dynamic clock object Erez Geva
@ 2020-10-01 23:35   ` Thomas Gleixner
  0 siblings, 0 replies; 21+ messages in thread
From: Thomas Gleixner @ 2020-10-01 23:35 UTC (permalink / raw)
  To: Erez Geva, linux-kernel, netdev, Cong Wang, David S . Miller,
	Jakub Kicinski, Jamal Hadi Salim, Jiri Pirko, Andrei Vagin,
	Dmitry Safonov, Eric W . Biederman, Ingo Molnar, John Stultz,
	Michal Kubecek, Oleg Nesterov, Peter Zijlstra, Richard Cochran,
	Stephen Boyd, Vladis Dronov, Sebastian Andrzej Siewior,
	Frederic Weisbecker, Eric Dumazet
  Cc: Jesus Sanchez-Palencia, Vinicius Costa Gomes, Vedang Patel,
	Simon Sudler, Andreas Meisinger, Andreas Bucher, Henning Schild,
	Jan Kiszka, Andreas Zirkler, Ermin Sakic, An Ninh Nguyen,
	Michael Saenger, Bernd Maehringer, Gisela Greinert, Erez Geva,
	Erez Geva

On Thu, Oct 01 2020 at 22:51, Erez Geva wrote:
> Add kernel functions to fetch a pointer to a POSIX dynamic clock
> using a user file description dynamic clock ID.

And how is that supposed to work. What are the lifetime rules?
  
> +struct posix_clock *posix_clock_get_clock(clockid_t id)
> +{
> +	int err;
> +	struct posix_clock_desc cd;

The core code uses reverse fir tree ordering of variable declaration
based on the length:

	struct posix_clock_desc cd;
        int err;

> +	/* Verify we use posix clock ID */
> +	if (!is_clockid_fd_clock(id))
> +		return ERR_PTR(-EINVAL);
> +
> +	err = get_clock_desc(id, &cd);

So this is a kernel interface and get_clock_desc() does:

	struct file *fp = fget(clockid_to_fd(id));

How is that file descriptor valid in random kernel context?

> +	if (err)
> +		return ERR_PTR(err);
> +
> +	get_device(cd.clk->dev);

The purpose of this is? Comments are overrated...

> +	put_clock_desc(&cd);
> +
> +	return cd.clk;
> +}
> +EXPORT_SYMBOL_GPL(posix_clock_get_clock);
> +
> +int posix_clock_put_clock(struct posix_clock *clk)
> +{
> +	if (IS_ERR_OR_NULL(clk))
> +		return -EINVAL;
> +	put_device(clk->dev);
> +	return 0;
> +}
> +EXPORT_SYMBOL_GPL(posix_clock_put_clock);
> +
> +int posix_clock_gettime(struct posix_clock *clk, struct timespec64 *ts)
> +{
> +	int err;
> +
> +	if (IS_ERR_OR_NULL(clk))
> +		return -EINVAL;
> +
> +	down_read(&clk->rwsem);

Open coding the logic of get_posix_clock() and having a copy here and
in the next function is really useful.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 2/7] Function to retrieve main clock state
  2020-10-01 22:05   ` Thomas Gleixner
@ 2020-10-02  0:19     ` Thomas Gleixner
  0 siblings, 0 replies; 21+ messages in thread
From: Thomas Gleixner @ 2020-10-02  0:19 UTC (permalink / raw)
  To: Erez Geva, linux-kernel, netdev, Cong Wang, David S . Miller,
	Jakub Kicinski, Jamal Hadi Salim, Jiri Pirko, Andrei Vagin,
	Dmitry Safonov, Eric W . Biederman, Ingo Molnar, John Stultz,
	Michal Kubecek, Oleg Nesterov, Peter Zijlstra, Richard Cochran,
	Stephen Boyd, Vladis Dronov, Sebastian Andrzej Siewior,
	Frederic Weisbecker, Eric Dumazet
  Cc: Jesus Sanchez-Palencia, Vinicius Costa Gomes, Vedang Patel,
	Simon Sudler, Andreas Meisinger, Andreas Bucher, Henning Schild,
	Jan Kiszka, Andreas Zirkler, Ermin Sakic, An Ninh Nguyen,
	Michael Saenger, Bernd Maehringer, Gisela Greinert, Erez Geva,
	Erez Geva

On Fri, Oct 02 2020 at 00:05, Thomas Gleixner wrote:
> On Thu, Oct 01 2020 at 22:51, Erez Geva wrote:
>
> same comments as for patch 1 apply.
>
>> Add kernel function to retrieve main clock oscillator state.
>
> The function you are adding is named adjtimex(). adjtimex(2) is a well
> known user space interface and naming a read only kernel interface the
> same way is misleading.

Aside of that there is no user for this function in this series. We're
not adding interfaces just because we can.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 7/7] TC-ETF support PTP clocks
  2020-10-01 20:51 ` [PATCH 7/7] TC-ETF support PTP clocks Erez Geva
@ 2020-10-02  0:33   ` Thomas Gleixner
  2020-10-02 11:05     ` Geva, Erez
  0 siblings, 1 reply; 21+ messages in thread
From: Thomas Gleixner @ 2020-10-02  0:33 UTC (permalink / raw)
  To: Erez Geva, linux-kernel, netdev, Cong Wang, David S . Miller,
	Jakub Kicinski, Jamal Hadi Salim, Jiri Pirko, Andrei Vagin,
	Dmitry Safonov, Eric W . Biederman, Ingo Molnar, John Stultz,
	Michal Kubecek, Oleg Nesterov, Peter Zijlstra, Richard Cochran,
	Stephen Boyd, Vladis Dronov, Sebastian Andrzej Siewior,
	Frederic Weisbecker, Eric Dumazet
  Cc: Jesus Sanchez-Palencia, Vinicius Costa Gomes, Vedang Patel,
	Simon Sudler, Andreas Meisinger, Andreas Bucher, Henning Schild,
	Jan Kiszka, Andreas Zirkler, Ermin Sakic, An Ninh Nguyen,
	Michael Saenger, Bernd Maehringer, Gisela Greinert, Erez Geva,
	Erez Geva

On Thu, Oct 01 2020 at 22:51, Erez Geva wrote:

>   - Add support for using a POSIX dynamic clock with
>     Traffic control Earliest TxTime First (ETF) Qdisc.

....

> --- a/include/uapi/linux/net_tstamp.h
> +++ b/include/uapi/linux/net_tstamp.h
> @@ -167,6 +167,11 @@ enum txtime_flags {
>  	SOF_TXTIME_FLAGS_MASK = (SOF_TXTIME_FLAGS_LAST - 1) |
>  				 SOF_TXTIME_FLAGS_LAST
>  };
> +/*
> + * Clock ID to use with POSIX clocks
> + * The ID must be u8 to fit in (struct sock)->sk_clockid
> + */
> +#define SOF_TXTIME_POSIX_CLOCK_ID (0x77)

Random number with a random name. 
  
>  struct sock_txtime {
>  	__kernel_clockid_t	clockid;/* reference clockid */
> diff --git a/net/sched/sch_etf.c b/net/sched/sch_etf.c
> index c0de4c6f9299..8e3e0a61fa58 100644
> --- a/net/sched/sch_etf.c
> +++ b/net/sched/sch_etf.c
> @@ -15,6 +15,7 @@
>  #include <linux/rbtree.h>
>  #include <linux/skbuff.h>
>  #include <linux/posix-timers.h>
> +#include <linux/posix-clock.h>
>  #include <net/netlink.h>
>  #include <net/sch_generic.h>
>  #include <net/pkt_sched.h>
> @@ -40,19 +41,40 @@ struct etf_sched_data {
>  	struct rb_root_cached head;
>  	struct qdisc_watchdog watchdog;
>  	ktime_t (*get_time)(void);
> +#ifdef CONFIG_POSIX_TIMERS
> +	struct posix_clock *pclock; /* pointer to a posix clock */

Tail comments suck because they disturb the reading flow and this
comment has absolute zero value.

Comments are required to explain things which are not obvious...

> +#endif /* CONFIG_POSIX_TIMERS */

Also this #ifdeffery is bonkers. How is TSN supposed to work without
POSIX_TIMERS in the first place?

>  static const struct nla_policy etf_policy[TCA_ETF_MAX + 1] = {
>  	[TCA_ETF_PARMS]	= { .len = sizeof(struct tc_etf_qopt) },
>  };
>  
> +static inline ktime_t get_now(struct Qdisc *sch, struct etf_sched_data *q)
> +{
> +#ifdef CONFIG_POSIX_TIMERS
> +	if (IS_ERR_OR_NULL(q->get_time)) {
> +		struct timespec64 ts;
> +		int err = posix_clock_gettime(q->pclock, &ts);
> +
> +		if (err) {
> +			pr_warn("Clock is disabled (%d) for queue %d\n",
> +				err, q->queue);
> +			return 0;

That's really useful error handling.

> +		}
> +		return timespec64_to_ktime(ts);
> +	}
> +#endif /* CONFIG_POSIX_TIMERS */
> +	return q->get_time();
> +}
> +
>  static inline int validate_input_params(struct tc_etf_qopt *qopt,
>  					struct netlink_ext_ack *extack)
>  {
>  	/* Check if params comply to the following rules:
>  	 *	* Clockid and delta must be valid.
>  	 *
> -	 *	* Dynamic clockids are not supported.
> +	 *	* Dynamic CPU clockids are not supported.
>  	 *
>  	 *	* Delta must be a positive or zero integer.
>  	 *
> @@ -60,11 +82,22 @@ static inline int validate_input_params(struct tc_etf_qopt *qopt,
>  	 * expect that system clocks have been synchronized to PHC.
>  	 */
>  	if (qopt->clockid < 0) {
> +#ifdef CONFIG_POSIX_TIMERS
> +		/**
> +		 * Use of PTP clock through a posix clock.
> +		 * The TC application must open the posix clock device file
> +		 * and use the dynamic clockid from the file description.

What? How is the code which calls into this guaranteed to have a valid
file descriptor open for a particular dynamic posix clock?

> +		 */
> +		if (!is_clockid_fd_clock(qopt->clockid)) {
> +			NL_SET_ERR_MSG(extack,
> +				       "Dynamic CPU clockids are not supported");
> +			return -EOPNOTSUPP;
> +		}
> +#else /* CONFIG_POSIX_TIMERS */
>  		NL_SET_ERR_MSG(extack, "Dynamic clockids are not supported");
>  		return -ENOTSUPP;
> -	}
> -
> -	if (qopt->clockid != CLOCK_TAI) {
> +#endif /* CONFIG_POSIX_TIMERS */
> +	} else if (qopt->clockid != CLOCK_TAI) {
>  		NL_SET_ERR_MSG(extack, "Invalid clockid. CLOCK_TAI must be used");
>  		return -EINVAL;
>  	}
> @@ -103,7 +136,7 @@ static bool is_packet_valid(struct Qdisc *sch, struct etf_sched_data *q,
>  		return false;
>  
>  skip:
> -	now = q->get_time();
> +	now = get_now(sch, q);

Yuck.

is_packet_valid() is invoked via:

    __dev_queue_xmit()
      __dev_xmit_skb()
         etf_enqueue_timesortedlist()
           is_packet_valid()

__dev_queue_xmit() does

       rcu_read_lock_bh();

and your get_now() does

    posix_clock_gettime()
       	down_read(&clk->rwsem);

 ----> FAIL

down_read() might sleep and cannot be called from a BH disabled
region. This clearly has never been tested with any mandatory debug
option enabled. Why am I not surprised?

Aside of accessing PCH clock being slow at hell this cannot ever work
and there is no way to make it work in any consistent form.

If you have several NICs on several PCH domains then all of these
domains should have one thing in common: CLOCK_TAI and the frequency.

If that's not the case then the overall system design is just broken,
but yes I'm aware of the fact that some industries decided to have their
own definition of time and frequency just because they can.

Handling different starting points of the domains interpretation of
"TAI" is doable because that's just an offset, but having different
frequencies is a nightmare.

So if such a trainwreck is a valid use case which needs to be supported
then just duct taping it into the code is not going to fly.

The only way to make this work is to sit down and actually design a
mechanism which allows to correlate the various notions of PCH time with
the systems CLOCK_TAI, i.e. providing offset and frequency correction.

Also you want to explain how user space applications should deal with
these different time domains in a sane way.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 7/7] TC-ETF support PTP clocks
  2020-10-02  0:33   ` Thomas Gleixner
@ 2020-10-02 11:05     ` Geva, Erez
  0 siblings, 0 replies; 21+ messages in thread
From: Geva, Erez @ 2020-10-02 11:05 UTC (permalink / raw)
  To: Thomas Gleixner, linux-kernel, netdev, Cong Wang,
	David S . Miller, Jakub Kicinski, Jamal Hadi Salim, Jiri Pirko,
	Andrei Vagin, Dmitry Safonov, Eric W . Biederman, Ingo Molnar,
	John Stultz, Michal Kubecek, Oleg Nesterov, Peter Zijlstra,
	Richard Cochran, Stephen Boyd, Vladis Dronov,
	Sebastian Andrzej Siewior, Frederic Weisbecker, Eric Dumazet
  Cc: Vinicius Costa Gomes, Vedang Patel, Sudler, Simon, Meisinger,
	Andreas, Bucher, Andreas, henning.schild, jan.kiszka, Zirkler,
	Andreas, Sakic, Ermin, anninh.nguyen, Saenger, Michael,
	Maehringer, Bernd, gisela.greinert, Erez Geva



On 02/10/2020 02:33, Thomas Gleixner wrote:
> On Thu, Oct 01 2020 at 22:51, Erez Geva wrote:
>
>>    - Add support for using a POSIX dynamic clock with
>>      Traffic control Earliest TxTime First (ETF) Qdisc.
>
> ....
>
>> --- a/include/uapi/linux/net_tstamp.h
>> +++ b/include/uapi/linux/net_tstamp.h
>> @@ -167,6 +167,11 @@ enum txtime_flags {
>>      SOF_TXTIME_FLAGS_MASK = (SOF_TXTIME_FLAGS_LAST - 1) |
>>                               SOF_TXTIME_FLAGS_LAST
>>   };
>> +/*
>> + * Clock ID to use with POSIX clocks
>> + * The ID must be u8 to fit in (struct sock)->sk_clockid
>> + */
>> +#define SOF_TXTIME_POSIX_CLOCK_ID (0x77)
>
> Random number with a random name.
>
>>   struct sock_txtime {
>>      __kernel_clockid_t      clockid;/* reference clockid */
>> diff --git a/net/sched/sch_etf.c b/net/sched/sch_etf.c
>> index c0de4c6f9299..8e3e0a61fa58 100644
>> --- a/net/sched/sch_etf.c
>> +++ b/net/sched/sch_etf.c
>> @@ -15,6 +15,7 @@
>>   #include <linux/rbtree.h>
>>   #include <linux/skbuff.h>
>>   #include <linux/posix-timers.h>
>> +#include <linux/posix-clock.h>
>>   #include <net/netlink.h>
>>   #include <net/sch_generic.h>
>>   #include <net/pkt_sched.h>
>> @@ -40,19 +41,40 @@ struct etf_sched_data {
>>      struct rb_root_cached head;
>>      struct qdisc_watchdog watchdog;
>>      ktime_t (*get_time)(void);
>> +#ifdef CONFIG_POSIX_TIMERS
>> +    struct posix_clock *pclock; /* pointer to a posix clock */
>
> Tail comments suck because they disturb the reading flow and this
> comment has absolute zero value.
>
> Comments are required to explain things which are not obvious...
>
>> +#endif /* CONFIG_POSIX_TIMERS */
>
> Also this #ifdeffery is bonkers. How is TSN supposed to work without
> POSIX_TIMERS in the first place?
>
>>   static const struct nla_policy etf_policy[TCA_ETF_MAX + 1] = {
>>      [TCA_ETF_PARMS] = { .len = sizeof(struct tc_etf_qopt) },
>>   };
>>
>> +static inline ktime_t get_now(struct Qdisc *sch, struct etf_sched_data *q)
>> +{
>> +#ifdef CONFIG_POSIX_TIMERS
>> +    if (IS_ERR_OR_NULL(q->get_time)) {
>> +            struct timespec64 ts;
>> +            int err = posix_clock_gettime(q->pclock, &ts);
>> +
>> +            if (err) {
>> +                    pr_warn("Clock is disabled (%d) for queue %d\n",
>> +                            err, q->queue);
>> +                    return 0;
>
> That's really useful error handling.
>
>> +            }
>> +            return timespec64_to_ktime(ts);
>> +    }
>> +#endif /* CONFIG_POSIX_TIMERS */
>> +    return q->get_time();
>> +}
>> +
>>   static inline int validate_input_params(struct tc_etf_qopt *qopt,
>>                                      struct netlink_ext_ack *extack)
>>   {
>>      /* Check if params comply to the following rules:
>>       *      * Clockid and delta must be valid.
>>       *
>> -     *      * Dynamic clockids are not supported.
>> +     *      * Dynamic CPU clockids are not supported.
>>       *
>>       *      * Delta must be a positive or zero integer.
>>       *
>> @@ -60,11 +82,22 @@ static inline int validate_input_params(struct tc_etf_qopt *qopt,
>>       * expect that system clocks have been synchronized to PHC.
>>       */
>>      if (qopt->clockid < 0) {
>> +#ifdef CONFIG_POSIX_TIMERS
>> +            /**
>> +             * Use of PTP clock through a posix clock.
>> +             * The TC application must open the posix clock device file
>> +             * and use the dynamic clockid from the file description.
>
> What? How is the code which calls into this guaranteed to have a valid
> file descriptor open for a particular dynamic posix clock?
>
>> +             */
>> +            if (!is_clockid_fd_clock(qopt->clockid)) {
>> +                    NL_SET_ERR_MSG(extack,
>> +                                   "Dynamic CPU clockids are not supported");
>> +                    return -EOPNOTSUPP;
>> +            }
>> +#else /* CONFIG_POSIX_TIMERS */
>>              NL_SET_ERR_MSG(extack, "Dynamic clockids are not supported");
>>              return -ENOTSUPP;
>> -    }
>> -
>> -    if (qopt->clockid != CLOCK_TAI) {
>> +#endif /* CONFIG_POSIX_TIMERS */
>> +    } else if (qopt->clockid != CLOCK_TAI) {
>>              NL_SET_ERR_MSG(extack, "Invalid clockid. CLOCK_TAI must be used");
>>              return -EINVAL;
>>      }
>> @@ -103,7 +136,7 @@ static bool is_packet_valid(struct Qdisc *sch, struct etf_sched_data *q,
>>              return false;
>>
>>   skip:
>> -    now = q->get_time();
>> +    now = get_now(sch, q);
>
> Yuck.
>
> is_packet_valid() is invoked via:
>
>      __dev_queue_xmit()
>        __dev_xmit_skb()
>           etf_enqueue_timesortedlist()
>             is_packet_valid()
>
> __dev_queue_xmit() does
>
>         rcu_read_lock_bh();
>
> and your get_now() does
>
>      posix_clock_gettime()
>               down_read(&clk->rwsem);
>
>   ----> FAIL
>
> down_read() might sleep and cannot be called from a BH disabled
> region. This clearly has never been tested with any mandatory debug
> option enabled. Why am I not surprised?
>
> Aside of accessing PCH clock being slow at hell this cannot ever work
> and there is no way to make it work in any consistent form.
>
> If you have several NICs on several PCH domains then all of these
> domains should have one thing in common: CLOCK_TAI and the frequency.
>
> If that's not the case then the overall system design is just broken,
> but yes I'm aware of the fact that some industries decided to have their
> own definition of time and frequency just because they can.
>
> Handling different starting points of the domains interpretation of
> "TAI" is doable because that's just an offset, but having different
> frequencies is a nightmare.
>
> So if such a trainwreck is a valid use case which needs to be supported
> then just duct taping it into the code is not going to fly.
>
> The only way to make this work is to sit down and actually design a
> mechanism which allows to correlate the various notions of PCH time with
> the systems CLOCK_TAI, i.e. providing offset and frequency correction.
>
> Also you want to explain how user space applications should deal with
> these different time domains in a sane way.
>
> Thanks,
>
>          tglx
>

Thank you for your quick and depth feedback.

Erez

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 0/7] TC-ETF support PTP clocks series
  2020-10-01 20:51 [PATCH 0/7] TC-ETF support PTP clocks series Erez Geva
                   ` (6 preceding siblings ...)
  2020-10-01 20:51 ` [PATCH 7/7] TC-ETF support PTP clocks Erez Geva
@ 2020-10-02 19:01 ` Vinicius Costa Gomes
  2020-10-02 19:56   ` Geva, Erez
  2020-10-03  0:10   ` Thomas Gleixner
  7 siblings, 2 replies; 21+ messages in thread
From: Vinicius Costa Gomes @ 2020-10-02 19:01 UTC (permalink / raw)
  To: Erez Geva, linux-kernel, netdev, Cong Wang, David S . Miller,
	Jakub Kicinski, Jamal Hadi Salim, Jiri Pirko, Andrei Vagin,
	Dmitry Safonov, Eric W . Biederman, Ingo Molnar, John Stultz,
	Michal Kubecek, Oleg Nesterov, Peter Zijlstra, Richard Cochran,
	Stephen Boyd, Thomas Gleixner, Vladis Dronov,
	Sebastian Andrzej Siewior, Frederic Weisbecker, Eric Dumazet
  Cc: Jesus Sanchez-Palencia, Vedang Patel, Simon Sudler,
	Andreas Meisinger, Andreas Bucher, Henning Schild, Jan Kiszka,
	Andreas Zirkler, Ermin Sakic, An Ninh Nguyen, Michael Saenger,
	Bernd Maehringer, Gisela Greinert, Erez Geva, Erez Geva

Hi Erez,

Erez Geva <erez.geva.ext@siemens.com> writes:

> Add support for using PTP clock with
>  Traffic control Earliest TxTime First (ETF) Qdisc.
>
> Why do we need ETF to use PTP clock?
> Current ETF requires to synchronization the system clock
>  to the PTP Hardware clock (PHC) we want to send through.
> But there are cases that we can not synchronize the system clock with
>  the desire NIC PHC.
> 1. We use several NICs with several PTP domains that our device
>     is not allowed to be PTP master.
>    And we are not allowed to synchronize these PTP domains.
> 2. We are using another clock source which we need for our system.
>    Yet our device is not allowed to be PTP master.
> Regardless of the exact topology, as the Linux tradition is to allow
>  the user the freedom to choose, we propose a patch that will allow
>  the user to configure the TC-ETF with a PTP clock as well as
>  using the system clock.
> * NOTE: we do encourage the users to synchronize the system clock with
>   a PTP clock.
>  As the ETF watchdog uses the system clock.
>  Synchronizing the system clock with a PTP clock will probably
>   reduce the frequency different of the PHC and the system clock.
>  As sequence, the user might be able to reduce the ETF delta time
>   and the packets latency cross the network.
>
> Follow the decision to derive a dynamic clock ID from the file description
>  of an opened PTP clock device file.
> We propose a simple way to use the dynamic clock ID with the ETF Qdisc.
> We will submit a patch to the "tc" tool from the iproute2 project
>  once this patch is accepted.
>

In addition to what Thomas said, I would like to add some thoughts
(mostly re-wording some of Thomas' comments :-)).

I think that there's an underlying problem/limitation that is the cause
of the issue (or at least a step in the right direction) you are trying
to solve: the issue is that PTP clocks can't be used as hrtimers.

I didn't spend a lot of time thinking about how to solve this (the only
thing that comes to mind is having a timecounter, or similar, "software
view" over the PHC clock).

Anyway, my feeling is that until this is solved, we would only be
working around the problem, and creating even more hard to handle corner
cases.


Cheers,
-- 
Vinicius

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 0/7] TC-ETF support PTP clocks series
  2020-10-02 19:01 ` [PATCH 0/7] TC-ETF support PTP clocks series Vinicius Costa Gomes
@ 2020-10-02 19:56   ` Geva, Erez
  2020-10-03  0:10   ` Thomas Gleixner
  1 sibling, 0 replies; 21+ messages in thread
From: Geva, Erez @ 2020-10-02 19:56 UTC (permalink / raw)
  To: Vinicius Costa Gomes, linux-kernel, netdev, Cong Wang,
	David S . Miller, Jakub Kicinski, Jamal Hadi Salim, Jiri Pirko,
	Andrei Vagin, Dmitry Safonov, Eric W . Biederman, Ingo Molnar,
	John Stultz, Michal Kubecek, Oleg Nesterov, Peter Zijlstra,
	Richard Cochran, Stephen Boyd, Thomas Gleixner, Vladis Dronov,
	Sebastian Andrzej Siewior, Frederic Weisbecker, Eric Dumazet
  Cc: Jesus Sanchez-Palencia, Vedang Patel, Sudler, Simon, Meisinger,
	Andreas, Bucher, Andreas, henning.schild, jan.kiszka, Zirkler,
	Andreas, Sakic, Ermin, anninh.nguyen, Saenger, Michael,
	Maehringer, Bernd, gisela.greinert, Erez Geva



On 02/10/2020 21:01, Vinicius Costa Gomes wrote:
> Hi Erez,
>
> Erez Geva <erez.geva.ext@siemens.com> writes:
>
>> Add support for using PTP clock with
>>   Traffic control Earliest TxTime First (ETF) Qdisc.
>>
>> Why do we need ETF to use PTP clock?
>> Current ETF requires to synchronization the system clock
>>   to the PTP Hardware clock (PHC) we want to send through.
>> But there are cases that we can not synchronize the system clock with
>>   the desire NIC PHC.
>> 1. We use several NICs with several PTP domains that our device
>>      is not allowed to be PTP master.
>>     And we are not allowed to synchronize these PTP domains.
>> 2. We are using another clock source which we need for our system.
>>     Yet our device is not allowed to be PTP master.
>> Regardless of the exact topology, as the Linux tradition is to allow
>>   the user the freedom to choose, we propose a patch that will allow
>>   the user to configure the TC-ETF with a PTP clock as well as
>>   using the system clock.
>> * NOTE: we do encourage the users to synchronize the system clock with
>>    a PTP clock.
>>   As the ETF watchdog uses the system clock.
>>   Synchronizing the system clock with a PTP clock will probably
>>    reduce the frequency different of the PHC and the system clock.
>>   As sequence, the user might be able to reduce the ETF delta time
>>    and the packets latency cross the network.
>>
>> Follow the decision to derive a dynamic clock ID from the file description
>>   of an opened PTP clock device file.
>> We propose a simple way to use the dynamic clock ID with the ETF Qdisc.
>> We will submit a patch to the "tc" tool from the iproute2 project
>>   once this patch is accepted.
>>
>
> In addition to what Thomas said, I would like to add some thoughts
> (mostly re-wording some of Thomas' comments :-)).
>
> I think that there's an underlying problem/limitation that is the cause
> of the issue (or at least a step in the right direction) you are trying
> to solve: the issue is that PTP clocks can't be used as hrtimers.
>
> I didn't spend a lot of time thinking about how to solve this (the only
> thing that comes to mind is having a timecounter, or similar, "software
> view" over the PHC clock).
>
> Anyway, my feeling is that until this is solved, we would only be
> working around the problem, and creating even more hard to handle corner
> cases.
>
>
> Cheers,
>

You are right.

Thanks for the insight.

Erez

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 0/7] TC-ETF support PTP clocks series
  2020-10-02 19:01 ` [PATCH 0/7] TC-ETF support PTP clocks series Vinicius Costa Gomes
  2020-10-02 19:56   ` Geva, Erez
@ 2020-10-03  0:10   ` Thomas Gleixner
  2020-10-09 11:17     ` AW: " Meisinger, Andreas
  1 sibling, 1 reply; 21+ messages in thread
From: Thomas Gleixner @ 2020-10-03  0:10 UTC (permalink / raw)
  To: Vinicius Costa Gomes, Erez Geva, linux-kernel, netdev, Cong Wang,
	David S . Miller, Jakub Kicinski, Jamal Hadi Salim, Jiri Pirko,
	Andrei Vagin, Dmitry Safonov, Eric W . Biederman, Ingo Molnar,
	John Stultz, Michal Kubecek, Oleg Nesterov, Peter Zijlstra,
	Richard Cochran, Stephen Boyd, Vladis Dronov,
	Sebastian Andrzej Siewior, Frederic Weisbecker, Eric Dumazet
  Cc: Jesus Sanchez-Palencia, Vedang Patel, Simon Sudler,
	Andreas Meisinger, Andreas Bucher, Henning Schild, Jan Kiszka,
	Andreas Zirkler, Ermin Sakic, An Ninh Nguyen, Michael Saenger,
	Bernd Maehringer, Gisela Greinert, Erez Geva, Erez Geva

Vinicius,

On Fri, Oct 02 2020 at 12:01, Vinicius Costa Gomes wrote:
> I think that there's an underlying problem/limitation that is the cause
> of the issue (or at least a step in the right direction) you are trying
> to solve: the issue is that PTP clocks can't be used as hrtimers.

That's only an issue if PTP time != CLOCK_TAI, which is insane to begin
with.

As I know that these insanities exists in real world setups, e.g. grand
clock masters which start at the epoch which causes complete disaster
when any of the slave devices booted earlier. Obviously people came
up with system designs which are even more insane.

> I didn't spend a lot of time thinking about how to solve this (the only
> thing that comes to mind is having a timecounter, or similar, "software
> view" over the PHC clock).

There are two aspects:

 1) What's the overall time coordination especially for applications?

    PTP is for a reason based on TAI which allows a universal
    representation of time. Strict monotonic, no time zones, no leap
    seconds, no bells and whistels.

    Using TAI in distributed systems solved a gazillion of hard problems
    in one go.

    TSN depends on PTP and that obviously makes CLOCK_TAI _the_ clock of
    choice for schedules and whatever is needed. It just solves the
    problem nicely and we spent a great amount of time to make
    application development for TSN reasonable and hardware agnostic.

    Now industry comes along and decides to introducde independent time
    universes. The result is a black hole for programmers because they
    now have to waste effort - again - on solving the incredibly hard
    problems of warping space and time.

    The amount of money saved by not having properly coordinated time
    bases in such systems is definitely marginal compared to the amount
    of non-sensical work required to fix it in software.

 2) How can an OS provide halfways usable interfaces to handle this
    trainwreck?

    Access to the various time universes is already available through
    the dynamic POSIX clocks. But these interfaces have been designed
    for the performance insensitive work of PTP daemons and not for the
    performance critical work of applications dealing with real-time
    requirements of all sorts.

    As these raw PTP clocks are hardware dependend and only known at
    boot / device discovery time they cannot be exposed to the kernel
    internaly in any sane way. Also the user space interface has to be
    dynamic which rules out the ability to assign fixed CLOCK_* ids.

    As a consequence these clocks cannot provide timers like the regular
    CLOCK_* variants do, which makes it insanely hard to develop sane
    and portable applications.

    What comes to my mind (without spending much thought on it) is:

       1) Utilize and extend the existing PTP mechanisms to calculate
          the time relationship between the system wide CLOCK_TAI and
          the uncoordinated time universe. As offset is a constant and
          frequency drift is not a high speed problem this can be done
          with a userspace daemon of some sorts.

        2) Provide CLOCK_TAI_PRIVATE which defaults to CLOCK_TAI,
           i.e. offset = 0 and frequency ratio = 1 : 1

        3) (Ab)use the existing time namespace to provide a mechanism to
           adjust the offset and frequency ratio of CLOCK_TAI_PRIVATE
           which is calculated by #1

           This is the really tricky part and comes with severe
           limitations:

             - We can't walk task list to find tasks which have their
               CLOCK_TAI_PRIVATE associated with a particular
               incarnation of PCH/PTP universe, so some sane referencing
               of the underlying parameters to convert TAI to
               TAI_PRIVATE and vice versa has to be found. Life time
               problems are going to be interesting to deal with.

             - An application cannot coordinate multiple PCH/PTP domains
               and has to restrict itself to pick ONE disjunct time
               universe.

               Whether that's a reasonable limitation I don't know
               simply because the information provided in this patch
               series is close to zero.

             - Preventing early timer expiration caused by frequency
               drift is not trivial either.

      TBH, just thinking about all of that makes me shudder and my knee
      jerk reaction is: NO WAY!

Why the heck can't hardware people and system designers finally
understand that time is not something they can define at their
own peril?

The "Let's solve it in software so I don't have to think about it"
design approach strikes again. This caused headaches for the past five
decades, but people obviously never learn.

That said, I'm open for solutions which are at least in the proximity of
sane, but that needs a lot more information about the use cases and the
implications and not just some handwavy 'we screwed up our system design
and therefore we need to inflict insanity on everyone' blurb.

Thanks,

        tglx



^ permalink raw reply	[flat|nested] 21+ messages in thread

* AW: [PATCH 0/7] TC-ETF support PTP clocks series
  2020-10-03  0:10   ` Thomas Gleixner
@ 2020-10-09 11:17     ` Meisinger, Andreas
  2020-10-09 15:39       ` Thomas Gleixner
  0 siblings, 1 reply; 21+ messages in thread
From: Meisinger, Andreas @ 2020-10-09 11:17 UTC (permalink / raw)
  To: tglx, vinicius.gomes, Geva, Erez, linux-kernel, netdev,
	xiyou.wangcong, davem, kuba, jhs, jiri, avagin, 0x7f454c46,
	ebiederm, mingo, john.stultz, mkubecek, oleg, peterz,
	richardcochran, sboyd, vdronov, bigeasy, frederic, edumazet
  Cc: jesus.sanchez-palencia, vedang.patel, Sudler, Simon, Bucher,
	Andreas, henning.schild, jan.kiszka, Zirkler, Andreas, Sakic,
	Ermin, anninh.nguyen, Saenger, Michael, Maehringer, Bernd,
	gisela.greinert, Geva, Erez, ErezGeva2, guenter.steindl

Hello Mr Gleixner,
thanks for your feedback we'll fix the issues not related to the time scale topic as soon as possible.

Regarding your concerns about not using TAI timescale, we do admit that in many situations TAI makes a lot of things way more easy and therefore is the way to go.

Yet we do already have usecases where this can't be done. Additionally a lot of discussions at this topic are ongoing in 60802 profile creation too.
Some of our usecases do require a network which does not depend on any external timesource. This might be due to the network not being connected (to the internet) or just because the network may not be able to rely on or trust an external timesource. Some reasons for this might be safety, security, availability or legal implications ( e.g. if a machine builder has to guarantee operation of a machine which depends on an internal tsn network).

About your question if an application needs to be able to sync to multiple timescales. A small count of usecases even would require multiple independent timesources to be used. At the moment they all seem to be located in the area of extreme high availability. There's ongoing evaluation about this issues and we're not sure if there's a way to do this without special hardware so we didn't address it here.

Additionally to these special cases at least "reading" different timesources should be possible in all cases, e.g. to be able to log based on TAI while network operation relies on it's own clock. Of course TAI timescale wouldn't the same level of trust in this scenario.

Best regards
Andreas Meisinger

Siemens AG
Digital Industries
Process Automation
DI PA DCP TI
Gleiwitzer Str. 555
90475 Nürnberg, Deutschland
Tel.: +49 911 95822720
mailto:andreas.meisinger@siemens.com

www.siemens.com/ingenuityforlife

-----Ursprüngliche Nachricht-----
Von: Thomas Gleixner <tglx@linutronix.de>
Gesendet: Samstag, 3. Oktober 2020 02:10
An: Vinicius Costa Gomes <vinicius.gomes@intel.com>; Geva, Erez (ext) (DI PA CI R&D 3) <erez.geva.ext@siemens.com>; linux-kernel@vger.kernel.org; netdev@vger.kernel.org; Cong Wang <xiyou.wangcong@gmail.com>; David S . Miller <davem@davemloft.net>; Jakub Kicinski <kuba@kernel.org>; Jamal Hadi Salim <jhs@mojatatu.com>; Jiri Pirko <jiri@resnulli.us>; Andrei Vagin <avagin@gmail.com>; Dmitry Safonov <0x7f454c46@gmail.com>; Eric W . Biederman <ebiederm@xmission.com>; Ingo Molnar <mingo@kernel.org>; John Stultz <john.stultz@linaro.org>; Michal Kubecek <mkubecek@suse.cz>; Oleg Nesterov <oleg@redhat.com>; Peter Zijlstra <peterz@infradead.org>; Richard Cochran <richardcochran@gmail.com>; Stephen Boyd <sboyd@kernel.org>; Vladis Dronov <vdronov@redhat.com>; Sebastian Andrzej Siewior <bigeasy@linutronix.de>; Frederic Weisbecker <frederic@kernel.org>; Eric Dumazet <edumazet@google.com>
Cc: Jesus Sanchez-Palencia <jesus.sanchez-palencia@intel.com>; Vedang Patel <vedang.patel@intel.com>; Sudler, Simon (DI PA DCP TI) <simon.sudler@siemens.com>; Meisinger, Andreas (DI PA CI R&D 3) <andreas.meisinger@siemens.com>; Bucher, Andreas (DI PA DCP R&D 3) <andreas.bucher@siemens.com>; Schild, Henning (T RDA IOT SES-DE) <henning.schild@siemens.com>; Kiszka, Jan (T RDA IOT SES-DE) <jan.kiszka@siemens.com>; Zirkler, Andreas (T RDA IOT INN-DE) <andreas.zirkler@siemens.com>; Sakic, Ermin (T RDA IOT INN-DE) <ermin.sakic@siemens.com>; Nguyen, An Ninh (DI FA TIP AAT 2) <anninh.nguyen@siemens.com>; Saenger, Michael (DI PA CI R&D 4) <michael.saenger@siemens.com>; Maehringer, Bernd (DI PA CI R&D 4) <bernd.maehringer@siemens.com>; Greinert, Gisela (DI PA CI R&D 4) <gisela.greinert@siemens.com>; Geva, Erez (ext) (DI PA CI R&D 3) <erez.geva.ext@siemens.com>; Erez Geva <ErezGeva2@gmail.com>
Betreff: Re: [PATCH 0/7] TC-ETF support PTP clocks series

Vinicius,

On Fri, Oct 02 2020 at 12:01, Vinicius Costa Gomes wrote:
> I think that there's an underlying problem/limitation that is the
> cause of the issue (or at least a step in the right direction) you are
> trying to solve: the issue is that PTP clocks can't be used as hrtimers.

That's only an issue if PTP time != CLOCK_TAI, which is insane to begin with.

As I know that these insanities exists in real world setups, e.g. grand clock masters which start at the epoch which causes complete disaster when any of the slave devices booted earlier. Obviously people came up with system designs which are even more insane.

> I didn't spend a lot of time thinking about how to solve this (the
> only thing that comes to mind is having a timecounter, or similar,
> "software view" over the PHC clock).

There are two aspects:

 1) What's the overall time coordination especially for applications?

    PTP is for a reason based on TAI which allows a universal
    representation of time. Strict monotonic, no time zones, no leap
    seconds, no bells and whistels.

    Using TAI in distributed systems solved a gazillion of hard problems
    in one go.

    TSN depends on PTP and that obviously makes CLOCK_TAI _the_ clock of
    choice for schedules and whatever is needed. It just solves the
    problem nicely and we spent a great amount of time to make
    application development for TSN reasonable and hardware agnostic.

    Now industry comes along and decides to introducde independent time
    universes. The result is a black hole for programmers because they
    now have to waste effort - again - on solving the incredibly hard
    problems of warping space and time.

    The amount of money saved by not having properly coordinated time
    bases in such systems is definitely marginal compared to the amount
    of non-sensical work required to fix it in software.

 2) How can an OS provide halfways usable interfaces to handle this
    trainwreck?

    Access to the various time universes is already available through
    the dynamic POSIX clocks. But these interfaces have been designed
    for the performance insensitive work of PTP daemons and not for the
    performance critical work of applications dealing with real-time
    requirements of all sorts.

    As these raw PTP clocks are hardware dependend and only known at
    boot / device discovery time they cannot be exposed to the kernel
    internaly in any sane way. Also the user space interface has to be
    dynamic which rules out the ability to assign fixed CLOCK_* ids.

    As a consequence these clocks cannot provide timers like the regular
    CLOCK_* variants do, which makes it insanely hard to develop sane
    and portable applications.

    What comes to my mind (without spending much thought on it) is:

       1) Utilize and extend the existing PTP mechanisms to calculate
          the time relationship between the system wide CLOCK_TAI and
          the uncoordinated time universe. As offset is a constant and
          frequency drift is not a high speed problem this can be done
          with a userspace daemon of some sorts.

        2) Provide CLOCK_TAI_PRIVATE which defaults to CLOCK_TAI,
           i.e. offset = 0 and frequency ratio = 1 : 1

        3) (Ab)use the existing time namespace to provide a mechanism to
           adjust the offset and frequency ratio of CLOCK_TAI_PRIVATE
           which is calculated by #1

           This is the really tricky part and comes with severe
           limitations:

             - We can't walk task list to find tasks which have their
               CLOCK_TAI_PRIVATE associated with a particular
               incarnation of PCH/PTP universe, so some sane referencing
               of the underlying parameters to convert TAI to
               TAI_PRIVATE and vice versa has to be found. Life time
               problems are going to be interesting to deal with.

             - An application cannot coordinate multiple PCH/PTP domains
               and has to restrict itself to pick ONE disjunct time
               universe.

               Whether that's a reasonable limitation I don't know
               simply because the information provided in this patch
               series is close to zero.

             - Preventing early timer expiration caused by frequency
               drift is not trivial either.

      TBH, just thinking about all of that makes me shudder and my knee
      jerk reaction is: NO WAY!

Why the heck can't hardware people and system designers finally understand that time is not something they can define at their own peril?

The "Let's solve it in software so I don't have to think about it"
design approach strikes again. This caused headaches for the past five decades, but people obviously never learn.

That said, I'm open for solutions which are at least in the proximity of sane, but that needs a lot more information about the use cases and the implications and not just some handwavy 'we screwed up our system design and therefore we need to inflict insanity on everyone' blurb.

Thanks,

        tglx



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: AW: [PATCH 0/7] TC-ETF support PTP clocks series
  2020-10-09 11:17     ` AW: " Meisinger, Andreas
@ 2020-10-09 15:39       ` Thomas Gleixner
  0 siblings, 0 replies; 21+ messages in thread
From: Thomas Gleixner @ 2020-10-09 15:39 UTC (permalink / raw)
  To: Meisinger, Andreas, vinicius.gomes, Geva, Erez, linux-kernel,
	netdev, xiyou.wangcong, davem, kuba, jhs, jiri, avagin,
	0x7f454c46, ebiederm, mingo, john.stultz, mkubecek, oleg, peterz,
	richardcochran, sboyd, vdronov, bigeasy, frederic, edumazet
  Cc: jesus.sanchez-palencia, vedang.patel, Sudler, Simon, Bucher,
	Andreas, henning.schild, jan.kiszka, Zirkler, Andreas, Sakic,
	Ermin, anninh.nguyen, Saenger, Michael, Maehringer, Bernd,
	gisela.greinert, Geva, Erez, ErezGeva2, guenter.steindl

Andreas,

On Fri, Oct 09 2020 at 11:17, Andreas Meisinger wrote:

please do not top-post and trim your replies.

> Yet we do already have usecases where this can't be done. Additionally
> a lot of discussions at this topic are ongoing in 60802 profile
> creation too.  Some of our usecases do require a network which does
> not depend on any external timesource. This might be due to the
> network not being connected (to the internet) or just because the
> network may not be able to rely on or trust an external
> timesource. Some reasons for this might be safety, security,
> availability or legal implications ( e.g. if a machine builder has to
> guarantee operation of a machine which depends on an internal tsn
> network).

I'm aware of the reasons for these kind of setups.

> About your question if an application needs to be able to sync to
> multiple timescales. A small count of usecases even would require
> multiple independent timesources to be used. At the moment they all
> seem to be located in the area of extreme high availability. There's
> ongoing evaluation about this issues and we're not sure if there's a
> way to do this without special hardware so we didn't address it here.

Reading several raw PTP clocks is always possible through the existing
interfaces and if the coordidation between real TAI and the raw PTP
clocks is available, then these interfaces could be extended to provide
time normalized to real TAI.

But that does not allow to utilize the magic clocks for arming timers so
these have to be based on some other clock and the application needs to do
the conversion back and forth.

Now I said that we could abuse time name spaces for providing access to
_one_ magic TAI clock which lets the kernel do that work, but thinking
more about it, it should be possible to do so for all of them even
without name spaces.

The user space daemon which does the correlation between these PTP
domains and TAI is required in any case, so the magic clock TAI_PRIVATE
is not having any advantage.

If that correlation exists then at least clock_nanosleep() should be
doable. So clock_nanosleep(clock PTP/$N) would convert the sleep time to
TAI and queue a timer internally on the CLOCK_TAI base.

Depending on the frequency drift between CLOCK_TAI and clock PTP/$N the
timer expiry might be slightly inaccurate, but surely not more
inaccurate than if that conversion is done purely in user space.

The self rearming posix timers would work too, but the self rearming is
based on CLOCK_TAI, so rounding errors and drift would be accumulative.
So I'd rather stay away from them.

If there is no deamon which manages the correlation then the syscall
would fail.

If such a coordination exists, then the whole problem in the TSN stack
is gone. The core can always operate on TAI and the network device which
runs in a different time universe would use the same conversion data
e.g. to queue a packet for HW based time triggered transmission. Again
subject to slight inaccuracy, but it does not come with all the problems
of dynamic clocks, locking issues etc. As the frequency drift between
PTP domains is neither fast changing nor randomly jumping around the
inaccuracy might even be a mostly academic problem.

Thoughts?

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2020-10-09 15:40 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-10-01 20:51 [PATCH 0/7] TC-ETF support PTP clocks series Erez Geva
2020-10-01 20:51 ` [PATCH 1/7] POSIX clock ID check function Erez Geva
2020-10-01 21:56   ` Thomas Gleixner
2020-10-01 20:51 ` [PATCH 2/7] Function to retrieve main clock state Erez Geva
2020-10-01 22:05   ` Thomas Gleixner
2020-10-02  0:19     ` Thomas Gleixner
2020-10-01 20:51 ` [PATCH 3/7] Functions to fetch POSIX dynamic clock object Erez Geva
2020-10-01 23:35   ` Thomas Gleixner
2020-10-01 20:51 ` [PATCH 4/7] Fix qdisc_watchdog_schedule_range_ns range check Erez Geva
2020-10-01 22:44   ` Thomas Gleixner
2020-10-01 20:51 ` [PATCH 5/7] Traffic control using high-resolution timer issue Erez Geva
2020-10-01 23:07   ` Thomas Gleixner
2020-10-01 20:51 ` [PATCH 6/7] TC-ETF code improvements Erez Geva
2020-10-01 20:51 ` [PATCH 7/7] TC-ETF support PTP clocks Erez Geva
2020-10-02  0:33   ` Thomas Gleixner
2020-10-02 11:05     ` Geva, Erez
2020-10-02 19:01 ` [PATCH 0/7] TC-ETF support PTP clocks series Vinicius Costa Gomes
2020-10-02 19:56   ` Geva, Erez
2020-10-03  0:10   ` Thomas Gleixner
2020-10-09 11:17     ` AW: " Meisinger, Andreas
2020-10-09 15:39       ` Thomas Gleixner

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).